Comparing Asynchronous vs Synchronous Program Execution Time in .NET

Most of today's computing devices ship with multiple CPU cores. Multiple cores make it possible for a device to run several processes at the same time, whether that means several different applications running simultaneously or a single application running several logical operations in parallel. In .NET, developers can use the Task Parallel Library (TPL) to manage parallel and asynchronous work. This article does not go into the technical details of how to use TPL; instead, it offers some perspective on how much performance benefit TPL can bring and how significantly it can reduce program execution time.

Do more CPU cores mean faster execution time for every program?

It has become something of a marketing buzzword that the more CPU cores a computing device has, the more powerful and faster it is. That is not always the case. Core count is certainly one of the biggest factors in how fast a program may perform, but it is only part of the story. The code we write does not instantly run faster on a multi-core device; it is up to the developer to take advantage of those cores or not. In fact, some developers are not even aware that they have this option on the table. Properly designed code that uses parallel programming on a dual-core device may run significantly faster than poorly designed code that runs everything sequentially on a device with far more cores. How is that possible?

Use Resource Monitor to view each of your CPU core workloads

You can open the Resource Monitor window on your PC to see the total CPU usage as well as the usage of each individual core. As you can see from the screenshot above, my PC has a total of 7 CPU cores, each with a different workload. A program that doesn't utilize TPL may end up using only the main CPU (CPU 0) while leaving the other cores idle.
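If you prefer to check this from code, .NET also exposes the number of logical processors available to your process. The minimal console sketch below (not part of ExecutionTimeStopwatch) simply prints Environment.ProcessorCount; note that this counts logical processors, which can be higher than the number of physical cores.

using System;

class CoreInfo
{
   static void Main()
   {
      // Logical processors visible to this process (may include hyper-threaded ones).
      Console.WriteLine($"Logical processors: {Environment.ProcessorCount}");
   }
}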

Point taken. Should we use TPL all over our code, then, so that we get the best performance and the shortest execution time?

The answer is no. Never assume that going parallel will automatically make your code faster. In certain cases, a parallel process can run slower than its sequential equivalent. We will look at some examples and explain why that can happen.

ExecutionTimeStopwatch Application

To demonstrate and compare how the synchronous (sequential) and asynchronous (parallel) execution types affect time performance, I have created a WPF application called ExecutionTimeStopwatch.exe. The purpose of the app is very simple: it reports how much time it needs to complete a set of tasks based on the user's preferences.

You can download the app executables from this link.

You have several options to choose from:

  • How many iterations you would like to perform
  • Which execution type to perform: synchronously or asynchronously
  • Which task workload to perform: heavy, medium, or easy

Task Workload Definition

A task is a piece of work to be done or undertaken. In a software development context, a task is a set of instructions or code to be executed. It can be anything from a long-running background operation to a simple task consisting of only a few lines of code. For this application, I just want to keep the CPU busy, so I implemented nested loops to simulate the task workload.

CancellationTokenSource cts;
CancellationToken token;
string currentTime = string.Empty;
long max, current;

// Number of empty inner-loop iterations that simulate each workload level.
const int HEAVY_TASK_WORKLOAD = 1000000;
const int MEDIUM_TASK_WORKLOAD = 1000;
const int EASY_TASK_WORKLOAD = 10;

enum ExecutionType
{
   Synchronous,
   Asynchronous
}

Task DoHeavyWork(long iteration, ExecutionType executionType)
{
   max = iteration;

   return Task.Factory.StartNew(() =>
   {
      if (executionType == ExecutionType.Synchronous)
      {
         // Sequential: run every iteration one after another on a single thread.
         for (long i = 0; i < iteration; i++)
         {
            for (long j = 0; j < HEAVY_TASK_WORKLOAD; j++) { } // busy-work loop
            ++this.current;
            if (token.IsCancellationRequested) break;
         }
      }
      else
      {
         // Parallel: let TPL distribute the iterations across worker threads.
         Parallel.For(0, iteration, (index, state) =>
         {
            for (long j = 0; j < HEAVY_TASK_WORKLOAD; j++) { } // busy-work loop
            Interlocked.Increment(ref current);
            if (token.IsCancellationRequested) state.Stop();
         });
      }
   }, token);
}

Task DoMediumWork(long iteration, ExecutionType executionType)
{
   max = iteration;

   // Same pattern as DoHeavyWork, but with a much smaller inner loop.
   return Task.Factory.StartNew(() =>
   {
      if (executionType == ExecutionType.Synchronous)
      {
         for (long i = 0; i < iteration; i++)
         {
            for (long j = 0; j < MEDIUM_TASK_WORKLOAD; j++) { }
            ++this.current;
            if (token.IsCancellationRequested) break;
         }
      }
      else
      {
         Parallel.For(0, iteration, (index, state) =>
         {
            for (long j = 0; j < MEDIUM_TASK_WORKLOAD; j++) { }
            Interlocked.Increment(ref current);
            if (token.IsCancellationRequested) state.Stop();
         });
      }
   }, token);
}

Task DoEasyWork(long iteration, ExecutionType executionType)
{
   max = iteration;

   // Same pattern again, with the lightest inner loop.
   return Task.Factory.StartNew(() =>
   {
      if (executionType == ExecutionType.Synchronous)
      {
         for (long i = 0; i < iteration; i++)
         {
            for (long j = 0; j < EASY_TASK_WORKLOAD; j++) { }
            ++this.current;
            if (token.IsCancellationRequested) break;
         }
      }
      else
      {
         Parallel.For(0, iteration, (index, state) =>
         {
            for (long j = 0; j < EASY_TASK_WORKLOAD; j++) { }
            Interlocked.Increment(ref current);
            if (token.IsCancellationRequested) state.Stop();
         });
      }
   }, token);
}

As you can see from the code snippet above, I define a heavy workload as one million inner-loop iterations, a medium workload as one thousand, and an easy workload as only ten. The application repeats that workload once for every iteration the user submits, and it executes the loop either with a conventional for loop or with Parallel.For, depending on the user's selected execution type.

So, for example, if a user enters 1,000 for the iteration value, asynchronous for the execution type, and heavy for the task workload, the application will execute 1,000 x 1,000,000 = 1,000,000,000 inner-loop iterations asynchronously.
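To give an idea of how such a measurement can be made, here is a simplified sketch that times one of the calls above with System.Diagnostics.Stopwatch. This is not the app's actual UI code; it only assumes the fields and methods from the snippet above are in scope.

async Task MeasureAsync(long iteration, ExecutionType executionType)
{
   cts = new CancellationTokenSource();
   token = cts.Token;
   current = 0;

   var stopwatch = System.Diagnostics.Stopwatch.StartNew();

   // DoMediumWork and DoEasyWork are measured the same way.
   await DoHeavyWork(iteration, executionType);

   stopwatch.Stop();
   Console.WriteLine($"{current} tasks completed in {stopwatch.Elapsed.TotalSeconds:F2} s ({executionType})");
}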

Execution Time Comparison Based on Task Workload

Below are the charts illustrating the time needed to complete the tasks for each workload type.

As you can see, TPL significantly reduces execution time for the heavy task workload. Completing 80,000 heavy tasks synchronously takes about 50 seconds, while it takes only about 12 seconds asynchronously. That is roughly a 76% reduction in execution time and more than four times as many tasks completed per second!

For the medium task workload, the results are not much different from the heavy workload. Completing 80 million medium tasks takes about 51 seconds synchronously and about 13 seconds asynchronously, a roughly 74% reduction in execution time and almost four times as many tasks completed per second.

The problem starts to arise when you use TPL for very light tasks. Executing 500 million easy tasks synchronously takes only about 4 seconds, while it takes a whopping 64 seconds to do the same asynchronously. Strange, isn't it? Why does the easy workload take roughly 15 times longer when run asynchronously?

Context Switching 

This happens because of something called "context switching". TPL handles task and thread management for you, and it is not guaranteed that every task gets its own dedicated thread to execute on; some iterations share threads while others do not. There will, however, be multiple threads running in parallel, and the CPU needs time to queue, schedule, and switch between them. When the work inside each iteration is this small, the CPU spends more time handling and managing the threads than actually completing the tasks.
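One common way to reduce this overhead for very light loop bodies is to hand each worker a range of iterations instead of a single index, so the scheduling cost is paid once per chunk rather than once per iteration. The sketch below uses Partitioner.Create for this; it is not how ExecutionTimeStopwatch is implemented, just an illustration of the technique.

using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static long RunEasyWorkChunked(long iteration)
{
   long completed = 0;

   // Partitioner.Create splits the index space into ranges, so each worker
   // executes many "easy" iterations per scheduled work item.
   Parallel.ForEach(Partitioner.Create(0L, iteration), range =>
   {
      long localCount = 0;
      for (long i = range.Item1; i < range.Item2; i++)
      {
         for (long j = 0; j < 10; j++) { } // simulated easy workload
         localCount++;
      }
      // Update the shared counter once per range instead of once per iteration.
      Interlocked.Add(ref completed, localCount);
   });

   return completed;
}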

When Should You Use (and Not Use) Task Parallelism?

Below are some tips on when to use, and when not to use, TPL in your code.

Use TPL when:

  • You are sure that your program will be deployed mostly on devices with multiple CPU cores
  • You are executing a lot of heavy and time-consuming tasks
  • The tasks can be executed independently of each other
  • The tasks don't access shared memory locations very frequently

Do not use TPL when:

  • Your program will be deployed mostly on devices with a single CPU core
  • You are executing very light tasks
  • The tasks are dependent on each other
  • The tasks frequently access shared memory locations (see the thread-local sketch below for one way to mitigate this)
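On that last point, one way to avoid hammering a shared counter, as the Interlocked.Increment calls above do on every iteration, is the Parallel.For overload with localInit and localFinally: each worker accumulates into a thread-local value and merges it once at the end. The sketch below is only an illustration of that pattern, not code from ExecutionTimeStopwatch.

using System.Threading;
using System.Threading.Tasks;

static long CountWithThreadLocalState(long iteration)
{
   long completed = 0;

   Parallel.For<long>(0L, iteration,
      () => 0L,                               // localInit: per-worker counter
      (i, state, localCount) =>
      {
         for (long j = 0; j < 1000; j++) { }  // simulated medium workload
         return localCount + 1;               // no shared memory touched here
      },
      localCount => Interlocked.Add(ref completed, localCount)); // merged once per loop partition

   return completed;
}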

I recommend reading Potential Pitfalls in Data and Task Parallelism on MSDN for other considerations on whether or not to use TPL.

Source Code

If you want to play around with ExecutionTimeStopwatch, you can download or clone the repository at https://github.com/sangadji/ExecutionTimeStopwatch