Introduction
Nowadays, all personal computers and workstations come with multiple cores.
Most .NET applications fail to harness the full potential of this computing power. Even when developers attempt to do so, it is generally by means of low-level
manipulation of threads and locks. This often leads to code that is either unreadable or full of potential threats such as race conditions and deadlocks. These threats often go
undetected when the code runs on a single-core machine.
The Task Parallel Library allows you to write code that is human-readable, less error-prone, and that adjusts itself to the number of cores available.
So you can be confident that your software will take advantage of better hardware as its environment is upgraded.
What kind of Performance Boost are we talking about?
What is the first thing you try when you see parts of your code not performing well? Lazy loading, LINQ queries, optimizing for loops, and so on. We often overlook parallelizing the time-consuming, independent units of work.
Most often, your CPU will tell the following story during performance-intensive routines.
Shouldn’t your CPU be utilized more like this?
Task Parallel Library
The Task Parallel
Library (TPL) is a set of public types and APIs in the System.Threading and System.Threading.Tasks
namespaces in the .NET Framework 4.0. The TPL scales the degree of concurrency
dynamically to efficiently use all the cores that are available. By using TPL,
you can maximize the performance of your code while focusing on the work that
your program is designed to accomplish.
The Task Parallel
Library introduces the concept of “Task”.
Task parallelism is the process of running
these tasks in parallel. A Task is an
independent unit of work, which runs within a program. Benefits of identifying
tasks within your system are:
- More efficient and more scalable use of system
resources.
- More programmatic control than is possible with
a thread or work item.
The Task Parallel Library uses threads under the hood to execute these tasks in parallel. Whether to use additional threads, and how many, is decided dynamically by the runtime environment.
Why Tasks? Why not threads?
Creating a thread comes with a significant cost. Creating a large number of threads within your application also adds the overhead of context switching. In a single-core environment, this may actually degrade performance, since one core has to serve all of the threads.
With tasks, on the other hand, the runtime decides dynamically whether additional threads of execution are needed. It uses the ThreadPool under the hood to distribute the work, avoiding the overhead of thread creation and unnecessary context switching where possible.
Fig 1. The time difference between a traditional Thread based approach, and a task based approach.
The following code
snippet shows the creation of parallel tasks using Threads and Task.
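A minimal sketch of such a comparison might look like this. The `DoWork` method, the item counts, and the class name are hypothetical stand-ins for any CPU-bound unit of work, and the timings you see depend entirely on your machine.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static class ThreadVsTaskDemo
{
    // Hypothetical CPU-bound unit of work, used only for illustration.
    public static long DoWork(int n)
    {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i % 7;
        return sum;
    }

    // One dedicated thread per unit of work.
    public static long[] RunWithThreads(int count, int n)
    {
        var results = new long[count];
        var threads = new Thread[count];
        for (int i = 0; i < count; i++)
        {
            int index = i; // copy the loop variable so each closure sees its own value
            threads[i] = new Thread(() => results[index] = DoWork(n));
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
        return results;
    }

    // One task per unit of work; the scheduler maps tasks onto ThreadPool threads.
    public static long[] RunWithTasks(int count, int n)
    {
        var tasks = new Task<long>[count];
        for (int i = 0; i < count; i++)
            tasks[i] = Task.Factory.StartNew(() => DoWork(n));
        Task.WaitAll(tasks);

        var results = new long[count];
        for (int i = 0; i < count; i++) results[i] = tasks[i].Result;
        return results;
    }

    static void Main()
    {
        var sw = Stopwatch.StartNew();
        RunWithThreads(100, 1000000);
        Console.WriteLine("Threads: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        RunWithTasks(100, 1000000);
        Console.WriteLine("Tasks:   {0} ms", sw.ElapsedMilliseconds);
    }
}
```

Both methods produce the same results; only the way the work is scheduled differs.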
You can download the
sample used above.
So how is this different from creating a thread yourself? One of the first advantages of using tasks over threads is that it becomes easier to get the best performance out of your application on any given system. For example, if I fire off multiple threads that all do heavy CPU-bound work on a single-core machine, the work is likely to take significantly longer. Threading has overhead, and if you try to execute more CPU-bound threads than the machine has cores to run them on, you can run into problems. Each time the CPU switches from one thread to another, there is a bit of overhead, and if many threads are running at once, this switching can happen quite often, causing the work to take longer than if it had simply been executed sequentially. This diagram might help spell that out a bit better:
As you can see, if we don't switch between pieces of work, we avoid the context switches between threads. With switching, the total cumulative processing time is much longer, even though the same amount of work is done. If the work were processed by two different cores, the two sets of work would execute simultaneously, providing the highest possible efficiency.
Why Tasks? Why not ThreadPools?
Now that we have a rough idea of tasks and what they can do, let us look at them in a little more detail and see how they differ from using the ThreadPool directly.
Let us see how you can start a new execution on the ThreadPool:
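A minimal sketch using `ThreadPool.QueueUserWorkItem` could look like this. The class name, the `Queue` helper, and the input string are illustrative assumptions.

```csharp
using System;
using System.Threading;

static class ThreadPoolStartDemo
{
    // Queues the work item and returns immediately.
    // The returned bool only says whether queuing succeeded, not whether the work has run.
    public static bool Queue(object input)
    {
        return ThreadPool.QueueUserWorkItem(state =>
            Console.WriteLine("Working on a ThreadPool thread: {0}", state), input);
    }

    static void Main()
    {
        Queue("some input");

        // Crude pause so the process does not exit before the work item runs.
        Thread.Sleep(500);
    }
}
```

Note that the call hands back nothing you can wait on or read a result from.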
Let us see what you have to do if you wish to wait for the thread to finish:
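One common way to do this, sketched here under the assumption that a `ManualResetEvent` is used for signalling, is to have the work item set an event that the caller blocks on. The `ComputeAndWait` helper is hypothetical.

```csharp
using System;
using System.Threading;

static class ThreadPoolWaitDemo
{
    // Runs a hypothetical computation on the ThreadPool and blocks until it is done.
    public static int ComputeAndWait(int input)
    {
        int result = 0;
        using (var done = new ManualResetEvent(false))
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                result = input * input; // stand-in for real work
                done.Set();             // signal that the work has finished
            });

            done.WaitOne(); // block the caller until the work item signals
        }
        return result;
    }

    static void Main()
    {
        Console.WriteLine(ComputeAndWait(12)); // prints 144
    }
}
```

The result has to be smuggled out through a captured variable, and every work item needs its own signalling object.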
Messy, isn't it?
What if you have to
wait for 15 threads to finish?
How do you capture the
return values from multiple threads?
How do you return the
control back to GUI thread?
There are answers to these questions, such as delegates and raising events, but they lead to error-prone code once we drill into a chain of multithreaded actions.
Let us see how Tasks
handle this situation elegantly:
Creating a new Task:
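A minimal sketch, assuming `Task.Factory.StartNew` is used to create and start the task in one step. The `SquareAsync` helper is hypothetical.

```csharp
using System;
using System.Threading.Tasks;

static class TaskCreateDemo
{
    // Starts a task that computes a value; Task<int> carries the result back to the caller.
    public static Task<int> SquareAsync(int input)
    {
        return Task.Factory.StartNew(() => input * input);
    }

    static void Main()
    {
        Task<int> task = SquareAsync(12);

        // Reading Result blocks until the task has finished.
        Console.WriteLine(task.Result); // prints 144
    }
}
```

Unlike the ThreadPool work item, the task object itself holds the return value.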
Waiting on Tasks:
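Waiting for many tasks needs no extra signalling objects. The sketch below assumes 15 hypothetical tasks, each squaring its own number, and uses `Task.WaitAll` to block until all of them are done.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class TaskWaitDemo
{
    // Starts 'count' tasks, waits for all of them, and sums their results.
    public static int SumOfSquares(int count)
    {
        Task<int>[] tasks = Enumerable.Range(1, count)
            .Select(i => Task.Factory.StartNew(() => i * i))
            .ToArray();

        Task.WaitAll(tasks); // blocks until every task has completed
        return tasks.Sum(t => t.Result);
    }

    static void Main()
    {
        Console.WriteLine(SumOfSquares(15)); // prints 1240
    }
}
```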
Execute another Async task when the current task is done:
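A continuation can be sketched with `ContinueWith`, which schedules a second task to start once the first has completed. The `LoadThenFormat` helper and its stand-in work are assumptions for illustration.

```csharp
using System;
using System.Threading.Tasks;

static class TaskContinuationDemo
{
    // The first task produces a value; the continuation starts only after it completes.
    public static Task<string> LoadThenFormat(int input)
    {
        Task<int> load = Task.Factory.StartNew(() => input * 2); // stand-in for real work
        return load.ContinueWith(previous => "Result: " + previous.Result);
    }

    static void Main()
    {
        Console.WriteLine(LoadThenFormat(21).Result); // prints Result: 42
    }
}
```

In a GUI application, `ContinueWith` also has overloads that accept a `TaskScheduler`; passing `TaskScheduler.FromCurrentSynchronizationContext()` from the UI thread is one way to have the continuation run back on that thread.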
In real-world scenarios, we often have multiple operations that we want to perform asynchronously. Look at the following code snippet and see how you can model it alternatively.
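As one possible illustration, the sketch below assumes two hypothetical, independent operations (`LoadOrders` and `LoadCustomers`) and shows the same logic written sequentially and with tasks combined through `Task.Factory.ContinueWhenAll`.

```csharp
using System;
using System.Threading.Tasks;

static class MultipleOperationsDemo
{
    // Hypothetical independent operations; each stands in for a slow call.
    static int LoadOrders() { return 3; }
    static int LoadCustomers() { return 4; }

    // Sequential version: the second call starts only after the first returns.
    public static int TotalSequential()
    {
        return LoadOrders() + LoadCustomers();
    }

    // Task-based version: both operations may run concurrently,
    // and a continuation combines their results once both are done.
    public static int TotalWithTasks()
    {
        Task<int> orders = Task.Factory.StartNew<int>(LoadOrders);
        Task<int> customers = Task.Factory.StartNew<int>(LoadCustomers);

        Task<int> total = Task.Factory.ContinueWhenAll(
            new[] { orders, customers },
            finished => finished[0].Result + finished[1].Result);

        return total.Result;
    }

    static void Main()
    {
        Console.WriteLine(TotalSequential()); // prints 7
        Console.WriteLine(TotalWithTasks());  // prints 7
    }
}
```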
Parallel Extensions:
Parallel extensions were introduced along with the Task Parallel Library to achieve data parallelism. Data parallelism refers to scenarios
in which the same operation is performed concurrently (that is, in parallel) on the elements of a source collection or array. The .NET Framework provides the
Parallel.For and Parallel.ForEach constructs to achieve data parallelism.
Let us see how we can use these:
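A minimal sketch of both constructs follows. The `SquareAll` and `TotalLength` helpers are hypothetical, and the example assumes the loop iterations are independent of one another, which is what makes them safe to run in parallel.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class ParallelLoopDemo
{
    // Squares every element. Each index is written by exactly one iteration,
    // so no locking is needed.
    public static int[] SquareAll(int[] source)
    {
        var result = new int[source.Length];
        Parallel.For(0, source.Length, i =>
        {
            result[i] = source[i] * source[i];
        });
        return result;
    }

    // Sums the lengths of all words. The total is shared between iterations,
    // so it is updated atomically.
    public static int TotalLength(IEnumerable<string> words)
    {
        int total = 0;
        Parallel.ForEach(words, word =>
        {
            Interlocked.Add(ref total, word.Length);
        });
        return total;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", SquareAll(new[] { 1, 2, 3, 4 })));    // prints 1,4,9,16
        Console.WriteLine(TotalLength(new[] { "task", "parallel", "library" })); // prints 19
    }
}
```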
The Parallel.ForEach construct mentioned above uses the available cores and thus improves performance in the same fashion.
The following graphs show how parallel extensions improve the performance of the system:
Fig 1. Matrix multiplication running on a Dual Core machine. The parallel extensions consume less time.
Fig 2. Matrix multiplication running on a Quad Core machine. The same code consumes far less time without any modifications.
Fig 3. Matrix multiplication running on a single core machine. The execution time remains identical.
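The kind of loop behind such a measurement can be sketched as follows. This is an assumed implementation for illustration, not the code that produced the figures: the outer loop over result rows is handed to `Parallel.For`, since each row can be computed independently.

```csharp
using System;
using System.Threading.Tasks;

static class MatrixMultiplyDemo
{
    // Multiplies a (rows x inner) by b (inner x cols), one result row per parallel iteration.
    public static double[,] MultiplyParallel(double[,] a, double[,] b)
    {
        int rows = a.GetLength(0);
        int inner = a.GetLength(1);
        int cols = b.GetLength(1);
        var c = new double[rows, cols];

        Parallel.For(0, rows, i =>
        {
            for (int j = 0; j < cols; j++)
            {
                double sum = 0;
                for (int k = 0; k < inner; k++) sum += a[i, k] * b[k, j];
                c[i, j] = sum; // row i is written only by iteration i
            }
        });
        return c;
    }

    static void Main()
    {
        var a = new double[,] { { 1, 2 }, { 3, 4 } };
        var b = new double[,] { { 5, 6 }, { 7, 8 } };
        var c = MultiplyParallel(a, b);
        Console.WriteLine("{0} {1}", c[0, 0], c[0, 1]); // prints 19 22
        Console.WriteLine("{0} {1}", c[1, 0], c[1, 1]); // prints 43 50
    }
}
```

Replacing `Parallel.For(0, rows, i => ...)` with an ordinary `for` loop gives the sequential version to compare against.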
You can download the
code from the following link [Download]
Conclusion
The parallel extensions and the Task Parallel Library help developers leverage the full potential of the available hardware. The same code adjusts itself to deliver the benefits across different hardware. It also improves the readability of the code and thus reduces the risk of introducing nasty bugs that drive developers crazy.