James Michael Hare

...hare-brained ideas from the realm of software development...
posts - 153 , comments - 1267 , trackbacks - 0

My Links

News

Welcome to my blog! I'm a Sr. Software Development Engineer who has been performing C++/C#/Java development for over 20 years, but have definitely learned that there is always more to learn!

All thoughts and opinions expressed in my blog and my comments are my own and do not represent the thoughts of my employer.

Blogs I Read

Follow BlkRabbitCoder on Twitter

Tag Cloud

Archives

.NET

CSharp

Little Wonders

Little Wonders

vNext

C#/.NET Little Wonders: The ConcurrentDictionary

Once again we consider some of the lesser known classes and keywords of C#.  In this series of posts, we will discuss how the concurrent collections have been developed to help alleviate these multi-threading concerns. 

Last week’s post began with a general introduction and discussed the ConcurrentStack<T> and ConcurrentQueue<T>.  Today's post discusses the ConcurrentDictionary<T> (originally I had intended to discuss ConcurrentBag this week as well, but ConcurrentDictionary had enough information to create a very full post on its own!).  Finally next week, we shall close with a discussion of the ConcurrentBag<T> and BlockingCollection<T>.

For more of the "Little Wonders" posts, see the index here.

Recap

As you'll recall from the previous post, the original collections were object-based containers that accomplished synchronization through a Synchronized member.  While these were convenient because you didn't have to worry about writing your own synchronization logic, they were a bit too finely grained and if you needed to perform multiple operations under one lock, the automatic synchronization didn't buy much.

With the advent of .NET 2.0, the original collections were succeeded by the generic collections which are fully type-safe, but eschew automatic synchronization.  This cuts both ways in that you have a lot more control as a developer over when and how fine-grained you want to synchronize, but on the other hand if you just want simple synchronization it creates more work.

With .NET 4.0, we get the best of both worlds in generic collections.  A new breed of collections was born called the concurrent collections in the System.Collections.Concurrent namespace.  These amazing collections are fine-tuned to have best overall performance for situations requiring concurrent access.  They are not meant to replace the generic collections, but to simply be an alternative to creating your own locking mechanisms.

Among those concurrent collections were the ConcurrentStack<T> and ConcurrentQueue<T> which provide classic LIFO and FIFO collections with a concurrent twist.  As we saw, some of the traditional methods that required calls to be made in a certain order (like checking for not IsEmpty before calling Pop()) were replaced in favor of an umbrella operation that combined both under one lock (like TryPop()).

Now, let's take a look at the next in our series of concurrent collections!For some excellent information on the performance of the concurrent collections and how they perform compared to a traditional brute-force locking strategy, see this wonderful whitepaper by the Microsoft Parallel Computing Platform team here.

ConcurrentDictionary – the fully thread-safe dictionary

The ConcurrentDictionary<TKey,TValue> is the thread-safe counterpart to the generic Dictionary<TKey, TValue> collection.  Obviously, both are designed for quick – O(1) – lookups of data based on a key.  If you think of algorithms where you need lightning fast lookups of data and don’t care whether the data is maintained in any particular ordering or not, the unsorted dictionaries are generally the best way to go.

Note: as a side note, there are sorted implementations of IDictionary, namely SortedDictionary and SortedList which are stored as an ordered tree and a ordered list respectively.  While these are not as fast as the non-sorted dictionaries – they are O(log2 n) – they are a great combination of both speed and ordering -- and still greatly outperform a linear search.

Now, once again keep in mind that if all you need to do is load a collection once and then allow multi-threaded reading you do not need any locking.  Examples of this tend to be situations where you load a lookup or translation table once at program start, then keep it in memory for read-only reference.  In such cases locking is completely non-productive.

However, most of the time when we need a concurrent dictionary we are interleaving both reads and updates.  This is where the ConcurrentDictionary really shines!  It achieves its thread-safety with no common lock to improve efficiency.  It actually uses a series of locks to provide concurrent updates, and has lockless reads!  This means that the ConcurrentDictionary gets even more efficient the higher the ratio of reads-to-writes you have.

ConcurrentDictionary and Dictionary differences

For the most part, the ConcurrentDictionary<TKey,TValue> behaves like it’s Dictionary<TKey,TValue> counterpart with a few differences.  Some notable examples of which are:

  • Add() does not exist in the concurrent dictionary.
    • This means you must use TryAdd(), AddOrUpdate(), or GetOrAdd().  It also means that you can’t use a collection initializer with the concurrent dictionary.
  • TryAdd() replaced Add() to attempt atomic, safe adds.
    • Because Add() only succeeds if the item doesn’t already exist, we need an atomic operation to check if the item exists, and if not add it while still under an atomic lock.
  • TryUpdate() was added to attempt atomic, safe updates.
    • If we want to update an item, we must make sure it exists first and that the original value is what we expected it to be.  If all these are true, we can update the item under one atomic step.
  • TryRemove() was added to attempt atomic, safe removes.
    • To safely attempt to remove a value we need to see if the key exists first, this checks for existence and removes under an atomic lock.
  • AddOrUpdate() was added to attempt an thread-safe “upsert”.
    • There are many times where you want to insert into a dictionary if the key doesn’t exist, or update the value if it does.  This allows you to make a thread-safe add-or-update.
  • GetOrAdd() was added to attempt an thread-safe query/insert.
    • Sometimes, you want to query for whether an item exists in the cache, and if it doesn’t insert a starting value for it.  This allows you to get the value if it exists and insert if not.
  • Count, Keys, Values properties take a snapshot of the dictionary.
    • Accessing these properties may interfere with add and update performance and should be used with caution.
  • ToArray() returns a static snapshot of the dictionary.
    • That is, the dictionary is locked, and then copied to an array as a O(n) operation. 
  • GetEnumerator() is thread-safe and efficient, but allows dirty reads.
    • Because reads require no locking, you can safely iterate over the contents of the dictionary.  The only downside is that, depending on timing, you may get dirty reads.

Dirty reads during iteration

The last point on GetEnumerator() bears some explanation.  Picture a scenario in which you call GetEnumerator() (or iterate using a foreach, etc.) and then, during that iteration the dictionary gets updated.  This may not sound like a big deal, but it can lead to inconsistent results if used incorrectly. 

The problem is that items you already iterated over that are updated a split second after don’t show the update, but items that you iterate over that were updated a split second before do show the update.  Thus you may get a combination of items that are “stale” because you iterated before the update, and “fresh” because they were updated after GetEnumerator() but before the iteration reached them.

Let’s illustrate with an example, let’s say you load up a concurrent dictionary like this:

   1: // load up a dictionary.
   2: var dictionary = new ConcurrentDictionary<string, int>();
   3:  
   4: dictionary["A"] = 1;
   5: dictionary["B"] = 2;
   6: dictionary["C"] = 3;
   7: dictionary["D"] = 4;
   8: dictionary["E"] = 5;
   9: dictionary["F"] = 6;

Then you have one task (using the wonderful TPL!) to iterate using dirty reads:

   1: // attempt iteration in a separate thread
   2: var iterationTask = new Task(() =>
   3:      {
   4:         // iterates using a dirty read
   5:          foreach (var pair in dictionary)
   6:          {
   7:              Console.WriteLine(pair.Key + ":" + pair.Value);
   8:          }
   9:      });

And one task to attempt updates in a separate thread (probably):

   1: // attempt updates in a separate thread
   2: var updateTask = new Task(() =>
   3:     {
   4:         // iterates, and updates the value by one
   5:         foreach (var pair in dictionary)
   6:         {
   7:             dictionary[pair.Key] = pair.Value + 1;
   8:         }
   9:     });

Now that we’ve done this, we can fire up both tasks and wait for them to complete:

   1: // start both tasks
   2: updateTask.Start();
   3: iterationTask.Start();
   4:  
   5: // wait for both to complete.
   6: Task.WaitAll(updateTask, iterationTask);

Now, if I you didn’t know about the dirty reads, you may have expected to see the iteration before the updates (such as A:1, B:2, C:3, D:4, E:5, F:6).  However, because the reads are dirty, we will quite possibly get a combination of some updated, some original.  My own run netted this result:

   1: F:6
   2: E:6
   3: D:5
   4: C:4
   5: B:3
   6: A:2

Note that, of course, iteration is not in order because ConcurrentDictionary, like Dictionary, is unordered.  Also note that both E and F show the value 6.  This is because the output task reached F before the update, but the updates for the rest of the items occurred before their output (probably because console output is very slow, comparatively).

If we want to always guarantee that we will get a consistent snapshot to iterate over (that is, at the point we ask for it we see precisely what is in the dictionary and no subsequent updates during iteration), we should iterate over a call to ToArray() instead:

   1: // attempt iteration in a separate thread
   2: var iterationTask = new Task(() =>
   3:                                  {
   4:                                     // iterates using a dirty read
   5:                                      foreach (var pair in dictionary.ToArray())
   6:                                      {
   7:                                          Console.WriteLine(pair.Key + ":" + pair.Value);
   8:                                      }
   9:                                  });

The atomic Try…() methods

As you can imagine TryAdd() and TryRemove() have few surprises.  Both first check the existence of the item to determine if it can be added or removed based on whether or not the key currently exists in the dictionary:

   1: // try add attempts an add and returns false if it already exists
   2: if (dictionary.TryAdd("G", 7))
   3:     Console.WriteLine("G did not exist, now inserted with 7");
   4: else
   5:     Console.WriteLine("G already existed, insert failed.");

TryRemove() also has the virtue of returning the value portion of the removed entry matching the given key:

   1: // attempt to remove the value, if it exists it is removed and the original is returned
   2: int removedValue;
   3: if (dictionary.TryRemove("C", out removedValue))
   4:     Console.WriteLine("Removed C and its value was " + removedValue);
   5: else
   6:     Console.WriteLine("C did not exist, remove failed.");

Now TryUpdate() is an interesting creature.  You might think from it’s name that TryUpdate() first checks for an item’s existence, and then updates if the item exists, otherwise it returns false.  Well, note quite...

It turns out when you call TryUpdate() on a concurrent dictionary, you pass it not only the new value you want it to have, but also the value you expected it to have before the update. 

If the item exists in the dictionary, and it has the value you expected, it will update it to the new value atomically and return true.  If the item is not in the dictionary or does not have the value you expected, it is not modified and false is returned.

   1: // attempt to update the value, if it exists and if it has the expected original value
   2: if (dictionary.TryUpdate("G", 42, 7))
   3:     Console.WriteLine("G existed and was 7, now it's 42.");
   4: else
   5:     Console.WriteLine("G either didn't exist, or wasn't 7.");

The composite Add methods

The ConcurrentDictionary also has composite add methods that can be used to perform updates and gets, with an add if the item is not existing at the time of the update or get.

The first of these, AddOrUpdate(), allows you to add a new item to the dictionary if it doesn’t exist, or update the existing item if it does.  For example, let’s say you are creating a dictionary of counts of stock ticker symbols you’ve subscribed to from a market data feed:

   1: public sealed class SubscriptionManager
   2: {
   3:     private readonly ConcurrentDictionary<string, int> _subscriptions = new ConcurrentDictionary<string, int>();
   4:  
   5:     // adds a new subscription, or increments the count of the existing one.
   6:     public void AddSubscription(string tickerKey)
   7:     {
   8:         // add a new subscription with count of 1, or update existing count by 1 if exists
   9:         var resultCount = _subscriptions.AddOrUpdate(tickerKey, 1, (symbol, count) => count + 1);
  10:  
  11:         // now check the result to see if we just incremented the count, or inserted first count
  12:         if (resultCount == 1)
  13:         {
  14:             // subscribe to symbol...
  15:         }
  16:     }
  17: }

Notice the update value factory Func delegate.  If the key does not exist in the dictionary, the add value is used (in this case 1 representing the first subscription for this symbol), but if the key already exists, it passes the key and current value to the update delegate which computes the new value to be stored in the dictionary.  The return result of this operation is the value used (in our case: 1 if added, existing value + 1 if updated).

Likewise, the GetOrAdd() allows you to attempt to retrieve a value from the dictionary, and if the value does not currently exist in the dictionary it will insert a value.  This can be handy in cases where perhaps you wish to cache data, and thus you would query the cache to see if the item exists, and if it doesn’t you would put the item into the cache for the first time:

   1: public sealed class PriceCache
   2: {
   3:     private readonly ConcurrentDictionary<string, double> _cache = new ConcurrentDictionary<string, double>();
   4:  
   5:     // adds a new subscription, or increments the count of the existing one.
   6:     public double QueryPrice(string tickerKey)
   7:     {
   8:         // check for the price in the cache, if it doesn't exist it will call the delegate to create value.
   9:         return _cache.GetOrAdd(tickerKey, symbol => GetCurrentPrice(symbol));
  10:     }
  11:  
  12:     private double GetCurrentPrice(string tickerKey)
  13:     {
  14:         // do code to calculate actual true price.
  15:     }
  16: }

There are other variations of these two methods which vary whether a value is provided or a factory delegate, but otherwise they work much the same.

Oddities with the composite Add methods

The AddOrUpdate() and GetOrAdd() methods are totally thread-safe, on this you may rely, but they are not atomic.  It is important to note that the methods that use delegates execute those delegates outside of the lock.  This was done intentionally so that a user delegate (of which the ConcurrentDictionary has no control of course) does not take too long and lock out other threads.

This is not necessarily an issue, per se, but it is something you must consider in your design.  The main thing to consider is that your delegate may get called to generate an item, but that item may not be the one returned! 

Consider this scenario: A calls GetOrAdd and sees that the key does not currently exist, so it calls the delegate.  Now thread B also calls GetOrAdd and also sees that the key does not currently exist, and for whatever reason in this race condition it’s delegate completes first and it adds its new value to the dictionary.  Now A is done and goes to get the lock, and now sees that the item now exists.  In this case even though it called the delegate to create the item, it will pitch it because an item arrived between the time it attempted to create one and it attempted to add it.

Let’s illustrate, assume this totally contrived example program which has a dictionary of char to int.  And in this dictionary we want to store a char and it’s ordinal (that is, A = 1, B = 2, etc).  So for our value generator, we will simply increment the previous value in a thread-safe way (perhaps using Interlocked):

   1: public static class Program
   2: {
   3:     private static int _nextNumber = 0;
   4:  
   5:     // the holder of the char to ordinal
   6:     private static ConcurrentDictionary<char, int> _dictionary
   7:         = new ConcurrentDictionary<char, int>();
   8:  
   9:     // get the next id value
  10:     public static int NextId
  11:     {
  12:         get { return Interlocked.Increment(ref _nextNumber); }
  13:     }

Then, we add a method that will perform our insert:

   1: public static void Inserter()
   2: {
   3:     for (int i = 0; i < 26; i++)
   4:     {
   5:         _dictionary.GetOrAdd((char)('A' + i), key => NextId);
   6:     }
   7: }

Finally, we run our test by starting two tasks to do this work and get the results…

   1: public static void Main()
   2: {
   3:     // 3 tasks attempting to get/insert
   4:     var tasks = new List<Task>
   5:                     {
   6:                         new Task(Inserter),
   7:                         new Task(Inserter)
   8:                     };
   9:  
  10:     tasks.ForEach(t => t.Start());
  11:     Task.WaitAll(tasks.ToArray());
  12:  
  13:     foreach (var pair in _dictionary.OrderBy(p => p.Key))
  14:     {
  15:         Console.WriteLine(pair.Key + ":" + pair.Value);
  16:     }
  17: }

If you run this with only one task, you get the expected A:1, B:2, ..., Z:26.  But running this in parallel you will get something a bit more complex.  My run netted these results:

   1: A:1
   2: B:3
   3: C:4
   4: D:5
   5: E:6
   6: F:7
   7: G:8
   8: H:9
   9: I:10
  10: J:11
  11: K:12
  12: L:13
  13: M:14
  14: N:15
  15: O:16
  16: P:17
  17: Q:18
  18: R:19
  19: S:20
  20: T:21
  21: U:22
  22: V:23
  23: W:24
  24: X:25
  25: Y:26
  26: Z:27

Notice that B is 3?  This is most likely because both threads attempted to call GetOrAdd() at roughly the same time and both saw that B did not exist, thus they both called the generator and one thread got back 2 and the other got back 3.  However, only one of those threads can get the lock at a time for the actual insert, and thus the one that generated the 3 won and the 3 was inserted and the 2 got discarded. 

This is why on these methods your factory delegates should be careful not to have any logic that would be unsafe if the value they generate will be pitched in favor of another item generated at roughly the same time.  As such, it is probably a good idea to keep those generators as stateless as possible.

Summary

The ConcurrentDictionary is a very efficient and thread-safe version of the Dictionary generic collection.  It has all the benefits of type-safety that it’s generic collection counterpart does, and in addition is extremely efficient especially when there are more reads than writes concurrently.

Print | posted on Thursday, February 17, 2011 6:00 PM |

Feedback

Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

I am reading a book on beginning iphone development and at one point they create a dictionary from a plist. What exactly is this dictionary? How does it store information? What are they commonly used for?
3/2/2011 3:18 AM | electrician maine
Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

Dictionaries vary based on the language. Since you are talking about iPhone it is most likely objective-C and so you'd need to look up how their Dictionary works in particular.

In .NET, however, the Dictionary is a hashtable which is a O(1) -- that is, constant-time -- lookup container. This means that regardless of how big the container gets, the time it takes to lookup the item in the dictionary is roughly the same.

They are great for storing data where speed of lookup is the most important factor and you don't need the data maintained in any particular order.

There are other dictionaries as well (SortedList, SortedDictionary) but Dictionary is the most commonly used.
3/2/2011 3:32 PM | James Michael Hare
Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

Thanks for a great article!

Could you also please describe how does the this[T key] operator behave in the concurrent dictionary? Is it also atomic, thread-safe. How does it relate to AddOrUpdate() ?

Thanks!
3/7/2011 6:39 AM | Marcin
Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

Depends on if you are doing a get or set operation on the indexer.

Using this[T key] as a get operation performs attempts to get the item, and throws if the item doesn't exist.

// if key doesn't exist, throws exception.
var x = myConcurrentDict[key];

Using this[T key] as a set operation performs an AddOrUpdate() which adds the item if it doesn't exist, or updates the value if it does:

// if key exists, updates value, otherwise adds value:
myConcurrentDict[key] = x;
3/7/2011 10:22 AM | James Michael Hare
Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

How about another interpretation of why B holds the value 3?

Both threads saw that A did not exist and called the generator. One thread got back 1 and the other got back 2. The one that generated the 1 won and the 1 was inserted and the 2 got discarded.

So the next number generated for B was 3.
3/21/2011 11:30 AM | Jo
Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

@Jo:

Yep, that's another possibility.
3/21/2011 1:32 PM | James Michael Hare
Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

very informational read, thanks!
7/25/2011 5:18 AM | Rowan
Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

"The AddOrUpdate() and GetOrAdd() methods are totally thread-safe, on this you may rely, but they are not atomic. It is important to note that the methods that use delegates execute those delegates outside of the lock. This was done intentionally so that a user delegate (of which the ConcurrentDictionary has no control of course) does not take too long and lock out other threads.

This is not necessarily an issue, per se, but it is something you must consider in your design. The main thing to consider is that your delegate may get called to generate an item, but that item may not be the one returned! "

Thanks for the informative article.

I think what you say above can be a major issue if a ConcurrentDictionary is being used to hold accumulated values for an aggregation. The simple example you show is one such case. If the function that produces the new value is dependent on the current value, it can return an erroneous result,

Of course, that is not as bad as the fact that this is not formally-documented in any place I've seen thus far.
3/1/2012 1:47 AM | caddzooks
Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

@caddzooks: It is an issue in the case that you mention, but not one that can't be surpassed. The designers of the concurrent collections decided that in those cases it's up to the user of the concurrent collection to synchronize their logic if it needs synchronization (for example by using an Interlocked increment/etc or lock mechanism).

So basically, the AddOrUpdate()/GetOrAdd() give you great performance, and if you don't mind two processes creating an item at the same time, one of which may be ignored, you can use them as is. If that is an issue, you can synchronize it.

But yes, it can be quite confusing :-)
3/2/2012 10:10 AM | James Michael Hare
Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

The AddOrUpdate and GetOrAdd with delegates are completely ill-designed. PERIOD. The delegate must execute inside the lock, i.e. the whole point of having such functions is to provide the ATOMICITY, otherwise I end up doing everything myself, without using these containers.

There is NO advantage in finding scenarios where they are useful the way they are ... in all concurrency scenarios such fundamental functions MUST ensure atomicity. The fact that I may do bad things and dead-lock myself cannot be an explanation for an ill-designed implementation. And even if some smart-ass at MS thought that there is greatness in using this behaviour, he/she has got it plain wrong, and in my opinion - if you're in charge of designing concurrent containers and you end up with a GetOrAdd function which doesn't compute the value inside the lock, then they're not up for that job! PERIOD.

And yes I can dead-lock myself ... but that's MY JOB to ensure that I know to use a concurrent container. Using Lazy<T> I can dead-lock, using probably 99% of .NET code I can dead-lock myself. One CANNOT remove the need for knowledge by designing such kind of function. It's of no use to anyone!!!

Similarly, using the functionality as is can lead to arguably bigger mistakes. This is one - the value computed depends on the container key - it reads from database based on this key and caches the results. Using GetOrAdd leads to invoking the delegate every single time, so it reads from database every single time. And yes it may take a second to read from DB, but THERE IS NO OTHER WAY TO AVOID that second. I cannot use GetOrAdd now, I'll still have to lock the container myself and every access to the container must wait until the DB read is done. THERE IS NO WORKAROUND that. And instead, this fundamental function now draws the whole container useless to me.

But looking at how often the delegate executes (I see it ALL the time), I have a feeling that the delegate is actually called FIRST to prepare the value, before the lock is taken. And this leads me to conclude that the whole implementation IS A fucking BUG - stop looking for smart reasons to explain higher purposes some bonkers MS programmers had when they've done this.

Wish I'd have time to explain HOW WRONG and HOW MANY BAD DECISIONS can be taken by inexperienced programmers when dealing with such ill-conceived interfaces - exactly those inexperienced programmers they are trying to protect.
1/6/2013 10:55 AM | Florin Neamtu
Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

Thanks for the article however, it was good to know that I'm not alone wondering around the stupid GetOrAdd implementation (sorry, I just can't use your polite "odd" attribute in this case).

People might consider casting a vote to change this behaviour or for a new overload here: http://visualstudio.uservoice.com/forums/121579-visual-studio/suggestions/2535774-additional-overload-to-system-concurrent-concurren
1/6/2013 11:01 AM | Florin Neamtu
Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

I said earlier that the delegate appears to be called every single time GetOrAdd is called. Apparently this is true but only if the delegate captures an external variable!!!

If the delegate is written like below (without capturing the external variable), it appears to work as expected, i.e. it executes only when [key] doesn't exist in dictionary.

CachedValue = concurrentDictionary.GetOrAdd(key, k => computeCachedValueFromDB(k));

As I don't know the real implementation, probably the chance to execute the delegate multiple times still exists. But it's clear it many of my tests that when the delegate doesn't capture variables it never executed multiple times for the same key!

Would the implementation check whether the delegate captured external variables and if so will execute it outside the lock to minimize the chance for side effects? And eventually execute it inside the lock when it doesn't capture external variables?

I can't explain this really weird behaviour ... but I'm not bothered any more, I can't rely on it, I will do my own locking around a regular dictionary. This GetOrAdd behaviour makes the whole container practically unusable to me.
1/6/2013 11:30 AM | Florin Neamtu
Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

@Florin:

Disagree partially.

a) it's not a bug, it was a design decision and obviously with *any* design decision there are trade-offs to be considered.

b) AddOrUpdate()/GetOrAdd() - you argue the whole point is for atomicity, this is partially correct. The whole point is for atomicity of the get or add, not for the generation of the item. The ConcurrentDictionary is built to maximize concurrency.

c) inexperienced programmers working with concurrency are going to have issues regardless even beyond the scope of ConcurrentDictionary. Arguing that CD should be simplified to protect inexperienced developers is problematic. The documentation clearly states the delegates are executed outside of the lock, and that any additional synchronization should be implemented by the user.

Now, where I *do* agree is that an overload in this case may be useful. For example, if you look at the Lazy<T> constructor, it gives you a thread-safety option you can use whether you wish to making publishing, reading, or both publishing and reading of the lazy instance, something similar could be done with CD so that you have a construction option or parameter on GetOrAdd()/AddOrUpdate() to execute the delegates under lock as well.

That would be a useful enhancement that would allow for what you are looking for, yet also keep the default behavior the same for existing uses.

Finally, keep in mind that in many concurrency models, creating a duplicate may be preferable to holding a lock too long. So once again, these were probably many design decisions and back-and-forth discussions that led to the current implementation. I doubt this was done blindly or without regard to its consequences.
1/6/2013 11:43 AM | James Michael Hare
Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

Maybe they did spent time thinking, but their solution is still unworkable :)

And well, if the container's designer wishes to say that "I don't care how you compute the values" then don't provide the delegate overload. GetOrAdd taking the value only is the perfect way of saying that. But if the delegate overload is provided, the function must guarantee it executes inside the lock. Otherwise, how can I use it safely? I spent already some time debugging the issue and trying to understand why my delegate executed always, then I spent more time trying to understand how I can actually use it the way it is designed ... and I end up not using it at all - simply because the code resulting from using it became so hard to read and unmaintainable as it depends on the foreknowledge the reader needs about the internal implementation of this function.

I'm still puzzled however by the behaviour I observed earlier - if the delegate doesn't capture variables it appears to never execute multiple times for the same key. If the decision to execute the delegate inside the lock is taken based on whether the delegate captures variables, then this is even worse ... as again, in order to use it safely one must have knowledge about the actual implementation. Which is really bad decision for a function called GetOrAdd on a concurrent container.

Regards!
1/6/2013 1:09 PM | Florin Neamtu
Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

@Florin:

Still disagree completely, the solution works fine for what it does: optimizes concurrency with minimal locking. To lock as you propose by default would negate the whole original design purpose of the CD.

And again, no it was not a bad design decision. The documentation states quite clearly not to expect synchronization on the delegate, so if it occasionally behaves as if synchronized, but doesn't in other circumstances, that would fit that model.

I'd have to look at the source and try an experiment to see what you are saying happens, I can't think of why it would behave any differently offhand.

Regardless, I've used CD with great success *including* the GetOrAdd() and AddOrUpdate() methods and it has increased the throughput quite a bit.

1/6/2013 1:46 PM | James Michael Hare
Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

I would really like a 2nd implementation of ConcurrentDictionary, where the delegates are atomic.

The biggest problem with the current implementation is the bad documentation for ConcurrentDictionary. The non-atomic comments for AddOrUpdate is not part of the documentation of AddOrUpdate, it is at the bottom. I remember when MS had good documentation, but that era ended with .NET.

My values are objects, where AddOrUpdate update attributes of that object. In this case, the current implementation of AddOrUpdate is useless. Resharper and StyleCop can probably add warning saying that if you use AddOrUpdate for cases where the value is a class, the program will not do what you expect.

However, the solution is rather easy, since I am used to functional programming languages, MS should say that the delegates should be side-effect free.

So I changed my program so that the AddOrUpdate-update-delegate returns a NEW object each time, where the old attribute values are copied into the new one + the changes of the ones I want to change.

Now it works as expected. The GC will have a little more work.

However,
8/12/2013 9:33 PM | Mattias
Gravatar

# re: C#/.NET Little Wonders: The ConcurrentDictionary

@Mattias: I can agree with having a choice in the matter. Kind of like how the Lazy<T> class gives you the option of how thread-safe you want the lazy construction to be.
8/26/2013 7:55 PM | James Michael Hare
Post A Comment
Title:
Name:
Email:
Comment:
Verification:
 

Powered by: