Monday, January 27, 2014

Introducing NOtherLookup - even better lookups

Recently I've been discussing how awesome is the ILookup data structure provided by .NET Framework as a part of LINQ library. I've looked through some of its key features, but I've also mentioned its few imperfections, like no way to create it directly and no easy way to do any operations on a Lookup as a whole, promising to take a look how to fix it.

So, how to fix it? By using NOtherLookup library I've just pushed to NuGet. It is a set of extension methods and utility classes designed to make Lookup-based operations as easy and as natural as in plain LINQ-to-objects.

Consider for example merging two Lookup instances into a single one that has the keys from both instances and for each key that is present in both Lookups, the union of sets should be produced. When doing it in plain LINQ, we probably need to convert both Lookups to IDictionary<TKey, List<TValue>>, then merge corresponding keys and finally produce a new Lookup manually. With NOtherLookup, we can just use LINQ in the same way as we do on IEnumerable:

var result = firstLookup.Union(secondLookup)

In this example, result is still typed as ILookup<TKey, TValue>, with no more conversions needed. And here is what it does:

firstLookup:
"a" => 1,2
"b" => 3,4

secondLookup:
"b" => 4,5
"c" => 6,7

result:
"a" => 1,2
"b" => 3,4,5
"c" => 6,7

NOtherLookup offers support for several operators known from LINQ. There are some operators that combine two Lookups - besides Union shown above, there is Concat, Except, Intersect, Join and Zip. There are some that manipulate a single Lookup - Select and Where. There is also quite universal OnEachKey method that allow running arbitrary LINQ query on each lookup element, maintaining ILookup type.

ILookup<int, string> transformed = lookup.OnEachKey(g => g.Select(x => x + g.Key).Reverse());

Example:

lookup:
"a" => 1,2
"b" => 3,4

transformed:
"a" => "2a", "1a"
"b" => "4b", "3b"

NOtherLookup contains also a few features that make obtaining ILookups easier. The most important one is Lookup.Builder - a class that allows creating lookup from the scratch (as a solution for the lack of Lookup public constructor), with support for custom comparators, if needed.

ILookup<int, string> lookup = Lookup.Builder
    .WithKey(1, new[] { "a", "b" })
    .WithKey(2, new[] { "c", "d" }) 
    .WithComparer(new CustomIntComparer()) // can be omitted
    .Build();

There is also empty lookup available for your convenience.

Last but not least, there are easy conversions between ILookup and IDictionary available - it's a simple method call in both ways:

ILookup<int, string> lookup = dict.ToLookup();
Dictionary<int, List<string>> backToDict = lookup.ToDictionary();

For the detailed descriptions with examples, see README on GitHub project page, where the code lives. NOtherLookup is licenced with very liberal MIT licence, so feel free to grab the code and use it any way you want.

Sunday, January 12, 2014

The downsides of .NET's Lookups

In the previous post I was admiring the wonderful features of .NET's Lookup - its immutability, null-safeness and clean and readable presence. I've also promised to look for its downsides or pitfalls one may encounter when using it.

Are there any? Well, to be honest, I can't see any serious problems with using Lookups. If you need a collection to be used as a lookup table by some identifier, with no modifications support, with multiple values per key - you won't find anything better suited than ILookup.

There are just two things I find to be a slight impediments in ideal Lookup's world.

The first is that the only way to create an ILookup instance is via LINQ's ToLookup method. Although the framework provides the default public ILookup implementation - a Lookup class - it has no public constructor available. It means that in order to create our lookup, we need to prepopulate some other collection just to convert it later using ToLookup().

The second thing is that the beautiful clean API provided by ILookup<TKey, TValue> goes away immediately if we need to do anything other than just read our lookup. If we need to have a lookup created from existing lookup but with one more value, or have our lookup filtered, or joined with another, or whatever, it breaks into IEnumerable<IGrouping<TKey, TValue>> - and the readability hurts a lot. IGrouping represents a single lookup item which is a values collection with its associated key. Still nice to access the data, but not nice to operate on. There's no public implementation of IGrouping available in the framework - the implementation is internal to LINQ. There's no way to modify the grouping, what is understandable as it is part of immutable ILookup, but there's also no way to create new grouping based on existing one - the workaround is to convert our whole lookup back to some mutable collection like Dictionary<TKey, TValue>, mutate it and then convert to new lookup using the convoluted LINQ projections first, something like this:

var temporaryDictionary = lookup.ToDictionary(x => x.Key, x => x.ToArray());
temporaryDictionary.Add("new key", new[] { "new value" });
var newLookup = temporaryDictionary
    .SelectMany(kv => kv.Value, (kv, v) => new { kv.Key, Value = v })
    .ToLookup(x => x.Key, x => x.Value);

Or alternatively, if we don't mind losing existing key-values mappings and we are able to recreate it once again, we can flatten our lookup to List<TValue>, modify the list and then use the simplest ToLookup() overload.

var temporaryList = lookup.SelectMany(x => x).ToList();
temporaryList.Add("new value");
var newLookup = temporaryList.ToLookup(x => SomethingThatRecreatesKey(x));

No matter which way we choose, it's quite a lot of work for such a simple thing.

Next time I'll try to propose a solution for both the issues. Stay tuned!

Monday, January 6, 2014

Lookup - .NET's hidden gem

.NET Framework has a lot to offer when comes to generic data structures, especially collections. We have several lists, arrays, sets, dictionaries - each of them has its performance and functional characteristics and scenarios where it fits much better than the others. As I'm observing in the project I work in, one of the most often forgotten yet powerful type of collection is Lookup. I think it needs more attention.

ILookup was introduced to the framework as a part of LINQ and is exclusively used in LINQ context, as a result of ToLookup calls. Logically, it is a key-value collection like IDictionary, but it is designed to hold multiple values in a single key. Also, unlike Dictionary<K, V>, it is immutable, what means that after creating the lookup, we can only read it - no modifications are possible.

Generally one may say that ILookup<K, V> is no better than IDictionary<K, IEnumerable<V>> and as such is redundant. But I'd strongly disagree. First of all, let's think how are dictionaries of collections used. From my experience, it's most often used to keep predefined static "dictionaries" that are loaded once and live long to provide easy access to some rarely changing values like what counties are there in each country etc. Well, these are even often called "lookup tables". Note that in most cases we don't ever change the stored values - it is immutable. We replace it as a whole periodically or when something changes. Doesn't it sound exactly like ILookup<K, V> key features discussed above? We should always be using the tools that are best suited for the job to avoid reinventing the wheel and avoid the problems others already solved.

Moreover, ILookup's definition is clearly more readable than nested generics in its dictionary-based equivalent. When I look at the code never seen before and I see ILookup<K, V>, I instantly know that it is the lookup table and I feel its read-only, one-way nature. When I see IDictionary<K, IEnumerable<V>> I need to consider the possibility that it is used as a read-write store for some values and I, as a user of this code, can (or should?) modify that collection. Using Lookup for lookups just makes sense.

IDictionary-based lookups, as mutable data structures may kick us badly. I've once experienced quite an ugly error caused by passing around the plain List<T> read from what was intended to be static lookup shared between users and was based on Dictionary<K, List<V>>. The intention was to pass around the immutable lookup data, but the receiving code actually got mutable list being exactly the same reference that was sitting in the global lookup. The value passed was eventually customized for each user and there were miracles happening to the global lookup. All that would be just impossible with lookups.

With dictionaries that are open for modification and not thread-safe by default, we are also subject to multiple kinds of failures in multi-threaded scenarios. All of it is gone with immutable data structure like ILookup.

And last but not least - null checks. Who likes to litter his code with all that boring null checks? When using IDictionary as a lookup table, every piece of code that consumes its data need to check for key existence first or have a conditional statement with TryGetValue method, because calling dictionary[nonExistingKey] would throw dreadful KeyNotFoundException. Moreover, even if the key exists, the value may still legally be null, so to read the collection contained under the key we need to have two checks:

if (dictionary.ContainsKey(theKey))
{
     var collectionOrPossiblyNull = dictionary[theKey];
     if (collectionOrPossiblyNull != null)
     {
          DoSomethingWithValues(collectionOrPossiblyNull);
     }
}

or, with more compactness but same complexity:

T collection;
if (dictionary.TryGetValue(theKey, out collection) && collection != null)
     DoSomethingWithValues(collection);

ILookup<K, V> was designed totally differently around key existence and null values. It just returns an empty collection for non-existent keys. And thanks to its strictly controlled way of creating (by LINQ's ToLookup method only), we can also be sure that no null collection is possible. And the code above just looks as it should like:

DoSomethingWithValues(lookup[theKey]);

Isn't it beautiful?

Next time I'll try to find some pitfalls or inconveniences of using lookups.