Sorting and Grouping – organizing data with LINQ
Last week I introduced LINQ from the perspective of a C# game developer completely unfamiliar with the framework. Today I would like to continue exploration of LINQ by focussing on a particular set of its functionality: methods to arrange and organize data.
In particular we will look into how we can sort and group our collections of items.
Sorting with OrderBy() and OrderByDescending()
Sorting is a feature of any good framework dealing with collections. LINQ is no exception.
The two LINQ extension methods we can use for sorting are OrderBy()
and OrderByDescending()
. They do the exact same thing, except that they sort the given sequence in ascending (1, 2, 3) and descending (3, 2, 1) order respectively. To avoid redundancy we will only look at the former here.
OrderBy()
has two overloads with the following signatures:
public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector
)
public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
IComparer<TKey> comparer
)
Both take the input sequence source
, which can be any implementation of IEnumerable<TSource>
, and a keySelector
function, that specifies by what key the items are to be sorted – more on that in a second.
The return type of both methods is IOrderedEnumerable<TSource>
which is essentially the same as IEnumerable<TSource>
, except that it guarantees that the contents are sorted. While this is an important detail, it is not something we usually will have to care about. For the purposes of this post we will ignore it and be satisfied with the fact that the method returns another sequence.
The difference between the two overloads is that the second takes an IComparer<TSource>
as third parameter. This object can be used to replace the default comparing behaviour of the sorting algorithm. Again, illuminating the possibilities with this parameter will be the topic of a future post.
An example
To understand both how to use these methods, and also how to use the keySelector
function, let us use an example.
Imagine we have a simple class with width
and height
properties representing a rectangle.
class Rectangle
{
public int Width { get; private set; }
public int Height { get; private set; }
public Rectangle(int width, int height)
{
this.Width = width;
this.Height = height;
}
}
Now, let us say we are given a list of rectangles of random sizes, and we would like to sort them by their width.
Using OrderBy()
we can do so as follows.
var rectanglesByWidth = rectangles.OrderBy(r => r.Width);
Remember that OrderBy()
is an extension method, so by calling it like a member method on rectangles
, the input collection is automatically passed as first parameter to the method.
Where it becomes interesting is the lambda function r => r.Width
that we specify as the keySelector
function. This function takes a single rectangle, and returns its width – which in this context is taken as the key for the sorting algorithm.
OrderBy()
sorts the input by the key, which means that our line of code will sort the list of rectangles by their width.
I am sure you can see how the key selector is necessary, since otherwise the computer would have no idea how rectangles are supposed to be sorted.
In addition, this gives us virtually unlimited freedom to sort our collection – all without having to write a single line of actual sorting algorithm code.
For example, we could easily sort our rectangles by their area as follows.
var rectanglesByArea = rectangles
.OrderBy(r => r.Width * r.Height);
Note that similar to the Select()
and Where()
methods we discussed last week, these sorting methods use deferred execution. That means that the lines in our examples above do not actually enumerate the input collection, until we start enumerating the output.
That means that changes to the input collection before enumerating the output will be reflected, even if they are made after our lines of code above.
Similar to the other methods, we can think of OrderBy()
as returning a query, instead of a collection.
Grouping with GroupBy()
Another common application when trying to organize large amounts of data is the need to group them by some property – meaning to extract exclusive sub-sets of items such that the items in each sub-set share some property.
For example, we may want to group our rectangles in groups in such a way that each group or sub-set contains rectangles of a specific width.
We can do this easily by using the GroupBy()
extension method.
This method has eight different overloads, so I will refrain from listing them all here. Instead we will look at the simplest of them.
public static IEnumerable<IGrouping<TKey, TSource>> GroupBy<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector
)
As you can see, this method again takes an input sequence, and a key selector function as inputs and returns a sequence (IEnuerable<>
) of IGrouping<TKey, Source>
. Each of these groupings represents one of our groups, such that all elements inside it share the same key, as specified by the key selector function.
For example, to split our list of rectangles into a list of groups of same width, we can do:
var groupsWithSameWidth = rectangles.GroupBy(r => r.Width);
Again, the use of a lambda function means that we can group our items by any property – or combination of properties – imaginable.
Note that this function also uses deferred execution, which means that it will only start grouping the input items when we actually enumerate its output, and will represent changes to the input collection appropriately.
Immutability of input collections
Since both methods presented here – similar to many others of the LINQ framework – return objects more akin to queries than actual collections, they do not modify the input collection in any way.
This means that our original list of rectangles will be in the same order, even after we call OrderBy()
on it, and even after we enumerate the method’s output.
This immutability is a key feature of LINQ, which often makes it much easier to reason about our code, and allows for things such as multi-threaded access to collections without having to worry about potential problems caused by mutating collections.
In fact, the IEnumerable<T>
interface is a pure read-only interface that only allows to enumerate the contained elements, but has no way of adding, deleting, or otherwise modifying items.
Conclusion
This post has given an overview of how to use LINQ to sort and group sequences.
I hope that I was able to illustrate how easily LINQ can perform these operations, and how their flexibility – mainly through the use of lambda functions – can be used to arrange collections in almost any order.
So far – in our series on LINQ – we have only looked into extension methods returning deferred IEnumerable<T>
. While this is a great start – and arguably the core of LINQ – we have not yet discussed how to store our modified collections, other than in their query representation. How to do this will be the topic for a future post.
Make sure to leave a comment if this post was useful for you, or if there are other areas of LINQ you would like me to cover soon.
Enjoy the pixels!
Reference: | Sorting and Grouping – organizing data with LINQ from our NCG partner Paul Scharf at the GameDev<T> blog. |