What you don’t know about sorting algorithms


This article is part of the sequence The Basics You Won’t Learn in the Basics aimed at eager people striving to gain a deeper understanding of programming and computer science.

Last time, we delved into bitwise operations. This time, we will look at a higher-level computer science concept – algorithms.

When we first get introduced to algorithms, we normally start by learning sorting algorithms. Compared to other algorithms, they are easier to grasp. And if we pay attention in class, we will do a good job of understanding them. However, what we don’t learn in these classes is when they can be useful.

When I first got introduced to sorting algorithms, I learnt the implementation details and the run-time complexities of the different algorithms. At the end of the lesson, the lecturer said: “Nowadays, Quicksort is most often used. We use the other ones only for learning purposes.” That was when I started wondering when all the other algorithms we had learnt could be useful.

The purpose of this article is to reveal some properties of the algorithms condemned as useless – properties which can prove valuable in a real environment.

The popular sorting algorithms

First things first. How do we actually implement these algorithms? In this article, I am not going to focus on implementation details and code, but rather on the general idea behind the different algorithms.

However, if you do not feel confident, or you have forgotten how all the algorithms we are going to discuss today work, check this out.

Normally, we classify sorting algorithms into the slow ones – Selection, Insertion and Bubble sort – and the fast ones – Quick, Merge and Heap sort.

The slow ones are slow because their run-time complexity is O(N*N). In comparison, the fast ones have a complexity of O(NlogN). Among the fast ones, however, Quicksort tends to be the favorite, since Merge sort requires additional memory to do its magic and Heapsort needs preprocessing (building the heap) before performing the actual sorting. Quicksort sorts fast and it does so in-place. That is why many standard libraries of modern languages use Quicksort or a variant of it.

And truly, for most cases, Quicksort will do a wonderful job. But when can it fail?

Quicksort – When is it slow?

The big drawback of Quicksort is that it relies on choosing a good pivot, which means it is partly based on luck.

So, if you always choose the first element as the pivot, an already sorted list becomes a worst case scenario, as every partition will leave a one-element left side. If you choose the last element as the pivot, the case is similar. That is why one of the often used techniques when dealing with Quicksort is to pick three elements (typically the first, middle and last) and choose their median as the pivot, in order to avoid these worst case scenarios. Although this reduces the probability of a very bad case for the algorithm, it is still based on chance.
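To make this concrete, here is a minimal Quicksort sketch with median-of-three pivot selection (a Python illustration of the technique, not a production implementation):

```python
# Quicksort with median-of-three pivot selection - a minimal sketch.
def quicksort(arr, lo=0, hi=None):
    if hi is None:
        hi = len(arr) - 1
    if lo >= hi:
        return
    mid = (lo + hi) // 2
    # Median-of-three: order the first, middle and last elements,
    # then take the middle one (their median) as the pivot.
    if arr[mid] < arr[lo]:
        arr[lo], arr[mid] = arr[mid], arr[lo]
    if arr[hi] < arr[lo]:
        arr[lo], arr[hi] = arr[hi], arr[lo]
    if arr[hi] < arr[mid]:
        arr[mid], arr[hi] = arr[hi], arr[mid]
    pivot = arr[mid]
    # Partition in-place around the pivot value.
    i, j = lo, hi
    while i <= j:
        while arr[i] < pivot:
            i += 1
        while arr[j] > pivot:
            j -= 1
        if i <= j:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
            j -= 1
    quicksort(arr, lo, j)
    quicksort(arr, i, hi)
```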

This is the reason why airplane software won’t use Quicksort: it can work perfectly well in most cases, but there might be a rare case that runs extremely slowly, and that is enough of a concern for airplane software. Embedded systems avoid using Quicksort for the same reason.

That being said, there are situations in which the other fast sorting algorithms – Merge sort and Heap sort – are to be preferred. But for now, we have left the simple ones out. When can they be useful?

When can the slow sorting algorithms be fast

Let’s now delve into the simple sorting algorithms. The popular belief is that the only reason to use them is when you are doing something quick and dirty. The only exception is Insertion sort, as it tends to be fast for small data sets. Check out this stackoverflow answer, for example.

But these algorithms hold some pretty useful properties. There are some real-world situations in which they can prove to be extremely useful.

Selection sort

This is one of the first sorting algorithms we learn. It has O(N*N) complexity.

However, this algorithm has an interesting property: if you run only its first K passes on an array, the first K positions will contain the first K elements of the final sorted array. Where can you apply this?

Open Google and search for cats. What you get is around 10 results on one page, but in the upper left corner it says that a total of 700 000 000 results were found. And still, it ran in less than a second. Now, surely you don’t think that Google bothers to sort all 700 million cat records before showing you a page of 10 cats? Of course not! All you need are the first 10 entries, and you can get them by running only 10 passes of selection sort over the array.
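Here is a minimal sketch of that idea – a hypothetical top_k_selection helper that runs only the first k passes of selection sort:

```python
# Run only the first k passes of selection sort: the k smallest
# elements end up at the front, already in sorted order.
def top_k_selection(arr, k):
    for i in range(min(k, len(arr))):
        # Find the smallest element in the unsorted tail...
        smallest = i
        for j in range(i + 1, len(arr)):
            if arr[j] < arr[smallest]:
                smallest = j
        # ...and move it into position i.
        arr[i], arr[smallest] = arr[smallest], arr[i]
    return arr[:k]

print(top_k_selection([42, 7, 19, 3, 88, 1, 56], 3))  # [1, 3, 7]
```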

A side note, however: there are standard solutions for the problem described above (often called partial sorting or top-K selection). This is one of them, but there are better ones when you want to extract more than just 10 elements.

This property is pretty simple but, as you see, useful. None of the fast algorithms can give you such a partial result.

Insertion sort

There is some magic in this algorithm as well.

First of all, this is the algorithm which deals best with an already sorted array – that is the best case scenario for Insertion sort and, as we saw, a worst case for a naively implemented Quicksort. So, whenever you think your data set is almost sorted, Insertion sort can be useful.

Apart from that, it has the property of keeping all the elements you have already traversed sorted. If you run it on the first 10 elements of an array, they will be sorted. However, in contrast to selection sort, they will not necessarily be the first 10 elements of the whole sorted array.

This, however, can be extremely useful whenever you need to sort elements which you receive from a stream of data.

If you are constantly receiving elements from some kind of data stream and you want to keep them sorted in real time, then Insertion sort is a great fit: each new element only needs to be shifted into its place among the already sorted ones. If you instead re-ran Quicksort every time you got a new element from the stream, you would keep hitting its worst case scenario – an almost sorted array.
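As an illustration, here is a minimal sketch that keeps a list sorted as values arrive, using one insertion-sort step per element (in Python, the standard library function bisect.insort does the same job):

```python
# Keep a list sorted as new values arrive from a stream by performing
# a single insertion-sort step per element.
def insert_sorted(sorted_list, value):
    sorted_list.append(value)
    i = len(sorted_list) - 1
    # Shift the new element left until it reaches its place -
    # exactly the inner loop of insertion sort.
    while i > 0 and sorted_list[i - 1] > sorted_list[i]:
        sorted_list[i - 1], sorted_list[i] = sorted_list[i], sorted_list[i - 1]
        i -= 1

stream = [5, 1, 4, 2, 3]  # stands in for values arriving over time
result = []
for value in stream:
    insert_sorted(result, value)
    print(result)  # the list is sorted after every arrival
```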

Bubble sort

This one holds a property similar to selection sort's, with the difference that after K passes it gives you the K largest elements, in sorted order, at the end of the array (not the K smallest at the front). With a little bit of customization, though – bubbling towards the front instead – you can make it behave just like selection sort.
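A quick sketch of that property – after k passes of the hypothetical bubble_k_passes helper below, the last k positions hold the k largest elements in their final order:

```python
# Run only the first k passes of bubble sort: each pass bubbles the
# largest remaining element to the end of the array.
def bubble_k_passes(arr, k):
    n = len(arr)
    for i in range(min(k, n - 1)):
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr[n - k:]  # the k largest elements, in sorted order

print(bubble_k_passes([42, 7, 19, 3, 88, 1, 56], 2))  # [56, 88]
```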

Apart from that, I have seen a job candidate being tested on implementing bubble sort. The reason: it is so simple that everyone should know how to implement it.

Other considerations

There is one more notable property of a sorting algorithm besides getting the array in sorted order: not reordering elements with identical keys. An algorithm with this property is called a stable sort. Quicksort, sadly, isn’t a stable sort.

Say you have a data set of People objects and you want to sort it by age, and you use a non-stable sorting algorithm such as Quicksort. If, in the initial data set, there is a James who comes before a Charles and they have the same age, it is not guaranteed that their order will be preserved after applying the algorithm. In cases in which this is a required constraint, a stable sorting algorithm should be used. Among the fast algorithms, Merge sort is the only stable option. All of the slower sorting algorithms are stable (as long as you implement them that way).
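To see what stability buys you, here is a minimal stable Merge sort sketch applied to the example above (plain (name, age) tuples stand in for the People objects):

```python
# A stable merge sort: when two elements compare equal, the one from
# the left half (i.e. earlier in the input) is taken first, so the
# original order of equal keys is preserved.
def merge_sort(items, key=lambda x: x):
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid], key)
    right = merge_sort(items[mid:], key)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if key(left[i]) <= key(right[j]):  # '<=' is what makes it stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

people = [("James", 30), ("Charles", 30), ("Ada", 25)]
print(merge_sort(people, key=lambda p: p[1]))
# [('Ada', 25), ('James', 30), ('Charles', 30)] - James stays before Charles
```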

There are also some sorting algorithms which can be even faster than O(NlogN). However, they can be used only under certain limited conditions. These are Counting, Bucket and Radix sort. They are worth a look.
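For example, here is a minimal Counting sort sketch – it runs in O(N + K) time, but only works when the keys are integers in a small known range [0, K):

```python
# Counting sort: count how many times each key occurs, then emit the
# keys in order. Only applicable to integer keys in a known range.
def counting_sort(arr, k):
    counts = [0] * k
    for value in arr:
        counts[value] += 1
    result = []
    for value in range(k):
        result.extend([value] * counts[value])
    return result

print(counting_sort([3, 1, 4, 1, 5, 2, 1], k=6))  # [1, 1, 1, 2, 3, 4, 5]
```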

Conclusion

As mentioned earlier, many of the sorting algorithms discussed here are underestimated. With this article, I wanted to show you the hidden properties of the neglected ones, which can come in handy. Next time you want to sort a set of data, think about the problem that actually needs to be solved and don’t just blindly use the standard library sorting algorithm.

Next time, we will discuss a topic which is not understood very well at first, but is fundamental not only in programming, but also in using the operating system – Standard Input/Output.
