# Modern Maintainable Code

## Code reuse series: Two fundamental implementations for one conceptual task (solution)

Introduction:

Today we will be answering the fundamental question: When writing code that can be instantiated with many types, how do I reuse code for some types but select different fundamental implementations for others? What if there are patterns to the "exceptions" to the "default code"?

In particular, we'll be writing two overloads of a function: one that runs in O(1) time whenever it is given a type that supports random access iterators, and another that runs in O(N) time and works with any forward iterator. We'll select between them with a technique called tag dispatch. This is the optimization technique that the STL leverages when writing algorithms like std::binary_search, which works with any forward iterator but runs much faster for random access iterators. I have already written about how this optimization technique is used in std::copy so that it leverages std::memcpy internally for efficiency whenever it is safe to do so.

We will be enhancing a function we wrote in the last article of this code reuse series. The function is getNthElement, which returns the element at position N of a container (not to be confused with std::nth_element, which has entirely different behavior). Unlike the last article, which only selected an O(1) implementation for vector, we will write [only] two overloads so that any container capable of performing this task in O(1) time does so (and the rest use the O(N) version).

Last time I gave you all 3 questions to ponder. Today we answer them.

Q: What features of a container determine whether or not you can get the Nth element of that container in O(1) time? What commonalities would I expect in the interfaces of such containers?

A:
A container must have a random access iterator to support an O(1) time implementation of getNthElement. You'll note that it is not sufficient for a container to merely support O(1) time subscripting: unordered_map is a counter-example, since its operator[] looks up a key in O(1) time on average but its iterators are not random access.

What does random access iterator mean? Well, iterators are types that support traversing a range of data. So if I've got the numbers { 1, 2, 3, 4, 5 } in a container, an iterator lets me visit each of those elements (numbers) in the container exactly once. The order we visited them in would depend on the container. The most basic iterators only allow you to do that much (visit in some order), but more complicated ones (like random access iterators) support "jumping around" while iterating. I can go directly from the first element to the last, or anywhere in between. I could even traverse backwards by two elements at a time with a random access iterator.

Iterators are designed to look like pointers in how they're used. A pointer actually is a random access iterator (because you can perform operations like myPointer + 5 on it). The most basic operations supported by all iterators are dereferencing (*myIterator) to get an element out, incrementing (++myIterator) to look/point at the next element, and testing equality (myIterator1 == myIterator2) to see if you've reached the end of the sequence.
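The three basic operations can be seen together in a hand-written loop. This is only a minimal sketch; the function name sumAll is hypothetical and chosen purely for illustration.

```cpp
#include <list>

// Hypothetical helper (not from the original article): sums the elements of
// a std::list<int> using only the three basic iterator operations.
inline int sumAll(const std::list<int>& values) {
    int total = 0;
    // Comparing against end() tells us when the sequence is exhausted.
    for (auto it = values.begin(); it != values.end(); ++it) {  // ++it: move to the next element
        total += *it;  // *it: read the element the iterator points at
    }
    return total;
}
```

Nothing in this loop jumps ahead by more than one element, which is why it works even for std::list, whose iterators are not random access.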

For our purposes, having a random access iterator means we can move the iterator forward by N positions in a single step, with an expression such as myIterator + N, and expect that operation to be completed in amortized O(1) time [1].
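To make the constant-time jump concrete, here is a small sketch. The helper name jumpTo is hypothetical, and the code assumes the caller passes an index that is in range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reads element n by jumping straight to it.
// The expression begin() + n only compiles because std::vector provides
// random access iterators; the same line would be rejected for std::list.
inline int jumpTo(const std::vector<int>& values, std::size_t n) {
    assert(n < values.size() && "jumpTo: index out of range");
    return *(values.begin() + static_cast<std::ptrdiff_t>(n));
}
```

No matter how large n is, the iterator arithmetic is a single addition rather than n separate increments.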

Q: How can I write [only] two getNthElement function overloads so containers that can perform the task in O(1) time, do, and those that can't still compile and output the same conceptual result but take O(N) time?

A:
As with last time, we're going to need a mix of templates and overloading. The two overloads will determine the fundamental implementations we use. The templates will contribute to making our code more general. They let us match patterns of types for use in each of our overloads.

The question, especially if you read the last article of the series, essentially boils down to: How do I select the right overload? How do I programmatically say: Use this overload if the container supports random access iterators and otherwise use the other overload?

The more general problem of selecting implementations has a few different answers, but today we'll be using tag dispatch, as it is probably the cleanest solution here. Tag dispatch essentially boils down to adding an extra argument of different types to the end of your function whose sole purpose is to select an overload. These tag types are often just empty structs (as is the case for the ones we'll be using); their only purpose is to be different types with meaningful names.
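Stripped of everything else, the mechanism might be sketched as follows. The tag names FastPathTag and SlowPathTag and the function describe are hypothetical; they exist only to show that the type of an otherwise unused argument picks the overload.

```cpp
#include <string>

// Hypothetical tag types: empty structs whose only job is to be distinct,
// meaningfully named types.
struct FastPathTag {};
struct SlowPathTag {};

// The unnamed parameter is never read; it exists purely so that overload
// resolution can tell the two functions apart.
inline std::string describe(FastPathTag) { return "fast path"; }
inline std::string describe(SlowPathTag) { return "slow path"; }
```

Calling describe(FastPathTag{}) selects the first overload and describe(SlowPathTag{}) selects the second, and that choice is made entirely at compile time.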

For getNthElement, we'll use the std::random_access_iterator_tag to select our O(1) implementation and the std::forward_iterator_tag to select anything else [2]. We'll use std::iterator_traits<SomeIteratorType>::iterator_category{} to get an instance of the tag that corresponds to the iterator our container provides. For other cases, you might find looking at the STL's type traits helpful when it comes to selecting a tag.
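As a quick check of which tag each container yields, the following sketch queries std::iterator_traits at compile time. The alias CategoryOf is a hypothetical convenience introduced here, not part of the standard library.

```cpp
#include <forward_list>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Hypothetical alias: the category tag type of the iterator that a
// container provides.
template <typename Container>
using CategoryOf =
    typename std::iterator_traits<typename Container::iterator>::iterator_category;

static_assert(std::is_same<CategoryOf<std::vector<int>>,
                           std::random_access_iterator_tag>::value,
              "vector iterators are random access");
static_assert(std::is_same<CategoryOf<std::list<int>>,
                           std::bidirectional_iterator_tag>::value,
              "list iterators are bidirectional");
static_assert(std::is_same<CategoryOf<std::forward_list<int>>,
                           std::forward_iterator_tag>::value,
              "forward_list iterators are forward only");
```

Writing CategoryOf<SomeContainer>{} then produces an instance of the tag, ready to be passed as the extra argument.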

Here's what this looks like:
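A minimal sketch of this dispatch follows, using the function names getNthElement and getNthElementImpl discussed in this article. The parameter names, the choice to return the element by value, the assertion message, and the usageExample helper are assumptions made for illustration, and the bounds check assumes the container provides size().

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// O(1) version: chosen when the tag is std::random_access_iterator_tag.
template <typename Container>
typename Container::value_type getNthElementImpl(const Container& container,
                                                 std::size_t n,
                                                 std::random_access_iterator_tag) {
    using Difference = typename Container::difference_type;
    return *(std::begin(container) + static_cast<Difference>(n));  // one jump
}

// O(N) version: chosen for forward and bidirectional iterators, whose tags
// convert to std::forward_iterator_tag.
template <typename Container>
typename Container::value_type getNthElementImpl(const Container& container,
                                                 std::size_t n,
                                                 std::forward_iterator_tag) {
    auto it = std::begin(container);
    for (std::size_t i = 0; i < n; ++i) {
        ++it;  // step one element at a time
    }
    return *it;
}

// Public entry point: builds the tag from the container's iterator type and
// forwards everything to the matching Impl overload.
template <typename Container>
typename Container::value_type getNthElement(const Container& container,
                                             std::size_t n) {
    assert(n < container.size() && "getNthElement: index out of range");
    using Iterator = typename Container::const_iterator;
    using Category = typename std::iterator_traits<Iterator>::iterator_category;
    return getNthElementImpl(container, n, Category{});
}

// Hypothetical usage: one call syntax, two different implementations.
inline bool usageExample() {
    const std::vector<int> fast{10, 20, 30};  // random access: O(1) overload
    const std::list<int> slow{10, 20, 30};    // bidirectional: O(N) overload
    return getNthElement(fast, 1) == 20 && getNthElement(slow, 1) == 20;
}
```

Callers never mention a tag: the same call compiles for both containers, and the compiler picks the implementation.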


Here are some key things to pay attention to:

1) getNthElement doesn't actually "do the work"; it forwards its arguments, along with a tag, to one of the implementation function overloads, getNthElementImpl. These "Impl" functions do the work in a manner that is both safe and efficient for the type of iterator specified by the tag argument. This layer of indirection lets us add the tag argument used for overload resolution automatically and in a controlled manner, and it insulates our users from knowledge of this optimization (a good API design practice in general).

2) In the Impl functions, the tag parameter has no variable name. We only use the tags for overload resolution, and leaving out the name signals this to both the programmer and the compiler (which will not warn about an unused parameter). Because the tag is an empty object that is never read, the compiler can optimize it away entirely, so the tag dispatch part of the code incurs zero runtime overhead! [3]

3) We had to use overload resolution to select between Impl functions. We could not have used an ordinary if statement with type traits to do so. The reason is that if we tried to call one Impl in the if part of the branch and the other Impl in the else part of the branch, we would end up instantiating both, and for containers that don't support random access iterators, the code would fail to compile. (C++17's if constexpr discards the untaken branch inside a template and so avoids this problem; a plain if does not.)

Here's proof that the code used for tag dispatch incurs no runtime overhead on modern compilers: The assembly of the tag dispatch version vs. the assembly of having your user manually select a specific implementation. (You'll see that they're exactly the same other than the error message for the assertion).

Now getNthElement will run in O(1) time on vector and deque, but will still work (in O(N) time) for all other containers, like list and map.

Q: Bonus points: How could I implement one overload of getNthElement that also optimized for containers like vector and deque by making use of a C++ STL function? Is this implementation better than our answer to Q2? Why?

A:
Leverage the STL function std::advance in our implementation, which does the above overload trick internally, on our behalf:
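A sketch of what this single-overload version might look like is shown below. The return-by-value choice, the assertion, and the sameResultEverywhere helper are assumptions made for illustration; std::advance itself is specified to take constant time for random access iterators and linear time otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Single overload: std::advance chooses the fastest way to move the
// iterator based on its category, so no tag is needed here.
template <typename Container>
typename Container::value_type getNthElement(const Container& container,
                                             std::size_t n) {
    assert(n < container.size() && "getNthElement: index out of range");
    auto it = std::begin(container);
    std::advance(it, static_cast<typename Container::difference_type>(n));
    return *it;
}

// Hypothetical usage: the same function serves both kinds of container.
inline bool sameResultEverywhere() {
    const std::vector<int> fast{7, 8, 9};  // advanced in one step
    const std::list<int> slow{7, 8, 9};    // advanced one element at a time
    return getNthElement(fast, 2) == 9 && getNthElement(slow, 2) == 9;
}
```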

Is it better? I'd say yes, for two reasons:
1) We wrote less code. The STL is well-tested and provides strong guarantees for both efficiency and correctness. Writing less code while achieving the same performance is a win.

2) We reduced the problem to an even smaller form. The complicated overloading techniques were applied to the smallest portion of the overall problem as was possible in this new version. We're not conflating the three fundamental things going on in an implementation of getNthElement (getting an iterator from the container, advancing that iterator as quickly as possible N times, and dereferencing the iterator) when we apply our optimization tricks. Because the STL reduced the optimization problem down to its smallest form, we can reuse that code in more places. std::advance is a great primitive operation.

These two points are always worth considering when dealing with advanced code.

Recap:

We answered the fundamental question: When writing code that can be instantiated with many types, how do I reuse code for some types but select different fundamental implementations for others? What if there are patterns to the "exceptions" to the "default code"?

We reused the same basic techniques from the first article in this series, templates and overloading, and added one new technique: tag dispatch. Overloading lets us have multiple distinct implementations for a problem, tag dispatch lets us programmatically select an implementation based on type patterns, and templates make our code general so that it works with as many types as possible.

Along the way we reinforced iterator concepts and highlighted introspection strategies for determining iterator properties. We also examined the performance implications of tag dispatch, and much to our delight, found that the technique results in zero runtime overhead.

Footnotes:

[1] The general philosophy in C++ is to only implement iterator operations that can be completed quickly. Even though you could write a random access iterator for a linked list, the standard library does not provide one, because the jumping operations could not run in O(1) time on that data structure.

[2] Note that using forward_iterator_tag for "everything other than random_access_iterator_tag" works because bidirectional_iterator_tag inherits from forward_iterator_tag [3]. Alternatively, I could have been even more general by using input_iterator_tag, or by templating the tag type in the O(N) overload, which would allow it to accept input iterators as well. I chose not to do this because it doesn't make much sense to get the Nth element of an input range: such a range can only be traversed once, so calling the function multiple times would give different results.

[3] It is not a problem that we take our iterator tag parameters by value rather than by reference. Passing a derived tag by value does slice it down to the base type, but overload resolution happens before that, so a bidirectional_iterator_tag will still match an overload on forward_iterator_tag. Furthermore, these structs have no data members or virtual functions to slice. There is no difference in behavior or in the generated assembly if you pass by const reference; passing by value is just shorter.
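The relationships that footnotes [2] and [3] rely on can be checked at compile time. The following sketch uses only standard type traits; the assertion messages are my own wording.

```cpp
#include <iterator>
#include <type_traits>

// The tags form an inheritance chain, which is why a "stronger" tag can be
// passed where a "weaker" one is expected.
static_assert(std::is_base_of<std::forward_iterator_tag,
                              std::bidirectional_iterator_tag>::value,
              "bidirectional tag derives from forward tag");
static_assert(std::is_base_of<std::bidirectional_iterator_tag,
                              std::random_access_iterator_tag>::value,
              "random access tag derives from bidirectional tag");
static_assert(std::is_base_of<std::input_iterator_tag,
                              std::forward_iterator_tag>::value,
              "forward tag derives from input tag");

// A by-value parameter of the base tag type accepts a derived tag.
static_assert(std::is_convertible<std::bidirectional_iterator_tag,
                                  std::forward_iterator_tag>::value,
              "derived tag converts to base tag");

// The tags are empty, so slicing discards nothing.
static_assert(std::is_empty<std::forward_iterator_tag>::value,
              "tags carry no data members");
```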