In the last article we talked about how to have multiple fundamental implementations for a single conceptual task. The notion being, if we can use type information to optimize a particular use case of our function, then why not do so and hide that benefit under our existing interface? Thus far we have limited our discussion primarily to functions, now we are going to turn our attention to structs and classes.
Having multiple implementations can be useful for optimization in the context of structs (just as we showed it was useful for functions in the last article). One example of this is std::vector<bool>, which is space-optimized compared to other versions of std::vector so that each boolean value it contains only actually occupies one bit of memory. We might also be interested in less dramatic variations in our struct, such as changing the deleter on a std::unique_ptr.
For the most part, the code reuse techniques we've discussed already will transfer over nicely to structs and classes, so we'll spend most of our time talking about a new technique that offers more options with structs/classes.
Questions for next time:
1) What was the goal of overloading/tag dispatch? Why was it useful for implementing multiple fundamental implementations of a particular function?
2) Can we apply what we learned in the last article to provide multiple fundamental implementations of a struct? What are the limitations of overloading/tag dispatch when it comes to writing structs? You may find it helpful to revisit the first article of the series.
3) What technique allows us to similarly divide one concept, in this case, a type, into multiple implementations?
4) In what ways can you select implementations using the technique from question 3? For example, how might I apply the technique to single cases that must be handled specially (like std::vector<bool> being different from all other implementations of std::vector) vs. patterns of differences (as in how the implementation of std::unique_ptr<T> varies from std::unique_ptr<T[]>)?
5) What are the limitations of the scope of the technique from question 3? It may be useful to review your answer to question 1 for this. Overcoming this apparent limitation will be the subject of future blog article.
Review:
By this point in the series, you should be able to answer these fundamental code reuse questions:
1) How do I reuse the same implementation code with different types? (Review here)
2) How do I use the same code to invoke different fundamental implementations selected by type? (Review here)
3) How do I reuse code for some types but select different fundamental implementations for others? What if there are patterns to the "exceptions" to the "default code"? (Review here)
If you struggle with these concepts, I highly recommend going back and (re)reading the relevant articles, then playing around with the concepts in your own code until they start to click and make sense.
Today we will be answering the fundamental question: When writing code that can be instantiated with many types, how do I reuse code for some types but select different fundamental implementations for others? What if there are patterns to the "exceptions" to the "default code"?
In particular, we'll be writing two overloads of a function: one that is O(1) time whenever it is given a type that supports random access iterators, and another that is O(N) that works with any iterator and selecting between them with something called tag dispatch. This is the optimization technique that is leveraged in the STL when writing algorithms like std::binary_search. In particular: std::binary_search will work with any forward iterator, but runs much faster for random access iterators. I actually already wrote about how this optimization technique is used in std::copy so that it leverages std::memcpy internally for efficiency whenever it is safe to do so.
We will be enhancing a function we wrote in the last article of this code reuse series. The function is getNthElement, which returns the element at position N of a container (not to be confused with std::nth_element, which has entirely different behavior). Unlike the last article, which only selected an O(1) implementation for vector, we will write [only] 2 overloads so that any container that is capable of performing this task in O(1) time, does (and the rest use the O(N) version).
Last time I gave you all 3 questions to ponder. Today we answer them.
Q: What features of a container determine whether or not you can get the Nth element of that container in O(1) time? What commonalities would I expect in the interfaces of such containers?
A:
A container must have a random access iterator to support an O(1) time implementation of getNthElement. You'll note that it is not sufficient for a container to merely support O(1) time subscripting (unordered_map is a counter-example).
What does random access iterator mean? Well, iterators are types that support traversing a range of data. So if I've got the numbers { 1, 2, 3, 4, 5 } in a container, an iterator lets me visit each of those elements (numbers) in the container exactly once. The order we visited them in would depend on the container. The most basic iterators only allow you to do that much (visit in some order), but more complicated ones (like random access iterators) support "jumping around" while iterating. I can go directly from the first element to the last, or anywhere in between. I could even traverse backwards by two elements at a time with a random access iterator.
Iterators are designed to look like pointers in how they're used. A pointer actually is a random access iterator (because you can perform operations like myPointer + 5 on them. The most basic operations supported by all iterators are dereferencing (*myIterator) to get an element out, incrementing (++myIterator) to look/point at the next element, and testing equality (myIterator1 == myIterator2) to see if you've reached the end of the sequence.
For our purposes, having a random access iterator means we can say:
And expect that operation to be completed in amortized O(1) time [1].
Q: How can I write [only] two getNthElement function overloads so containers that can perform the task in O(1) time, do, and those that can't still compile and output the same conceptual result but take O(N) time?
A:
As with last time, we're going to need a mix of templates and overloading. The two overloads will determine the fundamental implementations we use. The templates will contribute to making our code more general. They let us match patterns of types for use in each of our overloads.
The question, especially if you read the last article of the series, essentially boils down to: How do I select the right overload? How do I programmatically say: Use this overload if the container supports random access iterators and otherwise use the other overload?
The more general problem of selecting implementations has a few different answers, but today we'll be using tag dispatch, as it is probably the cleanest solution here. Tag dispatch essentially boils down to adding an extra argument of different types to the end of your function whose sole purpose is to select an overload. These tag types are often just empty structs (as is the case for the ones we'll be using); their only purpose is to be different types with meaningful names.
For getNthElement, we'll use the std::random_access_iterator_tag to select our O(1) implementation and the std::forward_iterator_tag to select anything else [2]. We'll use std::iterator_traits<SomeIteratorType>::iterator_category{} to get an instance of the tag that corresponds to the iterator our container provides. For other cases, you might find looking at the STL's type traits helpful when it comes to selecting a tag.
Here's what this looks like:
//Forward declarations of Impl (implementation) functions not shown.
//The user-facing interface
template<typenameContainer>consttypenameContainer::value_type&getNthElement(constContainer&container,size_tn){autotag=typenameiterator_traits<typenameContainer::iterator>::iterator_category{};returngetNthElementImpl(container,n,tag);}//The O(1) implementation
template<typenameContainer>consttypenameContainer::value_type&getNthElementImpl(constContainer&container,size_tn,random_access_iterator_tag){autoiteratorToNthElement=cbegin(container)+n;return*iteratorToNthElement;}//The O(N) implementation
template<typenameContainer>consttypenameContainer::value_type&getNthElementImpl(constContainer&container,size_tn,forward_iterator_tag){autoitr=cbegin(container);for(autoi=0u;i<n;++i){++itr;}return*itr;}
1) getNthElement doesn't actually "do the work", it forwards its arguments along with a tag to one of the implementation function overloads, getNthElementImpl. These "Impl" functions do the work in a manner that is both safe and efficient for the type of iterator specified by the tag argument. This layer of indirection is necessary to allow us to both add the tag argument used for overload resolution "automatically", in a controlled manner and also to insulate our users from knowledge of this optimization (a good API designing skill in general).
2) In the Impl functions, the tag parameter has no variable name. We only use the tags for overload resolution. Leaving out the name is a hint to both the programmer and compiler that this is the case. It helps the compiler optimize out this variable entirely so that the tag dispatch part of the code incurs zero runtime overhead! [3]
3) We had to use overload resolution to select between Impl functions. We could not have used an if statement with type traits to do so. The reason is that if we tried to call one Impl in the if part of the branch and the other Impl in the else part of the branch, we would end up instantiating both, and for containers that don't support random access iterators, the code would fail to compile.
Now getNthElement will run in O(1) time on vector and deque, but will still work (in O(N) time) for all other containers, like list and map.
Q: Bonus points: How could I implement one overload of getNthElement that also optimized for containers like vector and deque by making use of a C++ STL function? Is this implementation better than our answer to Q2? Why?
A:
Leverage the STL function std::advance in our implementation, which does the above overload trick internally, on our behalf:
Is it better? I'd say yes, for two reasons:
1) We wrote less code. The STL is well-tested and provides strong guarantees for both efficiency and correctness. Writing less code and achieving the same amount of performance is always a win.
2) We reduced the problem to an even smaller form. The complicated overloading techniques were applied to the smallest portion of the overall problem as was possible in this new version. We're not conflating the three fundamental things going on in an implementation of getNthElement (getting an iterator from the container, advancing that iterator as quickly as possible N times, and dereferencing the iterator) when we apply our optimization tricks. Because the STL reduced the optimization problem down to its smallest form, we can reuse that code in more places. std::advance is a great primitive operation.
These two points are always worth considering when dealing with advanced code.
Recap:
We answered the fundamental question: When writing code that can be instantiated with many types, how do I reuse code for some types but select different fundamental implementations for others? What if there are patterns to the "exceptions" to the "default code"?
We reused the same basic techniques from the first article in this series, templates and overloading and added one new technique: tag dispatch. Overloading lets us have multiple distinct implementations for a problem, tag dispatch let us programmatically select an implementation based on type patterns, and templates made our code general so that it worked with as many types as possible.
Along the way we reinforced iterator concepts and highlighted introspection strategies for determining iterator properties. We also examined the performance implications of tag dispatch, and much to our delight, found that the technique results in zero runtime overhead.
Footnotes:
[1] The general philosophy in C++ is to only implement iterator operations that can be completed quickly, so even though you could write a random access iterator for a linked list, it doesn't exist, because it wouldn't be O(1) time for that interface.
[2] Note that using forward_iterator_tag for "everything other than random_access_iterator_tag" works because bidirectional_iterator_tag inherits from forward_iterator_tag [3]. In the alternative, I could've been even more general by using input_iterator_tag or templating the tag type in the O(N) overload, which would allow it to accept input iterators as well; I chose not to do this because it doesn't make much sense to get the Nth element of an input range, as it would change results when called multiple times.
[3] It is not a problem that we take our iterator tag parameters by value rather than reference. Even though we do end up slicing off the polymorphic components, overload resolution happens before that, and so a bidirectional_iterator_tag will still match an overload on forward_iterator_tag, even though I slice off the bidirectional_iterator_tag portion of the inheritance when the variable is actually passed. Furthermore, these structs have no data members or virtual functions to slice. There is no difference in behavior or the outputted assembly if you pass by const reference; passing by value is just shorter.
1) You can extend or adapt other people's code with overloading.
2) It's easy to accidentally eliminate overloads from overload resolution (the set of overloads the compiler considers when calling a function), especially when doing the above. We'll discuss a simple technique to ensure that you don't leave your users in an unfortunate position by accidentally being too specific.
We'll also discuss a few other things along the way, like one reason why non-member functions are more flexible than member functions. We'll highlight why the modern C++11 guideline: 'prefer using the non-member std::begin, std::end, and std::swap functions' exists and why things like non-member std::size are in our near future [1].
Extending/adapting other people's code:
One thing people often forget is that overloading lets you extend your use of other people's code. For example: Suppose I'd like to pull in some random library from the internet, and because the designer of that library is "particular" (read: silly), they wrote a container Foo that correctly implemented the forward iterator interface, but provided only a begin method on the Foo class, without an accompanying end method.
I'd love to be able to write templated functions that take containers (including both Foo and STL containers like vector) and run STL algorithms on them, but because this class is weird, it's not apparent how to make that happen.
Let me back up a minute, show what I mean, and explain why someone might do this. This is what's going on:
structFoo{...FooIteratorbegin()const;...};voidsampleUsage(){Foof;autoitr=f.begin();cout<<*itr<<endl;//We can dereference FooIterators
++itr;//We can increment FooIterators
assert(itr!=f.begin());//We can compare FooIterators
}
The question, is how do we know when it's no longer safe to dereference the iterator? How do we know that we've exhausted all of the elements inside the Foo container? The standard C++ way would be to call myContainer.end() and compare the iterator to that; if the iterator equals myContainer.end(), we've exhausted the range.
The fundamental issue: This developer chose to get what would be the equivalent of the "end" iterator by having it be the result of a default-constructed FooIterator type.
This design decision isn't actually so far-fetched, as it's exactly what is done with std::istream_iterator (though stream classes also don't have a begin member).
The original author got away with this decision because all they ever did was call functions directly, like so: std::find(f.begin(), FooIterator{}, 5);, where they specified the "end" iterator directly.
Unfortunately, although their choice works out ok at that level, it doesn't compose well:
We could write a function bar, that takes in a templated Container type and runs find over that container. Again, not so farfetched, why not reuse code between several containers? The problem is that I can either write an implementation that works with STL-like containers (things that implement container.end()) or an implementation that works with Foo and knows how to construct its equivalent to the "end" iterator.
I could write an overload of bar to solve this, one for each style of container, but what if I've also got another function baz that has the same fundamental problem? I'd have to duplicate most of baz too in another overload.
I could also write a couple overloads of find that take the containers directly (rather than iterators), one version for the STL containers, one for Foo, and that's a bit better than our first idea, but if I wanted to also call std::transform on my container, I again need to do more work.
No. Boil the problem down to its smallest component: There is no "end" method on Foo. Solve that problem.
Now get frustrated because you "can't". The code is a library from the internet. Either you don't have access to the source code, and therefore cannot add an end member function, or you do have access, but know that you're going to have to remember to add an end member function every time they release a new version.
There's a better way. C++11 introduced a non-member version of the begin and end functions for this exact reason. I cannot ever add a member function to a primitive type like a fixed-size array (like int x[20];), but it would be meaningful to have a begin and end function on them for use with standard algorithms. Similarly, I cannot add an end member function to Foo, but it still makes sense to talk about that concept. The way we do that is via a normal (non-member) function.
The implementation of the non-member function, std::end, is simple. There are 2 overloads in the STL, one which is templated over any type T and returns the result of calling instanceOfT.end() and another overload for fixed-size arrays that returns the arrayName + sizeOfArray [2]. Why not add one of our own that takes a Foo?
This looks like:
FooIteratorend(Foo&foo){returnFooIterator{};}
We can use the existing non-member std::begin() function for free, because our container implements a begin method, so we're golden there, the STL's non-member begin function will just call that.
The one thing we still need to fix is bar: bar calls the member version of begin and end. We should change that so that it uses the more general/extensible non-member form:
Yep. Just add the std:: prefix to begin and end and this won't compile when instantiated with the Foo type. Why? Because we wrote our overload of end in a different namespace (possibly no namespace at all). By saying "look for endin the std namespace (which is exactly what we do with the code std::end), we don't look at any overloads in other namespaces. We effectively restrict overload resolution (the process of looking at all overloads in scope and selecting the best match) to the std namespace.
The solution here is NOT to write our overload into the std namespace (never do that with anything!), but rather, to ensure that all appropriate namespaces are considered for overload resolution.
How do we do that?
Well, step 1 is to avoid fully-qualifying names (in other words, don't prefix with someNamespace::).
Step 2 is to bring all overloads you want/need into the current scope. Any overloads declared in your current namespace (or no namespace at all) will already be in scope. You'll need to somehow bring the STL's ones into scope though (it'd be just as bad to be unable to compile bar with vector because we didn't have the std::begin overloads in scope). We do this with using statements. Add the lines:
usingstd::begin;usingstd::end;
to the top of your cpp file if you're calling them in a cpp file. Using statements bring symbols into the current scope. In this case, they bring the STL's overloads of begin and end into scope so that they will participate in overload resolution (along with the one you wrote for Foo).
If you're working in a header file, where the guidance is to never use using statements to avoid both pollution and unexpected behavior changes, fear not: there is an answer. The guidance I spoke of is somewhat incomplete: You should never use using statements in a header file at file or namespace scope. It's perfectly fine to use them inside a class or inside a function, even when inside a header, like so:
We only pull in std::begin and std::end for overload resolution in the scope of bar this way. The using statements will not impact anything else.
So the general rules are:
1) Avoid fully qualifying function names (specifying the namespace) when calling a function. Doing so potentially removes candidates from overload resolution.
2) Use using statements to bring additional overload candidates into overload resolution.
3) Never use a using statement in a header file unless it is done in the scope of a class or function.
Make these general habits, as the same guidance applies to functions like swap too! You never know who's going to end up calling your code; by following these rules you can make your generic code/algorithms work with other people's user-defined types.
Conclusions:
Overloading is [perhaps surprisingly] a great way to facilitate the generalization of your code when used with templates. It lets us adapt foreign interfaces to work in familiar ways so that we can reuse more code (like the STL's algorithms).
In order to "play nice" with overloads that may not all be defined in the same namespace, it's important to avoid fully-qualifying functions when calling them, and to ensure that all overloads you need have been pulled into scope with a using statement.
[2] There are actually a few other overloads of std::end for const-correctness. You'd probably also want to implement the const overloads specific to Foo in real code. See cppreference for details.
I recently gave a tech talk on "The Set of Natural Code": code that is simple, natural, and expressive. I try to highlight modern trends as seen in the D programming language and compare with C++14. Long story short: We're getting more efficient and generally, more powerful, code that is at the same time elegant and easy to maintain, just by changing our primitives.
While this talk explores these trends with D language primitives, it should be noted that versions of most of the key concepts discussed in the talk will be relevant in C++ very soon (probably by C++17). Things like formally defined ranges and universal calling syntax have proposals in committee right now.
No prior D language experience is necessary to understand this talk.
We saw in the last article that it is desirable sometimes to have different implementations of the same conceptual task. In particular, we wrote a function getNthElement, which returns the element at position N of a container (not to be confused with std::nth_element, which has entirely different behavior). We wrote two overloads of this templated getNthElement function, one so that it worked with all C++ STL containers, and another "optimized" overload for vector so that it would use a faster implementation for that case. In particular, we noticed that we can get to a vector's Nth element quite easily in O(1) time by saying myVector[2] rather than walking through the iterators, one element at a time.
In this next article, we're going to take things a step further. We noticed a problem last time: calling getNthElement with a deque should probably also use our faster O(1) implementation, but it doesn't, it uses an O(N) version to step through the iterators. The subject of the next article is fixing this problem (and generalizing our solution for other containers).
So, for next time:
1) A lemma to Q2: What features of a container determine whether or not you can get the Nth element of that container in O(1) time? What commonalities would I expect in the interfaces of such containers?
2) How can I write [only] two getNthElement function overloads so that containers that can perform the task in O(1) time, do, and those that can't, still compile and output the same conceptual result, but take O(N) time?
3) Bonus points: How could I implement one overload of getNthElement that also optimized for containers like vector and deque by making use of a C++ STL function? Is this implementation better than our answer to Q2? Why?
Review:
By this point in the series, you should be able to answer these fundamental code reuse questions:
1) How do I reuse the same implementation code with different types? (Review here)
2) How do I use the same code to invoke different fundamental implementations selected by type? (Review here)
If you struggle with these concepts, I highly recommend going back and (re)reading the relevant articles, then playing around with the concepts in your own code until they start to click and make sense.