Summary:
I had a realization today: std::pair should only be used when writing templated classes/functions; a custom struct will always stand up better under maintenance.
Interestingly, this does not seem to directly generalize to std::tuple, for which std::tie can be used quite gainfully to leverage the tuple swap and comparison functions.
Why:
Let’s begin by discussing what a pair actually is. A pair is a way of referring to exactly two things.
It’s important that we don’t start to add words or implied meanings to that definition. It does not tell us what those things are, either as a group or as components, beyond saying: “here’s two things”.
So let’s dive into:
Reason #1: It’s not self-documenting
If I give you a std::pair<string, int>
, there’s no way of knowing what that is. The string might represent a number, or a spaceship. The int might be an address or a timestamp. Whatever these values are, we don’t know what they are individually, or why we grouped them together.
You might argue, hey, be nice, there’s going to be context! I would then direct you to a common (albeit awful) use case of pair: “returning multiple things” from a function. If string getPlayerData();
just isn’t cutting it anymore because you need their score too, not just their name, throw them in a pair! pair<string, int> getPlayerData();
.
Method signatures like that are terrifying. At least here the function is named something reasonable, I can infer that the pair holds ‘player data’, but that gets lost the minute I take the return value, store it in a variable, and pass it to someone else. With pair, I’m relying on someone using Hungarian notation to describe the meaning of the type as part of the variable name.
Even if we concede that there is context, that it’s all nice and closed off inside a single class, I ask you, tell me what reads better, a pair of string and int or a struct ‘Player’ with members ‘name’ and ‘score’. With the struct, I have a sense of what’s going on and why these two values are being grouped together. I have descriptive names of what values those types hold, not just ‘first’, and ‘second’.
This is all to say:
1) The types of the elements in the std::pair are not enough! It tells me what values are syntactically legal to populate those members with, it does not tell me what values are semantically legal or what they represent.
2) The name ‘pair’ doesn’t tell me why we are grouping its members, just that it holds two things. Even if we did have a struct that had members ‘name’ and ‘score’, what am I talking about? A Player? A horse? A presidential election? The name of the class is just as important as the name of its members!
A custom struct gives you the fantastic opportunity to name the struct and the members. Do not underestimate the power that this gives you. Good naming is the first, and arguably most important, component of good documentation. Especially in the world of auto, where types are being omitted from our code (more so in C++14), member names and variable names are of increasing importance.
Reason #2: It hinders code evolution
The scenario:
Suppose I want to map student names to exam scores. I might write the code:
std::map<string, int> studentToScore;
Cool. Now suppose that I’d like to also keep track of who signed the honor code on their test; I’ll want to hunt down anyone who didn’t later and see what’s up, but if they sign it after the fact, I still want to have their score around. I could add another map from string to bool ‘studentToSignedExam’, but I’m thinking to myself that I’d really like to just iterate over a single container and all of their information - it’s both convenient and more cache efficient. So I use a pair:
std::map<string, pair<int, bool>> studentToScore;
Grand. We’ll ignore the items from Reason #1, lie to ourselves, say naming isn’t an issue, I’m the only one using this program, surely I’ll remember what that is in a month.
The problem:
Now I decide, you know, it’d be great if I also knew what room each student took the exam in. That would be useful for finding people who cheated. Well darn. I can’t squeeze that into a pair. I could change it to a tuple<int, bool, int>
, which seems fine on its face (barring any comments on self-documentation), but this actually has a ripple effect. If I didn’t use auto in my code, I now need to go and fix the types everywhere where someone gets an entry out of my map; that not only requires me to do a find/replace (possibly in code that I don’t have access to, if I’m developing a library), but now the type name just keeps growing, causing line bloat.
Imagine instead, I do the following:
struct ExamInfo {
int score_;
bool signedHonorCode_;
ExamInfo() = default; //Allow subscript operations on the map
ExamInfo(int score, bool signedHonorCode)
: score_{score}, signedHonorCode_{signedHonorCode} {}
};
map<string, ExamInfo> studentToExamResults;
This is nice. It’s self-documenting (yay), and furthermore, it’s easy to add information. If I decide I want to store what exam room the student was in, I just add a data member and a constructor. Existing code doesn’t need to change! Clients are insulated from our maintenance. They might need to recompile, but their code is fine as written; that’s a win!
A custom struct also allows me to add members functions whenever I feel like.
Exceptions:
There’s one general case in which I can see std::pair being useful: When you truly have no idea what it means or what’s inside beyond its type.
That’s it.
Let that sink in for a moment. The exception to the rule says: “It’s just as awful and evil as every other example, for exactly the same reasons as before, we just can’t do anything about it sometimes”.
When can that happen? I can think of one realistic case and a group of less probable ones I think most people won’t encounter.
Case 1:
Writing templated classes or functions. A good example of where a std::tuple (a generalization of std::pair) might shine, is if you were trying to save arguments to pass to a function at a later time (like std::bind without placeholders). Usage might look like:
void printSum(int x, int y) {
cout << (x + y) << endl;
}
PreparedFunctionCall p{printSum, 5, 3}; //Nothing printed yet.
p.execute(); //Now we pass ‘5’ and ‘3’ to printSum and see the output
To implement this, you could store all of the function arguments as a tuple as one member and the function as another member. Later on, you could then pass the items inside the tuple to the function as the implementation execute().
This is a good example because as long as we name the tuple that holds the arguments ‘arguments’, we’re not really going to be able to get any more descriptive without making the function less generic (limiting ourselves to calling one specific function, etc). I can’t think of a name for the type of the variable ‘arguments’ that would be more descriptive of its contents. I can’t think of names for the members of the tuple that would be more descriptive (again, without making the function less generic).
Tuple is fine here; it’s actually exactly the sort of thing it’s meant for.
Aside: I realized half-way through coding up the implementation of PreparedFunctionCall that it would take a full page to show, and another 4 pages to explain the variadic template metaprogramming. Look here if you’re curious.
Caution:
Just because you’re writing a templated class or function does not mean that using pair or tuple is ok! It is only ok if you truly cannot come up with a more descriptive type.
In particular, I’d like to take this opportunity to complain about std::map. std::map lets you iterate in sorted order over the contents of the map. The type of a dereferenced map iterator is std::pair
for (auto& element : myMap) {
cout << element.second << endl;
}
The name ‘second’ does not describe what is contained! If you show this code to someone who has never written in C++ before, they’re probably going to be confused, and yet maps are fundamental! This is madness.
I think we could all agree that:
for (auto& element : myMap) {
cout << element.value << endl;
}
Is more self-documenting and more readable.
I would write a struct, call it ‘KeyValuePair’, with members ‘key’, and ‘value’. The name of the class is still vague, but we can allow that because we can’t really make a more precise name because we're still trying to be generic. What we bought ourselves was readability with regards to the names of the members.
Case 2:
If you’re trying to represent a mathematical concept, like a relation, which is defined simply as a ‘set of pairs’, here too, it is not possible to give more precise names to the class or its members.
An example of this would be defining ‘less than’ for a type where you couldn’t define the operator in terms of arithmetic, but only by putting the cases where ‘less than’ would be true into a set.
Thoughts on std::tuple:
So a good question at this point, is that if std::pair is bad in most cases, and std::tuple is a generalization of pair that just has N members instead of only 2, should I be worried about std::tuple?
Yes - but not quite as much.
Tuple is the wrong choice if ever you violate any of the reasons listed above - but it has a few features that pair doesn’t, that make it a clever way to implement a few things. In particular, std::tuple can be used to ‘automatically’ generate swap and comparison operations that make sense for a wide variety of types.
Before I get to that though, I need to talk about std::tie. Tie says “give me a bunch of named variables[1] as arguments. I’m going to construct a tuple where each of its members is a reference to one of the arguments”. In other words, we haven’t copied any of the arguments, but we can now refer to them via the aliases (references) in the tuple and use tuple operations on them.
What that means, is that if I have a simple struct as follows, I can efficiently implement swap and less-than like this:
struct Client {
string firstname_;
string lastname_;
};
void swap(Client& a, Client& b) {
using std::swap; //See footnote [2]
auto aComponents = tie(a.firstname_, a.lastname_);
auto bComponents = tie(b.firstname_, b.lastname_);
swap(aComponents, bComponents);
}
bool operator<(const Client& lhs, const Client& rhs) {
auto lhsComponents = tie(lhs.firstname_, lhs.lastname_);
auto rhsComponents = tie(rhs.firstname_, rhs.lastname_);
return lhsComponents < rhsComponents;
}
That’s pretty nifty! With that less-than operator, we can sort by firstname, and break ties by sorting by lastname, and it saves us the trouble of writing out all those if statements while not incurring any copies.
One thing you’ll note here, is that I don’t keep the tuples around for very long. I make a tuple of references, I exploit its operators, and then I let it fall out of scope. It doesn’t cost us much (if anything) to make the tuple, so there’s no point in keeping it around when my Client struct is more descriptive. I just leverage the tuple operators to save me some work.
Footnotes:
[1] std::tie only accepts 'named variables' because its signature requires that you pass the arguments by l-value reference. This means that calling std::tie(5, 3) or std::tie(foo()) is illegal, because 5, 3, and the result of foo() are values, and you cannot have an l-value reference to a value, only variables with names. You can bind 5, 3, and foo() to r-value references however, and if your plan is to throw those values away in a minute anyway, std::forward_as_tuple might be worth looking at.
[2] This is an arcane but useful thing to remember: If you have using std::swap;
in your function, it puts std::swap in for scope resolution, but anyone else who defines a swap function that is a better match (one that doesn’t use templates) will be prefered and called instead. This means we don’t force std::swap on anyone if they can do something smarter.