Modern Maintainable Code

Code reuse series: The basics of type-dependent code reuse (solution)


Introduction:

Last week we started a journey. Over the course of several articles, we'll be looking at first simple, then advanced, techniques for code reuse. We'll mostly be examining code reuse as it applies to running all or some of the same code on different types, but the answers may surprise you: some of the solutions may not involve templates; for others, we'll be dragging out inheritance, but there won't be an "Is-A" relationship at play in the interface, nor any runtime hit. While in this first article things will probably seem pretty simple, expect them to ramp up quickly; brace yourselves.

Last week I gave you all 4 questions to ponder. Today we answer them.

Q: Why is it beneficial to sometimes have the same function/struct name do different things depending on their parameters? For example:

assert(abs(-5) == 5); //calling abs with an int
assert(abs(-2.3) == 2.3); //calling abs with a double

A:
1) Absolute value is a concept that is not inherently dependent on type. It makes sense to ask for the absolute value of both an int and a double, maybe even your user-defined Matrix class. It should not be limited to any of these types.

2) Our users don't want to have to remember different function names for use with different types. User code should be insulated from such distractions. If you've ever needed to get the absolute value of a floating-point number in C (ewwwe, C), you might be familiar with fabs, or "floating-point abs". This is ugly at best, and at worst, hangs users out to dry when their custom types need an absolute value function.

3) Having functions with different names for different types doesn't scale. You'll exhaust all meaningful names pretty quickly, but more importantly, you'll find that it's easy to forget to add a new one when you add a new type. Say your language/compiler adds a new 128-bit integer type: nobody should have to remember to write a matching abs-style function for it too.

Q: What techniques allow us to create such differences? How can I write code so that calling a function foo(5) may produce different results than calling foo("hello")?

A:
1) Templates. [Basic] templates are best used when your function performs a single conceptual action that is implemented the same way over different types, as in our absolute value function. Here's what that would look like [1] [2]:

template<typename T> T abs(const T& value) {
  return value >= 0 ? value : -value;
}

What follows, until the second solution to this question, is a review of how templates work. Feel free to skim or skip the next 3 paragraphs, but if this is not your strong suit, the analogies might prove helpful.

About templating: The template keyword says that we'll be varying the implementation on some compile-time parameters, specified as a comma-separated list between the < and > symbols. The typename T says that one of those compile-time parameters will be a type, like int, or double, or some user-defined type, Foo. It also says we will refer to this parameter via an alias, T. The compiler will stamp out a copy of the implementation for each type T you invoke the function with, replacing "T" with the appropriate type.

Type deduction vs. manually specifying arguments: If I call the function as abs(5), T is automatically deduced as int, because that's what I passed in at the call site, and so the compiler makes a version of the code where it sees int instead of T. I can also manually specify the template arguments, as in abs<int>(5.3), which says: you must use the version of abs that is templated over int, that is, where all occurrences of T are treated as occurrences of int. In this case, because 5.3 is not an int, the compiler will try to perform an implicit conversion and succeed via truncation (if it failed, as would be the case between strings and ints, you'd get a compiler error). Anything that is not manually specified, the compiler will attempt to deduce from the call-site context; if it's not possible to deduce the unspecified arguments, you'll get an error.
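
To make that concrete, here's a minimal sketch using the abs template above (variable names are just for illustration):

int a = abs(5);        //T deduced as int from the argument
double b = abs(-2.3);  //T deduced as double
int c = abs<int>(5.3); //T forced to int; 5.3 is implicitly converted (truncated) to 5
//abs<int>(string{"hi"}); //compiler error: no implicit conversion from string to int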

Duck typing: There's a saying, "If it looks like a duck, walks like a duck, quacks like a duck, then it is a duck". This is the approach to templates in C++. You can instantiate (compile with a particular type) my absolute value function template up there with any type for which there would be no compiler errors. Because an int has a negation operator and therefore "looks like a duck", it works. Because a string does not implement the negation operator, trying to invoke abs(string{"hi"}) will result in a compiler error.
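
To see just how little "quacking" is required, here's a sketch with a hypothetical user-defined type; it needs only the operations the template body actually uses (comparison against 0 and unary negation):

struct Meters { //hypothetical type, purely for illustration
  double count;
  bool operator>=(int rhs) const { return count >= rhs; }
  Meters operator-() const { return Meters{-count}; }
};

Meters m = abs(Meters{-3.0}); //compiles: Meters "looks like a duck"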

2) Overloading. Overloading is best used when you want different conceptual implementations of a function depending on type. Sometimes you want to perform the same conceptual task, but in different ways, say to optimize a particular case. Sometimes the same conceptual task is being done, but the implementation has to be fundamentally different from one type to the next, as is the case for the standard library's to_string function. Other times, you might use different overloads to perform tasks that are entirely conceptually distinct. With overloading you have total flexibility to do whatever you like for each type.

A technical definition: Overloading is when you create functions with the same name that differ either in the number of, ordering of, or types of parameters (ignoring the names and only looking at types).

For example, imagine a function "get the Nth element of my container". I might have an overload for vector<int> and an overload for list<int>:

int getNthElement(const vector<int>& container, size_t n) {
  return container[n];
}

int getNthElement(const list<int>& container, size_t n) {
  auto itr = cbegin(container);
  for (auto i = 0u; i < n; ++i) {
    ++itr;
  }
  return *itr;
}

Why? Because I want my function to work with multiple containers, but to be fast when it can be. In a vector I can get to any element in O(1) time, in a list, O(N). I could template the code and have both containers use the O(N) implementation I used on the list<int>, and it would work, but that would make the vector code unnecessarily slow.

A notable difference: If there is no overload of my function for a particular type, I cannot use that type to call it (unless there's an implicit conversion to a type used in one of the overloads). In other words, I cannot call getNthElement(deque<int>{1, 2, 3, 4, 5}, 3);. It just won't compile; there's no matching overload. With a templated solution that looked like the implementation I used for list<int>, it would accept a deque<int> too, without extra work (I could even make it work over more than just ints!).

3) Bonus: Both! Templates and overloading, together! There's no reason we can't have our cake and eat it too. We can solve the problem I described in the previous paragraph so that everybody wins. We'll make it so that I can call getNthElement on all of the STL containers, but also ensure that it's optimized/fast for vector. Here's what that looks like:

template<typename Container>
const typename Container::value_type& getNthElement(const Container& container, size_t n) {
  auto itr = cbegin(container);
  for (auto i = 0u; i < n; ++i) {
    ++itr;
  }
  return *itr;
}

template<typename T>
const T& getNthElement(const vector<T>& container, size_t n) {
  return container[n];
}

Let's break that down:

The first overload is the one that works over all STL containers. All STL containers present the forward iterator interface necessary to call that code successfully (including vector). I templated over the Container we will be looking at, implicitly including the type of the elements inside the container (now our function works with more than just ints! Hooray! Reuse!).

The second overload specifies precisely what container is being used: it takes a vector. It's still templated because we would like it to work over a vector of anything; there's no conceptual difference between getting the third element of a vector<int> and a vector<string>, so why write different code? This vector overload supplies the optimized/fast implementation.

Overload preference: A more specialized (more specific) overload is always preferred by the compiler. So if I try to call getNthElement(vector<int>{1, 2, 3, 4, 5}, 3);, even though both overloads would match, the one templated over only the element type matches better.

Still imperfect: You might notice that although we can call this with deque now (getNthElement(deque<int>{1, 2, 3, 4, 5}, 3);), that would invoke the slower implementation, and yet, it's actually possible to write an O(1) implementation for deques too! I leave it as an exercise for the reader to fix this problem for deque using only the techniques shown in this article. Solving this problem more generally is the subject of the next article in the series. As is often best in an interview scenario, I recommend coming up with a "not perfect, but working" solution before trying to boil things down to 2 overloads, or even 1 (both of which I expect are beyond the skill level of most readers thus far). Feel free to post your answers below!

Q: For each technique that you identify, in what ways is the code allowed to vary? Is it possible for me to sneak in some optimization code for a particular case when I use one technique vs. another?

A:
1) In [basic] templates, the only things that are allowed to vary in your implementation are the types. The code is the same; it just gets "stamped out" with different types filling in the blanks. With what we've seen so far, having optimizations for certain cases/types is not possible. We'll explore what it takes to have variable implementations in future articles on advanced template techniques.

2) In each overload of a function you have total freedom. You can perform completely different actions or do entirely different conceptual tasks. There are no limitations on how different the functions can be. Because we can vary anything, we can optimize for any type we like.

Q: Do the techniques you've identified work equally well for functions and structs?

A:
No. You cannot overload a struct (or class); the language simply doesn't allow it. As for why: just think about what that would look like. Would it mean differentiating based on parameters to constructors? If so, how could I have different constructors that use the same implementation (vector, for instance, has several constructors)? It just doesn't make a lot of sense, so, no, you cannot overload structs. No worries: we'll explore techniques that give us the same freedom with structs that overloading gives us with functions a bit later in this series.

Good news: Templating does work equally well with structs:

template<typename T> struct Foo {
  Foo(const T& value) : value_{value} {}
  void print() { cout << value_ << endl; }
private:
  T value_;
};

Our Foo struct will take any type, store a copy inside, and facilitate printing it whenever we like.
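
Usage might look something like this (any type that can be streamed to cout will do):

Foo<int> number{42};
Foo<string> greeting{"hello"};
number.print();   //prints 42
greeting.print(); //prints hello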

Recap:

Reusing the same names for functions is a great way to talk about a conceptual action. This insulates users from type differences and any optimizations we might add for specific cases.

From a different perspective, we've answered a couple of fundamental questions you might want to ask yourself when programming:
Q: How can I reuse the same implementation code with different types?
A: Templates! The implementation is consistent; we just plug different types into the same conceptual code.

Q: How can I use the same code to invoke different fundamental implementations selected by type?
A: Overloading! You have the power and flexibility to implement functions as differently as you like, depending on the types of the parameters.

We learned how to write code that accepts any type that "quacks like a duck" (templating). We also learned how to optimize implementations for the types that can be faster (overloading). We even looked at combining these two things to get the best of both worlds.

Lastly, we highlighted some of the limitations of each technique, some of which we will be able to overcome in future articles, and some of which we cannot.

Footnotes:

[1] Perhaps the implementation of abs that first comes to mind is absValue = value >= 0 ? value : value * -1. As it happens, this is more complicated than it needs to be: we're invoking multiplication, which may not make sense (and thus, may not be implemented) in the general case for user-defined types. Leveraging the negation operator instead makes our code more general in this subtle way.

[2] Especially after looking at the implementations I provide, you might ask "Why do I even care that abs can be called for a particular type"? I'd remind you that 1) abs is not the only function for which we seek code reuse; not all functions have a trivial implementation. 2) Names are exceedingly important in maintainable code. Having abs be a named action makes it more clear what's going on and actually enables other forms of code reuse in templated code.

Code reuse series: A Roadmap


I'm taking the blog in a different direction for a while. We'll be looking at Maintainable Code in the context of general problems you might want to solve. In particular, we'll be looking at various ways to reuse code for the next few articles.

I will also try borrowing from the style of Herb Sutter, posing questions to you in one blog entry to be answered in the next. I'm hoping you'll post your answers in the comments!

So, for next time:

1) Why is it beneficial to sometimes have the same function/struct name do different things depending on their parameters? For example:

assert(abs(-5) == 5); //calling abs with an int
assert(abs(-2.3) == 2.3); //calling abs with a double

2) What techniques allow us to create such differences? How can I write code so that calling a function foo(5) may produce different results than calling foo("hello")?

3) For each technique that you identify, in what ways is the code allowed to vary? Is it possible for me to sneak in some optimization code for a particular case when I use one technique vs. another?

4) Do the techniques you've identified work equally well for functions and structs? Why or why not?

What lies ahead:

Here's a rough preview of what I expect to be writing about over the next several blog entries. Everything is subject to change. We will be finding solutions to these fundamental problems:

1) How do I reuse the same implementation code with different types?
2) How do I use the same code to invoke different fundamental implementations selected by type?
3) How do I reuse code for some types but select different fundamental implementations for others? What if there are patterns to the "exceptions" to the "default code"?
4) How do I select between different implementations when only a small piece of the implementation varies between types?
5) How can I add the same set of features (capabilities) to completely unrelated types? Part 2: How can I do this selectively, at an instance-by-instance granularity?
6) How can I design my code so that it is easy to test and try different data structures to determine which is fastest?

Along the way, we'll explore ideas like how we can insulate our users from such problems, so that we can provide a higher level of abstraction that "just works", and doesn't look like a mess from the outside. We'll also talk about deliberately bringing our users into the conversation to let them selectively change the behavior of our code. Lastly, we'll be sure to keep a close eye on the performance and maintainability implications involved in the solutions to these problems.

A Reminder:

Don't forget to post your answers to today's questions in the comments! (They'll help me both provide more complete answers and also explore other fundamental questions we might ask about code reuse).

Valgrind is *NOT* a leak checker


Summary:

Valgrind is one of the most misunderstood tools I know of in my community. Valgrind is not a leak checker. It has a leak checking tool. I'd argue that that tool happens to be Valgrind's least useful component.

Without changing the way you invoke Valgrind, you get so much more useful information than most people realize. Valgrind finds latent bugs even when they don't cause your program to fail/crash; it doesn't just tell you where the bug happened, it tells you why it happened, in English. Valgrind is an undefined behavior checking tool first, a function and memory profiler second, a data-race detection tool third, and a leak checking tool last.

There's a reason why this is the first thing I tell students to do at office hours.

First things first:

To run valgrind, simply go to the directory where your program is and run:

valgrind ./myProgram myProgramsFirstArg myProgramsSecondArg

No special arguments.

You'll see both your program's output as well as the debugging output generated by Valgrind (which is prefixed with ==). The output is most helpful (and includes line numbers) if you compile your program with -g before running valgrind over the executable.
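
For example, assuming g++ and a single source file:

g++ -g -o myProgram myProgram.cpp
valgrind ./myProgram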

For the purposes of this article, please ignore all Valgrind output after the "HEAP SUMMARY" line. That's the part we don't care about: the memory leak summary.

What can it detect?:

1) Misuse of uninitialized values. At its most basic:

bool condition;
if (condition) {
  //Do the thing
}

This is a fun one. A lot of the time your code will just keep going and fail silently if you run this. It might even do exactly what you hoped it would do... most of the time. In a perfect world, when your code is wrong, it fails every time. Hard and fast errors, not silent, latent, and long-running ones. Knowing that there is a bug is the first step to fixing it. The problem here is that that bool has no value assigned to it. It is NOT automatically initialized to false (or true). Its value is whatever garbage happened to be in memory at the time.

The valgrind output for the example is of the form:

==2364== Conditional jump or move depends on uninitialized value(s)
==2364==    at 0x400916: main (test.cpp:106)

Notice: This tells us why the code exhibits undefined behavior, not just where. What's more, Valgrind catches it even if the undefined behavior wouldn't cause your program to crash.

I doubt something quite so obvious as the above example is written often, but it'd be much harder to see this mistake in code of the form:

bool condition;
if (foo) {
  condition = true;
}
if (bar) {
  condition = false;
}
if (baz) {
  condition = true;
}
if (condition) {
  //Do the thing
}

Here we initialize properly some of the time... but not all of the time. Valgrind still catches it if you have a test that exhibits the undefined behavior.

For what it's worth, you can use defensive coding practices to avoid this type of bug in the first place. Prefer to always initialize your variables with a value. Use the auto keyword to require that you do so (you cannot deduce a type without a value to deduce it from). Take a look at the articles on auto on Herb Sutter's blog to find out more.
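
A minimal illustration of the difference:

bool condition1;         //dangerous: holds garbage until assigned
bool condition2 = false; //better: explicitly initialized
auto condition3 = false; //best: auto won't even compile without an initializer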

2) Accessing memory you shouldn't: touching memory that was never allocated, memory that's been freed, memory past the end of an allocation (so, off-by-one errors), and inaccessible parts of the stack.

An example:

  vector<int> v { 1, 2, 3, 4, 5 };
  v[5] = 0; //Oops

Do you see it?

If I run this code normally on my computer, it actually seems to run just fine. No crashes over 20 runs... but it's definitely wrong. Even if I did manage to have it open in GDB (another debugging tool) when it crashed, the best I'd get is a stack trace, and it might not be where the problem was caused, but rather, where it manifested, at the symptom, if you will.

Here's the corresponding Valgrind output:

==2710== Invalid write of size 4
==2710==    at 0x400961: foo() (test.cpp:85)
==2710==    by 0x4009A2: main (test.cpp:89)
==2710==  Address 0x5a1d054 is 0 bytes after a block of size 20 alloc'd
==2710==    at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2710==    by 0x400EDF: __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) (new_allocator.h:104)
==2710==    by 0x400DCE: std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) (in /home/mark/test/a.out)
==2710==    by 0x400C5F: void std::vector<int, std::allocator<int> >::_M_range_initialize<int const*>(int const*, int const*, std::forward_iterator_tag) (stl_vector.h:1201)
==2710==    by 0x400AF4: std::vector<int, std::allocator<int> >::vector(std::initializer_list<int>, std::allocator<int> const&) (stl_vector.h:368)
==2710==    by 0x400943: foo() (test.cpp:84)
==2710==    by 0x4009A2: main (test.cpp:89)

That's a little unwieldy if you're not used to looking at stack traces through the STL. Let's break it down.

The first line tells you why your code exhibited undefined behavior. There was an "Invalid write of size 4". Size 4 means I wrote something 4 bytes big. On my machine, that's probably an int. Invalid write means that I touched memory I shouldn't have. As it happens, this was an off-by-one error: I wrote past the end of my vector.

Now let's look at the 2nd and 3rd lines. These are Valgrind's best guess at the part of the stack trace that you care about. Indeed, in my case, foo is where the troubled code was, and main is the function that called foo.

The 4th line is more detail on the matter of "you ran off the end of the memory you were using".

And the rest is a more detailed stack trace that includes the STL. For what it's worth, the problem is never in the STL (ok, almost never).

3) Misuse of std::memcpy and functions that build on top of it, whereby your source and destination arrays overlap (be sure to read my article about why std::memcpy is obsolete, then remember that you'll still invoke it under the hood of a better abstraction).

I'm not including an example for this error type or the next; I don't think they're especially common in modern code, and if you do run into them, running Valgrind normally, without arguments, will expose both types of problems.

4) Invalid freeing of memory (minimal in modern code where you should be using smart pointers anyway)

5) Data races:

If I run:

  auto x = 0;
  thread([&] {
    ++x;
  }).detach();
  ++x;

with:

valgrind --tool=helgrind ./myProgram

I get some useful information:

==2872== Possible data race during read of size 4 at 0xFFEFFFE8C by thread #1
==2872== Locks held: none
==2872==    at 0x401081: main (test.cpp:96)
==2872== 
==2872== This conflicts with a previous write of size 4 by thread #2
==2872== Locks held: none
==2872==    at 0x40103A: main::{lambda()#1}::operator()() const (test.cpp:94)
==2872==    by 0x401F2D: void std::_Bind_simple<main::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>) (functional:1732)
==2872==    by 0x401E84: std::_Bind_simple<main::{lambda()#1} ()>::operator()() (functional:1720)
==2872==    by 0x401E1D: std::thread::_Impl<std::_Bind_simple<main::{lambda()#1} ()> >::_M_run() (thread:115)
==2872==    by 0x4EEEBEF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19)
==2872==    by 0x4C30E26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==2872==    by 0x535F181: start_thread (pthread_create.c:312)
==2872==    by 0x566FEFC: clone (clone.S:111)

It tells me that I'm not protecting my data properly. I'm sharing data without synchronizing with a mutex. Bam.

I should mention that although this did find the bug in the code, it also included a ton of false positives on the std::shared_ptr used internally by std::thread. It seems they need to do a bit more work on that front. You could probably write a simple D or Python script to scrape helgrind output for only the useful bits.

6) And yeah... it finds leaks, if you're still not using smart pointers.

Run:

valgrind --leak-check=full ./myProgram

(If you forget that flag, just run valgrind normally once; it'll remind you in the text in the summary area)

On:

auto x = new int(5);

And you'll see:

==2881== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1
==2881==    at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2881==    by 0x400966: main (test.cpp:92)

Valgrind as a function and memory profiler:

In addition to being able to tell you where you've introduced bugs into your program, Valgrind can also help you optimize. Too often people assume that they know what's eating up their runtime or what their big memory problems are... and they're wrong. Use your time wisely: measure!

Run your program with:

valgrind --tool=callgrind ./myProgram

And it'll spit out a file in the same directory whose name is something like callgrind.out.2887. Download the program KCachegrind to get a GUI visualization of the flow of your program, what functions are eating up your runtime, and generally, a better understanding of where to focus your efforts.
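
Assuming the file name from above, opening it is as simple as:

kcachegrind callgrind.out.2887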

Here's what some of the simplest output looks like, showing the runtime cost of each function in terms of wall time, call count, and percentage of total runtime. You can Google for some of the more interesting graphs/flow diagrams it generates.

Similarly, I can evaluate where I'm allocating the most memory by running with --tool=massif. This is often useful for leak checking as well, as larger parts of your memory footprint may be indicative of leaks.
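
Massif likewise writes a massif.out.<pid> file; Valgrind ships with the ms_print tool to render it as a text report:

valgrind --tool=massif ./myProgram
ms_print massif.out.<pid>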

Conclusions:

Valgrind is much more than a leak checking tool. Change your perspective: Valgrind is an undefined behavior killer.

Valgrind should be your tool of first resort. It not only tells you where your bugs are happening, but why, and it'll tell you this even if your program doesn't crash (unlike GDB on both counts). For what it's worth, GDB is still a very useful tool for getting full stack traces on failed assertions and for debugging concurrent code, among other things.

You may also find it useful to always compile with -pedantic -Wall -Wextra. Your compiler is often smart enough to flag undefined behavior as well. What the compiler misses, Valgrind should catch.

If this interests you, you may want to take a look at some other tools that perform similar duties, often with less of a runtime hit:
Address Sanitizer for clang and g++
Undefined Behavior Sanitizer for clang and g++
Memory Sanitizer for clang
Thread Sanitizer for clang

Tech Talk: Modern Maintainable Code


Last week I did a tech talk on Modern Maintainable Code. Specifically, I talked about the importance of self-documentation and techniques to make it clear what your code means, not just what it does. I also spoke about abstracting code organization, and did a quick tutorial on auto, move semantics, and smart pointers, with a focus on why they're not just convenient: they also make it harder for you to write certain classes of bugs.

The first 23 minutes of the talk (through abstracting code organization) is material that I expect will be valuable to all programmers in any domain and any language; your mileage may vary on the remaining sections depending on how actively you've been reading about those topics, though I guarantee I've thrown in some goodies that most people haven't seen yet.

Modern Maintainable Code

Slides are available here.
Errata have been noted in the presenter comments section. In particular, the decltype slides were no longer relevant (I removed an example), and I needed to call .get() on the unique_ptrs for my 3-line MyIntVector copy constructor. make_unique is also not available until C++14, but you can get an implementation that will compile in C++11 if you Google for it.

memcpy, memmove, and memset are obsolete!


Summary:

While they won't produce incorrect results when used appropriately, the mem* family of functions is obsolete. In particular, std::copy, std::move, std::fill, and std::equal[1] provide type-safe, more general, and equally efficient interfaces to perform the same tasks as std::memcpy, std::memmove, std::memset, std::memcmp, and their wide variants.

This article explores why the standard algorithms are superior and covers one pseudo-exception to the rule.

Background: What are these functions?

For those who don't know, the mem* family of functions is basically a way to take advantage of special hardware instructions to quickly copy, set, or compare a range of values. Here are their signatures:

void* memcpy(void* dest, const void* src, std::size_t count);
void* memmove(void* dest, const void* src, std::size_t count);
void* memset(void* dest, int value, std::size_t count);
int memcmp(const void* lhs, const void* rhs, std::size_t count);

std::memset is probably the easiest to explain:

const int arraySize = 20;
int a[arraySize];
std::memset(a, 0, sizeof(int) * arraySize);

The net result of running this code is that all of the elements of the array 'a' are initialized to 0. We have 'blasted the bytes' of everything from the first byte of that array to the last with the value '0'. memset sets each individual byte to our specified value (0).

The motivation? If we don't call memset here, the array 'a' holds garbage values - we're not sure what it contains.

The phrase 'blasting bytes' will show up several times in this article. It refers to using special, blazingly fast hardware instructions that operate over multiple pieces of data, treating the data as nothing more than raw bytes. This is actually the appeal of the mem* functions: speed.

std::memcpy 'blasts bytes' from one location to another, copying the data. std::memmove does the same thing as memcpy, but can be used even when the source and destination ranges overlap (say, within the same array). std::memcmp quickly checks whether two arrays contain the same byte content.
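
For example, here's a minimal sketch of shifting elements within a single array, where the source and destination ranges overlap (so std::memcpy would be undefined behavior, but std::memmove is fine):

int a[5] = {1, 2, 3, 4, 5};
std::memmove(a + 1, a, 4 * sizeof(int)); //a is now {1, 1, 2, 3, 4}
//std::memcpy(a + 1, a, 4 * sizeof(int)); //undefined behavior: the ranges overlap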

OK - now that we understand what the mem* functions are, let's explore why the standard algorithms supersede these functions we inherited from C:

Reason #1: The standard algorithms are type-safe

For the remainder of the article, we will restrict our discussion to memset, but our arguments apply to all of the functions.

Memset has an interesting function signature. We pass it a 'void*' as the destination - it will happily overwrite the bytes of anything. If the values you put in your destination don't fit the domain or don't make sense, oh well; it overwrites them anyway. In this manner, std::memset is as dangerous as a reinterpret_cast.

Meet std::fill and std::fill_n:

template<class ForwardIt, class T>
void fill(ForwardIt first, ForwardIt last, const T& value);
template<class OutputIt, class Size, class T>
OutputIt fill_n(OutputIt first, Size count, const T& value);

We'll talk about std::fill_n rather than std::fill, since it more closely resembles std::memset's signature, but std::fill and std::fill_n do roughly the same thing. We give it an iterator to start at, we tell it how many elements we will visit, and we pass it a value.

Some thoughts:
1) We can still pass in pointers, just like before. A pointer actually satisfies RandomAccessIterator, the most capable category of iterator.
2) 'count' refers to the number of elements, not the number of bytes. We've raised the level of abstraction! How many ints are in my array 'a'? How many strings are in my vector? The iterator's ++ operation will get us from one element to the next safely, thanks to the type information.
3) Because we're working with elements, not bytes, we pass in a value of type T. As long as you can assign something of type T to the dereferenced value of the 'first' iterator, this will compile and work.

We get type safety because of the second and third thoughts: We can only access one 'element' of our range at a time, and we can only set its value to something of the appropriate type, not arbitrary bytes. We're not reinterpreting the bytes.
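
Rewriting the earlier memset example, here's how the same task looks with std::fill_n:

const int arraySize = 20;
int a[arraySize];
std::fill_n(a, arraySize, 0); //count is in elements, and 0 is an int value, not a raw byte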

Reason #2: The standard algorithms are more general

We can do more with std::fill's function signature.

1) We're not limited to pointers anymore; we can do a std::fill on a linked list, and the call site will look exactly the same as a call to do the same thing on a vector or an array (see the sketch at the end of this list). That's awesome! Less mental overhead.

2) We're not limited to setting every byte to the same value anymore; we can set values of a range equal to the value of an entire struct:

struct GroceryItem {
  string name;
  double price;
};

void tomatoesEverywhere(vector<GroceryItem>& inventory) {
  GroceryItem tomatoes { "tomatoes", 2.0 };
  
  //Everything is a tomato now! (Muahahahaha)
  std::fill(begin(inventory), end(inventory), tomatoes);
}

Both the string and the double get copied into each of inventory's elements, safely.

3) We're not limited in what types we can set or copy. If you come from std::memset land, you may now be wondering: "Hey, wait! You can't call std::memset on a std::string!" and you'd be right! You can't. The reason you can't std::memcpy one string into another is that memcpy simply 'blasts the bytes' with reckless disregard for what they represent. std::string has a pointer to the memory it allocated; if you std::memcpy it, you copy that exact pointer address too. Now you have two strings that both think they own that memory, and your program will eventually crash.

std::fill is more general; it works safely with any type that supports copying.
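
And here's the sketch promised in the first point - identical call sites for contiguous and linked storage:

vector<int> v(10);
list<int> l(10);
std::fill(begin(v), end(v), 42); //contiguous storage
std::fill(begin(l), end(l), 42); //linked nodes; same call, same result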

Reason #3: The standard algorithms are equally efficient

Here's the real killer - you might be skeptical after reading the last paragraph from reason 2 (because std::fill handles a more general case), but std::fill and std::memset are actually equally efficient.

At compile time, the compiler can determine whether or not a type is what the standard calls 'trivially copyable'. In other words, the compiler can decide if it would be safe to simply 'blast the bytes' to make a copy, or if, like std::string, the class defines a non-trivial copy or move operation and a semantic (rather than bytewise) copy is necessary. There are a few other requirements as well, but there are standard type traits for all of them.

Given the ability to make that distinction at compile-time, the compiler can choose to do one of two things via overloading:
1) If it's safe to 'blast the bytes' - go ahead and do so. The code can invoke std::memset as an implementation detail of std::fill.
2) If it's not safe to 'blast the bytes', the compiler can instead loop over our range and invoke the type's copy assignment, element by element.

This means that std::fill is just as fast as std::memset in all the cases that std::memset is good for, but it will also work in cases where std::memset can't.
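
Here's a minimal sketch of how that dispatch can work - a hypothetical myCopyN, not the standard library's actual code - selecting an implementation via overloading on a type trait (requires <cstring> and <type_traits>):

//Trivially copyable: safe to blast the bytes.
template<typename T>
T* copyImpl(const T* src, std::size_t count, T* dest, std::true_type) {
  std::memcpy(dest, src, count * sizeof(T));
  return dest + count;
}

//Not trivially copyable: loop and invoke the type's copy assignment.
template<typename T>
T* copyImpl(const T* src, std::size_t count, T* dest, std::false_type) {
  for (std::size_t i = 0; i < count; ++i) {
    dest[i] = src[i];
  }
  return dest + count;
}

template<typename T>
T* myCopyN(const T* src, std::size_t count, T* dest) {
  return copyImpl(src, count, dest, std::is_trivially_copyable<T>{});
}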

For the non-believers, I whipped up an implementation of std::copy that is optimized with std::memcpy whenever it is safe to do so:
Check it out, though be warned: it's a tough read.

You might still be wondering whether your standard library implementers actually make this optimization. I'm here to tell you that they, in fact, do. In particular, I learned this by watching a video by Stephan T. Lavavej, Microsoft's standard library maintainer. And here's proof that this optimization is shipping to you in g++.

Lastly, a pseudo-exception:

While the standard algorithms are best suited in most cases, there's one situation where the mem* family is your only option, and you'll find it littered throughout C code.

If you've got a struct, say an 'addrinfo' from the UNIX sockets API, and you need/want to zero it out before you start working with it, it's common practice to just call:

addrinfo info;
memset(&info, 0, sizeof(addrinfo));

This zeros out whatever happens to be in the struct. You don't need to know how big it is or what's inside; this zeros everything. std::fill is insufficient here, because we would need an element with the right value to copy into this, and we don't have one.

The key idea here: C doesn't have constructors, which are what should really be zeroing these things out, so instead programmers hack around the language and have to remember to use memset so that struct internals aren't filled with garbage. Yay. *crying*
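
For what it's worth, in C++ the same zeroing falls out of value-initialization, at least for aggregates like this one:

addrinfo info{}; //value-initialization: every member is zeroed, no memset required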

Conclusions:

Leave the mem* family of functions in the past. They're relics. Stick with the standard algorithms. They work in more cases, they're type-safe, and they are just as fast whenever it'd be possible to call the equivalent mem* function.

The only 'reasonable' case to fall back on the old functions is when you're working with nasty code from the past that doesn't present a clean interface.

Footnotes:
[1] std::memcmp is actually most similar to std::lexicographical_compare, in that its return value indicates not only whether the two inputs are equal, but also, when they differ, which one is lexicographically first. I'm leaving std::equal at the top because it is more likely to see common use.