Data Scientists: This is not your grandmother's C++

2016.06.28

A quick note. I saw a few lines of antiquated C++ in a Python presentation, for the obligatory "look-how-C++-sucks" bit. If you like Python, good for you; I'm not here to bash it. But for the love of Hopper, stop freaking people out with outdated C++. Many (most?) high-performance machine learning libraries are written in modern C++, often with a Python interface. If you want to understand how hot new tools like TensorFlow or CNTK work, you'll learn a lot by diving into their C++ core.

The scary C++ examples follow a common theme: show how something simple becomes absurdly contrived in C++ compared to Python. Hey, let's create a vector (well, it'll be a list in Python) of sets and print the result in some format:

sets = [{1, 2, 3}, {4, 5, 6}, {42, 47, 15}]

for set in sets:
  print('{ ', end='')
  for i in set:
    print(str(i) + ' ', end='')
  print('}')

Clean and simple. Now the C++ version, Frankenstein's monster in code form. Look how ugly it is:

#include <iostream>
#include <set>
#include <vector>

int main() {
  using namespace std;

  vector<set<int> > sets;

  {
    set<int> tmp;
    tmp.insert(1);
    tmp.insert(2);
    tmp.insert(3);
    sets.push_back(tmp);

    tmp.clear();
    tmp.insert(4);
    tmp.insert(5);
    tmp.insert(6);
    sets.push_back(tmp);

    tmp.clear();
    tmp.insert(42);
    tmp.insert(47);
    tmp.insert(15);
    sets.push_back(tmp);
  }

  vector<set<int> >::const_iterator it = sets.begin();
  for (; it != sets.end(); ++it) {
    cout << "{ ";
    for (set<int>::const_iterator j = it->begin(); j != it->end(); ++j)
      cout << *j << ' ';
    cout << "}\n";
  }

  return 0;
}

Kill it! Kill it with fire! Everything from creating the vector of sets to looping over it is awful... Except that's C++98. Nobody in their right mind uses C++98 unless they're forced to. I'm not even sure this is good C++98 code; I haven't written against this standard in ages. It's easy to forget that not so long ago we couldn't write vector<set<int>> without putting a space between the two closing angle brackets, because >> was parsed as the right-shift operator. In modern C++, the code looks like this:

#include <iostream>
#include <set>
#include <vector>

auto main() -> int {
  using namespace std;

  auto const sets = vector<set<int>>{{1, 2, 3}, {4, 5, 6}, {42, 47, 15}};

  for (auto const& set : sets) {
    cout << "{ ";
    for (int i : set)
      cout << i << ' ';
    cout << "}\n";
  }

  return 0;
}

That's C++11 in action. I prefer this code to the Python version: it type-checks, compiles to efficient code, and gives you better control over memory (you choose between references and copies). You may prefer the Python version, fine, but the two are not that different. Plus, as Python and C++ follow similar paradigms (except for type checking), the things that tend to be annoying to write in C++, e.g. handling abstract syntax trees, are equally painful in Python. So stop what you're doing, grab a good book, and learn modern C++.
