Thoughts On C++

Austin Hicks

2014-06-22 11:45

I've been refactoring Libaudioverse for the last little while. Specifically, I've been making it C++11. The last time I looked at it, C++11 was a theoretically nice thing as far as I was concerned: if you were anywhere but Windows, you could use it no problem. Otherwise, hello Boost and really long compile times. VC++ 2013 changed that. The Libaudioverse refactor will be finished shortly (I've got maybe 4 more hours of work). C++11 is solely responsible for saving me a lot of headaches, both now and in the future, and it has changed my opinion of the language. Here's what got me to go from "this is revolting" to "this isn't really my favorite, but it gets the job done when speed really does count". I spent a great deal of time looking at alternatives for projects that need native code. As far as I can tell, C++ has a huge number of libraries and is the only option that runs on everything.

First-Class Functions

The first feature that changed my mind is first-class functions. For those who don't know, a language with first-class functions lets you treat them like values: you can assign them to variables and you can return them from other functions. Until now, C++ had function pointers only. Function pointers are barely, barely first-class functions. As I see it, there were two problems with them. The first is that no extra information is captured: you can't easily bind parameters or do higher-order stuff. The second is that you can't use anything "function-like", only functions themselves. To partially overcome this, C++ introduced pointers to members, an archaic, poorly supported feature of the language that almost no one uses. Somehow or other, Boost pulled a rabbit out of its hat and managed to write libraries to help with this, but the cost is long compile times and large executables (and those libraries had limits, anyway).

C++11 fixes the first problem with lambdas. Think of a lambda as a function literal. You can assign lambdas to variables. Most excitingly, lambdas have "capture specifiers". If your lambda wants, it can capture variables from the enclosing scope, including the this pointer. It is up to you whether these are captured by copy or by reference. If you capture by copy, the lambda can safely outlive the enclosing scope; that is, you can safely return it from a function and expect it not to explode. By reference is more dangerous. If the captured variables go away before the last time the lambda gets called, the program will probably crash. But capture by reference lets you do very nice things. The new Libaudioverse graph planner uses them heavily to accumulate a list of objects that need processing: there is a function that can call a lambda on everything that needs processing in the right order, so we capture a vector by reference and append to it. If you need references that outlive the enclosing scope, then know that you can capture smart pointers by copy. Also, lambdas that don't capture anything can convert to function pointers for interfacing with older C++ or C code.
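As a rough sketch of those capture rules, here are the three cases side by side. The names makeCounter, collectEvens and applyTwice are made up for illustration and are not part of Libaudioverse.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Capture by copy: the returned lambda owns its own copy of `step` and a
// copy of the smart pointer, so it stays valid after makeCounter returns.
std::function<int()> makeCounter(int step) {
    auto total = std::make_shared<int>(0);
    return [step, total]() {
        *total += step;
        return *total;
    };
}

// Capture by reference: safe here only because `out` outlives every call.
std::vector<int> collectEvens(const std::vector<int>& input) {
    std::vector<int> out;
    auto visit = [&out](int value) {
        if (value % 2 == 0) out.push_back(value);
    };
    for (int value : input) visit(value);
    return out;
}

// A C-style interface that only accepts a plain function pointer.
int applyTwice(int (*fn)(int), int x) {
    assert(fn != nullptr);
    return fn(fn(x));
}

// A lambda with an empty capture list converts to that function pointer.
int addTwo(int x) {
    return applyTwice([](int v) { return v + 1; }, x);
}
```

Calling makeCounter(5) and invoking the result twice yields 5 and then 10, because both calls share the counter held by the copied shared_ptr.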

Here's an illustrative example from Libaudioverse. process_order is a class member, and [this] is a capture specifier that says to capture the this pointer:

// Okay, we're not paused. Visit all objects in the order that they would be processed, and record it.
process_order.clear();
visitAllObjectsInProcessOrder([this] (LavObject* o) {
    process_order.push_back(o);
});

And visitAllObjectsInProcessOrder is declared as:

void LavDevice::visitAllObjectsInProcessOrder(std::function<void(LavObject*)> visitor) {
//implementation...
}

Behind the scenes, lambdas are classes that your compiler generates. The members of the class are the captured variables and the lambda is called through an overload of operator(). Each lambda is technically a different type. That's where std::function comes in. This fixes the second problem: why should we care about what type something is, so long as it "behaves like a function"? Specifically:

std::function<void(void)> x = foo;
x();

This will call foo, so long as foo takes no parameters and returns no value. Here foo can be anything with operator() of the right prototype. This includes function pointers, lambdas, and classes you write that overload operator(). You can write functions that take and return std::function of various prototypes without being concerned what flavour of function it is, store heterogeneous function-like things in vectors and maps, and generally do all sorts of fun stuff with them like event handling.
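To make the "heterogeneous function-like things" point concrete, here is a small sketch that stores a plain function, a hand-written functor and a capturing lambda in one vector. All names (fromFunction, Functor, callAll) are invented for this example.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// A plain function.
std::string fromFunction() { return "function"; }

// A hand-written class that overloads operator().
struct Functor {
    std::string operator()() const { return "functor"; }
};

// Store three different kinds of callable behind one type, then call each.
std::vector<std::string> callAll() {
    std::string tag = "lambda";
    std::vector<std::function<std::string()>> handlers;
    handlers.push_back(fromFunction);             // function pointer
    handlers.push_back(Functor());                // class with operator()
    handlers.push_back([tag]() { return tag; });  // capturing lambda

    std::vector<std::string> results;
    for (const auto& handler : handlers) results.push_back(handler());
    return results;
}

// Checks that each callable ran, in insertion order.
void demoCallAll() {
    const std::vector<std::string> expected = {"function", "functor", "lambda"};
    assert(callAll() == expected);
}
```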

Why is this important? Well, the two tools above give one literally all the required pieces to write something very like Twisted. Deferreds, the whole nine yards. With little effort, one could get as far as having deferreds that work on void(void) functions. With variadic templates, the new piece of the language that makes std::function and most of the other improvements to the standard library possible, you can probably go all the way and almost be Pythonic. I'm not going to even begin to pretend to understand variadic templates, and I suggest that they be avoided: they're a tool for library developers, not for everyday programming tasks. Regardless, one thing that can be done (and which I am going to be doing shortly for Libaudioverse) with just std::function and lambdas is multithreaded worker queues: give the queue std::functions, and it will execute them on a background thread "at some point in the future".
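A minimal sketch of such a worker queue follows, assuming a single background thread and a hypothetical class named WorkerQueue. It is an illustration of the idea, not the Libaudioverse implementation, and on some toolchains it needs to be linked with thread support (for example -pthread).

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

class WorkerQueue {
public:
    WorkerQueue() : stopping(false), worker([this] { run(); }) {}

    // Finish every queued task, then stop the background thread.
    ~WorkerQueue() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            stopping = true;
        }
        wake.notify_one();
        worker.join();
    }

    // Queue a task to run "at some point in the future".
    void submit(std::function<void()> task) {
        assert(task != nullptr);
        {
            std::lock_guard<std::mutex> lock(mtx);
            tasks.push(std::move(task));
        }
        wake.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mtx);
                wake.wait(lock, [this] { return stopping || !tasks.empty(); });
                if (tasks.empty()) return;  // only reached when stopping
                task = std::move(tasks.front());
                tasks.pop();
            }
            task();  // run outside the lock so submit() is never blocked by a task
        }
    }

    std::mutex mtx;
    std::condition_variable wake;
    std::queue<std::function<void()>> tasks;
    bool stopping;
    std::thread worker;  // declared last so the other members exist before it starts
};

// Adds 1..n on the worker thread; the destructor joins before we read total.
int sumOnWorker(int n) {
    int total = 0;
    {
        WorkerQueue queue;
        for (int i = 1; i <= n; ++i) queue.submit([&total, i] { total += i; });
    }
    return total;
}
```

Capturing total by reference is safe in sumOnWorker only because the queue is destroyed, and its thread joined, before total goes out of scope.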

Smart Pointers

Lambdas were icing on the cake when I moved Libaudioverse over to C++. The reason I did it in the first place was solely for smart pointers. Here are two facts about C++ that you may or may not know:

You can overload the unary operator*.
You can overload the arrow operator ->.

That means you can make a type that "acts like a pointer", including sharing the same syntax.

Suppose that your constructor takes a pointer, creates a record somewhere that says "this pointer exists and there is one thing owning it", and then saves that pointer so that the overloaded operators can return what it points to. Suppose that there is a copy constructor that takes some other copy of this mythical object, finds its pointer record, copies off the pointer, and increments the ownership count. Finally, suppose that the destructor decrements the ownership count of this shared record and, if it goes to 0, deletes the pointer and the record that talks about the pointer. If you do these three things, you have implemented reference counting, the strategy used by CPython for most of its garbage collection.
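The three steps above can be sketched as a small single-threaded class. CountedPtr and Tracked are hypothetical names; the sketch leaves out assignment, thread safety and everything else that makes the real thing hard.

```cpp
#include <cassert>
#include <cstddef>

template <typename T>
class CountedPtr {
public:
    // Step 1: take a raw pointer and create a record with one owner.
    explicit CountedPtr(T* raw) : ptr(raw), count(new std::size_t(1)) {}

    // Step 2: copying shares the pointer and bumps the ownership count.
    CountedPtr(const CountedPtr& other) : ptr(other.ptr), count(other.count) {
        ++*count;
    }

    // Assignment is left out to keep the sketch short.
    CountedPtr& operator=(const CountedPtr&) = delete;

    // Step 3: the last owner to go away deletes the object and the record.
    ~CountedPtr() {
        if (--*count == 0) {
            delete ptr;
            delete count;
        }
    }

    // The overloads that make this type "act like a pointer".
    T& operator*() const {
        assert(ptr != nullptr);
        return *ptr;
    }
    T* operator->() const { return ptr; }

    std::size_t owners() const { return *count; }

private:
    T* ptr;
    std::size_t* count;
};

// Test helper: records when it has been destroyed.
struct Tracked {
    explicit Tracked(bool* flag) : destroyed(flag) {}
    ~Tracked() { *destroyed = true; }
    int value = 7;
    bool* destroyed;
};
```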

This is now wrapped up in the standard library as a template: std::shared_ptr. A shared_ptr takes ownership of a raw pointer and manages it for you. You can make multiple shared_ptrs that point at the same pointer and the managed pointer will remain allocated until the last one goes away. Combine this with std::weak_ptr for breaking cycles or implementing caches, and you come very close to something that is garbage collection.
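Here is a short sketch of the standard versions, using a made-up Node type with an owning link in one direction and a weak_ptr in the other so that the parent/child cycle does not keep both objects alive forever.

```cpp
#include <cassert>
#include <memory>

struct Node {
    std::shared_ptr<Node> child;  // owning link
    std::weak_ptr<Node> parent;   // non-owning back link, breaks the cycle
};

// Returns a weak_ptr observing a parent whose only owner has gone away.
std::weak_ptr<Node> buildAndRelease() {
    auto parent = std::make_shared<Node>();
    parent->child = std::make_shared<Node>();
    parent->child->parent = parent;

    // The weak back link does not count as an owner.
    assert(parent.use_count() == 1);

    // Converts to weak_ptr; the shared_ptr itself dies when we return,
    // which frees the parent and, through it, the child.
    return parent;
}

// Two shared_ptrs to one object share a single ownership count.
long ownersAfterCopy() {
    auto first = std::make_shared<int>(3);
    auto second = first;
    return first.use_count();
}
```

Had parent been a shared_ptr instead of a weak_ptr, the two nodes would own each other and neither would ever be freed.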

It's fairly simple. Boost implemented it before it was adopted into C++11, and most C++ programmers could do something similar for single-threaded applications in an hour or two. It's hard to get it fast and threadsafe, however. And now we don't have to: it's in the standard library.

Let's digress onto Libaudioverse for a moment. The current state of Libaudioverse's memory management is best summarized as the Titanic after it hit the iceberg. In C, you have to manually call malloc and free all over the place, including when errors happen. My solution to this was a bunch of "magic" macros. For simple projects, the magic macros would have come close to doing it. But Libaudioverse's heart is the ability to connect bunches of objects in arbitrary ways. Freeing things becomes a moderately complex algorithm: determine all the places where this object connects and either break them or not. More damningly, it may sometimes not be possible to free an object. If I were only going to allow Libaudioverse to work from languages without garbage collectors, I could move this burden onto the programmer. But I want to use it from Python, among others. This means that not only can't I move this responsibility onto the programmer using my library, I don't even get to know exactly when the garbage collector is going to get around to freeing the object. For this reason, the public freeing function may never fail, so long as it's passed a valid handle in the first place.

But suppose I hide these objects behind smart pointers and represent all the connections between them with smart pointers. When an object gets deleted, all I do is remove the user-facing smart pointer that I keep around to prevent the object from dying if it's completely disconnected. When the object is no longer used, it will automatically be freed. It gets even better than that. If I use smart pointers for everything else, I almost don't have to write destructors: huge arrays and data sets just go away as they need to. If something throws an exception, all the things I've allocated and wrapped up in smart pointers clean themselves up appropriately.
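The exception case can be sketched like this. Buffer and processOrThrow are hypothetical stand-ins, and the live counter exists only so the cleanup can be observed.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <vector>

// A stand-in for a large allocation; it counts how many instances are alive.
struct Buffer {
    explicit Buffer(int* live) : liveCount(live) { ++*liveCount; }
    ~Buffer() { --*liveCount; }
    std::vector<float> samples = std::vector<float>(1024);
    int* liveCount;
};

// Allocates two buffers, then fails. There is no cleanup code anywhere:
// both smart pointers release their buffers during stack unwinding.
void processOrThrow(int* live) {
    auto shared = std::make_shared<Buffer>(live);
    std::unique_ptr<Buffer> unique(new Buffer(live));  // make_unique is C++14
    assert(*live == 2);
    throw std::runtime_error("processing failed");
}

// Returns the number of buffers still alive after the failure.
int liveAfterFailure() {
    int live = 0;
    try {
        processOrThrow(&live);
    } catch (const std::runtime_error&) {
        // Nothing to free here either.
    }
    return live;
}
```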

It is technically possible to do this in C, but there has to be a line drawn somewhere. I drew it at what would have probably been 200 lines of pure preprocessor macros. This trick can only work well in C++: you need destructors that are called with help from the compiler. For a nice interface, you also need templates. And as usual, you can shoot yourself in the foot with smart pointers. If you don't follow the smart pointer guidelines (basically, once you've created a smart pointer you should throw out the raw pointer you made it from), you will have crashes as things try to be freed twice. Even so, it's the lesser of many evils, especially since something like the Boehm collector stops the world. Libaudioverse is a real-time, multithreaded audio library. The world may not stop. If I was in a position where the world could stop, I'd use Python and save myself so much trouble.

The New Containers

C++11 adds several new containers to the standard library; three stand out. std::unordered_map and std::unordered_set are hash tables, which until now were nonstandard. std::tuple is exactly what it sounds like. Coupled with std::tie plus a heavy dose of magic you don't have to understand or work with, it even has convenient syntax for tuple unpacking.
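A small sketch of the three together: a hash table for counting, a tuple for returning two values, and std::tie for unpacking them. The function mostCommon is invented for this example.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <unordered_map>
#include <vector>

// Returns the most frequent word and how often it occurred.
// If several words tie, which one wins is unspecified.
std::tuple<std::string, int> mostCommon(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;  // hash table
    for (const auto& word : words) ++counts[word];

    std::string best;
    int bestCount = 0;
    for (const auto& entry : counts) {
        if (entry.second > bestCount) {
            best = entry.first;
            bestCount = entry.second;
        }
    }
    return std::make_tuple(best, bestCount);
}

// Tuple unpacking with std::tie, much like `word, count = f()` in Python.
void demoUnpacking() {
    std::string word;
    int count = 0;
    std::tie(word, count) = mostCommon({"a", "b", "a", "c", "a"});
    assert(word == "a" && count == 3);
}
```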

Combine these with the new range-based for loops, the algorithms already built into the standard library, and lambdas. Out comes what is basically Python's built-in container support, albeit with more verbose syntax. The only thing you can't do is slicing (I am discounting valarrays because no one seems to actually use them and they're limited to numbers).
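For a feel of how that combination reads, here is a sketch of two things that would be one-liners in Python. Both function names are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Roughly `sorted(x * x for x in values if x % 2 != 0)` in Python.
std::vector<int> sortedOddSquares(const std::vector<int>& values) {
    std::vector<int> result;
    for (int value : values) {  // range-based for loop
        if (value % 2 != 0) result.push_back(value * value);
    }
    std::sort(result.begin(), result.end());
    return result;
}

// Roughly `sum(1 for v in values if v > threshold)` in Python.
int countAbove(const std::vector<int>& values, int threshold) {
    return static_cast<int>(std::count_if(
        values.begin(), values.end(),
        [threshold](int v) { return v > threshold; }));  // lambda as predicate
}

// Checks both helpers on small inputs.
void demoContainers() {
    const std::vector<int> expected = {1, 9, 25};
    assert(sortedOddSquares({5, 2, 3, 1}) == expected);
    assert(countAbove({1, 5, 10}, 4) == 2);
}
```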

I don't think this requires further explanation: you've got hash tables, a convenient way to work with multiple return values, and a convenient way to use all those fancy algorithms. Nothing more need be said.

Conclusion

I'm not talking about the bad parts of C++, but would like to remind everyone that they exist. There are probably ten ways to shoot yourself in the foot per line. There are huge oceans of "undefined behavior". There are architecture-specific issues hiding at the bottom of your integers and floats, and let's not forget about the subtle problems that using namespace std; can sometimes introduce.

The contrast between C++11 and C++ as it used to stand is astonishing. It doesn't look like C++11 adds much when you just read a feature list. When I wrote Camlorn_audio, I had access to some of these features through Visual Studio 2012, but not all of them. Consequently, I used Boost a lot, suffered long compile times, and all that. C++11 wasn't common on other platforms, and most certainly not on phones. I avoided it.

Then I started refactoring Libaudioverse. Most of it was routine, vanilla pre-C++11 C++. It isn't now. I found out about it as I went and made good use of the above stuff plus a bunch of small things (auto, which figures out variable types for you, is amazing). Then I rewrote the graph processing code which figures out what order to process all the objects in. I cut its size in half and made it twice as powerful. It superficially resembles what I used to think of as C++. But after that, I realized something: C++ was rewritten when I wasn't looking.

These new features seem small taken individually. But consider. How long until we have a tutorial that doesn't teach new and delete, instead preferring to teach everything using shared_ptr and make_shared? Such a tutorial would give new C++ programmers a quick path to few memory leaks. What about one that puts off teaching begin() and end() until the very end? We've got range-based for loops after all and they're simple to use. We can even drop the function pointer now and just teach using std::function, which has all the functionality plus a bunch more. It's not C++11, it's more like C++ 2.0. And, if nothing else, it completely changes how I think about the language.

I wish it had a few more features and that we could get rid of the huge bodies of legacy code, but that won't happen. The alternatives to C++ work and work well, but you have to give up the ability to talk to half the programming libraries that exist in native code land. C++ is good at what I feel it's meant for: extremely fast, cross-platform code that does not need to do things like GUIs. There are better languages for prototyping algorithms, building web sites, etc. Do not use C++ for these. Use C++ when you need to do 50 to 100 128-point convolutions in real-time, implement fast network code with hefty algorithms behind it and nothing else (i.e. forwarding or queues that manage workers, but not necessarily the worker itself), or be able to use the library from 50 programming languages.

I used to love C++ because it was all I knew, but then I came to hate it because I learned Python and realized that higher level languages are simply much faster in terms of producing good code. But C++11 gives me enough of those pieces back that I can at least give it its place in my toolbox again.