
Poking holes in C++

12 Jan 2020

Most people who use modern C++ recommend references over pointers. The idea is that the compiler validates references, so you won’t have to check manually that they still point to something valid. At least, that’s what it “should” do.

Having a little experience with C++, I’ve always had the impression that the compiler is very limited in how much it can track across files, functions, scopes, and so on.

So would it be possible to trick the compiler into letting me use an invalid reference?

The Promise

References always point to valid values

Let’s start with something simple that the compiler should easily be able to detect:

#include <iostream>
#include <string>

using namespace std;

struct User {
    string& name;
};

int main() {
    string name = "";
    User user = User{name};
    cout << "First user: " << user.name << endl;

    User other_user = User{"Unnamed"};
    cout << "Second user: " << other_user.name << endl;

    return 0;
}

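Trying to compile this with g++ code.cpp fails. The compiler accepts the first reference, which binds to the named variable name, but rejects the second one: "Unnamed" would first have to be converted into a temporary string, and a non-const reference like string& cannot bind to a temporary.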
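The rule behind that error can be checked at compile time. Here is a minimal sketch, assuming std::is_convertible from <type_traits> is a fair stand-in for "can this expression initialize that reference"; the constant names are made up for illustration.

```cpp
#include <string>
#include <type_traits>

// A named string (an lvalue) can bind to string&.
constexpr bool lvalue_to_ref =
    std::is_convertible<std::string&, std::string&>::value;

// A temporary string (an rvalue) cannot bind to string&.
constexpr bool temporary_to_ref =
    std::is_convertible<std::string, std::string&>::value;

// A string literal would need a temporary string first, so it cannot bind either.
constexpr bool literal_to_ref =
    std::is_convertible<const char*, std::string&>::value;

// A const string& is allowed to bind to that temporary.
constexpr bool literal_to_const_ref =
    std::is_convertible<const char*, const std::string&>::value;

static_assert(lvalue_to_ref, "a named string binds to string&");
static_assert(!temporary_to_ref, "a temporary does not bind to string&");
static_assert(!literal_to_ref, "a literal does not bind to string&");
static_assert(literal_to_const_ref, "a literal binds to const string&");
```

If any of these assumptions were wrong, the file would simply fail to compile, which makes it a cheap way to double-check what the compiler is actually enforcing here.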

Looking good so far

A First Attempt

Let’s add a bit of indirection to see if the compiler can keep track of the reference across function boundaries:

#include <iostream>
#include <string>

using namespace std;

struct User {
    string& name;
};

string& get_name(User& user, string& default_name) {
  if (user.name.empty()) {
    return default_name;
  }
  return user.name;
}

int main() {
    string name = "";
    User user = User{name};
    cout << "First user: " << user.name << endl;

    User other_user = User{get_name(user, "Unnamed")};
    cout << "Second user: " << other_user.name << endl;

    return 0;
}

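Compiling this again with g++ code.cpp fails as well, so it looks like the compiler can detect this one too! The literal "Unnamed" would again need a temporary string, and that temporary cannot bind to the string& parameter default_name.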
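For contrast, here is a sketch of a call that the compiler does accept and that stays valid at runtime. It assumes the default lives in a named variable (fallback, a name invented for this example) that outlives every User referring to it.

```cpp
#include <cassert>
#include <string>

struct User {
    std::string& name;
};

std::string& get_name(User& user, std::string& default_name) {
    if (user.name.empty()) {
        return default_name;
    }
    return user.name;
}

// Hypothetical helper that exercises get_name with a named default.
void demo_named_default() {
    std::string name = "";
    User user{name};

    // Declared before other_user, so it is destroyed after other_user.
    std::string fallback = "Unnamed";
    User other_user{get_name(user, fallback)};

    // The empty name falls back to the named default, by reference.
    assert(&other_user.name == &fallback);

    // Once the user has a name, get_name returns that one instead.
    name = "Ada";
    assert(&get_name(user, fallback) == &name);
}
```

Nothing here is copied: other_user.name is just another way to reach fallback, which is fine only as long as fallback is still alive.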

But how far can it keep track of this? What if we add another level of indirection?

Another Attempt

Now let’s add a function that creates the value, but uses a different function to capture the reference:

#include <iostream>
#include <string>

using namespace std;

struct User {
    string& name;
};

string& get_name(User& user, string& default_name) {
  if (user.name.empty()) {
    return default_name;
  }
  return user.name;
}

User copy_with_defaults(User& user) {
  string default_name = "Unnamed";
  return User{get_name(user, default_name)};
}

int main() {
    string name = "";
    User user = User{name};
    cout << "First user: " << user.name << endl;

    User other_user = copy_with_defaults(user);
    cout << "Second user: " << other_user.name << endl;

    return 0;
}

Trying to compile this again with g++ code.cpp we get…no errors!

It looks like we’ve managed to fool the compiler into letting us use an invalid reference: other_user.name refers to default_name, a local variable that was destroyed when copy_with_defaults returned.

Let’s run the program with ./code. It should crash, since it tries to access an invalid reference.

Aaaand, it runs successfully! Without any issue.

That’s really strange. It tries to use an invalid reference, but it doesn’t crash! What’s going on?

Poking the hole

After a bit of digging and debugging on godbolt.org, I was still unable to pin down exactly why it wasn’t crashing. Judging by the generated code, the function does a bit of setup on the stack, and that part of the stack isn’t cleared immediately after it returns. That puts us in “undefined behavior” territory, and since the behavior is undefined, the program doesn’t necessarily have to crash.
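Whether or not the bytes are still sitting on the stack, the object itself is gone once the function returns. The sketch below shows that without touching a dangling reference, using a hypothetical Tracked type (not part of the original program) that counts how many instances are currently alive.

```cpp
#include <cassert>

// Hypothetical stand-in for std::string that counts live instances.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    Tracked(const Tracked&) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Plays the role of copy_with_defaults: creates a local and reports
// how many Tracked objects exist while the function body is running.
int live_inside_function() {
    Tracked default_name;
    return Tracked::live;  // the count is read before the local is destroyed
}

// Hypothetical helper comparing the count inside and after the call.
void demo_local_lifetime() {
    assert(Tracked::live == 0);
    int inside = live_inside_function();
    assert(inside == 1);         // alive while the function runs
    assert(Tracked::live == 0);  // destroyed by the time the caller continues
}
```

Any reference to default_name that escapes the function therefore points at an object whose destructor has already run, no matter what the stack memory happens to contain.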

For one more attempt, I compiled the same code with g++ -O2 code.cpp. Running it this time… it crashed!

Which suggests that it really is trying to access an invalid reference. Without any optimizations, it was still undefined behavior, but the memory behind the reference was likely still intact, so it wasn’t crashing. The behavior is exactly like that of a dangling pointer!

Safe C++

Although I managed to get the program to crash, in most cases references are still a much safer alternative to pointers.

Unfortunately, due to the nature of C++, I don’t think it’s possible to make them completely safe. But in the majority of cases, the compiler checks them well enough. That’s good enough for me, and I assume it’s good enough for most people using them right now.
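When the compiler can’t check a reference, one way to sidestep the problem is to not hold one at all. This is a minimal sketch, assuming it is acceptable for the struct to own a copy of the name; OwnedUser is a hypothetical variant of User, and copy_with_defaults is reworked to match it.

```cpp
#include <cassert>
#include <string>

// Hypothetical variant of User that owns its name instead of referring to it.
struct OwnedUser {
    std::string name;
};

OwnedUser copy_with_defaults(const OwnedUser& user) {
    std::string default_name = "Unnamed";
    if (user.name.empty()) {
        // The string is copied into the result, so nothing refers to
        // default_name after this function returns.
        return OwnedUser{default_name};
    }
    return OwnedUser{user.name};
}

// Hypothetical helper showing both branches.
void demo_owned_copy() {
    OwnedUser user{""};
    OwnedUser other_user = copy_with_defaults(user);
    assert(other_user.name == "Unnamed");

    OwnedUser named{"Ada"};
    assert(copy_with_defaults(named).name == "Ada");
}
```

The trade-off is an extra string copy, and changes to the original name no longer show up in the copy, so this only fits when sharing the same string was never the point.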

Now off to try some Rust, and see if I can poke holes there as well.