Do Not Optimize Away
Compilers are sneaky beasts. If you time code like this:
var total: u32 = 0;
for (0..N) |i| total += i;
print("total={}", .{total});
You will discover that LLVM is as smart as a little kid named Gauss, and replaces the summation with an equivalent formula .
What’s more, if you write something more complicated like total += i + 2*i*i - i*i*i,
you’ll see that LLVM figures out a closed-form expression for that as
well (a generalization of the Gauss trick I proudly figured out in
11th grade). See for yourself:
https://godbolt.org/z/T9EcTb8zq
Usually, this kind of thing is desirable — code runs faster! Except when you are trying to benchmark your code, and instead end up benchmarking an elaborate no-op.
There are two pitfalls with benchmarking. First, in
const start = now();
_ = computation()
const elapsed = now() - start;
a reasonable compiler can notice that computation’s
result is not used, and optimize the entire computation away.
Second, in
const parameter_a = 1_000_000;
const parameter_b = 1_000;
const start = now();
_ = computation(parameter_a, parameter_b);
const elapsed = now() - start;
even if the computation is not elided as a whole, compiler can constant-fold parts of it, taking advantage of the fact that values of parameters are known at compile time.
Time To Be Killing The Dragon Again
Usually languages provide some sort of an explicit “please do not
optimize this away” function, like Rust’s hint::black_box or Zig’s mem.doNotOptimizeAway,
but they always felt like snake oil to me:
- Their semantics is tricky. It is sort of impossible to explain what exactly can and can not be optimized: the whole compilation pipeline is based on erasing everything about the original form of the code, maintaining only the semantics.
- There’s a simpler and more direct way to achieve the desired result. Just open the box and check if the cat is there!
It’s easier to explain via an example. Let’s say I am benchmarking binary search:
fn insertion_point(xs: []const u32, x: u32) usize { ... }
I would use the following benchmarking scaffold:
fn benchmark(arena: Allocator) !void {
const element_count =
try parameter("element_count", 1_000_000);
const search_count =
try parameter("search_count", 10_000);
const elements: []const u32 =
make_elements(arena, element_count);
const searches: []const u32 =
make_searches(arena, search_count);
const start = now();
var hash: u32 = 0;
for (searches) |key| {
hash +%= insertion_point(elements, key);
}
const elapsed = now().duration_since(start);
print("hash={}\n", .{hash});
print("elapsed={}\n", .{elapsed});
}
fn parameter(comptime name: []const u8, default: u64) !u64 {
const value = if (process.hasEnvVarConstant(name))
try process.parseEnvVarInt(name, u64, 10)
else
default;
print(name ++ "={}\n", .{value});
}
On the input side, the parameter function takes a
symbolic name and a default value. It looks up the value among the
environmental variables, with fallback. Because the value can be specified at runtime, compiler can’t optimize assuming
a particular constant. And you also get a convenient way to re-run
benchmark with a different set of parameters without recompiling.
On the output side, we compute an (extremely weak) “hash” of the results. For our binary search — just the sum of all the indexes. Then we print this hash together with the timing information. Because we use the results of our computation, compiler can’t optimize them away!
Similarly to the parameter function, we also get a
bonus feature for free. You know who also
loves making code faster by deleting “unnecessary”
functionality? I do! Though I am not as smart as a compiler, and
usually end up deleting code that actually is required to
get the right answer. With the hash, if I mess my optimization work
to the point of getting a wrong answer, I
immediately see that reflected in an unexpected value of
the hash.
Consider avoiding black boxes for your next benchmark. Instead, stick to natural anti-optimizing-compiler remedies:
- Make input parameters runtime overridable (with compile time defaults),
- print the result (or the hash thereof).