Zig’s Lovely Syntax
It’s a bit of a silly post, because syntax is the least interesting detail about the language, but, still, I can’t stop thinking how Zig gets this detail just right for the class of curly-braced languages, and, well, now you’ll have to think about that too.
At first glance, Zig looks almost exactly like Rust, because Zig borrows from Rust liberally. And I think that Rust has great syntax, considering all the semantics it needs to express (see “Rust’s Ugly Syntax”). But Zig improves on that, mostly by leveraging simpler language semantics, but also through some tasteful, purely syntactic decisions.
Integer Literals
How do you spell a number ninety-two? Easy, 92. But what type is that? Statically-typed languages
often come with several flavors of integers: u32, u64, u8. And there’s often a syntax for
literals of a particular type: 92u8, 92l, 92z.
Zig doesn’t have suffixes, because, in Zig, all integer literals have the same type: comptime_int:
const an_integer = 92;
assert(@TypeOf(an_integer) == comptime_int);
The value of an integer literal is known at compile time and is coerced to a specific type on
assignment
const x: i32 = 92;
or ascription:
@as(i32, 92)
To emphasize, this is not type inference, this is implicit comptime coercion. This does mean that
code like
var x = 92;
generally doesn’t work, and requires an explicit type.
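To make the coercion rule concrete, here is a minimal sketch. The `double` helper and the variable names are made up for illustration; the only things taken from the text are `comptime_int` and the fact that a runtime `var` needs an explicit type.

```zig
const std = @import("std");
const assert = std.debug.assert;

/// Hypothetical helper: the parameter type drives the coercion at the call site.
fn double(value: u16) u16 {
    return value * 2;
}

pub fn main() void {
    // No suffix: the literal itself is a comptime_int.
    const literal = 92;
    comptime assert(@TypeOf(literal) == comptime_int);

    // The same literal coerces to whichever integer type the context asks for.
    const small: u8 = literal;
    const wide: i64 = literal;
    assert(@TypeOf(small) == u8);
    assert(@TypeOf(wide) == i64);
    assert(double(literal) == 184);

    // A runtime `var` needs a concrete type, so the annotation is mandatory here.
    var counter: u32 = literal;
    counter += 1;
    assert(counter == 93);
}
```

Dropping the `: u32` from `counter` should be rejected by the compiler, because a `comptime_int` cannot live in a runtime variable.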
String Literals
Raw or multiline strings are spelled like this:
const raw =
\\Roses are red
\\ Violets are blue,
\\Sugar is sweet
\\ And so are you.
\\
;
This syntax doesn’t require a special form for escaping \\ itself:
const still_raw =
\\const raw =
\\ \\Roses are red
\\ \\ Violets are blue,
\\ \\Sugar is sweet
\\ \\ And so are you.
\\ \\
\\;
\\
;
It nicely dodges indentation problems that plague every other language with a similar feature. And,
the best thing ever: lexically, each line is a separate token. As Zig has only line-comments, this
means that \n is always whitespace. Unlike most other languages, Zig can be correctly lexed in a
line-by-line manner.
Raw strings are perhaps the biggest improvement of Zig over Rust. Rust brute-forces the problem with
r##""## syntax, which does the required job, technically, but suffers from the mentioned
problems: indentation is messy, nesting quotes requires adjusting hashes, an unclosed raw literal
breaks the following lexical structure completely, and rustfmt’s formatting of raw strings tends to
be rather ugly. On the plus side, this syntax at least cannot be expressed by a context-free grammar!
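A small sketch of what a multiline literal actually contains, compared against ordinary escaped strings. The constant names are invented for this example.

```zig
const std = @import("std");
const assert = std.debug.assert;

// Indentation before `\\` is not part of the string; spaces after it are.
const poem =
    \\Roses are red
    \\  Violets are blue
;

// Quotes and backslashes need no escaping inside a multiline literal.
const tricky =
    \\say "hi" \\ ok
;

pub fn main() void {
    // Lines are joined with '\n'; the last line gets no trailing newline.
    assert(std.mem.eql(u8, poem, "Roses are red\n  Violets are blue"));
    // The ordinary string needs four backslashes to spell two.
    assert(std.mem.eql(u8, tricky, "say \"hi\" \\\\ ok"));
}
```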
Record Literals
For the record, Zig takes C syntax (not that C would notice):
const p: Point = .{
.x = 1,
.y = 2,
};
The .{ feels weird! It will make sense by the end of the post. Here, I want only to note .x = 1
part, which matches the assignment syntax obj.x = 1. This is great! This means that grepping for
".x =" gives you all instances where a field is written to. This is hugely valuable: most
usages are reads, but, to understand the flow of data, you only need to consider writes. The ability to
mechanically partition the entire set of usages into a majority of boring reads and a few interesting
writes does wonders for code comprehension.
Prefix Types
Where Zig departs from C the most is the syntax for types. C uses a needlessly confusing spiral rule. In Zig, all types are prefix:
u32 // An integer
[3]u32 // An array of three integers
?[3]u32 // An array of three integers or null
// A pointer to...
*const ?[3]u32
While pointer type is prefix, pointer dereference is postfix, which is a more natural subject-verb
order to read: ptr.* = 92;
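Here is a short sketch of how prefix types and postfix dereference read together. The variable names are hypothetical; the types are the ones listed above.

```zig
const std = @import("std");
const assert = std.debug.assert;

pub fn main() void {
    var slot: u32 = 0;
    const ptr: *u32 = &slot; // prefix `*` in the type...
    ptr.* = 92; // ...postfix `.*` at the use site
    assert(slot == 92);

    const maybe: ?[3]u32 = [3]u32{ 1, 2, 3 };
    const view: *const ?[3]u32 = &maybe;
    // Read left to right: dereference, unwrap the optional, index the array.
    assert(view.*.?[2] == 3);
}
```

The type `*const ?[3]u32` and the expression `view.*.?[2]` peel the same layers in the same order, which is what makes the prefix form easy to read.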
Identifiers
Zig has general syntax for “raw” identifiers:
@"a name with a space"
It is useful to avoid collisions with keywords, or for exporting a symbol whose name is otherwise
not a valid Zig identifier. It is a bit more to type than Kotlin’s delightful
`a name with a space`, but
manages to re-use Zig’s syntax for built-ins (@TypeOf) and strings.
Functions
Like Rust, Zig goes for fn foo function declaration syntax. This is such a massive improvement
over C/Java style function declarations: it puts the fn token (which is completely absent in
the traditional C family) and the function name next to each other, which means that a textual search for fn
name allows you to quickly find the function. Then Zig adds a little twist. While in Rust we write
fn add(x: i32, y: i32) -> i32
Zig is
fn add(x: i32, y: i32) i32
The arrow is gone! Now that I’ve used this for some time, I find the arrow very annoying to type, and it adds to the visual noise. Rust needs the arrow: Rust has lambdas with an inferred return type, and, in a lambda, the return type is optional. So you need some sort of explicit syntax to tell the parser if there is a return type:
|| expression;
|| -> Type { }
And it’s understandable that lambdas and functions would want to use compatible syntax. But Zig doesn’t have lambdas, so it just makes the type mandatory. So the main is
pub fn main() void {}
A related small thing: as the name of the type, I think I like void more than ().
Locals
Zig is using const and var for binding values to names:
const mid = lo + @divFloor(hi - lo, 2);
This is ok, a bit weird after Rust, whose const would be comptime in Zig, but not really
noticeable after some months. I do think this particular part is not great, because const, the
more frequent one, is longer. I think Kotlin nails it: val, var, fun. Note all three are
monosyllabic, unlike const and fn! The number of syllables matters more than the number of letters!
Like Rust, Zig uses
'name' (':' Type)?
syntax for ascribing types, which is better than
Type 'name'
because optional suffixes are easier to parse visually and mechanically than optional prefixes.
Conjunction Is Control Flow
Zig doesn’t use && and || and spells the relevant operators as and and or:
while (count > 0 and ascii.isWhitespace(buffer[count - 1])) {
This is easier to type and much easier to read, but there’s also a deeper reason why they are not
sigils. Zig marks any control flow with a keyword. And, because boolean operators short-circuit,
they are control flow! Treating them as normal binary operators leads to an entirely incorrect
mental model. For bitwise operations, Zig of course uses & and |.
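The control-flow nature of `and` is easiest to see when the right-hand side is only valid if the left-hand side holds. The sketch below wraps the loop from above in a hypothetical `trimmedLen` function; `std.ascii.isWhitespace` is the standard library predicate.

```zig
const std = @import("std");
const assert = std.debug.assert;

/// Hypothetical helper: length of `buffer` with trailing whitespace removed.
fn trimmedLen(buffer: []const u8) usize {
    var count = buffer.len;
    // `and` short-circuits: `buffer[count - 1]` is evaluated only when
    // `count > 0`, so the index can never underflow.
    while (count > 0 and std.ascii.isWhitespace(buffer[count - 1])) {
        count -= 1;
    }
    return count;
}

pub fn main() void {
    assert(trimmedLen("abc  \n") == 3);
    assert(trimmedLen("   ") == 0);
    assert(trimmedLen("") == 0);
}
```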
Explicit return
Both Zig and Rust have statements and expressions. Zig is a bit more statement oriented, and requires explicit returns:
fn add(x: i32, y: i32) i32 {
return x + y;
}
Furthermore, because there are no lambdas, scope of return is always clear.
Relatedly, the value of a block expression is void. A block is a list of statements, and doesn’t have an optional expression at the end. This removes the semicolon problem — while Rust rules around semicolons are sufficiently clear (until you get to macros), there’s some constant mental overhead to getting them right all the time. Zig is more uniform and mechanical here.
If you need a block that yields a value, Zig supports a general syntax for breaking out of a labeled block:
const header_oldest = blk: {
var oldest: ?usize = null;
for (headers.slice, 0..) |*header, i| {
switch (Headers.dvc_header_type(header)) {
.blank => assert(i > 0),
.valid => oldest = i,
}
}
break :blk &headers.slice[oldest.?];
};
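The snippet above depends on surrounding code, so here is a smaller self-contained sketch of the same labeled-block pattern. The `parity` function is invented for illustration.

```zig
const std = @import("std");
const assert = std.debug.assert;

/// Hypothetical helper: names the parity of `n`.
fn parity(n: u32) []const u8 {
    // The block has no trailing expression; every exit is an explicit
    // `break :blk value`.
    const label: []const u8 = blk: {
        if (n == 0) break :blk "zero";
        if (n % 2 == 0) break :blk "even";
        break :blk "odd";
    };
    return label;
}

pub fn main() void {
    assert(std.mem.eql(u8, parity(0), "zero"));
    assert(std.mem.eql(u8, parity(4), "even"));
    assert(std.mem.eql(u8, parity(7), "odd"));
}
```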
If
Rust makes the pedantically correct choice regarding ifs: braces are mandatory:
if cond1 {
case_a
} else {
if cond2 {
case_b
} else {
case_c
}
}
This removes the dreaded “dangling else” grammatical ambiguity. While theoretically nice, it makes
one-line if-expressions feel too heavy. It’s not the braces, it’s the whitespace around them:
if (a) b else c
if a { b } else { c }
But the ternary is important! Exploding a simple choice into multi-line condition hurts readability. Zig goes with the traditional choice of making parentheses required and braces optional:
.direction = if (prng.boolean()) .ascending else .descending,
By itself, this does create a risk of goto fail; style bugs. But in Zig, the formatter
(non-configurable, user-directed) is a part of the compiler, and formatting errors that can mask
bugs are caught during compilation. For example, 1 -2 is an error due to inconsistent whitespace
around the minus sign, which signals a plausible mixup of unary and binary minus. No such errors are
currently produced for incorrect indentation (the value add there is relatively little, given zig
fmt), but this is planned.
NB: because Rust requires if branches to be blocks, it is forced to make { expr } a synonym for
(expr). Otherwise, the ternary if would be even more unusable! Syntax design is tricky! Whether
you need returns and whether you make () or {} mandatory in ifs are not orthogonal!
Loops
Like Python, Zig allows else on loops. Unlike Python, loops are expressions, which leads to
nicely readable imperative searches:
pub const Word = for (.{ u8, u16, u32, u64, u128, u256 }) |W| {
if (@bitSizeOf(W) >= bitset_capacity) break W;
} else unreachable;
Zig doesn’t have a syntactically infinite loop like Rust’s loop { or Go’s for {. Normally I’d
consider that a drawback, because these loops produce different control flow, affecting reachability
analysis in the compiler, and I don’t think it’s great to make reachability dependent on a condition
being visibly constant. But! As Zig places comptime semantics front and center, and the rules for
what is and isn’t a comptime constant are the backbone of every feature, “anything equivalent to
while (true)” becomes sufficiently precise. Incidentally, these days I tend to write “infinite”
loops as
for (0..safety_bound) |_| {
} else @panic("loop safety counter exceeded");
Almost always there is an up-front bound for the number of iterations until the break, and it’s worth asserting this bound, because debugging crashes is easier than debugging hangs.
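Both patterns, the search with else and the bounded “infinite” loop, fit in a short sketch. The function names and the bound of 1000 are assumptions picked for the example.

```zig
const std = @import("std");
const assert = std.debug.assert;

/// Hypothetical search: the loop is an expression, `else` covers "not found".
fn indexOfFirstEven(values: []const u32) ?usize {
    return for (values, 0..) |value, index| {
        if (value % 2 == 0) break index;
    } else null;
}

/// Hypothetical bounded loop: counts Collatz steps until `start` reaches 1.
/// The bound turns a potential hang into a crash.
fn collatzSteps(start: u64) usize {
    var n = start;
    const safety_bound = 1000; // assumption: generous for small inputs
    return for (0..safety_bound) |step| {
        if (n == 1) break step;
        n = if (n % 2 == 0) n / 2 else 3 * n + 1;
    } else @panic("loop safety counter exceeded");
}

pub fn main() void {
    assert(indexOfFirstEven(&.{ 3, 5, 8, 9 }).? == 2);
    assert(indexOfFirstEven(&.{ 3, 5 }) == null);
    assert(collatzSteps(1) == 0);
    assert(collatzSteps(6) == 8);
}
```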
for, while, if, switch, and catch all use the same Ruby/Rust inspired syntax for naming
captured values:
for (slice) |element| {
use(element);
}
while (iterator.next()) |element| {
use(element);
}
I like how the iterator comes first, and then the name of an item follows, logically and syntactically.
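A runnable version of the two capture forms might look like this. `Countdown` is a made-up iterator whose `next` returns an optional, which is what `while` with a capture expects.

```zig
const std = @import("std");
const assert = std.debug.assert;

/// Hypothetical iterator: yields remaining - 1, ..., 1, 0 and then null.
const Countdown = struct {
    remaining: u32,

    fn next(self: *Countdown) ?u32 {
        if (self.remaining == 0) return null;
        self.remaining -= 1;
        return self.remaining;
    }
};

pub fn main() void {
    // `for` captures each element of an array or slice.
    var total: u32 = 0;
    for ([_]u32{ 1, 2, 3 }) |element| {
        total += element;
    }
    assert(total == 6);

    // `while` captures the payload of the optional and stops at null.
    var countdown = Countdown{ .remaining = 3 };
    var seen: u32 = 0;
    while (countdown.next()) |value| {
        seen += value; // 2, then 1, then 0
    }
    assert(seen == 3);
}
```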
Clarity of Names
I have a very strong opinion about variable shadowing. It goes both ways: I spent hours debugging code which incorrectly tried to use a variable that was shadowed by something else, but I also spent hours debugging code that accidentally used a variable that should have been shadowed! I really don’t know whether on balance it is better to forbid or encourage shadowing!
Zig of course forbids shadowing, but what’s curious is that it’s just one episode of the large crusade against any complexity in name resolution. There’s no “prelude”, if you want to use anything from std, you need to import it:
const std = @import("std");
There are no glob imports, if you want to use an item from std, you need to import it:
const ArrayList = std.ArrayList;
Zig doesn’t have inheritance, mixins, argument-dependent lookup, extension functions, implicits, or
traits, so, if you see x.foo(), that foo is guaranteed to be a boring method declared on x’s
type. Similarly, while Zig has powerful comptime capabilities, it intentionally disallows
declaring methods at compile time.
Like Rust, Zig used to allow a method and a field to share a name, because it actually is syntactically clear enough at the call site which is which. But then this feature got removed from Zig.
More generally, Zig doesn’t have namespaces. There can be only one kind of foo in scope, while
Rust allows things like
struct Point { x: i32, y: i32 }
fn Point(x: i32, y: i32) -> Point { Point { x, y } }
I am astonished at the relative lack of inconvenience in Zig’s approach. Turns out that foo.bar.baz
is all the syntax you’ll ever need for accessing things? For the historically inclined, see the
“The module naming situation” thread in the Rust mailing list archive to learn the story of how
Rust got its std::vec syntax.
Everything Is an Expression
The lack of namespaces touches on the most notable (by its absence) feature of Zig syntax, which deeply relates to the most profound aspect of Zig’s semantics. Everything is an expression. By which I mean, there are no separate syntactic categories of values, types, and patterns. Values, types, and patterns are of course different things. And usually in the language grammar it is syntactically obvious whether a particular text fragment refers to a type or a value:
let PATTERN: TYPE = VALUE;
So the standard way is to have separate syntax families for the three categories, which need to be
internally unambiguous, but can be ambiguous across the categories because the place in the
grammar dictates the category: when parsing let, everything until : is a pattern, stuff between
: and = is a type, and after = we have a value.
There are two problems here. First, there’s a combinatorial explosion of sorts in the syntax, because, while three categories describe different things, it turns out that they have the same general tree-ish shape.
The second problem is that it might be hard to maintain category separation in the grammar. Rust
started with the three categories separated by a bright line. But then, changes happen.
Originally, Rust only allowed
VALUE = VALUE;
syntax for assignment. But today you can also write
PATTERN = VALUE;
to do unpacking like
(a, b) = (b, a);
Similarly, the turbofish used to move the parser from the value to the type mode, but now const parameters are values that can be found in the type position!
The alternative is not to pick this fight at all. Rather than trying to keep the categories
separate in the syntax, use the same surface syntax to express all three, and categorize later,
during semantic analysis. In fact, this already happens in the VALUE = VALUE
example, where the two sides are different things! One is a place (lvalue) and the other is a “true” value
(rvalue), but we use the same syntax for both.
I don’t think such syntactic unification necessarily implies semantic unification, but Zig does treat everything uniformly, as a value with comptime and runtime behavior (for some values, runtime behavior may be missing, for others — comptime):
const E = enum { a, b };
pub fn main() void {
const e: if (true) E else void = .a;
_ = switch (e) {
(if (true) .a else .b) => .a,
(if (true) .b else .a) => .b,
};
}
The fact that you can write an if where a type goes is occasionally useful. But the fact that
simple types look like simple values syntactically consistently makes the language feel significantly
less busy.
Generics
As a special case of everything being an expression, instances of generic types look like this:
ArrayList(u32)
Just a function call! Though, there’s some resistance to trickery involved to make this work.
Usually, languages rely on type inference to allow eliding generic arguments. That in turn requires
making argument syntax optional, and that in turn leads to separating generic and non-generic
arguments into separate parameter lists and some introducer sigil for generics, like ::<> or
!().
Zig solves this syntactic challenge in the most brute-force way possible. Generic parameters are
never inferred, if a function takes 3 comptime arguments and 2 runtime arguments, it will always be
called with 5 arguments syntactically. Like with the (absence of) importing flourishes, a reasonable
reaction would be “wait, does this mean that I’ll have to specify the types all the time?” And, like
with import, in practice this is a non-issue. The trick is comptime closures. Consider a generic
ArrayList:
fn ArrayListType(comptime T: type) type {
return struct {
const ArrayList = @This();
fn init(gpa: Allocator) ArrayList {}
fn deinit(list: *ArrayList, gpa: Allocator) void {}
fn push(list: *ArrayList, item: T) !void {}
};
}
fn usage(gpa: Allocator) !void {
var xs: ArrayListType(u32) = .init(gpa);
defer xs.deinit(gpa);
try xs.push(92);
}
We have to specify type T when creating an instance of an ArrayList. But subsequently, when we
are using the array list, we don’t have to specify the type parameter again, because the type of
the xs variable already closes over T. This is the major truth of object-oriented programming, the
truth so profound that no one even notices it: in real code, 90% of functions are happiest as
(non-virtual) methods. And, because of that, the annotation burden in real-world Zig programs is
low.
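Since the ArrayList above has empty bodies, here is a complete sketch of the same comptime-closure idea. `StackType` is a hypothetical fixed-capacity stack, chosen so that the example needs no allocator.

```zig
const std = @import("std");
const assert = std.debug.assert;

/// Hypothetical generic container: an ordinary function from comptime
/// arguments to a type.
fn StackType(comptime T: type, comptime capacity: usize) type {
    return struct {
        const Stack = @This();

        items: [capacity]T = undefined,
        len: usize = 0,

        fn push(stack: *Stack, item: T) error{Overflow}!void {
            if (stack.len == capacity) return error.Overflow;
            stack.items[stack.len] = item;
            stack.len += 1;
        }

        fn pop(stack: *Stack) ?T {
            if (stack.len == 0) return null;
            stack.len -= 1;
            return stack.items[stack.len];
        }
    };
}

pub fn main() !void {
    // T and capacity are spelled once, here...
    var xs: StackType(u32, 2) = .{};
    // ...and never again: the methods already close over them.
    try xs.push(92);
    try xs.push(93);
    if (xs.push(94)) |_| {
        unreachable; // the stack holds only two items
    } else |err| {
        assert(err == error.Overflow);
    }
    assert(xs.pop().? == 93);
    assert(xs.pop().? == 92);
    assert(xs.pop() == null);
}
```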
Declaration Literals
While Zig doesn’t have Hindley-Milner constraint-based type inference, it relies heavily on one
specific way to propagate types. Let’s revisit the first comptime_int example:
const x = if (condition()) 1 else 2;
This doesn’t compile: 1 and 2 are different comptime values, we can’t select between two at
runtime because they are different. We need to coerce the constants to a specific runtime type:
const x: u32 = if (condition()) 1 else 2;
const x = @coerceTo(
u32,
if (condition()) 1 else 2,
);
But this doesn’t kick the can far enough and essentially reproduces the if with two
incompatible branches. We need to sink the coercion down the branches:
const x = if (condition())
@coerceTo(u32, 1)
else
@coerceTo(u32, 2);
And that’s exactly how Zig’s “Result Location Semantics” works. Type “inference” runs a simple
left-to-right tree-walking algorithm, which resembles an interpreter’s eval. In fact, eval is
exactly what happens. Zig is not a compiler, it is an interpreter. When Zig evaluates an
expression, it gets:
- expression’s type (as a Zig value),
- expression’s value (if it can be evaluated at comptime),
- code to compute expression’s value otherwise.
eval("1 + 2") =
3
eval("f() + g()") =
$1 = call 'f'
$2 = call 'g'
$3 = add $1, $2
eval("f() + 2") =
$1 = call 'f'
$2 = add $1, imm 2
When interpreting code like
obj.field = if (condition()) 1 else 2;
the interpreter passes the result location (obj.field) and type down the tree of subexpressions.
The if branches store the result directly into the object field (there’s a store inside each branch, as
opposed to one store after the if), and each coerces its comptime constant to the appropriate
runtime type of the result.
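The effect of sinking the result type into branches can be observed from ordinary code. This is a sketch with made-up names (`Direction`, `pick`, `width`); it shows the visible behaviour, not the compiler internals.

```zig
const std = @import("std");
const assert = std.debug.assert;

const Direction = enum { ascending, descending };

/// The return type is the result type; it sinks into both branches,
/// so the bare enum literals resolve to Direction.
fn pick(flag: bool) Direction {
    return if (flag) .ascending else .descending;
}

/// The annotation on `x` coerces each comptime_int branch to u32,
/// even though `flag` is only known at runtime.
fn width(flag: bool) u32 {
    const x: u32 = if (flag) 1 else 2;
    return x;
}

pub fn main() void {
    assert(pick(true) == .ascending);
    assert(pick(false) == .descending);
    assert(width(true) == 1);
    assert(width(false) == 2);
}
```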
This mechanism enables concise .variant syntax for specifying enums:
const E = enum { a, b };
fn example(e: E) u32 {
return switch (e) {
.a => 1,
(if (true) .b else .a) => 2,
};
}
When Zig evaluates the switch, it first evaluates the scrutinee, and realizes that it has type
E. When evaluating a switch arm, it sets the result type to E for the condition, and a literal .a
gets coerced to E. The same happens for the second arm, where the result type further sinks down the
if.
Result type semantics also explains the leading dot in the record literal syntax:
const p: Point = .{
.x = 1,
.y = 2,
};
Syntactically, we just want to disambiguate records from blocks. But, semantically, we want to
coerce the literal to whatever type we want to get out of this expression. In Zig, .whatever is a
shorthand for @ResultType().whatever.
I must confess that .{} did weird me out a lot at first during writing code (I don’t mind
reading the dot). It’s not the easiest thing to type! But that was fixed once I added .. snippet,
expanding to .{$0}.
The benefits to lightweight record literal syntax are huge, as they allow for some pretty nice APIs. In particular, you get named and default arguments for free:
fn exec(argv: []const []const u8, options: struct {
working_directory: ?[]const u8 = null
}) !void {
// ...
}
fn usage() !void {
try exec(&.{ "git", "status"}, .{});
try exec(&.{ "git", "status"}, .{
.working_directory = "./src",
});
}
I don’t really miss named arguments in Rust, you can always design APIs without them. But they are free in Zig, so I use them liberally. Syntax-wise, we get two features (calling functions and initializing objects) for the price of one!
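A tiny runnable variant of the options-struct pattern, with a hypothetical `boxVolume` function standing in for exec so that the result is easy to check:

```zig
const std = @import("std");
const assert = std.debug.assert;

/// Hypothetical function: `options` is an anonymous struct whose fields all
/// have defaults, which gives named and optional arguments at the call site.
fn boxVolume(width: u32, options: struct {
    height: u32 = 1,
    depth: u32 = 1,
}) u32 {
    return width * options.height * options.depth;
}

pub fn main() void {
    assert(boxVolume(3, .{}) == 3); // all defaults
    assert(boxVolume(3, .{ .depth = 4 }) == 12); // override one by name
    assert(boxVolume(3, .{ .height = 2, .depth = 4 }) == 24);
}
```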
Built-ins
Finally, the thing that weirds out some people when they see Zig code, and makes others reconsider
their choice of GitHub handles, even when they haven’t seen any Zig: @divExact syntax for
built-in functions.
Every language needs to glue “userspace” code with primitive operations supported by the compiler.
Usually, the gluing is achieved by making the standard library privileged and allowing it to define
intrinsic functions without bodies, or by adding ad-hoc operators directly to the language (like
Rust’s as). And Zig does have a fair amount of operators, like + or orelse. But the release
valve for a lot of functionality is built-in functions in a distinct syntactic namespace, so Zig
separates out @bitCast, @addrSpaceCast, @alignCast, @constCast, @ptrCast, @intCast,
@floatCast, @volatileCast, @ptrFromInt, and @intFromPtr. There’s no need to overload casting
when you can give each variant a name.
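A few of these built-ins in action. This sketch assumes a Zig version in which the cast built-ins take their destination type from the result type (0.11 or later); the variable names are made up.

```zig
const std = @import("std");
const assert = std.debug.assert;

pub fn main() void {
    // @intCast narrows an integer; the target type comes from the annotation.
    const big: u64 = 92;
    const small: u8 = @intCast(big);
    assert(small == 92);

    // @bitCast reinterprets the bits; 1.0 as an IEEE-754 f32 is 0x3f800000.
    const bits: u32 = @bitCast(@as(f32, 1.0));
    assert(bits == 0x3f800000);

    // @divExact is division that asserts there is no remainder.
    const quarter: u32 = @divExact(92, 4);
    assert(quarter == 23);
}
```

Each conversion names its own intent, so a reader can tell a narrowing cast from a bit reinterpretation without knowing the operand types.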
There’s also @as(i32, 92) for type ascription. The type goes first, because the
mechanism here is result type semantics: @as evaluates the first argument as a type, and then uses
that as the type for the second argument. Curiously, I think @as can actually be implemented in
userspace:
fn as(comptime T: type, value: T) T {
return value;
}
In Zig, a type of function parameter may depend on values of preceding (comptime) ones!
My favorite builtin is @import(). First, it’s the most obvious way to import code:
const foo = @import("./foo.zig");
It’s crystal clear where the file comes from.
But, second, it is an instance of reverse syntax sugar. You see, import isn’t really a function. You can’t do
const name = "./foo.zig";
const foo = @import(name);
The argument of @import has to be a string, syntactically. It really is
import "./path.zig"
syntax, except that the function-call form is re-used, because it already has the right shape.
So, this is it. Just a bunch of silly syntactical decisions, which add up to a language which is positively enjoyable to read. As for big lessons, obviously, the fewer features your language has, the less syntax you’ll need. And less syntax is generally good, because varied syntactic constructs tend to step on each other’s toes. Languages are not combinations of orthogonal aspects. Features tug and pull the language in different directions and their combinations might turn out to be miraculous features in their own right, or might drag the language down.
Even with a small feature-set fixed, there’s still a lot of work to pick a good concrete syntax: unambiguous to parse, useful to grep, easy to read and not too painful to write. A smart thing is of course to steal and borrow solutions from other languages, not because of familiarity, but because the ruthless natural selection tends to weed out poor ideas. But there’s a lot of inertia in languages, so there’s no need to fear innovation. If an odd-looking syntax is actually good, people will take to it.
Is there anything about Zig’s syntax I don’t like? I thought no, when starting this post. But in the process of writing it I did discover one form that annoys me. It is the while with the increment loop:
var i: u32 = 0;
while (i < 10) : (i+=1) {
print("{d}", .{i});
}
This is two-thirds of a C-style for loop (without the declarator), and it sucks for the same
reason: control flow jumps all over the place and is unrelated to the source code order. We go from
condition, to the body, to the increment. But in the source order the increment is between the
condition and the body. In Zig, this loop sucks for one additional reason: that : separating the
increment I think is the single example of control flow in Zig that is expressed by a sigil, rather
than a keyword.
This form used to be rather important, as Zig lacked a counting loop. It has
for(0..10) |i|
form now, so I am tempted to call the while-with-increment redundant.
Annoyingly,
while (condition) {
defer increment;
body
}
is almost equivalent to
while (condition) : (increment) {
body
}
But not exactly: if body contains a return, break or try, the defer version would run the
increment one extra time, which is useless and might be outright buggy. Oh well.
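The difference is easy to demonstrate with a counter that leaves the loop early. Both functions below are hypothetical and differ only in where the increment lives.

```zig
const std = @import("std");
const assert = std.debug.assert;

/// The continue expression is skipped when the loop exits via `break`.
fn withContinueExpression() u32 {
    var i: u32 = 0;
    while (i < 10) : (i += 1) {
        if (i == 3) break;
    }
    return i;
}

/// `defer` runs on every exit from the body, including `break`.
fn withDefer() u32 {
    var i: u32 = 0;
    while (i < 10) {
        defer i += 1;
        if (i == 3) break;
    }
    return i;
}

pub fn main() void {
    assert(withContinueExpression() == 3);
    assert(withDefer() == 4); // one extra increment on the way out
}
```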