GCC specs: an introduction
If you've ever used the GNU Compiler Collection (GCC) then you've
worked with the gcc binary. For those who don't know, gcc is a
driver, not a compiler. It runs the compiler/assembler/linker as
required and coordinates the input and output between them. Use of
the driver is so ubiquitous that everyone calls it the compiler and
tends to take the assembly and linking actions for granted.
I'm not here to be a nitpicky pedant and admonish you to use the
correct terms. There's no reason to start saying "well actually…"
about what gcc is and what it does. Keep calling it the compiler
and don't worry about the assembler and linker unless it matters to
you. The whole point of the driver is to abstract those
(annoying!) details away.
But then, how does the driver make it so you don't have to bother with those details? Somehow the driver has to take all the arguments you provide, organize them, then run the appropriate subprograms using those arguments stitched together in some way to process the input and provide some output.
This article is about how the driver sets up the argument vectors for subprograms. Arguments are determined using specification strings, or just specs. Specs are written in a rather obscure and unintuitive language that describes how and under what conditions the driver runs each subprogram. They are partially documented in the GCC manual in the section Specifying Subprocesses and the Switches to Pass to Them, which is probably not the name you would have looked for if I said to find the documentation for specs.
Lifting the curtain
Have you ever passed -v to the driver and studied the output?
There's a lot of stuff in there that is revealing. It's also a good
way to understand what specs are and why they exist.
Because the output from -v is so extensive I'm going to use one
example, broken up into parts. Not all of it is relevant for
understanding specs, but it may be of general interest to some.
To get the output, I ran the following command on an amd64 machine running Debian 12, which is using GCC 12.2.
gcc -v t.c
The content of the source file is not important, only that the program
is in a single file. (Use int main (void) {} if you have nothing
sitting around.) I've reformatted the output where needed. In some
cases I've omitted output and indicate it with (...).
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v (...)
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (Debian 12.2.0-14)
COLLECT_GCC_OPTIONS='-v' '-mtune=generic' '-march=x86-64' '-dumpdir' 'a-'
The first line says where the driver is getting its specs. It's possible to override the builtin specs but that is for a later discussion.
What follows are some environment variables that are set by gcc to
communicate with the subprograms and various other information. There
is also the "Configured with" line that tells you how GCC was
configured, which is useful if you want to build it yourself.
In our example the driver has to compile the source using the C
compiler, assemble it into an object file, then link the object into
an executable. The first subprogram run is the compiler. cc1 is
the C compiler, cc1plus is the C++ compiler. Note that subprogram
invocations in the output are lines that start with a single space.
/usr/lib/gcc/x86_64-linux-gnu/12/cc1 \
  -quiet \
  -v \
  -imultiarch x86_64-linux-gnu \
  t.c \
  -quiet \
  -dumpdir a- \
  -dumpbase t.c \
  -dumpbase-ext .c \
  -mtune=generic \
  -march=x86-64 \
  -version \
  -fasynchronous-unwind-tables \
  -o /tmp/ccJOdUuR.s
(...)
We only provided the arguments -v and t.c to the driver yet the
compiler has many more. Some are even provided twice. If you look at
the documentation for options you won't find the "dump" ones listed,
for example, so the compiler doesn't use the same options as the
driver. That said, some of them match.
-fasynchronous-unwind-tables is a code generation option helpful
when debugging if the target machine supports it. -march and
-mtune control what kind of code is generated for the processor.
These are options specific to the target or host. It would be tedious
to specify them every time.
I've trimmed all the compiler output because it's mostly version info, although it does print the C standard it is using, some of the compiler's heuristic data, and the compiler executable checksum. It will also print the search order for headers which can be very handy when debugging header problems or just knowing where system headers reside.
As a side note, it's worth pointing out that you can run the compiler
directly if you want. You can even pass it --help to see the
extensive set of options it accepts. Running the compiler directly
isn't something you need to do very often, even when debugging it,
because gcc has a -wrapper option to do this for you.
The compiler outputs assembler source, so the next step is for the driver to run the assembler.
as -v --64 -o /tmp/ccj7Fe2p.o /tmp/ccJOdUuR.s
(...)
Note that the input is a temporary file, as is the output. The driver
has to manage all these temporaries and coordinate them across the
subprograms. If you pass -save-temps then it will change the way
this is done and use more intuitive names.
Finally, we have the linker invocation, complete with the messiest and most complex set of arguments.
/usr/lib/gcc/x86_64-linux-gnu/12/collect2 \
  -plugin /usr/lib/gcc/x86_64-linux-gnu/12/liblto_plugin.so \
  -plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper \
  -plugin-opt=-fresolution=/tmp/cc7cMZRs.res \
  -plugin-opt=-pass-through=-lgcc \
  -plugin-opt=-pass-through=-lgcc_s \
  -plugin-opt=-pass-through=-lc \
  -plugin-opt=-pass-through=-lgcc \
  -plugin-opt=-pass-through=-lgcc_s \
  --build-id --eh-frame-hdr -m elf_x86_64 \
  --hash-style=gnu --as-needed \
  -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie \
  /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/Scrt1.o \
  /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/crti.o \
  /usr/lib/gcc/x86_64-linux-gnu/12/crtbeginS.o \
  -L/usr/lib/gcc/x86_64-linux-gnu/12 \
  -L/usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu \
  -L/usr/lib/gcc/x86_64-linux-gnu/12/../../../../lib \
  -L/lib/x86_64-linux-gnu \
  -L/lib/../lib \
  -L/usr/lib/x86_64-linux-gnu \
  -L/usr/lib/../lib \
  -L/usr/lib/gcc/x86_64-linux-gnu/12/../../.. \
  /tmp/ccj7Fe2p.o \
  -lgcc \
  --push-state --as-needed -lgcc_s --pop-state \
  -lc -lgcc \
  --push-state --as-needed -lgcc_s --pop-state \
  /usr/lib/gcc/x86_64-linux-gnu/12/crtendS.o \
  /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/crtn.o
What the options mean is beyond the scope of this article, but it's
worth noting what arguments are needed. There are object files (the
.o suffix) and libraries (-l options) and library search paths
(-L options). The order of these options matters, too, and the
driver has to get this right. Furthermore, which system object files
to use (found in /usr/lib in this example) can be non-obvious, so we
should appreciate all the work the driver does behind the scenes.
Manually specifying a link is tedious.
To round out the discussion of the files that the driver has to manage, all the temporary files created during the compile/assemble/link sequence are deleted before the driver exits.
All the arguments passed to the subprograms and the management of files are done with specs.
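If you want to pull only the subprogram invocations out of a -v log, a small filter is enough. The following is a rough Python sketch, not anything GCC provides: the function name subprogram_lines is made up, the sample text is a shortened version of the output above, and it assumes the header search list is the only other place where lines start with a single space.

```python
def subprogram_lines(verbose_output: str) -> list[str]:
    """Return the lines of `gcc -v` output that look like subprogram invocations."""
    commands = []
    in_search_list = False
    for line in verbose_output.splitlines():
        # Assumption: the header search list also indents its entries with one
        # space, so skip everything between these two marker lines.
        if line.endswith("search starts here:"):
            in_search_list = True
        elif line.startswith("End of search list"):
            in_search_list = False
        elif not in_search_list and line.startswith(" ") and not line.startswith("  "):
            commands.append(line.strip())
    return commands


# Shortened sample of the output shown earlier (the real log is much longer).
sample = "\n".join([
    "Using built-in specs.",
    "COLLECT_GCC=gcc",
    " /usr/lib/gcc/x86_64-linux-gnu/12/cc1 -quiet -v t.c -o /tmp/ccJOdUuR.s",
    "#include <...> search starts here:",
    " /usr/include",
    "End of search list.",
    " as -v --64 -o /tmp/ccj7Fe2p.o /tmp/ccJOdUuR.s",
])
for command in subprogram_lines(sample):
    print(command.split()[0])  # program name only
```

Keep in mind that the driver writes the -v output to standard error, so you would need to capture that stream (for example with 2>&1) before feeding it to a filter like this.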
What is a spec?
The driver determines what subprograms to run based on the arguments
it is given. For example, if you give it a file name with the .c
suffix it will know to run the C compiler for that file; give it a
file with a .cpp suffix and it will use the C++ compiler. It
recognizes many file extensions and you can specify the language it
should use with the -x option. Some options tell it not to run the
assembler or the linker.
The driver determines how to run a subprogram based on a spec for the language. Each language has its own spec that has the name of the subprogram and the arguments to pass to it. A spec is a string made up of zero or more lines. Each line corresponds to a program to run so it's possible for you to run more than one program for each language or "stage" being handled.
Since the arguments needed for a subprogram can vary significantly
from one invocation to the next they are rarely given directly.
Instead, they are provided with directives that expand to arguments
based on what the user supplied. The syntax of these directives is
inspired by the ones used for printf, albeit with very different
semantics.
This may sound somewhat elegant but let me disabuse you of any notion of elegance right now. We are about to wade into an ad hoc mini-language where the only reliable way to get the result you want is to experiment to find out what happens, although admittedly the lack of consistent (and documented) semantics only tends to show up when you start doing the advanced stuff.
How to read a spec (the basics)
We're going to focus on the assembler invocation since it is the subprogram with the fewest arguments and is the easiest to follow. The goal is to trace through how it gets its arguments.
To do this you have to learn how to read a spec. This is accomplished
by looking at the spec documentation and casting your eyes on arguably
the ugliest output GCC can produce: gcc -dumpspecs.
The full output of -dumpspecs is a beast, so here's a snippet of the
dumped specs that provides one of the above options. I've reformatted
the output slightly to put each directive from the spec string on its
own line so it's easier to read, but keep in mind that you're not
going to get output this agreeable most of the time.
*asm:
%{m16|m32:--32} \
%{m16|m32|mx32:;:--64} \
%{mx32:--x32} \
%{msse2avx:%{!mavx:-msse2avx}}
The *asm: line declares this is a named spec with the name asm.
This happens to be the builtin spec name for options passed to the
assembler. Confusingly, this is not how all the options are passed,
but we'll get to that shortly.
Let's look closely at the following directive, which is how the --64
option is passed to the assembler. It's one pattern you're going to
see a lot.
%{m16|m32|mx32:;:--64}
This says that if any of the arguments -m16, -m32, or -mx32 has
been passed to the driver then substitute nothing, otherwise
substitute in the string --64. Let's go over it in more detail.
The general form is %{S:X;:D}. This says that if the argument -S
has been given then substitute X, otherwise substitute D.
(Clearly, S can also be a combination of arguments separated with
the pipe symbol |.) This is a conditional and is the primary way
you control the way arguments are passed.
In the above case X is empty which is why it looks so confusing.
The first : indicates the end of the condition. The ; indicates
the end of the substitution. Since there is nothing there, nothing
(well, the empty string) is substituted. The last : indicates the
start of the last substitution. You can add white space to make it
more readable, but it may or may not help. You can nest them and when
they nest deeply (which they will), they can be very difficult to
follow.
Our example at the beginning only passed the arguments -v and
t.c. Since none of -m16, -m32, or -mx32 was passed, --64
is passed to the assembler.
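If it helps, you can think of %{S:X;:D} as an if/else expression. The Python sketch below models just that one form. It is an illustration and not how the driver is implemented; the function names are invented, and options are represented as a set of names with the leading hyphen removed.

```python
def option_given(condition, options):
    """True if any name in a '|'-separated condition was passed to the driver."""
    return any(name in options for name in condition.split("|"))


def conditional(condition, then, otherwise, options):
    """Model %{S:X;:D}: substitute X if S matches, otherwise D."""
    return then if option_given(condition, options) else otherwise


# %{m16|m32|mx32:;:--64} with only -v given, then with -m32 given.
print(repr(conditional("m16|m32|mx32", "", "--64", {"v"})))    # '--64'
print(repr(conditional("m16|m32|mx32", "", "--64", {"m32"})))  # ''
```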
If you look at the other directives in the asm spec you'll see other
forms of conditionals.
- %{m16|m32:--32} means that if either -m16 or -m32 was passed, substitute --32. Otherwise, substitute nothing. This is a conditional with only one clause and is very common. You could write it as %{m16|m32:--32;:} but that adds needless line noise so it's better not to.
- %{mx32:--x32} should be clear by now: substitute --x32 if -mx32 was given.
- %{msse2avx:%{!mavx:-msse2avx}} is interesting because it is how you do conjunctions. The other examples we've looked at were disjunctions. Conjunctions are done by nesting. This is where things can get really hairy. This one says that if -msse2avx was given and not -mavx then substitute -msse2avx. (You may think "oh, conjunctions aren't too bad". I invite you to take a look at the link_command spec. Come back and we can share a story or two.)
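Putting the four directives together, the asm spec can be hand-translated into ordinary conditionals. This Python version is only a reading aid under the same assumptions as before (options as a set of hyphen-less names, a made-up function name); the nested if at the end is the conjunction.

```python
def asm_args(options):
    """Hand-translate the four directives of the asm spec shown above."""
    args = []
    # %{m16|m32:--32}
    if "m16" in options or "m32" in options:
        args.append("--32")
    # %{m16|m32|mx32:;:--64}
    if not ({"m16", "m32", "mx32"} & options):
        args.append("--64")
    # %{mx32:--x32}
    if "mx32" in options:
        args.append("--x32")
    # %{msse2avx:%{!mavx:-msse2avx}}: nesting acts as "and", ! as "not"
    if "msse2avx" in options:
        if "mavx" not in options:
            args.append("-msse2avx")
    return args


print(asm_args({"v"}))         # ['--64']
print(asm_args({"msse2avx"}))  # ['--64', '-msse2avx']
```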
There are a few other points to note about these directives.
- The options in the condition are written without leading hyphens. This is a good thing because adding them would make reading a spec even harder. Internally, the driver has already decoded the options when matching happens so the hyphens are gone.
- The substitution doesn't have to be an option, it can be any string. Usually you want it to be an option, but it can also be another directive. If it is another directive, it will be processed as one.
- You have negation with the ! syntax. There are other "operators" but they are not common.
- You can substitute an option directly if it was given, or nothing if it wasn't, by writing %{S} where S is the option. This is a simple short form for %{S:-S;:}.
- The full conditional syntax allows for any number of clauses. For example, you could write %{m16|m32|mx32:;:--64} as %{m16:; m32:; mx32:; :--64}. I call this form the "maximal confusion" form because if it's used it's rarely short and always difficult to read.
- There doesn't seem to be a way to match option patterns or option arguments. There is a way, but we haven't gotten there yet.
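Two of these points are easy to model: the multi-clause form behaves like an if/elif/else chain where the first matching clause wins, and %{S} simply echoes the option back. The sketch below is a Python approximation with invented names, where each clause is a (condition, text) pair and an empty condition stands for the trailing default clause.

```python
def first_matching_clause(clauses, options):
    """Model %{S1:X1; S2:X2; :D}: return the text of the first clause that matches."""
    for condition, text in clauses:
        if condition == "" or any(name in options for name in condition.split("|")):
            return text
    return ""


def pass_through(name, options):
    """Model %{S}: substitute -S if it was given, otherwise nothing."""
    return "-" + name if name in options else ""


# %{m16:; m32:; mx32:; :--64}, the "maximal confusion" form
maximal_confusion = [("m16", ""), ("m32", ""), ("mx32", ""), ("", "--64")]
print(repr(first_matching_clause(maximal_confusion, {"v"})))     # '--64'
print(repr(first_matching_clause(maximal_confusion, {"mx32"})))  # ''
print(repr(pass_through("v", {"v"})))                            # '-v'
```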
From spec to arguments
We've covered all this ground and so far we've only explained one
argument. And it seems to get there by magic. The documentation says
that the asm spec specifies the options to pass to the assembler.
Clearly there are more than just the one. What gives?
What gives is that the documentation does not tell the whole story. The full spec for the assembler is actually in the source code. It's not large so I'll reproduce it here.
%{!M:%{!MM:%{!E:%{!S:as %(asm_debug) %(asm_options) %i %A }}}}
Here you can see the messiness of specs starting to show up. If we strip away the conditionals we get something more manageable.
as %(asm_debug) %(asm_options) %i %A
The %(asm_debug) and %(asm_options) directives reference named
specs. They substitute in the value of that spec for further
processing. If you look at the output of -dumpspecs you should find
both asm_debug and asm_options. On my system they are long and
mostly of no consequence so I won't completely reproduce them here,
but we can go through the ones that actually produce the options.
Then we can look at a few other notable ones.
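Referencing a named spec amounts to looking up a string by name and splicing it in. Here is a textual approximation in Python; the SPECS table is a hypothetical, heavily shortened stand-in for what -dumpspecs prints, and the real driver processes specs as it goes rather than pre-expanding text like this.

```python
import re

# Hypothetical, heavily shortened spec table; real values come from -dumpspecs.
SPECS = {
    "asm_debug": "",
    "asm_options": "%{v} %a %Y",
}


def expand_named(spec, specs):
    """Replace every %(name) with the text of that named spec, repeatedly."""
    pattern = re.compile(r"%\((\w+)\)")
    # Loop so that references inside substituted text are expanded too.
    # (A spec that referenced itself would loop forever in this toy version.)
    while pattern.search(spec):
        spec = pattern.sub(lambda match: specs[match.group(1)], spec)
    return spec


expanded = expand_named("as %(asm_debug) %(asm_options) %i %A", SPECS)
print(" ".join(expanded.split()))  # as %{v} %a %Y %i %A
```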
asm_debug doesn't substitute anything in our running example, so
let's turn our attention to asm_options.
Here's an abbreviated version of the spec. Parts that are omitted are
denoted with (...). You should see something similar if you are
using a recent version of GCC.
*asm_options:
(...) %{v} (...) %a %Y %{c:%W{o*}%{!o*:-o %w%b%O}}%{!c:-o %d%w%u%O}
Not all of these substitute anything, but they are notable so I have included them.
To see how we go from spec to assembler invocation, I will show the
conceptual version of the full assembler spec at this point with the
final assembler invocation. The conceptual version bypasses
asm_debug since it substitutes nothing and substitutes in the
abbreviated version of the asm_options spec.
as %{v} %a %Y %{c:%W{o*}%{!o*:-o %w%b%O}}%{!c:-o %d%w%u%O} %i %A
as -v --64 -o /tmp/ccj7Fe2p.o /tmp/ccJOdUuR.s
We will go through each directive and show what happens.
- %{v} should be clear from earlier discussion: substitute -v because -v was passed.
- %a and %Y are special builtin specs. %a substitutes the asm spec. (Why have a special one and not just say %(asm)? I don't know.) This is how we get the --64 option as described in the previous section. %Y is used to substitute any assembler arguments that are passed via -Wa, which is the way you tell the driver that an argument is meant specifically for the assembler. It's how you pass through an argument without the driver validating it. We did not pass any of those so it substitutes nothing.
- The next directive is conditional and only substitutes if -c was passed to the driver. It substitutes nothing in our example. I included it, however, because it is written directly beside the next spec. This may seem significant but it actually isn't. All directives, once substituted, have spaces around them so you can't piece together multiple arguments into one argument. Why was this not written as a conditional with an else clause? Probably as a matter of style.
- The directive %{!c:-o %d%w%u%O} is going to give us the arguments for the output file, but it looks like random gibberish so take it one step at a time and look them up in the documentation.
  - %d marks the argument as a temporary file which will be deleted when the driver is done. It substitutes nothing.
  - %w marks the argument as the designated output file and also substitutes nothing.
  - %u%O substitutes the file name. %u generates and substitutes the temporary file name with the suffix substituted by %O.
- After the directives that give the output we have %i which substitutes the input file. This is the assembler file generated by the compiler.
- Finally, %A substitutes the asm_final spec. This lets you run some post processing on the assembler. It does nothing in our case, but if you're using the -gsplit-dwarf option with a recent version of GCC you might want to check it out.
That's it! That's the (mostly) complete trace of how the driver determines how to call the assembler.
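The whole trace can be condensed into a short Python sketch. Treat it as a loose model with several liberties: the function name is invented, the -c branch, %Y and %A are left out because they substitute nothing in our example, %w is not modeled, and mkstemp actually creates the file whereas the driver only needs a fresh name.

```python
import os
import tempfile


def assembler_command(options, input_file, asm_args):
    """Build an `as` argument vector following the walkthrough above."""
    temp_files = []          # models %d: files to delete when the driver exits
    argv = ["as"]
    if "v" in options:       # %{v}
        argv.append("-v")
    argv.extend(asm_args)    # %a: result of the asm spec, e.g. ["--64"]
    if "c" not in options:   # %{!c:-o %d%w%u%O}
        # %u makes a fresh temporary name, %O supplies the object suffix.
        handle, output_file = tempfile.mkstemp(suffix=".o")
        os.close(handle)
        temp_files.append(output_file)
        argv.extend(["-o", output_file])
    argv.append(input_file)  # %i
    return argv, temp_files


argv, temp_files = assembler_command({"v"}, "/tmp/ccJOdUuR.s", ["--64"])
print(argv)
for path in temp_files:      # the driver deletes its temporaries before exiting
    os.remove(path)
```

The printed list has the same shape as the real invocation: as, -v, --64, -o with a temporary object file, then the input file.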
Experimenting with specs, the easy way
At this point we've traced through how the assembler got its arguments from the driver in some detail—and there is plenty more that we could cover. I'm going to do that in a later article, but in the meantime I can show you the easiest way to experiment with specs that doesn't involve writing them or changing the source.
If you pass the option -### to the driver it will print all the
commands it would run without actually running them. Using this "dry
run" mode you can see how options are substituted without having to
bother with making the subprogram actually accept them.
Earlier in the asm_options spec I omitted an interesting directive:
%{I*}. This directive matches any option that starts with -I,
along with its argument. In this case it will substitute all the -I
options that were passed. For example, if we had passed -I dir1 and
-Ipath/to/headers to the driver, both of them would be substituted
in for %{I*}. You can see this by running
gcc '-###' -I dir1 -Ipath/to/headers t.c
and looking for the assembler command. The -### option is quoted
to prevent the shell from possibly interpreting it as a comment or a
pattern.
% gcc '-###' t.c -I dir -Ipath/to/headers 2>&1 | grep '^ as'
 as -I dir -I path/to/headers --64 -o /tmp/ccNz0vob.o /tmp/ccVtbBuj.s
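To see what a pattern like %{I*} is doing, here is a Python approximation that collects the -I options from an argument list. The function name is made up, and splitting the joined form (-Ipath) into two words is an assumption that simply mirrors the output above; other options may be treated differently.

```python
def include_options(argv):
    """Collect every -I option with its argument, in the order given.

    Handles both the separate form (-I dir) and the joined form (-Idir).
    """
    found = []
    index = 0
    while index < len(argv):
        arg = argv[index]
        if arg == "-I" and index + 1 < len(argv):
            # Separate form: the next word is the directory.
            found.extend(["-I", argv[index + 1]])
            index += 2
            continue
        if arg.startswith("-I") and arg != "-I":
            # Joined form: the directory follows the -I directly.
            found.extend(["-I", arg[2:]])
        index += 1
    return found


print(include_options(["t.c", "-I", "dir", "-Ipath/to/headers"]))
# ['-I', 'dir', '-I', 'path/to/headers']
```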
Experimenting with specs, the more interesting way
If you want to explore specs a bit more and don't want to start changing the compiler, a good way is to use a custom spec that will show you what gets substituted by defining a new file type.
You can do this easily with the following spec file.
.xx:
./test.sh %i
This registers a new file extension, .xx, and runs ./test.sh as the
"compiler", passing it the input file as the argument. All test.sh
has to do is echo its arguments.
You can run the driver like this to use your custom specs and "compiler".
gcc -c -specs=custom.specs file.xx
You want to pass -c so that the driver does not try to run any other
subprograms. Change the custom spec to your liking. Here's an
example that will transform all options that start with -m to ones
that start with -k. See the documentation for more details.
.xx:
./test.sh %{m*:-k%*}
If you pass -m32 -march=blah, they will be passed to your "compiler" as
-k32 -karch=blah.
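The effect of %{m*:-k%*} is a per-option rewrite: for every option that starts with -m, the part matched by the * is spliced in where %* appears. A minimal Python model of that one directive, with a hypothetical function name, looks like this.

```python
def rewrite_m_options(argv):
    """Model %{m*:-k%*}: for each -m<rest> option, substitute -k<rest>."""
    return ["-k" + arg[2:] for arg in argv if arg.startswith("-m")]


print(rewrite_m_options(["-c", "-m32", "-march=blah", "file.xx"]))
# ['-k32', '-karch=blah']
```

Options that do not match the pattern, such as -c here, contribute nothing to this directive.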
Summary
The GCC driver coordinates and runs multiple subprograms. To manage
how it specifies all the arguments to these programs it uses something
called specs. These are strings with printf-like directives that
are processed and substituted with values based on the arguments
passed to the driver and various contextual information.
You can view most, but not all, of the builtin specs to the driver by
running gcc -dumpspecs. To get the full spec for a subprogram you
need to look in the driver source code.
Once you have these specs you can trace the logic behind an option, but it is tedious work. Specs are not known for their readability. And it's pretty uncommon for anyone to have to debug the driver.
Nevertheless, specs can help you understand what the driver is doing. It's one of those odd mysteries of a common tool that aren't really explained anywhere.
In a future article I'll show some of the more esoteric parts of spec usage, how to use specs to affect the driver itself, and the idiosyncrasies of writing spec functions.
(Thanks to Matt L. Holmes for reviewing this.)