GCC specs: an introduction

If you've ever used the GNU Compiler Collection (GCC) then you've worked with the gcc binary. For those who don't know, gcc is a driver, not a compiler. It runs the compiler/assembler/linker as required and coordinates the input and output between them. Use of the driver is so ubiquitous that everyone calls it the compiler and tends to take the assembly and linking actions for granted.

I'm not here to be a nitpicky pedant and admonish you to use the correct terms. There's no reason to start saying "well actually…" about what gcc is and what it does. Keep calling it the compiler and don't worry about the assembler and linker unless it matters to you. The whole point of the driver is to abstract those (annoying!) details away.

But then, how does the driver make it so you don't have to bother with those details? Somehow the driver has to take all the arguments you provide, organize them, then run the appropriate subprograms using those arguments stitched together in some way to process the input and provide some output.

This article is about how the driver sets up the argument vectors for subprograms. Arguments are determined using specification strings, or just specs. Specs are written in a rather obscure and unintuitive language that describes how and under what conditions the driver runs each subprogram. They are partially documented in the GCC manual in the section "Specifying Subprocesses and the Switches to Pass to Them", which is probably not the name you would have looked for if I said to find the documentation for specs.

Lifting the curtain

Have you ever passed -v to the driver and studied the output? There's a lot of stuff in there that is revealing. It's also a good way to understand what specs are and why they exist.

Because the output from -v is so extensive I'm going to use one example, broken up into parts. Not all of it is relevant for understanding specs, but it may be of general interest to some.

To get the output, I ran the following command on an amd64 machine running Debian 12, which is using GCC 12.2.

gcc -v t.c

The content of the source file is not important, only that the program is in a single file. (Use int main (void) {} if you have nothing sitting around.) I've reformatted the output where needed. In some cases I've omitted output and indicate it with (...).

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v (...)
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (Debian 12.2.0-14) 
COLLECT_GCC_OPTIONS='-v' '-mtune=generic' '-march=x86-64' '-dumpdir' 'a-'

The first line says where the driver is getting its specs. It's possible to override the builtin specs but that is for a later discussion.

What follows are some environment variables that are set by gcc to communicate with the subprograms and various other information. There is also the "Configured with" line that tells you how GCC was configured, which is useful if you want to build it yourself.

In our example the driver has to compile the source using the C compiler, assemble it into an object file, then link the object into an executable. The first subprogram run is the compiler. cc1 is the C compiler, cc1plus is the C++ compiler. Note that subprogram invocations in the output are lines that start with a single space.

 /usr/lib/gcc/x86_64-linux-gnu/12/cc1 \
   -quiet \
   -v \
   -imultiarch x86_64-linux-gnu \
   t.c \
   -quiet \
   -dumpdir a- \
   -dumpbase t.c \
   -dumpbase-ext .c \
   -mtune=generic \
   -march=x86-64 \
   -version \
   -fasynchronous-unwind-tables \
   -o /tmp/ccJOdUuR.s
(...)

We only provided the arguments -v and t.c to the driver yet the compiler has many more. Some are even provided twice. If you look at the documentation for options you won't find the "dump" ones listed, for example, so the compiler doesn't use the same options as the driver. That said, some of them match. -fasynchronous-unwind-tables is a code generation option helpful when debugging if the target machine supports it. -march and -mtune control what kind of code is generated for the processor. These are options specific to the target or host. It would be tedious to specify them every time.

I've trimmed all the compiler output because it's mostly version info, although it does print the C standard it is using, some of the compiler's heuristic data, and the compiler executable checksum. It will also print the search order for headers which can be very handy when debugging header problems or just knowing where system headers reside.

As a side note, it's worth pointing out that you can run the compiler directly if you want. You can even pass it --help to see the extensive set of options it accepts. Running the compiler directly isn't something you need to do very often, even when debugging it, because gcc has a -wrapper option to do this for you.

The compiler outputs assembler source, so the next step is for the driver to run the assembler.

 as -v --64 -o /tmp/ccj7Fe2p.o /tmp/ccJOdUuR.s
(...)

Note that the input is a temporary file, as is the output. The driver has to manage all these temporaries and coordinate them across the subprograms. If you pass -save-temps then it will change the way this is done and use more intuitive names.

Finally, we have the linker[1] invocation, complete with the messiest and most complex set of arguments.

 /usr/lib/gcc/x86_64-linux-gnu/12/collect2 \
   -plugin /usr/lib/gcc/x86_64-linux-gnu/12/liblto_plugin.so \
   -plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper \
   -plugin-opt=-fresolution=/tmp/cc7cMZRs.res \
   -plugin-opt=-pass-through=-lgcc \
   -plugin-opt=-pass-through=-lgcc_s \
   -plugin-opt=-pass-through=-lc \
   -plugin-opt=-pass-through=-lgcc \
   -plugin-opt=-pass-through=-lgcc_s \
   --build-id --eh-frame-hdr -m elf_x86_64 \
   --hash-style=gnu --as-needed \
   -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie \
   /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/Scrt1.o \
   /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/crti.o \
   /usr/lib/gcc/x86_64-linux-gnu/12/crtbeginS.o \
   -L/usr/lib/gcc/x86_64-linux-gnu/12 \
   -L/usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu \
   -L/usr/lib/gcc/x86_64-linux-gnu/12/../../../../lib \
   -L/lib/x86_64-linux-gnu \
   -L/lib/../lib \
   -L/usr/lib/x86_64-linux-gnu \
   -L/usr/lib/../lib \
   -L/usr/lib/gcc/x86_64-linux-gnu/12/../../.. \
   /tmp/ccj7Fe2p.o \
   -lgcc \
   --push-state --as-needed -lgcc_s --pop-state \
   -lc -lgcc \
   --push-state --as-needed -lgcc_s --pop-state \
   /usr/lib/gcc/x86_64-linux-gnu/12/crtendS.o \
   /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/crtn.o

What the options mean is beyond the scope of this article, but it's worth noting what arguments are needed. There are object files (the .o suffix) and libraries (-l options) and library search paths (-L options). The order of these options matters, too, and the driver has to get this right. Furthermore, which system object files to use (found in /usr/lib in this example) can be non-obvious, so we should appreciate all the work the driver does behind the scenes. Manually specifying a link is tedious.

To round out the discussion of the files that the driver has to manage, all the temporary files created during the compile/assemble/link sequence are deleted before the driver exits.

All the arguments passed to the subprograms and the management of files are done with specs.

What is a spec?

The driver determines what subprograms to run based on the arguments it is given. For example, if you give it a file name with the .c suffix it will know to run the C compiler for that file; give it a file with a .cpp suffix and it will use the C++ compiler. It recognizes many file extensions and you can specify the language it should use with the -x option. Some options tell it not to run the assembler or the linker.
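As a rough mental model of that dispatch, here is a minimal shell sketch. The function name pick_compiler is made up for illustration, and it only knows the two suffixes mentioned above; the real driver recognizes many more and does not work this way internally.

```shell
# pick_compiler FILE [LANGUAGE]
# Hypothetical helper: prints which compiler the driver would choose.
# LANGUAGE stands in for an explicit -x argument and overrides the suffix.
pick_compiler() {
    file=$1
    lang=${2:-}
    if [ -z "$lang" ]; then
        # No explicit language: fall back to the file suffix.
        case "$file" in
            *.c)   lang=c ;;
            *.cpp) lang=c++ ;;
        esac
    fi
    case "$lang" in
        c)   echo cc1 ;;
        c++) echo cc1plus ;;
        *)   echo unrecognized ;;
    esac
}

pick_compiler t.c        # prints cc1
pick_compiler t.cpp      # prints cc1plus
pick_compiler t.c c++    # prints cc1plus, as with -x c++
```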

The driver determines how to run a subprogram based on a spec for the language. Each language has its own spec that has the name of the subprogram and the arguments to pass to it. A spec is a string made up of zero or more lines. Each line corresponds to a program to run so it's possible for you to run more than one program for each language or "stage" being handled.

Since the arguments needed for a subprogram can vary significantly from one invocation to the next they are rarely given directly. Instead, they are provided with directives that expand to arguments based on what the user supplied. The syntax of these directives is inspired by the ones used for printf, albeit with very different semantics.

This may sound somewhat elegant but let me disabuse you of any notion of elegance right now. We are about to wade into an ad hoc mini-language where the only reliable way to get the result you want is to experiment to find out what happens, although admittedly the lack of consistent (and documented) semantics only tends to show up when you start doing the advanced stuff.

How to read a spec (the basics)

We're going to focus on the assembler invocation since it is the subprogram with the fewest arguments and is the easiest to follow. The goal is to trace through how it gets its arguments.

To do this you have to learn how to read a spec. This is accomplished by looking at the spec documentation and casting your eyes on arguably the ugliest output GCC can produce: gcc -dumpspecs.

The full output of -dumpspecs is a beast, so here's a snippet of the dumped specs that provides one of the above options. I've reformatted the output slightly to put each directive from the spec string on its own line so it's easier to read, but keep in mind that you're not going to get output this agreeable most of the time.

*asm:
%{m16|m32:--32} \
  %{m16|m32|mx32:;:--64} \
  %{mx32:--x32} \
  %{msse2avx:%{!mavx:-msse2avx}}

The *asm: line declares this is a named spec with the name asm. This happens to be the builtin spec name for options passed to the assembler. Confusingly, this is not how all the options are passed, but we'll get to that shortly.

Let's look closely at the following directive, which is how the --64 option is passed to the assembler. It's one pattern you're going to see a lot.

%{m16|m32|mx32:;:--64}

This says that if any of the arguments -m16, -m32, or -mx32 has been passed to the driver then substitute nothing, otherwise substitute in the string --64. Let's go over it in more detail.

The general form is %{S:X;:D}. This says that if the argument -S has been given then substitute X, otherwise substitute D. (Clearly, S can also be a combination of arguments separated with the pipe symbol |.) This is a conditional and is the primary way you control the way arguments are passed.

In the above case X is empty which is why it looks so confusing. The first : indicates the end of the condition. The ; indicates the end of the substitution. Since there is nothing there, nothing (well, the empty string) is substituted. The last : indicates the start of the last substitution. You can add white space to make it more readable, but it may or may not help. You can nest them and when they nest deeply (which they will), they can be very difficult to follow.

Our example at the beginning only passed the arguments -v and t.c. Since none of -m16, -m32, or -mx32 was passed, --64 is passed to the assembler.
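If it helps to see the %{S:X;:D} form as ordinary code, here is a small shell sketch of the same decision. The function spec_cond and its calling convention are invented for this illustration; it only models the behaviour described above and is not how the driver is implemented.

```shell
# spec_cond SWITCHES THEN ELSE [driver args...]
# Hypothetical model of %{SWITCHES:THEN;:ELSE}. SWITCHES uses the spec
# syntax: option names without the leading hyphen, separated by |.
spec_cond() {
    switches=$1
    then_text=$2
    else_text=$3
    shift 3
    for arg in "$@"; do
        # Only arguments that look like options can match a switch.
        case "$arg" in
            -*) name=${arg#-} ;;
            *)  continue ;;
        esac
        case "|$switches|" in
            *"|$name|"*)
                # One of the listed switches was given: substitute THEN.
                printf '%s\n' "$then_text"
                return 0
                ;;
        esac
    done
    # None of the switches was given: substitute ELSE.
    printf '%s\n' "$else_text"
}

spec_cond 'm16|m32|mx32' '' '--64' -v t.c      # prints --64
spec_cond 'm16|m32|mx32' '' '--64' -m32 t.c    # prints an empty line
```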

If you look at the other directives in the asm spec you'll see other forms of conditionals.

  • %{m16|m32:--32} means that if either -m16 or -m32 was passed, substitute --32. Otherwise, substitute nothing. This is a conditional with only one clause and is very common. You could write it as %{m16|m32:--32;:} but that adds needless line noise so it's better not to.
  • %{mx32:--x32} should be clear by now: substitute --x32 if -mx32 was given.
  • %{msse2avx:%{!mavx:-msse2avx}} is interesting because it is how you do conjunctions. The other examples we've looked at were disjunctions. Conjunctions are done by nesting. This is where things can get really hairy. This one says that if -msse2avx was given and not -mavx then substitute -msse2avx. (You may think "oh, conjunctions aren't too bad". I invite you to take a look at the link_command spec. Come back and we can share a story or two.)
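The conjunction in the last item reads more naturally as nested if statements. The following shell sketch is only a model; has_switch and sse2avx_flag are names I made up, and the driver does not literally work like this.

```shell
# has_switch NAME [driver args...]
# Hypothetical helper: succeeds if -NAME is among the arguments.
has_switch() {
    name=$1
    shift
    for arg in "$@"; do
        if [ "$arg" = "-$name" ]; then
            return 0
        fi
    done
    return 1
}

# Model of %{msse2avx:%{!mavx:-msse2avx}}
sse2avx_flag() {
    if has_switch msse2avx "$@"; then      # outer %{msse2avx: ... }
        if ! has_switch mavx "$@"; then    # inner %{!mavx: ... }
            printf '%s\n' -msse2avx
        fi
    fi
}

sse2avx_flag -msse2avx t.c          # prints -msse2avx
sse2avx_flag -msse2avx -mavx t.c    # prints nothing
```

Each additional condition in a conjunction adds another level of nesting, which is why long conjunctions in real specs become hard to follow.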

There are a few other points to note about these directives.

  • The options in the condition are written without leading hyphens. This is a good thing because adding them would make reading a spec even harder. Internally, the driver has already decoded the options when matching happens so the hyphens are gone.
  • The substitution doesn't have to be an option, it can be any string. Usually you want it to be an option, but it can also be another directive. If it is another directive, it will be processed as one.
  • You have negation with the ! syntax. There are other "operators" but they are not common.
  • You can substitute an option directly if it was given, or nothing if it wasn't, by writing %{S} where S is the option. This is a simple short form for %{S:-S;:}.
  • The full conditional syntax allows for any number of clauses. For example, you could write %{m16|m32|mx32:;:--64} as %{m16:; m32:; mx32:; :--64}. I call this form the "maximal confusion" form because if it's used it's rarely short and always difficult to read.
  • So far there doesn't appear to be a way to match option patterns or option arguments. There is one, but we haven't gotten there yet.

From spec to arguments

We've covered all this ground and so far we've only explained one argument. And it seems to get there by magic. The documentation says that the asm spec specifies the options to pass to the assembler. Clearly there are more than just the one. What gives?

What gives is that the documentation does not tell the whole story. The full spec for the assembler is actually in the source code. It's not large so I'll reproduce it here.

%{!M:%{!MM:%{!E:%{!S:as %(asm_debug) %(asm_options) %i %A }}}}

Here you can see the messiness of specs starting to show up. If we strip away the conditionals[2] we get something more manageable.

as %(asm_debug) %(asm_options) %i %A

The %(asm_debug) and %(asm_options) directives reference named specs. They substitute in the value of that spec for further processing. If you look at the output of -dumpspecs you should find both asm_debug and asm_options. On my system they are long and mostly of no consequence so I won't completely reproduce them here, but we can go through the ones that actually produce the options. Then we can look at a few other notable ones.

asm_debug doesn't substitute anything in our running example, so let's turn our attention to asm_options.

Here's an abbreviated version of the spec. Parts that are omitted are denoted with (...). You should see something similar if you are using a recent version of GCC.

*asm_options:
(...) %{v} (...) %a %Y %{c:%W{o*}%{!o*:-o %w%b%O}}%{!c:-o %d%w%u%O}

Not all of these substitute anything, but they are notable so I have included them.

To see how we go from spec to assembler invocation, I will show the conceptual version of the full assembler spec at this point with the final assembler invocation. The conceptual version bypasses asm_debug since it substitutes nothing and substitutes in the abbreviated version of the asm_options spec.

as %{v} %a %Y %{c:%W{o*}%{!o*:-o %w%b%O}}%{!c:-o %d%w%u%O} %i %A
as -v --64 -o /tmp/ccj7Fe2p.o /tmp/ccJOdUuR.s

We will go through each directive and show what happens.

  • %{v} should be clear from earlier discussion: substitute -v because -v was passed.
  • %a and %Y are special builtin specs. %a substitutes the asm spec. (Why have a special one and not just say %(asm)? I don't know.) This is how we get the --64 option as described in the previous section. %Y is used to substitute any assembler arguments that are passed via -Wa, which is the way you tell the driver that an argument is meant specifically for the assembler. It's how you pass through an argument without the driver validating it. We did not pass any of those so it substitutes nothing.
  • The next directive is conditional and only substitutes if -c was passed to the driver. It substitutes nothing in our example. I included it, however, because it is written directly beside the next directive. This may seem significant but it actually isn't. All directives, once substituted, have spaces around them so you can't piece together multiple arguments into one argument. Why was this not written as a conditional with an else clause? Probably as a matter of style.
  • The directive %{!c:-o %d%w%u%O} is going to give us the arguments for the output file, but it looks like random gibberish so take it one step at a time and look them up in the documentation.
    • %d marks the argument as a temporary file which will be deleted when the driver is done. It substitutes nothing.
    • %w marks the argument as the designated output file and also substitutes nothing.
    • %u%O substitutes the file name. %u generates and substitutes the temporary file name with the suffix substituted by %O.
  • After the directives that give the output we have %i which substitutes the input file. This is the assembler file generated by the compiler.
  • Finally, %A substitutes the asm_final spec. This lets you run some post processing on the assembler output. It does nothing in our case, but if you're using the -gsplit-dwarf option with a recent version of GCC you might want to check it out.
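To tie the trace together, here is a shell sketch that rebuilds the assembler command line from driver arguments. It is a simplified model under several assumptions: the function name model_as_command is hypothetical, the output and input names are passed in rather than generated, asm_debug and asm_final are treated as empty, and the -msse2avx part of the asm spec and the -c case are left out.

```shell
# model_as_command OUTPUT INPUT [driver args...]
# Hypothetical model. OUTPUT stands in for the temporary name that %u%O
# would generate and INPUT for the file that %i refers to.
model_as_command() {
    output=$1
    input=$2
    shift 2
    verbose=""     # %{v}
    m16_m32=no     # switches examined by %a (the asm spec)
    mx32=no
    passthru=""    # %Y: options handed over with -Wa,
    for arg in "$@"; do
        case "$arg" in
            -v)        verbose="-v" ;;
            -m16|-m32) m16_m32=yes ;;
            -mx32)     mx32=yes ;;
            # -Wa,x,y is split at the commas into separate options.
            -Wa,*)     passthru="$passthru $(printf '%s' "${arg#-Wa,}" | tr ',' ' ')" ;;
        esac
    done
    # The first three directives of the asm spec, in order.
    arch=""
    if [ "$m16_m32" = yes ]; then arch="--32"; fi
    if [ "$m16_m32" = no ] && [ "$mx32" = no ]; then arch="--64"; fi
    if [ "$mx32" = yes ]; then arch="$arch --x32"; fi
    # Unquoted on purpose: empty pieces vanish and the rest split on spaces.
    set -- as $verbose $arch $passthru -o "$output" "$input"
    printf '%s\n' "$*"
}

model_as_command /tmp/ccj7Fe2p.o /tmp/ccJOdUuR.s -v t.c
# prints: as -v --64 -o /tmp/ccj7Fe2p.o /tmp/ccJOdUuR.s
```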

That's it! That's the (mostly) complete trace of how the driver determines how to call the assembler.

Experimenting with specs, the easy way

At this point we've traced through how the assembler got its arguments from the driver in some detail—and there is plenty more that we could cover. I'm going to do that in a later article, but in the meantime I can show you the easiest way to experiment with specs that doesn't involve writing them or changing the source.

If you pass the option -### to the driver it will print all the commands it would run without actually running them. Using this "dry run" mode you can see how options are substituted without having to bother with making the subprogram actually accept them.

Earlier in the asm_options spec I omitted an interesting directive: %{I*}. This directive matches any option that starts with -I, along with its argument. In this case it will substitute all the -I options that were passed. For example, if we had passed -I dir and -Ipath/to/headers to the driver, both of them would be substituted in for %{I*}. You can see this by running

gcc '-###' -I dir -Ipath/to/headers t.c

and looking for the assembler command. The -### option is quoted to prevent the shell from possibly interpreting it as a comment or a pattern.

% gcc '-###' t.c -I dir -Ipath/to/headers 2>&1 | grep '^ as'
 as -I dir -I path/to/headers --64 -o /tmp/ccNz0vob.o /tmp/ccVtbBuj.s

Experimenting with specs, the more interesting way

If you want to explore specs a bit more and don't want to start changing the compiler, a good way is to use a custom spec that will show you what gets substituted by defining a new file type.

You can do this easily with the following spec file.

.xx:
./test.sh %i

This will register a new file extension .xx and run the "compiler" ./test.sh, passing it the input file as the argument. All test.sh has to do is echo its arguments.
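One possible test.sh is sketched below. The commands write the script into the current directory and make it executable; the "arg:" prefix and the one-argument-per-line layout are my own choices, made so that argument boundaries stay visible.

```shell
# Create the stand-in "compiler" in the current directory.
cat > test.sh <<'EOF'
#!/bin/sh
# Print each argument on its own line so argument boundaries stay visible.
for arg in "$@"; do
    printf 'arg: %s\n' "$arg"
done
EOF
chmod +x test.sh
```

Printing one argument per line makes it obvious when a directive produced a single argument containing a space rather than two separate arguments.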

You can run the driver like this to use your custom specs and "compiler".

gcc -c -specs=custom.specs file.xx

You want to pass -c so that the driver does not try to run any other subprograms. Change the custom spec to your liking. Here's an example that will transform all options that start with -m to ones that start with -k. See the documentation for more details.

.xx:
./test.sh %{m*:-k%*}

If you pass -m32 -march=blah it will be passed to your "compiler" as -k32 -karch=blah.
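For comparison, the same rewrite expressed as plain shell looks like the sketch below. The function name rewrite_m_switches is hypothetical and this only models the result described above: every option beginning with -m is matched, and %* stands for whatever followed the m.

```shell
# rewrite_m_switches [driver args...]
# Hypothetical model of %{m*:-k%*}: for each option that starts with -m,
# emit -k followed by the rest of the option (the part %* refers to).
rewrite_m_switches() {
    result=""
    for arg in "$@"; do
        case "$arg" in
            -m*) result="$result -k${arg#-m}" ;;
        esac
    done
    # Drop the leading space left by the first append.
    printf '%s\n' "${result# }"
}

rewrite_m_switches -m32 -march=blah -v file.xx    # prints -k32 -karch=blah
```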

Summary

The GCC driver coordinates and runs multiple subprograms. To manage how it specifies all the arguments to these programs it uses something called specs. These are strings with printf-like directives that are processed and substituted with values based on the arguments passed to the driver and various contextual information.

You can view most, but not all, of the driver's builtin specs by running gcc -dumpspecs. To get the full spec for a subprogram you need to look in the driver source code.

Once you have these specs you can trace the logic behind an option, but it is tedious work. Specs are not known for their readability. And it's pretty uncommon for anyone to have to debug the driver.

Nevertheless, specs can help you understand what the driver is doing. It's one of those odd mysteries of a common tool that aren't really explained anywhere.

In a future article I'll show some of the more esoteric parts of spec usage, how to use specs to affect the driver itself, and the idiosyncrasies of writing spec functions.

(Thanks to Matt L. Holmes for reviewing this.)

Footnotes:

[1] Strictly speaking, collect2 is a wrapper for the actual linker. Explaining collect2 is something else entirely.

[2] The options in the conditionals are those that tell the driver not to run the assembler.

January 2, 2024

comment@wozniak.ca
