<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>Curried Lambda</title>
    <link>https://wozniak.ca/blog</link>
    <atom:link href="https://wozniak.ca/blog/rss.xml" rel="self" type="application/rss+xml"/>
    <description>Geoff Wozniak's blog</description>
    <language>en-CA</language>
    <webMaster>comment@wozniak.ca (Geoff Wozniak)</webMaster>
    <item>
      <title>GCC specs for those deeply interested</title>
      <link>https://wozniak.ca//blog/2024/01/09/1/index.html</link>
      <guid>https://wozniak.ca//blog/2024/01/09/1/index.html</guid>
      <pubDate>Tue, 09 Jan 2024 00:47:00 -0500</pubDate>
      <description><![CDATA[<div id="content" class="content">

<p>
In the <a href="https://wozniak.ca/blog/2024/01/02/1/index.html">previous article</a> on GCC specs I covered the basics.  In this
article I'm going to get into advanced usage and clarify some
statements from the previous one that were (intentionally) misleading.
</p>

<p>
There will be links to the driver source here as it helps in
understanding what is going on.  It will be helpful to at least glance
at it.  It is also a good idea to review the parts about experimenting
with specs in the previous article because I'm not going to
demonstrate as many behaviours and you may want to try things as you
go.
</p>

<p>
For convenience, here's a link to the <a href="https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Spec-Files.html">official documentation</a> on specs.
It's good to have that handy.
</p>

<div id="outline-container-orge3c9a3f" class="outline-2">
<h2 id="orge3c9a3f">Where the driver finds specs</h2>
<div class="outline-text-2" id="text-orge3c9a3f">
<p>
To understand how specs are found you have to look at the driver
source code.  Specifically, the method <a href="https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/gcc.cc;h=fbcc9d03314db187f8342dd9eebd21072106fcf9;hb=8fc1a49c9312b05d925b7d21f1d2145d70818151#l8311"><code>driver::set_up_specs</code></a>.
</p>

<p>
Near the beginning of the method you'll see that it looks for a file
called <code>specs</code> in the startfile prefixes.  You can find the startfile
prefixes by running <code>gcc -print-search-dirs</code> and looking at the
"libraries" entry.  See the code in <a href="https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/gcc.cc;h=fbcc9d03314db187f8342dd9eebd21072106fcf9;hb=8fc1a49c9312b05d925b7d21f1d2145d70818151#l8643"><code>driver::maybe_print_and_exit</code></a> for a
reference.
</p>

<p>
The output of <code>-print-search-dirs</code> can be hard to read (this is a
theme of GCC output) so you can use the following to simplify it,
preserving the search order.
</p>

<pre class="example" id="org746340b">
gcc -print-search-dirs \
  | sed -n -e '/^libraries: =/ {s/.*: =//; s/:/\n/g; p}' \
  | xargs realpath -q \
  | awk '{if (!seen[$0]) print; seen[$0] = 1;}'
</pre>

<p>
Once you have this list you might think that you can put a file named
<code>specs</code> in one of the directories and it will override the built-in
specs.
</p>

<p>
Is it really that simple?  Of course not.
</p>

<p>
First of all, the list of startfile prefixes at the time this spec
file is read is not the same as the one printed via
<code>-print-search-dirs</code>.  The directory that probably will work is going
to be the same one that shows up in the "install" entry.  You'll know
you've found it when the initial output from <code>-v</code> or <code>-###</code> starts
with "Reading specs from" instead of "Using built-in specs".
</p>
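<p>
A quick way to check which case you're in is sketched below.  It
assumes a native <code>gcc</code> on your <code>PATH</code> and writes a
throwaway <code>t.c</code> to the current directory.
</p>

<pre class="example">
# Something for the driver to work on.
echo 'int main (void) { return 0; }' &gt; t.c

# The "install" directory reported by the driver.
gcc -print-search-dirs | sed -n 's/^install: //p'

# The first line of -### output (on stderr) is either
# "Using built-in specs." or "Reading specs from ...".
gcc -### -c t.c 2&gt;&amp;1 | head -n 1
</pre>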

<p>
And more importantly, this file won't actually override the built-in
specs!
</p>

<p>
When a spec file is read the driver eventually calls <a href="https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/gcc.cc;h=fbcc9d03314db187f8342dd9eebd21072106fcf9;hb=8fc1a49c9312b05d925b7d21f1d2145d70818151#l2044"><code>set_spec</code></a>.  In
that function if the global array of specs is not initialized it will
initialize it to the set of static built-in ones.  So you can't
actually override the built-in specs unless you override each entry.
</p>

<p>
This is a lot of work to do manually, so if you really want to change
them all it's best to save the built-in ones and alter them.
</p>
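<p>
Here's a minimal sketch of that approach.  It assumes the directory
containing <code>libgcc.a</code> (from <code>-print-libgcc-file-name</code>)
is the same as the "install" entry, which is typical for a native
compiler but worth confirming with <code>-print-search-dirs</code>.
Writing there usually requires root, so the copy is commented out.
</p>

<pre class="example">
# Save the built-in specs to a file you can edit.
gcc -dumpspecs &gt; specs

# A likely place for the driver to pick it up: next to libgcc.a.
libdir=$(dirname "$(gcc -print-libgcc-file-name)")
echo "candidate location: $libdir/specs"

# After editing, install it (check the path first):
# cp specs "$libdir/specs"
</pre>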

<p>
You can always pass specs using the <code>--specs</code> option (these are "user
specs"), but you may want to use the specially placed spec files if
you have some site-wide change you want to make, or you need to set
specs early and affect spec processing that happens before user specs
are processed.
</p>

<p>
Any argument to the <code>--specs</code> option will be searched for in the
library directories if the name of the file is not an absolute path.
This means you can ship spec files with the compiler and use them by
name instead of using full pathnames.  If you have the Arm GCC cross
compiler installed you probably use this feature when linking
bare-metal applications.  In that case you are likely passing
<code>--specs=nosys.specs</code> to link with a dummy syscall library.  So this
mechanism does get used in actual compiler distributions.
</p>

<p>
It's also worth noting that you can't use <code>-dumpspecs</code> to see the
specs that you've set.  When dumping specs the driver always uses the
built-in ones and exits before any spec files are read.  If you want
to view the specs that get set by any files you provide you have to
build the driver with <code>-DDEBUG_SPECS</code> added to CXXFLAGS and use <code>-v</code>
when running.
</p>
</div>
</div>

<div id="outline-container-orgba21c8e" class="outline-2">
<h2 id="orgba21c8e">Augmenting specs</h2>
<div class="outline-text-2" id="text-orgba21c8e">
<p>
Wholesale replacement of the built-in specs is generally not a good
idea.  What you'll likely want to do is alter an existing spec by
adding to the beginning or the end.  There are standard ways to do
this.
</p>

<p>
To add to the beginning you have to rename a spec.  There is an
example in the documentation.  Here is another that will add
<code>--special-option</code> before any other assembler arguments in the <code>asm</code>
spec.
</p>

<pre class="example" id="org9d09669">
%rename asm old_asm

*asm:
--special-option %(old_asm)
</pre>
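<p>
To try this without touching your installation, save the example in a
file (<code>prepend.specs</code> is just a name I picked) and look at the
assembler command printed by <code>-###</code>.  Since <code>-###</code>
only prints the commands, the made-up <code>--special-option</code> never
actually reaches <code>as</code>.
</p>

<pre class="example">
cat &gt; prepend.specs &lt;&lt;'EOF'
%rename asm old_asm

*asm:
--special-option %(old_asm)
EOF

echo 'int main (void) { return 0; }' &gt; t.c

# --special-option should appear ahead of the arguments that come
# from the original asm spec (such as --64).
gcc --specs=prepend.specs -### -c t.c 2&gt;&amp;1 | grep -e '--special-option'
</pre>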

<p>
To add to the end of a spec, use the plus sign (<code>+</code>) followed by a
space.  The documentation commits a sin of omission here because it
says that the spec only has to start with the <code>+</code> symbol for it to be
appended.
</p>

<p>
The example below would enable stack protector for every C file you
compile.
</p>

<pre class="example" id="org23fc4ea">
*cc1:
+ -fstack-protector
</pre>
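<p>
The same kind of check works here.  Assuming the example is saved as
<code>append.specs</code>, the <code>cc1</code> command printed by
<code>-###</code> should now include <code>-fstack-protector</code>.
</p>

<pre class="example">
cat &gt; append.specs &lt;&lt;'EOF'
*cc1:
+ -fstack-protector
EOF

echo 'int main (void) { return 0; }' &gt; t.c

# -fstack-protector should now be among the cc1 arguments.
gcc --specs=append.specs -### -c t.c 2&gt;&amp;1 | grep cc1
</pre>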

<p>
Altering the full contents of a spec string without copying it to a
file and modifying it requires messing around with spec functions and
changing the source code.  It's not something you'll want to do unless
you have some really specific requirements.  (And if you think you
really need to do this, I recommend you keep thinking.)
</p>

<p>
Enterprising readers may have noticed that you can override the specs
for a suffix, which means you can change the spec used for file types.
If you want to do this effectively you need to study the default file
type specs used by the driver.  Those are found in the array
<a href="https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/gcc.cc;h=fbcc9d03314db187f8342dd9eebd21072106fcf9;hb=8fc1a49c9312b05d925b7d21f1d2145d70818151#l1412"><code>default_compilers</code></a>.
</p>

<p>
To get your imagination going, here's how you can change the spec for
<code>.s</code> files to run a different assembler.  The conditionals that would prevent the
assembler from running have been omitted for clarity.
</p>

<pre class="example" id="org47c8ab8">
.s:
custom-as %(asm_debug) %(asm_options) %i %A 
</pre>

<p>
If you do this you may notice that compiling a <code>.c</code> file does not
result in your custom assembler being used.  This is because spec
processing doesn't actually go in "compiler-assembler-linker" order;
it goes in "compiler-linker" order.  It is up to the spec for the
language to invoke a series of commands to create an object file for
the linker.  This is what the driver considers to be a "compiler".
The assembler spec we have defined above is the compiler for files
given to the driver that end with <code>.s</code>.
</p>
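<p>
You can confirm this with <code>-###</code>.  The sketch below assumes
the spec above is saved as <code>myas.specs</code>; <code>custom-as</code>
doesn't need to exist because nothing is run.  The <code>.s</code> input
should show <code>custom-as</code>, while the <code>.c</code> input should
still show the regular assembler.
</p>

<pre class="example">
cat &gt; myas.specs &lt;&lt;'EOF'
.s:
custom-as %(asm_debug) %(asm_options) %i %A
EOF

printf '\t.text\n' &gt; t.s
echo 'int main (void) { return 0; }' &gt; t.c

# A .s input goes through the overridden spec...
gcc --specs=myas.specs -### -c t.s 2&gt;&amp;1 | grep custom-as

# ...but a .c input reaches the assembler through its own language spec.
gcc --specs=myas.specs -### -c t.c 2&gt;&amp;1 | grep custom-as \
  || echo "no custom-as for t.c"
</pre>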

<p>
If you look at the spec for the <code>.c</code> files in the <a href="https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/gcc.cc;h=fbcc9d03314db187f8342dd9eebd21072106fcf9;hb=8fc1a49c9312b05d925b7d21f1d2145d70818151#l1438">driver source</a> it
will be associated with the <code>@c</code> entry which is the actual spec used.
Reading it will eventually reveal that it uses the <code>invoke_as</code> spec to
call the assembler.  Doing a full override of specs is tough: there
are many cases to consider!
</p>

<p>
(Side note: you can override <code>@</code> entries.  They are considered to be
special suffixes.  The above example could be written with
<code>@assembler:</code> as the first line.  In this case it would work for any
file suffix that is aliased to the assembler.)
</p>
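<p>
A sketch of the <code>@assembler</code> variant, checked the same way.
Here <code>prog.asm</code> is an arbitrary file name that is forced
through the assembler entry with <code>-x assembler</code>; if the
override applies to everything aliased to the assembler, both commands
should show <code>custom-as</code>.
</p>

<pre class="example">
cat &gt; at-assembler.specs &lt;&lt;'EOF'
@assembler:
custom-as %(asm_debug) %(asm_options) %i %A
EOF

printf '\t.text\n' &gt; t.s
printf '\t.text\n' &gt; prog.asm

# The .s suffix maps to @assembler, so this should use custom-as...
gcc --specs=at-assembler.specs -### -c t.s 2&gt;&amp;1 | grep custom-as

# ...and so should any file given the assembler language with -x.
gcc --specs=at-assembler.specs -### -x assembler -c prog.asm 2&gt;&amp;1 | grep custom-as
</pre>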
</div>
</div>

<div id="outline-container-org479c588" class="outline-2">
<h2 id="org479c588">Altering the driver's arguments</h2>
<div class="outline-text-2" id="text-org479c588">
<p>
There is an interesting entry in the <code>-dumpspecs</code> output that will
almost certainly be empty in your installation: <code>self_spec</code>.
</p>

<p>
What is interesting about the self spec is that it is applied to the
arguments for the driver.  This means you can change the driver
arguments from within the driver itself.
</p>

<p>
The driver has built-in self specs that can only be altered by
rebuilding from source.  They are defined by the target backend.
However, you can define user self specs which are <a href="https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/gcc.cc;h=fbcc9d03314db187f8342dd9eebd21072106fcf9;hb=8fc1a49c9312b05d925b7d21f1d2145d70818151#l8490">applied</a> after all
other specs have been read.
</p>

<p>
Use of self specs tends to be target-specific or needed for highly
customized situations.  Anything added via a self spec has to be an
argument that starts with a hyphen (<code>-</code>).
</p>

<p>
For an illustrative—but not really motivational—example, suppose
we changed the driver options such that all instances of <code>-m32</code> are
removed and we always add <code>-m64</code> to the arguments.  It can be done
this way.
</p>

<pre class="example" id="orge63111c">
*self_spec:
%&lt;m32 -m64
</pre>

<p>
Changing the driver arguments affects <i>all</i> subprocesses and must be
done with care.  You can cause a lot of trouble with this; it's much
like <code>eval</code> in dynamic languages.
</p>

<p>
When the driver runs a subprocess the environment variable
<code>COLLECT_GCC_OPTIONS</code> contains all the driver arguments after all the
self specs have been processed.  When running with verbose output use
it to debug any use of self specs.
</p>
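<p>
Here's a sketch that saves the <code>self_spec</code> example above as
<code>self.specs</code>, asks for <code>-m32</code>, and prints the
<code>COLLECT_GCC_OPTIONS</code> lines.  <code>-###</code> should print
them just like <code>-v</code> does, but without running anything.  If the
self spec is applied you should see <code>'-m64'</code> and no
<code>'-m32'</code>.
</p>

<pre class="example">
cat &gt; self.specs &lt;&lt;'EOF'
*self_spec:
%&lt;m32 -m64
EOF

echo 'int main (void) { return 0; }' &gt; t.c

# Ask for -m32; the self spec should remove it and add -m64.
gcc --specs=self.specs -### -m32 -c t.c 2&gt;&amp;1 | grep COLLECT_GCC_OPTIONS
</pre>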
</div>
</div>

<div id="outline-container-org066abcf" class="outline-2">
<h2 id="org066abcf">Subtleties around spaces</h2>
<div class="outline-text-2" id="text-org066abcf">
<p>
In the previous article I said this:
</p>

<blockquote>
<p>
All directives, once substituted, have spaces around them so you can't
piece together multiple arguments into one argument.
</p>
</blockquote>

<p>
That's not entirely true.  You can concatenate multiple arguments into
a single one, but it will not always work as expected.  (There were
multiple counterexamples in the previous article, if you looked
closely.)
</p>

<p>
The official documentation says this about directives.
</p>

<blockquote>
<p>
Note that spaces are not generated automatically around the results of
expanding these sequences. Therefore you can concatenate them together
or combine them with constant text in a single argument.
</p>
</blockquote>

<p>
Let's test that out using the following spec file, assuming I have a
file <code>test.sh</code> in the current directory that only prints its
arguments.
</p>
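<p>
If you want to follow along, <code>test.sh</code> can be as simple as
the sketch below, which prints its arguments separated by single
spaces.  The rest reproduces the first experiment; <code>t.xx</code> can
be empty since the spec never reads it.
</p>

<pre class="example">
cat &gt; test.sh &lt;&lt;'EOF'
#!/bin/sh
# Print all arguments on one line, separated by single spaces.
echo "$@"
EOF
chmod +x test.sh

# The driver insists that input files exist; the content doesn't matter.
: &gt; t.xx

cat &gt; local.specs &lt;&lt;'EOF'
.xx:
./test.sh /%{O1:1/opt}/%{g3:3/debug}
EOF

gcc --specs=local.specs -c t.xx -O1 -g3
</pre>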

<pre class="example" id="org285466c">
.xx:
./test.sh /%{O1:1/opt}/%{g3:3/debug}
</pre>

<p>
Trying this out, I get
</p>

<pre class="example" id="orgc7635aa">
% gcc --specs=local.specs -c t.xx -O1 -g3
/1/opt/3/debug
</pre>

<p>
Great!  But this is specific to the <code>-O</code> and <code>-g</code> levels.  Let's say I
want to generalize it and base it off the values passed to those
options.
</p>

<pre class="example" id="org6ff92f5">
.xx:
./test.sh /%{O*:%*/opt}/%{g*:%*/debug}
</pre>

<p>
Now I get something different.
</p>

<pre class="example" id="orgb4706c5">
% gcc --specs=local.specs -c t.xx -O1 -g3
/1/opt /3/debug
</pre>

<p>
The <code>%*</code> substitution will cause a space to be added because of the
code in <a href="https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/gcc.cc;h=fbcc9d03314db187f8342dd9eebd21072106fcf9;hb=8fc1a49c9312b05d925b7d21f1d2145d70818151#l7490"><code>give_switch</code></a>.
</p>

<p>
While it is true that directives are substituted without adding a
space (notwithstanding the example above), a spec could have leading
or trailing spaces. Consider the following example.
</p>

<pre class="example" id="orgec2796c">
# There is a space after the word "middle"
*stuff:
%{dummy} middle 

.xx:
./test.sh BEGIN%(stuff)END
</pre>

<pre class="example" id="org0473de9">
% gcc --specs=local.specs -c t.xx
BEGIN middle END
</pre>

<p>
In this case <code>%{dummy}</code> expands to the empty string which is followed
by a space, and the spec for <code>stuff</code> ends in a space.  (Note that
leading spaces in a spec are skipped when it is read but not when it
is processed.)  Sure, the directives are concatenated.  But that does
<i>not</i> mean that putting two specs directly beside each other will
result in a contiguous string with no space separating them.  This
means if you are trying to create pathnames using substitutions, for
example, it can be tricky.
</p>
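<p>
Because a script like <code>test.sh</code> joins its arguments with
spaces, its output can't tell you whether the driver produced one
argument or several.  A variant that brackets each argument (called
<code>args.sh</code> here) makes the boundaries visible.  The sketch
reruns both experiments with it; the <code>printf</code> makes the
trailing space after "middle" explicit.  If the spaces split arguments
as described above, the first command should print two bracketed lines
and the second three.
</p>

<pre class="example">
cat &gt; args.sh &lt;&lt;'EOF'
#!/bin/sh
# Print each argument in brackets on its own line so argument
# boundaries (and any stray spaces) are visible.
printf '[%s]\n' "$@"
EOF
chmod +x args.sh
: &gt; t.xx

# The %* experiment.
cat &gt; star.specs &lt;&lt;'EOF'
.xx:
./args.sh /%{O*:%*/opt}/%{g*:%*/debug}
EOF
gcc --specs=star.specs -c t.xx -O1 -g3

# The "stuff" experiment (%% is a literal % in printf).
printf '*stuff:\n%%{dummy} middle \n\n.xx:\n./args.sh BEGIN%%(stuff)END\n' &gt; stuff.specs
gcc --specs=stuff.specs -c t.xx
</pre>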
</div>
</div>

<div id="outline-container-orgb562e7d" class="outline-2">
<h2 id="orgb562e7d">Spec functions</h2>
<div class="outline-text-2" id="text-orgb562e7d">
<p>
Spec functions are a generic mechanism for providing specs.  They must
be defined in source code.  They deserve their own article so I will
not be covering them much here.
</p>

<p>
Spec functions can be used in two ways: for direct substitution or
conditional substitution.  Most spec functions only make sense in one
of those contexts.  In a conditional context, if a spec function
returns NULL it is considered false.  If it returns any string (even
empty) it is considered true.
</p>

<p>
Target backends may define their own spec functions but these will not
be in the official documentation.  You have to go into the source to
find them.
</p>

<p>
Here's an example that adds the <code>-g3</code> argument for <code>cc1</code> if the last
<code>-O</code> argument is <code>-O2</code> or higher.
</p>

<pre class="example" id="org7a93afa">
*cc1:
%{%:gt(%{O*:%*;:0} 1):-g3}
</pre>
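<p>
To watch the condition flip, you can save the example as
<code>gt.specs</code> and check the <code>cc1</code> command at a few
optimization levels.  Like the example, this replaces the
<code>cc1</code> spec rather than appending to it, which is fine for an
experiment with <code>-###</code>.  Only the <code>-O2</code> and
<code>-O3</code> runs should report <code>-g3</code>.
</p>

<pre class="example">
echo 'int main (void) { return 0; }' &gt; t.c

cat &gt; gt.specs &lt;&lt;'EOF'
*cc1:
%{%:gt(%{O*:%*;:0} 1):-g3}
EOF

# Report whether the cc1 command gets -g3 at each level.
for opt in -O0 -O1 -O2 -O3; do
  if gcc --specs=gt.specs -### "$opt" -c t.c 2&gt;&amp;1 | grep cc1 | grep -q -e '-g3'; then
    echo "$opt: -g3 added"
  else
    echo "$opt: no -g3"
  fi
done
</pre>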

<p>
The semantics of spec functions are where specs get really complicated
and bizarre.  Stay tuned for another article that will delve into that
mess.
</p>
</div>
</div>
</div>]]></description>
    </item>
    <item>
      <title>Site hosting change</title>
      <link>https://wozniak.ca//blog/2024/01/06/1/index.html</link>
      <guid>https://wozniak.ca//blog/2024/01/06/1/index.html</guid>
      <pubDate>Sat, 06 Jan 2024 11:57:00 -0500</pubDate>
      <description><![CDATA[<div id="content" class="content">

<p>
I've moved the site to being hosted on Gitlab instead of Fastmail
since Fastmail cannot deal with an influx of requests.
</p>

<p>
Is Fastmail a great hosting provider?  No, but they are not meant to
host sites.  I used them because I use them for email and hosting a
static site with them is easy.  I guess if something gets popular for
a bit they can't handle it.
</p>

<p>
I don't like Gitlab all that much so I'll probably move it again at
some point.  It will take me a while to figure out where that will be.
For now Gitlab works.
</p>
</div>]]></description>
    </item>
    <item>
      <title>GCC specs: an introduction</title>
      <link>https://wozniak.ca//blog/2024/01/02/1/index.html</link>
      <guid>https://wozniak.ca//blog/2024/01/02/1/index.html</guid>
      <pubDate>Tue, 02 Jan 2024 21:17:00 -0500</pubDate>
      <description><![CDATA[<div id="content" class="content">

<p>
If you've ever used the <a href="https://gcc.gnu.org/">GNU Compiler Collection</a> (GCC) then you've
worked with the <code>gcc</code> binary.  For those who don't know, <code>gcc</code> is a
driver, not a compiler.  It runs the compiler/assembler/linker as
required and coordinates the input and output between them.  Use of
the driver is so ubiquitous that everyone calls it the compiler and
tends to take the assembly and linking actions for granted.
</p>

<p>
I'm not here to be a nitpicky pedant and admonish you to use the
correct terms.  There's no reason to start saying "well actually…"
about what <code>gcc</code> is and what it does.  Keep calling it the compiler
and don't worry about the assembler and linker unless it matters to
you.  The whole point of the driver is to abstract those
(annoying!) details away.
</p>

<p>
But then, how does the driver make it so you don't have to bother with
those details?  Somehow the driver has to take all the arguments you
provide, organize them, then run the appropriate subprograms using
those arguments stitched together in some way to process the input and
provide some output.
</p>

<p>
This article is about how the driver sets up the argument vectors for
subprograms.  Arguments are determined using <i>specification strings</i>,
or just <i>specs</i>.  Specs are a rather obscure and unintuitive
language that describes how and under what conditions the driver runs
each subprogram.  They are partially documented in the GCC manual in
the section <a href="https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Spec-Files.html">Specifying Subprocesses and the Switches to Pass to Them</a>,
which is probably not the name you would have looked for if I said to
find the documentation for specs.
</p>

<div id="outline-container-org037864e" class="outline-2">
<h2 id="org037864e">Lifting the curtain</h2>
<div class="outline-text-2" id="text-org037864e">
<p>
Have you ever passed <code>-v</code> to the driver and studied the output?
There's a lot of stuff in there that is revealing.  It's also a good
way to understand what specs are and why they exist.
</p>

<p>
Because the output from <code>-v</code> is so extensive I'm going to use one
example, broken up into parts.  Not all of it is relevant for
understanding specs, but it may be of general interest to some.
</p>

<p>
To get the output, I ran the following command on an amd64 machine
running Debian 12, which is using GCC 12.2.
</p>

<pre class="example" id="org01f3a3a">
gcc -v t.c
</pre>

<p>
The content of the source file is not important, only that the program
is in a single file.  (Use <code>int main (void) {}</code> if you have nothing
sitting around.)  I've reformatted the output where needed.  In some
cases I've omitted output and indicate it with <code>(...)</code>.
</p>

<pre class="example" id="org1929ad6">
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v (...)
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (Debian 12.2.0-14) 
COLLECT_GCC_OPTIONS='-v' '-mtune=generic' '-march=x86-64' '-dumpdir' 'a-'
</pre>

<p>
The first line says where the driver is getting its specs.  It's
possible to override the builtin specs but that is for a later
discussion.
</p>

<p>
What follows are some environment variables that are set by <code>gcc</code> to
communicate with the subprograms and various other information.  There
is also the "Configured with" line that tells you how GCC was
configured, which is useful if you want to build it yourself.
</p>

<p>
In our example the driver has to compile the source using the C
compiler, assemble it into an object file, then link the object into
an executable.  The first subprogram run is the compiler.  <code>cc1</code> is
the C compiler, <code>cc1plus</code> is the C++ compiler.  Note that subprogram
invocations in the output are lines that start with a single space.
</p>
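<p>
If you only want the subprogram commands, <code>-###</code> is handy: it
prints the commands, with arguments quoted, instead of running them.
Filtering <code>-v</code> output on a leading space works too, but it
also picks up the header search directories that <code>cc1</code>
prints, since those lines start with a space as well.
</p>

<pre class="example">
echo 'int main (void) {}' &gt; t.c

# Only the subprogram commands; each starts with a single space.
gcc -### t.c 2&gt;&amp;1 | grep '^ '
</pre>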

<pre class="example" id="orgd7ca7ad">
 /usr/lib/gcc/x86_64-linux-gnu/12/cc1 \
   -quiet \
   -v \
   -imultiarch x86_64-linux-gnu \
   t.c \
   -quiet \
   -dumpdir a- \
   -dumpbase t.c \
   -dumpbase-ext .c \
   -mtune=generic \
   -march=x86-64 \
   -version \
   -fasynchronous-unwind-tables \
   -o /tmp/ccJOdUuR.s
(...)
</pre>

<p>
We only provided the arguments <code>-v</code> and <code>t.c</code> to the driver yet the
compiler has many more.  Some are even provided twice.  If you look at
the <a href="https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Invoking-GCC.html">documentation</a> for options you won't find the "dump" ones listed,
for example, so the compiler doesn't use the same options as the
driver.  That said, some of them match.
<code>-fasynchronous-unwind-tables</code> is a code generation option helpful
when debugging if the target machine supports it.  <code>-march</code> and
<code>-mtune</code> control what kind of code is generated for the processor.
These are options specific to the target or host.  It would be tedious
to specify them every time.
</p>

<p>
I've trimmed all the compiler output because it's mostly version info,
although it does print the C standard it is using, some of the
compiler's heuristic data, and the compiler executable checksum.  It
will also print the search order for headers which can be very handy
when debugging header problems or just knowing where system headers
reside.
</p>

<p>
As a side note, it's worth pointing out that you can run the compiler
directly if you want.  You can even pass it <code>--help</code> to see the
extensive set of options it accepts.  Running the compiler directly
isn't something you need to do very often, even when debugging it,
because <code>gcc</code> has a <a href="https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Overall-Options.html#index-wrapper">-wrapper</a> option to do this for you.
</p>
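<p>
The GCC manual's example is <code>-wrapper gdb,--args</code>, which runs
each subprogram under <code>gdb</code>.  A cheap trick along the same
lines, sketched below, is to use <code>echo</code> as the wrapper: each
subprogram command should be printed rather than run.
</p>

<pre class="example">
echo 'int main (void) { return 0; }' &gt; t.c

# The manual's example: run each subprogram under gdb (interactive).
# gcc -c t.c -wrapper gdb,--args

# Print each subprogram command instead of running it.
gcc -c t.c -wrapper echo
</pre>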

<p>
The compiler outputs assembler source, so the next step is for the
driver to run the assembler.
</p>

<pre class="example" id="org665d584">
 as -v --64 -o /tmp/ccj7Fe2p.o /tmp/ccJOdUuR.s
(...)
</pre>

<p>
Note that the input is a temporary file, as is the output.  The driver
has to manage all these temporaries and coordinate them across the
subprograms.  If you pass <code>-save-temps</code> then it will change the way
this is done and use more intuitive names.
</p>
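<p>
A quick way to see this: with <code>-c</code> and
<code>-save-temps</code>, the preprocessed source, assembler source,
and object file should all be left next to <code>t.c</code>.
</p>

<pre class="example">
echo 'int main (void) { return 0; }' &gt; t.c

# Keep the intermediate files instead of using temporaries.
gcc -save-temps -c t.c
ls t.*
</pre>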

<p>
Finally, we have the linker invocation, complete with the
messiest and most complex set of arguments.
</p>

<pre class="example" id="org84b688c">
 /usr/lib/gcc/x86_64-linux-gnu/12/collect2 \
   -plugin /usr/lib/gcc/x86_64-linux-gnu/12/liblto_plugin.so \
   -plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper \
   -plugin-opt=-fresolution=/tmp/cc7cMZRs.res \
   -plugin-opt=-pass-through=-lgcc \
   -plugin-opt=-pass-through=-lgcc_s \
   -plugin-opt=-pass-through=-lc \
   -plugin-opt=-pass-through=-lgcc \
   -plugin-opt=-pass-through=-lgcc_s \
   --build-id --eh-frame-hdr -m elf_x86_64 \
   --hash-style=gnu --as-needed \
   -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie \
   /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/Scrt1.o \
   /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/crti.o \
   /usr/lib/gcc/x86_64-linux-gnu/12/crtbeginS.o \
   -L/usr/lib/gcc/x86_64-linux-gnu/12 \
   -L/usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu \
   -L/usr/lib/gcc/x86_64-linux-gnu/12/../../../../lib \
   -L/lib/x86_64-linux-gnu \
   -L/lib/../lib \
   -L/usr/lib/x86_64-linux-gnu \
   -L/usr/lib/../lib \
   -L/usr/lib/gcc/x86_64-linux-gnu/12/../../.. \
   /tmp/ccj7Fe2p.o \
   -lgcc \
   --push-state --as-needed -lgcc_s --pop-state \
   -lc -lgcc \
   --push-state --as-needed -lgcc_s --pop-state \
   /usr/lib/gcc/x86_64-linux-gnu/12/crtendS.o \
   /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/crtn.o
</pre>

<p>
What the options mean is beyond the scope of this article, but it's
worth noting what arguments are needed.  There are object files (the
<code>.o</code> suffix) and libraries (<code>-l</code> options) and library search paths
(<code>-L</code> options).  The order of these options matters, too, and the
driver has to get this right.  Furthermore, which system object files
to use (found in <code>/usr/lib</code> in this example) can be non-obvious, so we
should appreciate all the work the driver does behind the scenes.
Manually specifying a link is tedious.
</p>

<p>
To round out the discussion of the files that the driver has to manage,
all the temporary files created during the compile/assemble/link
sequence are deleted before the driver exits.
</p>

<p>
All the arguments passed to the subprograms and the management of
files are done with specs.
</p>
</div>
</div>

<div id="outline-container-orgc7b0dcc" class="outline-2">
<h2 id="orgc7b0dcc">What is a spec?</h2>
<div class="outline-text-2" id="text-orgc7b0dcc">
<p>
The driver determines what subprograms to run based on the arguments
it is given.  For example, if you give it a file name with the <code>.c</code>
suffix it will know to run the C compiler for that file; give it a
file with a <code>.cpp</code> suffix and it will use the C++ compiler.  It
<a href="https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Overall-Options.html">recognizes</a> many file extensions and you can specify the language it
should use with the <a href="https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Overall-Options.html#index-x"><code>-x</code> option</a>.  Some options tell it not to run the
assembler or the linker.
</p>

<p>
The driver determines how to run a subprogram based on a spec for the
language.  Each language has its own spec that has the name of the
subprogram and the arguments to pass to it.  A spec is a string made
up of zero or more lines.  Each line corresponds to a program to run
so it's possible for you to run more than one program for each
language or "stage" being handled.
</p>

<p>
Since the arguments needed for a subprogram can vary significantly
from one invocation to the next they are rarely given directly.
Instead, they are provided with directives that expand to arguments
based on what the user supplied.  The syntax of these directives is
inspired by the ones used for <code>printf</code>, albeit with very different
semantics.
</p>

<p>
This may sound somewhat elegant, but let me disabuse you of any notion
of elegance right now.  We are about to wade into an ad hoc
mini-language where the only reliable way to get the result you want
is to experiment to find out what happens, although admittedly the
lack of consistent (and documented) semantics only tends to show up
when you start doing the advanced stuff.
</p>
</div>
</div>

<div id="outline-container-org8e53fc6" class="outline-2">
<h2 id="org8e53fc6">How to read a spec (the basics)</h2>
<div class="outline-text-2" id="text-org8e53fc6">
<p>
We're going to focus on the assembler invocation since it is the
subprogram with the fewest arguments and is the easiest to follow.
The goal is to trace through how it gets its arguments.
</p>

<p>
To do this you have to learn how to read a spec.  This is accomplished
by looking at the <a href="https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Spec-Files.html">spec documentation</a> and casting your eyes on arguably
the ugliest output GCC can produce: <code>gcc -dumpspecs</code>.
</p>
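<p>
Since the output is so long, it helps to pull out one spec at a time.
Each entry in the <code>-dumpspecs</code> output is a
<code>*name:</code> line followed by the spec and a blank line, so a
small shell function can extract it with <code>sed</code>.
(<code>showspec</code> is a hypothetical helper, not part of GCC.)
</p>

<pre class="example">
# Print one named spec: from its "*NAME:" line up to the blank
# line that ends the entry.  (showspec is a hypothetical helper.)
showspec () {
  gcc -dumpspecs | sed -n "/^\*$1:\$/,/^\$/p"
}

showspec asm
</pre>

<p>
The same works for <code>showspec asm_options</code> or
<code>showspec link_command</code>, if you're feeling brave.
</p>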

<p>
The full output of <code>-dumpspecs</code> is a beast, so here's a snippet of the
dumped specs that provides one of the above options.  I've reformatted
the output slightly to put each directive from the spec string on its
own line so it's easier to read, but keep in mind that you're not
going to get output this agreeable most of the time.
</p>

<pre class="example" id="orgabe0bb5">
*asm:
%{m16|m32:--32} \
  %{m16|m32|mx32:;:--64} \
  %{mx32:--x32} \
  %{msse2avx:%{!mavx:-msse2avx}}
</pre>

<p>
The <code>*asm:</code> line declares this is a named spec with the name <code>asm</code>.
This happens to be the builtin spec name for options passed to the
assembler.  Confusingly, this is not how <i>all</i> the options are passed,
but we'll get to that shortly.
</p>

<p>
Let's look closely at the following directive, which is how the <code>--64</code>
option is passed to the assembler.  It's one pattern you're going to
see a lot.
</p>

<pre class="example" id="orgb3f3206">
%{m16|m32|mx32:;:--64}
</pre>

<p>
This says that if any of the arguments <code>-m16</code>, <code>-m32</code>, or <code>-mx32</code> has
been passed to the driver then substitute nothing, otherwise
substitute in the string <code>--64</code>.  Let's go over it in more detail.
</p>

<p>
The general form is <code>%{S:X;:D}</code>.  This says that if the argument <code>-S</code>
has been given then substitute <code>X</code>, otherwise substitute <code>D</code>.
(Clearly, <code>S</code> can also be a combination of arguments separated with
the pipe symbol <code>|</code>.)  This is a conditional and is the primary way
you control the way arguments are passed.
</p>

<p>
In the above case <code>X</code> is empty which is why it looks so confusing.
The first <code>:</code> indicates the end of the condition.  The <code>;</code> indicates
the end of the substitution.  Since there is nothing there, nothing
(well, the empty string) is substituted.  The last <code>:</code> indicates the
start of the last substitution.  You can add white space to make it
more readable, but it may or may not help.  You can nest them and when
they nest deeply (which they will), they can be very difficult to
follow.
</p>

<p>
Our example at the beginning only passed the arguments <code>-v</code> and
<code>t.c</code>.  Since none of <code>-m16</code>, <code>-m32</code>, or <code>-mx32</code> was passed, <code>--64</code>
is passed to the assembler.
</p>
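<p>
You can watch this directive at work with <code>-###</code>.  The sketch
assumes an x86-64 host like the one above; because <code>-###</code>
doesn't run anything, the <code>-m32</code> case should work even
without 32-bit libraries installed.  Matching on <code>--64</code> and
<code>--32</code> is just a shortcut for finding the assembler line.
</p>

<pre class="example">
echo 'int main (void) { return 0; }' &gt; t.c

# No -m16/-m32/-mx32: the assembler line should contain --64.
gcc -### -c t.c 2&gt;&amp;1 | grep -e '--64'

# With -m32, the %{m16|m32:--32} directive applies instead.
gcc -### -m32 -c t.c 2&gt;&amp;1 | grep -e '--32'
</pre>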

<p>
If you look at the other directives in the <code>asm</code> spec you'll see other
forms of conditionals.
</p>

<ul class="org-ul">
<li>
<code>%{m16|m32:--32}</code> means that if either <code>-m16</code> or <code>-m32</code> was passed,
substitute <code>--32</code>.  Otherwise, substitute nothing.  This is a
conditional with only one clause and is very common.  You could
write it as <code>%{m16|m32:--32;:}</code> but that adds needless line noise so
it's better not to.</li>
<li>
<code>%{mx32:--x32}</code> should be clear by now: substitute <code>--x32</code> if <code>-mx32</code> was
given.</li>
<li>
<code>%{msse2avx:%{!mavx:-msse2avx}}</code> is interesting because it is how
you do conjunctions.  The other examples we've looked at were
disjunctions.  Conjunctions are done by nesting.  This is where
things can get really hairy.  This one says that if <code>-msse2avx</code> was
given and not <code>-mavx</code> then substitute <code>-msse2avx</code>.  (You may think
"oh, conjunctions aren't too bad".  I invite you to take a look at
the <code>link_command</code> spec.  Come back and we can share a story or
two.)</li>
</ul>

<p>
There are a few other points to note about these directives.
</p>

<ul class="org-ul">
<li>The options in the condition are written without leading hyphens.
This is a good thing because adding them would make reading a spec
even harder.  Internally, the driver has already decoded the options
when matching happens so the hyphens are gone.</li>
<li>The substitution doesn't have to be an option, it can be any string.
Usually you want it to be an option, but it can also be another
directive.  If it is another directive, it will be processed as one.</li>
<li>You have negation with the <code>!</code> syntax.  There are other "operators"
but they are not common.</li>
<li>You can substitute an option directly if it was given, or nothing if
it wasn't, by writing <code>%{S}</code> where <code>S</code> is the option.  This is a
simple short form for <code>%{S:-S;:}</code>.</li>
<li>The full conditional syntax allows for any number of clauses.  For
example, you could write <code>%{m16|m32|mx32:;:--64}</code> as <code>%{m16:; m32:;
  mx32:; :--64}</code>.  I call this form the "maximal confusion" form
because if it's used it's rarely short and always difficult to read.</li>
<li>There doesn't seem to be a way to match option patterns or option
arguments.  There is a way, but we haven't gotten there yet.</li>
</ul>
</div>
</div>

<div id="outline-container-orgcdae7fa" class="outline-2">
<h2 id="orgcdae7fa">From spec to arguments</h2>
<div class="outline-text-2" id="text-orgcdae7fa">
<p>
We've covered all this ground and so far we've only explained one
argument.  And it seems to get there by magic.  The documentation says
that the <code>asm</code> spec specifies the options to pass to the assembler.
Clearly there are more than just the one.  What gives?
</p>

<p>
What gives is that the documentation does not tell the whole story.
The full spec for the assembler is actually in the <a href="https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/gcc.cc;h=bb07cc244e30fbeccc701816db888f497d65eb08;hb=2ee5e4300186a92ad73f1a1a64cb918dc76c8d67#l1476">source code</a>.  It's
not large so I'll reproduce it here.
</p>

<pre class="example" id="org48f1ed1">
%{!M:%{!MM:%{!E:%{!S:as %(asm_debug) %(asm_options) %i %A }}}}
</pre>

<p>
Here you can see the messiness of specs starting to show up.  If we
strip away the conditionals<sup><a id="fnr.2" class="footref" href="#fn.2" role="doc-backlink">2</a></sup> we get something more manageable.
</p>

<pre class="example" id="orgb77deab">
as %(asm_debug) %(asm_options) %i %A
</pre>

<p>
The <code>%(asm_debug)</code> and <code>%(asm_options)</code> directives reference named
specs.  They substitute in the value of that spec for further
processing.  If you look at the output of <code>-dumpspecs</code> you should find
both <code>asm_debug</code> and <code>asm_options</code>.  On my system they are long and
mostly of no consequence so I won't completely reproduce them here,
but we can go through the ones that actually produce the options.
Then we can look at a few other notable ones.
</p>
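<p>
Since <code>-dumpspecs</code> prints each spec as a <code>*name:</code> line, followed by the spec's value and then a blank line, a small filter can pull out a single spec instead of scrolling through all of them.  This is a sketch: <code>show_spec</code> is a name I've made up, and it assumes that output layout.
</p>

```sh
# show_spec NAME: read `gcc -dumpspecs` output on stdin and print the
# value of the spec called NAME (hypothetical helper).
show_spec() {
  awk -v header="*$1:" '
    $0 == header      { found = 1; next }  # the "*name:" line itself
    found && $0 == "" { exit }             # a blank line ends the value
    found             { print }
  '
}
```

<p>
For example, <code>gcc -dumpspecs | show_spec asm_options</code> should print just the <code>asm_options</code> spec.
</p>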

<p>
<code>asm_debug</code> doesn't substitute anything in our running example, so
let's turn our attention to <code>asm_options</code>.
</p>

<p>
Here's an abbreviated version of the spec.  Parts that are omitted are
denoted with <code>(...)</code>.  You should see something similar if you are
using a recent version of GCC.
</p>

<pre class="example" id="orge6a1e9b">
*asm_options:
(...) %{v} (...) %a %Y %{c:%W{o*}%{!o*:-o %w%b%O}}%{!c:-o %d%w%u%O}
</pre>

<p>
Not all of these substitute anything, but they are notable so I have
included them.
</p>

<p>
To see how we go from spec to assembler invocation, I will show a
conceptual version of the full assembler spec at this point, followed
by the final assembler invocation.  The conceptual version drops
<code>asm_debug</code>, since it substitutes nothing, and inlines the
abbreviated version of the <code>asm_options</code> spec.
</p>

<pre class="example" id="org8f37933">
as %{v} %a %Y %{c:%W{o*}%{!o*:-o %w%b%O}}%{!c:-o %d%w%u%O} %i %A
</pre>

<pre class="example" id="orgd5de9a0">
as -v --64 -o /tmp/ccj7Fe2p.o /tmp/ccJOdUuR.s
</pre>

<p>
We will go through each directive and show what happens.
</p>

<ul class="org-ul">
<li>
<code>%{v}</code> should be clear from earlier discussion: substitute <code>-v</code>
because <code>-v</code> was passed.</li>
<li>
<code>%a</code> and <code>%Y</code> are special builtin directives.  <code>%a</code> substitutes the <code>asm</code>
spec.  (Why have a special one and not just say <code>%(asm)</code>?  I don't
know.)  This is how we get the <code>--64</code> option as described in the
previous section.  <code>%Y</code> is used to substitute any assembler
arguments that are passed via <code>-Wa</code>, which is the way you tell the
driver that an argument is meant specifically for the assembler.
It's how you pass through an argument without the driver validating
it.  We did not pass any of those so it substitutes nothing.</li>
<li>The next directive is conditional and only substitutes if <code>-c</code> was
passed to the driver.  It substitutes nothing in our example.  I
included it, however, because it is written directly beside the next
directive.  This may seem significant but it actually isn't.  All
directives, once substituted, have spaces around them so you can't
piece together multiple arguments into one argument.  Why was this
not written as a conditional with an else clause?  Probably as a
matter of style.</li>
<li>The directive <code>%{!c:-o %d%w%u%O}</code> is going to give us the arguments
for the output file, but it looks like random gibberish, so take it
one directive at a time and look each one up in the documentation.
<ul class="org-ul">
<li>
<code>%d</code> marks the argument as a temporary file which will be deleted
when the driver is done.  It substitutes nothing.</li>
<li>
<code>%w</code> marks the argument as the designated output file and also
substitutes nothing.</li>
<li>
<code>%u%O</code> substitutes the file name.  <code>%u</code> generates
a new temporary file name whose suffix is supplied by <code>%O</code>, the
suffix for object files (<code>.o</code> here).</li>
</ul>
</li>
<li>After the directives that give the output we have <code>%i</code> which
substitutes the input file.  This is the assembler file generated by
the compiler.</li>
<li>Finally, <code>%A</code> substitutes the <code>asm_final</code> spec.  This lets you run
an assembler post-processor, if one is needed.  It does nothing in our case,
but if you're using the <code>-gsplit-dwarf</code> option with a recent version
of GCC you might want to check it out.</li>
</ul>

<p>
That's it!  That's the (mostly) complete trace of how the driver
determines how to call the assembler.
</p>
</div>
</div>

<div id="outline-container-org4b3b3fb" class="outline-2">
<h2 id="org4b3b3fb">Experimenting with specs, the easy way</h2>
<div class="outline-text-2" id="text-org4b3b3fb">
<p>
At this point we've traced through how the assembler got its arguments
from the driver in some detail—and there is plenty more that we
could cover.  I'm going to do that in a later article, but in the
meantime I can show you the easiest way to experiment with specs that
doesn't involve writing them or changing the source.
</p>

<p>
If you pass the option <code>-###</code> to the driver it will print all the
commands it would run without actually running them.  Using this "dry
run" mode you can see how options are substituted without having to
bother with making the subprogram actually accept them.
</p>

<p>
Earlier in the <code>asm_options</code> spec I omitted an interesting directive:
<code>%{I*}</code>.  This is a directive that matches any option that starts with <code>-I</code>
and substitutes it <i>and</i> its argument.  In this case it will substitute all the <code>-I</code>
options that were passed.  For example, if we had passed <code>-I dir1</code> and
<code>-Ipath/to/headers</code> to the driver, both of them would be substituted
in for <code>%{I*}</code>.  You can see this by running
</p>

<pre class="example" id="org8eb803b">
gcc '-###' -I dir1 -Ipath/to/headers t.c
</pre>

<p>
and looking for the assembler command.  The <code>-###</code> option is quoted
to prevent the shell from possibly interpreting it as a comment or a
pattern.
</p>

<pre class="example" id="org998f04d">
% gcc '-###' t.c -I dir1 -Ipath/to/headers 2&gt;&amp;1 | grep '^ as'
 as -I dir1 -I path/to/headers --64 -o /tmp/ccNz0vob.o /tmp/ccVtbBuj.s
</pre>
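<p>
If you find yourself doing this often, a tiny wrapper saves some typing.  The name <code>asm_cmd</code> is hypothetical, and the <code>grep</code> pattern assumes the assembler shows up as plain <code>as</code>, as it does in the output above.  On a toolchain that prints a full path to the assembler you would need to adjust the pattern.
</p>

```sh
# asm_cmd ARGS...: show the assembler command the driver would run for
# ARGS, without running anything.  gcc writes the -### output to
# stderr, so redirect it before filtering for the "as" line.
asm_cmd() {
  gcc '-###' "$@" 2>&1 | grep '^ as '
}
```

<p>
Comparing <code>asm_cmd t.c</code> with <code>asm_cmd -c t.c</code> is a quick way to watch the <code>%{c:...}</code> and <code>%{!c:...}</code> directives trade places: with <code>-c</code> the output file should be named after the input (<code>t.o</code>) rather than a temporary file.
</p>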
</div>
</div>

<div id="outline-container-org6d528d3" class="outline-2">
<h2 id="org6d528d3">Experimenting with specs, the more interesting way</h2>
<div class="outline-text-2" id="text-org6d528d3">
<p>
If you want to explore specs a bit more and don't want to start
changing the compiler, a good way is to define a new file type with a
custom spec that shows you what gets substituted.
</p>

<p>
You can do this easily with the following spec file.
</p>

<pre class="example" id="org041ace7">
.xx:
./test.sh %i
</pre>

<p>
This will register a new file extension <code>.xx</code> and run the "compiler"
<code>./test.sh</code>, passing it the input file as its argument.  All <code>test.sh</code>
has to do is echo its arguments.
</p>
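<p>
Here is one way to create such a <code>test.sh</code> in the current directory.  The bracketed output format is just a choice I've made so that empty arguments, or arguments containing spaces, are easy to spot; any script that prints its arguments will do.
</p>

```sh
# Write a minimal test.sh that echoes its arguments, one per line.
cat > test.sh <<'EOF'
#!/bin/sh
# Bracket each argument so empty ones and ones with spaces are visible.
for arg in "$@"; do
  printf '[%s]\n' "$arg"
done
EOF
chmod +x test.sh
```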

<p>
You can run the driver like this to use your custom specs and
"compiler".
</p>

<pre class="example" id="org389943b">
gcc -c -specs=custom.specs file.xx
</pre>

<p>
You want to pass <code>-c</code> so that the driver does not try to run any other
subprograms.  Change the custom spec to your liking.  Here's an
example that will transform all options that start with <code>-m</code> to ones
that start with <code>-k</code>.  See the documentation for more details.
</p>

<pre class="example" id="orgbe414b7">
.xx:
./test.sh %{m*:-k%*}
</pre>

<p>
If you pass <code>-m32 -march=blah</code>, they will be passed to your "compiler" as
<code>-k32 -karch=blah</code>.
</p>
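<p>
The rule behind that renaming is that when <code>%*</code> appears in the substitution of a <code>%{S*:X}</code> directive, <code>X</code> is substituted once for each matching option, with <code>%*</code> replaced by the part of the option that the <code>*</code> matched.  The following shell function mimics that for <code>%{m*:-k%*}</code>.  The name <code>rename_m_options</code> is hypothetical, and it works on raw strings, which the driver does not.
</p>

```sh
# rename_m_options ARGS...: imitate %{m*:-k%*} on raw arguments.
# Each argument starting with -m yields one -k option whose tail is
# the text matched by * (everything after "-m"); others are ignored.
rename_m_options() {
  out=""
  for opt in "$@"; do
    case $opt in
      -m*) out="$out -k${opt#-m}" ;;
    esac
  done
  printf '%s\n' "${out# }"
}
```

<p>
With this, <code>rename_m_options -m32 -march=blah file.xx</code> prints <code>-k32 -karch=blah</code>, the same result described above.
</p>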
</div>
</div>

<div id="outline-container-org6d4453a" class="outline-2">
<h2 id="org6d4453a">Summary</h2>
<div class="outline-text-2" id="text-org6d4453a">
<p>
The GCC driver coordinates and runs multiple subprograms.  To manage
how it specifies all the arguments to these programs it uses something
called <i>specs</i>.  These are strings with <code>printf</code>-like directives that
are processed and substituted with values based on the arguments
passed to the driver and various contextual information.
</p>

<p>
You can view most, but not all, of the driver's builtin specs by
running <code>gcc -dumpspecs</code>.  To get the full spec for a subprogram you
need to look in the driver source code.
</p>

<p>
Once you have these specs you can trace the logic behind an option,
but it is tedious work.  Specs are not known for their readability.
And it's pretty uncommon for anyone to have to debug the driver.
</p>

<p>
Nevertheless, specs can help you understand what the driver is doing.
It's one of those odd mysteries of a common tool that aren't really
explained anywhere.
</p>

<p>
In a future article I'll show some of the more esoteric parts of spec
usage, how to use specs to affect the driver itself, and the
idiosyncrasies of writing spec functions.
</p>

<p>
(Thanks to <a href="https://www.amazon.ca/More-Than-Good-Enough-Holmes/dp/099194755X">Matt L. Holmes</a> for reviewing this.)
</p>
</div>
</div>
<div id="footnotes">
<h2 class="footnotes">Footnotes: </h2>
<div id="text-footnotes">

<div class="footdef">
<sup><a id="fn.1" class="footnum" href="#fnr.1" role="doc-backlink">1</a></sup> <div class="footpara" role="doc-footnote"><p class="footpara">
Strictly speaking, <code>collect2</code> is a wrapper for the actual
linker.  Explaining <code>collect2</code> is something else entirely.
</p></div>
</div>

<div class="footdef">
<sup><a id="fn.2" class="footnum" href="#fnr.2" role="doc-backlink">2</a></sup> <div class="footpara" role="doc-footnote"><p class="footpara">
The options in the conditionals are those that tell the
driver not to run the assembler.
</p></div>
</div>


</div>
</div>
</div>]]></description>
    </item>
  </channel>
</rss>
