GCC spec functions in-depth
This is the third in a series of articles about GCC specs. (See also: Part 1, Part 2.)
In this article I'm going to burrow into the world of spec functions. Spec functions are functions that can be called from within spec strings to apply arbitrary logic to some part of them. They break out of the confines of plain string substitution, which is the only logic that the rest of spec processing provides.
There are a variety of spec functions documented in the manual. If you read through them you'll probably notice they are rather specialized. Spec functions don't exist to turn spec strings into a general programming language; they exist to cover the odd corner cases that come up.
This article is written such that you do not have to run any examples to follow along, but it is helpful to be familiar with some of the previous material if you want to really understand what is going on. Spec functions require much more work to experiment with than just writing specs.
How to define a spec function
To define a spec function you must alter the source code. All the
documented functions (and a few more) are found in the driver in the
array static_spec_functions. The EXTRA_SPEC_FUNCTIONS macro at
the end of the array, if defined, is defined by the target backend.
By convention, the implementation of a spec function is named after
the spec function, with hyphens replaced by underscores and the suffix
_spec_function appended. For example, the if-exists spec function is
implemented by if_exists_spec_function.
To add a spec function you must add it to the static_spec_functions
array and define it somewhere. If you want to try writing one and
using it, I recommend you add the function to the driver source code
in gcc.cc. Otherwise you will have to start messing with backend
code or the build configuration and that gets complicated fast.
Entries to static_spec_functions have the type struct
spec_function, which is defined in gcc.h.
A spec function takes two arguments: an int of the argument count
and a const char ** of the strings passed to it. This is the same
as main, so it should be familiar to many. It returns a const char
*. How the return value is handled is explained below.
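To make the shape of the table concrete, here is a minimal standalone sketch of a name-to-function table with a lookup. It is not the driver's code: the names toy_spec_function, toy_spec_functions, toy_lookup, and the two example functions are made up for illustration, and the sentinel-terminated layout is my assumption about how such a table is usually walked.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the driver's struct spec_function:
   a name paired with a pointer to the implementation.  */
struct toy_spec_function
{
  const char *name;
  const char *(*func) (int argc, const char **argv);
};

/* A trivial spec function: return the first argument, or NULL if
   there are no arguments.  */
static const char *
first_arg_spec_function (int argc, const char **argv)
{
  return argc > 0 ? argv[0] : NULL;
}

/* A spec function that always returns NULL.  */
static const char *
never_spec_function (int argc, const char **argv)
{
  (void) argc;
  (void) argv;
  return NULL;
}

/* Sentinel-terminated table, in the spirit of static_spec_functions.  */
static const struct toy_spec_function toy_spec_functions[] =
{
  { "first-arg", first_arg_spec_function },
  { "never", never_spec_function },
  { NULL, NULL }
};

/* Find an entry by name; returns NULL when nothing is registered
   under that name.  */
static const struct toy_spec_function *
toy_lookup (const char *name)
{
  for (const struct toy_spec_function *sf = toy_spec_functions;
       sf->name != NULL; ++sf)
    if (strcmp (sf->name, name) == 0)
      return sf;
  return NULL;
}
```

Calling toy_lookup ("first-arg")->func (2, args) with args holding "Hello" and "World" returns "Hello", while looking up a name that is not in the table returns NULL.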
See the appendix for information on how to obtain and build GCC for the purposes of experimenting with specs.
An example
Here is a spec function that will help you understand how they work. All it does is write its arguments to stderr on a single line.
static const char *
echo_spec_func (int argc, const char **argv)
{
  for (int i = 0; i < argc; ++i)
    fprintf (stderr, "%s ", argv[i]);
  if (argc > 0)
    fprintf (stderr, "\n");
  return NULL;
}
And here it is in action. The content of the spec file is given
first, followed by the execution of the gcc command. The command
prompt in the examples is the % character. The dummy compiler is
test.sh; it does nothing except print its arguments to stdout. In
all the examples that follow I've omitted any empty line output by the
dummy compiler.
.xx: ./test.sh %:echo(Hello World)
% gcc --specs=local.specs -c t.xx
Hello World
The spec function output goes to stderr so it can be split from the output of the dummy compiler. It only really matters if you are trying this out yourself.
Return values
A spec function must return a string (well, a const char *). The
return value is then processed as a spec. This is why spec functions
in a conditional context must return NULL to indicate failure.
Let's change the echo spec function to return its first argument.
static const char *
echo_spec_func (int argc, const char **argv)
{
  if (argc == 0)
    return NULL;
  for (int i = 0; i < argc; ++i)
    fprintf (stderr, "%s ", argv[i]);
  fprintf (stderr, "\n");
  return argv[0];
}
With our previous example, the compiler will output "Hello" in addition to the "Hello World" spit out by the spec function.
Let's do something more interesting.
.xx: ./test.sh %:echo(%%i)
% gcc --specs=local.specs -c t.xx
%i
t.xx
In this case we have %%i as an argument. Arguments to spec
functions are processed as specs (more on that to come) then passed to
the spec function. In this case %% expands to % so the whole
argument is the string %i. This is printed by the spec function and
then returned as the result, which is then processed as a spec and
expands to the input file t.xx. This is passed to the dummy
compiler and printed as the argument.
And yes, functions can be nested.
.xx: ./test.sh %:echo(%:echo(%%i))
% gcc --specs=local.specs -c t.xx
%i
t.xx
t.xx
Do you want to do this all the time? Probably not. The syntax is not wonderful and the semantics of arguments are not as simple as you might think.
Arguments to spec functions
As mentioned above the arguments to spec functions are first processed as specs and the results are passed as arguments. This appears straightforward but, of course, isn't.
What really happens is that the string between the parentheses of the spec function is processed as a single spec string and then, once processing is done, all the whitespace-delimited words are passed as individual arguments. Unless you are careful you won't know how many arguments your function will actually get.
Let's change the echo spec function to print the number of arguments as well as the arguments.
static const char *
echo_spec_func (int argc, const char **argv)
{
  fprintf (stderr, "There are %d args! ", argc);
  for (int i = 0; i < argc; ++i)
    fprintf (stderr, "%s ", argv[i]);
  fprintf (stderr, "\n");
  return NULL;
}
Here's an example from the previous article to illustrate the importance of understanding that you are not passing arguments, but rather you are providing a spec string that will turn into arguments.
# There is a space after the word "middle"
*stuff:
%{dummy} middle
.xx:
./test.sh %:echo(BEGIN%(stuff)END)
% ~/tmp/opt/bin/gcc --specs=local.specs -c t.xx
There are 3 args! BEGIN middle END
Quoting will not help because there is no notion of a value in the spec language: it's just string substitution. The best you can do is backslash escape a space, and that won't always work.
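The splitting step can be reproduced outside GCC. The sketch below is plain C with no driver code involved; split_words is a name I made up. It takes the already-expanded string and counts the whitespace-delimited words, which is all the spec function ever gets to see.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Split TEXT in place into whitespace-delimited words.  Pointers to at
   most MAX_WORDS words are stored in WORDS.  Returns the number of
   words stored.  */
static int
split_words (char *text, const char **words, int max_words)
{
  int count = 0;
  char *p = text;

  while (*p != '\0' && count < max_words)
    {
      /* Skip any run of whitespace before the next word.  */
      while (isspace ((unsigned char) *p))
        ++p;
      if (*p == '\0')
        break;

      /* Record the start of the word, then find its end.  */
      words[count++] = p;
      while (*p != '\0' && !isspace ((unsigned char) *p))
        ++p;

      /* Terminate the word unless we are already at the end.  */
      if (*p != '\0')
        *p++ = '\0';
    }
  return count;
}
```

Given the expanded string "BEGIN middle END" this stores three words, and given "//" it stores one. Nothing about the original spec string survives the split.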
But wait! It gets worse. Here's another example from the previous article.
.xx:
./test.sh %:echo(/%{O*:%*/opt}/%{g*:%*/debug})
How many arguments do you think will be passed to the echo spec
function if no -O or -g arguments are given to the driver? Based
on all the discussion so far it should be one. And this is correct.
Notch a tally for predictability (and forgive the subject-verb
disagreement in the output).
% gcc --specs=local.specs -c t.xx
There are 1 args! //
Now, how about this?
% gcc --specs=local.specs -c t.xx -O1 -g3
In the previous article the /%{O*:%*/opt}/%{g*:%*/debug} expanded to
/1/opt /3/debug so two arguments would be a reasonable guess. It
turns out that is half right. The actual answer is four.
% gcc --specs=local.specs -c t.xx -O1 -g3
There are 4 args! /1 /opt /3 /debug
This is ultimately attributable to the code at the end of do_spec_1.
It calls end_going_arg before exiting if it is processing the
argument string in a spec function context. Each time do_spec_1 is
called it might end the argument and put the current string in the
argument vector. This is not the case during normal spec processing.
If you think that's bad, get ready for a surprise. What follows is an
attempt to echo all the -g options passed to the driver.
.xx:
./test.sh %:echo(%{g*})
% gcc --specs=local.specs -c t.xx -g -g3
There are 4 args! - g - g3
This is wholly unexpected. The reason it happens is that do_spec_1
is called to add a single hyphen-minus ('-') to the argument stack.
Internally the leading '-' is not stored in the parsed argument
(known as a switch), so to print the option the driver has to add it
back manually.
But because we're processing a spec function and the arg is "going",
it becomes an argument by itself.
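Here is a toy model of that behavior. It is emphatically not GCC's code: the bracket notation, struct toy_state, toy_process, and toy_end_going_arg are all invented for this sketch, and it ignores every other rule of spec processing. A pair of brackets stands for one nested call to the processing routine, and when the in_function flag is set, every return from the routine ends the argument being built.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 16
#define MAX_LEN 64

struct toy_state
{
  bool in_function;               /* expanding spec function arguments?  */
  char args[MAX_ARGS][MAX_LEN];   /* finished arguments  */
  int nargs;
  char going[MAX_LEN];            /* argument currently being built  */
  size_t going_len;
};

/* Finish the argument being built, if there is one.  */
static void
toy_end_going_arg (struct toy_state *st)
{
  if (st->going_len > 0 && st->nargs < MAX_ARGS)
    {
      st->going[st->going_len] = '\0';
      strcpy (st->args[st->nargs++], st->going);
    }
  st->going_len = 0;
}

/* Process text from P up to the matching ']' or the end of the string.
   '[' starts a nested call, a space ends the going argument, and any
   other character is appended to it.  Returns where processing stopped.  */
static const char *
toy_process (struct toy_state *st, const char *p)
{
  while (*p != '\0' && *p != ']')
    {
      if (*p == '[')
        {
          p = toy_process (st, p + 1);
          if (*p == ']')
            ++p;
        }
      else if (*p == ' ')
        {
          toy_end_going_arg (st);
          ++p;
        }
      else
        {
          if (st->going_len + 1 < MAX_LEN)
            st->going[st->going_len++] = *p;
          ++p;
        }
    }

  /* The twist: in a spec function context every call, nested or not,
     ends the going argument on the way out.  */
  if (st->in_function)
    toy_end_going_arg (st);
  return p;
}
```

With in_function set, the input "/[[1]/opt]/[[3]/debug]" yields the four arguments /1, /opt, /3, and /debug, and "[-][g] [-][g3]" yields -, g, -, and g3. With the flag cleared, the nested calls just keep appending to the same argument.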
Another caution
One final note on spec processing: the conditional directive. Suppose
you have a spec function called analyze-opts that analyzes some
options and returns a meaningful, non-NULL value when some condition
is met. You decide to use it in a conditional, like this.
%{O*: --optimize; %:analyze-opts(%{g*}): --debug; :--default}
If we pass -O2 as an option the call to analyze-opts—and the
processing of its return value—will still happen. In other words,
conditionals are not short-circuiting. If analyze-opts manipulates
some state behind the scenes you will need to take this into account.
I wouldn't call this really weird, but it's not usual.
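If it helps, the evaluation order can be pictured with a small model in plain C. None of this is driver code: has_optimize_option, analyze_opts, and toy_select_branch are hypothetical stand-ins. Every condition is evaluated, and only afterwards does the first match win.

```c
#include <stdbool.h>
#include <stddef.h>

/* Counts how often the analyze-opts stand-in has run.  */
static int analyze_calls = 0;

/* Stand-in for the O* condition: pretend -O2 was given.  */
static bool
has_optimize_option (void)
{
  return true;
}

/* Stand-in for a spec function with a side effect.  */
static bool
analyze_opts (void)
{
  ++analyze_calls;
  return false;
}

typedef bool (*toy_condition) (void);

/* Evaluate every condition in order and return the index of the first
   one that held, or -1 if none did.  There is no early exit.  */
static int
toy_select_branch (const toy_condition *conds, int n)
{
  int chosen = -1;
  for (int i = 0; i < n; ++i)
    {
      bool matched = conds[i] ();   /* runs even after a match */
      if (matched && chosen < 0)
        chosen = i;
    }
  return chosen;
}
```

Selecting between has_optimize_option and analyze_opts picks the first branch, yet analyze_calls still ends up at 1.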
Outright abuse
We've seen a lot with specs at this point, and some of it is just weird. Now we're going to be underhanded and push this in ways it was never meant to go.
If you write your spec function in the driver source you have access to all its functions and data. That means you can cause side effects and change specs at runtime. Here's an example that uses named specs as variables.
The spec function is set. The first argument is the name of the
"variable" and the rest of the arguments are the value. This uses a
data structure known as an obstack to hold the string being created.
There is an obstack available and initialized in the driver source
file by the time any spec function is called. If only one argument is
given it sets the variable to the empty string.
static const char *
set_spec_func (int argc, const char **argv)
{
  if (argc == 1)
    {
      set_spec (argv[0], "", true);
      return NULL;
    }
  for (int i = 1; i < argc; ++i)
    {
      obstack_grow (&obstack, argv[i], strlen (argv[i]));
      obstack_1grow (&obstack, ' ');
    }
  obstack_1grow (&obstack, '\0');
  const char *val = XOBFINISH (&obstack, const char *);
  set_spec (argv[0], val, true);
  return NULL;
}
Now you can capture values from %* and use them in nested specs to
make cross products, just like I'm sure you've always dreamt of doing.
.xx:
./test.sh %{O*:%:set(x %*)%{O*:(%(x), %*)}}
% gcc --specs=local.specs -c t.xx -O1 -O2 -O3
(1 , 1) (1 , 2) (1 , 3) (2 , 1) (2 , 2) (2 , 3) (3 , 1) (3 , 2) (3 , 3)
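The output makes more sense if you read the spec as two nested loops: the outer %{O*:...} body runs once per matching option and stores the matched text in x, and the inner %{O*:...} body runs once per matching option again. The following sketch spells that out in plain C. It is only my reading of the output above, and toy_cross_product is a made-up name. Note the trailing space that set leaves in the stored value, which is where the space before each comma comes from.

```c
#include <stdio.h>
#include <string.h>

/* Write into OUT the line the dummy compiler would print for the
   option values in LEVELS.  OUT_SIZE must be at least 1.  */
static void
toy_cross_product (const char **levels, int n, char *out, size_t out_size)
{
  char x[32];
  size_t used = 0;

  out[0] = '\0';
  for (int i = 0; i < n; ++i)
    {
      /* Outer body: the equivalent of %:set(x %*), which stores the
         value followed by a space.  */
      snprintf (x, sizeof x, "%s ", levels[i]);

      for (int j = 0; j < n; ++j)
        {
          /* Inner body: the equivalent of (%(x), %*).  */
          int len = snprintf (out + used, out_size - used, "%s(%s, %s)",
                              used > 0 ? " " : "", x, levels[j]);
          if (len < 0 || (size_t) len >= out_size - used)
            return;   /* out of room; keep what fits */
          used += (size_t) len;
        }
    }
}
```

Called with the values "1", "2", and "3", it fills the buffer with the same nine pairs shown above, in the same order.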
Let's abuse the fact that a return value is processed as a spec and
have the echo spec function return a call to itself, making it
recursive.
static const char *
echo_spec_func (int argc, const char **argv)
{
  if (argc == 0)
    return NULL;
  fprintf (stderr, "%s\n", argv[0]);
  obstack_grow (&obstack, "%:echo(", (sizeof ("%:echo(") - 1));
  for (int i = 1; i < argc; ++i)
    {
      obstack_grow (&obstack, argv[i], strlen (argv[i]));
      obstack_1grow (&obstack, ' ');
    }
  obstack_grow0 (&obstack, ")", 1);
  return XOBFINISH (&obstack, const char *);
}
.xx:
./test.sh %:echo(%{O*})
% gcc --specs=local.specs -c t.xx -O1 -O2 -O3
-
O1
-
O2
-
O3
Sure, you could just iterate over the arguments and print them, but where's the fun in that?
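If you want to poke at the string building without rebuilding the driver, the same logic can be written with malloc in place of the obstack. This is a sketch under that substitution, not what the driver does: toy_echo_result is a made-up name, it leaves out the printing to stderr, and the caller owns the returned string.

```c
#include <stdlib.h>
#include <string.h>

/* Build the string "%:echo(arg1 arg2 ... )" from argv[1] onwards,
   dropping argv[0].  Returns a malloc'd string that the caller must
   free, or NULL when there are no arguments or allocation fails.  */
static char *
toy_echo_result (int argc, const char **argv)
{
  if (argc == 0)
    return NULL;

  /* Room for the prefix, the closing parenthesis, the terminating
     NUL, and each remaining argument plus its trailing space.  */
  size_t size = strlen ("%:echo(") + strlen (")") + 1;
  for (int i = 1; i < argc; ++i)
    size += strlen (argv[i]) + 1;

  char *result = malloc (size);
  if (result == NULL)
    return NULL;

  strcpy (result, "%:echo(");
  for (int i = 1; i < argc; ++i)
    {
      strcat (result, argv[i]);
      strcat (result, " ");
    }
  strcat (result, ")");
  return result;
}
```

Each step consumes one argument: four arguments "-", "O1", "-", "O2" produce "%:echo(O1 - O2 )", and a single argument produces "%:echo()". The recursion ends when a call arrives with no arguments and the function returns NULL.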
Implementing factorial and/or Fibonacci is left as an exercise for the reader.
Appendix: Adding a spec function to GCC, the easy way
I'm going to assume you have some experience installing packages on whatever system you use and that you're familiar with building C/C++ code. You can look at the GCC installation prerequisites if you want all the names of things you'll need installed. There is a decent OSDev wiki page that gives the packages to install on different systems. I'm not going to reproduce all that here. However, you do not need to install GMP, MPC, MPFR, Cloog, or ISL. You should only need to install a compiler, Make, Bison, Flex, and maybe Texinfo.
Once you have those installed, clone the GCC repository or download the source. Unless you are getting serious about this, you don't need to do anything beyond a basic cloning of the repo. If you download the sources, it's good to get a recent version.
I'm going to assume that the code is in ~/src/gcc for the purposes
of the following discussion.
The initial step is to download the library prerequisites. This is done by changing to the source directory and running the script that GCC comes with to get the correct versions.
% cd ~/src/gcc
% ./contrib/download_prerequisites
When this is done the source tree will contain the sources to the necessary libraries and GCC's build will automatically detect them as required.
Next, create a build directory and work in there. (Do not build GCC in-tree. Just don't.)
% rm -rf ~/tmp/build && mkdir ~/tmp/build && cd ~/tmp/build
Next, you have to configure GCC. There are many, many configuration options. We don't care about most of them. What we want is a native compiler without any other system support (meaning if we are on a 64-bit system we don't want 32-bit support), we want it installed in a custom location, we want minimal language support (because we don't actually care about compiling anything), and we want the build to be reasonably fast.
A simple way to get this is with the following options.
% ~/src/gcc/configure --prefix=${HOME}/tmp/opt \
    --disable-multilib \
    --enable-languages=c \
    --with-pkgversion="Local custom compiler"
This builds only the C compiler, avoiding all the other compiler builds. It also sets the version string to contain "Local custom compiler" so you can easily verify that you are using the correct driver.
Assuming the configuration succeeds, run the following command to
build and install gcc. Adjust the -j argument as required.
% make -j12 all-gcc && make install-gcc
You will find the driver at ${HOME}/tmp/opt/bin/gcc. Run the above
command each time you make a change to the driver. The initial build
can take a while, but subsequent builds should be quick.
% ~/tmp/opt/bin/gcc --version
gcc (Local custom compiler) 14.0.0 20231211 (experimental)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.