GCC spec functions in-depth

This is the third in a series of articles about GCC specs. (See also: Part 1, Part 2.)

In this article I'm going to burrow into the world of spec functions. Spec functions are functions that can be called from within a spec string to apply arbitrary logic to some part of it. They are a way to break out of the confines of plain string substitution, which is the only logic that the rest of spec processing provides.

There are a variety of spec functions documented in the manual. If you read through them you'll probably notice they are rather specialized. Spec functions don't exist to turn spec strings into a general programming language; they exist to cover the odd corner cases that come up.

This article is written such that you do not have to run any examples to follow along, but it is helpful to be familiar with some of the previous material if you want to really understand what is going on. Spec functions require much more work to experiment with than just writing specs.

How to define a spec function

To define a spec function you must alter the GCC source code. All the documented functions (and a few more) are found in the driver in the array static_spec_functions. The EXTRA_SPEC_FUNCTIONS macro at the end of the array, if it is defined at all, is supplied by the target backend.

Names of spec function implementations follow a convention: the name of the spec function, with hyphens replaced by underscores, followed by the suffix _spec_function. For example, the if-exists spec function is implemented by if_exists_spec_function. (The examples in this article shorten the suffix to _spec_func.)

To add a spec function you must add it to the static_spec_functions array and define it somewhere. If you want to try writing one and using it, I recommend you add the function to the driver source code in gcc.cc. Otherwise you will have to start messing with backend code or the build configuration and that gets complicated fast.

Entries to static_spec_functions have the type struct spec_function, which is defined in gcc.h.

A spec function takes two arguments: an int of the argument count and a const char ** of the strings passed to it. This is the same as main, so it should be familiar to many. It returns a const char *. How the return value is handled is explained below.
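To make the shape of such a table concrete, here is a minimal standalone sketch that models it outside of GCC. It assumes struct spec_function is simply a name paired with a function pointer of the signature just described. The table demo_spec_functions, the helper lookup_demo_spec_function, and the first-arg function are all made up for illustration and are not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout: a spec function name paired with its implementation.  */
struct spec_function
{
  const char *name;
  const char *(*func) (int, const char **);
};

/* Hypothetical spec function: returns its first argument, or NULL.  */
static const char *
first_arg_spec_function (int argc, const char **argv)
{
  return argc > 0 ? argv[0] : NULL;
}

/* Stand-in for static_spec_functions; a null entry ends the table.  */
static const struct spec_function demo_spec_functions[] =
{
  { "first-arg", first_arg_spec_function },
  { NULL, NULL }
};

/* Find an entry by name with a linear scan; NULL means no such function.  */
static const struct spec_function *
lookup_demo_spec_function (const char *name)
{
  assert (name != NULL);
  for (const struct spec_function *sf = demo_spec_functions; sf->name; ++sf)
    if (strcmp (sf->name, name) == 0)
      return sf;
  return NULL;
}
```

Looking up "first-arg" and calling sf->func (2, args) with args holding "Hello" and "World" hands back "Hello". Looking up a name that is not in the table returns NULL.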

See the appendix for information on how to obtain and build GCC for the purposes of experimenting with specs.

An example

Here is a spec function that will help you understand how they work. All it does is write its arguments to stderr on a single line.

static const char *
echo_spec_func (int argc, const char **argv)
{
  for (int i = 0; i < argc; ++i)
    fprintf (stderr, "%s ", argv[i]);

  if (argc > 0)
    fprintf (stderr, "\n");

  return NULL;
}

And here it is in action. The content of the spec file is given first, followed by the execution of the gcc command. The command prompt in the examples is the % character. The dummy compiler is test.sh; it does nothing except print its arguments to stdout. In all the examples that follow I've omitted any empty line output by the dummy compiler.

.xx:
./test.sh %:echo(Hello World)
% gcc --specs=local.specs -c t.xx
Hello World 

The spec function output goes to stderr so it can be separated from the output of the dummy compiler. It only really matters if you are trying this out yourself.

Return values

A spec function must return a string (well, a const char *). The return value is then processed as a spec. This is why spec functions in a conditional context must return NULL to indicate failure.

Let's change the echo spec function to return its first argument.

static const char *
echo_spec_func (int argc, const char **argv)
{
  if (argc == 0)
    return NULL;

  for (int i = 0; i < argc; ++i)
    fprintf (stderr, "%s ", argv[i]);

  fprintf (stderr, "\n");
  return argv[0];
}

With our previous example, the compiler will output "Hello" in addition to the "Hello World" spit out by the spec function.

Let's do something more interesting.

.xx:
./test.sh %:echo(%%i)
% gcc --specs=local.specs -c t.xx
%i 
t.xx

In this case we have %%i as an argument. Arguments to spec functions are processed as specs (more on that to come) then passed to the spec function. In this case %% expands to % so the whole argument is the string %i. This is printed by the spec function and then returned as the result, which is then processed as a spec and expands to the input file t.xx. This is passed to the dummy compiler and printed as the argument.
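The two rounds of expansion can be modeled with a toy expander. This is only a sketch: toy_expand is a made-up function that understands just two directives, %% and %i, and it has nothing to do with how the driver is actually implemented. It is here to show that the same string is expanded once on the way into the function and once more on the way out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy expander: understands only "%%" (a literal percent) and "%i"
   (the input file name).  Every other character is copied unchanged.
   Returns 0 on success, -1 if OUT is too small.  */
static int
toy_expand (const char *spec, const char *input_file, char *out, size_t size)
{
  size_t len = 0;

  assert (spec != NULL && input_file != NULL && out != NULL);
  if (size == 0)
    return -1;

  for (const char *p = spec; *p != '\0'; ++p)
    {
      const char *piece = p;    /* default: copy this one character */
      size_t piece_len = 1;

      if (p[0] == '%' && p[1] == '%')
        ++p;                    /* copy one '%' and skip the second */
      else if (p[0] == '%' && p[1] == 'i')
        {
          piece = input_file;
          piece_len = strlen (input_file);
          ++p;
        }

      if (len + piece_len >= size)
        return -1;
      memcpy (out + len, piece, piece_len);
      len += piece_len;
    }

  out[len] = '\0';
  return 0;
}
```

Expanding %%i once gives %i, which is what the spec function sees and returns. Expanding that result again with t.xx as the input file gives t.xx, which is what reaches the dummy compiler.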

And yes, functions can be nested.

.xx:
./test.sh %:echo(%:echo(%%i))
% gcc --specs=local.specs -c t.xx
%i 
t.xx 
t.xx

Do you want to do this all the time? Probably not. The syntax is not wonderful and the semantics of arguments is not as simple as you might think.

Arguments to spec functions

As mentioned above the arguments to spec functions are first processed as specs and the results are passed as arguments. This appears straightforward but, of course, isn't.

What really happens is that the string between the parentheses of the spec function is processed as a single spec string and then, when done processing, all the whitespace delimited words are passed as individual arguments. Unless you are careful you won't know how many arguments your function will actually get.

Let's change the echo spec function to print the number of arguments as well as the arguments.

static const char *
echo_spec_func (int argc, const char **argv)
{
  fprintf (stderr, "There are %d args! ", argc);

  for (int i = 0; i < argc; ++i)
    fprintf (stderr, "%s ", argv[i]);

  fprintf (stderr, "\n");
  return NULL;
}

Here's an example from the previous article to illustrate the importance of understanding that you are not passing arguments, but rather you are providing a spec string that will turn into arguments.

# There is a space after the word "middle"
*stuff:
%{dummy} middle 

.xx:
./test.sh %:echo(BEGIN%(stuff)END)
% ~/tmp/opt/bin/gcc --specs=local.specs -c t.xx
There are 3 args! BEGIN middle END 

Quoting will not help because there is no notion of a value in the spec language: it's just string substitution. The best you can do is backslash escape a space, and that won't always work.
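As a first approximation, the argument count is just the number of whitespace-delimited words in the fully expanded string. The sketch below models only that rule. count_words is a made-up helper, not driver code, and the examples that follow show the driver splitting in places this simple model does not predict.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count the whitespace-delimited words in an already-expanded string.
   Quote characters get no special treatment: they are ordinary text.  */
static int
count_words (const char *expanded)
{
  int count = 0;
  int in_word = 0;

  assert (expanded != NULL);
  for (const char *p = expanded; *p != '\0'; ++p)
    {
      if (isspace ((unsigned char) *p))
        in_word = 0;
      else if (!in_word)
        {
          in_word = 1;          /* first character of a new word */
          ++count;
        }
    }

  return count;
}
```

Under this model "BEGIN middle END" counts as three words, matching the output above, and a quoted string such as "a b" (quotes included) still counts as two.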

But wait! It gets worse. Here's another example from the previous article.

.xx:
./test.sh %:echo(/%{O*:%*/opt}/%{g*:%*/debug})

How many arguments do you think will be passed to the echo spec function if no -O or -g arguments are given to the driver? Based on all the discussion so far it should be one. And this is correct. Notch a tally for predictability (and forgive the subject-verb disagreement in the output).

% gcc --specs=local.specs -c t.xx
There are 1 args! //

Now, how about this?

% gcc --specs=local.specs -c t.xx -O1 -g3

In the previous article the /%{O*:%*/opt}/%{g*:%*/debug} expanded to /1/opt /3/debug so two arguments would be a reasonable guess. It turns out that is half right. The actual answer is four.

% gcc --specs=local.specs -c t.xx -O1 -g3
There are 4 args! /1 /opt /3 /debug 

This is ultimately attributable to the code at the end of do_spec_1. It calls end_going_arg before exiting if it is processing the argument string in a spec function context. Each time do_spec_1 is called it might end the argument and put the current string in the argument vector. This is not the case during normal spec processing.

If you think that's bad, get ready for a surprise. What follows is an attempt to echo all the -g options passed to the driver.

.xx:
./test.sh %:echo(%{g*})
% gcc --specs=local.specs -c t.xx -g -g3
There are 4 args! - g - g3 

This is wholly unexpected. The reason it happens is that do_spec_1 is called to add a single hyphen-minus ('-') to the argument stack. Internally the leading '-' is not stored in the parsed argument (known as a switch), so to print the option it has to manually add it. But because we're processing a spec function and the arg is "going", it becomes an argument by itself.

Another caution

One final note on spec processing: the conditional directive. Suppose you have a spec function called analyze-opts that analyzes some options and returns a meaningful, non-NULL value when some condition is met. You decide to use it in a conditional, like this.

%{O*: --optimize; %:analyze-opts(%{g*}): --debug; :--default}

If we pass -O2 as an option the call to analyze-opts—and the processing of its return value—will still happen. In other words, conditionals are not short-circuiting. If analyze-opts manipulates some state behind the scenes you will need to take this into account.

I wouldn't call this really weird, but it's not usual.

Outright abuse

We've seen a lot with specs at this point, and some of it is just weird. Now we're going to be underhanded and push this in ways it was never meant to go.

If you write your spec function in the driver source you have access to all its functions and data. That means you can cause side effects and change specs at runtime. Here's an example that uses named specs as variables.

The spec function is called set. The first argument is the name of the "variable" and the rest of the arguments are the value. If only one argument is given it sets the variable to the empty string. To hold the string being created it uses a data structure known as an obstack; one is available and initialized in the driver source file by the time any spec function is called.

static const char *
set_spec_func (int argc, const char **argv)
{
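  /* With no arguments there is no variable name to read from argv[0].  */
  if (argc == 0)
    return NULL;

  if (argc == 1)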
    {
      set_spec (argv[0], "", true);
      return NULL;
    }

  for (int i = 1; i < argc; ++i)
    {
      obstack_grow (&obstack, argv[i], strlen (argv[i]));
      obstack_1grow (&obstack, ' ');
    }

  obstack_1grow (&obstack, '\0');
  const char *val = XOBFINISH (&obstack, const char *);
  set_spec (argv[0], val, true);
  return NULL;
}
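If you have not used obstacks before, the string-building part of set can be tried on its own. The sketch below assumes a glibc system, where obstack.h is a system header. The names demo_obstack and join_with_spaces are made up, and it calls obstack_finish directly instead of going through the XOBFINISH macro used in the driver.

```c
#include <assert.h>
#include <obstack.h>
#include <stdlib.h>
#include <string.h>

/* obstack.h expects the program to name its chunk allocator.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

static struct obstack demo_obstack;
static int demo_obstack_ready;

/* Join ARGV into one string, adding a space after every word (including
   the last one), in the same way the loop in set does.  */
static const char *
join_with_spaces (int argc, const char **argv)
{
  assert (argc >= 0);
  if (!demo_obstack_ready)
    {
      obstack_init (&demo_obstack);
      demo_obstack_ready = 1;
    }

  for (int i = 0; i < argc; ++i)
    {
      obstack_grow (&demo_obstack, argv[i], strlen (argv[i]));
      obstack_1grow (&demo_obstack, ' ');
    }

  /* Terminate the string and close off the object being grown.  */
  obstack_1grow (&demo_obstack, '\0');
  return (const char *) obstack_finish (&demo_obstack);
}
```

Joining the words 1 and 2 gives "1 2 " with a trailing space. Each finished string stays allocated on the obstack, which is what lets set hand the pointer to set_spec and return.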

Now you can capture values from %* and use them in nested specs to make cross products, just like I'm sure you've always dreamt of doing.

.xx:
./test.sh %{O*:%:set(x %*)%{O*:(%(x), %*)}}

% gcc --specs=local.specs -c t.xx -O1 -O2 -O3
(1 , 1) (1 , 2) (1 , 3) (2 , 1) (2 , 2) (2 , 3) (3 , 1) (3 , 2) (3 , 3)

Let's abuse the fact that a return value is processed as a spec and have the echo spec function return a call to itself, making it recursive.

static const char *
echo_spec_func (int argc, const char **argv)
{
  if (argc == 0)
    return NULL;

  fprintf (stderr, "%s\n", argv[0]);
  obstack_grow (&obstack, "%:echo(", (sizeof ("%:echo(") - 1));

  for (int i = 1; i < argc; ++i)
    {
      obstack_grow (&obstack, argv[i], strlen (argv[i]));
      obstack_1grow (&obstack, ' ');
    }

  obstack_grow0 (&obstack, ")", 1);
  return XOBFINISH (&obstack, const char *);
}

.xx:
./test.sh %:echo(%{O*})
% gcc --specs=local.specs -c t.xx -O1 -O2 -O3
-
O1
-
O2
-
O3

Sure, you could just iterate over the arguments and print them, but where's the fun in that?

Implementing factorial and/or Fibonacci is left as an exercise for the reader.

Appendix: Adding a spec function to GCC, the easy way

I'm going to assume you have some experience installing packages on whatever system you use and that you're familiar with building C/C++ code. You can look at the GCC installation prerequisites if you want all the names of things you'll need installed. There is a decent OSDev wiki page that gives the packages to install on different systems. I'm not going to reproduce all that here. However, you do not need to install GMP, MPC, MPFR, Cloog, or ISL. You should only need to install a compiler, Make, Bison, Flex, and maybe Texinfo.

Once you have those installed, clone the GCC repository or download the source. Unless you are getting serious about this, you don't need to do anything beyond a basic cloning of the repo. If you download the sources, it's good to get a recent version.

I'm going to assume that the code is in ~/src/gcc for the purposes of the following discussion.

The initial step is to download the library prerequisites. This is done by changing to the source directory and running the script that GCC comes with to get the correct versions.

% cd ~/src/gcc
% ./contrib/download_prerequisites

When this is done the source tree will contain the sources to the necessary libraries and GCC's build will automatically detect them as required.

Next, create a build directory and work in there. (Do not build GCC in-tree. Just don't.)

% rm -rf ~/tmp/build && mkdir ~/tmp/build && cd ~/tmp/build

Next, you have to configure GCC. There are many, many configuration options. We don't care about most of them. What we want is a native compiler without any other system support (meaning if we are on a 64-bit system we don't want 32-bit support), we want it installed in a custom location, we want minimal language support (because we don't actually care about compiling anything), and we want the build to be reasonably fast.

A simple way to get this is with the following options.

% ~/src/gcc/configure --prefix=${HOME}/tmp/opt \
   --disable-multilib \
   --enable-languages=c \
   --with-pkgversion="Local custom compiler"

This only builds for the C language, avoiding all the other compiler builds. It also sets the version string to contain "Local custom compiler" so you can easily verify that you are using the correct driver.

Assuming the configuration succeeds, run the following command to build and install gcc. Adjust the -j argument as required.

% make -j12 all-gcc && make install-gcc

You will find the driver at ${HOME}/tmp/opt/bin/gcc. Run the above command each time you make a change to the driver. The initial build can take a while, but subsequent builds should be quick.

% ~/tmp/opt/bin/gcc --version
gcc (Local custom compiler) 14.0.0 20231211 (experimental)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

comment@wozniak.ca

Generated on 2024-08-25