index

Templating in C

2015-02-28

The question of how to do templating in C is often raised, and I couldn't find a good overview of the different approaches so I'll try to make a relatively exhaustive list here.

Note: the methods presented here rely on the C preprocessor, which you can use directly yourself by calling the cpp command, paste your code and see the output by sending an EOS (pressing control + d typically).

Simple C macro

So the most common method is to dump some code enclosed (or not) into the do { ... } while (0) form:

#define DO_RANDOM_STUFF(type) do {      \
    int i;                              \
    type *p = buf;                      \
                                        \
    for (i = 0; i < len; i++)           \
        p[i] = p[i] * k;                \
} while (0)

... and using it directly:

int func(void *buf, int len, float k, int request)
{
    if      (request == INT8)   DO_RANDOM_STUFF(int8_t);
    else if (request == INT16)  DO_RANDOM_STUFF(int16_t);
    else if (request == INT32)  DO_RANDOM_STUFF(int32_t);
}

Simple C macro, full function

It is also common to create the whole function that way:

#define DECLARE_FUNC(n)                                 \
static void func_##n(int##n##_t *p, int len, float k)   \
{                                                       \
    int i;                                              \
                                                        \
    for (i = 0; i < len; i++)                           \
        p[i] = p[i] * k;                                \
}

DECLARE_FUNC(8)
DECLARE_FUNC(16)
DECLARE_FUNC(32)

... which you will then use by simply calling func_8(), func_16() and func_32().

Alternative function creator

So far we used the templating just to avoid typing redundancy, but it is sometimes motivated by performance. Let's observe the following pattern:

int process_image(void *img, int width, int height, const int n)
{
    int x, y;

    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            if      (n == 0) foo(img, x, y);
            else if (n == 1) bar(img, x, y);
            else             baz(img, x, y);
        }
    }
}

We will assume here that foo(), bar() and baz() functions are meant to be inlined for performance reasons (be it justified or not, we assume you do not want a per-pixel function call), so you do not want to use an array of function pointers and do func_lut[n](img, x, y) in the inner loop. We could also consider the scenario where the functions take a completely different set of parameters.

Your compiler will sometimes be smart enough to see that the n check in the inner loop can instead enclose the whole logic just as if you had written this:

int process_image(void *img, int width, int height, const int n)
{
    int x, y;

    if (n == 0)
        for (y = 0; y < height; y++)
            for (x = 0; x < width; x++)
                foo(img, x, y);
    else if (n == 1)
        for (y = 0; y < height; y++)
            for (x = 0; x < width; x++)
                bar(img, x, y);
    else
        for (y = 0; y < height; y++)
            for (x = 0; x < width; x++)
                baz(img, x, y);
}

Unfortunately, it is very likely that your code is much more complex (otherwise you could even just put the 2 loops into your inner functions) which will cause your compiler not to do it.

One solution for this is to create wrapper for each scenario. It would look like this:

static inline int process_image(void *img, int width, int height, const int n)
{
    int x, y;

    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            if      (n == 0) foo(img, x, y);
            else if (n == 1) bar(img, x, y);
            else             baz(img, x, y);
        }
    }
}

int process_image_foo(void *img, int width, int height)
{
    return process_image(img, width, height, 0);
}

int process_image_bar(void *img, int width, int height)
{
    return process_image(img, width, height, 1);
}

int process_image_baz(void *img, int width, int height)
{
    return process_image(img, width, height, 2);
}

Notice how the process_image() function has been marked as static inline in order to make sure it is fully inlined in each wrapper.

By doing so, the compiler can very easily see the dead paths (because n is now a constant in the wrappers) and generate 3 standalone functions with zero call nor inner condition. These functions can now be put into a function lookup table mapped to n (process_image_lut[n](img, width, height)).

One great thing about this method is that you can also do type-specific code in each of the function (by interpreting img as int16_t, float, uint64_t, etc for example).

As a side effect, you will notice that the main logic is not under a huge hardly maintainable macro, and the redundancy overhead is also very small.

Note: you might want to rely on __attribute__((always_inline)) if your compiler supports it and you want the inline to be effective at every optimization level.

Mixing full functions mechanism and macros

If the redundancy overhead in the previous example is still too much for you, you can mix it with a small macro mechanism:

static inline int process_image(void *img, int width, int height, const int n)
{
    int x, y;

    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            if      (n == 0) foo(img, x, y);
            else if (n == 1) bar(img, x, y);
            else             baz(img, x, y);
        }
    }
}

#define DECLARE_PROCESS_IMAGE_FUNC(name, n)                 \
int process_image_##name(void *img, int width, int height)  \
{                                                           \
    return process_image(img, width, height, n);            \
}

DECLARE_PROCESS_IMAGE_FUNC(foo, 0)
DECLARE_PROCESS_IMAGE_FUNC(bar, 1)
DECLARE_PROCESS_IMAGE_FUNC(baz, 2)

This avoids the painful redundancy and still benefits from the same advantages as previously.

FFmpeg's XBR filter and paletteuse filter are a few real cases examples.

External file

One alternative to all of this is to use external files.

The logic is pretty simple; let's start with the content of caller.c:

#include <stdint.h>

#define TEMPLATE_U16
#include "evil_template.c"
#undef TEMPLATE_U16

#define TEMPLATE_U32
#include "evil_template.c"
#undef TEMPLATE_U32

#define TEMPLATE_FLT
#include "evil_template.c"
#undef TEMPLATE_FLT

#define TEMPLATE_DBL
#include "evil_template.c"
#undef TEMPLATE_DBL

The content of evil_template.c could look like something like this:

#if defined(TEMPLATE_U16)

#    define RENAME(N)   N ## _u16
#    define TYPE        uint16_t
#    define SUM_TYPE    uint32_t
#    define XDIV(x, n)  (((x) + ((1<<(n))-1)) >> (n))

#elif defined(TEMPLATE_U32)

#    define RENAME(N)   N ## _u32
#    define TYPE        uint32_t
#    define SUM_TYPE    uint64_t
#    define XDIV(x, n)  (((x) + ((1<<(n))-1)) >> (n))

#elif defined(TEMPLATE_FLT)

#    define RENAME(N)   N ## _flt
#    define TYPE        float
#    define SUM_TYPE    float
#    define XDIV(x, n)  ((x) / (float)(1<<(n)))

#elif defined(TEMPLATE_DBL)

#    define RENAME(N)   N ## _dbl
#    define TYPE        double
#    define SUM_TYPE    double
#    define XDIV(x, n)  ((x) / (double)(1<<(n)))

#endif

TYPE RENAME(func)(const TYPE *p, int n)
{
    int i;
    SUM_TYPE sum = 0;

    for (i = 0; i < 1<<n; i++)
        sum += p[i];
    return XDIV(sum, n);
}

#undef RENAME
#undef TYPE
#undef SUM_TYPE
#undef XDIV

This will produce the following functions:

% gcc -Wall -c caller.c -o caller.o && readelf -s caller.o | grep func
 9: 0000000000000000   110 FUNC    GLOBAL DEFAULT    1 func_u16
10: 000000000000006e   119 FUNC    GLOBAL DEFAULT    1 func_u32
11: 00000000000000e5   128 FUNC    GLOBAL DEFAULT    1 func_flt
12: 0000000000000165   131 FUNC    GLOBAL DEFAULT    1 func_dbl

This way of templating could be relatively handy if you have a set of functions in the template but can be considered by many as the root of all evils. Nevertheless, this is still a few order of magnitude better than C++ templating.

Note: make sure your build system properly handles the dependency of evil_template.c from caller.c.

Resampling code in libswresample (which inspired this example) is one of the many real case example.

Conclusion

Mixing the full functions mechanism and macros covers most of the cases and can lead to very decent code with high level of readability and maintenance.

For more complex cases, you also can simply rely on another language, generally a scripting one (but you can use C to generate C as well of course). This can often be the most appropriate approach if the complexity is increasing and your project already has a dependency on such a language.

Edit: As pointed by someone on HN, it is also worth mentioning __Generic in C11 that can be used for similar purpose if your project can afford the dependency on C11.


For updates and more frequent content you can follow me on Mastodon. Feel also free to subscribe to the RSS in order to be notified of new write-ups. It is also usually possible to reach me through other means (check the footer below). Finally, discussions on some of the articles can sometimes be found on HackerNews, Lobste.rs and Reddit.

index