115

I am reading K&R's “The C Programming Language” and came across this statement [Introduction, p. 3]:

Because the data types and control structures provided by C are supported directly by most computers, the run-time library required to implement self-contained programs is tiny.

What does the bolded statement mean? Is there an example of a data type or a control structure that isn't supported directly by a computer?

10
  • 1
    These days, the C language does support complex arithmetic, but originally it didn't because computers don't directly support complex numbers as data types. Jan 16, 2015 at 4:59
  • 12
    Actually, it was historically the other way round: C was designed from the hardware operations & types available at that time. Jan 16, 2015 at 5:49
  • 2
    Most computers have no direct hardware support for decimal floats
    – PlasmaHH
    Jan 16, 2015 at 10:14
  • 3
    @MSalters: I was trying to hint into some direction for the question of "Is there an example of a data type or a control structure that isn't supported directly by a computer?" which I did not interpret to be limited to K&R
    – PlasmaHH
    Jan 16, 2015 at 11:05
  • 11
    How is this not a duplicate more than 6 years after Stack Overflow launched? Jan 16, 2015 at 16:42

12 Answers

146

Yes, there are data types not directly supported.

On many embedded systems, there is no hardware floating point unit. So, when you write code like this:

float x = 1.0f, y = 2.0f;
return x + y;

It gets translated into something like this:

unsigned x = 0x3f800000, y = 0x40000000;
return _float_add(x, y);

Then the compiler or standard library has to supply an implementation of _float_add(), which takes up memory on your embedded system. If you're counting bytes on a really tiny system, this can add up.
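
The two hex constants in the translated snippet are just the bit patterns of 1.0f and 2.0f. Here is a minimal sketch that makes this visible, assuming float is a 32-bit IEEE 754 single-precision value (true on most platforms, but not guaranteed by the C standard). The name float_bits is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a float.
   Assumption: float is a 32-bit IEEE 754 binary32 value. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* well-defined way to reinterpret the bytes */
    return bits;
}
```

Under that assumption, float_bits(1.0f) is 0x3f800000 and float_bits(2.0f) is 0x40000000, which is exactly what a soft-float routine such as _float_add() receives.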

Another common example is 64-bit integers (long long in the C standard since 1999), which are not directly supported by 32-bit systems. Old SPARC systems didn't support integer multiplication, so multiplication had to be supplied by the runtime. There are other examples.
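
To give a feel for what "not directly supported" means for 64-bit integers, here is a rough sketch of 64-bit addition built from 32-bit operations only. The type u64_pair and the function add64 are hypothetical names for illustration. A real compiler would typically emit an add followed by an add-with-carry inline, and reserve runtime-library calls for the more expensive operations such as division.

```c
#include <stdint.h>

/* A 64-bit unsigned value held as two 32-bit halves (hypothetical type). */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Add two 64-bit values using only 32-bit arithmetic. */
u64_pair add64(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;              /* wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);  /* the sum wrapped iff it is smaller than an operand */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

Adding {lo = 0xFFFFFFFF, hi = 0} and {lo = 1, hi = 0} yields {lo = 0, hi = 1}: the carry out of the low half is propagated by hand.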

Other languages

By comparison, other languages have more complicated primitives.

For example, a Lisp symbol requires a lot of runtime support, just like tables in Lua, strings in Python, arrays in Fortran, et cetera. The equivalent types in C are usually either not part of the standard library at all (no standard symbols or tables) or they are much simpler and don't require much runtime support (arrays in C are basically just pointers, nul-terminated strings are almost as simple).

Control structures

A notable control structure missing from C is exception handling. Nonlocal exit is limited to setjmp() and longjmp(), which just save and restore certain parts of processor state. By comparison, the C++ runtime has to walk the stack and call destructors and exception handlers.
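
A small sketch of nonlocal exit with setjmp() and longjmp(), to show how little machinery is involved. The function names are made up, and the single global jmp_buf is a simplification that would not be safe with threads or nested use.

```c
#include <setjmp.h>

static jmp_buf recover_point;   /* saved processor state */

/* Hypothetical worker that bails out with a nonlocal exit on odd input. */
static int checked_half(int n)
{
    if (n % 2 != 0)
        longjmp(recover_point, 1);   /* jumps back into try_half() */
    return n / 2;
}

/* Returns n / 2, or -1 if checked_half() bailed out. */
int try_half(int n)
{
    if (setjmp(recover_point) != 0)
        return -1;                   /* reached only via longjmp() */
    return checked_half(n);
}
```

try_half(10) returns 5, while try_half(7) takes the longjmp() path and returns -1. Nothing is unwound or cleaned up along the way, which is the difference from C++ exceptions.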

8
  • 2
    basically just pointers... rather, basically just raw chunks of memory. Even if that's nit-picking, and the answer is good anyway. Jan 16, 2015 at 14:07
  • 2
    You could argue that null terminated strings have "hardware support" as the string terminator fits the 'jump if zero' operation of most processors and thus is slightly faster than other possible implementations of strings.
    – Peteris
    Jan 17, 2015 at 12:03
  • 1
    Posted my own answer to expand on how C is designed to map simply to asm. Jan 17, 2015 at 15:59
  • 1
    Please don't use the collocation "arrays are basically just pointers", it can seriously, badly mislead a beginner like OP. Something along the lines of "arrays are directly implemented using pointers at the hardware level" would be better IMO. Jan 19, 2015 at 20:30
  • 1
    @TheParamagneticCroissant: I think in this context it's appropriate... clarity comes at the cost of precision. Jan 19, 2015 at 20:53
35

Actually, I'll bet that the contents of this introduction haven't changed much since 1978 when Kernighan and Ritchie first wrote them in the First Edition of the book, and they refer to the history and evolution of C at that time more than modern implementations.

Computers are fundamentally just memory banks and central processors, and each processor executes machine code. Part of the design of each processor is its instruction set architecture; the corresponding assembly language maps a set of human-readable mnemonics almost one-to-one onto machine code, which is all numbers.

The authors of the C language – and the B and BCPL languages that immediately preceded it – were intent upon defining constructs in the language that were as efficiently compiled into Assembly as possible ... in fact, they were forced to by limitations in the target hardware. As other answers have pointed out, this involved branches (GOTO and other flow control in C), moves (assignment), logical operations (& | ^), basic arithmetic (add, subtract, increment, decrement), and memory addressing (pointers). A good example is the pre-/post-increment and decrement operators in C, which supposedly were added to the B language by Ken Thompson specifically because they were capable of translating directly to a single opcode once compiled.

This is what the authors meant when they said "supported directly by most computers". They didn't mean that other languages contained types and structures that were not supported directly - they meant that by design C constructs translated most directly (sometimes literally directly) into Assembly.
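
As an illustration of the increment operators mentioned above, here is the classic string-copy idiom. It is only a sketch; copy_string is a made-up name, and the caller is assumed to supply a large enough destination buffer. On machines with auto-increment addressing modes (the PDP-11 is the usual example), each *p++ can become a single operand of a move instruction.

```c
#include <stddef.h>

/* Copy a nul-terminated string from src to dst.
   Returns the number of characters copied, not counting the nul. */
size_t copy_string(char *dst, const char *src)
{
    const char *start = src;
    while ((*dst++ = *src++) != '\0')
        ;                            /* empty body: all the work is in the condition */
    return (size_t)(src - start) - 1;
}
```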

This close relation to the underlying Assembly, while still providing all the elements required for structured programming, is what led to C's early adoption, and what keeps it a popular language today in environments where the efficiency of compiled code is still key.

For an interesting write-up of the history of the language, see The Development of the C Language - Dennis Ritchie

14

The short answer is that most of the language constructs supported by C are also supported by the target computer's microprocessor. Compiled C code therefore translates very nicely and efficiently into the microprocessor's assembly language, resulting in smaller code and a smaller footprint.

The longer answer requires a little bit of assembly language knowledge. In C, a statement such as this:

int myInt = 10;

would translate to something like this in assembly:

myInt dw 1
mov myInt,10

Compare this to something like C++:

MyClass myClass;
myClass.set_myInt(10);

The resulting assembly language code (depending on how big MyClass is) could add up to hundreds of lines.

Without actually creating programs in assembly language, pure C is probably the "skinniest" and "tightest" code you can make a program in.

EDIT

Given the comments on my answer, I decided to run a test, just for my own sanity. I created a program called "test.c", which looked like this:

#include <stdio.h>

void main()
{
    int myInt=10;

    printf("%d\n", myInt);
}

I compiled this down to assembly using gcc. I used the following command line to compile it:

gcc -S -O2 test.c

Here is the resulting assembly language:

    .file   "test.c"
    .section    .rodata.str1.1,"aMS",@progbits,1
.LC0:
    .string "%d\n"
    .section    .text.unlikely,"ax",@progbits
.LCOLDB1:
    .section    .text.startup,"ax",@progbits
.LHOTB1:
    .p2align 4,,15
    .globl  main
    .type   main, @function
main:
.LFB24:
    .cfi_startproc
    movl    $10, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    jmp __printf_chk
    .cfi_endproc
.LFE24:
    .size   main, .-main
    .section    .text.unlikely
.LCOLDE1:
    .section    .text.startup
.LHOTE1:
    .ident  "GCC: (Ubuntu 4.9.1-16ubuntu6) 4.9.1"
    .section    .note.GNU-stack,"",@progbits

I then created a file called "test.cpp", which defines a class and outputs the same thing as "test.c":

#include <iostream>
using namespace std;

class MyClass {
    int myVar;
public:
    void set_myVar(int);
    int get_myVar(void);
};

void MyClass::set_myVar(int val)
{
    myVar = val;
}

int MyClass::get_myVar(void)
{
    return myVar;
}

int main()
{
    MyClass myClass;
    myClass.set_myVar(10);

    cout << myClass.get_myVar() << endl;

    return 0;
}

I compiled it the same way, using this command:

g++ -O2 -S test.cpp

Here is the resulting assembly file:

    .file   "test.cpp"
    .section    .text.unlikely,"ax",@progbits
    .align 2
.LCOLDB0:
    .text
.LHOTB0:
    .align 2
    .p2align 4,,15
    .globl  _ZN7MyClass9set_myVarEi
    .type   _ZN7MyClass9set_myVarEi, @function
_ZN7MyClass9set_myVarEi:
.LFB1047:
    .cfi_startproc
    movl    %esi, (%rdi)
    ret
    .cfi_endproc
.LFE1047:
    .size   _ZN7MyClass9set_myVarEi, .-_ZN7MyClass9set_myVarEi
    .section    .text.unlikely
.LCOLDE0:
    .text
.LHOTE0:
    .section    .text.unlikely
    .align 2
.LCOLDB1:
    .text
.LHOTB1:
    .align 2
    .p2align 4,,15
    .globl  _ZN7MyClass9get_myVarEv
    .type   _ZN7MyClass9get_myVarEv, @function
_ZN7MyClass9get_myVarEv:
.LFB1048:
    .cfi_startproc
    movl    (%rdi), %eax
    ret
    .cfi_endproc
.LFE1048:
    .size   _ZN7MyClass9get_myVarEv, .-_ZN7MyClass9get_myVarEv
    .section    .text.unlikely
.LCOLDE1:
    .text
.LHOTE1:
    .section    .text.unlikely
.LCOLDB2:
    .section    .text.startup,"ax",@progbits
.LHOTB2:
    .p2align 4,,15
    .globl  main
    .type   main, @function
main:
.LFB1049:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $10, %esi
    movl    $_ZSt4cout, %edi
    call    _ZNSolsEi
    movq    %rax, %rdi
    call    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
    xorl    %eax, %eax
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
.LFE1049:
    .size   main, .-main
    .section    .text.unlikely
.LCOLDE2:
    .section    .text.startup
.LHOTE2:
    .section    .text.unlikely
.LCOLDB3:
    .section    .text.startup
.LHOTB3:
    .p2align 4,,15
    .type   _GLOBAL__sub_I__ZN7MyClass9set_myVarEi, @function
_GLOBAL__sub_I__ZN7MyClass9set_myVarEi:
.LFB1056:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $_ZStL8__ioinit, %edi
    call    _ZNSt8ios_base4InitC1Ev
    movl    $__dso_handle, %edx
    movl    $_ZStL8__ioinit, %esi
    movl    $_ZNSt8ios_base4InitD1Ev, %edi
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    jmp __cxa_atexit
    .cfi_endproc
.LFE1056:
    .size   _GLOBAL__sub_I__ZN7MyClass9set_myVarEi, .-_GLOBAL__sub_I__ZN7MyClass9set_myVarEi
    .section    .text.unlikely
.LCOLDE3:
    .section    .text.startup
.LHOTE3:
    .section    .init_array,"aw"
    .align 8
    .quad   _GLOBAL__sub_I__ZN7MyClass9set_myVarEi
    .local  _ZStL8__ioinit
    .comm   _ZStL8__ioinit,1,1
    .hidden __dso_handle
    .ident  "GCC: (Ubuntu 4.9.1-16ubuntu6) 4.9.1"
    .section    .note.GNU-stack,"",@progbits

As you can clearly see, the resulting assembly file is much larger for the C++ file than it is for the C file. Even if you cut out all the other stuff and just compare the C "main" to the C++ "main", there is a lot of extra stuff.

2
  • 14
    That "C++ code" just isn't C++. And real code such as MyClass myClass { 10 } in C++ is very likely to compile to exactly the same assembly. Modern C++ compilers have eliminated the abstraction penalty. And as a result, they can often beat C compilers. E.g. the abstraction penalty in C's qsort is real, but C++'s std::sort has no abstraction penalty after even basic optimization.
    – MSalters
    Jan 16, 2015 at 11:09
  • 1
    You can easily see using IDA Pro that most C++ constructs compile down to the same thing as doing it manually in C, constructors and dtors get inlined for trivial objects, then future optimization is applied
    – paulm
    Jan 18, 2015 at 17:08
7

K&R mean that most C expressions (technical meaning) map to one or a few assembly instructions, not a function call to a support library. The usual exceptions are integer division on architectures without a hardware div instruction, or floating point on machines with no FPU.
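
For a sense of what such a support routine looks like, here is a sketch of unsigned 32-bit division done by shift-and-subtract, the kind of helper a compiler might call on a CPU with no divide instruction. The name soft_udiv32 is hypothetical, and real runtime libraries generally use faster algorithms.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 32-bit division by shift-and-subtract (restoring division). */
uint32_t soft_udiv32(uint32_t n, uint32_t d)
{
    assert(d != 0);                        /* dividing by zero is undefined, as with / */
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);    /* bring down the next bit of n */
        if (r >= d) {                      /* divisor fits: subtract and set quotient bit */
            r -= d;
            q |= (uint32_t)1 << i;
        }
    }
    return q;
}
```

soft_udiv32(100, 7) returns 14. The point is that a single / in the source turns into a 32-iteration loop instead of one instruction.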

There's a quote:

C combines the flexibility and power of assembly language with the user-friendliness of assembly language.

(found here. I thought I remembered a different variation, like "speed of assembly language with the convenience and expressivity of assembly language".)

long int is usually the same width as the native machine registers.

Some higher level languages define the exact width of their data types, and implementations on all machines must work the same. Not C, though.

If you want to work with 128bit ints on x86-64, or in the general case BigInteger of arbitrary size, you need a library of functions for it. All CPUs now use 2s complement as the binary representation of negative integers, but even that wasn't the case back when C was designed. (That's why some things that would give different results on non 2s-complement machines are technically undefined in the C standards.)

C pointers to data or to functions work the same way as assembly addresses.

If you want ref-counted references, you have to do it yourself. If you want c++ virtual member functions that call a different function depending on what kind of object your pointer is pointing to, the C++ compiler has to generate a lot more than just a call instruction with a fixed address.
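
Here is roughly what "doing it yourself" looks like for virtual dispatch in C. It is a simplified sketch: the struct and function names are invented, and each object stores the function pointer directly, whereas C++ compilers typically store one pointer to a shared per-class table.

```c
/* A hand-written "virtual function": each object carries a pointer
   to the function that implements area() for its concrete kind. */
struct shape {
    double (*area)(const struct shape *self);
    double w, h;
};

static double rect_area(const struct shape *s)     { return s->w * s->h; }
static double triangle_area(const struct shape *s) { return 0.5 * s->w * s->h; }

struct shape make_rect(double w, double h)
{
    struct shape s = { rect_area, w, h };
    return s;
}

struct shape make_triangle(double w, double h)
{
    struct shape s = { triangle_area, w, h };
    return s;
}

/* Indirect call: load the function pointer from the object, then call through it. */
double shape_area(const struct shape *s)
{
    return s->area(s);
}
```

The call in shape_area() compiles to a load plus an indirect call, instead of a call instruction with a fixed address.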

Strings are just arrays

Outside of library functions, the only string operations provided are read/write a character. No concat, no substring, no search. (Strings are stored as nul-terminated ('\0') arrays of 8bit integers, not pointer+length, so to get a substring you'd have to write a nul into the original string.)
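
A short sketch of the substring point, with made-up helper names. The first function gets a prefix by writing a nul into the original string, which destroys the rest of it. The second copies into a caller-supplied buffer instead, and assumes the caller has checked that start + len does not run past the end of s.

```c
#include <stddef.h>
#include <string.h>

/* Truncate s in place so that only the first len characters remain.
   The tail of the original string is lost. */
char *truncate_in_place(char *s, size_t len)
{
    if (len < strlen(s))
        s[len] = '\0';
    return s;
}

/* Non-destructive alternative: copy len characters starting at s[start]
   into out, which must have room for len + 1 bytes. */
char *copy_substring(char *out, const char *s, size_t start, size_t len)
{
    memcpy(out, s + start, len);
    out[len] = '\0';
    return out;
}
```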

CPUs sometimes have instructions designed for use by a string-search function, but still usually process one byte per instruction executed, in a loop. (or with the x86 rep prefix. Maybe if C was designed on x86, string search or compare would be a native operation, rather than a library function call.)

Many other answers give examples of things that aren't natively supported, like exception handling, hash tables, lists. K&R's design philosophy is the reason C doesn't have any of these natively.

3
6

The assembly language of a processor generally deals with jump (go to) statements, move statements, binary arithmetic and logic (XOR, NAND, AND, OR, etc.), and memory fields (or addresses). It categorizes memory into two types, instruction and data. That is about all an assembly language is (I am sure assembly programmers will argue there is more to it than that, but it boils down to this in general). C closely resembles this simplicity.

C is to assembly what algebra is to arithmetic.

"C encapsulates the basics of assembly (the processor's language)" is probably a truer statement than "Because the data types and control structures provided by C are supported directly by most computers".

5

Beware of misleading comparisons

  1. The statement relies on the notion of a "run-time library", which has mostly gone out of fashion since, at least for mainstream high-level languages. (It is still relevant for the smallest embedded systems.) The run-time is the minimal support a program in that language requires to execute when you use only constructs built into the language (as opposed to explicitly calling a function provided by a library).
  2. In contrast, modern languages tend not to discriminate between the run-time and the standard library, the latter often being quite extensive.
  3. At the time of the K&R book, C did not even have a standard library. Rather, the available C libraries differed quite a bit between different flavors of Unix.
  4. For understanding the statement you should not compare to languages with a standard library (such as Lua and Python mentioned in other answers), but to languages with more built-in constructs (such as old-day LISP and old-day FORTRAN mentioned in other answers). Other examples would be BASIC (interactive, like LISP) or PASCAL (compiled, like FORTRAN) which both have (among other things) input/output features built right into the language itself.
  5. In contrast, there is no standard way to get the computation results out from a C program that is using only the run-time, not any library.
1
  • On the other hand, most modern languages run inside of dedicated runtime environments that provide facilities like garbage collection.
    – Nate C-K
    Jan 20, 2015 at 22:27
5

Is there an example of a data type or a control structure that isn't supported directly by a computer?

All the fundamental data types and their operations in the C language can be implemented by one or a few machine-language instructions without looping -- they are directly supported by practically every CPU.

Several popular data types and their operations require dozens of machine-language instructions, or require iterating of some runtime loop, or both.

Many languages have special abbreviated syntax for such types and their operations -- using such data types in C generally requires typing a lot more code.

Such data types, and operations include:

  • arbitrary-length text string manipulation -- concatenation, substring, assigning a new string to a variable initialized with some other string, etc. ('s = "Hello World!"; s = (s + s)[2:-2]' in Python)
  • sets
  • objects with nested virtual destructors, as in C++ and every other object-oriented programming language
  • 2D matrix multiplication and division; solving linear systems ( "C = B / A; x = A\b" in MATLAB and many array programming languages)
  • regular expressions
  • variable-length arrays -- in particular, appending an item to the end of the array, which (sometimes) requires allocating more memory.
  • reading the value of variables that change type at runtime -- sometimes it's a float, other times it's a string
  • associative arrays (often called "maps" or "dictionaries")
  • lists
  • ratios ( "(+ 1/3 2/7)" gives "13/21" in Lisp)
  • arbitrary-precision arithmetic (often called "bignums")
  • converting data into a printable representation (the ".toString" method in JavaScript)
  • saturating fixed-point numbers (often used in embedded C programs)
  • evaluating a string typed in at run time as though it were an expression ("eval()" in many programming languages).

All of these operations require dozens of machine-language instructions or require iterating some runtime loop on nearly every processor.
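
As one concrete case from the list, here is a sketch of saturating addition for 16-bit fixed-point values. C has no built-in saturating type, so embedded code usually carries a helper like this; sat_add16 is a made-up name. Some DSPs do have a saturating add instruction, but plain C gives no direct way to ask for it.

```c
#include <stdint.h>

/* Saturating addition of two 16-bit fixed-point values:
   clamps to the representable range instead of wrapping around. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* cannot overflow in 32 bits */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}
```

sat_add16(30000, 10000) returns 32767 rather than a wrapped-around negative number.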

Some popular control structures that also require dozens of machine-language instructions or looping include:

  • closures
  • continuations
  • exceptions
  • lazy evaluation

Whether written in C or some other language, when a program manipulates such data types, the CPU must eventually execute whatever instructions are required to manipulate those data types. Those instructions are often contained in a "library". Every programming language, even C, has a "run-time library" for each platform that is included by default in every executable.

Most people who write compilers put the instructions for manipulating all the data types that are "built into the language" into their run-time library. Because C doesn't have any of the above data types and operations and control structures built into the language, none of them are included in the C run-time library -- which makes the C run-time library smaller than the run-time library of other programming languages that have more of the above stuff built into the language.

When a programmer wants a program -- in C or any other language of his choice -- to manipulate other data types that are not "built into the language", that programmer generally tells the compiler to include additional libraries with that program, or sometimes (to "avoid dependencies") writes yet another implementation of those operations directly in the program.
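
For example, "yet another implementation" of an appendable array might look like the sketch below. The type int_vec and its functions are hypothetical names, and the doubling growth policy is just one common choice.

```c
#include <stdlib.h>

/* Hypothetical growable array of int. Initialize with {0}. */
typedef struct {
    int   *data;
    size_t len;
    size_t cap;
} int_vec;

/* Append value. Returns 0 on success, -1 if memory could not be allocated. */
int int_vec_push(int_vec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;               /* the old buffer is still valid */
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}

/* Release the storage and reset the vector to empty. */
void int_vec_free(int_vec *v)
{
    free(v->data);
    v->data = NULL;
    v->len = v->cap = 0;
}
```

A language with built-in lists ships code like this inside its run-time; in C it lives in the program or in a library the programmer chooses.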

1
  • If your implementation of Lisp evaluates (+ 1/3 2/7) as 3/21, I think you must have a particularly creative implementation...
    – RobertB
    Jan 20, 2015 at 21:59
4

What are the built-in data types in C? They are things like int, char, int *, float, arrays etc... These data types are understood by the CPU. The CPU knows how to work with arrays, how to dereference pointers and how to perform arithmetic on pointers, integers and floating point numbers.

But when you go to higher level programming languages you have built in abstract datatypes and more complex constructs. For example look at the vast array of built-in classes in the C++ programming language. The CPU doesn't understand classes, objects or abstract datatypes, so the C++ run-time bridges the gap between the CPU and the language. These are examples of datatypes not directly supported by most computers.

3
  • 2
    x86 knows to work with some arrays, but not all. For big or unusual element sizes, it will need to perform integer arithmetic to convert an array index into a pointer offset. And on other platforms, this is always needed. And the idea that the CPU doesn't understand C++ classes is laughable. That's just pointer offsets, like C structs. You don't need a runtime for that.
    – MSalters
    Jan 16, 2015 at 11:14
  • @MSalters yes, but the actual methods of the standard library classes like iostreams etc are library functions rather than being directly supported by the compiler. However, the higher-level languages they were likely comparing it to was not C++, but contemporary languages such as FORTRAN and PL/I.
    – Random832
    Jan 16, 2015 at 15:46
  • 1
    C++ classes with virtual member functions translate into a lot more than just an offset into a struct. Jan 17, 2015 at 15:09
3

It depends on the computer. On the PDP-11, where C was invented, long was poorly supported (there was an optional add-on module you could buy that supported some, but not all, 32-bit operations). The same is true to various degrees on any 16-bit system, including the original IBM PC. And likewise for 64-bit operations on 32-bit machines or in 32-bit programs, though the C language at the time of the K&R book did not have any 64-bit operations at all. And of course there have been many systems throughout the 80s and 90s [including the 386 and some 486 processors], and even some embedded systems today, that did not directly support floating point arithmetic (float or double).

For a more exotic example, some computer architectures only support "word-oriented" pointers (pointing at a two-byte or four-byte integer in memory), and byte pointers (char * or void *) had to be implemented by adding an extra offset field. This question goes into some detail about such systems.

The "run-time library" functions it refers to are not the ones you will see in the manual, but functions like these, in a modern compiler's runtime library, which are used to implement the basic type operations that are not supported by the machine. The runtime library that K&R themselves were referring to can be found on The Unix Heritage Society's website - you can see functions like ldiv (distinct from the C function of the same name, which did not exist at the time) which is used to implement division of 32-bit values, which the PDP-11 did not support even with the add-on, and csv (and cret also in csv.c) which save and restore registers on the stack to manage calls and returns from functions.

They were likely also referring to their choice to not support many data types that aren't directly supported by the underlying machine, unlike other contemporary languages such as FORTRAN, which had array semantics that did not map as well to the CPU's underlying pointer support as C's arrays. The fact that C arrays are always zero-indexed and always of known size in all ranks but the first means that there is no need to store the index ranges or sizes of the arrays, and no need to have runtime library functions to access them - the compiler can simply hardcode the necessary pointer arithmetic.
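
The "hardcoded pointer arithmetic" can be spelled out in C itself. This is a sketch with made-up names and a fixed 3x4 shape. Because the column count is a compile-time constant, no size or bounds information has to be stored with the array.

```c
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };

/* a[i][j] written out as the pointer arithmetic it stands for:
   a + i steps over whole rows of COLS ints, then + j steps over ints. */
int element_by_pointer(int (*a)[COLS], size_t i, size_t j)
{
    return *(*(a + i) + j);
}

/* Byte offset of a[i][j] from the start of the array: a multiply and an add
   with constants the compiler already knows. */
size_t byte_offset(size_t i, size_t j)
{
    return (i * COLS + j) * sizeof(int);
}
```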

3

The statement simply means that the data and control structures in C are machine-oriented.

There are two aspects to consider here. One is that the C language has a definition (ISO standard) which allows latitude in how the data types are defined. This means that C language implementations are tailored to the machine. The data types of a C compiler match what is available in the machine which the compiler targets, because the language has latitude for that. If a machine has an unusual word size, like 36 bits, then the type int or long can be made to conform to that. Programs which assume that int is exactly 32 bits will break.
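
A program can ask the implementation what it chose instead of assuming it. A small sketch follows; int_width_bits is a made-up helper, and it reports the size of the object representation, which may include padding bits on unusual machines.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits in the object representation of int on this implementation.
   CHAR_BIT is the number of bits in a char: at least 8, not necessarily 8. */
size_t int_width_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}
```

The standard only guarantees that int can hold at least 16 bits' worth of values. Code that needs exactly 32 bits is better off with int32_t from <stdint.h>, which is only defined where such a type exists.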

Secondly, because of such portability problems, there is a second effect. In a way, the statement in the K&R has become a sort of self-fulfilling prophecy, or perhaps in reverse. That is to say, implementors of new processors are aware of the keen need for supporting C compilers, and they know that there exists a lot of C code which assumes that "every processor looks like an 80386". Architectures are designed with C in mind: and not only C in mind, but with common misconceptions about C portability in mind also. You simply can't introduce a machine with 9 bit bytes or whatever for general purpose use any more. Programs which assume that the type char is exactly 8 bits wide will break. Only some programs written by portability experts will continue to work: likely not enough to pull together a complete system with a toolchain, kernel, user space and useful applications, with reasonable effort. In other words, C types look like what is available from the hardware because the hardware was made to look like some other hardware for which many nonportable C programs were written.

Is there an example of a data type or a control structure that isn't supported directly by a computer?

Data types not directly supported in many machine languages: multi-precision integer; linked list; hash table; character string.

Control structures not directly supported in most machine languages: first class continuation; coroutine/thread; generator; exception handling.

All of these require considerable run-time support code created using numerous general purpose instructions, and more elementary data types.

C has some standard data types which are not supported by some machines. Since C99, C has complex numbers. They are made out of two floating-point values and made to work with library routines. Some machines have no floating-point unit at all.
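
A brief sketch of the C99 complex type, showing that it really is two floating-point values side by side. The helper names are invented. The layout claim (real part first, then imaginary part, same as double[2]) is guaranteed by C99, and the code assumes a compiler that provides <complex.h>.

```c
#include <complex.h>
#include <string.h>

/* Split a complex value into its two underlying doubles:
   parts[0] is the real part, parts[1] the imaginary part. */
void complex_parts(double complex z, double parts[2])
{
    memcpy(parts, &z, sizeof z);
}

/* Complex multiplication: the compiler expands this into several
   floating-point operations, possibly via a runtime helper. */
double complex complex_product(double complex a, double complex b)
{
    return a * b;
}
```

Multiplying 1 + 2i by 3 + 4i this way gives -5 + 10i. No single machine instruction does that on most CPUs.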

With regard to some data types, it is not clear. If a machine has support for addressing memory using one register as a base address, and another as a scaled displacement, does that mean that arrays are a directly supported data type?

Also, speaking of floating-point, there is standardization: IEEE 754 floating-point. Your C compiler's double agrees with the floating-point format supported by the processor not only because the two were made to agree, but because there is an independent standard for that representation.

2

Things such as

  • Lists - used in almost all functional languages.

  • Exceptions.

  • Associative arrays (Maps) - included in e.g. PHP and Perl.

  • Garbage collection.

  • Data types/control structures included in many languages, but not directly supported by the CPU.

2

Supported directly should be understood as mapping efficiently to the instruction set of the processor.

  • Direct support for integer types is the rule, except for the long (may require extended arithmetic routines) and short sizes (may require masking).

  • Direct support for floating-point types requires an FPU to be available.

  • Direct support for bit fields is exceptional.

  • Structs and arrays require address computation, directly supported to some extent.

  • Pointers are always directly supported via indirect addressing.

  • goto/if/while/for/do are directly supported by unconditional/conditional branches.

  • switch can be directly supported when a jump table applies.

  • Function calls are directly supported by means of the stack features.
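
To illustrate the point about loops and branches in the list above, here is the same loop written twice: once with while, and once lowered by hand to the conditional and unconditional branches a compiler might emit. This is only a sketch with made-up function names; actual compiler output depends on the target and the optimization level.

```c
/* Sum 1..n with a structured loop. */
int sum_while(int n)
{
    int total = 0, i = 1;
    while (i <= n) {
        total += i;
        i++;
    }
    return total;
}

/* The same loop expressed with explicit branches. */
int sum_goto(int n)
{
    int total = 0, i = 1;
top:
    if (!(i <= n)) goto done;   /* conditional branch out of the loop */
    total += i;
    i++;
    goto top;                   /* unconditional branch back to the test */
done:
    return total;
}
```

Both functions return 55 for n = 10 and 0 for n = 0.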
