Linux kernel coding style (kernel.org)
124 points by sthlm on Nov 26, 2014 | 101 comments



I doubt everyone agrees with that coding style; I certainly don't. However, when submitting code to a project I'd still stick to the prescribed style, because I believe consistency in any code base can be just as important as any other measure of readability.


I find it hard to agree with a lot of this, but it was obviously written by someone who's written a lot of code, thought a lot about how to write code, and reads a lot of code - even if we didn't know who it was. There's a lot to learn from reading stuff like this, if you take it all with a grain of salt... or if you're contributing.


This. It's easier to find lots of people who don't agree with a particular style (I agree with none but mine, which is far, far away from the kernel's, and much more readable of course, lol) than people who do, but worse is projects where styles are mixed. Or even tabs and spaces are mixed. Don't get me started on that one :] (edit: already happened in other comments, of course)


I think it was written by Torvalds and other kernel hackers. It is part of the Linux source code, under the Documentation directory.


Initial commit was by Linus[0], but it looks like most of the other commits[1] have been small amendments by other people (apart from a couple of new chapters).

[0] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux....

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux....


That "initial commit" was the import of the whole kernel tree into git, ignoring any previous history; that commit having been made by Linus does not mean any particular file was written by Linus.

You should instead look into historical repositories like https://archive.org/details/git-history-of-linux, which go further back; however, before Bitkeeper the authorship of each change was not tracked in detail (that historical repository IIRC does not have the detailed Bitkeeper history, but there is another repository somewhere which has it).


Interesting, thanks for the link.


If I recall correctly, it was written by Linus Torvalds a long time ago. I haven't yet found when it originally appeared, but it was probably in the pre-Bitkeeper days; I found a discussion about it dated from the last century.

There have been many edits to it, by several people; its more recent history can be seen in the kernel git repository. Of course, if Linus disagreed with an edit, it wouldn't go in, so in a way it's still his document.


And it's more or less a codification of K&R style.


Mostly good advice, sometimes even great, but the part about typedefs is total BS. Any non-trivial program will use values that have clearly different meanings but end up being the same C integer type. One's an index, one's a length, one's a repeat count, one's an enumerated value ("enum" was added to the language to support this very usage), and so on. It's stupid that C compilers don't distinguish between any two types that are the same width and signedness; why compound that stupidity? Both humans and static analyzers could tell the difference if you used typedefs, and avoid quite a few bugs as a result. Being able to change one type easily might also make some future maintainer's life much better. There's practically no downside except for having to look up the type in some situations (to see what printf format specifier to use), but that's a trivial problem compared to those that can result from not using a typedef.
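A minimal sketch of what I mean, with hypothetical names:

    #include <stddef.h>

    typedef size_t buf_index_t;   /* an index into a buffer */
    typedef size_t buf_len_t;     /* a length in bytes */

    static char peek(const char *buf, buf_len_t len, buf_index_t idx)
    {
            return (idx < len) ? buf[idx] : '\0';
    }
To the compiler both parameters are plain size_t, but a reader (or a static analyzer taught about these names) can spot swapped arguments at the call site.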

Don't want to use typedefs? I think that's a missed opportunity, but OK. Don't use them. OTOH, anyone who tries to pretend that the bad outweighs the good, or to discourage others from using them, is an ass. Such advice for the kernel is even hypocritical, when that code uses size_t and off_t and many others quite liberally.


Most of that section is concerned with hiding structs or pointers as typedefs: "In general, a pointer, or a struct that has elements that can reasonably be directly accessed should _never_ be a typedef."

Say you are reading a function, and see a local variable declared: "something_t variable_name;". Is it a struct, a pointer, or a basic type? Now compare with "struct something * variable_name;", which is clearly a pointer. If on the other hand it is "struct something variable_name;", you know that it's a struct allocated on the very small kernel stack (less than 8KiB per thread) - something which wouldn't be as clear if the fact that it's a struct were hidden by a typedef.
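Side by side, using the hypothetical names from above:

    struct something s;    /* clearly a struct, on the (small) kernel stack */
    struct something *p;   /* clearly a pointer */
    something_t x;         /* struct? pointer? scalar? go look it up */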

There are three main reasons to use typedefs: to allow for changes to the underlying type; to add new information to the underlying type (which is item (c) in that section); and to hide information. Since the Linux kernel runs in a constrained environment (as I mentioned, the kernel stack is severely limited, among other things), hiding information without a good reason is frowned upon. It's the same reason they use C instead of C++; the C++ language idioms hide more information.

> Both humans and static analyzers could tell the difference if you used typedefs, and avoid quite a few bugs as a result.

The Linux kernel does that! As I mentioned, it's item (c): "when you use sparse to literally create a _new_ type for type-checking." See for instance the declaration of gfp_t:

  typedef unsigned __bitwise__ gfp_t;
The __bitwise__ is for the Sparse static checker. There are other similar typedefs, like __le32 which holds a little-endian value; the Sparse checker will warn you if used incorrectly (without converting to "native" endian).
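For example, a hypothetical on-wire structure (the struct and helper here are made up, but __le32, le32_to_cpu() and the Sparse warning are real kernel facilities):

    struct wire_hdr {
            __le32 len;     /* little-endian on the wire */
    };

    static u32 hdr_len(const struct wire_hdr *h)
    {
            /* 'return h->len;' would draw a Sparse warning about a
             * restricted __le32 being used as a plain integer. */
            return le32_to_cpu(h->len);
    }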


> it's item (c): "when you use sparse to literally create a _new_ type for type-checking."

The problem is that this is presented as an exception that must be (strongly) justified. I think that using typedefs for integer types should be acceptable by default, and there should be specific rules for when to avoid them. The burden of proof is being put on the wrong side.

Even for structs, the argument for typedefs is stronger than the argument against. Even across a purely internal API, the caller often doesn't need to know whether something is an integer, a pointer to a struct, a pointer to a union, a pointer to another pointer, or whatever. Therefore they shouldn't need to know in order to write a declaration, which will become stale and need to be changed if the API ever changes. This is basic information hiding, as known since the 60s. Exposing too much reduces modularity and future flexibility. I've been working on kernels for longer than Linus, and the principle still applies there.

Again, it comes down to defaults and burden of proof. The rule should be to forego struct typedefs only if every user provably needs to know that it's a struct and what's in it (which is often a sign of a bad API). Even then, adding a typedef hardly hurts; anyone who needs to know that a "foo_t" is a "struct foo" and can't figure it out in seconds shouldn't be programming in the kernel or anywhere else.
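A minimal sketch of the kind of information hiding I mean (all names hypothetical):

    /* foo.h: callers never learn what a foo_t really is, so it could
     * later become an integer handle without touching their code. */
    typedef struct foo_impl *foo_t;

    foo_t foo_open(const char *name);
    long foo_read(foo_t f, void *buf, unsigned long len);
    void foo_close(foo_t f);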


> the caller often doesn't need to know whether something is an integer, a pointer to a struct, a pointer to a union, a pointer to another pointer, or whatever.

The parent provided a very good example: a structure takes up a lot more space than a single int/pointer type, and passing them by value is usually an unnecessary copy.

> and need to be changed if the API ever changes.

If the API changes then changing the declarations is likely to be trivial in comparison to the other changes that would need to be made to all the code using it.

> Exposing too much reduces modularity and future flexibility.

...and exposing too little reduces understanding of the details, which I think is far more important especially for a kernel.


> a structure takes up a lot more space than a single int/pointer type

Not necessarily. Many structures, especially those used to make up for the lack of tuples/lists in C, are very small. The real difference is between large and small objects. Knowing which is which is part of the essential discipline of being a kernel (or embedded) programmer, and is hardly affected by whether or not typedefs are used.

> changing the declarations is likely to be trivial in comparison

That's generally true of pointer typedefs, which is why I don't particularly care for them and said so in another sub-thread. I think it's much less likely to be true for integer/enum or struct/union typedefs. For example, in the integer/enum case, the most common scenario is a change to a parameter's real type without changing its width or sign. The compiler won't flag that, even though it can cause real problems. Giving the compiler more information should be encouraged, not discouraged, even if there are exceptions either way.
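Concretely (hypothetical names): plain C typedefs alone don't create distinct types, so the compiler stays silent here; it takes something like Sparse's __bitwise, or a single-member struct, to turn the mismatch into a hard error:

    typedef unsigned long byte_count_t;
    typedef unsigned long elem_count_t;

    static void reserve_bytes(byte_count_t n) { (void)n; /* ... */ }

    static void caller(void)
    {
            elem_count_t nelems = 128;
            /* compiles silently: both typedefs are plain unsigned long */
            reserve_bytes(nelems);
    }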

> more important especially for a kernel.

Why do people persist in this belief that a kernel is some mystical realm where software-engineering principles don't apply? Being able to know is not the same as being forced to know. Kernel programmers are already more burdened than others with concerns that they need to think about for every line. Forcing more on them when it's not necessary doesn't help anyone. If you need to know whether something's a pointer to a struct or an array of something even though you never dereference into either (maybe you just pass it back or onward to another function), then somebody's wasting your precious time. Believe me, I know all about the tighter resource constraints for kernel code. OTOH, the people who worked on the AIX and Solaris kernels still knew and applied this stuff. They didn't have the anti-CS attitude that seems rampant among Linux kernel devs, and IMO they were better for that. If an RTOS for tiny devices can have decent modularity - and I've seen some that do - then why can't a full-blown kernel?


> Why do people persist in this belief that a kernel is some mystical realm where software-engineering principles don't apply?

Why do you think that these "software-engineering principles" should apply? I think the fact that the Linux kernel works, and it works quite well, is strong enough evidence that they don't matter.

> the people who worked on the AIX and Solaris kernels still knew and applied this stuff.

I don't know about AIX, but there's a reason Solaris has been called "Slowlaris"...

> If an RTOS for tiny devices can have decent modularity

But is that modularity actually necessary? I've worked with plenty of overly complex applications that were far more inefficient and harder to understand as a whole than they could be, and most of them were the result of dogmatic adherence to principles of modularity, encapsulation, extensibility, etc. (none of which actually improved anything from the point of view of either the users or the ones trying to figure out how everything works), so maybe that "anti-CS attitude" is a good thing after all...


Usually when people say stuff like this, they haven't programmed anything as complex and performant as the thing they are criticizing, so the comments can and should be disregarded as noise.


Nice ad hominem you've got there. Was it directed at me, or my interlocutors? If it was directed at me, it's not only a fallacy but based on a false premise, as I've worked on seven UNIX kernels plus NT since 1989. That includes HA systems, FT systems, supercomputers, etc., so I don't think one can reasonably say I haven't dealt with some complexity before. Maybe we should delve into your experience to see if you know what you're talking about... but no, that would be just as fallacious.

There have been some good comments in this thread, but the only "noise" is from those who haven't even tried to present an argument one way or the other. I get that some people would draw the lines between good vs. bad use of typedefs differently than I would. I'm OK with that, as long as there's some kind of rational decision process behind it. The problem is that often there doesn't seem to be. Aesthetic concerns or the trivial difficulty of getting from the typedef to the underlying type do not, in my opinion, stand against the proven benefits of modularity or robust type checking.


"Why do people persist in this belief that a kernel is some mystical realm where software-engineering principles don't apply?"

It's not that it's a "mystical realm where software-engineering principles don't apply", it's that - like embedded - it's a domain where you often face tighter resource constraints. Applying the same engineering principles with different constraints can lead to different trade-offs, and ultimately different best practices.


You're not understanding their idea behind typedefs. IMO, their lines for usage are extremely good when adhered to.

The note about integers is worth complaining a bit about; I agree there is merit to typedef'ing integers in some situations, but the kernel standard agrees with that in those instances (and the example is bad; there are instances of 'flag' typedefs in the kernel). In general, the note about integers is just to discourage spamming typedefs everywhere.

More important than integers, though: their note about making opaque objects with a typedef is extremely good practice, as it makes it easy to distinguish when it is or isn't expected that you'll be accessing the members directly.

The point of those rules is to allow typedef to actually be useful and communicate some information. If you just allow typedef'ing everything in every situation, then whether or not something is typedef'd becomes useless information to the reader.
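A sketch of the opaque-object use mentioned above (foo_t is hypothetical; the kernel's own pte_t follows this pattern):

    /* The typedef signals "don't poke at the members directly". */
    typedef struct {
            unsigned long val;
    } foo_t;

    static inline foo_t make_foo(unsigned long v)
    {
            return (foo_t){ .val = v };
    }

    static inline unsigned long foo_val(foo_t f)
    {
            return f.val;
    }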


"Such advice for the kernel is even hypocritical, when that code uses size_t and off_t and many others quite liberally"

Did you even read their explanation? Apparently not.

This is an acceptable use of typedefs, as explained there, exactly because a size_t varies between architectures.


That's just rationalization. It's basically saying that some typedefs are OK because Linus is used to them, but he doesn't want to take the few seconds to figure out any new ones. The cases for typedefs shouldn't be treated as exceptions. The cases against them should.


"some typedefs are OK because Linus is used to them, but he doesn't want to take the few seconds to figure out any new ones."

NO

Because the wrong uses are more numerous than the right ones. It's that simple.

Creating typedefs for integers is mostly useless and causes confusion, except in the cases specified.

Of course, if you work with a small project it's easier than with a big project like the kernel.

And of course I admire Linus for cutting through BS and usually avoiding it.


I've had luck using single-element structs to distinguish between types of data when I'm throwing a lot of primitive types around. In my test with gcc, the generated code was identical to using the primitives directly, although the standard doesn't actually guarantee that and it's historically not been the case in some particular compilers (not sure which).
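A sketch of the idea, with a hypothetical units example:

    struct meters { double v; };
    struct feet   { double v; };

    static struct meters feet_to_meters(struct feet f)
    {
            return (struct meters){ .v = f.v * 0.3048 };
    }

    static void example(void)
    {
            struct feet alt = { .v = 1000.0 };
            struct meters m = feet_to_meters(alt);
            /* feet_to_meters(m) would now be a compile error,
             * not a silent unit bug */
            (void)m;
    }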


As a recreational C programmer I have the same impression. I've always used typedefs for structs and enums in my code and I think it makes it more readable and easier to work on. My reaction to reading the kernel style guidelines was a surprise and I am happy I am not the only one disagreeing.


As a "professional" C/C++ programmer (day job), I don't have strong feelings either way. It is frustrating not knowing what the type of a variable is. Am I being passed a pointer, or an integer, or a floating point value, or a whole struct, or what? This really matters! Digging up the definition isn't difficult, but isn't easy, either. I would lean against using typedefs liberally, but I don't feel strongly about it.

Personally I use typedefs as a shortcut. Rather than type boost::shared_ptr<const MyFavoriteClass> over and over, typedef it to ConstMyFavoriteClassPtr for convenience. Then be consistent with that paradigm through the whole project, so you only have to learn it once to know any given Ptr type.


Right, I imagine it's completely different in a project a lot of people are contributing to, versus a 1-2 person thing where I'm familiar with the whole codebase and defined the typedefs myself.


I agree with you. I think structs help readability, especially when using function pointers within structures. Would it be out of line to suggest a naming convention for struct typedefs and pointer typedefs, i.e. _t for typedefs and _tp for typedef pointers?


While I generally think typedefs for integer/enumerated types and structs/unions are a good idea, I also don't think the arguments I've made apply as much to typedefs for pointers. The difference between an X and a pointer/reference to X is often an explicit part of the contract between modules or functions. If that contract ever changes, the declarations and usage should change in ways beyond replacement of an identifier. That's different than if X itself changes, which usually can and should be transparent. You also get the same type checking for an "X pointer" declaration (avoiding star because of HN mis-formatting issues) as for its "X_ptr" equivalent. Even compilers will flag "pointer to wrong type" errors, even as they remain oblivious to many "wrong integer type" errors. In short, "X_ptr" typedefs don't help anywhere that "X" typedefs don't already.

I'm not going to argue against pointer typedefs, though I personally don't use them. I'm just saying that I can't make a strong argument for them as I believe I can for other cases.


The argument is that you don't need to resort to naming conventions since the language already supports differentiating them with the struct and the * markings. It's one of the things I fully support. I hate working on code with a billion typedefs for every struct.


"First off, I'd suggest printing out a copy of the GNU coding standards, and NOT read it. Burn them, it's a great symbolic gesture."

Shots fired.


I thought it was a funny read (and I certainly don't agree with everything, even if some snarky remarks gave me a good laugh). I wonder if the style guide was written by Linus, which would make sense to me.


With the target being, what, the 5 people that enjoy following the GNU coding standards?


The kernel coding style wasn't written a week ago. The first time I read it, more than a decade ago, the GNU coding standards did matter, and I remember feeling quite hurt by that (in a good way, since I don't think anybody took it that seriously).

Matter of fact, the GNU coding standards still matter (to a certain extent) to many of us, and you would be thankful that they did, since it's the basis that provides consistency among GNU (and non GNU) command line apps, for example.

The GNU coding standards are an extensive document which doesn't only talk about how to write C code, but also about how to design software consistent with the GNU principles (command line user interfaces, options, stdin/stdout behaviour, etc.).

Personally I take the kernel coding style as a whole different thing. It's a short guide on how to write consistent code for the Linux kernel, and full of good opinionated choices, in my opinion. Its scope is very different from that of the GNU coding standards (which, I'd say, are focused towards writing userland programs the user will interact with).

Also, remember that GNU wanted (wants?) to create an OS, not just a kernel, so I guess we can read their guidelines as something similar to Apple's human interface guidelines for devs :)


I've always thought of the GNU coding standards as an, IMO, ugly and hardly readable way of formatting C code. I didn't realize there was this much to it.

Thanks for enlightening me.


I am glad that for Go there is `go fmt`, which settles some of the issues mentioned in the article up front. Thus there is "one global coding style for Go". Whether one likes it or not is another matter.


I don't see why there couldn't be a `kernel fmt` tool. In this day and age, we should really be beyond having to worry about things like "hmm, what was the brace style in this project again, and should all if/while/for have mandatory braces?".


The kernel has a perl script called 'checkpatch.pl'[0] which can check whether code is formatted correctly. The kernel coding style isn't actually enforced 100%, though, which makes it a bit more iffy. Not all the code in the kernel follows the same style (IIRC there's at least one sub-system that uses a slightly different style, 'net' maybe?), so 'checkpatch' is recommended but may not be the be-all end-all in every situation.

[0] https://github.com/torvalds/linux/blob/master/scripts/checkp...


That would be "astyle --style=linux" for example.


Or lindent, which I think is mentioned in the kernel style guide (it's a shell script that calls indent with set parameters).


Sadly, running Lindent on almost any existing source file in the tree will produce dozens of spurious diff hunks due to most other people manually formatting their code, so Lindent is quite useless in practice.

It really does bother me how much of the coders' and code reviewers' bandwidth in the kernel community is wasted due to these silly formatting issues. In most IDE-using communities these problems were solved a long time ago, by the IDE autoreformatting your code on commit, with no exceptions.


Linus could easily do a "lindent reformat" commit every once in a while, or even automate it. It seems they do not care that much for the style guide?


Linus shuns commits that do nothing but reformat code. However, if you go in and make a change, then you'll probably get marked down by the maintainer if you don't fix up the formatting at that time.


"Get a decent editor and don't leave whitespace at the end of lines."

Trailing whitespace always raises a huge red flag for me whenever I look at someone's code. It's not just sloppy, it often makes diff output so noisy you can't detect real changes to the code.


I understand that it would be bad to introduce whitespace-only changes, but why would whitespace at the end of a line that doesn't break the 80 character limit be a problem otherwise? Sure, git colors it red in its diffs, but that is kind of circular reasoning.

E.g. in this diff

  0a1
  > k=2 
  2c3,4
  <     print(i)
  ---
  >     k*=k 
  >     print(k)
you don't even see the spaces after "k=2" and "k*=k".


Who cares? Get a decent editor that doesn't give a crap if there's invisible whitespace at the end of a line.


As the top commenter said, the problems don't end with choosing a good editor and fixing its display methodology. Git and many other VCSs create noisy diffs whenever a space is added and forgotten, which ultimately complicates the life of developers who want to review changes. Even if your editor were able to display diffs in a clean way, you would still have a dirty history when, for instance, using less/more or other tools (not to mention merge conflicts).


> Git and many other VCSs create noisy diffs whenever space is added and forgotten, which ultimately complicates the life of developers that want to review changes.

That's because Git is stupidly opinionated about end-of-line whitespace, having been written by .... Linus.


Right, but that's only half of the robustness principle.


or run perl -pi -e 's/[ \t]+$//' on the file


>Encoding the type of a function into the name (so-called Hungarian notation) is brain damaged - the compiler knows the types anyway and can check those, and it only confuses the programmer. No wonder MicroSoft makes buggy programs.

"Making Wrong Code Look Wrong" by Joel Spolsky is a must-read and contains an explanation of Apps Hungarian (the original, thoughtful one) vs Systems Hungarian http://www.joelonsoftware.com/articles/Wrong.html


C is not a strongly typed language and it does not allow function overloading. C projects should allow for some flexibility in naming notations to make up for those language design decisions.

Also, any project that uses int return codes shouldn't be leaning too heavily on type safety.


This is really interesting. I was about to correct you since I remembered using function overloading, but then I double checked and it was indeed C++. I knew about C++'s name mangling from having coded against it in FFI, but never knew why it existed: the name mangling is what allows C++ to have function overloading. Light bulb!


Right. Use Hungarian to encode information the type system can't represent. This is relevant even in object oriented languages, such as distinguishing between escaped and unescaped strings.
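A hypothetical C sketch of the Apps Hungarian idea from the article (all names made up):

    char *read_user_comment(void);      /* hypothetical helpers */
    char *escape_html(const char *us);
    void write_page(const char *s);

    void handle_comment(void)
    {
            /* 'us' = unsafe/unescaped string, 's' = safe/escaped */
            char *usComment = read_user_comment();
            char *sComment = escape_html(usComment);

            write_page(sComment);    /* reads as right */
            /* write_page(usComment);   would look wrong at a glance */
    }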


"There are heretic movements that try to make indentations 4 (or even 2!) characters deep, and that is akin to trying to define the value of PI to be 3."

HA!


I laughed too. He makes a good point about the amount of indentation in code.

As someone who spends most of their time in JavaScript, I see how hard it would be to fit our code to this, and at the same time how much we'd all benefit if we tried to.

I just looked at the random JS file on top of my editor... have some refactoring to do.


I understand your point. I am having some rethinking about my style of writing too :)


> spaces are never used for indentation

If indentation should always use tabs (0x09) and never spaces (0x20), then the whole rant about indentation width is pointless. Any modern editor will allow you to adjust the displayed width of a tab. It's only when you use spaces for indentation that the width becomes a concern.


There are two counterarguments to this:

- Line length. Some people say lines should be no longer than 78-80 characters, and you can't reasonably enforce a rule like this without answering how "wide" a tab is.

- Alignment. The "right thing to do" is to indent with tabs and align with spaces, but this is difficult for some people, against the religion of others (mixing tabs and spaces!) and insufficient if you want to align text that spans different levels of indentation. If most people use 8-character wide tabs, things will at least look right for them when it inevitably goes wrong.


I've been playing with clang-format for my own projects (installed via homebrew, since Apple doesn't ship it with Xcode). I tell it to enforce limits assuming ts=8 and use tabs for indentation. My editor is configured for ts=4.

Doing that seems to actually mostly work! It's made some weird (and in one case obviously broken) formatting decisions, but otherwise I'm pleased with it.


I almost want a language that demands a visible character where indentation ends and alignment begins...


It's far easier to simply ban the tab character. All indentation problems magically go away.


It's easier, but that doesn't mean it's better.


When you have a limit of 80 characters per line, the indentation width still matters because code that doesn't appear to overflow with 4 space tabs could overflow with 8 space tabs.


I like it

I really prefer using tabs. Having it displayed as 8 spaces in other languages is not as good as in C

And they get it right about typedefs in C


2 spaces to indent is enough for anyone.


Right, having tabs seems better since I can configure my editor to display tabs as 4 spaces, while whoever else can have tabs be displayed as 8 spaces. Having spaces instead, and having to make your tabs output spaces, and perhaps also backspace deleting four spaces (a "tab") in certain contexts, seems pretty complex in comparison.


Agreed. Using spaces to painstakingly emulate tabs, rather than just using tabs, seems absurd to me.

Even better, if you use real tabs, you might be able to use elastic tabstops:

http://nickgravgaard.com/elastictabstops/


In theory, it sounds like a nice idea that by having tabs, you could choose your own preferred indent width of something different than 8 columns. But in practice, this will cause problems, such as code written by you going over the maximum line length when viewed by people with 8 column tabs.


My editor is set to wrap around on long lines... but that is again of course another thing that potentially has to be tweaked. That is something that is done when writing normal text (like in this text box here on this website), and is done in word processors, so it feels pretty natural.

Though I've used that (wrapping around) mostly when writing in Haskell, for some reason. In Java and whatever lines tend to not become too long, perhaps because I tend to use four space indent...


I got into that fight so many times. It baffles me that a majority of programmers out there don't understand that tabs are not just a matter of preference; they are a matter of accessibility. I read better with 4-char indent, and some people read better with 8-char indent. Let the user choose, rather than forcing it with spaces.


I don't care much either way so long as you NEVER VERTICALLY-ALIGN YOUR LINES.

  int valueone     = 1;
  int anothervalue = 2;
  float yetmore    = 3.;
Aggggh what a waste of time why do people do this


Because I find it really handy to quickly, and visually, check the sanity / logic of something.

In your example, it's really easy to run your eyes down a column and see that one of those values is radically different from the others.

As a trivial example:

    int robert_age = 32;
    int annalouise_age = 25;
    int bob_age = 250;
    int dorothy_age = 56;
I find easier to read as:

    int robert_age     = 32;
    int annalouise_age = 25;
    int bob_age        = 250;
    int dorothy_age    = 56;
Coding styles are about readability and usability. The columns metaphor works well for some categories of data - that's why spreadsheets are so popular.


This is true if you never change your code. But as soon as I have to add

  int rumpelstiltskin_age = 202;
to your code, I already want to throw you out the window for the work I have to do and the diff I have to ruin to keep your "pretty" formatting. Just don't bother.


This ruins the readability and usability of your diffs. Say you need to quickly track down a major bug due to a change in a single constant. With horizontal alignment, the diff might contain any number of changed lines, obscuring the crucial change. There are workarounds (whitespace-ignoring and word-based diffs), but it's just not worth the trouble IMHO.



The problem I was alluding to occurs when a change causes the amount of alignment spaces to change, which then affects all the lines that have been aligned. Without alignment, the diff would be limited to just the code that was changed.
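Using the ages example from upthread: adding one longer name forces every previously aligned line to be re-padded, so a one-line change shows up as a whole-block diff:

    /* before: */
    int robert_age     = 32;
    int dorothy_age    = 56;

    /* after adding one longer name, the lines above must change too: */
    int robert_age          = 32;
    int dorothy_age         = 56;
    int rumpelstiltskin_age = 202;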


We should adjust our code and text to `diff`, rather than adjusting our tools to our code and text?


While I agree diff should be adjusted to accommodate whitespace diffs more easily by default (it can, with some options), it's not just a burden on the reviewer. It is also a burden on the programmer.

If you have, say, 50 lines of assignments and you align all the values to the largest one, adding one forces you to update 50 lines. I've been faced with exactly those situations, and that is when I understood how important it is not to align values like that.


I don't see why the committer can't leave a message like:

> Formatting change. Use `wdiff` to confirm that changes are just stylistic

The reviewer runs `wdiff` and confirms that the commit is just a formatting change. If the language is not layout-aware, then he will know that none of the changes are "semantic". Now he can look over the changed lines themselves (not necessarily with `diff`) and see if the change is worth it and in line with the project.

PS: Maybe there should be a "column diff", something that checks that one file uses the same alignment as another file. I'm not able to show it here since HN will truncate spaces between words ( ;) ), but the point is to check whether two files use the same alignment, for example that in

> var v = 12

and the next variable declaration, the numbers line up. I don't know if that is worth it, and the check would only be valid for some parts of the files.


Sure, the reviewer could run `wdiff`, but it's an extra manual step due to a completely cosmetic and useless formatting style. If you're perusing hundreds of diffs, deciding when to apply whitespace-significant vs. insignificant diffs is a distraction, and it can't be done automatically.

You can fantasize about better diffs. Why isn't a diff applied directly to the abstract syntax trees of a language, for example? However, I think part of the robustness of version control systems comes from keeping things simple, namely line-based diffs, and with that comes a preference for keeping line-based diffs short.


Indeed, because adjusting the version control system to parse every language under the sun is not a good idea, and that would be required to properly identify strictly formatting-related changes.


People waste a lot of time in making things pleasant to look at.


Because emacs does it for me automatically.


The correct term is "horizontal alignment", not "vertical alignment". It's impossible to vertically align code unless you're using a two-dimensional visual programming language.

https://google-styleguide.googlecode.com/svn/trunk/javaguide...

Google Java Style

4.6.3 Horizontal alignment: never required

Terminology Note: Horizontal alignment is the practice of adding a variable number of additional spaces in your code with the goal of making certain tokens appear directly below certain other tokens on previous lines.

This practice is permitted, but is never required by Google Style. It is not even required to maintain horizontal alignment in places where it was already used.

Here is an example without alignment, then using alignment:

    private int x; // this is fine
    private Color color; // this too

    private int   x;      // permitted, but future edits
    private Color color;  // may leave it unaligned
Tip: Alignment can aid readability, but it creates problems for future maintenance. Consider a future change that needs to touch just one line. This change may leave the formerly-pleasing formatting mangled, and that is allowed. More often it prompts the coder (perhaps you) to adjust whitespace on nearby lines as well, possibly triggering a cascading series of reformattings. That one-line change now has a "blast radius." This can at worst result in pointless busywork, but at best it still corrupts version history information, slows down reviewers and exacerbates merge conflicts.

https://developer.mozilla.org/en-US/docs/Web/CSS/vertical-al...

The vertical-align CSS property specifies the vertical alignment of an inline or table-cell box.

Values (for inline elements)

Most of the values vertically align the element relative to its parent element:

baseline: Aligns the baseline of the element with the baseline of its parent. The baseline of some replaced elements, like <textarea> is not specified by the HTML specification, meaning that their behavior with this keyword may change from one browser to the other.

sub: Aligns the baseline of the element with the subscript-baseline of its parent.

super: Aligns the baseline of the element with the superscript-baseline of its parent.

text-top: Aligns the top of the element with the top of the parent element's font.

text-bottom: Aligns the bottom of the element with the bottom of the parent element's font.

middle: Aligns the middle of the element with the middle of lowercase letters in the parent.

<length>: Aligns the baseline of the element at the given length above the baseline of its parent.

<percentage>: Like <length> values, with the percentage being a percent of the line-height property. (Negative values are allowed for <length> and <percentage>.)

The following two values vertically align the element relative to the entire line rather than relative to its parent:

top: Align the top of the element and its descendants with the top of the entire line.

bottom: Align the bottom of the element and its descendants with the bottom of the entire line. For elements that do not have a baseline, the bottom margin edge is used instead.


but looks nice....


Agreed, although this doesn't sit well with me when combined with the reasons for the recommendation - i.e. 8 is just better, line length should be 80, nesting should be limited to 3. If someone can set their indent level to something other than 8, won't they be more likely to violate the other rules? I say this having just realised I have my tab set to 4 spaces...


It's a very nice coding style. It keeps the code in pieces that are easy to grasp as units, it doesn't waste space and doesn't clutter the code at the same time.

Just take any random function from the kernel sources and ask yourself what it does. I think in most cases you'll find it's really obvious...

For me, the kernel sources are some of the most readable and understandable sources I've seen. Their structure is just so clearly visible from the code. I think a lot of that has to do with the coding style.


I'm a fan of the Tcl/Apache/BSD style. Indeed, Tcl has nice C code:

https://github.com/tcltk/tcl/blob/master/generic/tclCompile....


Nice, but they definitely over-comment IMO, e.g.

https://github.com/tcltk/tcl/blob/master/generic/tclCompile....


After having maintained several projects with absolutely zero comments apart from the ones that were copy-pasted from examples found on the web, I can tell you there's no such thing as "over-commenting".


I mentioned it partly because it's addressed in the actual article, and the specific instance I referenced is such a flagrant abuse of the guideline:

"Comments are good, but there is also a danger of over-commenting. NEVER try to explain HOW your code works in a comment: it's much better to write the code so that the _working_ is obvious, and it's a waste of time to explain badly written code."

It's almost as bad as:

  // Increment i
  i++;
I think we should be aiming for code that is so beautiful and simple that it doesn't require commenting; comments should be left for exceptional circumstances in which something really can't be clearly expressed in the code. But that's distinct from documentation, which should be kept separate and at a much higher level.
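For contrast, a hypothetical sketch of the kind of comment that does earn its keep, one that explains why rather than how:

    struct dev;
    int read_reg(struct dev *d, int reg, unsigned int *val);  /* hypothetical */

    #define REG_STATUS 0x04    /* hypothetical register */

    int get_status(struct dev *d, unsigned int *status)
    {
            int ret = read_reg(d, REG_STATUS, status);

            /*
             * Retry once: the controller sometimes NAKs the first read
             * after resume (hypothetical erratum), and one retry is far
             * cheaper than a full reset.
             */
            if (ret < 0)
                    ret = read_reg(d, REG_STATUS, status);
            return ret;
    }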


Then I think you will enjoy this thread. :)

http://stackoverflow.com/questions/184618/what-is-the-best-c...


> Do not unnecessarily use braces where a single statement will do.

    if (condition)
	    action();
> and

    if (condition)
	    do_this();
    else
	    do_that();
The Apple SSL bug (https://nakedsecurity.sophos.com/2014/02/24/anatomy-of-a-got...) makes me wonder if this is really worth the potential for introducing bugs.


IMO, the bug was a much deeper issue than simply not putting braces on if statements. It doesn't matter if the code becomes this:

    if (condition) {
        goto fail;
    }
        goto fail;
if nobody looks at the commit. Don't get me wrong though, braces on ifs do help with making cleaner patches, so there is a valid reason to request braces. You should never rely on them to fix these kinds of bugs though; that's bound to come back and bite you. In general, having a proper system for submitting and approving patches (like the kernel has) will allow you to avoid errors like this one.


I certainly don't disagree with you on the importance of process. I was going to mention how this probably would never be an issue for the kernel.

To me, though, requiring braces would make it much easier to spot any such problem at any point in the development process (writing, debugging, reviewing, maintenance) such that the extra line per conditional would be well worth it in all cases, not to mention making edits easier.


I think it's worth noting that a pretty big percentage of the ifs in the kernel are one line. I'm not particularly tied to one opinion or the other; I'll do whatever fits the project (though I do tend to use the one-line if syntax for personal projects). But I personally like them in the kernel's source simply because one-line ifs are so common and mostly encouraged. IMO, a better solution is to use a context-aware patch system rather than line-based patches. That brings its own set of problems though, unfortunately.

That said, I think the argument does apply in that some pieces of the kernel don't strictly follow the kernel style, and the fact that braces aren't enforced leads to some uglier pieces of code [0] being allowed despite not strictly adhering to the style.

[0] https://github.com/torvalds/linux/blob/master/kernel/groups....


Any decent code analyzer will pick that kind of thing up. I'd say the Apple bug speaks more about their process than their coding standards.


I agree with and already practice many of these conventions (at least the ones that apply to C-like languages in general). It's interesting that I do, and I kind of wonder what led me down that path, since I haven't programmed in C since my college days. I often think that my assembly class from those days pushed me into making my code as vertical as possible rather than the indented-if-statement-curly-brace hell that I often see, since assembly was very readable without having that capability.


>> Do not unnecessarily use braces where a single statement will do.

Shouldn't this be changed to always use braces? Given the Apple bug?


The "Apple bug" - I assume you mean the duplicated "goto fail" - isn't really a common kind of error. That said, there are others that "braces everywhere" does protect from somewhat. The question is whether the clutter trades off too much in readability. Linus apparently thinks it does.


I stopped reading pretty early on: "...if you need more than three levels of indentation you're screwed and should rewrite..."



