
Calculating the "truck factor" for GitHub projects

The idea of a truck or bus factor (or number) has been—morbidly, perhaps—bandied about in development projects for many years. It is a rough measure of how many developers would have to be lost (e.g. hit by a bus) to effectively halt the project. A new paper [PDF] outlines a method to try to calculate this number for various GitHub projects. Naturally, it has its own GitHub project with a description of the methodology used and some of the results. It was found that 46% of the projects looked at had a truck factor of 1, while 28% were at 2. Linux scored the second highest at 90, while the Mac OS X Homebrew package manager had the highest truck factor at 159.
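The methodology itself is described in the linked GitHub project rather than in this summary. As a rough illustration of the kind of calculation involved, here is a minimal Python sketch of one plausible greedy heuristic: repeatedly remove the author who owns the most files until more than half of the files have no remaining author. The function name `truck_factor`, the `file_authors` input (a mapping from file path to the set of people treated as its authors), and the 50% `orphan_threshold` are all assumptions made for this example, not details confirmed by the text above.

```python
from collections import defaultdict


def truck_factor(file_authors, orphan_threshold=0.5):
    """Greedy estimate of a truck factor (illustrative sketch only).

    file_authors: dict mapping file path -> set of author names
                  (hypothetical input; how authorship is assigned is
                  outside the scope of this sketch).
    orphan_threshold: assumed fraction of orphaned files at which the
                  project is considered halted.
    """
    # Copy the input and ignore files that have no author to begin with.
    remaining = {path: set(names) for path, names in file_authors.items() if names}
    total = len(remaining)
    if total == 0:
        return 0

    removed = 0
    while True:
        orphaned = sum(1 for names in remaining.values() if not names)
        if orphaned / total > orphan_threshold:
            return removed

        # Count how many files each remaining author still covers.
        counts = defaultdict(int)
        for names in remaining.values():
            for name in names:
                counts[name] += 1
        if not counts:
            return removed

        # Remove the author covering the most files (ties broken by name).
        top = max(sorted(counts), key=lambda name: counts[name])
        for names in remaining.values():
            names.discard(top)
        removed += 1


# Tiny made-up repository used only to exercise the function.
example = {
    "a.c": {"alice"},
    "b.c": {"alice"},
    "c.c": {"alice", "bob"},
    "d.c": {"bob"},
}
example_result = truck_factor(example)
```

In this made-up example, losing "alice" orphans exactly half of the files, which does not exceed the assumed threshold, so a second author has to go before the project counts as halted.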

Calculating the "truck factor" for GitHub projects

Posted Jul 16, 2015 23:50 UTC (Thu) by bronson (subscriber, #4806) [Link]

This is a great idea. It produced some surprises, like sprockets having a truck factor of 1.

The linked github repo only has a README, no code. I'd like to run it against some other projects like atom/electron, rubygems/rubygems, and rspec/rspec... Has anyone seen a way to do that?

Calculating the "truck factor" for GitHub projects

Posted Jul 17, 2015 0:20 UTC (Fri) by JoeBuck (subscriber, #2330) [Link]

I think that there may be some problems with the methodology: Linux gets a truck factor of 90, it appears, because the researchers don't distinguish core parts of the kernel from drivers (all nontrivial non-documentation files are equal), so it is as though Linus and all his deputies could die in a plane crash but everything would be fine because 20 authors of obscure drivers remain. Likewise, in many cases there are other people who are thoroughly familiar with some part of an important project, but don't count as an author because a control freak project owner insists on reworking all the checkins himself.

Still, it's something, and we should pay attention to important projects with a "truck factor" of one to make sure that there's a backup plan.

Calculating the "truck factor" for GitHub projects

Posted Jul 17, 2015 7:54 UTC (Fri) by Felix (subscriber, #36445) [Link]

I think everyone will quickly notice that the methodology cannot be relied on to produce completely accurate results. For example, the "homebrew" repository on GitHub includes not only the actual package manager but also all the "formulas" (packages). Of course a huge number of people contribute to these formulas (similar to what you see in Debian, Fedora, ...).

However, a lot of the value lies in the number of supported packages (as well as the packaging quality for "key" components). So in theory the homebrew "distro" might be fine if 40 or 50 people left, but that would weaken the network effect quite a lot. Also, I assume that there are some packages which are used very often, and you might find that these are maintained only by a "core group" of far fewer people (similar to Fedora, for example).

On the other hand it's a nice automated approach without much influence of subjective judgement so I see some value in the results.

Calculating the "truck factor" for GitHub projects

Posted Jul 17, 2015 1:32 UTC (Fri) by sashal (subscriber, #81842) [Link]

Makes you wonder if events such as the kernel summit are a good idea...

Calculating the "truck factor" for GitHub projects

Posted Jul 17, 2015 3:26 UTC (Fri) by HybridAU (guest, #85157) [Link]

I think this is a great avenue for research, showing important weaknesses in some projects with some surprising results.

I'm not sure I completely agree with their methodology though. Anecdotally, I work on some software where I'm the only developer, so by their methodology it has a bus factor of 1. But for precisely that reason my employer requires that the project be well documented: not just comments in the code but developer manuals, style guidelines, database schemas, an issue tracker, documents explaining past design decisions and a road map for the future. If I was to go out for lunch today and get hit by a little red bus, my employer could simply find someone new and carry on.

Calculating the "truck factor" for GitHub projects

Posted Jul 17, 2015 9:03 UTC (Fri) by Lennie (subscriber, #49641) [Link]

Pretty certain that the size of the code base is also an important factor.

Let's say you have a very large code base. Even with a large amount of great documentation and great code comments, if it takes months or maybe even a year before another developer is productive at fixing bugs and adding new features, then it would still be a big problem for your employer.

Calculating the "truck factor" for GitHub projects

Posted Jul 17, 2015 13:09 UTC (Fri) by nix (subscriber, #2304) [Link]

That's what I thought with my stuff at my last employer. But, y'know, it's not true: unless you have actual co-developers, nobody feels confident enough to do anything more than bugfixes and routine maintenance after you leave. Feature development, which by definition generally requires a good understanding of the codebase, tends to come to a halt :(

Calculating the "truck factor" for GitHub projects

Posted Jul 19, 2015 3:26 UTC (Sun) by pr1268 (subscriber, #24648) [Link]

If I was to go out for lunch today and get hit by a little red bus, my employer could simply find someone new and carry on.

Yes, but the researchers were studying github projects (which assumes an open-source and development strategy). From the sounds of it, your employment project is an in-house only, presumably closed-source project.

Not that I disagree with your employer's methodology; he/she (and you) have a good handle on how to manage such a contingency. Which I pray doesn't happen to you.

Calculating the "truck factor" for GitHub projects

Posted Jul 17, 2015 7:00 UTC (Fri) by andreashappe (subscriber, #4810) [Link]

Could the analysis have problems with software repositories that are just mirrors of other repositories? There was an "android" repository with a bus factor of 1 in it (;

Abandonment factor.

Posted Jul 17, 2015 8:58 UTC (Fri) by fb (guest, #53265) [Link]

I would appreciate if we dropped the morbid metaphor for something realistic, e.g. "abandonment" factor.

Buses are not a common cause for projects fading away. What we see often is:
- "I got married" / "I have kids now"
- "I've switched jobs" (or "I've got a job")
- "I lost interest"

Switching the metaphor to (something like) "abandonment/leave factor" would help make people think more about what the real risk factors to projects are.

A. Is this project funded? If so, to what extent? If **all** Linux kernel devs decided to leave to work on something else, different people would just get hired to work on it.
B. What is the growth/decay rate of contributors?

For instance, for those 46% of projects with a single maintainer, what is the ratio in that group for:
- paid to work on it/hobbyist
- single/married
- kids/no kids
- working/student.

Abandonment vs. Bus/Truck factor

Posted Jul 19, 2015 3:37 UTC (Sun) by pr1268 (subscriber, #24648) [Link]

I would appreciate if we dropped the morbid metaphor [...]

Well, in the defense of our editor and the authors of the paper, that term has been used for over twenty years (according to the Wikipedia page linked in our editor's blurb).

While I agree it's not the most pleasant nor politically-correct term, it is commonly known among software development organizations (or at least it should be).

Your bullet points are a more accurate way of enumerating reasons for developer loss among open-source/open-development projects (thankfully), but, assuming a project does fade away, these developers are at least still alive to perhaps provide documentation and support to someone who may be willing to take over. Projects with a single maintainer don't have that luxury when that single maintainer is hit by a ...

Abandonment vs. Bus/Truck factor

Posted Jul 19, 2015 15:14 UTC (Sun) by jwakely (guest, #60262) [Link]

> While I agree it's not the most pleasant nor politically-correct term

What is not politically correct about it?! Is it discriminatory against bus drivers? People who suffer from chronic falling under buses syndrome? Is the word bus an outdated stereotype for modern public transport systems? Is it offensive to car drivers who don't like public transport? I think objecting to the term because it's unpleasant would fit Wikipedia's definition of PC: "a pejorative term used to criticize language, actions, or policies seen as being excessively calculated to not offend or disadvantage any particular group of people in society". Objecting to Bus Factor seems excessively concerned with a complete non-issue.

Anyway, as you say, the point is that you can plan for other forms of abandonment and perform a risk analysis (assuming you could find out the marital status, age, family status of the contributors, which is incidentally all information that could be used to discriminate when someone is applying for a job!) but you can't plan for someone getting hit by a bus. It can happen to even the most dedicated contributors that you know would never abandon the project.

Abandonment vs. Bus/Truck factor - political correctness

Posted Jul 19, 2015 16:29 UTC (Sun) by pr1268 (subscriber, #24648) [Link]

Maybe "politically correct" was the wrong term... But, I've learned to describe a phrase as being not-PC whenever it involves grisly or macabre details; i.e. it might be more appropriate to use what the GP suggested.

For example, I've been called out for calling this thing a "finger chopper". (Shame on me!)

I certainly did not mean to imply that "bus factor" (or "truck factor") was discriminatory; but instead (like our editor and the GP said) somewhat morbid.

Abandonment vs. Bus/Truck factor - political correctness

Posted Jul 21, 2015 15:05 UTC (Tue) by jwakely (guest, #60262) [Link]

> For example, I've been called out for calling this thing a "finger chopper". (Shame on me!)

Haha! When I was at school we called that a guillotine, which brings even gorier imagery to mind :)

Abandonment vs. Bus/Truck factor

Posted Jul 20, 2015 14:25 UTC (Mon) by fb (guest, #53265) [Link]

For one, I agree that calling it 'politically incorrect' was not the most accurate choice of words by "pr1268".

That said, bringing up such a morbid metaphor can be problematic, especially when arguments get heated. IIRC Guido van Rossum really did not like it when someone, amidst an adversarial discussion, started making comments about Guido suffering a (fatal) bus accident (IIRC it was over making integer division return a FractionalNumber in Python many, many years ago, a change Guido decided not to go ahead with).

One can argue that, unlike in the "got too busy with something else" case, a deceased developer can't help with any comments etc. But I still /think/ we have more projects where the developer simply abandons a project "never to be heard from again" due to "getting busy with other things" than due to loss of life.

Abandonment vs. Bus/Truck factor

Posted Jul 21, 2015 1:31 UTC (Tue) by dakas (guest, #88146) [Link]

Not politically correct would be "Discovered girls factor". "Got a life factor" would be gender neutral but still playing on stereotypes. Though those stereotypes will likely hold strongest for the projects with low bus factor, probably related to the "this is my baby factor".

Incidentally, the statistics don't seem to include the projects with a bus factor of 0. I'd guess those to constitute the majority.

Abandonment vs. Bus/Truck factor

Posted Jul 29, 2015 1:40 UTC (Wed) by apollock (subscriber, #14629) [Link]

Well, in the defense of our editor and the authors of the paper, that term has been used for over twenty years (according to the Wikipedia page linked in our editor's blurb).

While I agree it's not the most pleasant nor politically-correct term, it is commonly known among software development organizations (or at least it should be).

In the now defunct SysOps, we had a sysadmin who objected to the term "hit by a bus" because he knew someone who had literally been hit by a bus, so we came up with "Eaten by a GRUE" to remove the microaggression.

We extended it to be a great disaster recovery training exercise with GRUE becoming a backronym for "Google's Real Untimely Education". A sysadmin could declare they were "eaten by a GRUE" for the day and go off and work on something that didn't involve interacting with the service they were normally responsible for (or any of the co-workers who would have to step in and pick up the pieces).

Abandonment factor.

Posted Jul 21, 2015 5:05 UTC (Tue) by NightMonkey (subscriber, #23051) [Link]

Morbid it may be, but one gets more bites, more eyeball traction on the knowledge one wishes to spread when one writes with flair and spark, rather than dry, ungainly, flat prose. Don't let superstition curtail your keyboards' song!

Keep this awesome phrase going! :) The art of Rhetoric is a fun art, indeed.

FFmpeg/libav

Posted Jul 17, 2015 9:06 UTC (Fri) by Lennie (subscriber, #49641) [Link]

Let's look at what another lwn.net article, "Why Debian returned to FFmpeg", says about what happens in most projects in practice:

> With regard to security issues, Reinhard attributed the difference in fix rates to a difference in how the two projects approach development ("Michael" is Michael Niedermayer, the lead developer of FFmpeg):
>
> Michael seems to have much more capacity and time, and thus is usually faster with pushing patches for such crashers. Libav takes the time to investigate, reproduce and understand those patches. Unfortunately, in the majority of cases, this is not trivial at all, often because of terse (or even wrong) commit messages, or the fact that there are better places to fix a particular issue in the code. "Better" usually means that more than a single instance of the issue is fixed.

Ohh... does anyone still think FFmpeg is a better project than libav when we include the 'truck factor' ? ;-)

FFmpeg/libav

Posted Jul 17, 2015 17:45 UTC (Fri) by dlang (guest, #313) [Link]

> Ohh... does anyone still think FFmpeg is a better project than libav when we include the 'truck factor' ? ;-)

as was discussed in the article, yes

because even if you remove Michael, the next several contributors down are all doing about as much work as the top contributors of libav

in other words, if Michael was to disappear and nobody else did any more work, the two projects would be in about the same shape as far as patch rate.

FFmpeg/libav

Posted Jul 18, 2015 18:52 UTC (Sat) by flussence (subscriber, #85566) [Link]

But in this case, we already know what happens in practice:

Libav is what FFmpeg looks like after it's been hit by a truck.

Calculating the "truck factor" for GitHub projects

Posted Jul 17, 2015 11:16 UTC (Fri) by Samathy (guest, #102370) [Link]

This is very interesting data, but it's not really useful.

What happens when you have a large project where one person has contributed to only a few core files but there are hundreds of other authors who have contributed less important files? Losing the one core developer would be devastating, but losing a few lesser devs wouldn't be so bad.

Additionally, there may be developers who have actually contributed very little measurable code but in fact are the driving force behind the design of the project.

I think it boils down to the lack of data analysed. You can't figure out how many people you could lose, because every developer on a project has a different importance. You can't treat all the contributions as having the same weight.
I'm not sure how you could improve upon the data, but you'd have to factor in some more information to get a usable number.
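To make the weighting concern concrete, here is a hedged Python sketch of what "not treating all contributions as equal" could look like: each file gets an importance weight, and the project counts as halted once more than half of the total weight is orphaned. The name `weighted_truck_factor`, the `weights` mapping, and the 50% threshold are hypothetical choices for illustration; nothing in the discussion says the researchers do this, and how to obtain meaningful weights is exactly the open question.

```python
def weighted_truck_factor(file_authors, weights, threshold=0.5):
    """Greedy truck-factor estimate where files carry importance weights.

    file_authors: dict mapping file path -> set of author names (hypothetical).
    weights: dict mapping file path -> importance; missing files default to 1.0.
    threshold: assumed fraction of total weight that must be orphaned.
    """
    owners = {path: set(names) for path, names in file_authors.items() if names}
    total = sum(weights.get(path, 1.0) for path in owners)
    if total <= 0:
        return 0

    removed = 0
    while True:
        orphaned = sum(
            weights.get(path, 1.0) for path, names in owners.items() if not names
        )
        if orphaned / total > threshold:
            return removed

        # Sum the weight of the files each remaining author still covers.
        load = {}
        for path, names in owners.items():
            for name in names:
                load[name] = load.get(name, 0.0) + weights.get(path, 1.0)
        if not load:
            return removed

        # Remove the author covering the most weight (ties broken by name).
        top = max(sorted(load), key=lambda name: load[name])
        for names in owners.values():
            names.discard(top)
        removed += 1


# Made-up scenario: one core developer, five single-driver authors.
files = {"core.c": {"core_dev"}}
files.update({f"drv{i}.c": {f"dev{i}"} for i in range(1, 6)})

equal_weights = weighted_truck_factor(files, {})
core_heavy = weighted_truck_factor(files, {"core.c": 10.0})
```

With equal weights the made-up project tolerates several losses, while giving `core.c` a weight of 10 makes the single core developer decisive. The sketch only shifts the problem to choosing the weights, which would need the extra information mentioned above.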

Calculating the "truck factor" for GitHub projects

Posted Jul 18, 2015 23:04 UTC (Sat) by mrjk (subscriber, #48482) [Link]

I find the data useful, just not in the way you're thinking. The exact number is suspect, of course, but it is where that number is very small that it is interesting. You could categorize projects as one, small, medium and large, and that would tell you where there may be risk. It could be useful for that earlier story where someone was trying to find points of risk in important packages to perhaps give support.

I think the thing is since this is automated, you could use it to perhaps raise flags, not for the exact value.
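A minimal sketch of that flag-raising idea in Python, assuming the truck factors have already been computed somehow. The bucket boundaries (2 to 4 is "small", 5 to 19 is "medium") and the names `risk_bucket` and `flag_at_risk` are arbitrary assumptions for illustration; the three scores are the ones mentioned in the article and the first comment.

```python
def risk_bucket(truck_factor):
    """Map a truck factor to a coarse category (boundaries are assumptions)."""
    if truck_factor < 1:
        raise ValueError("truck factor must be at least 1")
    if truck_factor == 1:
        return "one"
    if truck_factor <= 4:
        return "small"
    if truck_factor <= 19:
        return "medium"
    return "large"


def flag_at_risk(scores):
    """Return the names of projects whose truck factor is exactly 1, sorted."""
    return sorted(name for name, tf in scores.items() if risk_bucket(tf) == "one")


# Scores quoted in the article and comments above.
scores = {"sprockets": 1, "linux": 90, "homebrew": 159}
flagged = flag_at_risk(scores)
```

Used this way, the exact value matters little; only the coarse category decides whether a project deserves a closer look.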

Calculating the "truck factor" for GitHub projects

Posted Jul 20, 2015 3:27 UTC (Mon) by pabs (subscriber, #43278) [Link]

Anyone know of some code for calculating bus/truck factor?

Calculating the "truck factor" for GitHub projects

Posted Jul 20, 2015 20:55 UTC (Mon) by bronson (subscriber, #4806) [Link]

I emailed the author, no reply.

Gotta say, it's a neat idea, but if all it produces is one blog entry (that reads like a blog entry) then I guess it's fairly useless.

Calculating the "truck factor" for GitHub projects

Posted Jul 20, 2015 21:24 UTC (Mon) by bronson (subscriber, #4806) [Link]

ahhaahaha... I should have waited another 1/2 hour before posting that. He says they'll put up the code in a few weeks.

Looking forward to it.

Calculating the "truck factor" for GitHub projects

Posted Sep 12, 2015 23:07 UTC (Sat) by pabs (subscriber, #43278) [Link]

Did any code go up?


Copyright © 2015, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds