Moving some of Python to GitHub?

By Jake Edge
December 3, 2014

Over the years, Python's source repositories have moved a number of times, from CVS on SourceForge to Subversion at Python.org and, eventually, to Mercurial (aka hg), still on Python Software Foundation (PSF) infrastructure. But the new Python.org site code lives at GitHub (thus in a Git repository) and it looks like more pieces of Python's source may be moving in that direction. While some are concerned about moving away from a Python-based DVCS (i.e. Mercurial) into a closed-source web service, there is a strong pragmatic streak in the Python community that may be winning out. For good or ill, GitHub has won the popularity battle over any of the other alternatives, so new contributors are more likely to be familiar with that service, which makes it attractive for Python.

The discussion got started when Nick Coghlan posted some thoughts on his Python Enhancement Proposal (PEP 474) from July. It suggested creating a "forge" for hosting some Python documentation repositories using Kallithea—a Python-based web application for hosting Git and Mercurial repositories—once it has a stable release. More recently, though, Coghlan realized that there may not be a need to require hosting those types of repositories on PSF infrastructure as the PEP specified; if that is the case, "then the obvious candidate for Mercurial hosting that supports online editing + pull requests is the PSF's BitBucket account".

But others looked at the same set of facts a bit differently. Donald Stufft compared the workflow of the current patch-based system to one that uses GitHub-like pull requests (PRs). Both for contributors and maintainers (i.e. Python core developers), the time required to handle a simple patch was something like 10-15 minutes with the existing system, he said, while a PR-based system would reduce that to less than a minute—quite possibly much less.

Python benevolent dictator for life (BDFL) Guido van Rossum agreed, noting that GitHub has easily won the popularity race. He was also skeptical that the PSF should be running servers:

[...] We should move to GitHub, because it is the easiest to use and most contributors already know it (or are eager to learn it). Honestly, the time for core devs (or some other elite corps of dedicated volunteers) to sysadmin their own machines (virtual or not) is over. We've never been particularly good at this, and I don't see us getting better or more efficient.

Moving the CPython code and docs is not a priority, but everything else (PEPs, HOWTOs etc.) can be moved easily and I am in favor of moving to GitHub. For PEPs I've noticed that for most PEPs these days (unless the primary author is a core dev) the author sets up a git repo first anyway, and the friction of moving between such repos and the "official" repo is a pain.

GitHub, however, only supports Git, so those who are currently using Mercurial and want to continue would be out of luck. Bitbucket supports both, though, so in Coghlan's opinion, it would make a better interim solution. But Stufft is concerned that taking the trouble to move, but choosing the less popular site, makes little sense.

On the other hand, some are worried about lock-in with GitHub (and other closed-source solutions, including Bitbucket). As Coghlan put it:

And this is why the "you can still get your data out" argument doesn't make any sense - if you aren't planning to rely on the proprietary APIs, GitHub is just a fairly mundane git hosting service, not significantly different in capabilities from Gitorious, or RhodeCode, or BitBucket, or GitLab, etc. So you may as well go with one of the open source ones, and be *completely* free from vendor lockin.

The feature set that GitHub provides is what will keep the repositories there, though, Stufft said: "You probably won’t want to get your data out because Github’s features are compelling enough that you don’t want to lose them". Furthermore, he looked at the Python-affiliated repositories on the two sites and found that there were half a dozen active repositories on GitHub and three largely inactive repositories on Bitbucket.

The discussion got a bit testy at times, with Coghlan complaining that choosing GitHub based on its popularity was anti-community: "I'm very, very disappointed to see folks so willing to abandon fellow community members for the sake of following the crowd". He went on to suggest that perhaps Ruby or JavaScript would be a better choice for a language to work on since they get better press. Van Rossum called that "a really low blow" and pointed out: "*A DVCS repo is a social network, so it matters in a functional way what everyone else is using.*" He continued:

So I give you that if you want a quick move into the modern world, while keeping the older generation of core devs happy (not counting myself :-), BitBucket has the lowest cost of entry. But I strongly believe that if we want to do the right thing for the long term, we should switch to GitHub. I promise you that once the pain of the switch is over you will feel much better about it. I am also convinced that we'll get more contributions this way.

Eventually, Stufft proposed another PEP (481) that would migrate three documentation repositories (the Development Guide, the development system in a box (devinabox), and the PEPs) to GitHub. Unlike the situation with many PEPs, Van Rossum stated that he didn't feel it was his job to accept or reject the PEP, though he made a strong case for moving to GitHub; he believes that most of the community is probably already using GitHub in one way or another, lock-in doesn't really concern him since the most important data is already stored in multiple places, and, in his mind, Python does not have an "additional hidden agenda of bringing freedom to all software".

It turns out that Brett Cannon is the contact for two of the three repositories mentioned in the PEP (devguide and devinabox), so Van Rossum is leaving the decision to Cannon for those two. Coghlan is the largest contributor to the PEPs repository, so the decision on that will be left up to him. He is currently exploring the possibility of using RhodeCode Enterprise (a Python-based, hosted solution with open code, but one that has licensing issues that Coghlan did acknowledge). For his part, Cannon noted his preference for open, Mercurial-and-Python-based solutions, but he is willing to consider other options. There may be a discussion at the Python language summit (which precedes PyCon), but, if so, Van Rossum said he probably won't take part—it's clear he has tired of the discussion at this point.

There are good arguments on both sides of the issue, but it is a little sad to see Python potentially moving away from the DVCS written in the language and into the more popular (and feature-rich, seemingly) DVCS and hosting site (Git and GitHub). While Van Rossum does not plan to propose moving the CPython (main Python language code) repository to GitHub anytime soon, the clear implication is that he would not be surprised if that happens eventually. While it might make pragmatic sense on a number of different levels, and may have all the benefits that have been mentioned, it would certainly be something of a blow to the open-source Python DVCS communities. With luck, those communities will find the time to fill the functionality gaps, but the popularity gap will be much harder to overcome.




Moving some of Python to GitHub?

Posted Dec 3, 2014 18:34 UTC (Wed) by rsidd (subscriber, #2582) [Link]

"There is a strong pragmatic streak in the Python community that may be winning out."

There was a strong pragmatic streak in Linus Torvalds that won out, leading him to embrace Bitkeeper -- until McVoy tightened the screws just enough that Linus ended up inventing git.

Hopefully the Python community will not have to develop a github alternative down the line. Since github has a much larger base among open source projects than bitkeeper did, the bitkeeper story may not repeat itself. Hopefully.

Moving some of Python to GitHub?

Posted Dec 3, 2014 19:52 UTC (Wed) by marduk (subscriber, #3831) [Link]

This is quite sad, yet predictable. For just the fact that the biggest reason git was created was to wean "ourselves" off of a closed system, only to have that same tool used to suck us back into a closed system. But we seem to do this (e.g. Linux -> MacOS, Firefox -> Chrome, vim/emacs/eclipse -> Sublime).

Seems that for some things open source is good, but not quite good enough. Or maybe some closed features are simply too tempting. 🍎

Moving some of Python to GitHub?

Posted Dec 3, 2014 20:16 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

Well, I think the reason GitHub is big is because of network effects. Unlike many other networks though, that's about the only thing that can't be replicated from GitHub out-of-the-box elsewhere in case GitHub does go all Sourceforge into the dumps.

Moving some of Python to GitHub?

Posted Dec 4, 2014 0:59 UTC (Thu) by ras (subscriber, #33059) [Link]

> Sourceforge into the dumps.

They did go there, but competition made them see the error of their ways.

Sourceforge is now based on Allura, which unlike GitHub is open source. Allura provides most of the things that make GitHub popular - easy forking, pull requests and so on. GitHub has a slicker UI IMO, but Allura (or at least Sourceforge's version of it) supports a lot of VCS's, and gives you a shell account, ssh and rsync access and other things which amount to a "standard Linux API" of sorts.

Allura happens to be written in Python. It looks like a natural fit for python.org.

https://allura.apache.org/

Moving some of Python to GitHub?

Posted Dec 4, 2014 3:11 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

Thanks for the info, it's good to hear they're improving. Unfortunately for their network effects, the only project I contribute to even sparingly hosted there is tmux and they use git send-email rather than pull requests. (Any service missing the network effects is going to lose out to my own personal gitolite hosting if GitHub takes a nosedive.)

Moving some of Python to GitHub?

Posted Dec 5, 2014 22:00 UTC (Fri) by sadboy (subscriber, #94691) [Link]

> Well, I think the reason GitHub is big is because of network effects.

That, or it's just faster and more stable than its competitors.

Moving some of Python to GitHub?

Posted Dec 8, 2014 18:11 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

Well, those certainly helped establish the network effects. I certainly wouldn't use it without the network effects at least (e.g., I already run my own cgit/gitolite setup at home).

Moving some of Python to GitHub?

Posted Dec 9, 2014 7:12 UTC (Tue) by ploxiln (subscriber, #58395) [Link]

It's not more stable. Around a dozen times a year, I run into timeouts fetching from github while deploying a revision of code to a set of servers. At my previous job, we had a backup synced repo to deploy from. I'm currently at a pre-launch startup, so we haven't set up such things yet, and it hit me today, for about an hour.

If you use github (professionally), you need a fallback, preferably self-hosted so you can better guarantee its uptime. (Same goes for chat clients like hipchat, which I've also experienced day-long outages of.)
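
A minimal sketch of the fallback idea described above, assuming a deploy script that fetches over plain `git fetch`. The function names, the timeout, and the example URLs are hypothetical; only `git fetch <url>` itself is a real command.

```python
import subprocess


def git_fetch(remote_url, timeout_s=60):
    """Return True if `git fetch <url>` succeeds within the timeout."""
    try:
        result = subprocess.run(
            ["git", "fetch", remote_url],
            capture_output=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung connection or a missing git binary both count as a failure.
        return False
    return result.returncode == 0


def fetch_with_fallback(remotes, fetch=git_fetch):
    """Try each remote in order; return the first one that works, else None."""
    for url in remotes:
        if fetch(url):
            return url
    return None
```

Called as `fetch_with_fallback(["https://github.com/example/project.git", "https://git.internal.example/project.git"])` from inside a clone, it would fall through to the self-hosted mirror only when the first fetch fails or times out. Passing the fetch function as a parameter keeps the ordering logic testable without network access.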

Moving some of Python to GitHub?

Posted Dec 9, 2014 10:50 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

"If you use github (professionally), you need a fallback, preferably self-hosted"

Such as, for example, github. They sell a boxed product (though of course historically boxed product versions of such sites get deprecated when the owner decides they wanted to be in some other business altogether, e.g. Google, Sourceforge).

Interestingly the public vs non-public repo stuff is still there in the boxed product. If I don't log in to our github instance it will show me various repos owned by other groups in the company (that are presumably marked "public" even though the whole thing is inside a firewall) but not our projects, which are private.

Moving some of Python to GitHub?

Posted Dec 3, 2014 21:29 UTC (Wed) by b7j0c (subscriber, #27559) [Link]

the situation isn't that dire imho. yeah, the oss community probably relies too much on github as a social network...but everything else it provides is a commodity. i personally put repos up there to collect PRs and bug reports (basically another facet of its social value)...but i don't treat it as the "source of truth"

Moving some of Python to GitHub?

Posted Dec 3, 2014 23:33 UTC (Wed) by juliank (guest, #45896) [Link]

There is an open alternative: gitorious.

It's just incredibly slow, so it does not really make a lot of sense to use it.

Moving some of Python to GitHub?

Posted Dec 4, 2014 1:36 UTC (Thu) by DrMcCoy (guest, #86699) [Link]

There's also https://kallithea-scm.org/ , but AFAIK unfortunately no hosted instance for the general public.

Moving some of Python to GitHub?

Posted Dec 8, 2014 22:53 UTC (Mon) by fb (guest, #53265) [Link]

> There is an open alternative: gitorious.

I'm all for FOSS solutions. I wanted a remote Git repository provider that gave me *private* repositories. This is just to sync and backup some private stuff of my own.

Guess what? Gitorious is not interested in taking my money. Github OTOH will happily get paid to host my personal private repositories.

You can add that as another reason why they win out in the network effects.

Moving some of Python to GitHub?

Posted Dec 9, 2014 8:34 UTC (Tue) by palmer_eldritch (guest, #95160) [Link]

With bitbucket or gitlab.com you get unlimited private/public repositories. Gitlab is FOSS.
But you only get git hosting, not a coders social network.

Moving some of Python to GitHub?

Posted Dec 10, 2014 0:17 UTC (Wed) by fb (guest, #53265) [Link]

When I created my paid account at GitHub, there was only mercurial at bitbucket. Never heard about Gitlab, will check it out for curiosity's sake.

Moving some of Python to GitHub?

Posted Dec 6, 2014 19:25 UTC (Sat) by ceplm (subscriber, #41334) [Link]

> vim/emacs/eclipse -> Sublime

I believe vim is still the most often used $EDITOR and I am not even sure Sublime is in second place.

Moving some of Python to GitHub?

Posted Dec 12, 2014 18:02 UTC (Fri) by thestinger (guest, #91827) [Link]

> Firefox -> Chrome

Chrome is nothing more than a branded build of Chromium with bundled Flash and EME plugins. Firefox isn't free software because the branding isn't free - just like Chrome. The only difference is that there's a restrictive set of terms under which you can use the Firefox branding with your own build.

There's a reason that Debian ships Firefox as Iceweasel but Chromium doesn't need to be altered.

Moving some of Python to GitHub?

Posted Dec 3, 2014 18:48 UTC (Wed) by cstanhop (subscriber, #4740) [Link]

Practical and pragmatic are terms that only make sense in the context of shared values. If your values, or ideology, differ from others', then what you consider simply practical and pragmatic can actually be grave matters of ethics for others. (The same can be said for many people's use of technical.) This is clearly a discussion of the values of the Python community, and it is quite revealing.

Moving some of Python to GitHub?

Posted Dec 3, 2014 19:23 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

I've recently had to do some more development on Bitbucket and, compared to GitHub, there are some things that are really missing from a collaborative viewpoint (or at least not obvious as I didn't see how to do it):

* no way to submit a pull request to another repository which is forked from the same one (so you either have to share your repo or do manual pulls between repos to do collaborative development);
* no list of branches to create a pull request from the main repo page (so I don't have to go through my fork *then* create a pull request).

I remember there being some more things, but I can't remember them off-hand.

Moving some of Python to GitHub?

Posted Dec 3, 2014 22:13 UTC (Wed) by ejr (subscriber, #51652) [Link]

Email is a perfectly acceptable method for "submitting a pull request." For the second, well, I don't even know what you mean. Do you mean git branch -r doesn't work?

Much of the point behind git, mercurial, arch, etc. is that actions are *local*. That's how I've always used them, so many of these issues seem odd to me. But then, I'm old and not so much into using web browsers for everything.

Moving some of Python to GitHub?

Posted Dec 3, 2014 22:29 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

> Email is a perfectly acceptable method for "submitting a pull request."

Not everyone has a mailing list. I'd also rather the development be open than locked to email. A real-world example is where trentforkert and I[1] have been working on D support in CMake. Both are forks of Kitware/CMake, but I can open pull requests directly to his fork for fixes and such. On Bitbucket, I'd only be able to open a pull request to Kitware/CMake (AFAICS).

> For the second, well, I don't even know what you mean. Do you mean git branch -r doesn't work?

No, when I visit the page for a repository for which I have a fork where I have recently pushed a branch, GitHub has a button *on that page* to open a pull request. On Bitbucket, I have to go to my fork, open a pull request, and find the branch in the dropdown (to be fair, not that much different than GitHub once the "recent push" hint drops the banner, but it is a nice shortcut).

> Much of the point behind git, mercurial, arch, etc. is that actions are *local*. That's how I've always used them, so many of these issues seem odd to me. But then, I'm old and not so much into using web browsers for everything.

Well, I'd prefer git send-email if possible, but not everyone has such a setup for contributors to use (and where contributions go is one place I'm more amenable to the "conservative in what you send; liberal in what you receive" mantra: "if they have rules, follow them, and if not, at least get it to the developers").

[1]Mostly him, by far. I've been mostly helping with testing and clarifying how CMake internals work.

Moving some of Python to GitHub?

Posted Dec 3, 2014 23:37 UTC (Wed) by viro (subscriber, #7872) [Link]

What do you mean, "mailing list"? What's wrong with personal email? It's not as if _that_ had been hard to come up with, after all. Unless something has changed, gmail ought to provide imaps access, so any normal MUA ought to work...

I really don't get it - all you need is an ability to pull from the tree hosted by whoever is hosting it and push to it (e.g. with gitolite(1)). Do pulls and merges in your local tree, then push the result into the mirroring public one. Allows to handle trivial conflicts conveniently, while we are at it, etc. Public tree (and its host) don't need to be aware of any of that. As for the buttons... what's wrong with cut'n'paste from mutt(1) into xterm running shell session?

Al, honestly confused...

Moving some of Python to GitHub?

Posted Dec 4, 2014 1:06 UTC (Thu) by louie (guest, #3285) [Link]

"What do you mean, "mailing list"? What's wrong with personal email?"

You did read the part of the article that talked about 10-15 minutes for patches in email, as opposed to 1 minute for pull requests, right? :)

Moving some of Python to GitHub?

Posted Dec 4, 2014 2:40 UTC (Thu) by bnorris (subscriber, #92090) [Link]

The article spoke about emailed patches being difficult. (This is debatable, but I'll admit it's not simple for the layperson to get right.)

Al is speaking about pull requests via email being simple.

Pull requests don't rule out the use of email.

Moving some of Python to GitHub?

Posted Dec 4, 2014 3:29 UTC (Thu) by viro (subscriber, #7872) [Link]

You mean, like https://lkml.org/lkml/2014/11/20/101? You do realize that having it Cc'd to l-k is just a courtesy, right? Linus isn't subscribed to l-k, after all... And the same level of courtesy could be achieved by copying it to a blog somewhere - hell, to a facepuke page, if you are into that sort of thing and your contributors won't run away retching after being told to go there.

Not to mention that applying a patch series from mail is a matter of saving the messages in question into an empty mailbox, then saying git am -s filename_of_that_mbox. It's not as if you had to apply them one-by-one...
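
The mbox step above can be sketched with Python's standard `mailbox` module. This is only an illustration under stated assumptions: the helper names and the placeholder patch mails are hypothetical, and a real series would come from messages produced by `git format-patch` or `git send-email`. The one real git invocation is `git am -s <mbox>`, where `-s` adds a Signed-off-by line.

```python
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path


def save_series_to_mbox(messages, mbox_path):
    """Write the given email messages into a fresh mbox file, in order."""
    box = mailbox.mbox(str(mbox_path), create=True)
    try:
        for msg in messages:
            box.add(msg)
        box.flush()
    finally:
        box.close()
    return mbox_path


def git_am_command(mbox_path):
    """Return the argv that applies the whole series in one go."""
    return ["git", "am", "-s", str(mbox_path)]


def make_patch_mail(index, total, subject):
    # Hypothetical stand-in for a real patch email; the body is a placeholder.
    msg = EmailMessage()
    msg["From"] = "Contributor <contributor@example.org>"
    msg["Subject"] = f"[PATCH {index}/{total}] {subject}"
    msg.set_content("(patch body would go here)\n")
    return msg
```

Running the returned command with `subprocess.run(git_am_command(path), check=True)` inside a work tree would apply every message in the mailbox in sequence, which is the point being made: the series is handled as one unit rather than patch by patch.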

Seriously, folks, it's an argument in favour of git, not github. Sure, CPython workflow sucks - they are paying for having it cast in stone back when they were using CVS. "Branches are costly, merges - even more so and woe onto you if you make a mistake in one of those" kind of life - their workflow could be used as a demonstration of the reasons why using CVS is always a bad decision. One that will keep hurting you long after you switch to something else...

OT

Posted Dec 4, 2014 3:35 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

> facepuke

I prefer "farcebook", but this one is decent too. (The farce refers to both the drama that plays out there and the one the company plays on its users-as-commodity.)

Moving some of Python to GitHub?

Posted Dec 4, 2014 3:41 UTC (Thu) by ejr (subscriber, #51652) [Link]

One aspect of CVS is that it can record different histories for different individual files and not "change sets". I said aspect, not benefit, but it's something that is not supported by other systems (erm, well, arch had a way, but that's because he tried to support everything and it kinda just happened).

Moving some of Python to GitHub?

Posted Dec 4, 2014 2:56 UTC (Thu) by ejr (subscriber, #51652) [Link]

For whatever reason, the browser is now *the* interface. For us, it's mutt, gnus, etc. But for the next generation, it's the browser. I receive baffled looks when I speak of having to open a web browser.

At least there finally is one GUI to rule them all. I suppose.

Moving some of Python to GitHub?

Posted Dec 4, 2014 13:04 UTC (Thu) by gioele (subscriber, #61675) [Link]

> For whatever reason, the browser is now *the* interface. For us, it's mutt, gnus, etc. But for the next generation, it's the browser. I receive baffled looks when I speak of having to open a web browser.

> At least there finally is one GUI to rule them all. I suppose.

The browser is more a vt100 terminal or an X11 server. The "one GUI to rule them all" is yet to come: everybody who creates a web application these days has to recreate all its widgets or use a toolkit.

Moving some of Python to GitHub?

Posted Dec 4, 2014 3:07 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

> What do you mean, "mailing list"? What's wrong with personal email? It's not as if _that_ had been hard to come up with, after all.

Did you miss where I'd rather contributions go to somewhere public rather than private email? Sure, I'll do it if necessary, but it's low in my preference list.

> Unless something has changed, gmail ought to provide imaps access, so any normal MUA ought to work...

Yep. And I use it exclusively (via mutt with help from vim, offlineimap, and esmtp).

> I really don't get it - all you need is an ability to pull from the tree hosted by whoever is hosting it and push to it (e.g. with gitolite(1)). Do pulls and merges in your local tree, then push the result into the mirroring public one. Allows to handle trivial conflicts conveniently, while we are at it, etc. Public tree (and its host) don't need to be aware of any of that.

And that's what I do…for my projects. Even GitHub pull requests get pulled, merged locally, then pushed. But not everyone does that.

> As for the buttons... what's wrong with cut'n'paste from mutt(1) into xterm running shell session?

Not every project uses email for patches. For those that use pull requests, GitHub is more convenient (and it isn't something Bitbucket can't implement; they just haven't).

Moving some of Python to GitHub?

Posted Dec 4, 2014 3:45 UTC (Thu) by viro (subscriber, #7872) [Link]

So set a blog up and post copies of those requests there... And cut'n'paste part was very definitely *not* about patches - too high chances of whitespace damage that way. No, it's "select that line with git URL in your MUA, type 'git pull ' at shell prompt, paste the selection there and hit enter". That's why it's considered normal to use that format for pull request mail... I still don't get it. If you want discussion to happen in public, but not on a public maillist, fine, post exact same text wherever it is that you are discussing stuff. What's so special about github pull requests? I've only used github to clone (and fetch from) repositories hosted there, so I really have no such experience with them, but everything I've heard so far sounds quite unconvincing...

Moving some of Python to GitHub?

Posted Dec 4, 2014 6:34 UTC (Thu) by cebewee (guest, #94775) [Link]

It easily provides some public development infrastructure, even for projects small enough I wouldn't bother to set up blogs, mailing lists, issue tracker et al.

For example, Hackage has a lot of small Haskell libraries, often developed by a single person. Due to them being on Github, I have a standardized platform for communicating issues with the maintainer (I can even use mail to continue the discussion on their issue tracker). And this infrastructure comes basically for free with the repository -- no need to set up a separate mailing list or whatever is the preferred form of communication for you. Otherwise, most wouldn't bother and you are back to private communication with the maintainer.

Moving some of Python to GitHub?

Posted Dec 9, 2014 16:25 UTC (Tue) by fb (guest, #53265) [Link]

> So set a blog up and post copies of those requests there...

Seriously, who has the time and willingness to set up a blog and manage the publication of requests in it?

> What's so special about github pull requests?

They work.

Without a fuss or without the need for a project owner to set up anything more than a git repository. No need to set up a blog or worry about how to publish, store and back up anything etc.

It takes 1 piece of information to fork and send pull requests. A project at GH. Sending stuff to mailing lists often requires creating an account first, and finding the aforementioned mailing list before that. So a PR is just a way to contribute that has a much lower barrier to entry for someone already on GH.

Moving some of Python to GitHub?

Posted Jan 2, 2016 14:28 UTC (Sat) by smurf (subscriber, #17840) [Link]

No, email is not a perfectly acceptable method for submitting a pull request. Not when compared to the time-saving (and mistake-preventing) things github does with a pull request, such as telling me whether there'd be a merge conflict or running Travis CI checks on it.

Non-controversial change / bug fix? three clicks and I'm done. Do that with a patch / git repository link in email.

Moving some of Python to GitHub?

Posted Dec 4, 2014 5:43 UTC (Thu) by jhoblitt (subscriber, #77733) [Link]

You left out nearly useless search. I use both bitbucket and github on a daily basis and GH's search features are a compelling feature IMHO.

How about using hggit and the like?

Posted Dec 3, 2014 22:15 UTC (Wed) by saffroy (guest, #43999) [Link]

Maybe that was mentioned in the original thread, but the article left me wondering: isn't it possible nowadays to use the Github service with a Mercurial client?

I happen to have created my first (only) Github repo with hg and hggit (http://hg-git.github.io/), but maybe it's not enough to work with pull requests?

How about using hggit and the like?

Posted Dec 4, 2014 3:41 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

I think I remember seeing something like that, but I'm someone who has a 'bitbuckethg:' alias in git to 'hg::https://bitbucket.org/'. If hg has something similar, setting such an alias might be worth it.
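
For readers unfamiliar with that kind of alias: git supports it through the `url.<base>.insteadOf` configuration key, e.g. `git config --global url."hg::https://bitbucket.org/".insteadOf "bitbuckethg:"`, and the `hg::` prefix hands the URL to a remote helper (such as git-remote-hg) that has to be installed separately. The sketch below only mimics the prefix rewriting in Python to show what the alias expands to; the function name and the repository path are made up for illustration.

```python
def rewrite_url(url, instead_of):
    """Expand a shorthand prefix the way git's url.<base>.insteadOf does.

    instead_of maps each shorthand prefix to the base URL that replaces it.
    When several prefixes match, the longest one wins.
    """
    matches = [prefix for prefix in instead_of if url.startswith(prefix)]
    if not matches:
        return url
    best = max(matches, key=len)
    return instead_of[best] + url[len(best):]


# The alias from the comment above; "someuser/somerepo" is a hypothetical path.
ALIASES = {"bitbuckethg:": "hg::https://bitbucket.org/"}
print(rewrite_url("bitbuckethg:someuser/somerepo", ALIASES))
# prints hg::https://bitbucket.org/someuser/somerepo
```

With the real configuration in place, `git clone bitbuckethg:someuser/somerepo` would be rewritten the same way before git contacts the remote.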

Moving some of Python to GitHub?

Posted Dec 4, 2014 0:40 UTC (Thu) by josh (subscriber, #17465) [Link]

This seems like two separate issues. Moving to Git: completely sensible, and the alternatives don't make sense for the exact reasons mentioned in the article: "A DVCS repo is a social network, so it matters in a functional way what everyone else is using." On the other hand, moving to GitHub hardly seems like the best choice; any number of git hosting services would work just as well, as would hosting it on python.org infrastructure.

Moving some of Python to GitHub?

Posted Dec 4, 2014 6:07 UTC (Thu) by pabs (subscriber, #43278) [Link]

The comment above shows how it doesn't matter as much as people think it matters. I can use git-hg to avoid having to learn Mercurial and Mercurial people have something similar IIRC.

Moving some of Python to GitHub?

Posted Dec 4, 2014 6:01 UTC (Thu) by pabs (subscriber, #43278) [Link]

mako has an excellent post about this entitled "Free Software Needs Free Tools":

http://mako.cc/writing/hill-free_tools.html

popular

Posted Dec 4, 2014 9:13 UTC (Thu) by marcH (subscriber, #57642) [Link]

> But Stufft is concerned that taking the trouble to move, but choosing the less popular site, makes little sense.

Same as: Linux is not popular for the desktop, so using it there makes little sense.

(good to see other points were more elaborated)

popular

Posted Dec 9, 2014 7:25 UTC (Tue) by ploxiln (subscriber, #58395) [Link]

Yes. This is exactly my objection to the "popularity" argument.

Anyone making conscious explicit use of open-source, and contributing to open-source, around 10 years ago, was doing one of the least popular things in the world. Today it's more common, but it wouldn't be, if just "pragmatically" preferring the popular thing was how we all operated.

Distributed social networks can lead away from the trap

Posted Dec 4, 2014 10:06 UTC (Thu) by ber (subscriber, #2142) [Link]

In the mid term distributed social networks, funding and development functions are the way to avoid lock-in. The best technical approach I know for distributed social networking is http://pump.io. For distributed funding I think flattr's subscribe function is good and for development function we would need loose coupling of sites like sourceforge to enable cross dev-site requests.

To be fair on pump.io, it is in alpha. (My company runs a small beta service at https://io.intevation.de and we have an alpha bridge to other services at https://pumpbridge.me .) But it shows that the problem can be solved with reasonable effort.

Distributed social networks can lead away from the trap

Posted Dec 4, 2014 17:18 UTC (Thu) by pj (subscriber, #4506) [Link]

I keep hoping that mediagoblin will evolve into a good distributed social network.

Distributed social networks can lead away from the trap

Posted Dec 6, 2014 18:33 UTC (Sat) by lamawithonel (subscriber, #86149) [Link]

This is what I was thinking. GitLab et al. really just need an open protocol/API for exchanging user metadata in conjunction with pull requests. All of GitHub's great features stem from the fact that it can track the network of users and pull requests. For example, their [network graph](https://github.com/blog/39-say-hello-to-the-network-graph...). The integration of ticketing helps, too.

So The One Aspect That Is Not In Dispute...

Posted Dec 6, 2014 2:52 UTC (Sat) by ldo (guest, #40946) [Link]

...is the move away from Mercurial towards Git. All the argument is over where to host the public Git repos and associated infrastructure.

But the Blender folks manage to host their own development service just fine (at developer.blender.org)—is it that hard to do your own? That’s a project with a bit over a million lines of source code and perhaps around a hundred active developers. How big is Python?

So The One Aspect That Is Not In Dispute...

Posted Dec 6, 2014 5:46 UTC (Sat) by dlang (guest, #313) [Link]

writing code and managing systems are different (but somewhat related) skillsets.

It's reasonable to say that they don't want to spend the resources to run the servers and just concentrate on the code.

that said, github is a reasonable place for your source, discussions and buglists, but you can't do things like host package repositories there (at least as far as I know)

So The One Aspect That Is Not In Dispute...

Posted Dec 8, 2014 16:25 UTC (Mon) by ber (subscriber, #2142) [Link]

There were arguments in favour of keeping Mercurial because it is written in Python (and slightly easier to learn and operate).

And one strong argument against Github is their lock-in effect, with them running proprietary code.

Re: in favour of keeping Mercurial

Posted Dec 9, 2014 0:29 UTC (Tue) by ldo (guest, #40946) [Link]

I suspect that they found that the majority of contributors were in fact already using Git, interfacing to upstream Mercurial repos via git-remote-hg. So those clinging to Mercurial were starting to look somewhat ... isolated.

Also, Git ≠ GitHub; using the former doesn’t mean you have to use the latter.

So The One Aspect That Is Not In Dispute...

Posted Dec 8, 2014 23:14 UTC (Mon) by fb (guest, #53265) [Link]

> But the Blender folks manage to host their own development service just fine (at developer.blender.org)—is it that hard to do your own? That’s a project with a bit over a million lines of source code and perhaps around a hundred active developers. How big is Python?

The difficulty of hosting is not really a point if what you're after is just benefit from the network effects of GitHub.

A few years ago Red Hat's JBOSS/Wildfly migrated 'en masse' from IIRC self-hosted SVN repositories to GitHub. I don't think they were trying to save hosting costs...

So The One Aspect That Is Not In Dispute...

Posted Dec 8, 2014 23:52 UTC (Mon) by dlang (guest, #313) [Link]

other than the name recognition, what causes this "network effect" you are talking about?

I've seen other projects migrate to GitHub with the proponents claiming huge benefits from the 'network effect' and the increased contributions that the project would get, only to see no change at all as a result of the migration.

I'm not opposed to using GitHub, they do offer a good service. But treating it like it's a social network or other proprietary system that isolates people from things not on GitHub just doesn't ring true to me.

So The One Aspect That Is Not In Dispute...

Posted Dec 9, 2014 6:04 UTC (Tue) by fb (guest, #53265) [Link]

> other than the name recognition, what causes this "network effect" you are talking about?

I got a paid account there for private personal projects because i. they sell private repo hosting for individuals (i.e. gitorious does not), ii. it was git, iii. the interface was good enough.

Then I changed jobs and found myself part of a project that wanted to move from SVN to Git. I was the only team member who had used git extensively before. Who drove the migration choices? I did ;-) Where did the team end up? At GitHub.

So the trick with GitHub is: 1. they are popular, in fact the most popular DVCS hosting service; 2. their users are (for all I can tell) happy with what they get. So they have this huge net promoter score.

[...]

Their popularity also drives people into GitHub for the fact that it lowers the barrier to entry for new developers as -odds are- they should already be familiar with it.

So The One Aspect That Is Not In Dispute...

Posted Dec 9, 2014 8:24 UTC (Tue) by dlang (guest, #313) [Link]

The network effect is more than just popularity, it's that the functionality of the network increases based on the number of people using it (or more precisely, based on the square of the number of people using it). This is based mostly on the fact that you can only communicate effectively with people on the same network.
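The "square of the number of people" claim is usually justified by counting the possible pairwise connections among users (often called Metcalfe's law). A minimal sketch of that arithmetic follows; the function name `possible_links` is made up for illustration, and the model assumes every pair of users is an equally valuable connection, which is a simplification.

```python
def possible_links(users: int) -> int:
    """Count distinct pairs among `users` participants: n * (n - 1) / 2."""
    if users < 0:
        raise ValueError("users must be non-negative")
    return users * (users - 1) // 2


if __name__ == "__main__":
    # Growing the user count tenfold grows the pair count roughly a hundredfold.
    for n in (10, 100, 1000):
        print(n, possible_links(n))
```

Under this model the count grows roughly with the square of the user base, which is the sense in which a network becomes more useful as more people join it.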

File formats have a similar effect, as they affect who you can exchange files with, even over other networks.

There's nothing on github that's any sort of lock-in. GitHub is a good service and it deserves to be used because they offer a good service at a good value.

But that's not the "network effect"

The closest thing I've seen mentioned is people who only look on github for software, and will use old versions found there instead of newer versions hosted elsewhere. I'm not really sure that qualifies as being such a strong draw.

So The One Aspect That Is Not In Dispute...

Posted Dec 9, 2014 13:13 UTC (Tue) by mchapman (subscriber, #66589) [Link]

> There's nothing on github that's any sort of lock-in.

I would disagree with that.

Their "issues" and "pull requests" systems, amongst other features, are vendor lock-ins. You can't even raise an issue or submit a pull request without a GitHub account. To submit a pull request, you must first have forked the repository properly into your own account. For project owners, getting your data out of these systems and into others could be difficult. I do believe the data is available through some APIs, however.

As someone who is used to interacting with Git-based projects through patches and code review conducted over email (with git-send-email, etc.), I find these aspects of GitHub to be rather problematic.

I do admit these are compelling features for project owners though.

So The One Aspect That Is Not In Dispute...

Posted Dec 9, 2014 14:33 UTC (Tue) by mbunkus (subscriber, #87248) [Link]

> I do believe the data is available through some APIs, however.

That is correct. They have a HTTP+JSON-based API that allows retrieval of all issues (pull requests are issues on github) as well as the other way around (creation and modification of issues). See https://developer.github.com/v3/
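As a rough sketch of what such an export could look like, the following uses only the Python standard library against the v3 issues endpoint. The helper names (`issues_url`, `split_issues`, `fetch_page`) and the User-Agent string are hypothetical; the sketch also assumes unauthenticated access, which is rate-limited, and it ignores pagination beyond a single page.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def issues_url(owner, repo, state="all", page=1):
    """Build the URL for one page of a repository's issue list."""
    return f"{API_ROOT}/repos/{owner}/{repo}/issues?state={state}&page={page}"


def split_issues(items):
    """Separate plain issues from pull requests.

    In the issue list, an entry that is really a pull request
    carries a "pull_request" key.
    """
    issues = [item for item in items if "pull_request" not in item]
    pulls = [item for item in items if "pull_request" in item]
    return issues, pulls


def fetch_page(owner, repo, page=1):
    """Fetch one page of issues over the network (not called on import)."""
    request = urllib.request.Request(
        issues_url(owner, repo, page=page),
        headers={
            "Accept": "application/vnd.github.v3+json",
            "User-Agent": "issue-export-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `split_issues(fetch_page(owner, repo))` for each page and writing the results to disk would give a basic local archive of the issue metadata.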

Wiki pages are just one special branch of your repository; so there's no problem exporting those either.

> As someone who is used to interacting with Git-based projects through patches and code review conducted over email (with git-send-email, etc.), I find these aspects of GitHub to be rather problematic.

No one prevents you (or any other project hosted on github) from keeping that workflow. Even if someone submits a github-based pull request you can still do all the necessary steps without github (add the remote, pull, compare diffs/view logs, merge, push). Here github doesn't replace existing functionality but extends it.
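The steps listed above (add the remote, pull, compare, merge, push) can be sketched as a sequence of ordinary git commands. The remote name, URL, and branch names below are placeholders, `review_commands` and `run_all` are hypothetical helpers, and the sketch uses `git fetch` followed by an explicit merge in place of `git pull` so that the diff and log can be inspected first.

```python
import subprocess


def review_commands(remote_name, remote_url, branch, target="master"):
    """Return the git commands for reviewing and merging a contributor's branch."""
    theirs = f"{remote_name}/{branch}"
    return [
        ["git", "remote", "add", remote_name, remote_url],  # add the remote
        ["git", "fetch", remote_name],                       # download their commits
        ["git", "log", "--oneline", f"{target}..{theirs}"],  # view the new commits
        ["git", "diff", f"{target}...{theirs}"],             # compare against the merge base
        ["git", "checkout", target],
        ["git", "merge", "--no-ff", theirs],                 # merge locally
        ["git", "push", "origin", target],                   # publish the result
    ]


def run_all(commands, cwd="."):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)
```

None of these commands touch the GitHub web interface; they only need the contributor's repository to be reachable as a git remote.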

One feature that I don't think you can archive easily, though, is the »comment on arbitrary lines and patches«. On github you can view a diff and add comments directly beneath the line you're referring to. I quite like that feature as it combines the context with precisely pointing out what you have something to say about. I vastly prefer this over email. But like I said I don't know if there's a way to export that data somehow.

So The One Aspect That Is Not In Dispute...

Posted Dec 9, 2014 14:47 UTC (Tue) by mchapman (subscriber, #66589) [Link]

> No one prevents you (or any other project hosted on github) from keeping that workflow. Even if someone submits a github-based pull request you can still do all the necessary steps without github (add the remote, pull, compare diffs/view logs, merge, push). Here github doesn't replace existing functionality but extends it.

No, but it's an impediment to me as an ad-hoc contributor to *other people's* projects.

With an email-based workflow, I can clone a repository, patch it, then git-send-email to the appropriate mailing list. In most cases, I don't even need to be subscribed to that list.
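For comparison, the email-based flow just described might look like the following sketch. The repository URL, list address, and branch names are placeholders, `email_patch_commands` is a hypothetical helper, and the sketch assumes the upstream branch is `origin/master` and that `git send-email` is already configured with a working SMTP setup.

```python
def email_patch_commands(repo_url, list_address, workdir="project", topic="my-fix"):
    """Return the git commands for sending a patch series to a mailing list."""
    return [
        ["git", "clone", repo_url, workdir],   # clone the project
        ["git", "checkout", "-b", topic],      # work on a local topic branch
        # ... edit files and commit here ...
        # Mail every commit on the current branch that is not in origin/master.
        ["git", "send-email", f"--to={list_address}", "origin/master"],
    ]


if __name__ == "__main__":
    for command in email_patch_commands(
        "https://example.org/project.git", "dev@lists.example.org"
    ):
        print(" ".join(command))
```

The commands after the clone are meant to be run inside the cloned directory; no account on the hosting site is involved at any point.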

With a GitHub pull-request-based workflow I need a GitHub account (I've been resisting getting one for myself), I need to make sure I explicitly "fork" the repository within GitHub (simply pushing my copy of the repo to my account won't make pull requests work, as far as I know, because GitHub doesn't know that the original project and my project are "linked"), and I need to use the GitHub web interface to actually generate the pull request and take part in its review. If all of this isn't vendor lock-in, I don't know what is.

I've got bigger problems with the GitHub pull request workflow anyway. If you generate a pull request, discover that changes need to be made, you have two choices: you can create a new pull request, losing all comments from the previous one, or you have to add new commits. If you drop the to-be-pulled branch from your repository and replace it with a different branch with the same name, the pull request loses all of its comments.

So The One Aspect That Is Not In Dispute...

Posted Dec 10, 2014 4:09 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

> the pull request loses all of its comments.

This seems to have been fixed since you last looked (I remember such behaviors). The old commits are now collapsed as "older diffs" which can be expanded to see the comments. The biggest issue is with deleting entire repos.

So The One Aspect That Is Not In Dispute...

Posted Dec 10, 2014 8:01 UTC (Wed) by mchapman (subscriber, #66589) [Link]

> This seems to have been fixed since you last looked (I remember such behaviors). The old commits are now collapsed as "older diffs" which can be expanded to see the comments.

If that's the case, then it certainly has improved!

> The biggest issue is with deleting entire repos.

Yeah, I hit that one too. As I said before, I'm an "ad-hoc contributor" to many projects, which means I am very likely to remove a forked repository as soon as my branch has been merged. At the same time, though, I don't want the comments on the pull request to become lost to future contributors.

So The One Aspect That Is Not In Dispute...

Posted Dec 9, 2014 14:41 UTC (Tue) by fb (guest, #53265) [Link]

> I do believe the data is available through some APIs, however.

They have a pretty decent HTTP-based API, pulling out all metadata from a project is trivial if you are comfortable issuing GET/POST requests.

https://developer.github.com/v3/

> To submit a pull request, you must first have forked the repository properly into your own account.

Can you imagine the amount of spam people would get if you could just post/upload without an account?

> As someone who is used to interacting with Git-based projects through patches and code review conducted over email (with git-send-email, etc.), I find these aspects of GitHub to be rather problematic.

I understand that many are used to this workflow, my impression is that most projects today are using web interfaces to manage PRs etc. So I think this is more about one workflow losing relative popularity to another.

_That_ said, you _can_ create PRs using command line tools, and you can reply to review comments by email (the reply gets included in the code review on the web page). I reckon that since most people are using the web interface, using email for PR reviews/comments is not as fine-tuned as it could be.

So The One Aspect That Is Not In Dispute...

Posted Dec 9, 2014 15:04 UTC (Tue) by mchapman (subscriber, #66589) [Link]

> Can you imagine the amount of spam people would get if you could just post/upload without an account?

I'm subscribed to a number of projects' mailing lists. As far as I know, all of them allow posting without subscription, and they're mostly unmoderated. Spam has almost always never been a problem.

> I understand that many are used to this workflow, my impression is that most projects today are using web interfaces to manage PRs etc. So I think this is more about one workflow losing relative popularity to another.

Actually, I'm quite happy with the issue stuff being on the web. That's the case with most projects, even those not using GitHub, and there's just as much lock-in with all the other solutions.

No, I find the bigger problems are with the pull-request-based workflow that GitHub uses -- specifically, how that workflow interacts with code review. If your branch is reviewed and it needs modifications, then these modifications *should* be made to the original commits (not just tacked on as extra commits), which necessarily means the branch will be rebased. GitHub's workflow breaks completely when you rebase branches.

So The One Aspect That Is Not In Dispute...

Posted Dec 9, 2014 16:35 UTC (Tue) by fb (guest, #53265) [Link]

> I'm subscribed to a number of projects' mailing lists. As far as I know, all of them allow posting without subscription, and they're mostly unmoderated. Spam has almost always never been a problem.

I expect that posting spam to web pages is a different kind of spam than posting spam to mailing lists.

I imagine that one gets more spam when the end result is having your spam text getting hosted at some high profile web page. Mailing list archives are not as high profile.

At least the zsh-wiki used to get a lot of spam (years ago when I was an active visitor/contributor).

> GitHub's workflow breaks completely when you rebase branches.

(don't get me wrong, I like using rebase and use it *all the time*)

Most git users I know are scared to death of doing a rebase and won't touch or use it at all.

In general, I think code review at GH has a lot to improve. I also use code-collab at work, much better in some areas. GH doesn't have enough features while code-collab has too many features....

I'm still waiting for a (web) code review tool that is both powerful and easy to use.

So The One Aspect That Is Not In Dispute...

Posted Dec 9, 2014 22:31 UTC (Tue) by MrWim (subscriber, #47432) [Link]

> GitHub's workflow breaks completely when you rebase branches.

This used to annoy me too. It turns out you can make comments on changes which are bound to the pull-request rather than the commit. This means that they will be preserved even if you rebase and force-push. For an example see "Show outdated diff" on stb-tester pull-request 220. The trick is to click on "Files changed" at the top of the pull-request to make your comments rather than making the comments on the individual commits.

It's not perfect by any means. You still lose much of the history of the changesets but at least the review comments are preserved.

So The One Aspect That Is Not In Dispute...

Posted Dec 10, 2014 4:13 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

> As far as I know, all of them allow posting without subscription, and they're mostly unmoderated. Spam has almost always never been a problem.

I maintain one low traffic list. You'd be surprised at the amount of spam that gets trapped in there.

> GitHub's workflow breaks completely when you rebase branches.

I rebase all the time and haven't had issues? It's better than email here because I have to look for all the old threads to see what was said about it before. I've also used other tools *cough*gerrit*cough* which do handle rebases very poorly and either email or GitHub is vastly preferable.

Re: benefit from the network effects of GitHub

Posted Dec 9, 2014 0:23 UTC (Tue) by ldo (guest, #40946) [Link]

Network effects don’t have to be either/or: in fact, the more places you can find the code, the better.

For example, Linus Torvalds has a GitHub account, and puts a copy of the Linux kernel there. But nobody is suggesting that they get rid of git.kernel.org.

Git makes it easy to pull from any number of places, and push to any number of places.

Re: benefit from the network effects of GitHub

Posted Dec 9, 2014 0:42 UTC (Tue) by dlang (guest, #313) [Link]

> Git makes it easy to pull from any number of places, and push to any number of places.

I agree, which is why this talk about the "network effects" of having it on github is confusing me.

Re: benefit from the network effects of GitHub

Posted Dec 9, 2014 2:19 UTC (Tue) by foom (subscriber, #14868) [Link]

If I want to find the source code of some package, so I can check what it's doing, or make some hack to it, or see when some behavior changed, how do I best do that?

Increasingly, the most useful answer is "search for it on github". Even when the project isn't actually hosted on github, a copy of their software is almost always there.

I used to use apt-get source, but, having the full VCS history is better.

I could use a general search engine, but that often just finds me the project's homepage. And finding a link to their favored git repository isn't always easy from there (Can you find python's VCS from www.python.org? I can't.). And sometimes, like python, they use the wrong VCS, "officially", so even if I could find such a link, it'd be useless.

So, that's why I always use github to find source code, these days.

If the project itself doesn't upload their own repo, perhaps I find an outdated clone someone else uploaded -- so, everyone really should at least keep an up-to-date mirror of their software on github, even if they don't use it for primary hosting.


Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds