How We Moved Our API From Ruby to Go (parse.com)
151 points by spimmy on June 10, 2015 | 60 comments



Go is a good target to move something to when one knows one's business domain and performance issues.

But I wouldn't recommend it to beginners trying to build a secure web application, because they will have to roll their own code when it comes to user registration, authentication, and authorization management, and I'm not even talking about writing thread-safe libraries, where one has to deal with mutexes and shared memory access.
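
To make that last point concrete, here's a minimal sketch (the type and names are mine, purely for illustration) of the shared-memory discipline Go expects: a map guarded by a sync.Mutex, where forgetting the lock is a data race that `go run -race` will flag.

    package main

    import (
        "fmt"
        "sync"
    )

    // hitCounter is shared mutable state; the mutex must guard
    // every access, or concurrent goroutines corrupt the map.
    type hitCounter struct {
        mu     sync.Mutex
        counts map[string]int
    }

    func (c *hitCounter) inc(path string) {
        c.mu.Lock()
        defer c.mu.Unlock()
        c.counts[path]++
    }

    func main() {
        c := &hitCounter{counts: make(map[string]int)}
        var wg sync.WaitGroup
        for i := 0; i < 100; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                c.inc("/login")
            }()
        }
        wg.Wait()
        fmt.Println(c.counts["/login"]) // 100
    }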

If you need something highly concurrent and you know what you're doing, by all means. But Go is definitely not RAD for classic website development, and it's easy to shoot yourself in the foot with it. It's by no means a silver bullet.

I'm surprised Parse is coded in Rails; I assumed it was built on Node.js, since it uses JS for Cloud Code.


Good points. Technically we still use rails for our website, and have no plans to move off of it. Rails is great for websites! It's just our API that we rewrote in Go.


So if you had to do it all over again, would you write the API from the start in Go, or would you start with Ruby and transition to Go again?


Hard to say. Ruby really did let us move and ship products insanely fast, without having to sink precious engineering time into boilerplate and standard libraries. Most startups fail, and it's often because they couldn't move fast enough. So I don't think it was a bad choice at the time. We were able to do the rewrite once we had grown up a bit, gotten acquired, hired more engineers, and had a proven business model that was worth investing in.


Did you guys consider Erlang or Elixir? Elixir especially might have been a good fit, given its history with Ruby.


If those were my choices I would start with Ruby again, but start planning for a transition earlier.


It's not uncommon (and may be a good idea) to prototype in something like Ruby or Python, and port later when the problem is well understood.


The interesting question for me is, is Go straightforward enough to use through the discovery phase? Or do you really need to use a scripting language to evolve the API rapidly?

I love Ruby and have worked in a high-performance Ruby environment, but it can be pretty problematic to scale (and frankly, it was Apache and Solr that made the environment high-performance, not Ruby). Clearly Go can scale, but would it significantly slow down early development?


I believe the language is less cumbersome than some of the previous contenders; however, there are fewer third-party libraries. As it continues to mature, the line will get harder to draw.


That's one heck of a bullseye question. Thanks for the perspective.


Did you consider Node.js at all, and why?


We did consider it. At the time it seemed like a large Node codebase would be too hard to maintain. The tooling is getting better over time though. If we revisited today I think the argument for Node would be more compelling, and if we revisited after ES6 became standard the argument would become more compelling yet.


Any plans to open-source the Rails->Go interop layer as a middleware plugin?

At some point we're going to face this issue, and I'm sure others will too. I'd hate to reinvent the wheel on things like the double-encoding parsing that you've already done.


Maybe Rails 5 will help with that.


Twitter never used JRuby in any significant capacity. Note that the article she cited (http://www.infoq.com/articles/twitter-java-use) said we were evaluating JRuby, but chose not to use it because the tooling around MRI was much better (in substantial part due to the efforts of Twitter's backend team at the time).

We probably could have made a great JRuby memcached/MySQL/Thrift client, but it wasn't clear that doing so would have yielded much of a performance win, as JRuby itself wasn't dramatically faster than MRI. It would, however, have made it really easy for us to offload intense bits of code to Java, which probably would have been a faster upgrade path than rewriting in Scala as we did.


Our production Thrift code generator (now at https://github.com/twitter/scrooge) was JRuby for about a year, before Robey decided it was an abomination and rewrote it in Scala.

JRuby was easy because you can Maven-require it from a Java project. Ruby already has a Thrift IDL parser, so I just stole the AST from that and used an ERb template to write out the corresponding Scala. The whole thing was maybe 200 LOC.

But yeah, that's the only JRuby that ever did anything production related at Twitter.


ahh, thanks for the clarification. :) (edited the post to be more accurate)


I'll add another data point. We're in the process of moving our entire API stack to Go from Python. First it was in Django, then Falcon, now more & more pieces are in pure Go, with a little cgo sprinkled for good measure. Apart from being a language that's easy to pick up if you're familiar with Python, Go is obviously a heckuva lot faster and way easier to deploy.

We cut down our EC2 instance usage by 2/3 with more improvements yet to come. One machine alone can handle 1000 API calls / second - and our API calls are performing complex calculations, not just disk I/O.

It also allows us to deploy our API within customers' networks if they choose, which we previously accomplished using virtual machines -- which sucked.


AFAIK, the JVM is often praised as the gold standard among VMs, both performance-wise and in terms of the tools available to instrument it. I'm curious why the Parse engineers were not looking forward to using it. I've always thought it was a point in a language's favor...


In all seriousness, I also wonder the same thing. The JVM is often praised for being the gold standard because it is. It is superior in every way to anything else, with Erlang's VM a distant second.

The article mentioned that, due to the asynchronous nature of goroutines, instrumentation and metrics publishing were not a problem. I don't think this applies only to Go, but to any primarily async language -- Node.js, etc. However, it's still not close to the JVM's incredibly sophisticated support for metrics and insights into your running applications.
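
To illustrate what that looks like (a hypothetical sketch, not Parse's actual instrumentation): in Go, a single background goroutine can aggregate events sent from handlers and flush them on a ticker, so publishing metrics never sits on the request path.

    package main

    import (
        "fmt"
        "time"
    )

    // startMetrics drains events from a channel and flushes the
    // aggregated counts on a ticker, so request handlers only ever
    // do a cheap buffered channel send.
    func startMetrics(events <-chan string) {
        go func() {
            counts := make(map[string]int)
            tick := time.NewTicker(time.Second)
            for {
                select {
                case name := <-events:
                    counts[name]++
                case <-tick.C:
                    fmt.Println("flush:", counts) // stand-in for a real publish
                    counts = make(map[string]int)
                }
            }
        }()
    }

    func main() {
        events := make(chan string, 1024)
        startMetrics(events)
        for i := 0; i < 5; i++ {
            events <- "api.request" // what a handler would do inline
        }
        time.Sleep(1500 * time.Millisecond) // let one flush happen
    }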

Re: Go, I can see the value in having concurrency built in, but I see http://www.paralleluniverse.co/'s libraries like Quasar and Comsat ultimately becoming the de facto standard in modern-day Java programming.

With Java 8 in rapid adoption and the upcoming changes in Java 9, alongside pragmatic languages like http://kotlinlang.org with incredible tooling out of the box, Java is looking like a mighty fine ecosystem to get started with.

That being said, it is extremely frustrating developing on such a mature and advanced ecosystem. As an engineer transitioning to Java from Python, I constantly have a lingering feeling that I'm not delivering idiomatic code and don't know the best way to do things. Since I'm not aware of why design decisions were made a certain way -- often to preserve backwards compatibility -- I keep asking myself whether this was really the best way, and it's a bit difficult to research. I also needed to familiarize myself with various patterns and Java's idiosyncrasies (one public class per file, for example).

My 2 cents.


Between this and Pivotal moving Cloud Foundry CLI tools to Go, I keep seeing more reasons to add Go to my CV. Ruby will always be fun, but I'm guessing there will be some serious $$ & interest in Go projects in the coming years (even more than currently).


I've been trying to learn Java with Spring Boot to add to my CV. How do you feel about Java?


There's a million companies using Java, I doubt it'll stop being a good addition to your CV for a long time.


There's also a million companies using .NET. I bet roughly 92% of them are extremely bureaucratic, boring places to work for as well...

Go is used in fewer places, but those places seem to be very interesting/exciting/dynamic places to work for. My kind of places.

YMMV and all that...


That is exactly how I feel about Java: lots of companies use it, so I should know it.

Go is a pleasure to work in. Java, less so.


> Go is a pleasure to work in. Java, less so.

Sometimes I wish I'd stopped at Go instead of then exploring Racket, OCaml, Clojure, Haskell, and Scala, so I could continue to hold your opinion.

The end result still matters the most to me, but being so aware of how languages are getting in my way and making me do busywork is exhausting.


I only compared Go with Java.


I know, I was just adding on that I am sometimes jealous of you being able to say "Go is a pleasure to work in" when it isn't so much of a pleasure for me anymore.


“The MongoDB Go driver is probably the best MongoDB driver in existence, and complex interaction with MongoDB is core to Parse.”

It's also the only one that's not maintained by MongoDB Inc. Coincidence? :)

PS: And yes, `mgo` by Gustavo Niemeyer is pretty incredible.
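
For anyone who hasn't tried mgo, here's a minimal taste (the database and field names are just illustrative) of connecting and querying straight into generic bson documents:

    package main

    import (
        "fmt"
        "log"

        "gopkg.in/mgo.v2"
        "gopkg.in/mgo.v2/bson"
    )

    func main() {
        session, err := mgo.Dial("127.0.0.1")
        if err != nil {
            log.Fatal(err)
        }
        defer session.Close()

        // Query all National League teams, decoding each
        // document into a generic bson.M map.
        var teams []bson.M
        err = session.DB("baseball").C("team").
            Find(bson.M{"league": "National"}).All(&teams)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(teams)
    }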


Gustavo is the best. ^_^


I think the author should look at the MongoDB C# drivers.


Perhaps unsurprisingly, I'm very fond of the Haskell mongoDB package. Here's an example:

    import Database.MongoDB
    import Control.Monad.Trans (liftIO)
    
    main = do
       pipe <- connect (host "127.0.0.1")
       e <- access pipe master "baseball" run
       close pipe
       print e
    
    run = do
       clearTeams
       insertTeams
       allTeams >>= printDocs "All Teams"
       nationalLeagueTeams >>= printDocs "National League Teams"
       newYorkTeams >>= printDocs "New York Teams"
    
    clearTeams = delete (select [] "team")
    
    insertTeams = insertMany "team" [
       ["name" =: "Yankees", "home" =: ["city" =: "New York", "state" =: "NY"], "league" =: "American"],
       ["name" =: "Mets", "home" =: ["city" =: "New York", "state" =: "NY"], "league" =: "National"],
       ["name" =: "Phillies", "home" =: ["city" =: "Philadelphia", "state" =: "PA"], "league" =: "National"],
       ["name" =: "Red Sox", "home" =: ["city" =: "Boston", "state" =: "MA"], "league" =: "American"] ]

    allTeams = rest =<< find (select [] "team") {sort = ["home.city" =: 1]}
    
    nationalLeagueTeams = rest =<< find (select ["league" =: "National"] "team")
    
    newYorkTeams = rest =<< find (select ["home.state" =: "NY"] "team") {project = ["name" =: 1, "league" =: 1]}

    printDocs title docs = liftIO $ putStrLn title >> mapM_ (print . exclude ["_id"]) docs


I would love to know about Parse's stack; please consider sharing with us :) (http://stackshare.io/trending/tools)


The article makes some very good points, and I'm surprised you don't talk more about the deploy advantages for instance. But I was a bit annoyed by little things that I think are inexact:

- the "one-process-per-request" meme along the post applies only to some ruby app servers (there are event loop and threaded models too, think thin, puma, passenger in some modes) and I guess reading between the lines that it's mostly a problem of thread-safety and async support, because of the gems Parse used to have, right? I'm sure that limits options at some point anyway, but the statement is misleading and not really explained, I'd love to hear more details

- I don't understand how the comments in the little Go file snippet apply in any way to "Ruby"; it may be Rails' caching mechanisms, or a specific gem, but I have a hard time mapping those very specific details to something intrinsic to Ruby. It seems more like grumpy Ruby bashing, like you'd have done PHP bashing 5 years ago

As with all rewrite stories, I think there's a part of envy/excitement over the new cool tech you want to use (and that's fair! pleasure gives you huge productivity boosts), and part of the success comes from knowing the kinds of things you got wrong in the first version, so you won't make the same mistakes the second time.

I'd love to hear finer details on those points! Great article overall, anyway.


You are totally right, most of the stuff that really hurt us was Rails middleware magic, not Ruby itself. I should have been more precise -- grumpy rails bashing, not grumpy ruby bashing! FWIW we still use Ruby on Rails for our website and it's great for that.

I'm hoping to get some followup posts from the backend eng team on specific interesting problems we ran into during the rewrite.

& yes, deploys with Go are the freaking bomb :)


It's a decent article but the justification for rewriting is totally discredited by not even having tried JRuby. Many apps drop right in and get true concurrency, better GC, and faster performance for free. It sounds like JRuby wasn't even given a chance, and I know they never contacted the JRuby team to talk about it.


This bit jumped out at me: "200 API servers ... to serve 3000 requests per second". That's only 15 RPS per server. Is that normal for Rails?


Rails can serve data far in excess of 15RPS. I've built apps that serve plenty of responses in <15ms. That's one app server on one CPU core. Of course, if you do a lot in a given request it's pretty easy to get that number down as low as you want.


Those numbers kinda surprised me too.

We serve millions of requests per day and have some slow responses (75+ ms) but on any given day our servers handle 175 requests per second without breaking a sweat. =/


Keep in mind that we're a platform, not a website, and we have very little control over things like schemas, slow queries, and inefficient looping requests that our developers send us. We can't optimize 500,000 apps' queries for them.

So, for example, if an app does something bad like performing a full table scan against a 300-million-document collection on every request, so every request to that backend times out at 30 seconds, and there are thousands of them per second -- well, pretty soon your fixed pool will be full of requests timing out to that backend.


This is totally a great explanation, thanks! This sort of thing is why I hate working on platforms (but also why it's so in-demand).


(we can, and do, do a lot of things to limit the impact any one app can have ... but there are limits. Async is much better for our API model.)


This should pretty much be able to scale with RAM and processors using a preforking web server: you are effectively running many copies of your application, and the web server routes requests among them.

Many organizations scale out too soon or for the wrong reasons, and some just have inefficient database queries and other things that result in bad request times that they could also optimize -- which helps the user regardless of scale, with faster page loads.


We had to way overprovision to handle even momentary blips in availability from any backing store. We aimed to run at around 20% Unicorn worker utilization under normal conditions.


But doesn't this sound less about over provisioning and more about optimizing the code you already had?


Optimizing won't help you here, unfortunately. The process-per-request model is fundamentally flawed past a certain scaling point.
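
For contrast, here's a minimal sketch of Go's model (the handlers and durations are invented for illustration): net/http serves each connection on its own goroutine, so a pathological backend call parks a cheap goroutine rather than pinning one of a small, fixed pool of worker processes.

    package main

    import (
        "fmt"
        "log"
        "net/http"
        "time"
    )

    func main() {
        // net/http runs each connection on its own goroutine, so a
        // slow handler ties up kilobytes of goroutine stack instead
        // of an entire OS process from a fixed pool.
        http.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
            time.Sleep(30 * time.Second) // simulate a backend timing out
            fmt.Fprintln(w, "eventually")
        })
        http.HandleFunc("/fast", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "still responsive") // unaffected by /slow traffic
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }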


Jumped out at me too. I did contract work for 2 years helping the API team that powers the Bible app (bible.com); they peak at over 5,000 RPS on Sunday mornings and never drop below 2,000 RPS, 24/7. Their stack was an old Kohana 2 PHP app running on about 18 physical servers at SoftLayer. It boggles my mind that you would need 200 servers for this.


Starting a business with Rails made sense and still does. And later moving to another technology that scales better makes sense too. After all, you will probably be a millionaire by the time you need to scale your product, so why worry?


It's less about having the money and more about having other resources like time, people, and skills. Or having enough understanding of when you need to execute the growth plan. And suffering the consequences if you time it wrong or execute poorly.

But of all problems to have, there are many worse than exploding growth.


Rails does not require a process-per-request model. It's great that you had success moving to Go, but you could have moved to a different model with Rails and likely solved your problem.


It is seriously amazing to me that "running MRI Rails with threads" was not on the list. At the very least, before ruling these things out completely based on research alone, they should have prototyped some of the easier solutions (and potentially deployed them to one or two machines) to prove their hypotheses. Just saying "JVM tuning is hard, let's rewrite the API" is the type of thing that falls into all the typical traps of second-system syndrome.

Of course, on the other side of things, everything feels rosy -- but counterfactually, all the effort they spent on this could have gone elsewhere if they had resolved their scalability issues with Rails in a simpler manner. (Or even better, contributed those solutions back to the community.) This was a move that fortunately worked out, but it sounds pretty high-risk to me and is the type of thing that can kill companies if they bet wrongly.


> Stuff like doubly encoded URLs

Could you elaborate on this? It sounds a bit scary. Does this mean that Rails tries to decode a URL repeatedly until it can't be decoded any further? If so, isn't this problematic if some (arguably crazy) person tries to send "%2F" literally, not "/"? I'm half sure I'm misinterpreting, so I'm here to ask.
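
Not a Rails internals answer, but the hazard you're describing looks roughly like this, sketched in Go (purely illustrative): a client that literally means "%2F" has to send "%252F", and a second decoding pass silently turns it into "/".

    package main

    import (
        "fmt"
        "net/url"
    )

    func main() {
        // A client that literally means the three characters "%2F"
        // must send "%252F" on the wire.
        raw := "a%252Fb"

        once, _ := url.QueryUnescape(raw)   // "a%2Fb" -- what the client meant
        twice, _ := url.QueryUnescape(once) // "a/b"   -- meaning silently changed
        fmt.Println(once, twice)
    }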


I wonder if, had Microsoft open-sourced C# earlier, the choice would have been different. Seems like it.


C# isn't cool.


Didn't even look at Haskell. Does any expert know how Haskell would fare in this situation?


Disclaimer: Not an expert, just someone who has some "Real World" (dayjob) Haskell experience.

Much like Go: using, say, warp [0] and async [1], likely with similar performance numbers but less code, more static-typing guarantees, and simpler [2] code. Like Go, you'd deploy a static binary. This is just a wild guess, though; I would need to know the specifics of the Go application they've created.

[0] http://hackage.haskell.org/package/warp

[1] http://hackage.haskell.org/package/async

[2] I find redundant code like Go requires [3] to be more complex. Haskell code can be complex, but simple, straightforward, not-trying-to-be-clever Haskell is very simple and concise.

[3] Well, you can use interfaces and lose type safety. Or you can use reflection and make things dog slow.


What tools are you using to do API management on top? Have you seen the open-source Kong? http://github.com/mashape/kong


"How We Moved Our API From Ruby to Go and Saved Our Sanity"... Right, so using Go will alleviate severe mental illness? Why isn't Golang all the buzz in professional psychologist and psychiatrist circles?


It's more like: getting paged in the middle of the night, over and over and over, is a known mental health problem :)


So true. Just reading your comment was enough to make me feel frustrated and anxious.



