Heroku is great! However…

I like Heroku. We’ve recently made our first deployment on it, and all things considered, I don’t think we could’ve made a better platform choice at this time. Deploying to Heroku taught me quite a few things, easily the most important of them was hearing about the 12 Factor App methodology. If you haven’t heard of it1, it’s basically a way to design web applications thus that deploying them will be less of a pain in the arse. The methodology’s manifest was written by @hirodusk, Heroku’s CTO, who probably knows something about cloud deployments. The best thing about it is that it’s elegantly platform/language neutral – so theoretically you don’t need to deploy a 12 Factor App to Heroku, you can deploy to any 12 Factor App compatible PaaS (had there been any), or even roll your own 12 Factor App platform and deploy there. Then again, why not focus on your core business and pay Heroku to use their awesome 12 Factor App platform? Well, ugh. Like I said, if you’re thinking about deploying to Heroku then you probably should, but I’d like to warn you that I don’t think Heroku is an excellent implementation of a 12 Factor App platform.

It’s a good implementation, yes, sometimes maybe just good enough. It saddens me to say it, but I feel there’s significant disparity between the force and clarity I see in the 12 Factor App theory and what I perceive as murky, surprising or faulty implementation details in the Heroku practice. I will try to illustrate exactly what I mean in this post, after these public service announcements: (1) I’ll be using 12 Factor & Heroku jargon freely, so you’d really gain the most from this if you’re familiar with both; (2) the Heroku deployment I did is in Python, so the only stack I know is Celadon Cedar; (3) some of the pain points I mention are related and may be fixed by a single Heroku change, but they still hurt in different ways and finally (4) I’m not looking for cheap shots: I know Heroku’s been grilled for their recent outages, but if you dig in my Twitter feed you’ll see proof that this post has been in the making for a while now, and has nothing to do with this or that outage (maybe a topic in itself, but not now).

Handling of static assets is a broken mess

Other people wrote enough about strategies to deal with the fact that Cedar has no ‘official’ way to serve static files as Bamboo had. I can’t say I’m happy Heroku’s own documentation about this important issue – for example, the introduction to django document conveniently sidesteps the issue altogether. But of all the various solutions out there, I didn’t find one that alleviates the much uglier (from a 12 Factor App perspective) underlying issue. See, your statics will either reside in your slug or elsewhere. If you plan on putting them in the slug, you can either compile2 them during slug creation (with a custom buildback, what else), or locally on your computer and push them. Slug creation time is at best an annoying environment to work with and debug on, if not for anything else (and there is a lot of else), then because of the lack of Polyglotism and the prevalence of multi language asset management tools (also, good luck building Flash/Silverlight/other necessary evils during slug creation).

So you opt to build locally, ignoring the alarm bells as you shatter factors I, V and X (at least). How will you get the built statics to Heroku? You’ll commit and push build products to your VCS, right <puke/>? And who guarantees two deployments will be identical, since you’re developing on Linux and your coworker on OSX? This goes on. The ramifications of serving your statics from an external storage service are the same, but now you also need to get your files to the storage service. But how? You aren’t supposed to have your S3 secrets on your dev machine, and Heroku’s support for configuration-during-slug-creation has serious caveats and a frightening warning attached to it. This is a serious wart, and I don’t think writing documentation will solve it (it will be better than nothing!), because I don’t see how this can be solved with the building blocks Heroku offers today (the buildpack mechanism as it is, git transport, etc). An nontrivial change is needed for truly awesome 12 Factor static support. I don’t always expect much from my PaaS, but when I do, it’s because I pay five cents per dyno hour.

There’s no buffering proxy in Cedar

This one isn’t even funny. If you’re not sure what a buffering reverse proxy is or why you need one if your server uses sync workers, you must read this. Summarizing for the impatient, sync workers are resource hogs, so you never want them idling about waiting clients (which might be slow). A common pattern is to have a cheap async “thing” buffering between your sync workers and the wild Internet. As far as “async things” go, nginx is a great choice, and indeed Heroku’s previous stack, Bamboo, used to use it. However, in Cedar it disappeared, with very serious ramifications. Don’t get me wrong: I’m not too happy with Bamboo’s behaviour, either: it’s very possible an app wouldn’t want buffering: sometimes you want to read the request as it comes along (for example, for various Comet techniques), and you’re designed to handle it (mostly by making the app itself your “async thing”). Bottom line, a one-HTTP-layer-fits-all treatment is at best an inconvenience, and possibly something much worse.

Indeed, you could solve this by simply using async workers, but one does not “simply” use async workers (there are serious ramifications which are out of scope for now). Choosing your threading model based on a missing feature in your platform sucks, not to mention if you already have a working and field-proven codebase you’re migrating to this platform. I guess the best approach to solve this would be to have a Routefile at the root of the project describing the routes to reach your app: this URL over buffered HTTP, this URL over raw HTTP, maybe even that port over plain old raw UDP. But even if you don’t have this feature, I’d expect warnings, explanations and best-practices to be sprayed all over the documentation. Alas, the documentation doesn’t help even one bit (on the contrary sometimes; I judge this FAQ answer to be somewhere between misleading and plain wrong). To top it all, it seems that activating Heroku’s SSL support suddenly makes you go through something which smells to me like remnants from Bamboo, with sudden appearance of nginx and buffered requests. Again, all in utter silence from the documentation.

Polyglot programming’s great, but not on Heroku

I hear the term polyglot programming in relation to Heroku all the time. “A true polyglot platform”, they say. I don’t get these claims. Heroku isn’t polyglot. The buildpack mechanism (a good idea in itself) is built in such a way that buildpacks are given a chance to detect if your push matches their language, but the first matching buildpack is used to create your app’s slug, ignoring all other buildpacks and potential languages. So your app really has to be in one language. The workarounds are horrendous, the least-horrific one is done by the official Ruby buildpack which “vendors” node.js into the slug if a certain gem (execjs) is required in the Gemfile. Of course, it’s just a package specific hack, it inflates the slugs needlessly, and since it’s just a hack, similar support isn’t built into the Python buildpack for the PyExecJS Python package. Admittedly, for the latter point, the brilliance of custom buildpacks eases the pain it sure as heck doesn’t cure it.

A thousand cuts: the smaller things

  • I think separating you from your application (i.e., you’re unable to ssh into a running slug) is brilliant and leads to better design overall. But when I worked on XIV’s grid storage product, I fought to keep controlled ssh access to the discrete storage modules open (for us developers, not for customers), and I think Heroku should provide the same. Show me a big warning, let me click through whatever, but then let me bloody attach my strace/pdb/etc to my running instance.
  • I applaud factor X, dev/prod parity, but Heroku and especially its add-ons force me to integrate with backing services with unknown and possibly changing configuration. I think a Vagrant box providing a Cedar-like dev environment with some add-ons could help here. This bit me once as I misused the parsed results of REDISTOGO_URL and ignored the password (it was a bit more complex than that, but I see you yawning). A trivial bug really, but it was time consuming to diagnose, because it worked fine against my password-less development Redis (which now does require a password, thank you very much).
  • Factor III, configuration, says you should store configuration in the environment. But Heroku offers nothing to help you set this environment up during development, and share it across a team of five developers. Sure it’s a small thing to arrange on your own, but together with other small cuts regarding local development, I feel Heroku does too little to accommodate multi-developer teams, period (update: since I first wrote this, @kennethreitz wrote autoenv, which is a good first step in the right direction).
  • Putting everything (your language’s runtime, “vendored” binaries, possibly static assets) in your slug makes it easy to go past the 25mb warning mark, and forces you to be tight fisted about slug size. I understand the cause of this limitation, but if every dyno had more stuff built into it, things could be easier and I wouldn’t be counting mere megabytes of disk space in 2012 (just node, Python, Ruby, some C libraries… don’t think I’m asking for much).
  • I’ve been pulling my hair out wondering why my video processing worker dynos are getting R14 “Memory Quota” exceeded warnings, and only after I instrumented my code to take memory snapshots (ps -ef + cat /proc/meminfo at relevant timings) I got to suspecting that buffer cache is counted against me in my memory quota (why isn’t this documented? why can’t Heroku’s support resolve my ticket asking for explanations regarding my ps -ef outputs for a few weeks now?). We get false R14 positives all the time.
  • git is a terrific DVCS, but a poor deployment transport – see for example above how it complicates things with deploying compiled static files, or how it makes pushing private submodules impossible (at least now there’s some labs support for submodules).

There’s more, of course, but this post is already insanely long and I feel I got the main things off my chest. I humbly argue that some of these issues actually have a very material impact on Heroku app development/deployment while being not so hard for Heroku to fix, or at least make much more bearable or even just thoroughly documented. You could say many (all?) these things aren’t so bad if you’re a single developer developing a very early prototype and expecting little load, but I think these issues quickly become much worse in a complex professional project with multiple developers and even modest-medium load, which is who I imagine is Heroku’s prime target audience, rather than some Jekyll blog that gets 50 hits per day.

I’d like to close this post with repeating that I like Heroku, and at our current size, I definitely like it more than the alternatives (because deployment is such a bitch). They have many awesome things going that I didn’t list here, others wrote enough favorite reviews as it is. But Heroku still has a lot of hard work to do to get their implementation of a 12 Factor App platform on par with the elegance of the theory itself. I get the feeling the “D” stack beyond Cedar is right around the corner; I’ll be so happy to use a better Heroku 12 Factor stack. Or anyone else’s 12 Factor stack, for that matter. I’m an unloyal bastard this way. Challenge accepted, anyone?


1 If you’re not familiar with The 12 Factor App, I wholeheartedly recommend you go read it right now, it’s an important read. If you’ve got your act together with regard to web app deployment you probably do many/all of these things instinctively, but still, the 12 Factor App puts things in terseness and clarity I didn’t see before and I feel insisting on it (almost to the letter) really helped me design a better app.

2 The way I see it, the toughest thing about static assets is the fact that most of them became sneakily dynamic without us watching. Sprites, CoffeeScript, LESS/scss, heck, even .swf if you have to use them (webcams, anyone?) – they’re all compiled assets with a source form and a built form (sometimes more than one; minimized vs unminimized CoffeeScript, for instance). I think this complicates separation of build time vs. run time and minimization of dev/prod parity. I wrote more about this previously.


Comments

12 responses to “Heroku is great! However…”

  1. I’m happy to read that I’m not the only one wondering how to manage my static files on Heroku 🙂 Still wondering what is the correct way to do it… Thanks for sharing your thoughts.

  2. Have you happened to learn more about how memory is calculated at Heroku? What command outputs refers to actual usage?

    1. Here’s a part the answer I eventually got from Heroku (it took a while, but the answer is indeed solid; like I said, I like them…):

      The total memory usage that we report consists of “rss + swap + cache”, and in your case the amount of cache memory is what triggers the R14 alerts. Indeed, the large files that your processes manipulate may be the cause for the high cache usage.

      Thanks for bringing this up. It is unfortunate that we currently do not offer a good way for you to introspect your detailed memory usage (swap, cache, etc), but we are going to discuss it internally and hopefully improve the situation soon. Also, we know that you do not have direct control on how much cache memory is being used by your dyno, but rest assured that we are looking at it.

      1. This is actually does not make so much sense. It says: “Memory used above 512MB will go into swap”. So in that case, why are we billed for swap usage? I am not an export on this, but how do we supposed to manage cache memory? Isn’t it at OS level?

        1. Several things:
          1. Indeed, you can not manage cache memory directly, it’s largely up to the kernel and the hoops you can possibly go through to change your cache memory usage aren’t worth it.
          2. The reason I called it “R14 false positives” in the post is that I never saw one of these R14 for much more than 512.0mb. Sometimes 512.5mb, sometimes even 513.1mb, never much more. Why is this important? Because it means little (if any) of my process really goes into swap. Lets say my dyno uses 400mb of RSS memory, and the kernel starts allocating cache buffers for my processes. When the kernel reaches about 112mb of cache buffers, two things happen: (1) the R14 alert triggers and (2) the kernel stops allocating cache buffers, and starts eating into these buffers if I allocate more RSS memory. So the net effect on me is just the alert, not much more. That’s not so good, but not horrific.

  3. I´m also getting this R14 error, i have reduced my gunicorns and i still get about 130% memory usage, i saw you instrumented code to take memory snapshots (ps -ef + cat /proc/meminfo at relevant timings), could you provide specifics on how to do this?
    I´m kind of lost on this issue, i don´t know where to start.

    1. What I did was fairly simple; after gathering some memory information I just stashed it in Redis (I guess pretty much any backing store would do) with a unique key name and wrote the stored key to the log. Later on, I’d look at the log, find an ‘interesting’ situation (right before or right after an R14 error) and lookup the data on Redis using that unique key name.

      See this (rough) code for example. Replace logging with whatever logging infrastructure you use (on Heroku, “print” could do the trick as well) and replace the import and invocation some_map_like_backing_store with whatever backing store you like.

  4. cesarvargas00 Avatar
    cesarvargas00

    Why are you updating your website? It was pretty good!!!

    1. Thanks for the kind words.

      I’m not updating due to many reasons, I just can’t find the time lately. Coming to think of it, I stopped writing roughly at the same time my son was born, probably related. 🙂

      1. cesarvargas00 Avatar
        cesarvargas00

        Ohh, ok… It was for a good reason then!
        Anyways, going to keep the bookmark if for any reason you start writing!

        😀

  5. Hey, did you also consider Openshift (made by RedHat guys) while choosing Heroku? Because, me and my friends were talking about hosting our website somewhere and one of them, Binayak, already used Heroku once. He said it was great, but you need to install their set of toolsets though and (apparently at that time) also db storage is not available without entering credit cards.

    Our criteria was to have a free host with small amounts of everything 🙂
    So, next we checked out Openshift and it was pretty great, with an open platform and all. There is hardly any Openshift-specific setup, you just git push and things work. I am pretty happy with it.

    Well, turns out the post is from 2012. But anyway, I am not going to delete all that I wrote.

  6. Hope you still look at the comments 🙂