I wish someone wrote django-static-upstream… maybe even… me!

2011/12/28 § 7 Comments

I used to think serving static files (aka static assets) is really easy: configure nginx to serve a directory and you’re done. But things quickly became more complicated as issues like asset compilation, CDNs/scalability, file-specific custom headers, deployment complexity and development/production parity rear their ugly heads. Judging by the huge number of different asset management packages on djangopackages, it seems like I’m not the only one who ran into this problem and felt not-entirely-satisfied with all the other solutions out there. Things actually took a turn for the worse ever since I started drinking the Heroku cool-aid, and for the live of me I just can’t make sense of their best-practice with regard to serving static files. The heroku-django quickstart conveniently sidesteps the issue of statics, and while there are a few support articles that lightly touch on neighboring subjects, nothing I found was spot-on and hands-on (this is an exception to the rule; the Heroku cool-aid is otherwise very tasty and easy to drink, to my taste so far). Ugh, why can’t there be a silver bullet to solve all this? Let me tell you about my “wishlist” for the best static serving method evar.

First, I want to be able to take any checkout from my VCS, maybe run an easy bootstrap function, and get a working development environment with statics served. In production, I want to serve my statics from a CDN with aggressive caching, so I need some versioning system, but I’d like to minimize deployment complexity and I want fine-grained cache invalidation of my statics. I want my statics to be served the same way (same headers, same versioning mechanism) in development and production without having to update two different locations (i.e., my S3 syncing script and my development nginx configuration). I also don’t want to have to “garbage-collect” my old statics from S3 every so and so days. Like I said, I’d like some of my statics to be served with some bells and whistles, like various custom headers (Access-Control-Allow-Origin, anyone?) or gzip compression. Speaking of bells and whistles, how about a whole marching band, since I want to serve statics that require compilation (minify/concatenate/compile scss+coffe/spritalize/etc), but I don’t want to have to rerun a ‘build process’ every time I touch a coffee script file in development. Finally, and this isn’t something I’m not very adamant about, but I prefer my statics to be served from a different subdomain (static, not www), I think it’s cleaner, I don’t need my clients’ cookies with every static request and it allows for some tricks (like using a CDN with support for a custom origin).

And nothing I found does all that, definitely not easily. In my dream, there’s a package called django-static-upstream, which is designed to provide a holistic approach to all these issues. I’m thinking:

  • a pure Python/django static HTTP server (probably just django.views.static with support for the bells and whistles as mentioned above), and yeah, I think I should bloody use this server as a backend to serve my in production
  • a “vhost” middleware that will replace request.urlconf based on the Host: header; if the host starts with some prefix (say, static), the request will be served by the webserver above
  • a couple of template tags like {% static "images/logo.png" %} that will create content-hashed links to the static webserver (i.e., //static.example.com/829dd67168a3/images/logo.png); the static server will know to ignore the content-hash bit
  • this isn’t really up to the package, but it should be built to support easily setting a custom-origin supporting CDN (like CloudFront) as the origin URL; this is both to serve the statics from nearby edges CDN but also (maybe more importantly) to serve as a caching reverse proxy so the dynamic server will be fairly idle
  • support for compiling some static types on the fly (coffeescript, scss, etc) and returning the rendered result; the result may have to be cached using django’s cache (keyd by a content hash), but this is more to speed up multi-browser development where there is no CDN to serve as a reverse caching proxy than because I worry about production

So now I’m thinking maybe I should write something like this. There are two reasons I’m blogging about a package before I even wrote it; first, since I wanted to flesh out in my mind what is it that I want from it. But second, and more importantly, because I’d like to tread carefully (1) before I have the hubris to start yet another assets related django package, (2) before I start serving static content with a dynamic language (what am I, mad?) and (3) because compiling static assets on runtime in violation of the fifth factor in Adam Wiggins’ twelve-factor app manifesto (what, you didn’t read it yet? what’s wrong with you?). These are quite a few warning signs to cross, and I’d like to get some feedback before I go there. But I honestly think I’ll be happier if I had a package to do all this, I don’t think writing it should be so hard and I hope you’d be happy using it, too.

The ball’s in your court, comment away.

Mailing list debates considered harmful

2010/05/29 § 24 Comments

Recently, one particular thread at python-dev has piqued my interest. The thread discusses the possible pronouncement of PEP 3148 (re. the addition of futures to stdlib), and has been going on for over a hundred messages with no end in site. I’d like to use some examples from that thread in this post, but it could have been another thread from another list. I do not want to discuss the contents of the PEP itself or even the contents of the discussion – if I thought I had interesting opinions about these matters, I’d say them in public on the mailing list. I am interested in talking about the form of the discussion, which I think models many similar discussions in other threads and other mailing lists. To make a long story short, I feel the broad OSS community is largely under equipped for massive online debate.

We need a better way to debate in large, decentralized and open groups, because this is a lot of what we do, probably more than actually coding. Mailing list, newsgroups and bug-tracker comments are a good enough place to debate things which are one or more of (a) under a relative consensus, or (b) not too interesting to many of the listening audience, or (c) too arcane to actually be debated by many of the listening audience. The problem lies with the “hot” topics, about which it seems everyone has something to say and which are arguably the most important topics to debate effectively (to be able to do things, to conserve time, etc). The nature of these large, decentralized and open debates is that the participants vary a lot from one another. Some are well known members of the community and some passersby, some of them are better informed and some don’t support your opinion, some of them easier to take offence and some easier to deal it, some of them care a lot about the subject under discussion and some have nothing good on TV, some read and reply to new posts in near-real-time and some are married. In these kinds of debates you see the tools we currently use in all their crippled anti-glory; they’re just not meant to handle these things.

Because our tools don’t rise to the challenge, once a discussion goes beyond a few posts it’s hard to keep track of who is arguing for and who is arguing against (indeed in many cases people may do both). It’s hard to gauge whether the silent majority generally agrees with any particular argument (‘+1’ and ‘-1’ are a hack to alleviate this, not a terribly good one). Gauging the correct weight to attribute to what people are saying (and people vary a lot, as I said) is difficult and relies mostly on one being familiar or Googling the speakers. It’s even hard to simply keep the discussion on the same track: more often than not, the discussion creeps onto completely unrelated issues (maybe important ones), and ‘steals’ the focus from the original, probably also-important, subject. In short, we’re arguing in a mess and I’m not sure we apply our hackerish instincts of science, openness and rule-of-the-competent to how we make decisions.

I’m by no means an expert on these matters, so I Googled a bit (wasn’t sure which keywords to use) and looked at Wikipedia, the source of all knowledge which covers everything and is never wrong or inaccurate(™). It seems to me like the challenge of harnessing collective intelligence is well known, but it doesn’t seem like any of the vast array of ‘collaboration software‘ solutions out there tackles this particular issue in manner that befits a group of hackers working on an OSS project (what’s the name for a group of hackers? a pod of hackers? a pack of hackers? a pride of hackers? but I digress). IBM has a natural-language-processing 100,000 ton behemoth which can crunch the collective opinions of hundreds of thousands of people, but in typical IBM fashion it seems to also require a few millions dollars and the sacrifice of a virgin PhD (abundant resource) to operate. If indeed I didn’t miss an existing solutions, it seems obvious that a solution is needed. Maybe I missed a good solution, but the fact of the matter is that hackers around the globe still debate in mailing lists, groups, wiki-discussion-pages, tracker issues, forums, IRC channels, etc, all of which seem to me like the wrong tool for the job.

I think we need a rather simple solution, and luckily I think we’re pretty much the right crowd to implement it. It appears to me that even a naive analysis of the problem is sufficient to implement a ‘good enough’ solution, or at least a ‘better than now’ one. Consider this: a single subject (introduce futures into stdlib) can be debated from many different angles: Is the terminology chosen adequate, or should futures be called something else in Python? Should this be a part of stdlib or a part of PyPI? Should futures be implemented as one object or two separate instances, one on the emitting side and one on the consuming side? Each one of these angles can be argued for or against: futures is a good name because it’s a well-established name in computer science, futures is a bad name because Python already has __future__, and so on, and so forth.

I’m thinking of a tool a bit like Rietveld, only not intended to discuss the very specific implementation aspects of patches, but rather to discuss opinions. I think my naive analysis of ‘a debate’ reveals three abstractions: “what’s under discussion” (PEP 3148), “why don’t we just accept whatever it is that’s under discussion as-is” (because futures is the wrong terminology, etc) and “who has an opinion about any of the ‘why’s” (I do, I think ‘deferreds’ is a better idea, etc). I think these three abstractions cover most of what we need to implement a ‘debating tool’, if coupled with some development of the idea that people’s opinion carry different weights (the ‘who’s differ; Guido carries more weight than I do, and thought should be given to how the implementation represents that). My assessment is that implementing something like this shouldn’t be too much work for web-technology-savvy people, even a small team could probably finish it in one sprint.

I agree that some questions remain about the interaction of this with current tools (this isn’t a mailing list or bug tracker or anything else’s replacement, much like Rietveld doesn’t replace any of them), as well as maintaining ‘law and order’ in the discussion (among the target audience I’m talking about, I think this isn’t such a big deal). Also, there are lots of possible additions to this concept, and maybe some should be added, but I think simplicity is key here. Part of the whole idea is that it won’t turn into incredibly complex ‘jamming’ software. And finally, no, I don’t know how to monetize it other than adwords and no, I don’t mind if you implement it so long as you keep it free as in speech. That’s mostly it, I could post the post right now and be reasonably satisfied with it. Ten minutes later someone would come along, point me at a website that already does it, and then I’d just have to advocate using that website on python-dev (the debate over whether to use the debating-website for future debates is a self-hosting one). But no. I didn’t just write this post and let ‘good enough’ be; I had to expose myself to public humiliation by creating a mockup UI for this imaginary website.

I hope that in this case a picture will be worth more than the thousand words I wrote here (1250+ words after editing, I must be doing something wrong), so here goes (*gulp*, I know it’s not the best mockup UI you’ve seen):

Mockup UI for a technical debating website


Thanks for listening, I’ll await your comments.

EDIT: I found this on the interwebs. Not exactly what I was thinking of, and maybe not streamlined enough to really use as a debating tool in online communities, but probably better than a mailing list.

Where Am I?

You are currently browsing the Ideas category at NIL: .to write(1) ~ help:about.