
Thursday, July 3, 2008

Divmod Tech: Making the "Next Gen" Grade

Last night, after I had already posted the latest Twisted in the News, I came across another post that would have made the list had I found it sooner. However, this is a good opportunity to give it a little extra attention.

The title of the post is "Next Gen Web Dev: Playing with Python Twisted/Nevow/Athena" and I gotta say, that made my day :-) Between that post and Colin Alston's post that I mentioned in the News, Nevow had a good week. And people are appreciating it for the right reasons. It may not be the easiest web framework to use and certainly not the best documented, but when you need the flexibility to interact with your (Twisted) web server in particular ways as well as benefit from the functionality that COMET provides, Nevow comes out shining.

It's also refreshing to see new developers entering the community who not only see the potential of these tools (designed with that potential in mind) but are capable of taking advantage of it immediately. If nothing else, the author of that post has motivated me to finally merge the Athena tutorial to trunk in order to bring the publicly available and published content in sync with the new code that's in the branch.


Monday, June 30, 2008

This Word, "Scaling"

You keep using that word.  I do not think it means what you think it means.
        — Inigo Montoya
It seems that everyone on the blogosphere, including Divmod, is talking about "scaling" these days.  I'd like to talk a bit about what we mean (and by "we" I mean both the Twisted community and Divmod, Inc.) when we talk about "scaling".

First, some background.

Google Versus Rails

Everyone knows that Scaling is a Good Thing.  It's bad that Rails "doesn't scale" — see Twitter.  It's good that the Google App Engine scales — see... well, Google.  These facts are practically received wisdom in the recent web 2.0 interblag.  The common definition of "scaling" which applies to these systems is the "ability to handle growing amounts of work in a graceful manner".

And yet (for all that I'd like to rag on Twitter), Twitter serves hojillions of users umptillions of bytes every month, and (despite significant growing pains) continues to grow.  So in what sense does it "not scale"?  While that's going on, Google App Engine has some pretty draconian restrictions on how much an application can actually do.  So it remains to be seen whether GAE will actually scale, and right now you're not even allowed to scale it.  Why, exactly, do we say that one system "scales" and the other doesn't, when the actual data available now says pretty much the opposite?

A GAE application may not scale today, but when Our Benefactors over at the big "G" see fit to turn on the juice, you won't have to re-write a single line of your code.  It will all magically scale out to their demonstrably amazing pile of computers — assuming you haven't done anything particularly dumb in your own code.  All you have to do is throw money at the problem.  Well, actually, you throw the money at Google and they will take the problem away for you, and you will never see it again.  It accomplishes this by providing you with an API for accessing your data, and forbidding most things that would cause your application to start depending on local state.  These restrictions are surprisingly strict if you are trying to write an application that does things other than display web pages and store data, but that functionality does cover a huge number of applications.

Rails, on the other hand, does not provide facilities for scaling.  For one thing, it doesn't provide you with a concurrency model.  Rails itself is not thread safe, nor does it allow any multiplexing of input and output, so you can't share state between multiple HTTP connections.  Yet Rails encourages you to use "normal" Ruby data structures, not inter-process-communication-friendly data structures, to enforce model constraints and do other interesting things.  It's easy to add logic to your Rails application which is not amenable to splitting over multiple processes, and it's hard to make sure you haven't accidentally done so.  When you use the only concurrency model it really supports, i.e. locking and transactions via the database, Rails strongly encourages you to consider your database connection global, so "sharding" your database requires significant reconsideration of your application logic.

These technical details are interesting, but they all point to the same thing.  The key difference between Rails and GAE is the small matter of writing code.  If you write an application with Rails, you probably have to write a whole bunch of new code, or at least change around all of your old code, in order to get it to run on multiple computers.  With GAE, the code you start with is the code you scale with.

Economics of Scale

The key feature of "scalability" that most people care about is actually the ability of a system to efficiently convert money to increased capacity.  Nobody expects you to be able to run a networked system for a hundred million users on a desktop PC.  However, a lot of business people — especially investors — will expect you to be able to run a system for a hundred million users on a data-center with ten million cores in it.  Especially if they've just bought one for you.

Coding is an activity that is notoriously inefficient at converting money into other things.  It's difficult to predict.  It's slow.  But most unnervingly to people with money to invest, pouring money on a problematic software project is like pouring water on an oil fire: adding more manpower to a late software project makes it later.  If you have a hard software problem, you want to identify it early and add the manpower as soon as possible, because you won't be able to speed things along later if you start running into trouble.

So, the thing that pundits and entrepreneurs alike are thinking about when they start talking about "scalability" is eliminating this extra risky phase of programming.  Investors (and entrepreneurs) don't mind investing some money in a "scaling solution", but they don't want to do it when they are in the hockey-stick part of the growth curve, making first impressions with their largest number of customers, and having system failures.  So we're all talking about what hot new piece of technology will solve this problem.

At a coarse granularity, this is a useful framing of the issue.  Technology investment and third-party tools really can help with scaling.  Google and Amazon obviously know what they're doing when it comes to world-spanning scale, and if they're building tools for developers, those tools are going to help.

As you start breaking it down into details, though, problems emerge.  Front and center is the problem that scalability is actually a property of a system, not of an individual layer of that system, infrastructure or no.  Even with the best, sexiest, most automatic scaling layer, you can easily write code that just doesn't scale.  As a soon-to-be purveyor of "scalability solutions" myself, I find this a scary thought: it's easy to imagine a horror story where a tiny but hard-to-discover error in code written on top of our infrastructure makes it difficult to scale up.

That error need not be in the application code.  The scaling infrastructure itself could have some small issue which causes problems at higher scales.  After all, you can do extensive testing, code review, profiling and load analysis and still miss something that comes up only under extremely high load.

Does Twisted Scale?

Just about any answer to this question that you can imagine is valid, so I'll go through them all and explain what they might mean.

No.

Applications written using Twisted can very easily share lots of state, require local configuration, and do all kinds of things which make them unfriendly to distribution over multiple nodes.  Since there is no 'canonical' Twisted application (in fact, you might say that the usual Twisted application is simply an application unusual enough to be unsuited to a more traditional LAMP-type infrastructure), there's no particular documented model for writing a Twisted application that scales up smoothly.  None of the included services do anything special to provide automatic scaling.  There are no state-management abstractions inside Twisted.  If you talk to a database in a Twisted application, the normal way to do it is to use a normal DB-API connection.

When I discussed Rails above, I said that the reason it doesn't scale is that it's too easy, by default, to write applications that don't scale.  Therefore we must conclude that Twisted doesn't scale.

Yes.

Twisted is mainly an I/O library, and it uses abstract interfaces to define how application code interacts with sockets and timers.  Twisted itself includes several reactor implementations, each using a different strategy for multiplexing those sockets and timers, including several which are platform-specific (kqueue, IOCP), squeezing the maximum scale out of your deployment platform, even if it changes.

I said above that infrastructure is scalable if it lets you increase your scale without changing your code.  It would make sense to say that Twisted scales because it allows you to increase the number of connections that you're handling by changing your reactor without changing your code.
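In Twisted, picking a reactor is done through Twisted's own APIs, before anything imports `twisted.internet.reactor`. To keep this sketch runnable with only the standard library, it shows the same design idea with Python's `selectors` module instead: application code written against the abstract `BaseSelector` interface runs unchanged on whichever multiplexing strategy the platform offers. The `echo_once` helper is hypothetical, and this is an analogy to the reactor design, not Twisted's implementation.

```python
import selectors
import socket


def echo_once(selector: selectors.BaseSelector, payload: bytes) -> bytes:
    """Round-trip `payload` through a socket pair using only the abstract
    selector interface; the multiplexing strategy is whatever was passed in."""
    client, server = socket.socketpair()
    try:
        client.setblocking(False)
        server.setblocking(False)
        client.sendall(payload)

        # Wait until the server end is readable, then echo what arrived.
        selector.register(server, selectors.EVENT_READ)
        for key, _events in selector.select(timeout=1.0):
            key.fileobj.sendall(key.fileobj.recv(4096))
        selector.unregister(server)

        # Wait for the echo to come back to the client end.
        selector.register(client, selectors.EVENT_READ)
        reply = b""
        for key, _events in selector.select(timeout=1.0):
            reply = key.fileobj.recv(4096)
        selector.unregister(client)
        return reply
    finally:
        selector.close()
        client.close()
        server.close()


# Every strategy this platform offers; echo_once itself never changes.
strategies = [
    getattr(selectors, name)
    for name in ("SelectSelector", "PollSelector", "EpollSelector", "KqueueSelector")
    if hasattr(selectors, name)
]
for strategy in strategies:
    print(strategy.__name__, echo_once(strategy(), b"hello"))
```

Swapping strategies here is a one-line change at the call site, which is the property the paragraph attributes to Twisted's reactors.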

You could also say that Twisted is scalable because it is an I/O library, and communication between different nodes is almost the definition of scale these days.  Not only can you write scalable systems easily using Twisted's facilities, you can use Twisted as a tool to make other systems scale, as part of a bespoke caching daemon or database proxy.  Several Twisted users use it this way.

Maybe.

Being mostly an I/O library, Twisted itself is rarely the component most in need of optimization.  Being mostly an implementation of mechanisms rather than policies, Twisted gives you what you need to achieve scale but doesn't force, or even encourage, you to use it that way.

For the most part, it's not really interesting to talk about whether Twisted scales or not.  The field of possibilities of what you can do with Twisted is too wide open to allow that sort of classification.

What about Divmod? Does Mantissa scale?

Mantissa, lest you have not heard of it already, is the application server that we are developing at Divmod.  Mantissa is based on Twisted, among other components.  However, the answer to the "scaling" question means something quite different for Mantissa than it does for Twisted.

Twisted is very general and can be used in almost any type of application, from embedded devices to web services to thick clients to system management consoles.  It's almost as general as Python itself — with the notable exception that you can't use Twisted on Google App Engine because they don't allow sockets.  As part of being general, Twisted doesn't dictate much about the structure of your application, except that it use an event loop.  You can manage persistent state however you want, deal with configuration however you want.

Mantissa, on the other hand, is only for one type of application: multi-user, server-side applications, with web interfaces.  You might be able to apply it to something else but you would be fighting it every step of the way.  (Although if you wanted to use Mantissa's components for other types of applications, the more general parts decompose neatly into Nevow and Axiom.)  So the question of "does it scale" is a bit more interesting, since we can talk about a specific type of application rather than a near-infinite expanse of possibilities.  Does Mantissa scale to large numbers of users for these types of "web 2.0" applications?

Unfortunately, the fact that the question is simpler doesn't make the answer that much simpler, so here it is:

Almost...

Mantissa has a few key ingredients that you need to build a system that scales out. The biggest one is a partitioned data-model.  Each user has their own database, where their data is stored.

A very common "web 2.0" scaling plan — perhaps the most common — is to have an increasing number of web servers, all pointed at a single giant database with an increasingly ridiculous configuration — gigabytes of RAM, terabytes of disk, fronted by a bank of caching servers.  This works for a while.  For many sites, it's actually sufficient.  But it has a few problems.

For one thing, it has a single point of failure.  If your database server goes down, your service goes down.  Your database server isn't a lightweight "glue" component, either, so it's not a single point of failure you can quickly recover from.  Even worse, it means that even in the good scenario, where you can scale to capacity, your downtime is increased.  Each time you upgrade the database, the whole site goes down.  This problem gets compounded because a lot of sites have append-only databases, with increasingly large volumes of data to migrate for each upgrade.

Another issue is that it increases load on your administrators, because they are responsible for an increasingly finicky and stressed database server.  This may actually be a good thing — administrators are not programmers, after all, and are therefore a more reliable and easier resource to throw money at.  Unfortunately there are (almost by definition) fewer things that admins can do to improve the system.  Because the admins can't actually solve the root problems that make their lives difficult, it's easier for them to get frustrated and leave for an environment where they won't be so stressed.

The reason websites choose this scaling model is that popular frameworks, or even non-frameworks like "let's just do it in PHP", make it easy to just use a single database, and to write all the application logic to depend on that single database as the point of communication between system components.  So the scaling plan is just working with the code that was written before anybody thought about scaling.

If you write an application with Mantissa today, it's easiest to toss the data into different databases depending on who it is for, so when you get to dealing with the "scaling" part of the problem, you can put those databases onto different computers, and avoid the single point of failure.  Moreover, when you write an application with Mantissa, you get "global" services like signup and login as part of the application server, so your application code can avoid the usual schema pitfalls (the "users" table, for example) which require a site to have a single large database.
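Mantissa does this with Axiom stores; the sketch below does not use Axiom's API. It is a minimal illustration, assuming plain `sqlite3` files and a hypothetical `UserStores` class, of the property described above: each user's data lives in its own database, no application code relies on a single shared "users" table, and the per-user files can later be spread across machines.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


class UserStores:
    """Hypothetical per-user storage: one SQLite file per user under `root`.
    Assumes usernames are already validated as safe file names."""

    def __init__(self, root: str):
        self.root = root

    def path_for(self, username: str) -> str:
        # Moving a user to another host means moving this one file,
        # not carving rows out of a shared table.
        return os.path.join(self.root, f"{username}.sqlite")

    def open(self, username: str) -> sqlite3.Connection:
        conn = sqlite3.connect(self.path_for(username))
        conn.execute(
            "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"
        )
        return conn


def add_note(stores: UserStores, username: str, body: str) -> None:
    # Each operation touches exactly one user's database.
    with closing(stores.open(username)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))


def list_notes(stores: UserStores, username: str) -> list:
    with closing(stores.open(username)) as conn:
        rows = conn.execute("SELECT body FROM notes ORDER BY id").fetchall()
    return [body for (body,) in rows]


with tempfile.TemporaryDirectory() as root:
    stores = UserStores(root)
    add_note(stores, "alice", "hello")
    add_note(stores, "bob", "hi")
    print(list_notes(stores, "alice"))  # ['hello']
    print(sorted(os.listdir(root)))     # ['alice.sqlite', 'bob.sqlite']
```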

There's only one problem with that plan.

... but not quite.

In my humble opinion, Mantissa offers some interesting ideas, but there are a few reasons you won't get scaling "for free" with Mantissa if you use it right now, today.

You may be noticing about now that I didn't mention any way to communicate between those partitioned chunks of data.  This is what I've been spending most of my last few weeks on.  I have been working on an implementation of an "eventually consistent" message-passing API for transferring messages between user databases in a Mantissa server.  You can see the progress of this work on the Divmod tracker, where the ticket is nearing the end of its year-long journey, and already in its (hopefully) final review.
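The ticket's actual API isn't described here, so this is only a hypothetical sketch of the general technique named above. Each sender records an outgoing message in an outbox inside its own store, touching nothing else, and a separate delivery pass copies it into the recipient's store later. Message IDs make delivery idempotent, so a retried pass can't duplicate a message. The in-memory `Store` class and the `deliver` function are illustrative assumptions, not Mantissa code.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical per-user store with an outbox and an inbox."""
    owner: str
    outbox: dict = field(default_factory=dict)  # msg_id -> (recipient, body)
    inbox: dict = field(default_factory=dict)   # msg_id -> (sender, body)

    def send(self, recipient: str, body: str) -> str:
        # Only the sender's own store is touched; no cross-store transaction.
        msg_id = str(uuid.uuid4())
        self.outbox[msg_id] = (recipient, body)
        return msg_id

    def receive(self, msg_id: str, sender: str, body: str) -> None:
        # Idempotent: redelivering the same message ID is a no-op.
        self.inbox.setdefault(msg_id, (sender, body))


def deliver(stores: dict, reachable: set) -> int:
    """One delivery pass. Messages to unreachable stores stay queued."""
    delivered = 0
    for store in stores.values():
        for msg_id, (recipient, body) in list(store.outbox.items()):
            if recipient not in reachable:
                continue  # retry on a later pass
            stores[recipient].receive(msg_id, store.owner, body)
            # Remove from the outbox only after the recipient has it.
            del store.outbox[msg_id]
            delivered += 1
    return delivered


stores = {name: Store(name) for name in ("alice", "bob")}
stores["alice"].send("bob", "lunch?")
print(deliver(stores, reachable={"alice"}))         # 0: bob's server is "down"
print(deliver(stores, reachable={"alice", "bob"}))  # 1
print(list(stores["bob"].inbox.values()))           # [('alice', 'lunch?')]
```

The state is consistent only "eventually": between the two passes, alice has sent a message that bob cannot yet see.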

I'm particularly excited about this feature, because it completes the Mantissa programming model to the point where you can really use it.  It's the part of the system that most directly impacts your own code, and thereby allows you to more completely dodge the bullet of modifying a bunch of your application's logic when you want to scale.  There might be some dark corners — for example, a scalable API for interacting with the authentication system — but those should only affect a small portion of a small number of applications.  Unfortunately communication between databases is not the only issue we have remaining.

There's more to the scaling problem than getting the application code to be the right shape.  The infrastructure itself needs to present a container that does the heavy lifting of scalability for the code that it contains.  For example, Mantissa needs a name server and a load balancer that will direct requests to the appropriate server for the given chunk of data.  It also needs a sign-up and account management interface that will make an informed decision about where to locate a new user's data, and be able to transparently migrate users between servers if load patterns change.  There are also enhanced features, like replicating read-only data to multiple hosts, for applications (for example, a blogging system) which have heavy concentrations of readers on small portions of data.
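As a rough, hypothetical illustration of that container's job (none of these names come from Mantissa), a directory can map each user to the server holding their data. Sign-up places a new user on the least-loaded server, and a migration becomes a directory update once the data has been copied. A front-end load balancer would call `lookup` to decide where to forward each request.

```python
class Directory:
    """Hypothetical name service: which server holds which user's data."""

    def __init__(self, servers):
        self.load = {server: 0 for server in servers}  # users per server
        self.location = {}                              # username -> server

    def sign_up(self, username: str) -> str:
        if username in self.location:
            raise ValueError(f"{username} already exists")
        # Informed placement: pick the server holding the fewest users
        # (ties broken by server name, so the choice is deterministic).
        server = min(self.load, key=lambda s: (self.load[s], s))
        self.location[username] = server
        self.load[server] += 1
        return server

    def lookup(self, username: str) -> str:
        # What a load balancer would ask before forwarding a request.
        return self.location[username]

    def migrate(self, username: str, target: str) -> None:
        # Assumes the user's data was already copied to `target`; switching
        # the directory entry is what makes the move visible to requests.
        source = self.location[username]
        self.load[source] -= 1
        self.load[target] += 1
        self.location[username] = target


d = Directory(["db1", "db2"])
print([d.sign_up(u) for u in ("alice", "bob", "carol")])  # ['db1', 'db2', 'db1']
d.migrate("carol", "db2")
print(d.lookup("carol"), d.load)  # db2 {'db1': 1, 'db2': 2}
```

Real placement would weigh measured load rather than user counts, and the directory itself must not become the new single point of failure; both are outside this sketch.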

Finally there are problems of optimization.  We haven't had much time to optimize Mantissa or Athena, and already on small-scale systems we have seen performance issues, especially given the large number of requests that an Athena page can generate.  We need to make some time to implement the optimizations we know we need, and when we start scaling up our first really big system, I'm sure that we'll discover other areas that need tweaking.

Why Now?

I'm fond of saying that programming is like frisbee, and predictions more specific than "hey, watch this!" are dangerous.  So you might wonder why I'm talking about such a long-running future plan in such detail.  You might be wondering why I would think that you'd be interested in something that isn't finished yet.  Perhaps you think it's odd that I've described the challenges in such detail rather than being more positive about how awesome it is.

While I certainly don't want to publicly commit to a time-frame for any of this work to be finished, I do feel pretty comfortable saying that it's going to happen.  The design for scalability I've discussed here has been a core driving concern for Mantissa since its beginning, and it's something that's increasingly important to our business and our applications.

I'm being especially detailed about Mantissa's incompleteness because I want to make sure that potential users' expectations are set appropriately.  I don't want anyone coming to the Divmod stack after having heard me say vague things about "scalability", believing that they'll get an application that scales to the moon.

I do think that this is an exciting time for other developers to get involved though.  Mantissa is at a point where there are lots of bits of polish that need to be added to make it truly useful.  Starting to investigate it for your application now will give you the opportunity to provide feedback while it's still being formed, before a bunch of final decisions have been made and a lot of application code has been written to rely on them.

More Later...

I've got more to say about scaling, Twisted, and Mantissa, of course.  In particular I'd like to explain why I think Mantissa is an interesting scaling strategy and how it compares to the other ones.  At this rate, though, I'll only write one blog post this year!  I'm sure you hope as much as I do that the next one won't be so long...

Friday, June 27, 2008

So You Want Your Code to Be Asynchronous? A Twisted Interview

Prologue

This blog post was taken from a chat on a Divmod IRC channel a couple of weeks ago. Let's start with my opening comments to JP about what I hoped we could accomplish in the interview.

[1:47pm] oubiwann:exarkun: developers/users have started to understand Twisted, see the benefits of an async paradigm, and want to start writing their code making the best possible use of twisted's event driven nature
[1:48pm] oubiwann:they know how to write code using deferreds, and they're ready to get writing...
[1:48pm] oubiwann:except they're not
[1:48pm] oubiwann:because they don't know python internals
[1:49pm] oubiwann:they don't know what python can actually be used with deferreds because they don't know what requirements there are for python code that it be non-blocking in the reactor
[1:50pm] oubiwann:so you're going to help us understand the pitfalls
[1:50pm] oubiwann:how to make best guesses
[1:50pm] oubiwann:and where to look to get definitive answers

Change Your Mind


Before we go any further, I want to share a few comments and answer two questions: "Who is this for?" and "What do I need to know for this to mean something to me?" This post is for anyone who wants to write async code with Twisted and the answer to the second question is open-ended.

Let me start with what is often interpreted as effrontery: read the source code. Despite how that may have sounded, it's not another RTFM quip. The Twisted source code was specifically designed to be read (well, the code from the last two years, anyway). It was designed to be read, re-read, absorbed, pondered, and turned into living memes in your brain.

Understanding tricky topics in conceptually dense fields such as mathematics, physics, and advanced programming requires immersion. When we commit to really learning something difficult in programming, when we take the big step and dive in, we are surrounded by code. At a conceptual level, I mean that literally: it is a spatial experience. This is not something that is typically taught... the lucky few do this on their own; the rest have to slowly build their intuition through experience in order to get comfortable and be productive in code space.

Our school systems tend to train us along very linear lines: there's a right answer, and a wrong answer. Don't rock the boat. Don't make the teacher uncomfortable. Follow the rules, do your homework, and don't ask too many questions. We carry these habits with us into our professional lives, and it can be quite the task to overcome such a mindset.

Experience is multidimensional. Learning is experience, not rules. When you really jump into this stuff, it will surround you. You will have an experience of the code. For me, that is a mental experience akin to looking at something from the perspective of three dimensions versus two. When I've not dedicated myself to understanding a problem, the domain, or the tools of the domain, everything looks very flat to me. It's hard to muddle through. I feel like I have no depth perception and I get easily frustrated.

When I do take the time, when I make the investment of attention and interest, the problem spaces really do become spaces, ones where my mind has a much greater freedom of movement. It's not smart people who do this kind of thing, it's committed people. Your mind is your world and it's up to you to make it what you want. No one on a mailing list or IRC channel can do that for you. They can help you with the rules, provide you with valuable moral support, and guide you along the way. However, a direct experience of the code as a living world of mind comes from taking many brave leaps into the unknown.

Interview in a Blender

Jean-Paul Calderone graciously set aside some time to talk with me about creating asynchronous code in Python, particularly using the Twisted framework. As has been said many times before, simply using Twisted or deferreds doesn't make your code asynchronous. As with any tricky problem, you have to put some time and thought into what you want to accomplish and how you want to accomplish it.

I'm going to post bits of our chat in different sections, but hopefully in a way that makes sense. There's some good information here and some nice reminders. More than anything, though, this should serve as an encouragement to dig deeper.

Why Would I Ever Need Async Code?

There are a few short answers to that:
  • Your application is doing many long-running computations (or computations of varying/unpredictable length).
  • Your application runs in an unpredictable environment (in particular, I'm thinking of network communications).
  • Your application needs to handle lots of events.

[1:55pm] oubiwann:exarkun: so, what's the first question a developer should ask themselves as they begin writing their Twisted application/library, txFoo?
[1:55pm] dash:"would everyone be better off if I just stopped now?"
[1:55pm] exarkun:oubiwann: I'm not sure I completely understand the target audience yet
[1:56pm] exarkun:my question is kind of like dash's question
[1:56pm] exarkun:why is this person doing this?
[1:57pm] oubiwann:exarkun: the audience is the group of software developers that are new to twisted, have a basic grasp of deferreds, and want their code to be properly async (using Twisted, of course)
[1:57pm] oubiwann:they don't have anything more than a passing familiarity of the reactor
[1:57pm] oubiwann:they don't know python internals

Protocols, Servers, and Clients, Oh My!

If your application can use what's already in Twisted, you're on easy street :-) If not, you may have to write your own protocols.
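Writing a Twisted protocol mostly means subclassing `twisted.internet.protocol.Protocol` and reacting to `dataReceived` calls as bytes arrive in arbitrary chunks. Python's `asyncio.Protocol` was strongly influenced by Twisted's design (with `data_received` in place of `dataReceived`), so this minimal sketch uses it to stay runnable without Twisted installed. The uppercase-echo line protocol is invented for illustration; the point is buffering partial input instead of assuming that one read equals one message.

```python
import asyncio


class UpperLineProtocol(asyncio.Protocol):
    """Hypothetical line protocol: replies to each line with it uppercased."""

    def connection_made(self, transport):
        self.transport = transport
        self.buffer = b""

    def data_received(self, data: bytes) -> None:
        # Bytes arrive in arbitrary chunks; keep any incomplete line buffered.
        self.buffer += data
        while b"\n" in self.buffer:
            line, self.buffer = self.buffer.split(b"\n", 1)
            self.transport.write(line.upper() + b"\n")


async def main():
    loop = asyncio.get_running_loop()
    server = await loop.create_server(UpperLineProtocol, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hel")          # a partial line: no reply yet
    await writer.drain()
    writer.write(b"lo\nworld\n")  # completes the first line, adds a second
    await writer.drain()
    replies = [await reader.readline(), await reader.readline()]
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return replies


print(asyncio.run(main()))  # [b'HELLO\n', b'WORLD\n']
```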

Let's get back to the chat:

[1:57pm] exarkun:So `foo´ is... a django-based web application?
[1:58pm] exarkun:... a unit conversion library?
[1:58pm] oubiwann:sure, that works
[1:58pm] oubiwann:unit conversion lib
[1:58pm] oubiwann:(which could be used in Django)
[1:58pm] exarkun:at a first guess, I'd say that there's probably no work to do
[1:58pm] exarkun:how could you have a unit conversion library that's not async?
[1:58pm] exarkun:that'd take some work
[1:59pm] oubiwann:let's say that the unit calculations take a really long time to run
[1:59pm] exarkun:Hm. :)
[1:59pm] idnar:you'd probably have to spawn a new process then :P
[2:00pm] exarkun:basically. probably the only other reasonable thing is for twisted-using code to use the unit conversion api with threads.
[2:00pm] exarkun:so then the question to ask "is my code threadsafe?"
[2:00pm] oubiwann:what about a messaging server
[2:00pm] oubiwann:that sends jobs out to different hosts for calcs
[2:01pm] dash:that's not going to be a tiny example
[2:01pm] exarkun:for that, the job is probably to take all the parsing and app logic and make sure it's separate from the i/o
[2:01pm] exarkun:so "am I using the socket/httplib/urllib/ftplib/XXXlib module?"
[2:03pm] exarkun:is another question for the developer to ask himself
[2:06pm] exarkun:they probably need to find the api in twisted that does what they were using a blocking api for, and switch to it
[2:07pm] exarkun:that might mean implementing a protocol, or it might mean using getPage or something
[2:07pm] exarkun:and pushing the async all the way from the bottom up to the top (maybe not in that direction)
[2:08pm] oubiwann:by "bottom" are you referring to protocol/wire-level stuff?
[2:08pm] oubiwann:exarkun: and by "top" their module's API?
[2:09pm] exarkun:yes
[2:10pm] exarkun:oubiwann: the point being, can't have a sync api implemented in terms of an async one (or at least the means by which to do so are probably beyond the scope of this post)
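Here is a minimal sketch of what exarkun describes, assuming a hypothetical unit-conversion library that fetches conversion factors over the network. Parsing and arithmetic are plain functions with no I/O, and only the outermost function touches the network. Because the fetch is asynchronous, the public `convert` function has to be asynchronous too. `asyncio` coroutines stand in for Deferreds here, and `fake_fetch` is a placeholder for a real client, not an actual service.

```python
import asyncio


def parse_factor(body: bytes) -> float:
    """Pure parsing: no sockets, so it's trivially testable and reusable."""
    return float(body.decode("ascii").strip())


def apply_factor(amount: float, factor: float) -> float:
    """Pure application logic."""
    return amount * factor


async def convert(amount: float, unit_pair: str, fetch) -> float:
    """The I/O boundary. `fetch` is any coroutine function returning the raw
    response body for `unit_pair` (hypothetical; e.g. an HTTP client).
    Since it must be awaited, `convert` itself must be async."""
    body = await fetch(unit_pair)
    return apply_factor(amount, parse_factor(body))


async def fake_fetch(unit_pair: str) -> bytes:
    # Stand-in for a network call; yields to the event loop like real I/O.
    await asyncio.sleep(0)
    return {"km->mi": b"0.621371\n"}[unit_pair]


print(round(asyncio.run(convert(10, "km->mi", fake_fetch)), 3))  # 6.214
```

Keeping `parse_factor` and `apply_factor` free of I/O means only `convert` has to change when the transport does.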

Processes

We didn't really talk about this one. Idnar mentioned spawning processes briefly, but the discussion never really returned there. I imagine that this is fairly well understood and may not merit as much pondering as such things as threads.
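For the long-running calculation idnar mentioned, Twisted's tool is `reactor.spawnProcess`, which delivers a child's output to a `ProcessProtocol` without blocking the reactor. To stay runnable with only the standard library, this sketch uses asyncio's analogous `create_subprocess_exec`. The computation (summing squares in a child Python interpreter) is a placeholder. Unlike a thread, the child shares no memory with the parent, so thread safety never comes up.

```python
import asyncio
import sys

# Placeholder for an expensive computation, run in a separate interpreter so
# it neither blocks the event loop nor shares any state with the parent.
CHILD_CODE = "import sys; n = int(sys.argv[1]); print(sum(i * i for i in range(n)))"


async def compute_in_child(n: int) -> int:
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_CODE, str(n),
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()  # the loop stays free while we wait
    if proc.returncode != 0:
        raise RuntimeError(f"child exited with {proc.returncode}")
    return int(out.decode().strip())


async def main():
    # Other work can proceed while a child runs; here, a second child.
    return await asyncio.gather(compute_in_child(10), compute_in_child(1000))


print(asyncio.run(main()))  # [285, 332833500]
```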

Which brings us to...

Threads

Thread safety is the number one concern when trying to provide an asynchronous API for synchronous code.
Discussing threads consumed the rest of the interview:

[2:12pm] oubiwann:exarkun: so, back to your comment about "is it threadsafe" (if they are doing long-running python calculations)
[2:13pm] oubiwann:what are the problems we face when we don't ask ourselves this question?
[2:13pm] oubiwann:what happens when we try to run non-threadsafe code in the Twisted reactor?
[2:14pm] exarkun:The problem happens when we try to run non-threadsafe code in a thread to keep it from blocking the reactor thread.
[2:16pm] oubiwann:so non-thread safe code run in deferToThread could...
[2:16pm] oubiwann:have data inconsistencies which cause non-deterministic bugs?
[2:16pm] dash:have the usual effects of running non-threadsafe code
[2:16pm] exarkun:have any problem that using non-thread safe code in a multithreaded way using any other threading api could have
[2:16pm] dash:like that, yeah
[2:17pm] exarkun:inconsistencies, non-determinism, failure only under load (ie, only after you deploy it), etc
[2:18pm] dash:i smell a research paper
[2:18pm] oubiwann:so, next question: how does one determine that python code is thread safe or not?
[2:19pm] glyph:a research *paper*?
[2:19pm] exarkun:heh
[2:19pm] glyph:research *industry* more like
[2:19pm] oubiwann:exarkun: or, if not determine, at least ask the right sorts of questions to get the developer thinking in the right direction
[2:20pm] dash:glyph: Heh heh.
[2:20pm] exarkun:oubiwann: well, is there shared mutable state? if you're calling `f´ in a thread, does it operate on objects not passed to it as arguments?
[2:20pm] exarkun:oubiwann: if not, then it's probably safe - although don't call it twice at the same time with the same arguments
[2:20pm] exarkun:oubiwann: if so, who knows
[2:20pm] dash:with the same mutable arguments, anyway
[2:23pm] oubiwann:exarkun: so, because python and/or the os doesn't do anything to make file operations atomic, I'm assuming that reading and writing file data is not threadsafe?
[2:24pm] exarkun:don't use the same python file object in multiple threads, yes.
[2:24pm] exarkun:but certain filesystem operations are atomic, and you can manipulate the same file from multiple threads (or processes) if you know what you're doing
[2:25pm] oubiwann:what about C extensions in Python? any general rules there?
[2:25pm] oubiwann:other than "if they're threadsafe, you can use them"
[2:25pm] exarkun:that's about all you can say with certainty
[2:26pm] exarkun:for dbapi2 modules, look at the `threadsafety´ attribute. that's about the most general rule you can express.
[2:26pm] exarkun:there's some stuff other than objects that gets shared between threads too that might be worth mentioning
[2:26pm] exarkun:at least to get people to think about non-object state
[2:27pm] oubiwann:such as?
[2:27pm] exarkun:like, process working directory, or uid/gid
[2:30pm] • oubiwann looks at deferToThread...
[2:31pm] • oubiwann looks at reactor.callInThread
[2:33pm] • oubiwann looks at ReactorBase.threadpool
[2:38pm] oubiwann:hrm
[2:38pm] oubiwann:internesting
[2:39pm] oubiwann:never took the time to trace that all the way back to (and then read) the Python threading module
[2:40pm] oubiwann:exarkun: are there any python modules well known for their lack of threadsafety?
[2:42pm] exarkun:oubiwann: I dunno about "well known"
[2:42pm] exarkun:oubiwann: urllib isn't threadsafe
[2:42pm] exarkun:neither is urllib2
[2:43pm] exarkun:apparently random.gauss is not thread-safe?
[2:43pm] exarkun:you generally start with the assumption that any particular api is not thread-safe
[2:44pm] exarkun:and then maybe you can demonstrate to your own satisfaction that it's thread-safe-enough for your purposes
[2:44pm] exarkun:or you can demonstrate that it isn't
[2:45pm] exarkun:grepping the stdlib for 'thread' and 'safe' is interesting
[2:45pm] oubiwann:I wonder if the stuff available in math is threadsafe....
[2:45pm] oubiwann:exarkun: heh, I was just getting ready to dl the source so I could do that :-)
[2:46pm] exarkun:the math module probably is threadsafe
[2:46pm] exarkun:maybe that's another generalization
[2:46pm] exarkun:stdlib C modules are probably threadsafe
[2:49pm] oubiwann:hrm, looks like part of random isn't threadsafe
[2:51pm] oubiwann:random.random() is safe, though
[2:53pm] oubiwann:exarkun: I really appreciate you taking the time to discuss this
[2:53pm] exarkun:np
[2:53pm] oubiwann:and thanks to dash, glyph, and idnar for contributing to the discussion :-)
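To make exarkun's "is there shared mutable state?" question concrete, here is a minimal sketch. Twisted's `deferToThread` runs a function in the reactor's thread pool; `asyncio.to_thread` (Python 3.9+) plays the same role here so the example runs without Twisted. `count_words` touches only its argument and is safe to run concurrently, while the hypothetical `record_hit` mutates a module-level dict and so needs a lock. The last line prints the DB-API `threadsafety` level exarkun suggested checking, using `sqlite3` as the example driver.

```python
import asyncio
import sqlite3
import threading


def count_words(text: str) -> int:
    # Safe: operates only on its argument and returns a new value.
    return len(text.split())


# Shared mutable state: every thread sees the same dict.
hits = {}
hits_lock = threading.Lock()


def record_hit(page: str) -> None:
    # Read-modify-write on shared state; without the lock, concurrent
    # calls could interleave and lose updates.
    with hits_lock:
        hits[page] = hits.get(page, 0) + 1


async def main():
    counts = await asyncio.gather(
        *(asyncio.to_thread(count_words, "a b c") for _ in range(4))
    )
    await asyncio.gather(*(asyncio.to_thread(record_hit, "/") for _ in range(100)))
    return counts, hits["/"]


print(asyncio.run(main()))  # ([3, 3, 3, 3], 100)
# PEP 249: 0 = threads may not share the module; 3 = they may share the
# module, connections, and cursors.
print("sqlite3.threadsafety =", sqlite3.threadsafety)
```

Note that a lock only protects object state; process-wide settings like the working directory or uid/gid are shared by every thread regardless.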

Summary

Concurrency is hard. If you want to use threads and you want to do it right and you want to avoid pitfalls and have bug-free code, you're going to be doing some head-banging. If you want to use an asynchronous framework like Twisted, you're going to have to bend your mind in a different way.

No matter which school of thought you follow for any given project, the best results will come with full commitment and immersion. Don't fear the learnin' -- embrace the pain ;-)


Tuesday, June 10, 2008

Bazaar with Subversion and Combinator

For the past couple days, I've been experimenting with using Bazaar and Combinator more or less simultaneously. As you may know by now, Combinator is a tool that wraps some of Subversion's ugliness (mostly merging), helps manage branches, and sets Python paths for development environments. We use it extensively (almost exclusively) at Divmod.

One of my recent side projects has evolved into useful code more quickly than I had anticipated, so I thought I'd put it up on Launchpad in the Twisted Community Code. This, of course, led to questions about one-time imports, mirroring, and dual bzr/svn management. I eventually opted for the last, using the bzr plugin bzr-svn. Not having a lot of experience with Bazaar, I was at a bit of a loss, at first: there don't seem to be any dummy docs to get us beginners up to speed.

Through some painful, time-consuming trial and error and a couple dead ends, I arrived at a process that works for me, and codified it in a script. The comments in that script seemed generally useful, and given the dearth of docs, I thought I'd turn the comments into a blog post.


The Plugin

Once I figured out the right way to use bzr-svn, it was actually much easier than I thought it would be. Here are the basics: you need to have bzr installed and then you need to install bzr-svn, which is actually a bzr plugin and not a separate tool. When you have bzr-svn installed, you will have additional bzr commands at your disposal which, as you might guess, let you interoperate with an svn repository.


Two Become One

So here's how you get started: create your Subversion branch (we use Combinator) and get your working dir ready to code. You can either add dirs and files now, or do that later; it doesn't matter.

Then, in this working directory, perform a bzr checkout:

bzr co . bzrtest
cd bzrtest

This will create a Bazaar branch from your Subversion (Combinator) branch. 'bzrtest' (or whatever you name it) is your new bzr+svn branch and it is here where you'll be doing all of your work, committing, pushing to Subversion, and (in my case) pushing to Launchpad.

If your Subversion repository has a long history, you probably don't want to perform a 'bzr update' -- that'll just end in tears (it could take days to finish, use up lots of memory, require multiple restarts, and consume disk space by the gigaliter).


Launchpad

For my project, I had already registered a branch on Launchpad via the web interface, so I was ready to push the new Bazaar branch just created with the checkout command above:

bzr push lp:~oubiwann/txevolver/dev --use-existing-dir

I then logged into the web interface again, and set this newly pushed branch as the main development effort for the project. All future pushes (during this development phase) will now be done with the following command:

bzr push lp:txevolver

Future commit-push cycles just look like this:

bzr commit --local -m "My message"
bzr push lp:txevolver

Keep in mind that you can do multiple commits with Bazaar before you push to a server.


The Divmod Repo

Once you've done a local commit (or many local commits), you're ready to start pushing changes to your Subversion repository. This is where you use one of the commands that is provided by the bzr-svn plugin:

bzr svn-push svn+ssh://myRepo

And in my case, that's the following:

bzr svn-push \
svn+ssh://divmod.org/svn/Divmod/branches/genetic-programming-2620/Evolver

If you have done more than one local commit since your last push, you'll see a series of commits made to your svn repo after you issue the 'svn-push' command.


All Together Now

The script I mentioned at the beginning of this post is here. With it, I run a single command which extracts my commit message from the ChangeLog diff, commits locally, pushes to the Divmod svn repo and then pushes to Launchpad. A single command does everything I need, now: maintaining changes in both a bzr repo that can be easily branched by others on Launchpad as well as in my Subversion branch at work.
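
The script itself isn't reproduced here. The sketch below is only a hypothetical reconstruction of the workflow just described, written in Python with subprocess. It assumes the ChangeLog lives at the branch root and takes the commit message from the lines added to it in `bzr diff ChangeLog`. It then runs the same three commands shown earlier in this post. The URLs are the ones from the examples above, so swap in your own.

```python
import subprocess

# Defaults copied from the examples in this post; change them for your project.
SVN_URL = "svn+ssh://divmod.org/svn/Divmod/branches/genetic-programming-2620/Evolver"
LP_URL = "lp:txevolver"


def message_from_changelog_diff(diff_text):
    """Collect the lines added to ChangeLog, without the leading '+'."""
    lines = []
    for line in diff_text.splitlines():
        # Skip the '+++ ChangeLog' header; keep real additions.
        if line.startswith("+") and not line.startswith("+++"):
            lines.append(line[1:])
    return "\n".join(lines).strip()


def build_commands(message, svn_url=SVN_URL, lp_url=LP_URL):
    """The commit/push sequence: local commit, then svn, then Launchpad."""
    return [
        ["bzr", "commit", "--local", "-m", message],
        ["bzr", "svn-push", svn_url],
        ["bzr", "push", lp_url],
    ]


def commit_and_push(svn_url=SVN_URL, lp_url=LP_URL):
    # 'bzr diff' exits with 1 when there are differences, so check=True
    # would treat the normal case as an error; accept 0 and 1 instead.
    diff = subprocess.run(["bzr", "diff", "ChangeLog"],
                          capture_output=True, text=True)
    if diff.returncode not in (0, 1):
        raise RuntimeError(diff.stderr)
    message = message_from_changelog_diff(diff.stdout)
    if not message:
        raise SystemExit("No new ChangeLog entry; nothing to commit.")
    for cmd in build_commands(message, svn_url, lp_url):
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

Calling commit_and_push() from the root of the bzr+svn checkout runs the whole cycle. Stopping at the first failure keeps a failed svn-push from being followed by a Launchpad push.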

Once this project is ready to merge to trunk (if, in fact, its final home is to be the Divmod svn repo), I'll do an svn up in the Combinator-created branch, unbranch, and commit to trunk. At JP's suggestion, I'll probably also clean up the svn properties that bzr-svn created, but other than that, the overhead seems to be zero.


Wednesday, May 28, 2008

Twisted and Divmod: A Cheater's Setup Guide

I've been helping a few folks out on IRC lately. They've wanted to know how to set up Twisted and Divmod without doing any installs, running directly from SVN. They've been in luck, because that's actually how we develop at Divmod :-)

Here are the Cliff Notes (this stuff is available on the wikis, but it's spread out):

Install the dependencies:
pycrypto 2.0
SQLite 3.2.1
PySQLite 2.0
PyTZ 2005m-1
PIL 1.1.6
Get the Divmod code first (we'll get Twisted next):
mkdir ~/lab
cd ~/lab
svn co http://divmod.org/svn/Divmod/trunk Divmod/trunk
Set the Combinator env vars (if you want to persist this, then you'll need to put it in your .profile or shell .rc file):
eval `python ~/lab/Divmod/trunk/Combinator/environment.py`
Have Combinator start "tracking" Divmod and Twisted, thus managing PYTHONPATH for them (note that chbranch will detect that Twisted has not been checked out and will do so automatically):
chbranch Divmod trunk
chbranch Twisted trunk svn://svn.twistedmatrix.com/svn/Twisted/trunk
Get the new project dirs into the env:
eval `python ~/lab/Divmod/trunk/Combinator/environment.py`
Executing the whbranch command should give you the following:
Divmod: trunk
Twisted: trunk
If you start up a Python interpreter, you'll be able to import from twisted, mantissa, axiom, etc.
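
To confirm that Combinator has put everything on your path, you can ask Python whether each package can be found without importing it. This is a sketch. The module list is an assumption based on the Divmod trunk layout (Mantissa's package is xmantissa), so adjust it to match your checkout.

```python
import importlib.util

# Assumed top-level package names for the Divmod/Twisted checkouts.
DEFAULT_MODULES = ["twisted", "epsilon", "axiom", "nevow", "xmantissa", "combinator"]


def check_imports(names=DEFAULT_MODULES):
    """Map each top-level module name to whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def report(names=DEFAULT_MODULES):
    """Print one status line per module; return True if all were found."""
    status = check_imports(names)
    for name, found in status.items():
        print(f"{name:12} {'ok' if found else 'MISSING'}")
    return all(status.values())
```

Call report() from an interpreter started in the same shell where you eval'd environment.py. A MISSING entry usually means the eval step didn't run in that shell.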

Update: the instructions have been edited and shortened, thanks to insight from Glyph.


Tuesday, May 27, 2008

Mantissa: An Alternative to LAMP

I first started drafting this post a few months ago, out of excitement for the work that JP and Glyph have been doing in the Divmod open source stack codebase. I was planning on entering the acronym fray with a title like "*MAP: An Alternative to LAMP" where *MAP (pronounced "starmap") would be "Any OS, Mantissa, Axiom, and Python." A good friend of mine whose opinion I value said that *MAP was a terrible name, and when I chatted about it with Glyph, he commented "Why not keep it really simple? Just say 'Mantissa.'"

And so it is :-)

For those that don't know, Mantissa is the Twisted application server and Axiom is a Twisted-based object database. By virtue of what are called "deferreds," Twisted allows you to write highly concurrent applications. Mantissa -- the Divmod stack (Mantissa entails Python, Twisted, and Axiom because it requires them) -- provides developers a means of scaling their Twisted-based, asynchronous applications. This means that you can go from experiments or prototypes to multi-node production deployments with the same set of tools and code.
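
To make "deferreds" more concrete: a Deferred is a placeholder for a result that isn't available yet. You attach callbacks to it, and they run when the result arrives, so a single thread can keep many operations in flight. The sketch below is a toy, stdlib-only stand-in written for illustration. It borrows the method names addCallback() and callback() from Twisted's twisted.internet.defer.Deferred but omits errbacks, chained Deferreds, and everything else the real class does. The fake_fetch helper is hypothetical.

```python
_UNSET = object()  # sentinel meaning "no result yet"


class ToyDeferred:
    """Toy stand-in for Twisted's Deferred: callback chaining only."""

    def __init__(self):
        self._callbacks = []
        self._result = _UNSET

    def addCallback(self, fn, *args, **kwargs):
        self._callbacks.append((fn, args, kwargs))
        # If the result already arrived, run the new callback right away.
        if self._result is not _UNSET:
            self._run()
        return self  # allows d.addCallback(...).addCallback(...)

    def callback(self, result):
        if self._result is not _UNSET:
            raise RuntimeError("ToyDeferred already fired")
        self._result = result
        self._run()

    def _run(self):
        # Each callback receives the previous callback's return value.
        while self._callbacks:
            fn, args, kwargs = self._callbacks.pop(0)
            self._result = fn(self._result, *args, **kwargs)


def fake_fetch(pending, user_id):
    """Hypothetical non-blocking fetch: returns at once, result comes later."""
    d = ToyDeferred()
    pending.append((d, {"id": user_id, "name": "user%d" % user_id}))
    return d


def demo():
    pending, log = [], []
    # Both "requests" are in flight before either has a result.
    fake_fetch(pending, 1).addCallback(lambda u: u["name"]).addCallback(log.append)
    fake_fetch(pending, 2).addCallback(lambda u: u["name"].upper()).addCallback(log.append)
    # A reactor would fire these as I/O completes; here we fire them by hand.
    for d, result in pending:
        d.callback(result)
    return log
```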

As such, this is a direct competitor for LAMP. Here are some questions about that: What is the value of a full stack? Why is an alternative to LAMP good or needed? What is a good alternative?

Stacked Development Value

What does a full stack give us, as developers? From a practical perspective, it:
  • eliminates the overhead involved in setting up a system in preparation for development
  • provides a development toolset
  • provides a context within which design patterns have been established and utilized
In other words, we can do things like pop in a CD, install an OS, have it meet all the software dependencies for our development tasks (since we're talking about LAMP, we mean development for the web), and either know how to build what we need or who to ask that can point us in the right direction. LAMP gives us this and, thanks to OS distributions like Ubuntu, gives it to us cheaply through simple button-pushing.

Do notice, however, that I said nothing about "going live" or "pushing to production"...

An Engineer's Perspective

In a recent conversation, Sean Reifschneider of tummy.com had this to say about LAMP:
"The problem with the LAMP stack is that it's not a solution for the worst case scenarios. It's great for development: you throw it all together and start writing code. It's fairly okay for low-volume production use. But you need to plan for DoS attacks, search engine bot crawls, and spammer email address harvesting. Default LAMP installs fall over under such conditions."
This is a point that bears repeated belaboring: the network is violent and unpredictable. Connectivity can go away at any moment due to causes at pretty much all layers of the OSI model. The best practices for deploying applications in a production environment that keep this in mind are vast and varied. This is the domain of systems experts.

Sean also commented on Google: App Engine is so great because you write your code, throw the whole thing onto their grid, and bam! Instant scalability, protected by (hopefully) the same mechanisms that protect all of Google's publicly facing web assets.

LAMP distributions productized and made freely available the otherwise painstaking process of compiling and installing a Linux kernel, Apache, a database, and your preferred programming language. The painstaking process was one that developers engaged in for software development. But what about the ones that systems engineers engage in for production deployments?

Google has addressed this in a "small way": massive in infrastructure support, but small in features. Knowing Google's penchant for incremental and steady service improvements, they've got plans for additional features. But I think everyone can agree that they're not going to try to meet everyone's needs all the time. Regardless, they are moving in the right direction: innovating a new platform.

Something for Both

On just this topic (innovating or finding a new platform), Albert Wenger of Union Square Ventures said the following:
"What then is needed? A platform that is created from the ground up ... What would such a platform look like? It would be hosted and (nearly) infinitely scaleable. It would provide object storage that’s as simple as saying 'here’s an object, store it' ... user authentication, authorization and access control. Flexible processing of pretty URLs. Easy creation and maintenance of page templates. Ability to send emails and process bounces. Handling of RSS feeds (inbound and outbound). Support for mobile access and possibly even voice capabilities."

Anyone that knows the Divmod software will know why this tickled us so: we have an object database (Axiom) with built-in user authentication, we've got object publishing (even with pretty URLs) and templating with Nevow, we've got mail services, feed support, mobile access and SIP. However! This isn't an advertisement; it's an illustration. The platform is part of the network, and in a sense, it is the network. Considerations for rapid application development need to be regarded very highly; I think it's fairly uncontested common knowledge that LAMP has proved this. Just as highly, though, we need to consider the needs of systems and of the engineers that are integrating them.

Google is making parts of its infrastructure available to developers now. With the dual ease of development and deployment, they are innovating engineering for us. They are only one of many, however. We need to be asking ourselves what our applications are, what the network is, what services are, and what our dev teams and engineers need.

Epilogue

This brings me to what I want for my birthday :-) Hey IBM! Sun! I want access to a Blue Gene (a la Project Kittyhawk) or a Sun Grid. I want to prove the efficacy of LAMP alternatives in the changing internet, of Python's continued pertinence, Twisted's developmental power and Mantissa's deployment capabilities.


Friday, May 16, 2008

We Are Your Twisted Think Tank

I'm pleased to announce that we've got a new Divmod site up! We're still making tweaks, but it's ready for public viewing, and open for business.

This change currently doesn't affect our subscriber services... but it will, very shortly :-) JP's working on that now.

Anyone who knows us knows that we know Twisted. We really know it. And how could we not, with Twisted superheroes like Glyph and JP? We've been solving very interesting problems for the past couple years, and other companies have availed themselves of this expertise. We're no longer trying to hide from our destiny as "the Twisted company."

We've found that providing specialized consulting services has not detracted from our core competency as software developers, but has rather done quite the reverse: provided a great deal of insight and clarity. The two activities have established a complementary feedback mechanism for growth and invention.


Tuesday, April 8, 2008

The Problem with and Solution to Google's App Engine

I know everyone is all aglow with the new web development offering from Google, but let me do the unpopular thing and put some things into perspective: there are limitations.

In fact, the limitations that exist will prevent me from using App Engine with all of my projects, save one (that one being a very simple web site). First, the limitations that prevent me from using App Engine (from one of their FAQs):

  • Sockets are disabled with Google App Engine
  • The system does not allow you to invoke subprocesses, as a result some os module methods are disabled
  • Threading is not available
This means that I can't write a deferred wrapper for their data layer, I can't use Twisted for such things as XML-RPC or AMP-based communications, and I can't use an async templating system (like Nevow). I'm stuck with CGI and blocking code. And for all but the simplest projects, that's a big "No Thank You" from me.
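
For context on the first point: a "deferred wrapper" for a blocking data layer usually means running each blocking call on a worker thread and handing the caller a result placeholder immediately. In Twisted, twisted.internet.threads.deferToThread does this and returns a Deferred. The sketch below uses the standard library's concurrent.futures instead, so it is an analogue rather than Twisted code. blocking_query stands in for a hypothetical datastore call. Either way, the pattern can't work without threads, which App Engine ruled out.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(key):
    """Hypothetical stand-in for a blocking data-layer call."""
    time.sleep(0.05)  # simulate waiting on the datastore
    return {"key": key, "value": key.upper()}


class AsyncStore:
    """Wrap a blocking store so callers get futures instead of blocking."""

    def __init__(self, query=blocking_query, workers=4):
        self._query = query
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def get(self, key):
        # Returns immediately; the blocking call runs on a worker thread.
        return self._pool.submit(self._query, key)

    def close(self):
        self._pool.shutdown(wait=True)
```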

This doesn't mean that I won't use it -- I will. I have one project that this will be perfect for... but it's for someone else, not me.

However, these limitations are actually good news :-) Here's the silver lining:

As Glyph has alluded to in his recent blog post (and in our tweets), we've recently completed a massive week-long BizDev Divmod sprint in Boston. One of the results of this is based on community feedback we've had over the last year, and which culminated at PyCon 2008 in Chicago with multiple requests for particular services from The Twisted Company. That result is a set of tools, features, and management options folks will be able to use with our software (app server, smart object db, network services, etc.). People really want to start using our stuff in cloud/grid computing environments. They need support for multiple and diverse network services, inter-store communications, massive deployments, etc. Two months before PyCon, we started working on tickets to support this, and we're making excellent progress toward providing the requested features.

We're still unclear as to which parts of this will be open source, as that will be driven by a combination of business and community demand. Regardless, Google's lack of support for this stuff has (for now) left the field wide open for us. And that, folks, is a big "Thank You Google!" :-)


Friday, March 28, 2008

We're in the Kitchen, Cookin Ur Mealz...

As Glyph just said, our community/development site is "reloaded" with a fresh look and the beginnings of some new structure. This comes as a result of many interdependent activities in the offing here at Divmod, most of which are still in the oven (and we wouldn't want to ruin the surprise now, would we?).

At PyCon, many of you approached us about more than the Twisted stuff we've been working on, and we had some good conversations. We've listened to all of you and have been making Herculean strides to provide a clearer view into what we do and how it can help you. We've got a long way to go regarding site content improvements and enhancements to documentation, both of which are genuinely at the top of our list right now. Yes, we've got a vision; but that is truly nothing without the continued interest and support of curious and creative folks like you, who want to architect extraordinary software.

There's been an extraordinary amount of energetic development, conversation, brainstorming and contribution that's been happening on IRC, at our offices, and in the code base, by employees and community members, at work and at play -- it's terrific to be a part of something so genuine and organic (the rumors of inorganic overlords have been greatly exaggerated).

Come take a look! ... And stay for a while :-)

P.S. If there are any CSS-on-IE Super Freaks in the House, will you please stand up? We're trying to work on some styling oddities that exist on the site and we could use some help!

Update: Paul Hummer has been amazingly helpful and got us up and running on IE when he had spare moments today. Thanks Paul!


Thursday, February 21, 2008

Nevow/Mantissa Sprint at PyCon 2008 in Chicago

After a last minute rush, there will be a significant Divmod representation at PyCon this year, complete with sprinting! We're focusing on a community release for Mantissa with improvements in many areas of the Twisted application server as well as Nevow. Those without experience are just as welcome as seasoned hackers, since part of our release goal is to make Mantissa as easy to understand as possible :-)

Sign up to join us here:

We've added a bunch of useful links that you can use to prepare for the sprint (which starts after the PyCon talks, on Monday). If you have any questions, please come by #divmod on irc.freenode.com to get clarifications. You can also participate in the sprint remotely via #divmod, if you're not making it to PyCon this year.

Check out the complete list of sprints here:
Special thanks to Facundo Batista for setting things up for us!
