After a great deal of development effort in PHP, we have hit the point where making PHP do what we want takes more effort than finding another language with the capabilities we need. Akorn has been envisioned from the beginning as a platform for building robust applications with heavy computational and data requirements. PHP has been absolutely fantastic for all of the front end and much of the processing. However, PHP is just not suited to long-running processes or heavy computational applications. Yes, we could force it to work, but after a long cost/benefit analysis we decided that we needed another engine to power the backend of Akorn. We looked at all the usual suspects, including C++, Ruby, Java, Perl, Python, and others, and decided that Python provides the combination of attributes we need. This is not to knock any of the other languages in any way. Python just provides us with the foundation necessary to realize the full Akorn vision with the least amount of overhead.
In our evaluation process we scoured the web for information and even called in some lifelines from people who have used various languages. I think this exchange with a longtime user of Python, Perl, and PHP illustrates, at least for us, why we are making the move to Python.
On to details:
Perl seems like a strong contender but Python seems more in line with building out applications rather than a set of scripts. So we have landed on Python as our choice thus far.
Perl v. python: definitely a well-hashed debate, but I'll just say this: I developed scientific tools and software in perl, largely for use by myself and the members of my lab, for five years. One day, mostly on a lark, I spent about 6 hours with an online intro called Dive Into Python (I believe it's just diveintopython.com <http://diveintopython.org/> ). That month, I reimplemented the majority of what was good about the previous 5 years' development, and extended my core codebase in several directions I had just never been able to get perl to “go”. The keys are modularity and readability, a few simple language builtins for commenting, and the interactive interpreter. It wasn't until I started grinding on things that I discovered my python code also executed somewhere between 10 and 1000x faster than the equivalent perl (I'll get into why in the “arrays” section below).
So that is the background. What I'm most concerned with is development speed and run-time speed. PHP has been super easy to use and we have made a lot of progress with it. It is also fast and runs on virtually all platforms. I've heard Perl and Python are supposed to be equivalent or better. Here are some quick questions:
When I was at Pitt, I worked primarily in PHP, essentially as a way to get data into and out of SQL databases (PostgreSQL and MySQL). Again, everything I did in two years could be replicated in literally a few dozen lines of python. Primarily this is because the portability and readability mentioned above mean that tools developed out in the user community (of which there are many) are robust and generally very well documented, and the core library includes fantastic “batteries included” database interfaces (more below). It's only icing that execution speeds are, again, orders of magnitude better than PHP!
Are you using Python for serving http data? If so, do you find it fast and reliable? Are you running through Apache?
I'm actually not doing anything webby at all anymore, but a guy in my lab has been pushing out things left and right on Django; he uses the django builtin server for dev, then goes to apache for production; he's almost as evangelical about the python as I am, so I'm guessing he's happy with what he's getting. Execution speed is definitely better than ruby, but I think the general vibe is still that RoR is one class act, and that django is just playing behind their second fiddle to the integration of the VS .Net overall development platform, but as I say, I haven't played with any webbishness lately. I'll put the question about overall satisfaction to him, and see if he has any specifics to offer.
How is python in handling multiple database connections to different databases?
A-mazing. A single class handles all database connections as instances, and they can be pushed to separate threads or even processes (actually, they can even be handled across multiple physical machines from the same parent process using ipython, pp or even straight-up MPI bindings). Connections can be to different underlying databases and are treated transparently. For instance, I have a piece of code that creates lists of database connection objects and maintains directionally-propagated synchrony. The spiffy thing is that the participant connections are a mix of sqlite, mysql and ms sql server, and my code need never know about any of that. The DB interface is also very extensible, so you're encouraged to roll DB interface objects inheriting from the base library class that implement specific types of queries in more-or-less skeletal form, that handle exactly when you execute and commit different queries, and that also make coding easier. For instance, a DB interface subclass for dealing with a log DB might know about peak new-entry creation periods and lower its self-refresh rate to relieve pressure on the DB, or else increase the refresh rate to reflect more accurate data. It could do so dynamically, by computing peak use periods from the underlying DB, but you'd never interact with any of this: you just supply, say, a start and end for a pull, and it handles the rest.
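To make the "my code need never know" point concrete, here is a minimal sketch using only the standard library's sqlite3 module. Two in-memory databases stand in for separate servers; the table name `events`, the dictionary keys, and the helper `count_rows` are made up for illustration. The helper only touches the cursor interface that Python database drivers share, although the `?` placeholder style used in the setup is specific to sqlite3.

```python
import sqlite3


def count_rows(conn, table):
    # Uses only cursor(), execute() and fetchone(), which DB-API drivers share.
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]


# Two independent in-memory databases stand in for different backends.
connections = {
    "orders": sqlite3.connect(":memory:"),
    "logs": sqlite3.connect(":memory:"),
}

for name, conn in connections.items():
    cur = conn.cursor()
    cur.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, note TEXT)")
    # The "?" placeholder is sqlite3's parameter style; other drivers may differ.
    cur.executemany(
        "INSERT INTO events (note) VALUES (?)",
        [(f"{name}-{i}",) for i in range(3)],
    )
    conn.commit()

# The same helper runs against every connection without knowing its origin.
counts = {name: count_rows(conn, "events") for name, conn in connections.items()}
print(counts)
```

Swapping one of the sqlite3 connections for a MySQL or SQL Server driver would, in principle, leave `count_rows` unchanged, which is the behavior described above.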
How do you like the string manipulation/ regex functionality?
Regex are the best I've ever used. You can actually make python regex readable (whitespace and comments within regex!). Also, capture term naming allows you to derive dictionaries from capture events instead of $1, $2, $3…
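A small sketch of both features, assuming a made-up log line format: `re.VERBOSE` permits whitespace and comments inside the pattern, and named groups come back as a dictionary through `groupdict()`. The pattern name `LOG_LINE` and the sample text are illustrative only.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows "#" comments.
LOG_LINE = re.compile(
    r"""
    (?P<date>\d{4}-\d{2}-\d{2})   # ISO-style date
    \s+
    (?P<level>[A-Z]+)             # severity, upper case
    \s+
    (?P<message>.*)               # everything else on the line
    """,
    re.VERBOSE,
)

match = LOG_LINE.match("2009-06-01 ERROR disk full")
fields = match.groupdict()  # named captures as a dictionary
print(fields)
```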
Sadly, however, python's string methods are so good that I actually find myself using regex less and less.
The one thing I will say is that python string objects have a _lot_ of methods, so it's worth calling dir(string) from the interactive interpreter to get a full list; it's easy to discover that you've been working around something for weeks that's actually implemented in the base string object itself!
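As a quick illustration, in current Python the built-in string type is `str`, so the listing call is `dir(str)`. The sample line and variable names below are made up; the sketch parses a small key/value string with nothing but string methods.

```python
# List the public methods of the built-in string type.
public_methods = [name for name in dir(str) if not name.startswith("_")]

# Parse a "key=value; key=value" line using only string methods.
line = "  host=db01; port=5432  "
key_values = dict(
    part.strip().split("=") for part in line.strip().split(";")
)

print(len(public_methods), "public string methods")
print(key_values)
```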
How do you like the array (dictionary, map) functions?
Starting to get old, I'm sure, but container operations in python are outrageously great. Python has a concept called list comprehension that allows you to iterate through any iterable (list, dictionary, set, array, whatever), perform a set of arbitrarily complex manipulations, and return the results as a new list, all with a single statement, in a single line. In essence, think free-form map with conditionals. Also, in addition to the native list object, which is dynamic, garbage collected, etc., there is a second list-type object called an array, which is fixed-size, strongly typed, and stored contiguously in memory. This object is essentially a C array in python, with attendant builtin functionality for serial and list operations (like adding 5 to every item in the list, or taking the difference between each i and i+1, or whatever). This means it's exactly as big in memory as you tell it to be, so you can work with enormous amounts of data with no fear of swapping, and it means that it's computing “on the metal”, with C execution times (it's trivially easy to pick up 1000x performance improvements, and better than that in very for-loop-rich code).
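A short sketch of the two ideas, with made-up data. The array described above sounds like the NumPy array mentioned later in this exchange; to stay within the standard library, this example assumes the `array` module instead, which gives typed, contiguous storage but not NumPy's whole-array arithmetic, so the neighbor differences are written as a comprehension.

```python
from array import array

# List comprehension: filter and transform any iterable in one statement.
readings = {"a": 3, "b": 12, "c": 7, "d": 20}
large_squares = [value ** 2 for value in readings.values() if value > 5]

# Typed, contiguous storage: every item is a C double ("d").
samples = array("d", [1.0, 4.0, 9.0, 16.0])

# Difference between each item and the next one.
differences = [b - a for a, b in zip(samples, samples[1:])]

print(large_squares)
print(differences)
```

With NumPy installed, the same difference is a single vectorized call, which is where the large speedups described above would come from.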
How do you like the file manipulation functions (read, write, get size/date)?
File functions are quick, and full featured. I haven't yet needed to make a call outside of the python base library to get info from or about a file. Python also has builtin support for gz and tar (and bz, and zip) through the gzip, tarfile, bz2 and zipfile modules, which means you can open a tarball or compressed file and read from it without needing to decompress it to disk first (tarfile can even detect the compression for you). In the case of a tarball, you get an object you can walk much like a directory on the filesystem (actually, sometimes very handy, since the tarball will be cached in ram, you can get some serious performance boosts from tarballing directories with tons of little files you hit a lot).
Have you used any JSON encoding or decoding? If so, how is the performance?
Python has a ludicrously straightforward json handler that basically just serializes and deserializes json directly to and from python dictionaries. As far as specific JS - python interaction, it seems like the dominant model is apache-django for the server-side work, then jquery to construct local app functionality, json as the interchange and storage medium, and a python framework called twisted to establish persistent, secure connections between clients and server to handle unpredictable or asynchronous exchange. It actually looks reasonably mature (more so than the last time I read about this sort of thing in late 2007). I may try to make up a little side project in the next couple of weeks to play with some of this, and if I do I'll let you know what things look like on the ground.
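The dictionary round trip looks roughly like this with the standard library's `json` module. The payload contents are invented for the example.

```python
import json

# A plain dictionary with nested list and boolean values.
payload = {"user": "akorn", "jobs": [1, 2, 3], "active": True}

# Serialize to a JSON string; sort_keys makes the output order predictable.
encoded = json.dumps(payload, sort_keys=True)

# Deserialize back into Python objects.
decoded = json.loads(encoded)

print(encoded)
print(decoded == payload)
```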
What IDE do you use to edit code?
I use textedit on mac, and emacs on *nix. There are several IDEs for python, but the ipython interactive interpreter in the enthought python distribution is such a better way to debug and explore code that I just use a programming text editor for the typing part :)
What are some hidden gems about Python that you like?
numpy arrays and list comprehension, the docstring (http://epydoc.sourceforge.net/docstrings.html), the interactive interpreter for piloting tricky code like regexes, the fact that every builtin object can be inherited from with trivial ease (for instance, list could be subclassed to contain a subclassed dictionary that knew how to populate itself from a certain database, and that wrote changes in the instance object to the DB with specified frequency, or that knew how to construct itself from flatfiles by regex, etc.). I've also developed a real love for python set objects (like lists, but elements are unique, and with subset/superset/union/diff and so on)
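Two of these gems in a few lines, as a sketch: set algebra and subclassing a builtin. The package names and the `CountingDict` class are hypothetical, and the subclass is far simpler than the database-backed dictionary described above; it only counts assignments.

```python
# Set operations: unique elements with difference and intersection.
installed = {"numpy", "django", "twisted"}
required = {"django", "twisted", "simplejson"}
missing = required - installed   # in required but not installed
shared = installed & required    # in both


class CountingDict(dict):
    """A dict subclass that counts item assignments (illustrative only)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.writes = 0

    def __setitem__(self, key, value):
        # Record the write, then defer to the normal dict behavior.
        self.writes += 1
        super().__setitem__(key, value)


tracked = CountingDict()
tracked["x"] = 1
tracked["x"] = 2
print(missing, shared, tracked.writes)
```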
What are some gotchas about Python that you wish you could change?
lists and dictionaries are not autovivifying, so if you refer to a key or index that hasn't been written yet (i.e. d[x] += 1 when there is no key x in d), you get an exception. This means you do a lot of assembling sets of keys, and deriving lists from all keys with empty values, then populating, or else checking each key to make sure it exists (technically you do this with try: except: blocks, so the execution time is actually better than autovivification, which is why they do it, but it's a pain to code every time).
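For the counting case specifically, here is a sketch of the try/except pattern described above next to `collections.defaultdict` from the standard library, which supplies a default value for missing keys and removes most of the boilerplate. The word list is made up.

```python
from collections import defaultdict

words = ["spam", "eggs", "spam", "ham", "spam"]

# The try/except pattern: handle the missing key explicitly.
counts_try = {}
for word in words:
    try:
        counts_try[word] += 1
    except KeyError:
        counts_try[word] = 1

# defaultdict(int) creates a 0 entry the first time a key is touched.
counts_default = defaultdict(int)
for word in words:
    counts_default[word] += 1

print(counts_try)
print(dict(counts_default))
```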
Also, to my knowledge, there's no way to load a module from an arbitrary path, it either has to be in the same directory (or in a child dir thereof) as the running program, or it has to be in a path in the PYTHONPATH env variable. This is a bit irritating, sometimes.
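For what it's worth, current Python versions can load a module from an explicit file path through `importlib.util`; appending a directory to `sys.path` at run time is another option. The sketch below writes a throwaway module to a temporary directory so it is self-contained. The helper `load_module_from_path` and the module name `plugin_example` are made up for the example.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(name, path):
    # Build an import spec from the file location, then execute the module.
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "plugin_example.py")
    with open(path, "w") as handle:
        handle.write("GREETING = 'hello from an arbitrary path'\n")
    plugin = load_module_from_path("plugin_example", path)

print(plugin.GREETING)
```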
The interpreter doesn't give RAM back to the OS readily–if you clear up a huge object, don't expect to see that snap back to free memory until systemwide free overhead starts to get tight.
OH! my big one: I write a lot of linux command line utilities, and the command line options parser (the argv object) is definitely not as mature as Getopt::Long from perl. I spent like three days re-implementing what I like from Getopt::Long, and I've seen similar projects from others, but this sort of thing is fairly subject to personal taste (how you like it to handle option defaults, whether you like single and double-dash flags, etc), so I ended up not liking any of what I could find out there.
Knowing what you know now about Python would you choose to use it again or some other language?
Again, and again, and again :)
Is it possible to deploy the .pyc files without the .py files? If so, is there a way to make the .pyc files difficult to reverse engineer?
Yeah, byte-compiled python is binary, so it's already pretty hard to crack open, but in addition to this, there are a number of encryption tools that make the .pyc files even harder to dig through. (and of course, the bytecode .pyc is all you need to run; no need to distribute the .py unless you want to)
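A minimal sketch of shipping bytecode only, assuming a current Python 3 interpreter, where a `.pyc` file placed directly beside where the `.py` would live can be imported without the source. The module name `sourceless_demo` is made up. Two caveats worth knowing: a `.pyc` file is tied to the interpreter version that produced it, and bytecode is a documented format that the standard `dis` module can disassemble, so this is light obfuscation at most.

```python
import importlib
import os
import py_compile
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "sourceless_demo.py")
    with open(source, "w") as handle:
        handle.write("def answer():\n    return 42\n")

    # Compile next to the source (not into __pycache__), then drop the .py.
    py_compile.compile(source, cfile=os.path.join(tmp, "sourceless_demo.pyc"))
    os.remove(source)

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import sourceless_demo  # loaded from the .pyc alone
        result = sourceless_demo.answer()
    finally:
        sys.path.remove(tmp)

print(result)
```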