Joy of Python
# first: grep '^......$' /usr/share/dict/words > words
with open('words') as f:
    lines = [line.strip().lower() for line in f]

letters = 'ecmlobkidepx'

def hasall(line):
    # Each letter in `letters` can be used at most once per word.
    llist = list(letters)
    for letter in line:
        try:
            llist.remove(letter)
        except ValueError:
            # list.remove raises ValueError when the letter has run out.
            return False
    return True

for line in lines:
    if hasall(line):
        print(line)
Can you guess what game I’m playing?
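For what it's worth, the same check can be written with collections.Counter. This is just a sketch of an alternative; `hasall_counter` is a name I'm introducing here, not part of the script above.

```python
from collections import Counter

letters = 'ecmlobkidepx'

def hasall_counter(word, available=letters):
    # Counter subtraction drops zero and negative counts, so the result is
    # empty exactly when `available` covers every letter of `word`.
    return not (Counter(word) - Counter(available))

for word in ['pickle', 'pepper']:
    print(word, hasall_counter(word))
```

The Counter version avoids mutating a list on every word, at the cost of being a little less obvious on first read.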
SSL for your domain on Google App Engine
As Jon and I prepared to launch Voost, we knew we needed an always-on SSL solution. In the age of Firesheep, it is simply no longer reasonable to provide any kind of authenticated service, no matter how trivial, over unencrypted channels. Even though our website hands off authentication (Facebook and BrowserID) and purchasing (WePay) to third parties, there are two good reasons why our online athletic registration system needs always-on SSL:
- Even if the authentication and purchase processes are securely encrypted, it’s not always clear to the user. This is especially true for purchase flows like WePay and Stripe that run embedded in a “normal” page - the URL bar shows no padlock.
- Every time you think to yourself “who would bother exploiting this loophole to annoy other users?”, the answer is probably sitting in the cafe next to you. One thing I learned at EA is that in any community of appreciable size, there are always a handful of sociopaths who delight in causing trouble for their fellow humans. As unlikely as it is, the chance that someone could steal an auth token and use it to change somebody’s race entry is simply not acceptable - it could ruin the day that an athlete has spent months training for.
Unfortunately, lack of support for SSL has been a consistent complaint of serious Google App Engine developers since the beginning. You can publish your app to the world as https://yourobscureappid.appspot.com/, but not https://www.yourdomain.com/ - and appspot.com in the URL bar looks like amateur hour to all of your users who actually care about security and encryption.
Google has had SSL on the roadmap since I first started using App Engine three years ago. The feature is actually in the “trusted tester” phase - but unfortunately this particular test group is nearly impossible to get into. My repeated enquiries through various channels both official and personal have failed - and due to open source work I’m about as well-connected as anyone gets outside the G-plex. Good luck.
Fortunately, you don’t actually need Google’s help to have SSL on your custom domain.
We discovered CloudFlare because their Platform Lead is one of Jon’s cycling teammates (John Roberts has since become a valuable advisor to us, although we’re still paying list price!). Their business is normally about providing worldwide CDN services and DoS attack protection, but they will proxy SSL for paying customers.
The basic information you should know:
- SSL services are offered as part of the “Pro” plan, not the free plan. It costs $20/mo for the first domain, $5/mo for each additional.
- CloudFlare generates the SSL certificate for you. They actually generate a cert that includes 20+ domains together (check the cert details at https://www.voo.st/ for an example), which makes a fair bit of sense. Otherwise they would need a separate IP address per customer. End-users don’t see this unless they closely examine the certificate.
- With GAE, you use the “Flexible SSL” option instead of the “Full SSL” option. This provides encryption between the browser and CloudFlare, but plain HTTP between CloudFlare and Google. Voost sends passwords and credit card numbers directly from the browser to the respective service providers (encrypted), so this half-SSL approach effectively solves our realistic threat models. If Echelon really wants to know what marathons you’re signed up for, they can just crawl the public registration records.
- In theory, you can use CloudFlare to serve naked domains on GAE. We’re cautious so we didn’t choose to try this, despite the fact that adding “www.” practically doubles the length of our domain name (sigh).
- You use CloudFlare’s DNS servers for your domain rather than pointing individual DNS records at CloudFlare. This is actually a bonus - CloudFlare’s management interface is rather nice, in sharp contrast to other hosted DNS systems I have used (I’m looking at you, Rackspace). It also makes setup quite a lot easier than it is with other CDN systems.
Here’s the HOWTO:
- Sign up for the “Pro” plan at CloudFlare ($20/mo). Add your website domain. You should have a panel that looks like this:
Note the DNS settings, CloudFlare settings, and Page rules menu items on the dropdown - you will need those in a moment.
- Go to your domain registrar and point your domain’s nameserver entries at CloudFlare’s DNS servers. You will manage DNS for your domain entirely at CF.
- In the CloudFlare DNS settings for your domain, add the usual entries for Google App Engine. For example, a CNAME for www -> ghs.google.com.

- In the CloudFlare settings for your domain, set SSL to “Flexible SSL”.

- After a few minutes you should successfully be able to visit https://www.yourdomain.com/. However, you aren’t done.
- You still need to redirect http:// requests to https:// requests. This will require a Page Rule. There’s a special Page Rule for doing exactly this:

- Last but not least, you must do something about the naked domain. Google Apps has a built-in redirector for naked domains to www (or whatever), but it does not work with SSL - it just hangs. Instead we will add a CloudFlare Page Rule which takes care of naked domain redirection for us - both SSL and non-SSL. As a bonus, it will preserve the remainder of the URL.
Map yourdomain.com/* to https://www.yourdomain.com/$1 like this:
At this point you should be up and running. You can turn CloudFlare on and off per-domain by clicking on the little cloud in DNS settings (see the right side of the picture in #3). Note that it may take a few minutes for DNS entries to propagate when you make changes.
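If the wildcard mapping in the naked-domain Page Rule is unclear, here is a rough model of what it does to a URL. This is only my sketch of the behavior described above, not CloudFlare's implementation; `apply_forwarding_rule` is a made-up helper, and I'm assuming `*` matches any run of characters and `$1` is replaced with whatever the first `*` matched.

```python
import re

def apply_forwarding_rule(url, pattern, target):
    """Return the redirect target for `url`, or None if the rule does not match.

    Illustrative model only: each `*` in `pattern` captures any run of
    characters, which is substituted for `$1`, `$2`, ... in `target`.
    """
    # The rule in the post has no scheme, so strip it before matching.
    bare = re.sub(r'^https?://', '', url)
    regex = '^' + '(.*)'.join(re.escape(part) for part in pattern.split('*')) + '$'
    match = re.match(regex, bare)
    if match is None:
        return None
    result = target
    groups = match.groups()
    # Substitute from the highest index down so `$1` never clobbers `$10`.
    for index in range(len(groups), 0, -1):
        result = result.replace('$%d' % index, groups[index - 1])
    return result

print(apply_forwarding_rule('http://yourdomain.com/races/42?x=1',
                            'yourdomain.com/*',
                            'https://www.yourdomain.com/$1'))
```

Requests that already go to www.yourdomain.com don't match the pattern, so they pass through untouched.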
Since CloudFlare IP addresses will show up as the request origin in your GAE logs, CloudFlare adds special headers to each request that identify the original visitor. Here is an example:
CF-Connecting-IP: 173.190.104.220
CF-IPCountry: US
CF-Visitor: {"scheme":"https"}
X-Forwarded-For: 173.190.104.220
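Reading those headers in your app might look something like the sketch below. It takes a plain dict of headers rather than any particular framework's request object, and `client_info` is a name I made up for illustration. The only things taken from CloudFlare are the header names shown above.

```python
import json

def client_info(headers):
    """Extract the visitor's IP and original scheme from proxy headers."""
    # Header names are case-insensitive in HTTP; normalize for lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    ip = lowered.get('cf-connecting-ip')
    if ip is None:
        # Fall back to the first address listed in X-Forwarded-For.
        forwarded = lowered.get('x-forwarded-for', '')
        ip = forwarded.split(',')[0].strip() or None
    scheme = None
    visitor = lowered.get('cf-visitor')
    if visitor:
        try:
            # CF-Visitor carries a small JSON object, e.g. {"scheme":"https"}.
            scheme = json.loads(visitor).get('scheme')
        except (ValueError, AttributeError):
            scheme = None
    return ip, scheme

example = {
    'CF-Connecting-IP': '173.190.104.220',
    'CF-IPCountry': 'US',
    'CF-Visitor': '{"scheme":"https"}',
    'X-Forwarded-For': '173.190.104.220',
}
print(client_info(example))
```

Keep in mind that any client can send these headers itself if it reaches your app without going through CloudFlare, so treat them as informational unless you know the request came through the proxy.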
Will we continue to use CloudFlare when Google rolls out SSL for App Engine? Without any visibility into what Google plans to offer, that’s a difficult question to answer. However, it’s hard to imagine switching:
- The SSL prices Google floated as a trial-balloon in a user survey were several times higher than what CloudFlare charges.
- CloudFlare offers compelling features in its own right beyond SSL:
- CF has a global edge cache with documented behavior. GAE has some sort of built-in edge cache, but its behavior is undocumented and has been experimentally discovered to be rather quirky.
- CF offers a lot of nifty little performance-enhancing tricks like rewriting our pages to eliminate extra whitespace, minifying CSS, etc. We compile and minify our own JS but it’s nice not to have to worry about the rest.
- CF has analytic tools that are significantly more sophisticated than the GAE dashboard. It’s like Google Analytics-lite but based on actual traffic served rather than cookies and JavaScript.
- CF’s threat-protection could become incredibly valuable if we ever needed it. GAE’s built-in make-your-own-IP-blacklist is not especially reassuring.
- CloudFlare answers their support email in minutes (as in, less than 5). It’s shocking.
Our experience is still young, but so far all lights are green.
The Facebook Platform Is A Trainwreck, Example #871
There are few APIs more painful to work with than the Facebook API. As the author of a couple of deeply integrated Facebook applications and an open source Java integration library, I suffer with it almost every day. The problem is not that the API is buggy and inconsistent. The problem is that Facebook doesn’t care if the API is buggy and inconsistent.
I could pick from hundreds of examples from the last few years, but here’s why Facebook wasted a couple hours of my valuable time today:
The JavaScript SDK lets you subscribe to events ‘auth.login’ and ‘auth.logout’. They are documented like so:
- auth.login - fired when the user logs in
- auth.logout - fired when the user logs out
Pretty simple, right? Of course it isn’t. In fact, auth.logout fires when you log out and when you log in. Depending on how complicated your authentication system is, it may take you a couple hours of debugging to figure this out.
You might consider logging this as a bug in Facebook’s issue tracker. Lo and behold, someone already tried this:
- https://developers.facebook.com/bugs/291382317558560
- https://developers.facebook.com/bugs/148271701940380
According to Facebook, this broken behavior is By Design. And watch Facebook’s typical issue response flow:
- Close as Won’t Fix without comment.
- Ignore developer comments asking for clarification.
- Developer posts a duplicate bug asking for clarification.
- Close as By Design with a cryptic explanation.
- Ignore developer comments asking for clarification.
It’s worth looking at the “explanation”:
“If you look at the actual source instead of the minified source, the param ‘c’ is really called ‘both’, so that’s totally correct. This is by design.”
This makes absolutely no sense to anyone outside of the Facebook bubble: Facebook doesn’t publish non-minified source for their JavaScript SDK. They used to, but this project appears to be all but abandoned. The production source has diverged significantly from the GitHub source, which has been annotated “We have no plans to update this repository until December 2011.” Well it’s xmas, where’s my f*cking gift?
I wish this was an isolated incident, but nearly every experience I’ve had with the Facebook platform is equally unsatisfying. Bugs are closed or simply ignored. Of the issues which get a response, nearly always the first round is “need a repro case” even though the bug is *clearly* spelled out in the description. Do I need to drive down to Palo Alto and type it in on your keyboard?
Another trick that Facebook’s platform support engineers have learned is that bugs in “need repro” state automatically close after a week. So even when you provide shell scripts that a 12-year-old could copy-paste to demonstrate the issue, all they need to do is wait you out and your little problem will simply go away. Of course, when I tried to search for one of my issues to illustrate this point, here’s what I got:

Sigh. Every little step is a struggle.
Edit: Second bug was a link to the wrong issue. Fixed.
The Unofficial Google App Engine Price Change FAQ
I don’t work for Google, but I read the mailing lists and pay attention. Also, I show up at places where Google buys beer. Here’s what I’ve learned:
What is changing?
Google is changing the way it charges for App Engine. Previously, you were charged for three things:
- Bandwidth in/out
- Data stored in the datastore
- “CPU time” of all your programs
The new billing model charges you for:
- Bandwidth in/out
- Data stored in the datastore
- Wall-clock time spent running application server instances
- Number (and type) of requests to the datastore
In addition, there is a 15-minute charge ($0.02) every time* an instance starts and $9 per application per month (as a minimum spend) if you enable billing.
* It isn’t quite “every time”. See this thread for more.
How are these pricing models different?
For most GAE applications, the biggest charge has always been “CPU time”. This number was composed of two parts:
- CPU time directly consumed by your frontend web application instance
- A crude approximation of CPU time consumed by the datastore and other APIs (“api_cpu_ms”)
In fact, api_cpu_ms has never been a real measure of CPU activity - the numbers are based on simple heuristics like “each index write costs 17ms”. So the change from api_cpu_ms billing to per-request billing for the datastore really isn’t much of a change.
The most significant change to the pricing model is that instead of billing you for CPU time consumed by your frontend web application, you will now be charged for every minute of wall-clock time that each instance runs, irrespective of how much CPU it consumes.
What was wrong with the old pricing model?
The old pricing model charged for the wrong thing. The overwhelming majority of web applications use very little CPU; processes spend most of their time blocked waiting for I/O (datastore fetches, URL fetches, etc). The App Engine cluster is not limited by CPU power; it is limited by the number of application instances that can fit into RAM. This is particularly problematic with single-threaded instances like the current Python servers and Java servers without <threadsafe>true</threadsafe>, which require multiple instances to serve concurrent requests.
Charging for CPU time created architectural distortions. Imagine your single-threaded Python application makes a URL fetch that takes 2s to complete (say, Facebook is having a bad day). You would need 200 instances (consuming large quantities of precious RAM) to serve 100 requests per second, but you would pay almost nothing because instances blocked waiting on I/O consume nearly no CPU time. Google’s solution was simply to refuse to autoscale your application if average request latency was greater than one second, essentially taking down your application if third-party APIs slow down.
Because the new instance-hour pricing model more accurately reflects real costs, App Engine can now give you those 200 instances - as long as you’re willing to pay for them.
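The 200-instance figure is just arrival rate times latency (Little's law). Here is a back-of-the-envelope sketch. The function names are mine, and the hourly rate is an assumption: I'm extrapolating it from the $0.02 per 15 minutes startup charge mentioned earlier, so check it against Google's actual price list.

```python
import math

# Assumption: $0.02 per 15 minutes (the startup charge quoted earlier),
# treated here as the going rate for instance time.
DOLLARS_PER_INSTANCE_HOUR = 0.02 * 4

def instances_needed(requests_per_second, seconds_per_request,
                     concurrent_per_instance=1):
    # Little's law: average requests in flight = arrival rate * time in system.
    in_flight = requests_per_second * seconds_per_request
    return int(math.ceil(in_flight / float(concurrent_per_instance)))

def hourly_cost(instances, rate=DOLLARS_PER_INSTANCE_HOUR):
    return instances * rate

needed = instances_needed(100, 2.0)   # single-threaded, 2s per request
print(needed)                         # 200, matching the example above
print(round(hourly_cost(needed), 2))
```

The `concurrent_per_instance` parameter is there so you can plug in a multithreaded instance and see how quickly the count drops.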
Is this a move away from usage-based billing?
No. You’re still being charged for resources you consume, but now you’re being charged for the scarce resources that matter (occupied RAM) rather than the overabundant resources that are irrelevant (CPU time).
But my instance only uses 20MB! Why not charge per megabyte-hour?
It’s tempting to imagine that Google can just add instances to a machine until it runs out of RAM, then start adding instances to the next machine. In practice, you can’t architect a system like this. Your frontend instance may use 20MB now but nothing stops it from growing to 100MB without warning. If a couple application instances did this suddenly, it could push the box into swap and effectively halt all instances running on it. An application instance must reserve not just the RAM it actually uses, but the RAM it *could* use. Oversubscribing creates the risk of incurring unpredictable performance problems, and at the huge scale of App Engine, even low-sigma events become inevitable. I suspect that Google is very conservative about oversubscribing RAM reservations, if they do it at all.
Will my bill go up?
Almost certainly, especially if you are using single-threaded instances (Python, or Java without <threadsafe>true</threadsafe>). Google really was charging an absurdly low price for App Engine before, letting us occupy many hundreds of megabytes of RAM for pennies a day. It was nice, but it wasn’t sustainable.
Does this mean App Engine is more expensive than other hosts?
It depends. If you’re looking at just the cost of computing, then yes, GAE will be more expensive than services like AWS. On the other hand, you can run large-scale applications without the need to hire a system administrator or DBA - so that frees up a couple hundred thousand dollars per year from the budget.
It’s also really hard to make an apples-to-apples comparison. With the high-replication datastore, GAE provides a redundant, multihomed, fault tolerant system that can transparently survive whole datacenter crashes. It’s already happened. Setting up an equivalent system requires significant engineering effort, and you have to pay someone to wear a pager.
As App Engine has matured, it has gone from being a low-end hosting solution to a high-end hosting solution. Compared to a VPS at DreamHost, App Engine is very expensive. Compared to building your own HRD, App Engine is still comically cheap.
What can I do to lower my bill?
Google has created an article for this, but here’s some blunt advice.
There are two aspects to this, driven by the two separate aspects of billing:
- Lower the number of instances your app needs
- Reduce your datastore footprint
Most developers freaking out about their bill on the App Engine mailing list are Python users shocked by the number of instances running to serve their application. You may be able to optimize this somewhat by tuning the scheduler, but this will at best provide a small improvement. The stark reality is that single-threaded web servers are a huge, expensive waste of resources. Processes that occupy big chunks of RAM while they sit around blocked on I/O don’t scale cost-effectively.
The only practical way to significantly lower your instance count is to use multithreading. This allows one instance to serve many concurrent requests, theoretically right up until you max out a CPU. Dozens of threads can block on I/O within a single process without consuming significant additional chunks of precious RAM.
- If you are using Java, put <threadsafe>true</threadsafe> in your appengine-web.xml NOW
- If you are using Python, beg/bribe/extort your way into the Python 2.7 Trusted Tester group
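To see why threads help so much with I/O-bound work, here is a toy demonstration in ordinary Python. It is not App Engine code: `fake_url_fetch` is a stand-in that just sleeps, and I'm using the standard library's thread pool rather than anything from the GAE runtime.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_url_fetch(delay):
    # Stand-in for a blocking I/O call such as a slow third-party API.
    time.sleep(delay)
    return delay

def serve_concurrently(num_requests, delay):
    """Handle num_requests blocking calls in one process; return elapsed seconds."""
    start = time.time()
    with ThreadPoolExecutor(max_workers=num_requests) as pool:
        list(pool.map(fake_url_fetch, [delay] * num_requests))
    return time.time() - start

elapsed = serve_concurrently(20, 0.1)
print('20 blocked requests finished in %.2fs' % elapsed)
```

Handled one at a time, those 20 requests would take about two seconds. In one multithreaded process they finish in roughly the time of a single request, because a thread blocked on I/O costs almost nothing while it waits.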
If you have turned on multithreading, it’s unlikely that the scariest line item in your bill will be instance-hours. Instead you will be wondering why you are being charged so much for the datastore. The good news is that there’s nothing new about optimizing datastore usage; this is what you should have been doing all along:
- Cache entities in memcache when you can.
- Remove unnecessary indexes.
- Denormalize where you can. It’s much cheaper to load/save one fat entity than 20 small ones.
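The first item, caching entities, usually ends up as a read-through pattern. Below is a minimal sketch using plain dicts as stand-ins for both memcache and the datastore, so none of this is the real App Engine API; `ReadThroughCache` and the `load_from_datastore` callable are hypothetical names.

```python
class ReadThroughCache(object):
    def __init__(self, cache, load_from_datastore):
        self.cache = cache                # dict-like stand-in for memcache
        self.load = load_from_datastore   # hypothetical loader: key -> entity or None
        self.datastore_reads = 0          # counts the reads you would be billed for

    def get(self, key):
        entity = self.cache.get(key)
        if entity is None:
            # Cache miss: pay for one datastore read, then remember the result.
            entity = self.load(key)
            self.datastore_reads += 1
            if entity is not None:
                self.cache[key] = entity
        return entity

fake_datastore = {'race:1': {'name': 'Example Marathon'}}
entities = ReadThroughCache({}, fake_datastore.get)
entities.get('race:1')
entities.get('race:1')
print(entities.datastore_reads)   # only the first get reached the datastore
```

A real version also needs to invalidate or update the cached copy whenever the entity is written, which this sketch leaves out.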
Why are developers complaining about the scheduler?
The scheduler is a red herring. It may very well have issues, but no amount of tweaking the scheduler will change the fact that in order to serve multiple concurrent requests with a single-threaded process, you need to run multiple instances. At best, the scheduler can trade off a crappy user experience for a lower bill.
Forget about the scheduler. Turn on multithreading ASAP.
Is there any good news about the price change?
Google says that higher prices will allow them to increase their commitment to App Engine and devote more resources to its development. This is probably true. If you’re building a business (as opposed to a hobby), the new pricing is probably not going to make or break you; salaries and overhead are probably still your biggest concern. However, the addition of significant new features (say, cross-entity-group transactions or spatial indexing) could allow you to improve your product in ways that were too expensive or difficult before. To the extent that more money means more features sooner, paying more might be worth it. Time will tell.
New blog
I intend to retroactively import much of my old LiveJournal and Similarity blogs. But that may take a while.
The Oracle Speaks
Why is this so funny?

The Oracle Speaks, Part 2
Welcome back to 1992!
