Measuring Vocabulary Richness (or, Trying Out Django on Heroku)
October 1st, 2011 | Published in Code, Python | 3 Comments
When Heroku announced Python support this past week, I was interested in seeing how the deployment process worked compared to how Heroku handles Ruby apps. Then a post highlighted by the Python Weekly newsletter caught my eye.
Swizec Teller’s entry, “Measuring vocabulary richness with Python,” described an algorithm by George Udny Yule in a 1944 paper titled “The statistical study of literary vocabulary.” Yule created a way to quantify the diversity of vocabulary in a given text, and Teller translated that formula into straightforward Python code.
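To give a feel for the calculation, here is a minimal sketch of one common formulation of Yule's I, the inverse of Yule's K: I = M1² / (M2 − M1), where M1 is the total number of words and M2 is the sum of each distinct word's squared frequency. This is not Teller's code. The names `tokenize` and `yule_i` are made up for illustration, it uses only the standard library instead of NLTK, it skips stemming, and returning 0.0 when the score is undefined is just one possible choice.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def yule_i(text):
    """Return a Yule's I score for text; higher suggests a more varied vocabulary."""
    word_counts = Counter(tokenize(text))

    # M1: total number of word tokens (assumption: the token-count formulation).
    m1 = sum(word_counts.values())

    # M2: sum of squared frequencies over the distinct words.
    m2 = sum(freq * freq for freq in word_counts.values())

    if m2 == m1:
        # Empty text, or every word appears exactly once: the ratio is
        # undefined, so this sketch returns 0.0 (an arbitrary choice).
        return 0.0

    return (m1 * m1) / (m2 - m1)
```

Calling `yule_i("the cat and the dog")` counts five words with "the" appearing twice, so M1 is 5, M2 is 7, and the score is 25 / 2 = 12.5. A real app would probably want stemming and better tokenizing, which is where NLTK comes in.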
So I made a simple Django app that accepts text via a form and uses Teller’s code to calculate Yule’s I score of vocabulary richness. It uses the really useful Natural Language Toolkit; the only oddity is that when developing locally on a Mac, the standard installation of NLTK via pip is borked, so you need to specify a file to download in your requirements.txt. You can find the demo app here.
I’m not offering a judgment on using Heroku or other instant deployment-type services; most of them seem pretty easy to use but out of my price range for anything significant. But it’s nice to know that services like Heroku, ep.io and others offer enough flexibility to do stuff like natural language parsing.
October 2nd, 2011 at 11:24 am
Was just about to do a similar experiment myself, probably using this little demo app. I’ve been enjoying ep.io, but I’m curious how the setup time, cost and performance of Heroku compare.
One thing these types of services could end up being good for is building specific services (like language analysis or text extraction) that take time to process and that can be abstracted out of other apps and used via a web service. Something to think on, anyway.
October 3rd, 2011 at 10:07 am
So what’s the next step? Having it slurp up people’s Twitter accounts and give them a score?
November 27th, 2011 at 1:19 am
I am curious, what does such a thing cost? I never really thought about it until I read this page. Is it that expensive?
I would hope that as CPU performance improves, the cost and performance of these services will improve as well.
Interesting reading.
Christopher