August 26, 2006

Playing with the AOL Data

Like every other geek out there, I've been playing around with the recently released AOL search data. But unlike many other people, I'm not hunting for particularly odious individuals or amusing search patterns. Rather, I've been creating toy databases with the data, running statistical reports on them, comparing load times for different table arrangements, and generally being a huge nerd. It's great!

I spent this morning figuring out how to use shell scripting to extract, transform, and reload the data in an automated fashion. Even though MySQL supports FULLTEXT indexes and searches, it doesn't handle inline transformations of text efficiently. (I'm assuming you could run a search for a rough set of results, then run a sub-query on every row that's returned to refine the set further, except that such a process would be glacially slow on result sets that include millions of rows.) However, Linux shell tools are incredibly adept at working with strings. So hey, why not tie the two sets of tools together to get the outcome that I want? Or even better, use the command-line tools to feed the results back into the db and create a new table with today's results? Which is what I ended up doing. Now they're immediately available for future searches, or for mashing up into new reports. So much fun, hehe...
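For the curious, here is a rough sketch of what that kind of extract, transform, and reload loop can look like. It is not my exact script. The column layout (a header row, with the query text in the second tab-separated column), the database name `aol`, the table name `query_counts`, and both function names are assumptions made up for illustration.

```shell
#!/bin/sh
# Sketch only. Assumed for illustration: tab-separated input with a header
# row and the query text in column 2; a database called "aol"; a result
# table called "query_counts". None of these come from the original setup.

# Transform: read the raw rows on stdin, count how often each query appears
# (ignoring case), and print "query<TAB>count", most frequent first.
transform() {
    tail -n +2 |                      # drop the header row
        cut -f2 |                     # keep only the query column
        tr '[:upper:]' '[:lower:]' |  # normalise case
        LC_ALL=C sort |               # group identical queries together
        uniq -c |                     # prefix each distinct query with its count
        awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print $0 "\t" n }' |
        LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1
}

# Reload: push the transformed file back into MySQL as its own table.
# $1 is the path to a file produced by transform.
load_counts() {
    mysql --local-infile=1 -e "
        CREATE TABLE IF NOT EXISTS query_counts (
            query VARCHAR(255) NOT NULL,
            hits  INT UNSIGNED NOT NULL
        );
        LOAD DATA LOCAL INFILE '$1'
        INTO TABLE query_counts
        FIELDS TERMINATED BY '\t';" aol
}
```

With those two functions defined, a run would look something like `transform < raw.txt > counts.tsv` followed by `load_counts counts.tsv`, where `raw.txt` stands in for one of the extracted data files. The string work happens entirely in the shell, and MySQL only ever sees the finished rows.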

Posted by ashusta at August 26, 2006 04:19 PM
