Within a few hours, using Sinatra and the MongoDB Ruby driver, I had a little prototype working. Each hit was a single MongoDB operation, an upsert based on the host, with year, month, day, and hour information stored in nested hashes. The nested hashes were updated in the operation using $inc. It did not do much, but it was pretty cool.
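To make the description concrete, here is a minimal sketch of what such a per-hit upsert might look like. It is not the original prototype: the field names (host, hits, total) and the helper hit_upsert are hypothetical, and the code only builds the filter and update documents as plain Ruby hashes, so it runs without a database. The driver call in the closing comment assumes the 2.x MongoDB Ruby driver.

```ruby
# Build the filter and update documents for a single page hit.
# Field names ("host", "hits", "total") are hypothetical, not from the original post.
def hit_upsert(host, time)
  t = time.utc
  # Dot notation addresses a key inside nested hashes:
  # hits.<year>.<month>.<day>.<hour>
  hour_key = format("hits.%d.%d.%d.%d", t.year, t.month, t.day, t.hour)

  filter = { "host" => host }
  # $inc creates missing fields and adds to existing ones,
  # so the same update works for the first hit and for every later one.
  update = { "$inc" => { "total" => 1, hour_key => 1 } }
  [filter, update]
end

filter, update = hit_upsert("example.com", Time.utc(2010, 5, 17, 14, 30))

# Assuming a Mongo::Collection from the 2.x Ruby driver, the hit would then be
# recorded in one operation:
#   collection.update_one(filter, update, upsert: true)
```

Keeping one document per host means every hit for a site lands on the same document, which is part of what makes the scaling question below interesting.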
Because MongoDB is fast, fun, easy, "web scale", and has "fire and forget" write behavior, everyone seems convinced they should build a new site analytics tool on top of it.
Ignoring for a second insignificant aspects like business models and feature sets, I do have a technical question: how do you scale such a tool? Sharding by site, referrer, or user will not work. Sharding by id or timestamp would scatter each site's data across all the shards and make aggregate queries painful. So what's the plan?
Original title and link: MongoDB and Site Analytics (NoSQL databases © myNoSQL)