Foursquare has moved from querying its production databases directly to a data analytics system built on Hadoop and Hive, with Redis playing the role of a cache. Its goals for the new system:
- Provide an easy-to-use endpoint to run data exploration queries (using SQL and simple web forms).
- Cache the results of queries (in a database) to power reports, so that the data is available to everyone, whenever it is needed.
- Allow our Hadoop cluster to be totally dynamic without having to move data around (we shut it down at night and on weekends).
- Add new data in a simple way (just put it in Amazon S3!).
- Analyse data from several data sources (MongoDB, Postgres, log files).
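The result-caching goal in the list above can be sketched roughly as follows. This is not Foursquare's code: it is a minimal illustration that assumes a cache client exposing Redis-style `get` and `setex` calls (redis-py offers both). The names `cached_query`, `InMemoryCache`, `fake_hive`, the `report:` key prefix and the one-hour expiry are all hypothetical, and the in-memory class only stands in for a real Redis connection so the sketch runs on its own.

```python
import hashlib
import json
import time


class InMemoryCache:
    """Hypothetical stand-in offering the two Redis-style calls used below."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        value, expires_at = self._store.get(key, (None, 0.0))
        # Expired or missing entries behave like a cache miss.
        return value if time.monotonic() < expires_at else None

    def setex(self, key, ttl_seconds, value):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def cached_query(cache, sql, run_query, ttl_seconds=3600):
    """Return rows for `sql`, running the slow query only on a cache miss."""
    # Assumption: the SQL text itself identifies a report, so its hash is the key.
    key = "report:" + hashlib.sha256(sql.encode("utf-8")).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    rows = run_query(sql)  # e.g. a Hive job that may take minutes
    cache.setex(key, ttl_seconds, json.dumps(rows))
    return rows


# Usage with a fake query runner that records how often it is called.
calls = []


def fake_hive(sql):
    calls.append(sql)
    return [["2011-01-01", 42]]


cache = InMemoryCache()
first = cached_query(cache, "SELECT day, checkins FROM daily", fake_hive)
second = cached_query(cache, "SELECT day, checkins FROM daily", fake_hive)
```

With this shape the second call is served from the cache, so reports stay available even while the Hadoop cluster is shut down, at least until the cached entry expires.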
One of the most frequently heard complaints about NoSQL databases concerns their limited querying capabilities. Running reports and analysis against the production servers only works when you have little data and the set of queries is limited and stable over time. Otherwise you’ll want to run them against a copy of your data, to avoid bringing down production databases or corrupting data.
Original title and link: Hadoop, Hive and Redis for Foursquare Analytics (NoSQL databases © myNoSQL)