Pinterest: All content tagged as Pinterest in NoSQL databases and polyglot persistence

Project Secor: Long-term S3 storage for Kafka logs

A new project open sourced by Pinterest, Secor:

Project Secor was born from the need to persist messages logged to Kafka to S3 for long-term storage. Data lost or corrupted at this stage isn’t recoverable so the greatest design objective for Secor is data integrity.
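To make the integrity goal a bit more concrete, here is a minimal sketch of one way to archive a log stream so that a crash and replay cannot lose or duplicate data. It is not Secor's actual implementation: `FakeObjectStore`, `Archiver`, the key layout, and the batch size are all hypothetical names and choices made for illustration, and an in-memory dict stands in for S3. The idea is to derive the object key from the topic, partition, and first offset of each batch, and to advance the committed offset only after the upload has succeeded.

```python
from collections import defaultdict


class FakeObjectStore:
    """In-memory stand-in for S3 (hypothetical): maps object keys to bytes."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body):
        # Writing the same key twice just replaces the object.
        self.objects[key] = body


class Archiver:
    """Buffers messages per (topic, partition) and uploads them in batches."""

    def __init__(self, store, batch_size=3):
        self.store = store
        self.batch_size = batch_size
        self.buffers = defaultdict(list)  # (topic, partition) -> [(offset, payload)]
        self.committed = {}               # (topic, partition) -> next offset to archive

    def handle(self, topic, partition, offset, payload):
        tp = (topic, partition)
        if offset < self.committed.get(tp, 0):
            return  # already stored durably, skip the replayed message
        self.buffers[tp].append((offset, payload))
        if len(self.buffers[tp]) >= self.batch_size:
            self.flush(tp)

    def flush(self, tp):
        batch = self.buffers[tp]
        if not batch:
            return
        first_offset = batch[0][0]
        # The key depends only on topic, partition, and first offset, so a
        # replayed batch overwrites the same object instead of adding a copy.
        key = f"{tp[0]}/{tp[1]}/{first_offset:020d}.log"
        self.store.put(key, b"\n".join(payload for _, payload in batch))
        # Commit only after the upload has succeeded.
        self.committed[tp] = batch[-1][0] + 1
        self.buffers[tp] = []
```

If the process dies between the upload and the commit, a restarted archiver re-reads the same offsets, rebuilds the same batch, and writes it to the same key, so the store ends up with exactly one copy.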

Original title and link: Project Secor: Long-term S3 storage for Kafka logs (NoSQL database©myNoSQL)


Polyglot persistence at Pinterest: Redis, Membase, MySQL

[Diagram: Pinterest architecture]

I’ve created the diagram above based on this very brief answer on Quora:

We use python + heavily-modified Django at the application layer.  Tornado and (very selectively) node.js as web-servers.  Memcached and membase / redis for object- and logical-caching, respectively.  RabbitMQ as a message queue.  Nginx, HAproxy and Varnish for static-delivery and load-balancing.  Persistent data storage using MySQL.  MrJob on EMR for map-reduce.

Data from October 2011 showed Pinterest with over 3 million users generating 400+ million pageviews. There are plenty of questions still to be answered, though:

  1. What is node.js used for? What is RabbitMQ used for?

    Note: the whole section in the diagram about node.js and RabbitMQ is speculative.

  2. Is Amazon Elastic MapReduce used only for clickstream analysis (log-based analysis), or for more than that?

  3. How is data loaded into the Amazon cloud?

    Note: if Amazon Elastic MapReduce is used only for analyzing logs, these are probably uploaded regularly to Amazon S3.

  4. Why the need for both Redis and Membase?
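On the clickstream question: if log-based analysis is indeed what runs on Elastic MapReduce, the jobs probably look something like the sketch below. This is plain Python rather than real MrJob code, and everything in it is an assumption for illustration: the log line format, the function names, and the tiny in-process `run_job` driver that plays the role of the shuffle step. MrJob jobs define `mapper` and `reducer` methods of a similar shape.

```python
from collections import defaultdict


def mapper(log_line):
    """Emit (path, 1) for each successful GET request.

    Assumed (hypothetical) log format:
    "<timestamp> <user_id> <method> <path> <status>"
    """
    parts = log_line.split()
    if len(parts) == 5 and parts[2] == "GET" and parts[4] == "200":
        yield parts[3], 1


def reducer(path, counts):
    """Sum the per-request counts for one path."""
    yield path, sum(counts)


def run_job(lines):
    """Tiny local driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results
```

Feeding `run_job` the lines of an access log returns a dict of pageviews per path; malformed lines and non-200 responses are simply skipped by the mapper.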
