ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

Twitter's Real-Time URL Fetcher Using Cassandra and Memcached

Twitter’s real-time URL fetcher, code named SpiderDuck, is an excellent example of how NoSQL databases fit in the architecture of today’s systems:

Metadata Store: This is a Cassandra-based distributed hash table that stores page metadata and resolution information keyed by URL, as well as fetch status for every URL recently encountered by the system. This store serves clients across Twitter that need real-time access to URL metadata.

SpiderDuck is also using memcached:

Memcached: This is a distributed cache used by the fetchers to temporarily store robots.txt files.

SpiderDuck Architecture Cassandra Memcached

Original title and link: Twitter’s Real-Time URL Fetcher Using Cassandra and Memcached (NoSQL database©myNoSQL)

via: http://engineering.twitter.com/2011/11/spiderduck-twitters-real-time-url.html