Twitter’s real-time URL fetcher, code-named SpiderDuck, is an excellent example of how NoSQL databases fit into the architecture of today’s systems:
Metadata Store: This is a Cassandra-based distributed hash table that stores page metadata and resolution information keyed by URL, as well as fetch status for every URL recently encountered by the system. This store serves clients across Twitter that need real-time access to URL metadata.
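To make the idea of a distributed hash table keyed by URL a little more concrete, here is a toy, in-process sketch. It is not Twitter's code and not Cassandra's API. It assumes MD5 tokens on a simple ring, where each URL is owned by the first node whose token follows the URL's token. The `UrlRecord` fields and node names are hypothetical stand-ins for the per-URL metadata, resolution information and fetch status described above.

```python
import bisect
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


def _token(key: str) -> int:
    # Assumption for this sketch: MD5 of the key gives its position on the ring.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


@dataclass
class UrlRecord:
    # Hypothetical fields mirroring what the quoted description says is stored.
    resolved_url: Optional[str] = None   # resolution information
    title: Optional[str] = None          # page metadata
    fetch_status: str = "unknown"        # fetch status of the URL


class MetadataStore:
    """Toy DHT: each node owns the ring range that ends at its own token."""

    def __init__(self, node_names: List[str]):
        self._ring = sorted((_token(n), n) for n in node_names)
        self._tokens = [t for t, _ in self._ring]
        self._data: Dict[str, Dict[str, UrlRecord]] = {n: {} for n in node_names}

    def node_for(self, url: str) -> str:
        # First node whose token is >= the URL's token, wrapping around the ring.
        i = bisect.bisect_left(self._tokens, _token(url)) % len(self._ring)
        return self._ring[i][1]

    def put(self, url: str, record: UrlRecord) -> None:
        self._data[self.node_for(url)][url] = record

    def get(self, url: str) -> Optional[UrlRecord]:
        return self._data[self.node_for(url)].get(url)


# Usage with hypothetical node names and URLs.
store = MetadataStore(["node-a", "node-b", "node-c"])
store.put("http://t.co/abc",
          UrlRecord(resolved_url="https://example.com/", fetch_status="fetched"))
```

Because every client computes the same owner for a given URL, any service can look up a URL's metadata with a single key lookup rather than a scan.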
SpiderDuck also uses memcached:
Memcached: This is a distributed cache used by the fetchers to temporarily store robots.txt files.
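The fetchers' use of the cache follows a familiar cache-aside pattern: check the cache for a host's robots.txt, and only download it on a miss or after the entry expires. The sketch below illustrates that pattern with an in-process dictionary standing in for memcached. It does not use any memcached client API. The `fetch` callable, the per-host key and the one-hour default TTL are assumptions for illustration, since the source only says the files are stored "temporarily".

```python
import time
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit


class RobotsCache:
    """In-process stand-in for a shared robots.txt cache, keyed by scheme://host."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch        # hypothetical: downloads a robots.txt URL
        self._ttl = ttl_seconds    # assumed expiry for "temporarily stored"
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, body)

    def robots_for(self, page_url: str) -> str:
        parts = urlsplit(page_url)
        key = f"{parts.scheme}://{parts.netloc}"
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                         # fresh hit: no network request
        body = self._fetch(key + "/robots.txt")   # miss or expired: fetch again
        self._entries[key] = (now + self._ttl, body)
        return body
```

Every page on the same host shares one cached robots.txt, so a burst of links to one site costs a single robots.txt download per TTL window.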
Original title and link: Twitter’s Real-Time URL Fetcher Using Cassandra and Memcached (©myNoSQL)