


Yahoo! Sherpa: Status and Advances

Information about Yahoo! PNUTS/Sherpa is scarce. Apart from the original PNUTS architecture paper (PDF) and the Sherpa: Cloud Computing of the Third Kind slides (PDF), little has been published. In September, however, the Yahoo! Developer blog posted an update about Sherpa.

Sherpa Status

  • 500+ Sherpa tables
  • 50,000+ Sherpa tablets (shards) in operation
  • tablets can be copied, moved, or split dynamically
  • multi-data center
  • multi-tenant: it must support different ranges of read/write ratios
  • 75,000 requests per second
  • supports heterogeneous servers
    • some are SSD exclusively
  • Sherpa users have access to monitoring tools to examine their app latency and throughput SLA. This implies each application could negotiate different SLAs with the infrastructure team.
  • Sherpa’s default storage engine is MySQL/InnoDB
    • storage access has been abstracted
    • Yahoo! has also tested BDB, BDB-Java, and a Log-Structured Merge (LSM) tree backend developed by Yahoo! Labs
  • Sherpa relies on a reliable messaging system
    • can guarantee reliable in-colo and cross-colo transactions
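
To make the tablet operations in the list above more concrete, here is a minimal Python sketch of a range-partitioned tablet map that can split a tablet and move it to another server. It is not Sherpa's implementation: the `TabletMap` class, its method names, the server names, and the choice to split at the median key are all assumptions made for illustration, and keys are assumed to be unique strings.

```python
import bisect


class TabletMap:
    """Hypothetical map from keys to tablets; tablet i covers [bounds[i], bounds[i+1])."""

    def __init__(self):
        self.bounds = [""]           # lower bound of each tablet; "" sorts before every key
        self.servers = ["server-a"]  # server currently hosting each tablet (made-up names)
        self.keys = [[]]             # sorted keys stored in each tablet

    def locate(self, key):
        """Return the index of the tablet whose key range contains `key`."""
        return bisect.bisect_right(self.bounds, key) - 1

    def insert(self, key):
        """Store `key` in the tablet that owns its range."""
        bisect.insort(self.keys[self.locate(key)], key)

    def split(self, i):
        """Split tablet i at its median key; both halves stay on the same server."""
        keys = self.keys[i]
        if len(keys) < 2:
            raise ValueError("need at least two keys to split a tablet")
        mid = len(keys) // 2
        self.keys[i:i + 1] = [keys[:mid], keys[mid:]]
        self.bounds.insert(i + 1, keys[mid])
        self.servers.insert(i + 1, self.servers[i])

    def move(self, i, server):
        """Reassign tablet i to another server (the data copy itself is not modeled)."""
        self.servers[i] = server


tablets = TabletMap()
for k in ["apple", "kiwi", "mango", "pear"]:
    tablets.insert(k)
tablets.split(0)             # one tablet becomes two
tablets.move(1, "server-b")  # the upper half moves to another server
```

After the split, lookups for keys below the split point still resolve to the first tablet, while the rest resolve to the tablet that was moved.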

Sherpa Advances

Selective Record Replication

  • previous versions supported table-level replication
  • the new version supports record-level replication
  • designed for efficiency (minimizing transfer and storage costs) and legal compliance (copyright regulations and other legal concerns)
  • replication locations are declarative/static
  • Sherpa maintains a ‘stub’ of the record in locations that do not have a full copy
    • the stub is updated only when a record is created, deleted, or when replication rules change
    • the stub is used for routing requests
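
The stub mechanism described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not Sherpa's actual design: the `Region` and `Cluster` classes, the region names, and the rule of forwarding to the alphabetically first holder are all hypothetical. It only shows the two properties from the list: stubs are written when a record is created (not on ordinary updates), and a region without a full copy uses its stub to route the request.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """One data center (colo); holds full records or routing stubs."""
    name: str
    full: dict = field(default_factory=dict)   # key -> record value
    stubs: dict = field(default_factory=dict)  # key -> regions holding a full copy


class Cluster:
    def __init__(self, region_names):
        self.regions = {n: Region(n) for n in region_names}
        self.stub_writes = 0  # counts how often stubs are touched

    def create(self, key, value, replicate_to):
        """Create a record with a static, per-record list of replica regions."""
        holders = frozenset(replicate_to)
        for name, region in self.regions.items():
            if name in holders:
                region.full[key] = value
            else:
                region.stubs[key] = holders  # stub written at creation time
                self.stub_writes += 1

    def update(self, key, value):
        """Ordinary updates reach full copies only; stubs are left alone."""
        for region in self.regions.values():
            if key in region.full:
                region.full[key] = value

    def read(self, key, at):
        """Serve locally if possible, otherwise route via the stub."""
        local = self.regions[at]
        if key in local.full:
            return local.full[key], at
        target = sorted(local.stubs[key])[0]  # assumed routing rule
        return self.regions[target].full[key], target


cluster = Cluster(["us", "eu", "asia"])
cluster.create("user:1", {"name": "a"}, replicate_to=["us", "eu"])
cluster.update("user:1", {"name": "b"})
value, served_by = cluster.read("user:1", at="asia")
```

In this run the "asia" region holds only a stub, so the read is forwarded to a region with a full copy, and the update never touches the stub.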


Backup and Restore

  • Support for full table backups has been added
  • Point-in-time recovery is planned
  • Full, cross-colo, and automatic table restoration is also planned

Task Manager using Sherpa Tables for Task State

  • Sherpa has added a general workflow manager to execute long-running tasks
  • It is used for backup and restore operations
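
Keeping task state in a table means a long-running task can be resumed after a failure instead of restarted from scratch. The Python sketch below illustrates that idea only; a plain dict stands in for a Sherpa table, and the `run_task` function, the state fields, and the backup step names are hypothetical rather than taken from the Yahoo! post.

```python
def run_task(table, task_id, steps, fail_at=None):
    """Run `steps` in order, recording progress in `table` after each step.

    `table` is a dict standing in for a durable table (an assumption).
    `fail_at` simulates a crash just before the step with that index.
    """
    state = table.setdefault(task_id, {"next_step": 0, "status": "pending"})
    state["status"] = "running"
    while state["next_step"] < len(steps):
        i = state["next_step"]
        if fail_at == i:
            state["status"] = "failed"  # progress so far stays in the table
            return state
        steps[i]()
        state["next_step"] = i + 1      # persist progress after each step
    state["status"] = "done"
    return state


log = []
# Hypothetical steps of a backup task.
steps = [
    lambda: log.append("snapshot"),
    lambda: log.append("copy"),
    lambda: log.append("verify"),
]

task_table = {}
run_task(task_table, "backup-1", steps, fail_at=1)  # fails after the first step
status_after_failure = task_table["backup-1"]["status"]
run_task(task_table, "backup-1", steps)             # resumes at step 1, not step 0
```

Because the second call reads `next_step` from the table, the "snapshot" step is not repeated on resume.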

The complete post can be read here.

Considering that Yahoo! has always been a big proponent of open source projects, it is a pity that we don’t get to hear about PNUTS/Sherpa more often and in more detail.

Original title and link: Yahoo! Sherpa: Status and Advances (NoSQL database©myNoSQL)