Sherpa: All content tagged as Sherpa in NoSQL databases and polyglot persistence
Monday, 12 December 2011
Yahoo! Sherpa: Status and Advances
Information about Yahoo! PNUTS/Sherpa is so rare. Except the original PNUTS architecture paper (PDF) and the Sherpa: Cloud Computing of the Third Kind slides (PDF), it’s difficult to find something else. But in September, the Yahoo! Developer blog posted an update about Sherpa.
Sherpa Status
- 500+ Sherpa tables
- 50,000+ Sherpa tablets(shards) in operation
- tablets can be copied, moved, or split dynamically
- multi-data center
- multi-tenant: it must support different ranges of read/write ratios
- 75000 requests per second
- supports heterogeneous servers
- some are SSD exclusively
- Sherpa users have access to monitoring tools to examine their app latency and throughput SLA. This implies each application could negotiate different SLAs with the infrastructure team.
- Sherpa default storage engine is MySQL/InnoDB
- storage access have been abstracted
- Yahoo! tested BDB, BDB-Java, and Log-Structured Merge (LSM) Tree developed by Yahoo Labs backends
- Sherpa relies on a reliable messaging system
- can guarantee reliable in-colo and cross-colo transactions
Sherpa Advances
Selective Record Replication
- previous versions supported Table-level replication
- new version support Record-level replication
- designed for efficiency (minimize costs of transfer and storage) and legality (ensure copyright regulations, other legal concerns)
- replication locations are declarative/static
- Sherpa maintains a ‘stub’ of the record in locations that do not have a full copy
- the stub is updated only when a record is created, deleted, or when replication rules change
- the stub is used for routing requests
Backup/Restore
- Support for full table backups has been added
- Point-in-time recovery planned
- Also planned full, cross-colo, and automatic table restoration
Task Manager using Sherpa Tables for Task State
- Sherpa has added a general workflow manager to execute long-running tasks
- It is used for backup and restore operations
The complete post can be read here.
Considering Yahoo! has always been a big proponent of open source projects, it is a pitty that we don’t have the chance to hear more often and more details about PNUTS/Sherpa.
Original title and link: Yahoo! Sherpa: Status and Advances (©myNoSQL)
Friday, 23 September 2011
Memcached and Sherpa for Yahoo! News Activity Data Service
Mixer, the recently announced Yahoo’s new data service for news activities, uses Memcached and Sherpa for its data backend. Plus a combination of asynchronous libraries and task execution tools:

The data processing model and the clear separation between read and write data solutions is not only compelling, but essential for maintaining the SLA (max. 250ms/response):
Memcache maintains two types of materialized views: 1) Consumer-pivoted, and 2) Producer-pivoted. Consumer-pivoted views (e.g. user’s friends’ latest read activity) are refreshed at query time by refresh tasks. Producer-pivoted views (e.g. user’s latest read activity) are refreshed at update time (i.e. when “read” event is posted). And producer-pivoted views are used to refresh consumer-pivoted views.
Sherpa is Yahoo!’s cloud-based NoSql data store that provides low-latency reads and writes of key-value records and short range scans. Efficient range scans are particular important for the Mixer use cases. The “read” event is stored in the Updates table. The Updates table is a Sherpa Distributed Ordered Table that is ordered by “user,timestamp desc”. This provides efficient scans through a user’s latest read activity. A reference to the “read” record is stored in the UpdatesIndex table to support efficient point lookups. UpdatesIndex is a Sherpa Distributed Hash Table
Original title and link: Memcached and Sherpa for Yahoo! News Activity Data Service (©myNoSQL)
Thursday, 17 June 2010
Cassandra, HBase, and PNUTS Compared
From ☞ Amandeep Khurana a nice matrix comparing characteristics of Cassandra, HBase and PNUTS:
Recently, in a course that I’m taking at UC Santa Cruz, I got a chance to present the PNUTS paper and compare the system with Bigtable and Dynamo.
The Cassandra, HBase, and PNUTS matrix includes details about:
- data model
- consistency model
- ACID semantics
- storage
- replication
- fault tolerance
- scalability
- aplicability
These systems are also part of the ☞ Yahoo! Cloud Serving Benchmark, a benchmark that has been open sourced recently and is available on ☞ GitHub
You should also check Cassandra and HBase Compared