memcached: All content tagged as memcached in NoSQL databases and polyglot persistence
Thursday, 23 September 2010
Membase Releases the 4th Beta, Featuring Memcached Buckets
The fourth beta release of Membase, the scalable big brother of Memcached, features memcached buckets:
You now can create buckets in your Membase Server cluster that behave exactly like memcached, which means you can use Membase Server as a drop-in replacement for your existing memcached setup. In a single cluster you can now share the resources between memcached buckets and membase buckets.
The compatibility with memcached is a clear statement that Membase is after memcached users. But so is Redis!
Original title and link: Membase Releases the 4th Beta, Featuring Memcached Buckets (NoSQL databases © myNoSQL)
Monday, 20 September 2010
Redis Benchmark Ported to Memcached
Salvatore Sanfilippo:
This is a straightforward port of redis-benchmark to memcache protocol.
This way it is possible to test Redis and Memcache with not just an apple to apple comparison, but also using the exactly same mouth… :)
Does it mean that Redis is going after Memcached? I guess Membase is after the same users, so we will have some interesting competition.
Original title and link: Redis Benchmark Ported to Memcached (NoSQL databases © myNoSQL)
Thursday, 22 July 2010
What Are Virtual Buckets?
A fantastic article explaining the rationale behind and usage of virtual buckets for scaling memcached:
- Never service a request on the wrong server.
- Allow scaling up and down at will.
- Servers refuse commands that they should not service, but
- Servers still do not know about each other.
- We can hand data sets from one server to another atomically, but
- There are no temporal constraints.
- Consistency is guaranteed.
- Absolutely no network overhead is introduced in the normal case.
As described in What is Membase?, virtual buckets (vbuckets) are exactly the approach used by Membase, Riak, and probably every solution involving consistent hashing.
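The rules in the list above can be illustrated with a minimal Python sketch of vbucket-style routing. It assumes a CRC32 hash reduced modulo a small, hypothetical vbucket count (16 instead of Membase's 4096), and the `Server`, `Client` and `move_vbucket` names are purely illustrative, not the API of Membase or of Dustin's article. Keys map to a fixed vbucket, a client-side map says which server owns each vbucket, servers refuse requests for vbuckets they don't own, and scaling means handing a whole vbucket to another server and updating the map.

```python
import zlib

# Hypothetical vbucket count for illustration (Membase allows up to 4096).
NUM_VBUCKETS = 16


def vbucket_for_key(key: str) -> int:
    # Assumption: CRC32 of the key modulo the vbucket count.
    # The key -> vbucket mapping never changes; only vbucket ownership does.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


class Server:
    def __init__(self, name: str, owned: set):
        self.name = name
        self.owned = set(owned)
        # Data is kept per vbucket so a whole vbucket can be handed off.
        self.store = {vb: {} for vb in self.owned}

    def _check(self, vb: int) -> None:
        # Never service a request on the wrong server: refuse it.
        if vb not in self.owned:
            raise LookupError(f"{self.name} does not own vbucket {vb}")

    def get(self, vb: int, key: str):
        self._check(vb)
        return self.store[vb].get(key)

    def set(self, vb: int, key: str, value: bytes) -> None:
        self._check(vb)
        self.store[vb][key] = value


class Client:
    def __init__(self, vbucket_map: dict):
        # vbucket id -> Server currently owning it (servers don't know each other).
        self.vbucket_map = vbucket_map

    def get(self, key: str):
        vb = vbucket_for_key(key)
        return self.vbucket_map[vb].get(vb, key)

    def set(self, key: str, value: bytes) -> None:
        vb = vbucket_for_key(key)
        self.vbucket_map[vb].set(vb, key, value)


def move_vbucket(vb: int, src: Server, dst: Server, client: Client) -> None:
    # Hand one vbucket's data set from src to dst, then update ownership and the map.
    dst.store[vb] = src.store.pop(vb)
    src.owned.discard(vb)
    dst.owned.add(vb)
    client.vbucket_map[vb] = dst
```

In a real cluster the hand-off has to be coordinated so that no write lands on the old owner mid-transfer; this single-threaded sketch sidesteps that entirely.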
via: http://dustin.github.com/2010/06/29/memcached-vbuckets.html
Tuesday, 20 July 2010
Heroku Encourages Polyglot Persistence
Heroku published an article preaching polyglot persistence through a Database-as-a-Service approach:
Database-as-a-service is one of the coming decade’s most promising business models. […] DaaS also goes hand-in-glove with polyglot persistence. Thanks to database services, you won’t need to learn how to sysadmin/DBA for every datastore you use – you can instead outsource that job to a service provider specializing in each database.
While it definitely sounds exciting to be able to use all these NoSQL databases, we should always keep in mind the cost of complexity, even if DaaS helps alleviate some of the complexity of heterogeneous systems.
The article also includes some interesting use cases for a couple of NoSQL databases:
- Frequently-written, rarely read statistical data (for example, a web hit counter) should use an in-memory key/value store like Redis, or an update-in-place document store like MongoDB.
- Big Data (like weather stats or business analytics) will work best in a freeform, distributed db system like Hadoop.
- Binary assets (such as MP3s and PDFs) find a good home in a datastore that can serve directly to the user’s browser, like Amazon S3.
- Transient data (like web sessions, locks, or short-term stats) should be kept in a transient datastore like Memcache. (Traditionally we haven’t grouped memcached into the database family, but NoSQL has broadened our thinking on this subject.)
- If you need to be able to replicate your data set to multiple locations (such as syncing a music database between a web app and a mobile device), you’ll want the replication features of CouchDB.
- High availability apps, where minimizing downtime is critical, will find great utility in the automatically clustered, redundant setup of datastores like Cassandra and Riak.
These are good examples, but you can find many more in our coverage of NoSQL use cases and the per-product case studies: CouchDB case studies, MongoDB case studies, etc.
Heroku Encourages Polyglot Persistence originally posted on the NoSQL blog: myNoSQL
Wednesday, 30 June 2010
Japanese Blogs Post Benchmarks on Membase, Memcached, Tokyo Tyrant and Redis
Two Japanese blogs[1] have published some benchmarks comparing the newly released Membase with memcached, Tokyo Tyrant and Redis.
Unfortunately, both of them are just more examples of useless benchmarks:
- only 1000 keys
- the benchmark doesn’t vary the size of keys and values
- no concurrency
- no mixed reads/writes
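For contrast, here is a minimal Python skeleton of a load generator that addresses the four problems listed above: a large key space, varying key and value sizes, several concurrent client threads, and a configurable read/write mix. `DictStore` is a hypothetical in-process stand-in; a real benchmark would plug in an actual memcached, Membase, Tokyo Tyrant or Redis client exposing the same `get`/`set` shape, and all default numbers are arbitrary.

```python
import itertools
import random
import threading
import time


class DictStore:
    """Hypothetical stand-in for a real client; swap in memcached/Redis/etc."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def set(self, key, value):
        with self._lock:
            self._data[key] = value


def run_benchmark(store, num_keys=100_000, key_sizes=(16, 64, 200),
                  value_sizes=(16, 512, 4096), threads=8,
                  ops_per_thread=10_000, read_ratio=0.8, seed=1):
    # Many keys of several lengths; padding with "k" keeps ids unique,
    # and 200 bytes stays under memcached's 250-byte key limit.
    keys = [str(i).rjust(size, "k")
            for i, size in zip(range(num_keys), itertools.cycle(key_sizes))]
    totals = {"get": 0, "set": 0}
    totals_lock = threading.Lock()

    def worker(worker_id):
        rng = random.Random(seed + worker_id)  # reproducible per-thread stream
        local = {"get": 0, "set": 0}
        for _ in range(ops_per_thread):
            key = rng.choice(keys)
            if rng.random() < read_ratio:      # mixed reads and writes
                store.get(key)
                local["get"] += 1
            else:                              # values of varying size
                store.set(key, b"x" * rng.choice(value_sizes))
                local["set"] += 1
        with totals_lock:
            for op, n in local.items():
                totals[op] += n

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    start = time.perf_counter()
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    elapsed = time.perf_counter() - start
    ops = totals["get"] + totals["set"]
    return {"ops": ops, "ops_per_sec": ops / elapsed, **totals}
```

With a real network client the threads spend most of their time waiting on I/O, so Python's GIL matters less; for heavy loads a multi-process or native load generator is likely a better fit.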
I’d strongly suggest that anyone planning to build a solid benchmark take a look at these NoSQL benchmarks and performance evaluations to learn how to build useful, correct ones[2].
- [1] The two benchmarks are published ☞ here and ☞ here. Unfortunately I don’t read Japanese, and Google Translate pretty much didn’t work on them.
- [2] Another useful resource about building correct benchmarks is Jan Lehnardt’s ☞ Benchmarks: You are Doing it Wrong
Wednesday, 23 June 2010
What is Membase?
It is difficult to put together a complete description of what Membase is, as the signal-to-noise ratio in today’s announcements is still very low[1]. Anyway, here is what I’ve been able to piece together:
- a cache using memcached protocol
- Apache licensed open source version of NorthScale Membase Server[2]
- project homepage is membase.org and (some) code can be found on GitHub
- can persist data
- supports replication (note: source code repository contains a reference to master-slave setup)
- elastic: nodes can be added and removed, with automatic rebalancing
- used by Zynga and NHN[3], which are also listed as project contributors
While details are extremely scarce, this sounds a lot like Gear6 Memcached.
Membase Persistency
According to this paper, the execution of a write operation involves the following steps:
- The set arrives at the Membase listener-receiver.
- Membase immediately replicates the data to replica servers – the number of replica copies is user defined. Upon arrival at replica servers, the data is persisted.
- The data is cached in main memory.
- The data is queued for persistence and de-duplicated if a write is already pending. Once the pending write is pulled from the queue, the value is retrieved from cache and written to disk (or SSD).
- The set acknowledgment is returned to the application.
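The five steps above can be modeled in a few lines of Python. This is only a sketch of the flow as the paper describes it, not Membase code: `Node`, `handle_set` and the dict standing in for disk are hypothetical, replication is synchronous (matching the reading that a write blocks until replicas have it), and the persistence queue de-duplicates keys so that a burst of writes to one key results in a single disk write of the latest cached value.

```python
from collections import OrderedDict


class Node:
    """Hypothetical node: 'disk' is a dict standing in for disk or SSD."""

    def __init__(self, name: str):
        self.name = name
        self.cache = {}                    # main-memory copy of values
        self.disk = {}
        self.write_queue = OrderedDict()   # keys waiting to be persisted, FIFO

    def enqueue_persist(self, key: str) -> None:
        # De-duplicate: a key that is already pending is not queued twice.
        if key not in self.write_queue:
            self.write_queue[key] = None

    def flush(self) -> int:
        # Pull pending writes; the value is read from cache at flush time,
        # so only the latest value of a repeatedly written key hits disk.
        written = 0
        while self.write_queue:
            key, _ = self.write_queue.popitem(last=False)
            self.disk[key] = self.cache[key]
            written += 1
        return written


def handle_set(master: Node, replicas: list, key: str, value: bytes) -> str:
    # 1. The set arrives at the master (this call).
    # 2. Replicate immediately; each replica persists on arrival.
    for replica in replicas:
        replica.disk[key] = value
    # 3. Cache the value in main memory.
    master.cache[key] = value
    # 4. Queue it for persistence (de-duplicated).
    master.enqueue_persist(key)
    # 5. Acknowledge to the application.
    return "STORED"
```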
There is also:
In membase 1.6, data migration is based on an LRU algorithm, keeping recently used items in low-latency media while “aging out” colder items; first to SSD (if available) and then to spinning media.
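A minimal sketch of that LRU aging policy, assuming fixed per-tier capacities counted in items: recently used items stay in RAM, the least recently used overflow to an SSD tier, and the SSD tier's overflow goes to spinning disk. The tiers are plain Python containers and `TieredLRU` is a hypothetical name; the sketch models only placement by recency, not when hot data is also written to disk.

```python
from collections import OrderedDict


class TieredLRU:
    """Hypothetical three-tier store: RAM -> SSD -> spinning disk, by recency."""

    def __init__(self, ram_capacity: int, ssd_capacity: int):
        self.ram = OrderedDict()   # least recently used first
        self.ssd = OrderedDict()
        self.hdd = {}              # coldest tier, unbounded here
        self.ram_capacity = ram_capacity
        self.ssd_capacity = ssd_capacity

    def _age_out(self):
        # Demote least recently used items one tier down until capacities hold.
        while len(self.ram) > self.ram_capacity:
            key, value = self.ram.popitem(last=False)
            self.ssd[key] = value
        while len(self.ssd) > self.ssd_capacity:
            key, value = self.ssd.popitem(last=False)
            self.hdd[key] = value

    def set(self, key, value):
        self.ssd.pop(key, None)
        self.hdd.pop(key, None)
        self.ram[key] = value
        self.ram.move_to_end(key)
        self._age_out()

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        for tier in (self.ssd, self.hdd):
            if key in tier:
                value = tier.pop(key)
                self.ram[key] = value   # a cold item that is read becomes hot again
                self._age_out()
                return value
        return None

    def tier_of(self, key):
        for name, tier in (("ram", self.ram), ("ssd", self.ssd), ("hdd", self.hdd)):
            if key in tier:
                return name
        return None
```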
A couple of comments:
- it looks like a write operation blocks until the data is completely replicated
- it is not completely clear whether “hot data” is persisted to disk on a write operation or only once it becomes “cold”
Membase Replication
Membase uses the notion of virtual buckets, or vBuckets (currently it supports up to 4096), each of which owns a subset of the key space (note: this is similar to Riak vnodes[4]). Replication can be configured independently for each vBucket, but at any time there is only one master node that coordinates reads and writes for it.
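A sketch of such a vBucket map in Python: each vBucket gets an ordered chain of servers, the first being the single master that handles reads and writes, the rest its replicas. The round-robin placement and the `overrides` parameter for per-vBucket replica counts are assumptions for illustration, not Membase's actual placement algorithm.

```python
def build_vbucket_map(servers, num_vbuckets=4096, num_replicas=1, overrides=None):
    """Return a list indexed by vBucket id of [master, replica, ...] chains."""
    overrides = overrides or {}
    vbmap = []
    for vb in range(num_vbuckets):
        replicas = overrides.get(vb, num_replicas)  # per-vBucket replica count
        if replicas + 1 > len(servers):
            raise ValueError(f"vbucket {vb}: not enough servers for {replicas} replicas")
        # Round-robin: master on servers[vb % n], replicas on the following servers,
        # so the master and replicas of one vBucket are always distinct machines.
        chain = [servers[(vb + i) % len(servers)] for i in range(replicas + 1)]
        vbmap.append(chain)
    return vbmap


def master_of(vbmap, vb):
    # Exactly one master per vBucket coordinates all reads and writes.
    return vbmap[vb][0]


def replicas_of(vbmap, vb):
    return vbmap[vb][1:]
```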
Membase Rebalancing
On each node, Membase runs a couple of “processes” that deal with data rebalancing (part of the so-called cluster manager). Once it is determined that a master node (the coordinator for all reads and writes for a particular virtual bucket) has become unavailable, a Rebalance Orchestrator process coordinates the migration of the virtual buckets (note: both master and replica data of the virtual bucket will be moved).
When machines are scheduled to join or leave the cluster, they are placed in a pending operation set that is applied during the next rebalancing operation. I’m not sure, but I think it is possible to trigger a rebalance manually.
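The pending-set behavior and the master failover can be sketched as follows. Everything here is hypothetical: a single `ClusterManager` object plays the role of the per-node processes, failover simply promotes the first replica, and the rebalance recomputes a round-robin map and returns the vBuckets whose placement changed, which is what a Rebalance Orchestrator would then migrate. A production rebalancer would also try to minimize the number of moved vBuckets, which this sketch does not.

```python
class ClusterManager:
    """Hypothetical sketch: joins/leaves are queued and applied at the next rebalance."""

    def __init__(self, servers, num_vbuckets):
        self.num_vbuckets = num_vbuckets
        self.servers = list(servers)
        self.pending_join = set()
        self.pending_leave = set()
        self.vbmap = self._assign(self.servers)   # vb -> [master, replica]

    def _assign(self, servers):
        n = len(servers)
        if n == 0:
            raise ValueError("cluster needs at least one server")
        chains = {}
        for vb in range(self.num_vbuckets):
            chain = [servers[vb % n]]
            if n > 1:
                chain.append(servers[(vb + 1) % n])   # one replica on the next server
            chains[vb] = chain
        return chains

    def schedule_join(self, server):
        self.pending_join.add(server)

    def schedule_leave(self, server):
        self.pending_leave.add(server)

    def fail_over(self, dead):
        # Master unavailable: promote the replica for every vBucket it mastered,
        # and schedule the dead node's removal for the next rebalance.
        for chain in self.vbmap.values():
            if chain[0] == dead and len(chain) > 1:
                chain[0], chain[1] = chain[1], chain[0]
        self.pending_leave.add(dead)

    def rebalance(self):
        # Apply the pending operation set, recompute placement, report the moves.
        new_servers = [s for s in self.servers if s not in self.pending_leave]
        new_servers += sorted(self.pending_join - set(new_servers))
        new_map = self._assign(new_servers)
        moves = [(vb, self.vbmap[vb], new_map[vb])
                 for vb in range(self.num_vbuckets) if new_map[vb] != self.vbmap[vb]]
        self.servers, self.vbmap = new_servers, new_map
        self.pending_join.clear()
        self.pending_leave.clear()
        return moves
```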
- [1] Sources:
  - NorthScale Unleashes Membase Server (NorthScale blog)
  - NorthScale, Zynga team up on NoSQL (CNET)
  - Open Sourced Membase Joins NoSQL Party (GigaOm)
  - NorthScale Releases High-Performance NoSQL Database (marketwire.com)
- [2] NorthScale Membase Server web page
- [3] While I read that “Membase is currently serving data for some of the busiest web applications on the planet.”, I couldn’t find any other users besides Zynga and NHN.
- [4] Riak uses a similar notion: the vnode. While both projects use the term “bucket”, you should not confuse Riak buckets with Membase buckets.
Tuesday, 13 April 2010
Memcached on top of Redis?
I read a couple of posts[1] talking about Gear6 Memcached’s native query support and Redis integration. Anyway, based on the details I’ve found so far[2], this is what I understand:
- Gear6 memcached provides an enhanced API that allows querying the key/value space
- Gear6 memcached is looking to support more data types by using Redis support for types like lists, sets, ordered sets, hashes[3]
- or Gear6 is looking to provide commercial support for Redis
This left me with the question: why would you use memcached on top of Redis?
Possible answers:
- if the integration preserved the same memcached API (though I am not sure this would be possible), then
  - such a product might be useful for projects needing both an RDBMS and Redis (note: in the end, though, the project would still need to be aware of both storage APIs)
  - such a product might be useful for transitioning towards Redis alone
- the integration would just add features missing from the current version of Redis (f.e. elastic scaling, sharding, etc.)
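Preserving the memcached API essentially means speaking the memcached text protocol in front of a different store. A minimal Python sketch of such a facade handles only `set` and multi-key `get`, with a plain dict standing in for Redis; how Gear6 actually maps commands onto Redis isn't documented in the sources, so the backend is purely an assumption, and expiration, `add`/`replace` and CAS are left out.

```python
def handle_request(request: bytes, backend: dict) -> bytes:
    """Answer one memcached text-protocol 'set' or 'get' using a key/value backend."""
    line, _, rest = request.partition(b"\r\n")
    parts = line.split()
    if not parts:
        return b"ERROR\r\n"
    cmd = parts[0]

    if cmd == b"set" and len(parts) in (5, 6):
        # set <key> <flags> <exptime> <bytes> [noreply]\r\n<data>\r\n
        key = parts[1]
        try:
            flags, _exptime, nbytes = int(parts[2]), int(parts[3]), int(parts[4])
        except ValueError:
            return b"CLIENT_ERROR bad command line format\r\n"
        data = rest[:nbytes]
        if len(data) != nbytes or rest[nbytes:nbytes + 2] != b"\r\n":
            return b"CLIENT_ERROR bad data chunk\r\n"
        backend[key] = (flags, data)   # exptime is ignored in this sketch
        noreply = len(parts) == 6 and parts[5] == b"noreply"
        return b"" if noreply else b"STORED\r\n"

    if cmd == b"get" and len(parts) >= 2:
        # get <key>*\r\n -> one VALUE block per hit, then END
        out = []
        for key in parts[1:]:
            if key in backend:
                flags, data = backend[key]
                out.append(b"VALUE %s %d %d\r\n%s\r\n" % (key, flags, len(data), data))
        out.append(b"END\r\n")
        return b"".join(out)

    return b"ERROR\r\n"
```

Existing memcached clients would keep working unchanged against such a facade, which is the whole appeal; the richer Redis types would still need extra commands outside the memcached protocol.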
Do you see any other reasons for using memcached on top of Redis?
References
- [1] Posts:
  - ☞ NoSQL player questions big data (nb: the title has pretty much nothing to do with the article)
  - ☞ Gear6 Enhances Memcached to Include Native Query Support and Redis Integration
- [2] The only documentation I’ve found about cache query is ☞ here, and the only mention of Redis integration, found ☞ here, talks only about support for Redis:
  “Gear6 currently offers commercial support for Memcached. If you are interested in purchasing support for Redis please contact us.
  Gear6 will soon contribute a number of enhancements to the Redis community.”
- [3] You can read more about Redis data types ☞ here
Thursday, 25 March 2010
Usecase: Superfeedr uses Redis to replace MySQL+Memcached
We all love “war stories”, and the story of Superfeedr using Redis to replace its MySQL + Memcached setup is really interesting. The features that made Redis a better fit for Superfeedr’s scenario were mostly additions in the Redis 1.2.0 release:
- the Append Only File persistence (find out how to migrate to Append Only File)
- sorted sets
- basic id-based sharding
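Basic id-based sharding can be as simple as taking an object’s numeric id modulo the number of Redis instances. A minimal Python sketch, with dicts standing in for the instances; `IdShardedStore` is a hypothetical name, and the post doesn’t say exactly how Superfeedr derives its shard ids.

```python
class IdShardedStore:
    """Hypothetical id-based sharding: shard = object id modulo shard count."""

    def __init__(self, num_shards: int):
        # Each dict stands in for one Redis instance.
        self.shards = [{} for _ in range(num_shards)]

    def shard_index(self, object_id: int) -> int:
        return object_id % len(self.shards)

    def set(self, object_id: int, value) -> None:
        self.shards[self.shard_index(object_id)][object_id] = value

    def get(self, object_id: int):
        return self.shards[self.shard_index(object_id)].get(object_id)
```

The catch with plain modulo sharding is that changing the number of shards remaps most ids, so adding capacity means migrating data; that is the problem consistent hashing and virtual buckets address.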
I’m also pretty sure that Superfeedr will benefit from Redis Virtual Memory once it becomes available, considering their deployment on 2GB slices at Slicehost.
via: http://blog.superfeedr.com/datastore/memcache/mysql/performance/redis/redis-at-superfeedr/
Tuesday, 16 February 2010
A Very Specific Benchmark: Files vs MySQL vs Memcached vs Redis vs MongoDB
This sort of very specific benchmark is valid and interesting if and only if:
- they simulate very closely the real-life scenario that will be supported by the final application
- they are not generalized to compare the overall performance of the NoSQL stores
- the NoSQL store is correctly configured to fulfill the app requirements (f.e. durability)
- it is understood that the driver has an impact on the results
In this case, the benchmark measured requests/s for a session-storage use case in a Tornado-based web app. You can see the results below:
| Reference | MySQL | Memcached | MongoDB | Redis |
|---|---|---|---|---|
| 1626 req/s | 1353 req/s | 1473 req/s | 1582 req/s | 1418 req/s |
Note: The benchmark doesn’t provide enough details about the drivers used.
via: http://milancermak.posterous.com/benchmarking-tornados-sessions-0
Tuesday, 12 January 2010
Simple way to memcache (almost) all database queries
I’d probably argue that’s way too simple (and probably not so useful). What I’d do is:
make sure that I don’t have duplicates in the cache
Supporting this behavior is not that complex: the query cache just stores the keys of the results and uses a multi-get to fetch them. If any objects are missing from the cache, you go on and execute the query, making sure that this time you cache each result object as well as the set of keys.
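The approach just described can be sketched in Python. `FakeCache` is a hypothetical dict-backed stand-in for a memcached client (real clients expose a multi-get, often named `get_multi`, but check your client’s API), rows are assumed to be dicts with an `id` field, and the `table:id` key scheme is an illustrative choice. Because each row object is cached once under its own key and queries only cache lists of keys, two queries returning the same row share one cached copy.

```python
import hashlib


class FakeCache:
    """Hypothetical dict-backed stand-in for a memcached client."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value

    def get_multi(self, keys):
        # Like a memcached multi-get: only keys that are present come back.
        return {k: self.data[k] for k in keys if k in self.data}


def cached_query(cache, sql, run_query, table):
    """Cache each result row once under its own key, plus the list of keys per query."""
    query_key = "q:" + hashlib.sha1(sql.encode("utf-8")).hexdigest()
    row_keys = cache.get(query_key)
    if row_keys is not None:
        hits = cache.get_multi(row_keys)
        if all(k in hits for k in row_keys):
            return [hits[k] for k in row_keys]   # keep the query's ordering
    # Query not cached, or some rows were evicted: run it and re-cache everything.
    rows = run_query(sql)
    row_keys = []
    for row in rows:
        key = f"{table}:{row['id']}"
        cache.set(key, row)                       # one copy per row, shared by queries
        row_keys.append(key)
    cache.set(query_key, row_keys)
    return rows
```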
look into using a key-value store
This would definitely be more fun, not to mention that it might give you extremely good results. Take, for example, Redis, which is sometimes called memcached on steroids, and check these benchmark results to get an idea of its performance.
via: http://bakery.cakephp.org/articles/view/simple-way-to-memcache-almost-all-database-queries
Thursday, 7 January 2010
Redis Virtual Memory
A couple of days before 2009 ended, Salvatore Sanfilippo (@antirez) announced his intention to implement virtual memory in Redis. In his message to the Redis user group, he also mentioned some of the goals or advantages of virtual memory in Redis:
- If the dataset access pattern is not random, but there is a bias towards a subset of keys (let’s call this subset of keys the “hot spot”), with VM Redis can deliver performance similar to the case where you have in memory only the hot spot, using only the memory required to hold the hot spot.
- Your hotspot is much bigger than your RAM, but you are willing to pay a performance penalty because you want to use Redis.
Today, Salvatore has reported that the first phase of implementing virtual memory in Redis was completed and the Redis Twitter-clone app is already running on this new version.
According to the initial plan, the first phase is a blocking VM implementation.
This means that Redis will work as usual, but will have a new layer for accessing keys that understands whether a key is in memory or swapped out to disk: when Redis tries to access an on-disk key, it will block while loading the key from disk into memory (this includes not only I/O, but also the CPU time needed to convert the serialized object into its in-memory representation).
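The blocking lookup path described above can be sketched as follows. This is a hypothetical Python model, not Redis’s C implementation: a dict of pickled bytes stands in for the swap file, and which keys get swapped out is left to the caller because the swapping policy is a separate question. The point is the extra layer in `get`: an in-memory key is returned directly, while a swapped key makes the caller wait for the load plus the deserialization.

```python
import pickle


class BlockingVM:
    """Hypothetical model of a blocking VM layer in front of a key space."""

    def __init__(self):
        self.memory = {}          # key -> live in-memory object
        self.swapped = {}         # key -> serialized bytes (stands in for the swap file)
        self.blocking_loads = 0   # how many lookups had to wait for a load

    def set(self, key, obj):
        self.swapped.pop(key, None)   # a new value supersedes any swapped copy
        self.memory[key] = obj

    def swap_out(self, key):
        # Serialize the object and free its in-memory representation.
        self.swapped[key] = pickle.dumps(self.memory.pop(key))

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        if key in self.swapped:
            # Blocking path: in a single-threaded server every other client
            # waits while the value is read and deserialized.
            obj = pickle.loads(self.swapped.pop(key))
            self.memory[key] = obj
            self.blocking_loads += 1
            return obj
        return None
```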
It has not yet been decided whether this is just an intermediate step before implementing a non-blocking VM or whether it will become part of a release.
While I am neither a concurrency nor a Redis expert, I must confess that my previous experience with a solution similar to Redis’s single-threaded approach was disappointing: I am referring to Jackrabbit, the Apache JCR implementation, where we had to work around the serialized, single-threaded access for read-only clients. On the other hand, other well-known systems (f.e. memcached) use the same solution (some will point out that, unlike Redis, memcached never touches the disk, while Jackrabbit behaves much more like Redis).
Anyway, we will always have these Redis benchmarks around for sanity checks.
Monday, 28 December 2009
Terrastore: A Consistent, Partitioned and Elastic Document Database
Terrastore is a very young, Apache-licensed document store built on top of Terracotta (an in-memory clustering technology); it released version 0.2 a couple of days ago.
I had the opportunity to chat with Sergio Bossa (@sbtourist) and have him answer a couple of questions about Terrastore.
Alex: What is it that made you create Terrastore in the first place?
Sergio: I wanted a scalable document store with consistency features, because I think that’s an uncovered topic/space in current implementations, which are all geared toward BASE.
Being a document database, Terrastore belongs to the same category as CouchDB, MongoDB, and Riak. In some regards (f.e. partitioning), Terrastore is similar to Riak. You should also check [1] to find out more about Terrastore and the CAP theorem.
Terracotta replication is not full, nor is it directed at all nodes, but only at those actually requiring the replicated data. This is optimized even further in Terrastore, where, thanks to consistent hashing and partitioning, data is not duplicated at all. Terrastore also guarantees that data will never be duplicated among nodes unless new nodes are joining or older nodes are leaving, thus requiring data redistribution. A Terrastore client doesn’t need to know where the data is: it can contact any Terrastore node, and requests will be routed to the proper node holding the value (note: this is similar to how Dynamo, Project Voldemort, Cassandra and other distributed stores work).
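A minimal Python sketch of the two ideas in that answer: consistent hashing decides which single node owns each key, and any node accepts a request and forwards it to the owner. The MD5-based ring with 64 points per node and the `Ring`/`StoreNode` names are illustrative assumptions, not Terrastore’s implementation (which relies on Terracotta underneath).

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # 64-bit position on the ring derived from MD5 (illustrative choice).
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Consistent hashing ring with several points per node."""

    def __init__(self, nodes, points_per_node=64):
        self.points_per_node = points_per_node
        self.nodes = set()
        self._points, self._hashes = [], []
        for node in nodes:
            self.add(node)

    def _rebuild(self):
        self._points = sorted((_hash(f"{n}#{i}"), n)
                              for n in self.nodes for i in range(self.points_per_node))
        self._hashes = [h for h, _ in self._points]

    def add(self, node):
        self.nodes.add(node)
        self._rebuild()

    def remove(self, node):
        self.nodes.discard(node)
        self._rebuild()

    def owner(self, key: str) -> str:
        # First ring point clockwise from the key's hash (wrapping around).
        idx = bisect.bisect_left(self._hashes, _hash(key)) % len(self._hashes)
        return self._points[idx][1]


class StoreNode:
    """Any node accepts a request and routes it to the single owning node."""

    def __init__(self, name, ring, cluster):
        self.name, self.ring, self.cluster = name, ring, cluster
        self.data = {}

    def put(self, key, value):
        owner = self.ring.owner(key)
        if owner != self.name:
            return self.cluster[owner].put(key, value)   # forward, don't copy
        self.data[key] = value

    def get(self, key):
        owner = self.ring.owner(key)
        if owner != self.name:
            return self.cluster[owner].get(key)
        return self.data.get(key)
```

Adding a node only moves keys onto the new node; every other key keeps its owner, which is why redistribution is limited to joins and leaves.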
At this point, more people joined the chat, and more interesting questions and answers came up.
Alex: Considering Terrastore is built on top of Terracotta, is it an in-memory storage making it somehow similar to Redis?
Sergio: Correct, it stores everything in memory, but it is persistent as well. It is not as fast as Redis mainly due to some overhead related to its distributed features.
Paulo Gaspar: Terrastore looks very much like a persistent, transactional Memcached service.
Sergio: Persistent, transactional, and partitioned/sharded. An interesting difference is that afaik Memcached partitioning is done client side, while Terrastore has builtin support for data partitioning, distribution and access routing.
Terrastore is already HTTP and JSON friendly [2] and the future might bring support for the memcached protocol too.
Please see the following resources to learn more about Terrastore: