Tokyo cabinet: All content tagged as Tokyo cabinet in NoSQL databases and polyglot persistence
- Dynamo (key-value)
- Voldemort (key-value)
- Tokyo Cabinet (key-value)
- KAI (key-value)
- Cassandra (column-oriented/tabular)
- CouchDB (document-oriented)
- SimpleDB (document-oriented)
- Riak (document-oriented)
A couple of clarifications to the list above:
- Dynamo has never been available to the public. DynamoDB, on the other hand, is not exactly Dynamo.
- Tokyo Cabinet is not a distributed database so it shouldn’t be in this list
- CouchDB isn’t a distributed database either, but one could argue that with its peer-to-peer replication it sits right at the border. On the other hand there’s BigCouch.
Original title and link: Which NoSQL Databases Are Robust to Net-Splits? ( ©myNoSQL)
In a post yesterday about NoSQL comparisons, I was asking when the last Tokyo Cabinet release was. It looks like, from that family of products, the ones going forward are Kyoto Cabinet and Kyoto Tycoon, as Mikio Hirabayashi ☞ has announced on Twitter the release of Kyoto Cabinet 1.2.25 and Kyoto Tycoon 0.9.9:
released Kyoto Cabinet 1.2.25 and Kyoto Tycoon 0.9.9, which feature asynchronous replication!
Original title and link: New versions of Kyoto Cabinet and Kyoto Tycoon Released (NoSQL databases © myNoSQL)
So, if Tokyo Cabinet got Kyoto Cabinet as a successor, Tokyo Tyrant got Kyoto Tycoon as its successor. But this time it is not only an implementation language port, as Kyoto Tycoon also behaves as a cache system with support for auto expiration (something similar to memcached). Moreover Kyoto Tycoon is offering a RESTful-style interface.
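To make the auto-expiration idea concrete, here is a minimal in-memory sketch of a key-value cache whose records can carry a time-to-live. It is not Kyoto Tycoon's implementation or API: the `ExpiringCache` class, its `ttl` parameter, and the injectable `clock` are all hypothetical names chosen for illustration, and expired records are dropped lazily on read.

```python
import time


class ExpiringCache:
    """Toy key-value cache whose records carry an optional expiration time."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the behaviour can be checked without sleeping.
        self._clock = clock
        self._records = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        """Store a value; ttl is a lifetime in seconds, None means never expire."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._records[key] = (value, expires_at)

    def get(self, key, default=None):
        """Return the value, or default if the key is missing or has expired."""
        record = self._records.get(key)
        if record is None:
            return default
        value, expires_at = record
        if expires_at is not None and self._clock() >= expires_at:
            del self._records[key]  # lazy expiration: drop the record on read
            return default
        return value
```

A record written with `cache.set("session", data, ttl=60)` stops being returned by `cache.get("session")` once 60 seconds have passed, while records written without a `ttl` stay until overwritten.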
You can read more about Kyoto Tycoon ☞ here.
Update: Brenden Grace has ☞ a post to which Mikio Hirabayashi, Tokyo and Kyoto creator, responded.
Since the release of Kyoto Cabinet 1.0, which can be considered the successor of Tokyo Cabinet, I haven’t heard much from the Tokyo Cabinet/Tyrant world (except some political news or some furniture-related announcements on craigslist, but these are not really of interest to the NoSQL community).
Some time ago I had a chance to talk with Florent Solt (@florentsolt), Chief Architect at ☞ Netvibes, about their usage of the Tokyo family (Tokyo Cabinet and Tokyo Tyrant). While I don’t have enough details about the Tokyo market, I’d be ready to speculate that Netvibes is probably one of the biggest users of the Tokyo product family.
To give you a quick overview of the Netvibes system, here are some interesting points in random order:
- Netvibes uses Tokyo Tyrant, never Tokyo Cabinet directly
- Netvibes architecture is a master-slave architecture (due to weird things in master-master)
- Netvibes is using its own sharding method
- Netvibes makes use of the Tokyo Cabinet hash, B-tree, and table storage types
- only feed-related information is in Tokyo databases (feeds, items, read/unread, …)
- other information is still in a MySQL database (accounts, tabs, pages, widgets, …)
- to schedule crawling events, a queue has been implemented with a Tokyo Tyrant server and Lua
- Netvibes is using a custom transparent proxy (Ruby + EventMachine) to move/migrate data between servers
And now the Q&A part:
nosql: It sounds like initially all data lived inside MySQL. What made you look to alternative storage solutions?
Florent: Exactly. We started looking at an alternative when we reached MySQL limits. It was mostly disk space fragmentation issues (with blobs) and raw speed for insert.
nosql: How did you choose Tokyo Cabinet and Tokyo Tyrant?
Florent: We did some research, but 1.5 years ago there were fewer solutions than now.
So we did some benchmarks, based on our own data (very important) and our architecture. We tried Hadoop, CouchDB, Tokyo Tyrant, the file system only (just to have a raw comparison with what is IMHO one of the simplest ways to store data), and MySQL.
In terms of budget, responsiveness and knowledge gap, Tokyo was the winner.
nosql: What data has been moved to Tokyo?
Florent: We are using Tokyo for our feeds backend. Everything related to feeds, such as feed items, enclosures, and read/unread flags, is stored in Tokyo. The same goes for the data structures we need to crawl all these feeds, such as a queue.
nosql: What criteria have you used to make this separation?
Florent: The separation was not clearly related to Tokyo; it was a product decision. We wanted to implement this feed backend as a standalone module. We only interact with it through an API.
nosql: How have you migrated existing data?
Florent: Indeed, initially the feed data was in MySQL tables.
The migration was simple in terms of logic, but long and difficult to achieve. The main idea was that when the new backend received a request for data it did not yet hold, a fallback query fetched the data from MySQL and then saved everything in Tokyo. It sounds easy, but in reality there were many specific cases and strange issues.
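The fallback logic Florent describes can be sketched as a read path that tries the new store first and lazily copies records over from the old one. This is only an illustration under stated assumptions: plain dictionaries stand in for Tokyo Tyrant and MySQL, and the `MigratingBackend` class and its method names are hypothetical, not Netvibes code.

```python
class MigratingBackend:
    """Read-through migration: serve from the new store, fall back to the old one."""

    def __init__(self, new_store, legacy_store):
        self.new_store = new_store        # stands in for the Tokyo backend
        self.legacy_store = legacy_store  # stands in for the MySQL tables

    def get(self, key):
        # Fast path: the record has already been migrated.
        if key in self.new_store:
            return self.new_store[key]
        # Fallback: ask the legacy store, then save the result in the new store
        # so the next request for this key no longer touches the legacy system.
        value = self.legacy_store.get(key)
        if value is not None:
            self.new_store[key] = value
        return value

    def put(self, key, value):
        # New writes go only to the new store.
        self.new_store[key] = value
```

With this shape, frequently requested data migrates first and on its own, while rarely requested data can be moved later by a background job. The "specific cases and strange issues" mentioned above are exactly what such a sketch leaves out.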
nosql: You are using Tokyo hash, btree and tables. Would you mind giving some examples of what kind of data lives in each of them and how you decided which one is the best option?
Florent: When you really understand each structure, it’s pretty easy to pick the best choice. For example:
- When we need only raw speed, we use a hash.
- When we need complex key strategies (based on prefix), we use btree.
- When we need conditional queries, we use tables.
For example, feeds (url, title, author, …) are stored in a Table. Same goes for the feed items and enclosures.
The queue is a Hash, to keep the focus on speed. The first implementation was based on a BTree, but we improved our algorithms to use guessable keys only and avoid key scanning. There are also some Lua functions attached to hide the implementation and keep the whole thing fast too.
Flags (where we store read/unread data) are stored in a BTree with a Lua extension because we are scanning keys a lot.
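Two of the ideas in this answer lend themselves to a small sketch: prefix scans over ordered keys (the reason for choosing a B-tree) and a queue built on a plain hash with guessable keys (so no key scanning is needed). The code below is an assumption-laden illustration using a sorted Python list and a dictionary. The key layout (`user:<id>:<item>`, `item:<n>`, `head`, `tail`) is hypothetical, and this is not the Netvibes implementation, which runs inside Tokyo Tyrant with Lua.

```python
import bisect


def prefix_scan(sorted_keys, prefix):
    """Return every key that starts with prefix, given keys in sorted order.

    An ordered structure lets us jump to the first candidate and stop at the
    first non-matching key instead of visiting every record.
    """
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches


class HashQueue:
    """FIFO queue on a plain hash: every key is derived from two counters."""

    def __init__(self):
        self.store = {"head": 0, "tail": 0}

    def push(self, item):
        tail = self.store["tail"]
        self.store[f"item:{tail}"] = item  # the key is computed, never searched for
        self.store["tail"] = tail + 1

    def pop(self):
        head = self.store["head"]
        if head == self.store["tail"]:
            return None  # queue is empty
        item = self.store.pop(f"item:{head}")
        self.store["head"] = head + 1
        return item
```

Because the next key to read is always `item:<head>`, a hash lookup is enough for the queue, whereas the flags use case really does need ordered keys to answer "all flags for this user".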
nosql: Can you speak a bit more about the in-house sharding solution you are using?
Florent: Sure. Tokyo does not come with sharding or dynamic partitioning implementation, so we built our own solution. It’s feed or user centric. For example, we know that the feed table will always fit on one dedicated server, whatever the number of feeds. So, for each feed we store where (the id of the shard server) its items are.
The flags follow the same logic: for a given user, we know where his flags are. This makes it easy to add new shards, because it’s just a line in a configuration file. And we have created all the scripts we need to move data from one shard to another (migration, auto-balance, …).
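The scheme described here is a lookup-table (directory) approach: a small table records which shard holds the items of each feed, and every access goes through that table. The sketch below assumes dictionaries standing in for shard servers. The `ShardDirectory` class and its method names are hypothetical and only illustrate the idea, including how moving a feed amounts to copying its records and updating one directory entry.

```python
class ShardDirectory:
    """Directory-based sharding: a lookup table maps each feed to a shard id."""

    def __init__(self, shards):
        self.shards = shards  # shard id -> dict standing in for a shard server
        self.location = {}    # feed id -> shard id (small enough for one server)

    def assign(self, feed_id, shard_id):
        if shard_id not in self.shards:
            raise KeyError(f"unknown shard: {shard_id}")
        self.location[feed_id] = shard_id

    def put_item(self, feed_id, item_id, item):
        shard = self.shards[self.location[feed_id]]
        shard[(feed_id, item_id)] = item

    def get_item(self, feed_id, item_id):
        shard = self.shards[self.location[feed_id]]
        return shard.get((feed_id, item_id))

    def move_feed(self, feed_id, new_shard_id):
        """Move all items of one feed to another shard, then update the directory."""
        old = self.shards[self.location[feed_id]]
        new = self.shards[new_shard_id]
        for key in [k for k in old if k[0] == feed_id]:
            new[key] = old.pop(key)
        self.location[feed_id] = new_shard_id
```

Adding a shard is then just one more entry in `shards`, which matches the "line in a configuration file" remark. Existing feeds stay where they are until a script calls something like `move_feed`.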
nosql: What lessons have you learned that you’d have liked to know before using Tokyo?
Florent: Very difficult to say as we have learned so much with this project.
Maybe the most important point would be knowing how Tokyo Tyrant servers handle load and what the best practices are for preventing common speed issues. That is what we learned the hard way.
nosql: Any numbers about Netvibes Tokyo deployment you can share with us?
Florent: About numbers, you already know that it’s sensitive information :-). I can’t say more than the numbers in my slides.
nosql: Fair enough. Thank you so much Florent!
Unfortunately both of them are just new examples of useless benchmarks:
- only 1000 keys
- the benchmark doesn’t vary the size of keys and values
- no concurrency
- no mixed reads/writes
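For contrast, here is a rough sketch of a benchmark harness that addresses the four points above: a larger key space, varying key and value sizes, several concurrent workers, and a mix of reads and writes. Everything in it is an assumption made for illustration: a lock-protected dictionary stands in for the database under test, and `run_benchmark` and its parameters are hypothetical names. A real benchmark would replace the dictionary operations with calls to the actual client library and use realistic data.

```python
import random
import threading
import time


def run_benchmark(store, lock, n_keys=10_000, threads=4, ops_per_thread=5_000,
                  read_ratio=0.8, key_sizes=(8, 32, 128),
                  value_sizes=(16, 256, 4096), seed=0):
    """Run a mixed read/write workload from several threads and count operations."""

    def make_key(n):
        # Key length depends on the key index, so each key has a stable size.
        size = key_sizes[n % len(key_sizes)]
        return f"{n:08d}".ljust(size, "k")

    def worker(worker_id, counts):
        rng = random.Random(seed + worker_id)  # per-thread, reproducible stream
        reads = writes = 0
        for _ in range(ops_per_thread):
            key = make_key(rng.randrange(n_keys))
            if rng.random() < read_ratio:
                with lock:
                    store.get(key)
                reads += 1
            else:
                value = b"x" * rng.choice(value_sizes)
                with lock:
                    store[key] = value
                writes += 1
        counts[worker_id] = (reads, writes)

    counts = [None] * threads
    pool = [threading.Thread(target=worker, args=(i, counts))
            for i in range(threads)]
    started = time.perf_counter()
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    elapsed = time.perf_counter() - started
    return {
        "reads": sum(c[0] for c in counts),
        "writes": sum(c[1] for c in counts),
        "elapsed": elapsed,
    }
```

Calling `run_benchmark({}, threading.Lock())` returns the read and write counts together with the elapsed time, from which operations per second can be derived.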
Kyoto Cabinet, the successor of Tokyo Cabinet, has reached its first stable release: 1.0. The ☞ announcement is pretty rich in detail and provides code samples for all currently supported bindings (C, C++, Java, Python, Ruby, Perl).
Kyoto architecture looks quite interesting and is depicted below:
Mikio Hirabayashi, lead developer, speaking about Kyoto Cabinet vs Tokyo Cabinet:
Kyoto Cabinet has the following features. Especially, Windows support is remarkable.
- time efficiency: Throughput of updating is more than 100 millions query-per-second.
- space efficiency: Footprint for each record is 8-16 bytes in the hash DB, 2-4 bytes in the tree DB.
- concurrency: The hash DB uses read-write lock for each record. The tree DB uses read-write lock for each page.
- usability: Generic operations of database by interface like the “Visitor” pattern are provided.
- robustness: Manual transaction, auto transaction, and auto recovery are provided.
- portability: UNIX-like systems (Linux, FreeBSD, Solaris, Mac OS X) and Windows (VC++) are supported.
- language bindings: C++, C, Java, Python, Ruby, and Perl are supported.
Compared with Tokyo Cabinet, KC is superior in concurrency, usability, and portability. Although time efficiency for single-thread is better in TC, I recommend KC from now on because multi-core/many-core CPU has been popular. However, I will keep on maintaining TC and fix bugs if they are found.
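The "Visitor"-style interface mentioned in the usability point can be illustrated with a small sketch: the store exposes one generic `accept` operation, and the caller supplies an object that decides what happens depending on whether the record exists. This is a toy in-memory model written for illustration. The class and method names below are chosen for the example and should not be read as the library's actual API.

```python
NOP = object()     # sentinel: leave the record unchanged
REMOVE = object()  # sentinel: delete the record


class Store:
    """Toy store with a single generic operation driven by a visitor object."""

    def __init__(self):
        self._records = {}

    def accept(self, key, visitor):
        # The visitor sees either the existing value or the absence of one,
        # and its return value tells the store what to do with the record.
        if key in self._records:
            result = visitor.visit_full(key, self._records[key])
        else:
            result = visitor.visit_empty(key)
        if result is REMOVE:
            self._records.pop(key, None)
        elif result is not NOP:
            self._records[key] = result
        return result

    def get(self, key):
        return self._records.get(key)


class Increment:
    """Visitor implementing 'increment, or start at 1 when missing'."""

    def visit_full(self, key, value):
        return value + 1

    def visit_empty(self, key):
        return 1


class RemoveIfPresent:
    """Visitor implementing 'delete the record if it exists'."""

    def visit_full(self, key, value):
        return REMOVE

    def visit_empty(self, key):
        return NOP
```

The appeal of this design is that set, add, increment, and remove all become small visitor classes over one primitive, so locking and transactions only need to be handled in one place.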
While Kyoto Cabinet sounds really interesting, I cannot stop asking myself whether this is the time to move away from Tokyo Cabinet.
Update: Jan Lehnardt was quick to point me to a CouchDB-based URL shortener on ☞ GitHub.
Update 2: Mathias Meyer shared with us ☞ Relink: a solution built on top of Redis with Sinatra
Update 3: Aaron pointed out ☞ little, another solution using Redis and Node.js
I’m pretty sure there are many more such projects so please post a link to the project in the comment section and I’ll update the post.
I’m starting to forget how many Twitter NoSQL-enabled apps I’ve mentioned on the NoSQL blog (fortunately the consistent tagging helps, so you can find them all under the tag Twitter), but every time I find a new one I feel like posting about it.
The author concludes with some Tokyo Cabinet lessons learned:
- Lack of auto-expiration when using it mostly as a key-value cache is annoying
- Would definitely use it again for this type of task