NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



Tokyo cabinet: All content tagged as Tokyo cabinet in NoSQL databases and polyglot persistence

Which NoSQL Databases Are Robust to Net-Splits?

Answered on Quora:

  • Dynamo (key-value)
  • Voldemort (key-value)
  • Tokyo Cabinet (key-value)
  • KAI (key-value)
  • Cassandra (column-oriented/tabular)
  • CouchDB (document-oriented)
  • SimpleDB (document-oriented)
  • Riak (document-oriented)

A couple of clarifications to the list above:

  1. Dynamo has never been available to the public. On the other hand DynamoDB is not exactly Dynamo
  2. Tokyo Cabinet is not a distributed database so it shouldn’t be in this list
  3. CouchDB isn’t a distributed database either, but one could argue that with its peer-to-peer replication it sits right at the border. On the other hand there’s BigCouch.

Original title and link: Which NoSQL Databases Are Robust to Net-Splits? (NoSQL database©myNoSQL)

Tokyo Cabinet - Built-In Comparison Functions

Tokyo Cabinet’s 4 comparison functions (tccmplexical, tccmpint32, tccmpint64, tccmpdecimal) analyzed:

For most projects, the standard lexical sort works perfectly- but for those of us using Tokyo Cabinet to power another DBMS, or for use in tandem with other systems that rely on numerical ids, it might be worth it to write our own comparison functions.

As a side note, Tokyo Cabinet is using a B+tree implementation, which is something Redis is looking to add.

Original title and link: Tokyo Cabinet - Built-In Comparison Functions (NoSQL databases © myNoSQL)


New versions of Kyoto Cabinet and Kyoto Tycoon Released

In a post yesterday about NoSQL comparisons, I was asking when was the last Tokyo Cabinet release. It looks like from that family of products, the ones going forward are Kyoto Cabinet and Kyoto Tycoon as Mikio Hirabayashi ☞ has announced on Twitter the release of Kyoto Cabinet 1.2.5 and Kyoto Tycoon 0.9.9:

released Kyoto Cabinet 1.2.25 and Kyoto Tycoon 0.9.9, which feature asynchronous replication!

Frank Denis

Original title and link: New versions of Kyoto Cabinet and Kyoto Tycoon Released (NoSQL databases © myNoSQL)

Another NoSQL Comparison: Evaluation Guide

The requirements were clear:

  • Fast data insertion.
  • Extremely fast random reads on large datasets.
  • Consistent read/write speed across the whole data set.
  • Efficient data storage.
  • Scale well.
  • Easy to maintain.
  • Have a network interface.
  • Stable, of course.

The list of NoSQL databases to be compared: Tokyo Cabinet, BerkleyDB, MemcacheDB, Project Voldemort, Redis, and MongoDB, not so clear.

The methodology to evaluate and the results definitely not clear at all.

NoSQL Comparison Guide / A review of Tokyo Cabinet, Tokyo Tyrant, Berkeley DB, MemcacheDB, Voldemort, Redis, MongoDB

And the conclusion is quite wrong:

Although MongoDB is the solution for most NoSQL use cases, it’s not the only solution for all NoSQL needs.

There were a couple of people asking for more details about my comments on this NoSQL comparison, so here they are:

  1. the initial list of NoSQL databases to be evaluated looks at the first glance a bit random. It includes some not so used solutions (memcachedb), some that are not , while leaving aside others that at least at the high level would correspond to the characteristics of others in the list (Riak, Membase)
  2. another reason for considering the initial choice a bit random is that while scaling is listed as one of the requirements, the only truly scalable in the list would be Project Voldemort. The recently added auto-sharding and replica sets would make MongoDB a candidate too, but a search on the MongoDB group would show that the solution is still young
  3. even if the set of requirements is clear, there’s no indication of what kind of evaluation and how was it performed. Without knowing what and how and it is impossible to consider the results as being relevant.
  4. as Janl was writing about benchmarks, most of the time you are doing it wrong. Creating good, trustworthy, useful, relevant benchmarks is very difficult
  5. .
  6. the matrix lists characteristics that are difficult to measure. And there are no comments on how the thumbs up were given. Examples: what is manageability and how was that measured? Same questions for stability and feature set.
  7. because most of it sounds speculative here are a couple of speculations:
    1. judging by the thumbs up MongoDB received for insertion/random reads for large data set, I can assume that data hasn’t overpassed the available memory. But on the other hand, Redis was dismissed and received less votes due to its “more” in-memory character
    2. Tokyo Cabinet and Redis project activity and community were ranked the same. When was the last release of Tokyo Cabinet?
  8. I’m leaving up to you to decide why the conclusion — “Although MongoDB is the solution for most NoSQL use cases”” is wrong.

Original title and link: Another NoSQL Comparison: Evaluation Guide (NoSQL databases © myNoSQL)


Kyoto Tycoon: Tokyo Tyrant Plus Plus

Back in January I was writing about Kyoto Cabinet the successor of Tokyo Cabinet which reached the 1.0.0 stable version around May. Tokyo Cabinet needed Tokyo Tyrant for distributed environment.

So, if Tokyo Cabinet got Kyoto Cabinet as a successor, Tokyo Tyrant got Kyoto Tycoon as its successor. But this time it is not only an implementation language port, as Kyoto Tycoon also behaves as a cache system with support for auto expiration (something similar to memcached). Moreover Kyoto Tycoon is offering a RESTful-style interface.

You can read more about Kyoto Tycoon ☞ here.

Update: Brenden Grace has ☞ a post to which Mikio Hirabayashi, Tokyo and Kyoto creator, responded.

Original title and link: Kyoto Tycoon: Tokyo Tyrant Plus Plus (NoSQL databases © myNoSQL)

Netvibes: A Large Scale Tokyo Tyrant Deployment Case Study

Since the release of the Kyoto 1.0 which can be considered the successor of Tokyo Cabinet, I haven’t heard much from the Tokyo Cabinet/Tyrant world (except some political news or some furniture related announcements on craigslist, but these are not really of interest for the NoSQL community)

Some time ago I had a chance to discuss with Florent Solt (@florentsolt), Chief Architect at ☞ Netvibes, about their usage of Tokyo family (Tokyo Cabinet and Tokyo Tyrant). While I don’t have enough details about the Tokyo market, I’d be ready to speculate that Netvibes is probably one of the biggest users of the Tokyo products family.

To give you an quick overview of the Netvibes system here are some interesting points in random order:

  • Netvibes uses Tokyo Tyrant, never Tokyo Cabinet directly
  • Netvibes architecture is a master-slave architecture (due to weird things in master-master)
  • Netvibes is using its own sharding method
  • Netvibes maes use of Tokyo Cabinet hash, btree and tables storages
  • only feeds related informations are in Tokyo databases (feeds, items, read/unread, …)
  • other informations are still in a MySQL database (accounts, tabs, pages, widgets, …)
  • to schedule crawling events, a queue has been implemented with a Tokyo Tyrant server and lua
  • Netvibes is using a custom transparent proxy (ruby + eventmachine) to move/migrate data between servers

And now the Q&A part:

nosql: It sounds like initially all data lived inside MySQL. What made you look to alternative storage solutions?

Florent: Exactly. We started looking at an alternative when we reached MySQL limits. It was mostly disk space fragmentation issues (with blobs) and raw speed for insert.

nosql: How did you choose Tokyo Cabinet and Tokyo Tyrant?

Florent: We did some research, but 1.5 years ago, there were less solutions than now.

So we did some benchmarks, based on our own data (very important) and our architecture. We tried : Hadoop, CouchDB, Tokyo Tyrant, File system only (it was only to have a raw comparison with IMHO one of the most simple way to store data) and MySQL.

In terms of budget, responsiveness and knowledge gap, Tokyo was the winner.

nosql: What data has been moved to Tokyo?

Florent: We are using Tokyo for our feeds backend. Everything related to feeds such as feed items, enclosures, read/unread flags are stored in Tokyo. Same goes for the data structures we need to crawl all these feeds, such as a queue.

nosql: What criteria have you used to make this separation?

Florent: The separation was not clearly related to Tokyo, it was product decision. We wanted to implement this feed backend as a standalone module. We only interact with it trough an API.

nosql: How have you migrated existing data?

Florent: Indeed, initially feeds data were in MySQL tables.

The migration was simple, in terms of logic, but long and difficult to achieve. The main point was when an unknown data was requested from the new backend, a fallback query asked MySQL for the data, and finally saved everything in Tokyo. It sounds easy, but in reality there were many specific cases and strange issues.

nosql: You are using Tokyo hash, btree and tables. Would you mind giving some examples for what kind of data lives in each of them and how have you decided that is the best option?

Florent: When you really understand each structures it’s pretty easy to pick the best choice. For example:

  • When we need only raw speed, we use a hash.
  • When we need complex key strategies (based on prefix), we use btree.
  • When we need conditional queries, we use tables.

For example, feeds (url, title, author, …) are stored in a Table. Same goes for the feed items and enclosures.

The queue is a Hash, to keep the focus on the speed. The first implementation was based on a BTree, but we improved our algorithms to have guessable keys only and prevent key scanning. There are also some lua functions linked to hide implementation and keep the whole thing fast too.

Flags (where we store read/unread data) are stored in a BTree with a lua extension because we are scanning keys a lot.

nosql: Can you speak a bit more about the in-house sharding solution you are using?

Florent: Sure. Tokyo does not come with sharding or dynamic partitioning implementation, so we built our own solution. It’s feed or user centric. For example, we know that the feed table will always fit on one dedicated server, whatever the number of feeds. So, for each feed we store where (the id of the shard server) its items are.

For the flags, same logic, for a given user we know where his flags are. It makes it easy to add new shards, because it’s a line in a configuration file. And we have created all the scripts we need to move data from one shard to another (migration, auto-balance, …)

nosql: What lessons have you learned that you’d have liked to know before using Tokyo?

Florent: Very difficult to say as we have learned so much with this project.

Maybe the most important point would be to know how Tokyo Tyrant servers would manage the load and what are the best practices to prevent common speed issue, that was what we learned the hard-way.

nosql: Any numbers about Netvibes Tokyo deployment you can share with us?

Florent: About numbers, you already know that it’s a sensitive information :-). I can’t say more than those numbers in my slides.

nosql: Fair enough. Thank you so much Florent!

Goodbye Tokyo Cabinet

A while ago I started to sound like a broken record when saying that even if data modeling in NoSQL seems too simple to be true, in fact it is once again an art we need to master as otherwise sooner or later will start complaining about it. Well it looks like that time is starting to be now.

Brian on why he stopped using Tokyo Cabinet:

I have no idea how to use a key/value store database properly. TC will take anything you dump into it, which is both a strength and a weakness.

[…] Some kinds of queries were still awkward in TC.

So if you still think schema-less means not data modeling, then you are doing it wrong!


Japanese Blogs Post Benchmarks on Membase, Memcached, Tokyo Tyrant and Redis

Two japanese blogs[1] have published some benchmarks comparing the newly released membase with memcached, Tokyo Tyrant and Redis.

Unfortunately both of them are just new examples of useless benchmarks:

  • only 1000 keys
  • the benchmark doesn’t vary the size of keys and values
  • no concurrency
  • no mixed reads/writes

I’d strongly suggest anyone planning to build a solid benchmark to take a look at these NoSQL benchmarks and performance evaluations to learn how to build useful/correct ones[2].

  1. The two benchmarks are published ☞ here and ☞ here. Unfortunately I don’t read Japanese and I’ve used Google Translator (which pretty much didn’t work)  ()
  2. Another useful resource about building correct benchmarks is Jan Lehnardt’s ☞ Benchmarks: You are Doing it Wrong  ()

Release: Kyoto Cabinet 1.0.0

Kyoto Cabinet, the successor of Tokyo Cabinet has reached the first stable release: 1.0. The ☞ announcement is pretty reach in details and provides code samples fot all currently supported bindings (C, C++, Java, Python, Ruby, Perl).

Kyoto architecture looks quite interesting and is depicted below:

Mikio Hirabayashi, lead developer, speaking about Kyoto Cabinet vs Tokyo Cabinet:

Kyoto Cabinet has the following features. Especially, Windows support is remarkable.

  • time efficiency: Throughput of updating is more than 100 millions query-per-second.
  • space efficiency: Footprint for each record is 8-16 bytes in the hash DB, 2-4 bytes in the tree DB. concurrency: The hash DB uses read-write lock for each record. The tree DB uses read-write lock for each page.
  • usability: Generic operations of database by interface like the “Visitor” pattern are provided.
  • robustness: Manual transaction, auto transaction, and auto recovery are provided.
  • portability: UNIX-like systems (Linux, FreeBSD, Solaris, Mac OS X) and Windows (VC++) are supported. language bindings: C++, C, Java, Python, Ruby, and Perl are supported.

Compared with Tokyo Cabinet, KC is superior in concurrency, usability, and portability. Although time efficiency for single-thread is better in TC, I recommend KC from now on because multi-core/many-core CPU has been popular. However, I will keep on maintaining TC and fix bugs if they are found.

While Kyoto Cabinet sounds really interesting, I cannot stop asking myself if is this the time to move away from Tokyo Cabinet?

A NoSQL Use Case: URL Shorteners

Last week, I have mentioned a post by Gleicon Moraes on ☞ building an URL shortener using MongoDB and alternatively Redis.

There is also a ☞ GitHub project by Sean Cribbs implementing the same scenario using Riak and Ruby-based Sinatra.

Update: Jan Lehnardt was quick to point me to a CouchDB-based URL shortner on ☞ GitHub.

Update 2: Mathias Meyer shared with us ☞ Relink: a solution built on top of Redis with Sinatra

Update 3: Aaron pointed out ☞ little, another solution using Redis and Node.js

Update 4: Frank Denis has just pushed live ☞ the code of another URL shortener ☞, built using Tokyo Cabinet and Sinatra. Thanks Frank!

I’m pretty sure there are many more such projects so please post a link to the project in the comment section and I’ll update the post.

Building TweetReach with Sinatra, Tokyo Cabinet and Grackle

I’m starting to forget how many Twitter NoSQL-enabled apps I’ve mentioned on the NoSQL blog — fortunately the consistent tagging helps, so you can find them all under the tag Twitter — but every time I’m finding a new one I feel like posting about it.

This time it is a presentation about building a Twitter utility using Tokyo Cabinet and ☞ Sinatra (a Ruby web framework).

The author concludes with some Tokyo Cabinet lessons learned:

  • Lack of auto-expiration when using as mostly a key-value cache is annoying

  • Would definitely use it again for this type of task

I think it is interesting to note that from the key-value stores covered here, only Redis comes with support for key expiration.

Building TweetReach with Sinatra, Tokyo Cabinet and Grackle