


NoSQL event: All content tagged as NoSQL event in NoSQL databases and polyglot persistence

Hadoop World 2010 Tweet Analysis

A fun project using Hadoop and the Twitter Streaming API:

During the keynote, I quickly created an Amazon Micro EC2 instance, tapped into the Twitter Streaming API, and began downloading tweets containing the hashtag #hw2010.

After filtering out a few Halloween tweets (get it? #hw2010?), about 1,500 tweets remained, respectable for a one-day event.
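The filtering step might look roughly like the Python sketch below. It assumes the tweets have already been reduced to plain text strings, for example by extracting them from the Streaming API's JSON. The keyword list and the function names are hypothetical, because the quoted post does not say how the Halloween tweets were removed.

```python
import re

# Hypothetical keywords for spotting Halloween tweets that reused the #hw2010
# tag; the quoted post does not describe its actual filtering method.
HALLOWEEN_WORDS = {"halloween", "costume", "pumpkin", "trick", "treat"}
HASHTAG = re.compile(r"#hw2010\b", re.IGNORECASE)


def is_hadoop_world_tweet(text: str) -> bool:
    """True if the tweet carries #hw2010 and contains none of the Halloween keywords."""
    if not HASHTAG.search(text):
        return False
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not (words & HALLOWEEN_WORDS)


def filter_tweets(texts):
    """Keep only the tweets that look like Hadoop World 2010 chatter."""
    return [t for t in texts if is_hadoop_world_tweet(t)]
```

A keyword blocklist is the simplest option and will misfire on tweets that merely mention a costume. A real analysis would probably inspect the results by hand, as the post's "get it?" aside suggests.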

For my (real-time) Hadoop World in Tweets I used ☞ Storify and my eyes. Not as scalable as Hadoop, though.

Original title and link: Hadoop World 2010 Tweet Analysis (NoSQL databases © myNoSQL)


Notes from Cassandra by Example at Devoxx

From David’s notes on Jonathan Ellis’ Cassandra by Example presentation at Devoxx:

When designing a relational schema we tend to think of objects and relationships. With Cassandra we need to think of objects and the queries we want to run against them. For each type of query you will need a column family (something like a table).
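Here is a minimal Python sketch of the one-column-family-per-query idea. Plain dictionaries stand in for column families, so this is not a Cassandra client API. The schema is a hypothetical example with one column family for "tweets by user" and another for "tweets by tag". Each write is denormalized into every column family that a query needs, which means each read is a single-row lookup.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy stand-in for a column family: row key -> {column name: value}."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def insert(self, row_key, column, value):
        self.rows[row_key][column] = value

    def get_slice(self, row_key):
        # Cassandra keeps columns sorted by name; emulate that on read.
        return sorted(self.rows[row_key].items())


# Hypothetical schema, one column family per query:
#   "tweets posted by user X"    -> user_tweets (row key = user, column = timestamp)
#   "tweets carrying hashtag T"  -> tag_tweets  (row key = tag,  column = timestamp)
user_tweets = ColumnFamily()
tag_tweets = ColumnFamily()


def post_tweet(user, timestamp, text, tags):
    # Denormalize on write: the same tweet goes into every column family
    # that some query reads from.
    user_tweets.insert(user, timestamp, text)
    for tag in tags:
        tag_tweets.insert(tag, timestamp, f"{user}: {text}")
```

This design trades extra writes and duplicated data for reads that never need a join. That trade-off is the shift in thinking the note describes.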

Remember the role of data modeling with NoSQL databases?

Original title and link: Notes from Cassandra by Example at Devoxx (NoSQL databases © myNoSQL)


Video from A NOSQL Evening in Palo Alto

In addition to all the reports from A NOSQL Evening in Palo Alto, we now also have the video of the event.

Original title and link: Video from A NOSQL Evening in Palo Alto (NoSQL databases © myNoSQL)

Notes from the MongoBerlin Conference

At least six MongoDB talks are summarized, on topics such as the BRAINREPUBLIC MongoDB case study, MongoDB internals, MongoDB indexing and the query optimizer, MongoDB sharding internals, MongoDB replication internals, and scaling with MongoDB. I found the ones on MongoDB internals quite interesting:

query optimizer:

  • it’s empirical: at first it tries all possible ways to get the results (it runs all algorithms in parallel and stops as soon as one of them finishes), remembers which one works best, and then reuses that knowledge in future requests
  • if the selected algorithm becomes very slow, it tries all possible ways again
  • so the first time a query is run, it might be quite slow
  • on the other hand, if something changes later, e.g. an index becomes slow, Mongo will work around that
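The notes above can be modeled in a few lines of Python. This is only an illustrative sketch and not MongoDB's implementation. Plans are simulated as generators, where each `yield` is one unit of work. The "parallel" race is a round-robin that stops as soon as the first plan finishes. The `slowdown` threshold for deciding that a remembered plan "became very slow" is an assumption.

```python
from itertools import count

# Illustrative model of the empirical optimizer described in the notes; NOT
# MongoDB's code. A "plan" is a generator function: each yield is one unit of
# work, and the generator's return value is the query result.

plan_cache = {}  # query shape -> (winning plan name, work units it needed)


def make_plan(work, result):
    """Build a hypothetical plan that needs `work` units before answering."""
    def plan():
        for _ in range(work):
            yield
        return result
    return plan


def race(plans):
    """Advance every plan one step per round; the first to finish wins (needs >= 1 plan)."""
    running = {name: make() for name, make in plans.items()}
    for rounds in count(1):
        for name, gen in running.items():
            try:
                next(gen)
            except StopIteration as done:
                return name, done.value, rounds


def run_alone(make):
    """Run one plan to completion, counting the steps it takes."""
    gen, steps = make(), 0
    while True:
        steps += 1
        try:
            next(gen)
        except StopIteration as done:
            return done.value, steps


def execute(shape, plans, slowdown=10):
    """Reuse the remembered plan for this query shape, re-racing if it got slow."""
    cached = plan_cache.get(shape)
    if cached is not None:
        name, expected = cached
        result, steps = run_alone(plans[name])
        if steps <= expected * slowdown:
            return name, result
        # The remembered plan is now much slower than when it won (for example,
        # the data changed), so fall through and try every plan again.
    name, result, rounds = race(plans)
    plan_cache[shape] = (name, rounds)
    return name, result
```

For example, with a 50-step "collection scan" and a 5-step "index" plan, the first `execute` call races both and caches the index plan. Later calls reuse that plan until it degrades past the threshold, and then the race runs again.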

Original title and link: Notes from the MongoBerlin Conference (NoSQL databases © myNoSQL)


Presentations from NOSQL Afternoon in Japan

The Palo Alto NoSQL event was followed by one in Japan, called NOSQL Afternoon in Japan. Thanks to the ☞ Gemini Mobile Technology blog, I found links to the video recordings from the event:

  • ☞ Part 1: Opening Remarks, Hibari, Okuyama, Cassandra, ROMA, MyCassandra
  • ☞ Part 2: MongoDB, kumofs
  • ☞ Part 3: CouchDB, HBase/Hadoop, Closing Remarks.

Quite a lot to watch over the weekend!

Original title and link: Presentations from NOSQL Afternoon in Japan (NoSQL databases © myNoSQL)

Thoughts from NoSQL Evening in Palo Alto

I should start by saying that I love new technology as long as it proves useful and actually works. For example, I used LiveScribe to record the great NoSQL Evening in Palo Alto event organized by Tim Anglade with InfiniteGraph support, only to discover later that it had lost everything. So instead of being able to quote things from the event, I’ll have to rely on my memory, which is really, really bad.

After Berlin Buzzwords, this was the largest NoSQL event I’ve attended. With support from InfiniteGraph’s people, Tim Anglade did a great job organizing this event and gathered quite a few leaders of the NoSQL market together in a panel. Unfortunately, there were a few notable absences too: Redis, HBase, Project Voldemort, Neo4j, and RavenDB are the ones I missed. Anyway, knowing how difficult it is to put something like this together, this is understandable.

Before getting to what I remember from the event, I have to say that I was impressed that InfiniteGraph did not push their product during the event. They were great hosts, and I had enjoyable discussions with many of their people, especially Darren Wood, the lead architect.

Now it is time to test how bad my memory is. In case I got things wrong, please feel free to correct me.

What triggered so much activity in the persistence space?

  • cloud computing
  • failings do lead to innovation
  • the changing nature of the applications
  • old ideas recurring in customer work

Simplicity has different meanings for different people

NoSQL databases try to be:

  • developer friendly
  • ops friendly
  • user friendly

As a side note, it looks like there is still a myth out there that NoSQL databases don’t need DBAs. While the title doesn’t need to be the same, every NoSQL database will need someone…

What is the market size?

The relational database market is estimated at $27 billion. No one wanted to go on record with an estimate of the NoSQL database market. The only number I heard mentioned was that “70% of use cases can fit NoSQL solutions” (Roger Bodamer, 10gen).

(New) Data models

The data model determines the access model. This discussion continued over dinner, when people tried to answer the question of how tightly SQL is tied to the relational model.

I also seem to remember some interesting remarks about indexes:

  • indexes are a different data model that enable different access models
  • indexes will be orders of magnitude larger than real data
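To make the first remark concrete, here is a small Python sketch with hypothetical data and names. The primary store answers only lookups by key. A secondary index is a second data structure that maps a field value to keys, and that structure adds an access path the primary model does not have.

```python
from collections import defaultdict

# Primary data model: key -> record. The only cheap access path is by key.
users = {
    "u1": {"name": "Ann", "city": "Berlin"},
    "u2": {"name": "Bob", "city": "Palo Alto"},
    "u3": {"name": "Cy", "city": "Berlin"},
}


def build_index(records, field):
    """Secondary index: field value -> sorted list of keys (a second data model)."""
    index = defaultdict(list)
    for key, record in records.items():
        index[record[field]].append(key)
    return {value: sorted(keys) for value, keys in index.items()}


by_city = build_index(users, "city")
# New access path: "who lives in Berlin?" answered without scanning every record.
berlin = [users[k]["name"] for k in by_city.get("Berlin", [])]
```

Each additional index is another copy of (part of) the data arranged for a different query. This also hints at why the second remark about index size is plausible.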

RAM is the new disk

There are many products out there that treat RAM as the new disk (e.g. VoltDB and elastic caching solutions). Darren Wood (InfiniteGraph) mentioned that “graph analytics is a counter-example of using RAM storage for all data”.


Many NoSQL databases have chosen some form of open source licensing. The reasons for doing so:

  • ease market penetration
  • there’ll always be companies willing to pay for software, support, streams of patches, etc.
  • open source gives great hiring resources

Some other disparate notes:

  • Many of the solutions are old, but the wrapping/packaging is new. For example, MarkLogic is good proof that document databases work, even if it is not one of the “cool” products.
  • Not all NoSQL databases are about size
  • OLTP is for knowledge classification; OLAP is for knowledge discovery
  • Will we have multi-purpose NoSQL databases?
  • Cold data should be on disk
  • Using the right protocol can let you skip implementing specific features. CouchDB is HTTP-friendly, so it doesn’t need its own caching layer: standard HTTP caches can do that job
  • Are key-value stores offering too little compared to file systems?
  • SQLite has a great distribution model: it is basically everywhere.
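As a rough illustration of the CouchDB point, here is an in-process Python sketch of HTTP conditional GET. The server tags each document version with an `ETag`. A client, or any HTTP cache in between, revalidates with `If-None-Match` and gets a bodiless `304 Not Modified` when nothing has changed. The `Server` and `CachingClient` classes are hypothetical stand-ins, not CouchDB's API, and no network calls are made.

```python
class Server:
    """Hypothetical HTTP-friendly store: each document version gets an ETag."""

    def __init__(self):
        self.docs = {}  # doc id -> (revision number, body)

    def put(self, doc_id, body):
        rev = self.docs.get(doc_id, (0, None))[0] + 1
        self.docs[doc_id] = (rev, body)

    def get(self, doc_id, if_none_match=None):
        rev, body = self.docs[doc_id]
        etag = f'"{rev}"'
        if if_none_match == etag:
            return 304, etag, None  # Not Modified: no body is sent
        return 200, etag, body


class CachingClient:
    """Client-side cache that revalidates with If-None-Match on every read."""

    def __init__(self, server):
        self.server = server
        self.cache = {}  # doc id -> (etag, body)
        self.bodies_downloaded = 0

    def get(self, doc_id):
        cached = self.cache.get(doc_id)
        status, etag, body = self.server.get(doc_id, cached[0] if cached else None)
        if status == 304:
            return cached[1]
        self.bodies_downloaded += 1
        self.cache[doc_id] = (etag, body)
        return body
```

The store only has to produce a validator and honor `If-None-Match`. The caching logic itself can sit anywhere along the HTTP path.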

For more accurate coverage of the event you can read:

Update: now we’ve also got the video!

Original title and link: Thoughts from NoSQL Evening in Palo Alto (NoSQL databases © myNoSQL)

NoSQL at Strange Loop

The ☞ Strange Loop conference hosted two NoSQL talks: ☞ Steve Smith on Real world modeling with MongoDB (PDF) and ☞ Billy Newport on Enterprise NoSQL: Silver Bullet or Poison Pill?.

The first one covered the practical side of NoSQL, while the second offered an “enterprisey” perspective on it. Victor Olteanu has a ☞ long post summarizing the talks at the conference, including these two.

Speaking of modeling with MongoDB, here is another slide deck on the subject:

Update: I stand corrected; there were many more NoSQL talks at Strange Loop.


Riak from Small to Large

There’s also a video of Rusty Klophaus giving this presentation at Berlin Buzzwords.

Working with Dimensional data in Distributed Hash Tables

Unifying the Search Engine and NoSQL DBMS with a Universal Index

Chris Biow’s slides also available ☞ as PDF.

There were four more that I couldn’t track down:

HyperGraphDB - Data Management for Complex Systems

Borislav Iordanov’s slides available ☞ here (pdf)

NoSQL At Twitter

Kevin Weil’s slides available ☞ here (pdf)

Adopting Apache Cassandra

Eben Hewitt’s slides available ☞ here (pdf)

Scaling with MongoDB

Roger Bodamer’s slides available ☞ here.

Original title and link: NoSQL at Strange Loop (NoSQL databases © myNoSQL)

Hadoop World in Tweets

I still cannot figure out how I managed to miss Hadoop World. All I have left (besides being pissed off) is following the tweets coming from the NoSQL event.

Topics covered so far:

  • Big Data
  • Hadoop security
  • HBase
  • Case studies
    • eBay
    • Facebook: Hadoop, HBase, Hive, Scribe
    • Twitter: Hadoop, Scribe, Oozie, Pig
    • Hadoop at Chicago Mercantile Exchange
    • HP
    • comScore
    • StumbleUpon

Check out my curated Hadoop World stream.

NoSQL Frankfurt: A Quick Review of the Conference

Yesterday was the NoSQL Frankfurt conference and today we have the chance to review some of the slide decks presented.

Beyond NoSQL with MarkLogic and The Universal Index

Nuno Job (@dscape) presented on MarkLogic (an XML server we haven’t talked about much), its universal index, and a couple of other interesting features.

The GraphDB Landscape and sones

Achim Friedland (@ahzf) provided a very interesting overview of graph database products, the goals of and some scenarios for graph databases, a brief comparison of property graphs with other models (relational databases, object-oriented, semantic web/RDF), and many other interesting aspects.

Data Modeling with Cassandra Column Families

Gary Dusbabek (@gdusbabek) covered data modeling with Cassandra (a topic I still find one of the most complicated).

Neo4j Spatial - GIS for the rest of us

Peter Neubauer (@peterneubauer) covered another interesting topic in the data space: geographic information (GIS) in graph databases.

Even though GIS practitioners suggested this integration some time ago, Neo4j only recently announced support for geographic data.

Cassandra vs Redis

Tim Lossen’s (@tlossen) slides compare Cassandra and Redis from the perspective of a Facebook game’s requirements. All I can say is that the conclusion is definitely interesting, but you’ll have to check the slides yourselves.

Mastering Massive Data Volumes with Hypertable

Doug Judd, who impressed me with his fantastic Hypertable: The Ultimate Scaling Machine talk at the Berlin Buzzwords NoSQL conference, gave a talk on Hypertable, its architecture, and its performance. The presentation also mentioned two Hypertable case studies: Zvents (an analytics platform) and one on spam classification[1].

More presentations will be added as I’m receiving them.

  1. Just recently I posted about Hadoop being used for spam detection.

Original title and link: NoSQL Frankfurt: A Quick Review of the Conference (NoSQL databases © myNoSQL)

Hypertable: The Ultimate Scaling Machine

Fantastic presentation by Doug Judd covering not only Hypertable but also other really scalable NoSQL databases:

The session was recorded at the Berlin Buzzwords conference. Here is the list of my favorite presentations from the event.

Original title and link for this post: Hypertable: The Ultimate Scaling Machine (published on the NoSQL blog: myNoSQL)

Cassandra Summit Through The Eyes of an HBase Emeriti Committer

Bryan Duxbury summarizing the Cassandra Summit (videos and slides are available here):

Jonathan Ellis’s “state of the union” talk was interesting for a variety of reasons. The struggles they’re having with hinted handoff seems to be one of the classic symptoms of trying to build a project around someone else’s whitepaper – it’s good to hear that they’re starting to overcome the difficulties and actually get a good feature into play. I was also really pleased to see that we’d managed to take care of two out of three of Cassandra’s chief complaints about Thrift. (Cassandra’s near-switch to Avro has me nervous.)
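Hinted handoff, mentioned in the quote, is the technique from Amazon's Dynamo paper for handling writes aimed at a replica that is down. Another node keeps the write together with a "hint" naming the intended replica, then replays it when that replica recovers. Below is a heavily simplified Python sketch. The node names, the CRC32-based ring placement, and storing hints on the coordinator are assumptions made for illustration. Real Cassandra also has to deal with consistency levels, hint expiry, and more.

```python
import zlib


class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}   # key -> value
        self.hints = []  # (intended replica, key, value) held for down nodes


class Cluster:
    def __init__(self, names, replication_factor=2):
        self.nodes = {name: Node(name) for name in names}
        self.ring = list(names)
        self.rf = replication_factor

    def replicas_for(self, key):
        # Deterministic placement (assumption): hash picks a start, then walk the ring.
        start = zlib.crc32(key.encode()) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)] for i in range(self.rf)]

    def write(self, coordinator, key, value):
        coord = self.nodes[coordinator]
        for name in self.replicas_for(key):
            replica = self.nodes[name]
            if replica.up:
                replica.data[key] = value
            else:
                # Replica is down: the coordinator keeps a hint for it.
                coord.hints.append((name, key, value))

    def recover(self, name):
        """Bring a node back and replay every hint addressed to it."""
        target = self.nodes[name]
        target.up = True
        for node in self.nodes.values():
            pending = [h for h in node.hints if h[0] == name]
            node.hints = [h for h in node.hints if h[0] != name]
            for _, key, value in pending:
                target.data[key] = value
```

Even this toy version exposes some of the hard parts: where hints live, what happens if the hint holder fails too, and how long hints are kept. These are the kinds of questions that make the feature harder to get right than a whitepaper suggests.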

Original title and link for this post: Cassandra Summit Through The Eyes of a HBase Emeriti Committer (published on the NoSQL blog: myNoSQL)


Riptano Publishes Videos and Slides from Cassandra Summit

Riptano, the company offering services for Cassandra, has posted links to videos and slide decks from the Cassandra Summit: 8 videos and 9 slide decks from speakers like Jonathan Ellis (Riptano, Cassandra), Stu Hood (Rackspace), Gary Dusbabek (Rackspace), Kelvin Kakugawa (Digg), and Noah Silas and John Watson (Mahalo). Their companies are probably the best-known Cassandra users.

I haven’t had the time to watch them myself, so please do let us know which ones are the must see.

Riptano Published Videos and Slides from Cassandra Summit originally posted on the NoSQL blog: myNoSQL