NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



Erlang: All content tagged as Erlang in NoSQL databases and polyglot persistence

Why Membase Uses Erlang

On what makes Erlang one of the best environments for building distributed systems:

At the end of the day, the real question isn’t whether it would have been possible for us to implement our cluster management in another language; it’s really a question of effort and maintainability of the result. With any other environment, we would have had to reimplement at least part of what Erlang/OTP provides, while we haven’t really found ourselves reimplementing features provided by any other environment.

A NoSQL database per language ranking would look like:

  • C: 2 Redis, Tokyo Cabinet
  • C++: 3 Hypertable, MongoDB, Kyoto Cabinet
  • C#: 2 RavenDB, sones GraphDB
  • Erlang: 3 4 CouchDB, Membase, Riak, Hibari
  • Java: 8 Cassandra, Hadoop, HBase, OrientDB, Terrastore, Project Voldemort, Neo4j, Hypergraph

Original title and link: Why Membase Uses Erlang (NoSQL databases © myNoSQL)


Phoebus: Erlang-based Implementation of Google’s Pregel

Chad DePue about Phoebus, the first (?) open source implementation of Google’s Pregel algorithm:

Essentially, Phoebus makes calculating data for each vertex and edge in parallel possible on a cluster of nodes. Makes me wish I had a massively large graph to test it with.

Developed by Arun Suresh (Yahoo!), the project ☞ page includes a bullet description of the Pregel computational model:

  • A Graph is partitioned into a groups of Records.
  • A Record consists of a Vertex and its outgoing Edges (An Edge is a Tuple consisting of the edge weight and the target vertex name).
  • A User specifies a ‘Compute’ function that is applied to each Record.
  • Computation on the graph happens in a sequence of incremental Super Steps.
  • At each Super step, the Compute function is applied to all ‘active’ vertices of the graph.
  • Vertices communicate with each other via Message Passing.
  • The Compute function is provided with the Vertex record and all Messages sent to the Vertex in the previous SuperStep.
  • A Compute funtion can:
    • Mutate the value associated to a vertex
    • Add/Remove outgoing edges.
    • Mutate Edge weight
    • Send a Message to any other vertex in the graph.
    • Change state of the vertex from ‘active’ to ‘hold’.
  • At the begining of each SuperStep, if there are no more active vertices -and- if there are no messages to be sent to any vertex, the algorithm terminates.
  • A User may additionally specify a ‘MaxSteps’ to stop the algorithm after a some number of super steps.
  • A User may additionally specify a ‘Combine’ funtion that is applied to the all the Messages targetted at a Vertex before the Compute function is applied to it.

While it sounds similar to mapreduce, Pregel is optimized for graph operations, by reducing I/O, ensuring data locality, but also preserving processing state between phases.

Original title and link: Phoebus: Erlang-based Implementation of Google’s Pregel (NoSQL databases © myNoSQL)

What is Riak?

From Basho’s blog:

Riak is:

  • A Database
  • A Data Store
  • A key/value store
  • Used by Fortune 100 Companies
  • Used by startups
  • A “NoSQL” database
  • Schemaless and data-type agnostic
  • Written (primarily) in Erlang
  • As distributed as you want and need it to be
  • Scalable
  • Pronounced “REE-ack”
  • Not the best fit for every project and application
  • And much, much more…

So far I’ve heard only about Riak and Mozilla, Riak at, and this atypical Riak usage for a church kiosks, but no mentions of Fortune 100 company names. Anyone knows who are they referring to?

Update: please check the comment thread for more details. It looks like the Fortune 100 company Basho is referring to is Comcast.

Original title and link for this post: What is Riak? (published on the NoSQL blog: myNoSQL)


Riak on Erjang

No, it’s not a typo. Erjang is a port of Erlang on the JVM[1] created by Kresten Krab Thorup. Now his trying to run Riak on top of it:

I’ve spent some time trying to get Riak up and running on Erjang, and here’s a note on current status. We’re very early, but it is basically “up and humping”.

One remark though, Erlang doesn’t feel like a platform that needs the JVM. But still cool!


All Erlang: Riak and Mnesia

Rusty Klophaus (@rklophaus) published a ☞ fantastic recap of the Erlang Factory London event. There were two parts that caught my attention summarizing Justin Sheehy’s presentation on Riak architecture and Ulf Wiger’s presentation on Mnesia.

Riak architecture:

There are eight distinct layers involved in reading/writing Riak data:

  • The Client Application using Riak
  • The client-side HTTP API or Protocol Buffers API that talks to the Riak cluster
  • The server-side Riak Client containing the combined backing code for both APIs
  • The Dynamo Model FSMs that interact with nodes using Dynamo style quorum behavior and conflict resolution
  • Riak Core provides the fundamental distribution of the system (not covered in the talk)
  • The VNode Master that runs on every physical node, and coordinates incoming interaction with individual VNodes
  • Individual VNodes (Virtual Nodes) which are treated as lightweight local abstractions over K/V storage
  • The swappable Storage Engine that persists data to disk

☞ Riak from the Inside

Mnesia and NoSQL

  • Deployed commercially for over 10 years
  • Comparable performance to current top performers clustered SQL space
  • Scalable to 50 nodes
  • Distributed transactions with loose time limits (in other words, appropriate for transactions across remote clusters)
  • Built-in support for sharding (fragments)
  • Incremental backup

The downsides are:

  • Erlang only interface
  • Tables limited to 2GB
  • Deadlock prevention scales poorly
  • Network partitions are not automatically handled, must recombine tables automatically

☞ Mnesia for the CAPper

CouchDB, the document database built on Erlang, was also present at the event, but I couldn’t find a report about the talk or the slides.

Tokyo Cabinet and CouchDB as Mnesia backends

If you are somehow familiar with Erlang you already know that Mnesia is a distributed database system that was designed with the following goals in mind:

  • Fast real-time key/value lookup
  • Complicated non real-time queries mainly for operation and maintenance
  • Distributed data due to distributed applications
  • High fault tolerance
  • Dynamic re-configuration
  • Complex objects

Even if the presentation is not so great (see below), Rickard Cardell’s experiments of using Tokyo Cabinet and CouchDB as Mnesia backends sound like a new and interesting usecase for NoSQL solutions. Nitrogen, Riak, Erlang

Not sure there was a need for another online presentation tool, but is interesting as it is using Riak and the source code was made available on ☞ GitHub.

To get a better idea about SlideBast, you can watch Rusty Klophaus, creator of SlideBlast, talking about Nitrogen and Riak By Example. The video is embedded below and slides can be accessed ☞ here.


Riak with Embedded Erlang

The guys from Basho, producers of Riak, have just posted about a new way of running Riak running in a fully self-contained embedded node environment.

In case you are wondering what’s so interesting about this, the video below will walk you from Riak’s compilation to the different benefits of running Riak in this mode. The one that sounded important to me is that now thanks to the lack of dependencies starting more nodes (note: on similar hardware) will be much easier. And as I was writing in The New Dimension of NoSQL Scalability: Complexity, reducing operational complexity is very important for the NoSQL systems.