NoSQL

This page is a brief introduction to NoSQL. It offers a set of definitions of the term NoSQL and of NoSQL databases, and explains the reasons behind them.

For guides and tutorials on getting started with NoSQL databases, check out the NoSQL: Guides, Tutorials, Books, Papers page. If you are interested in finding out who is using NoSQL solutions, check the Powered by NoSQL reference. Then choose your NoSQL database and NoSQL library.

What is NoSQL?

A list of (possible) definitions for NoSQL (also referred to as NoSQL databases or NoSQL stores):

NoSQL is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases. These data stores may not require fixed table schemas, usually avoid join operations and typically scale horizontally. Academics and papers typically refer to these databases as structured storage.

Wikipedia

Non-relational next generation operational datastores and databases

Dwight Merriman, CEO 10gen

Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable.

nosql-databases.org

NoSQL is a term coined by Carlo Strozzi and repurposed by Eric Evans to refer to “some” storage systems. The term should be read as Not-Only-SQL, not as No to SQL or Never SQL.

NoSQL is about choice

NoSQL is not about any one feature of any of the projects. NoSQL is not about scaling, NoSQL is not about performance, NoSQL is not about hating SQL, NoSQL is not about ease of use, NoSQL is not about sharding, NoSQL is not about throughput, NoSQL is not about speed, NoSQL is not about dropping ACID, NoSQL is not about Eventual Consistency, NoSQL is not about CAP, NoSQL is not about open standards, NoSQL is not about Open Source and NoSQL is most likely not about whatever else you want NoSQL to be about. NoSQL is about choice

Jan Lehnardt, CouchDB

Why NoSQL?

  • Handling massive amounts of data
    • Exponential growth of newly created digital content
    • More value around data
    • Build value around data by connecting the dots
  • Connectedness
  • Information format
  • Data usage scenarios (plus open data)

Fundamental papers

NoSQL databases

Columnar Stores or Wide Column Stores

  • BigTable
  • Cassandra: a highly scalable second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model.
  • HBase
  • Hypertable

Document stores or Document databases

  • Colayer
  • CouchDB
  • FleetDB
  • Jackrabbit
  • Lotus Notes
  • MongoDB
  • OrientDB
  • Raven DB
  • ThruDB
  • Terrastore

Graph databases

  • AllegroGraph
  • Bigdata
  • Core Data
  • DEX: a high-performance graph database written in Java and C++. Its main characteristic is high-performance storage and retrieval of large graphs, on the order of billions of nodes, edges and attributes, implemented with specialized structures.
  • Filament
  • FlockDB
  • HyperGraphDB
  • InfiniteGraph
  • InfoGrid
  • Neo4j
  • OpenLink Virtuoso
  • Sones
  • VertexDB
  • Trinity: a graph database and computation platform over a distributed memory cloud. As a database, it provides features such as highly concurrent query processing, transactions, and consistency control. As a computation platform, it provides synchronous and asynchronous batch-mode computations on large-scale graphs.

Key-Value Stores

  • Amazon SimpleDB
  • Azure Table Storage
  • Berkeley DB
  • Chordless
  • Dynomite
  • GenieDB: GenieDB is designed to be a pragmatic solution to a widespread class of data storage problems, with a high-performance native API alongside compatibility with MySQL.
  • GT.M / M/DB
  • HamsterDB
  • Hibari: Hibari is a production-ready, distributed, key-value, big data store. Hibari uses chain replication for strong consistency, high-availability, and durability. Hibari has excellent performance especially for read and large value operations.
  • KAI
  • KaTree
  • Kumofs
  • LightCloud
  • Membase
  • Memcachedb
  • Mnesia
  • NorthScale
  • Orient Key/Value Server
  • Pincaster
  • PNUTS/Sherpa
  • Project Voldemort: LinkedIn’s open-source implementation of the Amazon Dynamo key-value store
  • Redis
  • Riak: Dynamo-inspired key/value store that scales predictably and easily.
  • Scalaris
  • ScalienDB / Scalien Keyspace: a distributed, consistent key-value store
  • Tokyo Cabinet

Multi-value databases

  • OpenQM
  • Rocket U2

Object databases

  • Db4o
  • GemStone/S
  • KiokuDB
  • InterSystems Caché
  • Neo
  • Objectivity/DB
  • Perst
  • Progress
  • Versant
  • ZODB

XML databases

  • Berkeley DB XML
  • EMC Documentum xDB
  • eXist
  • MarkLogic Server
  • Sausalito: Sausalito powers XQuery in the Cloud
  • Sedna
  • Tamino
  • Xindice

Unclassified

  • CloudKit
  • FluidDB
  • Moneta
  • Persevere

Cassandra

Project description
The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model.
Project home
cassandra.apache.org
Data model
wide column
Distribution model
masterless cluster (inspired by Amazon Dynamo)
Persistence model
Disk
Client/network protocol(s)
Custom
Elasticity
Yes
License
Apache
Implementation language/supported OS
Java
Any other exciting features
Fault tolerant, durable
Contributed by

DEX

Project description
DEX is a high-performance graph database written in Java and C++. Its main characteristic is high-performance storage and retrieval of large graphs, on the order of billions of nodes, edges and attributes, implemented with specialized structures.
Project home
sparsity-technologies.com
Data model

Labeled Directed Attributed Multigraph

Labeled
nodes and edges belong to types
Directed
directed edges
Attributed
Nodes and Edges with attributes
Multigraph
multiple edges between the same nodes even from the same edge type.
Distribution model
not distributed
Persistence model
Disk
Client/network protocol(s)
None
Elasticity
not applicable
License
Free evaluation version available at http://sparsity-technologies.com/dex_downloads.php (limited to 1 Million nodes, no restriction on edges, and one concurrent user session). Non-restricted version licensed by Sparsity-Technologies, more information at info@sparsity-technologies.com
Implementation language/supported OS
Java and C++ / Windows and Linux
Any other exciting features
Fast response times for complex queries, multiple graph algorithms, regular expression querying, large character objects for attributes, materialized neighbors, indexed or non-indexed attributes, CSV import/export, script loaders, and many more features.
Contributed by
Dàmaris Coll
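
The four properties of the DEX data model listed above (labeled, directed, attributed, multigraph) can be illustrated with a small in-memory sketch. This is not the DEX API: the `Node`, `Edge` and `Multigraph` names, and the example types and attributes, are hypothetical and chosen only to show what each property means.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Node:
    node_id: int
    node_type: str                                            # labeled: every node has a type
    attributes: Dict[str, Any] = field(default_factory=dict)  # attributed


@dataclass
class Edge:
    edge_id: int
    edge_type: str                                            # labeled: every edge has a type
    tail: int                                                 # directed: tail -> head
    head: int
    attributes: Dict[str, Any] = field(default_factory=dict)  # attributed


class Multigraph:
    """Hypothetical in-memory labeled directed attributed multigraph."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.edges: Dict[int, Edge] = {}

    def add_node(self, node_type: str, **attributes: Any) -> int:
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(node_id, node_type, attributes)
        return node_id

    def add_edge(self, edge_type: str, tail: int, head: int, **attributes: Any) -> int:
        if tail not in self.nodes or head not in self.nodes:
            raise KeyError("both endpoints must exist")
        # Multigraph: edges are keyed by their own id, so any number of
        # edges (even of the same type) may connect the same two nodes.
        edge_id = len(self.edges)
        self.edges[edge_id] = Edge(edge_id, edge_type, tail, head, attributes)
        return edge_id

    def edges_between(self, tail: int, head: int,
                      edge_type: Optional[str] = None) -> List[Edge]:
        return [e for e in self.edges.values()
                if e.tail == tail and e.head == head
                and (edge_type is None or e.edge_type == edge_type)]


g = Multigraph()
alice = g.add_node("person", name="Alice")
bob = g.add_node("person", name="Bob")
g.add_edge("emailed", alice, bob, year=2010)
g.add_edge("emailed", alice, bob, year=2011)   # second edge of the same type
print(len(g.edges_between(alice, bob, "emailed")))  # 2
print(len(g.edges_between(bob, alice)))             # 0, edges are directed
```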

GenieDB

Project description
GenieDB is designed to be a pragmatic solution to a widespread class of data storage problems, with a high-performance native API alongside compatibility with MySQL.
Data model
k/v, document, and column (we’re considering adding support for graph, too)
Distribution model
Masterless cluster, supporting geographically dispersed operation.
Persistence model
Disk
Client/network protocol(s)
Native C API library (accessible from PHP, etc.), or through MySQL
Elasticity
Sure, you can grow and shrink clusters on-line. We provide a full replica on every server, so you can scale read capacity this way, but you can only write as fast as the slowest server can keep up with, and store as much data as your smallest server.
License
Closed-source (for now)
Implementation language/supported OS
The core’s in C; the MySQL plugin is in C++ out of necessity. Primary development is on Linux, but a Solaris port exists, and BSD is in the pipeline.
Any other exciting features
We’ve aimed for pragmatism, so there are lots of little things that avoid failure cases or reduce administrative overhead, such as the write flow control system (to avoid snowballing replication queues), the ability to trade off performance, cost, and semantics at various levels, and so on.
Contributed by
Alaric Snell-Pym
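
The elasticity trade-off described in the GenieDB entry (a full replica on every server) comes down to simple arithmetic: read capacity adds up across servers, while write speed and storage are bounded by the weakest server. The sketch below works through that with made-up numbers. The `Server` type and `cluster_capacity` function are hypothetical, and summing read capacity is an idealization that ignores any coordination overhead.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Server:
    reads_per_sec: int
    writes_per_sec: int
    storage_gb: int


def cluster_capacity(servers: List[Server]) -> Dict[str, int]:
    """Capacity of a cluster in which every server holds a full replica."""
    if not servers:
        raise ValueError("cluster must contain at least one server")
    return {
        # Any replica can answer a read, so read capacity adds up.
        "reads_per_sec": sum(s.reads_per_sec for s in servers),
        # Every write must reach every replica: the slowest server sets the pace.
        "writes_per_sec": min(s.writes_per_sec for s in servers),
        # Every server stores everything: the smallest server sets the limit.
        "storage_gb": min(s.storage_gb for s in servers),
    }


cluster = [Server(10_000, 2_000, 500), Server(8_000, 1_500, 250)]
print(cluster_capacity(cluster))
# {'reads_per_sec': 18000, 'writes_per_sec': 1500, 'storage_gb': 250}
```

Adding a third server to this hypothetical cluster would raise the read figure but could only keep the write and storage figures the same or lower them.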

Hibari

Project description
Hibari is a production-ready, distributed, key-value, big data store. Hibari uses chain replication for strong consistency, high-availability, and durability. Hibari has excellent performance especially for read and large value operations.
Data model
key-value
Distribution model
Chain replication between data nodes; Cluster’s global hash managed by master/slave admin nodes
Persistence model
disk, disk+memory
Client/network protocol(s)
Client protocol
  • A native Erlang API, via Erlang’s native message-passing mechanism
  • Amazon S3 protocol, via HTTP
  • UBF, Joe Armstrong’s “Universal Binary Format” protocol, via TCP
  • UBF via several minor variations of TCP transport
  • UBF over JSON-RPC, via HTTP
  • JSON-encoded UBF, via TCP
Protocols under development:
  • Memcached, via TCP
  • UBF over Thrift, via TCP
  • UBF over Protocol Buffers, via TCP
Elasticity
?
License
Apache License, Version 2.0
Implementation language/supported OS
Erlang/OTP. Red Hat, CentOS, and Fedora Linux distributions.
Any other exciting features
Chain Replication
Contributed by
Shinya Motohashi
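
Chain replication, which the Hibari entry names as its route to strong consistency, is commonly described as follows: a write enters at the head of a chain of nodes and is forwarded node by node to the tail, and reads are served by the tail, so a read only ever sees writes that every node in the chain has applied. The sketch below illustrates that general scheme in a single process. It is not Hibari's code or API; `ChainNode` and `Chain` are hypothetical names, and failure handling and chain reconfiguration are left out.

```python
from typing import Dict, List, Optional


class ChainNode:
    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, str] = {}
        self.successor: Optional["ChainNode"] = None

    def write(self, key: str, value: str) -> str:
        """Apply the write locally, then pass it down the chain.

        Returns the name of the node that finally acknowledged it (the tail).
        """
        self.store[key] = value
        if self.successor is None:
            return self.name
        return self.successor.write(key, value)


class Chain:
    def __init__(self, names: List[str]) -> None:
        if not names:
            raise ValueError("a chain needs at least one node")
        self.nodes = [ChainNode(n) for n in names]
        for node, nxt in zip(self.nodes, self.nodes[1:]):
            node.successor = nxt

    def put(self, key: str, value: str) -> str:
        # Writes always enter at the head.
        return self.nodes[0].write(key, value)

    def get(self, key: str) -> Optional[str]:
        # Reads are served by the tail, which has only fully replicated writes.
        return self.nodes[-1].store.get(key)


chain = Chain(["head", "middle", "tail"])
print(chain.put("user:1", "alice"))  # tail
print(chain.get("user:1"))           # alice
```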

Riak

Project description
Riak is a Dynamo-inspired key/value store that scales predictably and easily. A truly fault-tolerant system, Riak has no single point of failure. No machines are special or central in Riak, so developers and operations professionals can decide exactly how fault-tolerant they want and need their applications to be.
Project home
basho.com
Data model
key-value
Distribution model
masterless cluster (inspired by Amazon Dynamo)
Persistence model
Disk (supports multiple persistence engines)
Client/network protocol(s)
HTTP, Protocol Buffers API
Elasticity
yes
License
Implementation language/supported OS
Erlang
Any other exciting features
Pre-commit hooks, post-commit hooks, Links

Project Voldemort

Project description
LinkedIn’s open-source implementation of the Amazon Dynamo key-value store
Project home
project-voldemort.com
Data model
key-value store
Distribution model
masterless cluster (inspired by Amazon Dynamo)
Persistence model
disk (pluggable storage engines)
Client/network protocol(s)
custom
Elasticity
License
Apache 2.0
Implementation language/supported OS
Java
Any other exciting features
Pluggable serialization, pluggable storage engines
Contributed by

Sausalito

Project description
Sausalito powers XQuery in the Cloud. It is an integrated database and application server designed to run on cloud infrastructures.
Project home
28msec.com
Data model
XML
Distribution model
Masterless cluster
Persistence model
Disk
Client/network protocol(s)
XQuery / HTTP
Elasticity
Auto Scaling + Elastic Load Balancing
License
Closed source
Implementation language/supported OS
MacOS X, Linux, Windows
Any other exciting features

Sausalito provides an integrated stack for building web applications. It leverages Amazon AWS to scale applications up and down. Applications are written entirely in XQuery. This language has the following benefits:

  • It is a unified framework for all tiers: database, application logic and presentation. This property makes a single-tiered application and database architecture possible.
  • It is a functional programming language which can automatically be optimized and parallelized. This property is particularly important in cloud computing infrastructures.
Contributed by
28msec Inc.

ScalienDB

Project description
ScalienDB (the successor of Keyspace) is a distributed, consistent key-value store. You can define quorums: sets of ScalienDB nodes that consistently replicate data among themselves. You can define an arbitrary number of databases and tables. Data stored in a table is automatically partitioned into shards, and shards can be assigned to quorums. Access to tables is coordinated by a set of controllers, so there is no single point of failure. During development, specific effort has been made to ensure that data is not lost no matter what happens, as long as at least one replica of the data remains accessible.
Project home
scalien.com
Data model
key-value
Distribution model
Paxos
Persistence model
Disk
Client/network protocol(s)
Custom
Elasticity
Nodes can be added to or removed from the system on the fly.
License
AGPL (server), BSD (client)
Implementation language/supported OS
ScalienDB is written in C++, with client interfaces for C, C++, Java, Python, C# and PHP. ScalienDB can run on Linux, Windows and OS X.
Any other exciting features
Web interface for administration, various consistency levels
Contributed by
Peter Schonhofen
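
The ScalienDB description relates three concepts: tables are partitioned into shards, shards are assigned to quorums, and a quorum is a set of nodes replicating the same data. The sketch below shows one way such a mapping could be modeled. It is not ScalienDB's client API: the `Quorum` and `Table` names are hypothetical, and partitioning by key range is an assumption made for illustration, since the description does not say how shards are cut.

```python
import bisect
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Quorum:
    name: str
    nodes: FrozenSet[str]   # nodes that replicate the same data


class Table:
    """A table split into key-range shards, each assigned to one quorum."""

    def __init__(self, split_keys: List[str], quorums: List[Quorum]) -> None:
        # N sorted split keys define N + 1 shards:
        # keys < s0, s0 <= keys < s1, ..., keys >= s(N-1).
        if split_keys != sorted(split_keys):
            raise ValueError("split keys must be sorted")
        if len(quorums) != len(split_keys) + 1:
            raise ValueError("need exactly one quorum per shard")
        self.split_keys = split_keys
        self.shard_quorums = quorums

    def shard_for(self, key: str) -> int:
        return bisect.bisect_right(self.split_keys, key)

    def replicas_for(self, key: str) -> FrozenSet[str]:
        return self.shard_quorums[self.shard_for(key)].nodes


q1 = Quorum("q1", frozenset({"node-a", "node-b", "node-c"}))
q2 = Quorum("q2", frozenset({"node-d", "node-e", "node-f"}))
# Three shards; the first and last are both assigned to quorum q1.
users = Table(split_keys=["h", "p"], quorums=[q1, q2, q1])
print(users.shard_for("kiwi"))              # 1
print(sorted(users.replicas_for("kiwi")))   # ['node-d', 'node-e', 'node-f']
```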

Trinity

Project description
Trinity is a graph database and computation platform over a distributed memory cloud. As a database, it provides features such as highly concurrent query processing, transactions, and consistency control. As a computation platform, it provides synchronous and asynchronous batch-mode computations on large-scale graphs. Trinity can be deployed on one machine or hundreds of machines.
Project home
research.microsoft.com
Data model
graph database, hypergraph
Distribution model
one machine or hundreds of machines
Persistence model
memory-based graph store
Client/network protocol(s)
Elasticity
License
N/A
Implementation language/supported OS
N/A
Any other exciting features
Trinity supports large-scale, offline batch processing. Both synchronous and asynchronous batch computation are supported.

Please remember that your submission should include at least one of the following points

Project description
a short one-liner description of the project
Project home
Data model
k/v, document, column, graph, xml, object, etc.
Distribution model
Single server, master/slave, p2p replication, masterless cluster, etc.
Persistence model
Disk, memory, memory with snapshotting, etc.
Client/network protocol(s)
Elasticity
License
Implementation language/supported OS
Any other exciting features
Contributed by