NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



Thoughts About Datomic

Sergio Bossa has left a great comment on Fogus’s blog post about Datomic that encapsulates in much more detail all my notes (and some more) about Datomic:

I waited for the Datomic announcement with great excitement, and I’d like now to share some thoughts, hoping they will be food for more comments or blog posts.

Datomic certainly provides interesting features, most notably:

  1. Clojure-style data immutability, separating entity values in time.
  2. Declarative query language with powerful aggregation capabilities.

But unfortunately, my list of concerns is way longer, maybe because some lower level aspects weren’t addressed in the whitepaper, or maybe because my expectations were really too high. Let’s try to briefly enumerate the most relevant ones:

  1. Datomic provides powerful aggregation/processing capabilities, but violates one of the most important rules in distributed systems: collocating processing with data, as data must be moved from storage to peers’ working set in order to be aggregated/processed. In my experience, this is a huge penalty when dealing with even medium-sized datasets, and just answering that “we expect it to work for most common use cases” isn’t enough.

    My comment: The answer to similar comments pointed to the local caches. But I think it is still a very valid observation.

  2. In-process caching of working sets usually leads in my experience to compromising overall application reliability: that is, the application usually ends up spending lots of time dealing with the working set cache, either faulting/flushing objects or gc’ing them, rather than doing its own business.

  3. Transactors are both a Single Point Of Bottleneck and Single Point Of Failure: you may don’t care about the former (which I’d do btw), but you have to care about the latter.

    My comment: The Datomic paper contains an interesting formulation about the job of transactors for reads and writes:

    When reads are separated from writes, writes are never held up by queries. In the Datomic architecture, the transactor is dedicated to transactions, and need not service reads at all!

    In an ACID system, both reads and writes represent transactions though.

  4. You say you avoid sharding, but being transactors a single point of bottleneck, when the time you have too much data over a single transactor system will come, you’ll have to, guess what, shard, and Datomic has no support for this apparently.

  5. There’s no mention about how Datomic deals with network partitions.

I think that’s enough. I’ll be happy to read any feedback about my points.

As Sergio Bossa, I’d really love to hear some answers from the Datomic team.

Original title and link: Thoughts About Datomic (NoSQL database©myNoSQL)