clojure: All content tagged as clojure in NoSQL databases and polyglot persistence
If you are running out of interesting projects to experiment with during this seasonal break, Parkour is a Clojure library for writing MapReduce jobs.
Parkour is our new Clojure library that carries this philosophy to Apache Hadoop's MapReduce platform. Instead of hiding the underlying MapReduce model behind new framework abstractions, Parkour exposes that model through a clear, direct interface. Everything possible in raw Java MapReduce is possible with Parkour, but usually with a fraction of the code.
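Parkour tasks are expressed as ordinary Clojure functions over collections of input values. As a rough illustration of that shape, here is a plain-Clojure sketch of word-count mapper logic; it deliberately uses no Parkour API, and the function name is my own, not Parkour's:

```clojure
(require '[clojure.string :as str])

;; A map-side function for word count: take a collection of text lines
;; and emit [word 1] key/value pairs. In Parkour's model, a function of
;; this shape would run as the map task, with the framework supplying
;; the input collection and consuming the emitted pairs.
(defn word-count-mapper
  "Turn a collection of lines into a sequence of [word 1] pairs."
  [lines]
  (->> lines
       (mapcat #(str/split % #"\s+"))
       (map (fn [w] [w 1]))))

(word-count-mapper ["hello world" "hello parkour"])
;; => (["hello" 1] ["world" 1] ["hello" 1] ["parkour" 1])
```

Because the logic is just a function over seqs, it can be tested at the REPL without any Hadoop machinery, which is the point of keeping the MapReduce model visible instead of wrapping it.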
Original title and link: Parkour - Idiomatic Clojure for Map Reduce ( ©myNoSQL)
I waited for the Datomic announcement with great excitement, and I’d like now to share some thoughts, hoping they will be food for more comments or blog posts.
Datomic certainly provides interesting features, most notably:
- Clojure-style data immutability, separating entity values in time.
- Declarative query language with powerful aggregation capabilities.
But unfortunately, my list of concerns is way longer, maybe because some lower level aspects weren’t addressed in the whitepaper, or maybe because my expectations were really too high. Let’s try to briefly enumerate the most relevant ones:
Datomic provides powerful aggregation/processing capabilities, but violates one of the most important rules in distributed systems: colocating processing with data. Data must be moved from storage into the peers' working sets before it can be aggregated or processed. In my experience this is a huge penalty even with medium-sized datasets, and just answering that "we expect it to work for most common use cases" isn't enough.
My comment: The answer to similar comments pointed to the local caches. But I think it is still a very valid observation.
In my experience, in-process caching of working sets usually ends up compromising overall application reliability: the application spends much of its time managing the working-set cache, faulting/flushing objects or GC'ing them, rather than doing its own business.
Transactors are both a Single Point of Bottleneck and a Single Point of Failure: you may not care about the former (though I would), but you have to care about the latter.
My comment: The Datomic paper contains an interesting formulation about the job of transactors for reads and writes:
When reads are separated from writes, writes are never held up by queries. In the Datomic architecture, the transactor is dedicated to transactions, and need not service reads at all!
In an ACID system, both reads and writes represent transactions though.
You say you avoid sharding, but with the transactor as a single point of bottleneck, the time will come when you have too much data for a single-transactor system, and then you'll have to, guess what, shard; and Datomic apparently has no support for this.
There's no mention of how Datomic deals with network partitions.
I think that’s enough. I’ll be happy to read any feedback about my points.
Like Sergio Bossa, I'd really love to hear some answers from the Datomic team.
Original title and link: Thoughts About Datomic ( ©myNoSQL)
After saying that MongoDB's default fire-and-forget behavior is wrong, the CouchDB community welcomed this sample Clojure code achieving 5,500 inserts/second with fire-and-forget writes and bulk inserts:
So I contemplated the problem some and wondered whether Clojure’s STM (Software Transactional Memory) could be leveraged. As requests come in, instead of connecting immediately to the database, why not queue them up until we have an optimal number and then do a bulk insert?
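The batching idea can be sketched with Clojure's STM: buffer incoming documents in a `ref`, and when the buffer reaches a threshold, atomically swap out the whole batch and perform a single bulk insert. This is a minimal sketch of the technique, not the original post's code; the names (`pending`, `batch-size`, `bulk-insert!`) and the threshold value are mine:

```clojure
(def batch-size 500)

;; STM-managed buffer of documents awaiting a bulk insert.
(def pending (ref []))

(defn bulk-insert!
  "Placeholder for one bulk write to the database (e.g. a single
  _bulk_docs request in CouchDB's case)."
  [docs]
  (println "bulk inserting" (count docs) "documents"))

(defn enqueue!
  "Add one document to the buffer. When the buffer is full, atomically
  take the entire batch and reset the buffer, then bulk insert the
  batch outside the transaction."
  [doc]
  (let [batch (dosync
                (alter pending conj doc)
                (when (>= (count @pending) batch-size)
                  (let [b @pending]
                    (ref-set pending [])
                    b)))]
    (when batch
      (bulk-insert! batch))))
```

Note that the I/O happens outside `dosync`: STM transactions may retry, so the side-effecting bulk insert must only run after the transaction has committed and handed back the batch.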