In two posts, the Tokutek guys explain how transactions work in TokuMX, the replacement engine they are proposing to MongoDB users (remember that Vadim Tkachenko of the “MySQL Performance Blog” called TokuMX the InnoDB for MongoDB?):
- For each statement that tries to modify a TokuMX collection, either the entire statement is applied, or none of the statement is applied. A statement is never partially applied.
- New commands, among them `rollbackTransaction`, have been added to allow users to perform multi-statement transactions.
- TokuMX queries use multi-version concurrency control (MVCC). That is, queries operate on a snapshot of the system that does not change for the duration of the query. Concurrent inserts, updates, and deletes do not affect query results (note this does not include file operations like removing a collection).
- cursors represent a true snapshot of the system
- simpler to batch inserts together for performance
- simpler for applications to update multiple documents with a single statement
- no need to combine documents together for the purpose of atomicity
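To make the snapshot and all-or-nothing behavior described above more concrete, here is a minimal in-memory sketch in Python. It is a toy model and not TokuMX code: the names `ToyMVCCStore`, `Snapshot`, and `Transaction` are hypothetical, and it assumes a single thread with no write-conflict detection.

```python
class ToyMVCCStore:
    """Toy versioned key-value store (hypothetical, not TokuMX internals)."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # timestamp of the most recent commit

    def snapshot(self):
        """Return a read-only view frozen at the current commit timestamp."""
        return Snapshot(self, self._clock)

    def begin(self):
        """Start a multi-statement transaction with a private write buffer."""
        return Transaction(self)

    def _read(self, key, ts):
        # Return the newest version committed at or before ts, if any.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None

    def _commit(self, writes):
        # All buffered writes become visible under one new timestamp.
        self._clock += 1
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((self._clock, value))


class Snapshot:
    def __init__(self, store, ts):
        self._store = store
        self._ts = ts

    def get(self, key):
        return self._store._read(key, self._ts)


class Transaction:
    def __init__(self, store):
        self._store = store
        self._writes = {}

    def put(self, key, value):
        # Buffered only; other readers cannot see this until commit().
        self._writes[key] = value

    def commit(self):
        self._store._commit(self._writes)
        self._writes = {}

    def rollback(self):
        # Discard every buffered write; nothing was ever applied.
        self._writes = {}
```

In this sketch, a snapshot taken before a commit keeps returning the old values, and a rolled-back transaction leaves no trace, which mirrors the guarantees listed above in spirit only.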
✚ I’d find TokuMX’s transactions even more interesting if they worked by default at the shard level instead of the cluster level. Users would then need to manually configure cluster-wide transactions, thus remaining in control of performance and availability.
✚ I still have my doubts about TokuMX’s positioning, but that’s a business & marketing story.
Original title and link: TokuMX transactions for MongoDB ( ©myNoSQL)
Ricon and QCon are the 2 conferences I most regret missing this fall. Luckily both of them are publishing videos of the talks, so I’m posting this as early as possible on Saturday morning so you’ll have a busy weekend.
- Michael Bernstein: Distributed Systems Archeology. I watched this one live, but I’ll rewatch it as it’s both extremely interesting and entertaining.
- Peter Bailis: Bad As I Wanna Be: Coordination and Consistency in Distributed Databases and Diego Ongaro: The Raft Consensus Algorithm. These are 2 guys we’ll (continue to) hear a lot from in the space of distributed systems. Even if their field is consistency and transactions.
- Jason Brown: Dynamic Dynamos: Comparing Riak and Cassandra. High availability? These are basically your options.
- Jordan West: Controlled Epidemics: Riak’s New Gossip Protocol and Metadata Store and Joseph Blomstedt: Bringing Consistency to Riak (Part 2). What Basho is working on.
- Jeff Hodges: Practicalities of Productionizing Distributed Systems. Experiences from the field.
I just wish, for my marriage’s sake, these had been available during the long Thanksgiving weekend.
Original title and link: Busy weekend ahead - Ricon West videos available ( ©myNoSQL)
For $2000, you can get TechNavio’s 40-page report about the NoSQL market:
TechNavio’s analysts forecast the Global NoSQL market to grow at a CAGR of 29.14% over the period 2012-2016.
The key vendors dominating this market space are 10gen Inc., Couchbase Inc., DataStax Inc., and Basho Technologies Inc.
That’s all you need to know.
✚ Actually, it might be useful to know that 10gen Inc. has changed its name to MongoDB Inc.
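For a rough sense of what the quoted 29.14% CAGR implies, the compounding arithmetic can be sketched as below. The helper names are hypothetical, and the sketch assumes 2012 is the base year and 2016 the final year, which gives 4 compounding periods; the report summary does not say how it counts them.

```python
def growth_multiple(cagr, periods):
    """Total growth factor after compounding `cagr` for `periods` years."""
    return (1.0 + cagr) ** periods


def implied_cagr(start_value, end_value, periods):
    """Inverse calculation: the constant yearly rate linking two values."""
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# Assumption: 2012 -> 2016 counted as 4 compounding periods.
multiple = growth_multiple(0.2914, 4)
print(f"Market size multiple over the period: {multiple:.2f}x")
```

Under that assumption the market would end the period at a bit under three times its starting size.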
Original title and link: TechNavio’s report: Global NoSQL market 2012-2016 ( ©myNoSQL)
Today’s dose of predictions for 2014 is coming from a panel with people from Alteryx, Cloudera, Tableau Software and Revolution Analytics:
- Analysts will matter more than data scientists
- R will replace legacy SAS solutions and go mainstream
- Big Data will bring its “A game” in sports marketing
- Hadoop moves from curiosity to critical
- Gartner’s prediction that the line-of-business will drive analytics spend will happen
- Visual analytics continues to grow but users need more
- Analysts’ lives get more complex, but also easier
- Predictive analytics will no longer be a specialist subject
- Customer analytics is the next big marketing role
- A new analytics stack will emerge
- Location meets big data analytics
- NoSQL meets analytics
Original title and link: 14 predictions about analytics in 2014 ( ©myNoSQL)
A presentation by Todd Eisenberger about the archival system used by Dropbox based on MySQL and HBase:
- fast queries for known keys over a (relatively) small dataset
- high read throughput
- high write throughput
- large suite of pre-existing tools for distributed computation
- easier to perform large processing tasks
✚ Both are consistent
✚ Most of the benefits in HBase’s section point in the direction of data processing benefits (and not data storage benefits)