ACID: All content tagged as ACID in NoSQL databases and polyglot persistence

Banks and the Eternal Consistency Example or What trumps consistency

Todd Hoff extracts and expands on some thoughts about BASE vs ACID from Eric Brewer’s NoSQL: Past, Present, Future published on InfoQ:

Consistency it turns out is not the Holy Grail. What trumps consistency is:

  • Auditing
  • Risk Management
  • Availability

But the cornerstone of the availability vs consistency conversation is:

Availability correlates with revenue and consistency generally does not.

✚ Over time Michael Stonebraker has been the most prominent supporter of exactly the opposite argument.

✚ Remember Emin Gün Sirer’s The NoSQL Partition Tolerance Myth? He used the bank example too.

Original title and link: Banks and the Eternal Consistency Example or What trumps consistency (NoSQL database©myNoSQL)

via: http://highscalability.com/blog/2013/5/1/myth-eric-brewer-on-why-banks-are-base-not-acid-availability.html


MongoDB Transactions With TokuDB's Fractal Tree Indexes Engine

An interesting new direction from the TokuDB team, which is bringing its storage engine based on Fractal Tree Indexes to MongoDB:

Running MongoDB with Fractal Tree Indexes (used today in the MySQL storage engine TokuDB) is fully transactional. Each statement is transactional. If an update is to modify ten rows, then either all rows are modified, or none are. Queries use multi-versioning concurrency control (MVCC) to return results from a snapshot of the system, thereby not being affected by write operations that may happen concurrently.
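
The "either all rows are modified, or none are" guarantee in the quote can be sketched in a few lines. This is only an illustration under stated assumptions: AtomicStatementSketch and updateAll are hypothetical names, the data lives in an in-memory map, and nothing here reflects how TokuDB or Fractal Tree Indexes actually work.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical in-memory stand-in for a collection; not TokuDB or MongoDB code.
class AtomicStatementSketch {
    private Map<String, Integer> rows = new HashMap<>();

    void put(String key, int value) { rows.put(key, value); }

    Integer get(String key) { return rows.get(key); }

    // Applies `change` to every row, or to none: the work happens on a private
    // copy that replaces the live data only if no row fails.
    boolean updateAll(UnaryOperator<Integer> change) {
        Map<String, Integer> working = new HashMap<>(rows);
        try {
            for (Map.Entry<String, Integer> e : working.entrySet()) {
                e.setValue(change.apply(e.getValue()));
            }
        } catch (RuntimeException failure) {
            return false; // discard the copy; the live rows are untouched
        }
        rows = working; // "commit": swap in the fully updated copy
        return true;
    }
}
```

If the change throws on the tenth row, the nine rows already changed exist only in the discarded copy, so readers never see a half-applied statement.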

Original title and link: MongoDB Transactions With TokuDB’s Fractal Tree Indexes Engine (NoSQL database©myNoSQL)

via: http://www.tokutek.com/2013/04/mongodb-transactions-yes/#gsc.tab=0


Introducing Highly Available Transactions: The Relationship Between CAP and ACID Transactions

Learning from Peter Bailis:

While the CAP Theorem is fairly well understood, the relationship between CAP and ACID transactions is not. If we consider the current lack of highly available systems providing arbitrary multi-object operations with ACID-like semantics, it appears that CAP and transactions are incompatible. This is partly due to the historical design of distributed database systems, which typically chose consistency over high availability. Standard database techniques like two-phase locking and multi-version concurrency control do not typically perform well in the event of partial failure, and the master-based (i.e., master-per-shard) and overlapping quorum-based techniques often adopted by many distributed database designs are similarly unavailable if users are partitioned from the anointed primary copies.

There’s also a paper (PDF) authored by Peter Bailis, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. These names should tell you something.

Original title and link: Introducing Highly Available Transactions: The Relationship Between CAP and ACID Transactions (NoSQL database©myNoSQL)

via: http://www.bailis.org/blog/hat-not-cap-introducing-highly-available-transactions/


Cloud Computing Lets Us Rethink How We Use Data

But not everything we do in a database needs guaranteed transactional consistency.

Imagine you are charged with designing a system to collect data on temperature, air flow and electricity use in a building every few minutes from hundreds of locations. The system will be used to make the building more energy efficient. Now imagine you lose a few data points every day.  The cause isn’t important but it could be a glitch with a sensor, a dropped packet, or an incomplete write operation in the database.

Do you care?

It depends on the angle from which I’m looking at the question. If I’m the producer of the sensor, I do care if it has a glitch. If I’m a network administrator, I do care that packets are dropped. And if I am a database system, I do care if I’m dropping write operations. I also have to tell whoever is using me whether I am able to receive operations: am I available when I’m needed?

Original title and link: Cloud Computing Lets Us Rethink How We Use Data (NoSQL database©myNoSQL)

via: http://www.tomsitpro.com/articles/mapreduce-hadoop-cloud_computing-acid-relational_database,1-165.html


ACID in HBase: Row Level Operations Explained. Plus Something New

Lars Hofhansl:

HBase employs a kind of MVCC. And HBase has no mixed read/write transactions. […] When a write transaction (a set of puts or deletes) starts it retrieves the next highest transaction number. In HBase this is called a WriteNumber. When a read transaction (a Scan or Get) starts it retrieves the transaction number of the last committed transaction. HBase calls this the ReadPoint.
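
The WriteNumber and ReadPoint mechanics in the quote can be modeled with a toy class. This is a sketch and not HBase source code: MvccSketch and its methods are hypothetical names, and it assumes that writes become visible in write-number order, so the read point never passes a write that is still in flight.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of the WriteNumber/ReadPoint idea; not HBase source code.
class MvccSketch {
    // One stored version of a value, tagged with the write that produced it.
    private static final class Version {
        final long writeNumber;
        final String value;

        Version(long writeNumber, String value) {
            this.writeNumber = writeNumber;
            this.value = value;
        }
    }

    private long nextWriteNumber = 1;
    private long readPoint = 0;
    private final TreeSet<Long> pending = new TreeSet<>();
    private final List<Version> versions = new ArrayList<>();

    // A write transaction starts by taking the next highest number.
    long beginWrite() {
        long n = nextWriteNumber++;
        pending.add(n);
        return n;
    }

    void write(long writeNumber, String value) {
        versions.add(new Version(writeNumber, value));
    }

    // Committing moves the read point up to just below the oldest write still
    // in flight (assumption: commits become visible in write-number order).
    void commit(long writeNumber) {
        pending.remove(writeNumber);
        readPoint = pending.isEmpty() ? nextWriteNumber - 1 : pending.first() - 1;
    }

    // A read transaction starts by taking the number of the last committed write.
    long beginRead() {
        return readPoint;
    }

    // Returns the newest value whose write number is visible at `atReadPoint`.
    String read(long atReadPoint) {
        String result = null;
        long best = -1;
        for (Version v : versions) {
            if (v.writeNumber <= atReadPoint && v.writeNumber > best) {
                best = v.writeNumber;
                result = v.value;
            }
        }
        return result;
    }
}
```

A reader that took its read point before a later write committed keeps seeing the older value, which is the snapshot behavior the quote describes.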

Understanding the behavior of read and write operations in HBase is definitely useful. Learning that an upcoming HBase version will support atomic multi operations (HBASE-3584) and even multi-row local transactions (HBASE-5229) is priceless.

For HBase atomic multi-operations:

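 // Names as used in the post, which predates the release of this feature.
 // Both mutations target the same row key, ROW.
 Delete d = new Delete(ROW);
 Put p = new Put(ROW);
 ...
 // Group the Put and the Delete so they are applied to the row atomically.
 AtomicRowMutation arm = new AtomicRowMutation(ROW);
 arm.add(p);
 arm.add(d);
 myHtable.mutateAtomically(arm);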

HBase multi-row local transactions are implemented as the mutateRowsWithLocks method in HRegion and can be used only by coprocessors (there is no client API).

Original title and link: ACID in HBase: Row Level Operations Explained. Plus Something New (NoSQL database©myNoSQL)

via: http://hadoop-hbase.blogspot.com/2012/03/acid-in-hbase.html?spref=tw


WhySQL: MySQL/InnoDB ACID Guarantees for Evernote

Dave Engberg has published on the Evernote Techblog a post explaining why the Atomicity, Consistency, and Durability characteristics of a single replicated MySQL/InnoDB deployment are essential to the way Evernote operates.

While it’s difficult to argue about a technical decision with so few details available, I still wanted to point out a couple of things:

  1. Atomicity: most NoSQL databases offer atomic operations at the level of a single record. It is multi-row atomic operations that distributed systems unwilling to rely on 2PC do not support.

    The example presented in the post does not require multi-row transactions, but rather guaranteed client operation ordering. This is achievable in most NoSQL databases.

  2. Consistency: the post talks about data consistency from the perspective of data integrity guarantees through usage of foreign keys.

    In the world of NoSQL similar behavior could be achieved by different data modeling solutions. Using Cassandra as an example for the notebook deletion scenario, one could store all the notes of a notebook in a single Cassandra row, thus making the delete operation safe.

    It’s also worth mentioning that many of the eventually consistent NoSQL databases offer read and write operations with different consistency levels.

  3. Durability: with just a few known exceptions, most NoSQL databases offer strong durability guarantees.
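
The data modeling idea from point 2, keeping all the notes of a notebook under a single row key so that deleting the notebook is a single-record operation, could look roughly like this. It is a hypothetical in-memory model (NotebookRowSketch and its methods are invented for illustration) and not Cassandra client code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of a wide-row layout; not Cassandra client code.
class NotebookRowSketch {
    // Row key = notebook id; columns = note id -> note body.
    private final Map<String, Map<String, String>> rows = new LinkedHashMap<>();

    void addNote(String notebookId, String noteId, String body) {
        rows.computeIfAbsent(notebookId, k -> new LinkedHashMap<>()).put(noteId, body);
    }

    // Deleting the notebook is one operation on one row key, so its notes
    // cannot outlive it; no foreign key or multi-row transaction is involved.
    void deleteNotebook(String notebookId) {
        rows.remove(notebookId);
    }

    int noteCount(String notebookId) {
        Map<String, String> notes = rows.get(notebookId);
        return notes == null ? 0 : notes.size();
    }
}
```

The trade-off is that the integrity guarantee comes from the layout rather than from the database, so it holds only as long as every note is written under its notebook's key.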

In conclusion, based only on the few details in the post, one could easily argue that a NoSQL database would fit the bill. But most of the time the reality behind such decisions is quite different, which makes technical decisions a tad more complicated.

Original title and link: WhySQL: MySQL/InnoDB ACID Guarantees for Evernote (NoSQL database©myNoSQL)

via: http://blog.evernote.com/tech/2012/02/23/whysql/


Couchbase Server 2.0 Durability and Write Performance

Matt Ingenthron in a forum thread:

There is quite a bit of work ongoing to optimize some of these paths, and there are some features coming to allow you to specify that you want to block until a change is durable at either the replication or the disk IO level. I believe use that internally to the server for prioritization as well.

Right now we write things as fast as we can and we constantly scan.

I think I’ve seen this before. And I thought Couchbase Server 2.0 would be using CouchDB’s durable persistence engine. Couchbase Server 2.0 is still in developer preview, so there’s time for this to change. But some clarifications would be welcome.

Original title and link: Couchbase Server 2.0 Durability and Write Performance (NoSQL database©myNoSQL)


Redis-Based MOM: Redis for Processing Payments

Santosh Kumar:

As you work more with Redis you soon start finding yourself building out workflows, i.e. small pieces of code that talk to each other via Redis. For someone familiar with a Service Oriented approach to building systems this should feel like deja-vu. Except, instead of using a protocol (HTTP, TCP, UDP, AMQP, ZeroMQ) we are going back to CS101 using a good old queue datastructure.

I think that, instead of workflows and Service Oriented, the right term is message services or MOM (message-oriented middleware). Indeed, Redis’ blocking queues, the corresponding commands, and PUB/SUB support provide the basic building blocks of message services. But these are enough only if you don’t need specialized solutions (RabbitMQ, ActiveMQ, etc.), which handle more complicated scenarios.

Later in the post, Kumar mentions implementing transactions for the scenario he picked, payment processing, using Redis’ multi-exec. If things haven’t changed radically lately, I’d underline the fact that Redis MULTI/EXEC/DISCARD is just a batched serialized executor and not ACID transactions.
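
To make the "batched serialized executor" point concrete, here is a toy model of that behavior. It is a sketch under stated assumptions and not a Redis client: BatchExecutorSketch, queue, and exec are hypothetical names, and the store is a plain in-memory map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy stand-in for a MULTI/EXEC-style batch; not a Redis client.
class BatchExecutorSketch {
    final Map<String, Long> store = new HashMap<>();
    private final List<Consumer<Map<String, Long>>> queued = new ArrayList<>();

    // Like a command issued after MULTI: it is queued, not run.
    void queue(Consumer<Map<String, Long>> command) {
        queued.add(command);
    }

    // Like EXEC: run the queued commands back to back. A command that fails
    // is reported, but commands that already ran are NOT rolled back.
    List<String> exec() {
        List<String> results = new ArrayList<>();
        for (Consumer<Map<String, Long>> command : queued) {
            try {
                command.accept(store);
                results.add("OK");
            } catch (RuntimeException e) {
                results.add("ERR " + e.getMessage());
            }
        }
        queued.clear();
        return results;
    }
}
```

In a payment scenario, a batch that debits one account and then fails while crediting another leaves the debit in place. That is the gap between serialized batching and an atomic transaction, and the application has to handle it.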

Original title and link: Redis-Based MOM: Redis for Processing Payments (NoSQL database©myNoSQL)

via: http://santosh-log.heroku.com/2011/08/19/redis-for-processing-payments/


Comments on Urban Myths About NoSQL

Dan Weinreb comments on Michael Stonebraker’s Urban Myths about NoSQL (PDF):

Dr. Michael Stonebraker recently posted a presentation entitled “Urban Myths about NoSQL”. Its primary point is to defend SQL, i.e. relational, database systems against the claims of the new “NoSQL” data stores. Dr. Stonebraker is one of the original inventors of relational database technology, and has been one of the most eminent database researchers and practitioners for decades.

In fact, Michael Stonebraker bashes everything that is not his current product—this GigaOm interview is the latest example.

For now, I’m filing this away until VoltDB is sold.

Original title and link: Comments on Urban Myths About NoSQL (NoSQL database©myNoSQL)

via: http://danweinreb.org/blog/657


C in CAP != C in ACID

Alex Feinberg explains it again:

Just to expand on this, the “C” in CAP corresponds (roughly) to the “A” and “I” in ACID. Atomicity across multiple nodes requires consensus. According to FLP Impossibility Result (CAP is a very elegant and intuitive re-statement of FLP), consensus is impossible in a network that may drop or deliver packets. Serializable isolation level requires that operations are totally ordered: total ordering on multiple nodes, requires solving the “atomic multicast” problem which is a private instance of the general consensus problem.

In practice, you can achieve consensus across multiple nodes with a reasonable amount of fault tolerance if you are willing to accept high (as in, hundreds of milliseconds) latency bounds. That’s a loss of availability that’s not acceptable to many applications.

This means, that you can’t build a low-latency multi-master system that achieves the “A” and “I” guarantees. Thus, distributed systems that wish to achieve a greater form of consistency typically (Megastore from Google being a notable exception, at the cost of 140ms latency) choose master slave systems (with “floating masters” for fault tolerance). In these systems availability is lost for a short period of time in case the master fails. BigTable (or HBase) is an example of this: (grand simplification follows) when a tablet master (RegionServer in HBase) for a specific token range fails, availability is lost until other nodes take over the “master-less” token range.

These are not binary “on/off” switches: see Yahoo’s PNUTS for a great “middle of the road” system. The paper has an intuitive example explaining the various consistency models.

Note: in a partitioned system, the scope of consistency guarantees (that is, any consistency guarantees: eventual or not) is typically limited to (at best) a single partition of a “table group”/“entity group” (in Microsoft Azure Cloud SQL Server and Google Megastore, respectively), a single partition of a table (usual sharded MySQL setups) or just a single row in a table (BigTable) or document in a document oriented store. Atomic and isolated cross row transactions are impractical on commodity hardware (and are limited even in systems that mandate the use of InfiniBand interconnect and high-performance SSDs).

Alex and Sergio Bossa have previously had an interesting conversation on the topic of consistency from the ACID and CAP perspectives.

Original title and link: C in CAP != C in ACID (NoSQL databases © myNoSQL)


Consistency in the ACID and CAP Perspectives

Following a tweet from Nathan Marz:

The problem with relational databases is that they conflate the notions of data and views

Sergio Bossa and Alex Feinberg had a very interesting exchange about what consistency means in ACID and what it means in the CAP theorem.

Alex: @nathanmarz That’s reason for confusion between C in ACID and C in CAP: C in ACID means consistent view of data which can be done w/ quorums

Sergio: @strlen That’s a common misconception: ACID C just means your write operations do not break data constraints. It’s not about the view.

Alex: @sbtourist It also refers to not allowing reads of intermediate states i.e., serializability. W/o a quorum, an EC system could allow such.

Alex: @sbtourist On the other hand, an async system where node B is behind node A is still C in the ACID sense without being C in the CAP sense.

Sergio: @strlen Nope, that’s the isolation level (ACID I). Again, ACID C has a precise meaning and it’s about constraints.

Alex: @sbtourist Yeah, I think you are right: serializability would be “I”, with consensus (strongest form of CAP “C”) being about “A” (atomicity)

Sergio: @strlen That said, I strongly agree with you about ACID C being different than CAP C.

Alex: @sbtourist Yes. Both “consistent” and “atomic” mean diff things in DBs than they do elsewhere in systems (e.g., way that “ln -s” is atomic)

There have been many discussions about the loose definitions of the terms in the CAP theorem. Daniel Abadi offered an interesting perspective on the subject, proposing PACELC instead:

To me, CAP should really be PACELC – if there is a partition (P) how does the system tradeoff between availability and consistency (A and C); else (E) when the system is running as normal in the absence of partitions, how does the system tradeoff between latency (L) and consistency (C)?
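
Read as a decision rule, the quote says a system makes two separate choices: one for when a partition occurs and one for normal operation. A minimal sketch of that reading follows. PacelcSketch, its enums, and the PA/EL style label are names chosen here for illustration only.

```java
// Hypothetical names for illustration; the two choices follow the quote.
class PacelcSketch {
    // What the system favors while a partition (P) is in effect.
    enum OnPartition { AVAILABILITY, CONSISTENCY }

    // What the system favors otherwise (E), during normal operation.
    enum Otherwise { LATENCY, CONSISTENCY }

    // Builds a compact label such as "PA/EL" from the two choices.
    static String label(OnPartition p, Otherwise e) {
        char onPartition = p == OnPartition.AVAILABILITY ? 'A' : 'C';
        char otherwise = e == Otherwise.LATENCY ? 'L' : 'C';
        return "P" + onPartition + "/E" + otherwise;
    }

    // Names the property the system favors in the current network state.
    static String favors(boolean partitioned, OnPartition p, Otherwise e) {
        return partitioned ? p.name() : e.name();
    }
}
```

For example, label(OnPartition.AVAILABILITY, Otherwise.LATENCY) yields "PA/EL": a system that stays available during a partition and prefers low latency over consistency the rest of the time.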

Original title and link: Consistency in the ACID and CAP Perspectives (NoSQL databases © myNoSQL)


9 Things to Acknowledge about NoSQL Databases

Excellent list:

  1. Understand how ACID compares with BASE (Basically Available, Soft-state, Eventually Consistent)
  2. Understand persistence vs non-persistence, i.e., some NoSQL technologies are entirely in-memory data stores
  3. Recognize there are entirely different data models from traditional normalized tabular formats: Columnar (Cassandra) vs key/value (Memcached) vs document-oriented (CouchDB) vs graph oriented (Neo4j)
  4. Be ready to deal with no standard interface like JDBC/ODBC or standardized query language like SQL; every NoSQL tool has a different interface
  5. Architects: rewire your brain to the fact that web-scale/large-scale NoSQL systems are distributed across dozens to hundreds of servers and networks as opposed to a shared database system
  6. Get used to the possibly uncomfortable realization that you won’t know where data lives (most of the time)
  7. Get used to the fact that data may not always be consistent; ‘eventually consistent’ is one of the key elements of the BASE model
  8. Get used to the fact that data may not always be available
  9. Understand that some solutions are partition-tolerant and some are not

Print it out and distribute it among your colleagues.

Original title and link: 9 Things to Acknowledge about NoSQL Databases (NoSQL databases © myNoSQL)

via: http://www.evidentsoftware.com/nosql-basics-for-the-rdbms-savvy/