ACID: All content tagged as ACID in NoSQL databases and polyglot persistence
Thursday, 22 December 2011
Couchbase Server 2.0 Durability and Write Performance
Matt Ingenthron in a forum thread:
There is quite a bit of work ongoing to optimize some of these paths, and there are some features coming to allow you to specify that you want to block until a change is durable at either the replication or the disk IO level. I believe use that internally to the server for prioritization as well.
Right now we write things as fast as we can and we constantly scan.
I think I’ve seen this before. And I thought Couchbase Server 2.0 will be using CouchDB durable persistence engine. Couchbase Server 2.0 is still in developer preview so there’s time for this to change. But some clarifications would be welcome.
Original title and link: Couchbase Server 2.0 Durability and Write Performance (©myNoSQL)
Tuesday, 6 September 2011
Redis-Based MOM: Redis for Processing Payments
Santosh Kumar:
As you work more with Redis you soon start finding yourself building out workflows, i.e. small pieces of code that talk to each other via Redis. For someone familiar with a Service Oriented approach to building systems this should feel like deja-vu. Except, instead of using a protocol (HTTP, TCP, UDP, AMQP, ZeroMQ) we are going back to CS101 using a good old queue datastructure.
I think instead of workflows and Service Oriented, the right term is message services or MOM. Indeed, Redis’ blocking queues, the corresponding commands, and PUB/SUB support provides one with the basic building blocks of message services. But these are useful only if you don’t need specialized solutions (RabbitMQ, ActiveMQ, etc.) which will provide solutions for more complicated scenarios.
Later in the post, Kumar mentions implementing transactions for the scenario he picked, payment processing, using Redis’ multi-exec. If things haven’t changed radically lately, I’d underline the fact that Redis MULTI/EXEC/DISCARD is just a batched serialized executor and not ACID transactions.
Original title and link: Redis-Based MOM: Redis for Processing Payments (©myNoSQL)
via: http://santosh-log.heroku.com/2011/08/19/redis-for-processing-payments/
Friday, 8 July 2011
Comments on Urban Myths About NoSQL
Dan Weinreb comments on Michael Stonebraker’s Urban Myths about SQL (PDF) :
Dr. Michael Stonebraker recently posted a presentation entitled “Urban Myths about NoSQL”. Its primary point is to defend SQL, i.e. relational, database systems against the claims of the new “NoSQL” data stores. Dr. Stonebraker is one of the original inventors of relational database technology, and has been one of the most eminent database researchers and practitioners for decades.
In fact, Michael Stonebraker bashes everything that is not his current product—this GigaOm interview is the latest example.
For now, I’m filing this away until VoltDB is sold.
Original title and link: Comments on Urban Myths About NoSQL (©myNoSQL)
Thursday, 12 May 2011
C in CAP != C in ACID
Alex Feinberg explains it again:
Just to expand on this, the “C” in CAP corresponds (roughly) to the “A” and “I” in ACID. Atomicity across multiple nodes requires consensus. According to FLP Impossibility Result (CAP is a very elegant and intuitive re-statement of FLP), consensus is impossible in a network that may drop or deliver packets. Serializable isolation level requires that operations are totally ordered: total ordering on multiple nodes, requires solving the “atomic multicast” problem which is a private instance of the general consensus problem.
In practice, you can achieve consensus across multiple nodes with a reasonable amount of fault tolerance if you are willing to accept high (as in, hundreds of milliseconds) latency bounds. That’s a loss of availability that’s not acceptable to many applications.
This means, that you can’t build a low-latency multi-master system that achieves the “A” and “I” guarantees. Thus, distributed systems that wish to achieve a greater form of consistency typically (Megastore from Google being a notable exception, at the cost of 140ms latency) choose master slave systems (with “floating masters” for fault tolerance). In these systems availability is lost for a short period of time in case the master fails. BigTable (or HBase) is an example of this: (grand simplification follows) when a tablet master (RegionServer in HBase) for a specific token range fails, availability is lost until other nodes take over the “master-less” token range.
These are not binary “on/off” switches: see Yahoo’s PNUTS for a great “middle of the road” system. The paper has an intuitive example explaining the various consistency models.
Note: in a partitioned system, the scope of consistency guarantees (that is, any consistency guarantees: eventual or not) is typically limited to (at best) a single partition of a “table group”/”entity group” (in Microsoft Azure Cloud SQL Server and Google Megastore, respectively), a single partition of a table (usual sharded MySQL setups) or just a single row in a table (BigTable) or document in a document oriented store. Atomic and isolated cross row transactions are impractical on commodity hardware (and are limited even in systems that mandate the use of infiband interconnect and high-performance SSDs).
Alex and Sergio Bossa have previously had an interesting conversation on the topic of consistency from the ACID and CAP perspectives.
Original title and link: C in CAP != C in ACID (NoSQL databases © myNoSQL)
Wednesday, 6 April 2011
Consistency in the ACID and CAP Perspectives
Following a tweet from Nathan Marz:
The problem with relational databases is that they conflate the notions of data and views
Sergio Bossa and Alex Feinberg had a very interesting exchange about the meaning of consistency in the context of ACID and consistency in CAP theorem perspective.
Alex: @nathanmarz That’s reason for confusion between C in ACID and C in CAP: C in ACID means consistent view of data which can be done w/ quorums
Sergio: @strlen That’s a common misconception: ACID C just means your write operations do not break data constraints. It’s not about the view.
Alex: @sbtourist It also refers to not allowing reads of intermediate states i.e., serializability. W/o a quorum, an EC system could allow such.
Alex: @sbtourist On the other hand, an async system where node B is behind node A is still C in the ACID sense without being C in the CAP sense.
Sergio: @strlen Nope, that’s the isolation level (ACID I). Again, ACID C has a precise meaning and it’s about constraints.
Alex: @sbtourist Yeah, I think you are right: serializability would be “I”, with consensus (strongest form of CAP “C”) being about “A” (atomicity)
Sergio: @strlen That said, I strongly agree with you about ACID C being different than CAP C.
Alex: @sbtourist Yes. Both “consistent” and “atomic” mean diff things in DBs than they do elsewhere in systems (e.g., way that “ln -s” is atomic)
There have been many discussions about the loose definitions of the terms in the CAP theorem. Daniel Abadi exposed an interesting perspective on the subject proposing instead PACELC:
To me, CAP should really be PACELC – if there is a partition (P) how does the system tradeoff between availability and consistency (A and C); else (E) when the system is running as normal in the absence of partitions, how does the system tradeoff between latency (L) and consistency (C)?
Original title and link: Consistency in the ACID and CAP Perspectives (NoSQL databases © myNoSQL)
Wednesday, 2 March 2011
9 Things to Acknowledge about NoSQL Databases
Excellent list:
- Understand how ACID compares with BASE (Basically Available, Soft-state, Eventually Consistent)
- Understand persistence vs non-persistence, i.e., some NoSQL technologies are entirely in-memory data stores
- Recognize there are entirely different data models from traditional normalized tabular formats: Columnar (Cassandra) vs key/value (Memcached) vs document-oriented (CouchDB) vs graph oriented (Neo4j)
- Be ready to deal with no standard interface like JDBC/ODBC or standarized query language like SQL; every NoSQL tool has a different interface
- Architects: rewire your brain to the fact that web-scale/large-scale NoSQL systems are distributed across dozens to hundreds of servers and networks as opposed to a shared database system
- Get used to the possibly uncomfortable realization that you won’t know where data lives (most of the time)
- Get used to the fact that data may not always be consistent; ‘eventually consistent’ is one of the key elements of the BASE model
- Get used to the fact that data may not always be available
- Understand that some solutions are partition-tolerant and some are not
Print it out and distribute it among your colleagues.
Original title and link: 9 Things to Acknowledge about NoSQL Databases (NoSQL databases © myNoSQL)
via: http://www.evidentsoftware.com/nosql-basics-for-the-rdbms-savvy/
Wednesday, 24 November 2010
Neo4j Transactions and JTA
I’ve already told you about ☞ Chris Gioran’s series on Neo4j internals. Now, he is working on providing support for pluggable JTA compliant transaction managers in Neo4j and details about the current status can be found in his ☞ last post. Anyways, before that he started with a deep dive into the Neo4j transactions and that resulted in 4 (quite long) articles:
- ☞ Write Ahead Log and Deadlock Detection
In this post I will write a bit about two different components that can be explained somewhat in isolation and upon which higher level components are build. The first is the Write Ahead Log (WAL) and the other is an implementation of a Wait-For graph that is used to detect deadlocks in Neo before they happen.
- ☞ XaResources, Transactions and TransactionManagers
This time we will look into a higher level than last time, discussing the Transaction class and its implementations, Commands and TransactionManagers, touching a bit first on the subject of XAResources.
- ☞ Xa roundup and consistency
This post covers Data sources and XA connections, management of XaResources, and putting all these together.
- ☞ A complete run and a conclusion
Here I will try to follow a path from the initialization of the db engine and through the begin() of a transaction and creation of a Node to the commit and shutdown.
As I’ve estimated in my first mention of this series on Neo4j internals, Chris ends up giving up writing and starting to hack Neo4j:
Truth been told, I have reached a point where I no longer want to write about Neo but instead I want to start hacking it
Original title and link: Neo4j Transactions and JTA (NoSQL databases © myNoSQL)
Thursday, 23 September 2010
On NoSQL Databases and ACID
Dan Weinreb:
The “NoSQL” systems are ACID, as long as you accept that a transaction can only perform one operation, in the sense that the only thing that gets in the way of being ACID is when there are network partitions and the system is called upon to perform operations while the partition is still there.
That’s why they are said to be ☞ BASE (basically available, soft state, eventually consistent)
Original title and link for this post: On NoSQL Databases and ACID (published on the NoSQL blog: myNoSQL)
via: http://danweinreb.org/blog/nosql-storage-systems-never-violate-acid-never-well-hardly-ever
Thursday, 2 September 2010
Fixing ACID without going NoSQL
Daniel Abadi and Alexander Thomson:
In our opinion, the NoSQL decision to give up on ACID is the lazy solution to these scalability and replication issues. Responsibility for atomicity, consistency and isolation is simply being pushed onto the developer. What is really needed is a way for ACID systems to scale on shared-nothing architectures, and that is what we address in the research paper that we will present at VLDB this month. Our view (and yes, this may seem counterintuitive at first), is that the problem with ACID is not that its guarantees are too strong (and that therefore scaling these guarantees in a shared-nothing cluster of machines is too hard), but rather that its guarantees are too weak, and that this weakness is hindering scalability.
No comments until I think this through.
Paper available ☞ here.
Original title and link for this post: Fixing ACID without going NoSQL (published on the NoSQL blog: myNoSQL)
via: http://dbmsmusings.blogspot.com/2010/08/problems-with-acid-and-how-to-fix-them.html
Monday, 2 August 2010
Transactions in Distributed Systems
In case you didn’t know it already: banks are not really using ACID transaction on all operations that you would have thought they are. If that’s a surprise then let me tell you that ☞ not even Starbucks is using two-phase commit.
Distributed systems are changing the rules of the game:
the more distributed and decentralized a system is, the less likely it is that we can use transactions that span the entire system. That is certainly true for the banking system, apparently also true for systems inside banks, and in many other places. ACID transactions were invented for the mainframe, the world’s most centralized computing construct. But computing is not “one mainframe” any more I’m afraid as it was in the sixties.
via: http://infogrid.org/blog/2010/08/acid-transactions-are-overrated/
Wednesday, 6 January 2010
Notes on Distributed Programming and CAP
Firstly, an interesting presentation by Paulo Gaspar (@paulogaspar7) on ☞ Distributed programming and data consistency
Key take-aways:
The network falacies:
- The network is reliable
- Latency is zero
- Bandwith is infinite
- The network is secure
- Topology doesn’t change
- There is one administrator
- Transport cost is zero
- The network is homogenous
-
CAP Trade-offs:
- CA without P: Databases providing distributed transactions can only do it while their network is up
- CP without A: While there is a partition, transactions to an ACID db may be blocked until the partition heals
- AP without C: Caching provides client-server partition resilience by replicating data, even if the partition prevents verifying if a replica is fresh
Another interesting post on this topic, is ☞ The CAP Theorem Distilled by Sid Anand (@r39132). Under the assumption that “any system needs to support ‘P’” (nb I am not sure why the article is limiting the analysis to this case only), the article compares ‘A’ vs ‘C’ in CAP:
If you choose ‘C’, your system might implement 2-phase commit (a.k.a 2PC) . […]
On the other hand, if you opt for an AP system, you are opening the door to potential data inconsistencies. […] AP systems can get quite complicated (relative to CP systems)
In two subsequent articles, Sid is explaining “eventual consistency” for ☞ non-techies and for ☞ techies . I liked the way Paulo visually represented inconsistency across time in his slides: