document database: All content tagged as document database in NoSQL databases and polyglot persistence
Thursday, 21 February 2013
The State of CouchDB With Comments
With all the confusion surrounding the Couch[addsuffixhere] world, and the lack of energy in it, my attention has slowly shifted away. Thus, it was only last night that I read Jan Lehnardt’s “State of CouchDB” post.
Couple of notes:
- Jan Lehnardt is out of Couchbase (as an employee) and plans to focus again (for part of his time) on Apache CouchDB
- he’s the first person directly related to CouchDB who finally acknowledges in public the whole confusion created around CouchDB and the companies connected to it in one way or another. I was really, really tired of repeating this for the last 2 years.
- “people who really get to know CouchDB are extremely passionate about it” — I read this as passive aggressive style. According to my data the people interested in CouchDB were fewer and fewer by the day.
- there are plans for what’s coming next in CouchDB. The post gives a short list of 4 things, but the real ones are in the gist I’ve linked to earlier
- BigCouch merging has been mentioned for so long that right now it feels like “waiting for the unicorn”
- Jan Lehnardt mentions (and is excited about) the alphabet of _ouchDB projects. I’ll say it one more time: they’re probably cool, but long term they’ll perpetuate the confusion. Unfortunately there’s not much to be done now.
- I’m glad to read a state of the union from a person who has been involved with CouchDB for so long. But in the world of open source, it’s only the facts that matter. Sometimes reviving a project or regaining users is more difficult than starting from scratch.
- “We had a hard year, lost our traction, and we still came out on top.” Nope.
Original title and link: The State of CouchDB With Comments (©myNoSQL)
via: http://writing.jan.io/2013/02/01/the-state-of-couchdb.html
Wednesday, 20 February 2013
CouchDB Future Feature List
I’m saving the current list of CouchDB Future Features so I can check back on it in 6 months [1]:
-
[1] I am not saying this as a mild form of “filing for future claim chowder”.
Original title and link: CouchDB Future Feature List (©myNoSQL)
Monday, 18 February 2013
CouchDB In-Browser JavaScript Debugger
Interesting project and if it works it can prove to be very useful considering mostly everything in CouchDB is JavaScript, at least from a developer’s perspective:
Original title and link: CouchDB In-Browser JavaScript Debugger (©myNoSQL)
Wednesday, 13 February 2013
NoSQL on MySQL: Stating the Obvious
Matthew Aslett on Couchbase’s and DataStax’s reactions to Oracle’s announcement of MySQL’s support for a NoSQL API:
Sure, Couchbase and DataStax laid it on a bit thick, but these are corporate blog posts – it goes with the territory.
I’ve already linked to and commented on these: Couchbase’s reaction and DataStax’s reaction. What I didn’t know (more accurately, I should probably write “I hoped”) is that this sort of reaction comes with the “corporate” badge. But I’ll keep my hope, considering the exhaustive list of reactions from other NoSQL companies.
Original title and link: NoSQL on MySQL: Stating the Obvious (©myNoSQL)
Reactions to MySQL 5.6: Couchbase
Bob Wiederhold (Couchbase CEO) about MySQL 5.6, their use of the NoSQL term, and the PR message touting the new version as the solution “combining the best of both worlds”:
What we see is a whole new wave of applications that have very different requirements than applications had just a few years ago. More often than not they are cloud-based, need to support a huge and dynamically changing number of users, need to store huge amounts of data, and need a highly flexible data model that allows them to adjust to rapidly changing data capture requirements and process lots of semi-structured and unstructured data. The fundamentally different architectural decisions embedded in NoSQL technologies – along with the easy scalability, consistently high performance, and flexible data model advantages (along with all the other tradeoffs) NoSQL provides – are turning out to be a better fit for an increasing number of these applications.
That doesn’t mean MySQL (or relational databases) will go away or won’t play a significant role in the database industry in the future.
Bob Wiederhold is also interested in how Oracle positions their products in terms of NoSQL:
As a side note it’s curious that the MySQL team seems out of step with other parts of Oracle. While the MySQL team seems to be convinced MySQL can do it all, Oracle’s NoSQL team seems to feel differently and is busily trying to catch up to NoSQL leaders like Couchbase, MongoDB, and Cassandra with their own NoSQL product. If relational technology is a one size fits all technology, why is Oracle itself making such a big investment in developing its own NoSQL product?
My supposition, expressed in the post MySQL 5.6 - What’s new, is that NoSQL is just a critical checkbox for the marketing and sales departments. Oracle NoSQL Database and its precursor Berkeley DB seem to live silently inside the giant.
Original title and link: Reactions to MySQL 5.6: Couchbase (©myNoSQL)
via: http://blog.couchbase.com/why-mysql-56-no-real-threat-nosql
Tuesday, 12 February 2013
A Human-Readable Jackrabbit Persistence Manager Prototype for Orientdb
Jackrabbit still has a very special place in my heart. I’ve fought it many times, sometimes losing, most of the time winning. But for over 7 years now, it is still the main storage engine serving the content of InfoQ. So this OrientDB engine for Jackrabbit by Thomas Kratz caught my attention:
This has some limitations, as jackrabbit will still access only one node at a time, being able to traverse the graph at the storage level is simply not intended by the whole api. But it works, it’s readable, can be modified at the db level easily.
Original title and link: A Human-Readable Jackrabbit Persistence Manager Prototype for Orientdb (©myNoSQL)
via: http://thomaskratz.blogspot.de/2013/01/a-human-readable-jackrabbit-persistence.html
Monday, 11 February 2013
NoSQL Bug Fix Releases: Redis 2.6.10 and RavenDB 2.01
The RavenDB team has released RavenDB 2.01, mostly a bug fix release. The change log is here.
Redis also has a new bug fix release: 2.6.10, which includes non-critical fixes and 5 small improvements. The change log is here.
Original title and link: NoSQL Bug Fix Releases: Redis 2.6.10 and RavenDB 2.01 (©myNoSQL)
Using Hadoop Pig With MongoDB
In this post, we’ll see how to install MongoDB support for Pig and we’ll illustrate it with an example where we join 2 MongoDB collections with Pig and store the result in a new collection.
Color me very biased this time, but all these (especially the JOIN) can be done directly using RethinkDB.
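For readers who want to see what such a join produces, here is a minimal in-memory sketch in plain Python. It is not Pig, MongoDB, or RethinkDB code: the `hash_join` helper, the sample collections, and the field names (`users`, `orders`, `user_id`) are all made up for illustration.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Inner-join two lists of documents (dicts) on the given keys."""
    # Index the right-hand collection by its join key.
    index = defaultdict(list)
    for doc in right:
        index[doc[right_key]].append(doc)

    # Emit one combined document per matching pair.
    joined = []
    for doc in left:
        for match in index.get(doc[left_key], []):
            joined.append({"left": doc, "right": match})
    return joined


# Hypothetical sample data standing in for two collections.
users = [{"_id": 1, "name": "ann"}, {"_id": 2, "name": "bob"}]
orders = [
    {"_id": 10, "user_id": 1, "total": 5},
    {"_id": 11, "user_id": 1, "total": 7},
    {"_id": 12, "user_id": 3, "total": 9},
]

result = hash_join(users, orders, "_id", "user_id")
```

Only the pairs whose keys match end up in `result`; storing it in a new collection, as the quoted post does, would be a separate write step.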
Original title and link: Using Hadoop Pig With MongoDB (©myNoSQL)
via: http://chimpler.wordpress.com/2013/02/07/using-hadoop-pig-with-mongodb/
Friday, 8 February 2013
MongoDB 2.4 Highlights
MongoDB 2.4 is just around the corner:
From Mike Friedman’s Roadmap slidedeck.
Original title and link: MongoDB 2.4 Highlights (©myNoSQL)
Thursday, 7 February 2013
MongoDB Is Still Broken by Design 5-0
My score after the first period was 4-1. But Emin Gün Sirer contested the 1 in his follow-up post to 10gen’s reply:
Until recently, MongoDB did not talk about requestStart() and requestDone() in any context except when talking about how to ensure a very weak consistency requirement. Namely, if you don’t use this pair of operations, then a write to the database followed by a read from the database, by the same client, can return old values. So, I write 42 for key k with a WriteConcern.SAFE, read key k, and get some other number, because the Mongo driver can, by default, very well send the first request to one node over one connection, and the second one to another, over another connection. So requestStart() and requestDone() were billed as a mechanism to avoid that scenario; I saw no mention that they were required for correctness in multithreaded settings. I bet there is plenty of multithreaded code that does not follow that pattern. Such code is broken; if you’re a Mongo user, it’d be a good idea to check if you ever use getLastError without a bracketing requestStart() and Done().
5-0.
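The scenario in the quote can be sketched with a toy model. This is plain Python, not the MongoDB driver: `ToyConnection`, `ToyPool`, and the rule that a document without an `_id` is rejected are all assumptions made for illustration. The only point is that an error check sent over a different connection than the write it is supposed to check reports nothing, while pinning both calls to one connection does.

```python
import itertools


class ToyConnection:
    """Remembers only the outcome of the last operation sent over it."""

    def __init__(self, name):
        self.name = name
        self.last_error = None

    def insert(self, doc):
        # Toy rule (an assumption): documents without "_id" are rejected.
        self.last_error = None if "_id" in doc else "missing _id"

    def get_last_error(self):
        return self.last_error


class ToyPool:
    """Hands out connections round-robin unless a request is pinned."""

    def __init__(self, size):
        self._conns = [ToyConnection(f"conn{i}") for i in range(size)]
        self._cycle = itertools.cycle(self._conns)
        self._pinned = None

    def request_start(self):
        # Pin every following call to a single connection.
        self._pinned = next(self._cycle)

    def request_done(self):
        self._pinned = None

    def acquire(self):
        if self._pinned is not None:
            return self._pinned
        return next(self._cycle)


pool = ToyPool(size=2)

# Unbracketed: the insert and the error check use different connections,
# so the failure goes unnoticed.
pool.acquire().insert({"name": "no id"})
unbracketed = pool.acquire().get_last_error()

# Bracketed: both calls are pinned to the same connection.
pool.request_start()
pool.acquire().insert({"name": "no id"})
bracketed = pool.acquire().get_last_error()
pool.request_done()
```

The model is single-threaded and says nothing about how the real drivers pool connections; it only illustrates why the bracketing matters once a check and its write can travel over different connections.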
Original title and link: MongoDB Is Still Broken by Design 5-0 (©myNoSQL)
via: http://hackingdistributed.com/2013/02/07/10gen-response/#id2
10gen: MongoDB’s Fault Tolerance Is Not Broken…
Sitting comfortably? Check. Popcorn? Check. Let’s press play.
In an interview for InfoQ, 10gen’s Jared Rosoff replies to Emin Gün Sirer: “Broken by Design: MongoDB Fault Tolerance“.
- MongoDB lies when it says that a write has succeeded
JR: “[…] Today the default behavior of official MongoDB drivers is Receipt Acknowledged, which means that you wait until the server has processed your write before returning to the client.”
The key word here is today. As clearly explained by Sirer, the behavior he described was in all versions of MongoDB and all the drivers until the 2.2 release and the drivers update in November 2012.
Basically the “fire-and-forget” behavior (nb: even this description is not accurate; a better one would be “load-and-forget”) has been the default for almost 3 years. With the 2.2 release and the corresponding drivers update it was changed to “Receipt acknowledged”. But the new default only acknowledges that the data was received by the server, not that it was written anywhere. If you want your data to exist on multiple machines or to be on disk, you need to use different settings.
- Using getLastError slows down write operations.
JR: “GetLastError is underlying command in the MongoDB protocol that is used to implement write concerns. Intuitively, waiting for an operation to complete on the server is slower than not waiting for it.”
The problem here is not that the operation slows down because it has to wait for the acknowledgement. The real problem is that getLastError requires an extra network roundtrip. Not to mention that your old code was probably polluted by all these extra calls.
- getLastError doesn’t work pipelined
JR: “[…] For many bulk loads, performing multiple inserts with periodic checks of getLastError is the right choice. […]”
Read the above.
- getLastError doesn’t work multi-threaded
JR: “Threads do not see getLastError responses for other thread’s operations. MongoDB’s getLastError command applies to the previous operation on a connection to the database, and not simply whatever random operation was performed last. […]”
If I’m reading this correctly, it seems like Sirer’s hypothesis was that connections can be shared across threads. I have to agree that many drivers do not provide thread-safe connections. So, linking getLastError behavior to the current connection seems OK.
- Write Concerns are broken
JR: “As described in the above sections, WriteConcerns provide a flexible toolset for controlling the durability of write operations applied to the database. You can choose the level of durability you want for individual operations, balanced with the performance of those operations. With the power to specify exactly what you want comes the responsibility to understand exactly what it is you want out of the database.”
Let’s look at what Sirer wrote:
[…] one could use WriteConcern.SAFE, FSYNC_SAFE or REPLICAS_SAFE for the insert operation [2]. There are 13 different concern levels, 8 of which seem to be distinct and presumably the remaining 5 are just kind of there in case you mistype one of the other ones. WriteConcern is at least well-named: it corresponds to “how concerned would you be if we lost your data?” and the potential answers are “not at all!”, “be my guest”, and “well, look like you made an effort, but it’s ok if you drop it.” Specifically, that’s three different kinds of SAFE, but none of them give you what you want: (1) SAFE means acknowledged by one replica but not written to disk, so a node failure can obliterate that data, (2) FSYNC_SAFE means written to a single disk, so a single disk crash can obliterate that data, and (3) REPLICAS_SAFE means it has been written to two replicas, but there is no guarantee that you will be able to retrieve it later.
If you want a different explanation: even though there are 13 different WriteConcern types available, none of them offers the option of having the data written to disk on more than one replica.
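To make that last point concrete, here is a small Python sketch that encodes the three SAFE levels exactly as the quote characterizes them and checks which failures each one survives. The `Guarantee` type, its fields, and the failure predicates are assumptions introduced for illustration; none of this is taken from the driver documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guarantee:
    replicas_received: int  # replicas that have received the write
    replicas_on_disk: int   # of those, how many have it on disk


# Levels as described in the quoted text, not as documented by the driver.
LEVELS = {
    "SAFE": Guarantee(replicas_received=1, replicas_on_disk=0),
    "FSYNC_SAFE": Guarantee(replicas_received=1, replicas_on_disk=1),
    "REPLICAS_SAFE": Guarantee(replicas_received=2, replicas_on_disk=0),
}


def survives_losing_one_node(g):
    """Worst case: the lost node (disk included) held the data."""
    return g.replicas_received >= 2


def survives_restart_of_all_nodes(g):
    """Copies held only in memory are gone after every node restarts."""
    return g.replicas_on_disk >= 1


def durable_on_multiple_replicas(g):
    """What the text says is missing: on disk on more than one replica."""
    return g.replicas_on_disk >= 2


summary = {
    name: (
        survives_losing_one_node(g),
        survives_restart_of_all_nodes(g),
        durable_on_multiple_replicas(g),
    )
    for name, g in LEVELS.items()
}
```

Under these assumptions each level covers at most one of the two failure cases, and the last column is false for all three.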
The period is over and in my mind the score is clearly 4-1. But I know that this is not a real game and there will be some concluding that I’m not getting it. I’m OK living with that though.
Original title and link: 10gen: MongoDB’s Fault Tolerance Is Not Broken… (©myNoSQL)
Tuesday, 5 February 2013
NoSQL Hosting: Redis and RavenDB
More service providers for hosted NoSQL solutions:
- Garantia Data to Offer its Redis & Memcached Hosting Services in Europe: “In-memory NoSQL Company extends Redis Cloud and Memcached Cloud to European Amazon Web Services users.”
- CloudBird Launch, now with RavenDB 2.0 support - The CloudBird Blog: “Today we’re cracking open the Champagne as we peel off the beta label and officially welcome production databases to our RavenDB hosting service. What’s more we’re also introducing support for the Raven 2.0 RTM.”
It’s no longer just “a database for every taste”, but steadily becoming more of “a database for every taste served from anywhere you like”.
Original title and link: NoSQL Hosting: Redis and RavenDB (©myNoSQL)
