NoSQL Database: All content tagged as NoSQL Database in NoSQL databases and polyglot persistence
Wednesday, 6 June 2012
Popularizing the NoSQL Movement
Indeed, “big data” is mostly an outgrowth of the huge amounts of information that were collected and needed to be used by the big Internet services. Most of the big services quickly found out that the amount of data they needed to store and the frequency with which they needed to update and access it could not be met by traditional relational database management systems.
Even though I’ve been writing about NoSQL databases and adjacent technologies for almost 3 years, I don’t think I’ve authored a single generic post meant to popularize the NoSQL movement. By now, I understand the recipe: get all the buzzwords, gather all your previous posts about companies that might be somehow related, and glue everything together with things that don’t necessarily make sense but sound good. I still don’t think I could write one though.
NB: After reading this article, could you please teach me how I should spell NoSQL: noSQL, !SQL, NoSQL, no-SQL, or not-only-SQL?
Original title and link: Popularizing the NoSQL Movement (©myNoSQL)
Monday, 2 April 2012
The Total Cost of (Non) Ownership of a NoSQL Database Service
The Amazon team released a whitepaper comparing the total cost of ownership for 3 scenarios:
- on-premise NoSQL database
- NoSQL database deployed on Amazon EC2 and Amazon EBS
- Amazon DynamoDB

As you can imagine, DynamoDB comes out as the most cost-effective solution (79% more cost-effective than an on-premise NoSQL database and 61% more cost-effective than an AWS-hosted NoSQL database). Read or download the paper after the break.
Wednesday, 1 February 2012
5 Requirements for Enterprise NoSQL databases
Emil Eifrem enumerates 5 requirements for adopting NoSQL databases in the enterprise environment:
- Ability to Handle Today’s Complex and Connected Data
- Simplify the Development of Applications Using Complex and Connected Data
- Support for End-to-End Transactions
- Enterprise-grade Durability so that Data is Never Lost
- Java Still Reigns for Enterprise Development
I think Emil Eifrem has left out a couple of other critical aspects, but I agree with 4 and 1/2 of those on his list.
Monday, 19 December 2011
Attacking NoSQL and Node.js: Server-Side JavaScript Injection (SSJS)
Jeff Darcy wrote a while back about the (lack of) security in NoSQL databases. Unfortunately, things haven’t changed much, and if you check the NoSQL + Node.js applications I’ve posted lately, you’ll notice that some of them completely ignore security.
And there are some people realizing the risks and starting to express their concerns:
Playing with MongoDB lately, I’m getting scared. Because I’m seeing some really bad practices out there. Seeing it in live code. In tutorials.
Bryan Sullivan (Senior Security Researcher, Adobe Secure Software Engineering Team) has published a paper (PDF) explaining some of the possible server-side JavaScript injection attacks and the risks the apps and the data are exposed to. Teaser: he can do pretty much everything.
It should be noted that exploitation of server-side JavaScript injection vulnerabilities is more like that of SQL injection than of cross-site scripting. SSJS injection does not require any social engineering of an intermediate victim user the way that reflected XSS or DOM-based XSS do; instead, the attacker can attack the application directly with arbitrarily created HTTP requests.
Because of this, defenses against SSJS injection are also similar to SQL injection defenses:
- Avoid creating “ad-hoc” JavaScript commands by concatenating script with user input.
- Validate user input used in SSJS commands with regular expressions.
- Avoid use of the JavaScript eval command. In particular, when parsing JSON input, use a safer alternative such as JSON.parse.
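The three defenses above can be sketched in a few lines. The paper is about server-side JavaScript; what follows is a Python analogue, not code from the paper. The `username` field, the whitelist pattern, and all function names are assumptions made for illustration. `$where` is MongoDB’s operator for evaluating a JavaScript expression on the server, and Python’s `json.loads` stands in for `JSON.parse`.

```python
import json
import re

# Hypothetical whitelist: short alphanumeric usernames only.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")


def unsafe_where_query(username: str) -> dict:
    """Anti-pattern: user input concatenated into server-side JavaScript."""
    return {"$where": "this.username == '" + username + "'"}


def safe_query(username: str) -> dict:
    """Validate the input, then pass it as data rather than as script."""
    if not USERNAME_RE.fullmatch(username):
        raise ValueError("invalid username")
    return {"username": username}


def parse_payload(raw: str) -> dict:
    """Parse JSON with a real parser instead of evaluating it as code."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


# An attacker-controlled value that closes the quote and adds a tautology.
payload = "x' || '1' == '1"
injected = unsafe_where_query(payload)["$where"]
# injected is now: this.username == 'x' || '1' == '1'
```

With the concatenated version, the payload rewrites the script so that it is true for every document. The validated version rejects the same payload before any query is built, and the value never reaches a script context at all.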
Remember there’s no such thing as security through obscurity.
Monday, 12 December 2011
The Wonderful Wizard of Oz Through a Polyglot Persistence Glass
Adrian Giordani:
Relational databases are the Yellow Brick Road of managing large structured data globally. […] In the 1939 film of The Wizard of Oz, a red brick road is intertwined with the yellow one. Similarly, a new type of database might soon offer a different path: NoSQL, or Not-Only-SQL, first coined in 2008, is promising a faster and more scalable database architecture, at least for some cases.
But who’s the Tin Woodman?
via: http://www.isgtw.org/feature/following-red-brick-road-data-management
Enterprise Caches Versus Data Grids Versus NoSQL Databases
Red Hat/JBoss’s Manik Surtani:
[…] If you want to compare distributed systems, both data grids and NoSQL have kind of come from different starting points, if you will. They solve different problems, but where they stand today they’ve kind of converged. Data grids have been primarily in-memory but now they spill off onto disk and so on and so forth and they’ve added in-query and mapreduce onto it while NoSQL have primarily been on disk, but now cache stuff in-memory anyway for performance. They are starting to look the same now, or are very similar.
One big difference though that I see between data grids and NoSQL, something that still exists today, is how you actually interact with these systems. Data grids tend to be in VM, they tend to be embedded, you tend to launch a Java or JVM program, you tend to connect to a data grid API and you work with it whereas NoSQL tends to be a little bit more client server, a bit more like old-fashion databases where you open a socket to your NoSQL database or your NoSQL grid, if you will, and start talking to it. That’s the biggest difference I see today, but even that will eventually go away.
They seem to converge, but:
- spilling off to disk is not equivalent to optimized disk access
- distributed, sometimes even transactional, caches are not equivalent to single-node caches
Tuesday, 6 December 2011
NoSQL Databases Best Practices and Emerging Trends
Jans Aasman (CEO AllegroGraph) interviewed by Srini Penchikala:
InfoQ: What best practices and architecture patterns should the developers and architects consider when using a solution like this one in their software applications?
Jans: If your application requires simple straight joins and your schema hardly changes then any RDBM will do.
If your application is mostly document based, where a document can be looked at as a pre-joined nested tree (think a Facebook page, think a nested JSON object) and where you don’t want to be limited by an RDB schema then key-value stores and document stores like MongoDB are a good alternative.
If you want what is described in the previous paragraph but you have to perform complex joins or apply graph algorithms then the MongoGraph approach might be a viable solution.
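To make the “pre-joined nested tree” idea concrete, here is a minimal sketch. The tables, field names, and the `build_page` helper are all hypothetical and are not taken from the interview. It stores the same data as normalized rows and then assembles the nested document a document store would hold directly.

```python
# Normalized, relational-style rows: reading one page means joining three tables.
users = [{"id": 1, "name": "Ada"}]
posts = [
    {"id": 10, "user_id": 1, "text": "Hello"},
    {"id": 11, "user_id": 1, "text": "Polyglot persistence"},
]
comments = [
    {"id": 100, "post_id": 10, "text": "Welcome!"},
]


def build_page(user_id: int) -> dict:
    """Join the rows into one nested document (the 'pre-joined tree')."""
    user = next(u for u in users if u["id"] == user_id)
    return {
        "name": user["name"],
        "posts": [
            {
                "text": p["text"],
                "comments": [c["text"] for c in comments if c["post_id"] == p["id"]],
            }
            for p in posts
            if p["user_id"] == user_id
        ],
    }


# The nested result is what a document store would keep as a single record.
page = build_page(1)
```

In the relational layout the join runs on every read. In the document layout the result of `build_page` is written once and fetched by key, which is the trade-off the quoted answer describes.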
Thinking about the products and projects I’ve been working on, most of them have had to deal with all these aspects in different areas of the applications and with different importance to the final solution. Mistakenly though, in most of the cases they ended up using a relational database only. With polyglot persistence here, this shouldn’t happen anymore. That’s not to say though that every project must use all of these technologies just because they are available. But it could use any of them or all combined.
InfoQ: What are the emerging trends in combining the NoSQL data stores?
Jans: From the perspective of a Semantic Web - Graph database vendor what we see is that nearly all graph databases now perform their text indexing with Lucene based indexing (Solr or Elastic Search) and I wouldn’t be surprised that most vendors soon will allow JSON objects as first class objects for graph databases. It was surprisingly straightforward to mix the JSON and triple/graph paradigm. We are also experimenting with key-value stores to see how that mixes with the triple/graph paradigm.
This topic was also discussed during my NoSQL Applications panel, but due to the panel’s time constraints we couldn’t reach a conclusion. But it’s definitely an interesting perspective.
Friday, 2 December 2011
Distributed Caches, NoSQL Databases, and RDBMS
Greg Luck[1], following up on his article Ehcache: Distributed Cache or NoSQL Store?, talks about architectural differences between distributed caches, NoSQL databases, and RDBMS and where distributed caches fit:
NoSQL and RDBMS are generally on disk. Disks are mechanical devices and exhibit large latencies due to seek time as the head moves to the right track and read or write times dependent on the RPM of the disk platter. NoSQL tends to optimise disk use, for example, by only appending to logs with the disk head in place and occasionally flushing to disk. By contrast, caches are principally in memory. […] With RDBMS a cache is added to avoid these scale out difficulties. For NoSQL, scale out is built-in, so the cache will get used when lower latencies are required.
[1] Greg Luck: Founder and CTO, Ehcache
via: http://www.infoq.com/news/2011/11/distributed-cache-nosql-data-sto
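Greg Luck’s point about putting an in-memory cache in front of a disk-based store can be sketched as a simple cache-aside wrapper. This is a single-process illustration only, not how Ehcache or any particular product works. The class names are hypothetical, `SlowStore` is a stand-in for a disk-backed database, and eviction, invalidation, and distribution are deliberately left out.

```python
class SlowStore:
    """Stand-in for a disk-backed database; counts how often it is hit."""

    def __init__(self, rows: dict) -> None:
        self._rows = dict(rows)
        self.reads = 0

    def get(self, key: str):
        self.reads += 1
        return self._rows.get(key)


class CacheAside:
    """Serve repeated reads from memory; fall back to the store on a miss."""

    def __init__(self, store: SlowStore) -> None:
        self._store = store
        self._cache: dict = {}

    def get(self, key: str):
        if key in self._cache:
            return self._cache[key]
        value = self._store.get(key)
        if value is not None:
            # Only cache values that exist, so missing keys are re-checked.
            self._cache[key] = value
        return value


store = SlowStore({"user:1": "Ada"})
cached = CacheAside(store)
first = cached.get("user:1")   # miss: goes to the store
second = cached.get("user:1")  # hit: served from memory
```

After the two lookups, `store.reads` is 1: only the first read paid the cost of going to the slow store.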
Monday, 28 November 2011
How to Implement an IMAP Server on Top of a CouchDB/NoSQL Data Store?
Interesting question on SO:
To summarize my objective here, I am really just looking for a simple, opensource method which allows me to create and maintain a (preferably noSQL db) backup/archive of one/more remote IMAP email accounts on a per user basis and sync each individual users email accounts using a simple, low cost solution which easily scales out, consumes server resources in an efficient manner with the ADDED ABILITY that each user needs to be able to connect to his central email archive by simply adding a new imap account to his existing email client using an imap server, username and password provided through this archive server/setup.
This reminded me of a GSOC project to design and implement a distributed mailbox on top of Hadoop HDFS as part of the Apache James project. The project description can be found on this JIRA ticket and more details here:
We need to implement mailbox storage as a distributed system on top of Hadoop HDFS. The James mailbox API will be used. A first step is to design how to interact with Hadoop (native api, gora incubator at apache,…) and deal with specific performance questions related to mail loading/parsing in a distributed system (use map/reduce or not, use existing local lucene indexes for search,…). The second step is to implement the HDFS mailbox (maildir mailbox is similar because it stores mails as a file and can be an inspiration). A single James server will still be deployed because we don’t have any distributed UID generation.
According to the last comments on the ticket, this project was completed by Ioan Eugen Stan under Eric Charles’ mentorship.
Monday, 31 October 2011
IBM DB2 to Include NoSQL Features
It didn’t take long for IBM to follow Oracle’s foray into the NoSQL space by announcing that IBM DB2 and Informix will include NoSQL features.
Mark Brunelli quoting Curt Cotner, IBM VP and CTO for database servers:
So, we actually took one of these NoSQL triplestores from the open source [community and] we modified it to sit on top of DB2 so that it can use DB2’s indexing, DB2’s logging, DB2’s solution for high availability [and] all the things you would expect.
Reports are not very clear yet, but it seems that DB2’s NoSQLish features are based on IBM’s Rational Jazz triplestore solution, an approach similar to Oracle’s NoSQL Database 11g, which is based on Oracle’s Berkeley DB Java Edition.
When speculating about Oracle’s future in the NoSQL market, I wrote that I expected Oracle to extend the support for NoSQLish interfaces to its core database products. And it looks like IBM is taking exactly this route:
Curt Cotner: “All of the DB2 and IBM Informix customers will have access to that and it will be part of your existing stack and you won’t have to pay extra for it. We’ll put that into our database products because we think that this is [something] that people want from their application programming experience, and it makes sense to put it natively inside of DB2.”
Looking back at these events (Oracle’s NoSQL database, Oracle Big Data appliance, IBM DB2 and Informix supporting NoSQL features) makes me wonder whether and how they are related to the new Enterprise NoSQL trend I’ve mentioned earlier.
Monday, 3 October 2011
The Oracle NoSQL Database and Big Data Appliance
There’s been a lot of speculation about the announcements coming from Oracle’s OpenWorld event. A first part was revealed during the keynote in the form of an in-memory analytics appliance called Exalytics [2]. But there’s talk about a Big Data Appliance and an Oracle NoSQL database.
Here are my predictions[1]:
- Oracle has become very aggressive in selling products based on hardware, software, and services. So they’ll announce a Hadoop appliance integrated with an existing Oracle product. It could be either the Oracle Exadata or even the newly announced Exalytics.
  This appliance will place Oracle in competition with all other Hadoop appliance sellers: EMC, NetApp, IBM. Also, these days most of the analytics databases try to integrate with Hadoop.
- Oracle already has a couple of non-relational solutions in their portfolio: Berkeley DB, TimesTen, Coherence. And they’ve already started to test the NoSQL market by announcing the MySQL and MySQL Cluster NoSQL hybrid systems.
  I don’t expect Oracle NoSQL database to be a new product. Just a rebranding or repackaging of one of the above mentioned ones. Probably TimesTen.
- Oracle will invest more into integrating its line of products with Hadoop. Having both a Hadoop and an in-memory analytics appliance will make them very competitive in this space.
- Oracle will extend the support for NoSQLish interfaces (memcached) to its other database products.
What are your predictions?
Thursday, 7 July 2011
Rethink Your Data Model
Karl Seguin[1]:
Fundamentally rethinking how you model data is actually a really fun thing to do. Modeling data for a relational database is such second nature, that you constantly have to stop your brain from doing what comes naturally. Why would you want to do that, you might ask? Because we’ve been modeling more or less the same way for decades, it’s time we challenged ourselves, experimented and learned.
Polyglot programming has brought us back the beauty of learning, experimenting, and using any programming language. Polyglot persistence is the equivalent in the data space: gaining back the option to learn, experiment, and use the best data models, storage engines, and distribution models.
[1] Karl Seguin is the author of the free Little MongoDB book