NoSQL Database: All content tagged as NoSQL Database in NoSQL databases and polyglot persistence

Rethink Your Data Model

Karl Seguin[1]:

Fundamentally rethinking how you model data is actually a really fun thing to do. Modeling data for a relational database is such second nature that you constantly have to stop your brain from doing what comes naturally. Why would you want to do that, you might ask? Because we’ve been modeling more or less the same way for decades, it’s time we challenged ourselves, experimented and learned.

Polyglot programming has given us back the joy of learning, experimenting with, and using any programming language. Polyglot persistence is the equivalent in the data space: it gives us back the option to learn, experiment with, and use the best data models, storage engines, and distribution models.


  1. Karl Seguin is the author of the free Little MongoDB Book.

Original title and link: Rethink Your Data Model (NoSQL database©myNoSQL)

via: http://openmymind.net/2011/7/5/Rethink-your-Data-Model


Zynga, Data Centers, Polyglot Persistence, and Big Data

Cadir Lee (CTO of Zynga), quoted in a VentureBeat post:

It’s not the amount of hardware that matters. It’s the architecture of the application. You have to work at making your app architecture so that it takes advantage of Amazon. You have to have complete fluidity with the storage tier, the web tier. We are running our own data centers. We are looking more at doing our own data centers with more of a private cloud.

A couple of thoughts:

  1. Zynga is going in the opposite direction from Netflix. While Netflix is focusing (by using Amazon for most of its infrastructure), Zynga is diversifying (by building its own data centers).
  2. Zynga’s applications are great examples of where fully distributed NoSQL databases fit. Availability is key.
  3. My answer to the question: “how many Zyngas are out there” would be: “enough to ensure some good business for the most reliable and scalable distributed databases”
  4. Zynga has contributed to, and is an investor in, Membase, the company that merged with CouchOne to form Couchbase. However, Zynga was using a custom version of Membase.
  5. Zynga also operates a large MySQL cluster.
  6. Zynga processes over 15 terabytes of game data every day (according to its SEC filing). That’s Hadoop’s sweet spot.

PS: I’d love to talk to someone from Zynga about their data storage approach. If you have any connections I’d really appreciate an introduction.

Original title and link: Zynga, Data Centers, Polyglot Persistence, and Big Data (NoSQL database©myNoSQL)


OrientDB - Pure Java NoSQL Datastore

I analysed all the popular ones but none fitted my requirements. I had one criterion for selecting a database: I must be able to code in Java. Most available systems were non-Java based, which would be a significant issue for a one-man project. Even if they had a Java interface, the installation, setup, etc. were a tedious process. Having a database developed purely in Java has many advantages:

  1. Easy packaging with other applications
  2. Easy to install and run
  3. Can be embedded
  4. Can run in same or different VM
  5. Easy to debug
  6. Easy to test

After much searching, I came across OrientDB.

These are programming and deployment requirements rather than storage requirements. Judging by the points above alone, quite a few other databases would make the list.

Original title and link: OrientDB - Pure Java NoSQL Datastore (NoSQL database©myNoSQL)

via: http://myjavaexp.blogspot.com/2011/06/orientdb-pure-java-nosql-datastore.html


NoSQL Databases: What, Why, and When

Lorenzo Alberton with an overview of the NoSQL landscape:

NoSQL databases get a lot of press coverage, but there seems to be a lot of confusion surrounding them, as in which situations they work better than a Relational Database, and how to choose one over another. This talk will give an overview of the NoSQL landscape and a classification for the different architectural categories, clarifying the base concepts and the terminology, and will provide a comparison of the features, the strengths and the drawbacks of the most popular projects (CouchDB, MongoDB, Riak, Redis, Membase, Neo4j, Cassandra, HBase, Hypertable).


Database & Integration: The SDTimes Top 100 for 2011

The only NoSQL name making the list is Couchbase, which is still working on merging its CouchDB and Membase products.

Hadoop is included in the cloud category, which makes me wonder how valuable the SDTimes Top 100 really is.

Original title and link: Database & Integration: The SDTimes Top 100 for 2011 (NoSQL databases © myNoSQL)

via: http://www.sdtimes.com/content/article.aspx?ArticleID=35592&page=3


Cloud Foundry, NoSQL Databases, and Polyglot Persistence

VMware’s Cloud Foundry has the potential to become the preferred PaaS solution. It bundles together a set of services that took other PaaS providers (Google App Engine, Microsoft Azure) years to offer. And it seems that Cloud Foundry has much less vendor lock-in (or none at all)[1].

From a storage perspective, Cloud Foundry is encouraging polyglot persistence right from the start, offering access to a relational database (MySQL), a super-fast smart key-value store (Redis), and a popular document database (MongoDB). The only bit missing is a graph database[2].

I think the first graph database to get there will see an immediate bump in its adoption.
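To make the polyglot persistence idea concrete, here is a minimal sketch of an application that writes each kind of data to the model that suits it. It assumes in-process stand-ins rather than the real services: Python’s `sqlite3` in place of MySQL, a plain dict in place of Redis, and JSON strings in a dict in place of MongoDB. The function names (`register_user`, `start_session`, `load_user`) are hypothetical.

```python
import json
import sqlite3

# Relational store: sqlite3 stands in for MySQL (assumption for this sketch).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Key-value store: a plain dict stands in for Redis (assumption).
sessions: dict[str, str] = {}

# Document store: JSON strings keyed by id stand in for a MongoDB collection (assumption).
profiles: dict[int, str] = {}


def register_user(user_id: int, name: str, profile: dict) -> None:
    """Write each piece of user data to the store whose model fits it."""
    # Structured data with a fixed shape goes to the SQL table.
    with db:
        db.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))
    # The free-form, nested profile is kept as a schemaless document.
    profiles[user_id] = json.dumps(profile)


def start_session(token: str, user_id: int) -> None:
    # Short-lived lookups by key go to the key-value store.
    sessions[token] = str(user_id)


def load_user(token: str) -> dict:
    """Resolve a session token into one view assembled from all three stores."""
    user_id = int(sessions[token])
    (name,) = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return {"id": user_id, "name": name, "profile": json.loads(profiles[user_id])}
```

For example, after `register_user(1, "alice", {"theme": "dark"})` and `start_session("abc123", 1)`, calling `load_user("abc123")` combines the SQL row, the session entry, and the document into a single dictionary. The design choice being illustrated is that the application, not one database, decides where each piece of data lives.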


  1. These comments are based on what I’ve read about VMware Cloud Foundry, as I haven’t received my invitation yet.

  2. I don’t think wide-column databases (Cassandra, HBase) are a good fit for PaaS.

Original title and link: Cloud Foundry, NoSQL Databases, and Polyglot Persistence (NoSQL databases © myNoSQL)


Comparing NoSQL Databases with Object-Oriented Databases

Don White[1] in an interview on odbms.org:

The new data systems are very data centric and are not trying to facilitate the melding of data and behavior. These new storage systems present specific model abstractions and provide their own specific storage structure. In some cases they offer schema flexibility, but it is basically used to just manage data and not for building sophisticated data structures with type specific behavior.

Decoupling data from behavior allows both to evolve separately. Or differently put, it allows one to outlive the other.

Another interesting quote from the interview:

[…] why would you want to store data differently than how you intend to use it? I guess the simple answer is when you don’t know how you are going to use your data, so if you don’t know how you are going to use it then why is any data store abstraction better than another?

I guess this explains the 30-year dominance of relational databases. Not in the sense that we never knew how to use data, but rather that we always wanted to make sure we could use it in various ways.

And it also explains the direction NoSQL databases took:

To generalize it appears the newer stores make different compromises in the management of the data to suit their intended audience. In other words they are not developing a general purpose database solution so they are willing to make tradeoffs that traditional database products would/should/could not make. […] They do provide an abstraction for data storage and processing capabilities that leverage the idiosyncrasies of their chosen implementation data structures and/or relaxations in strictness of the transaction model to try to make gains in processing.
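The contrast drawn in these quotes, between relational storage that keeps data usable in ways nobody planned for and newer stores that trade generality for structures fitted to one use, can be sketched with a toy example. It is a sketch under simplifying assumptions: `sqlite3` plays the relational database, a nested dict plays a document store, and the schema and function names are made up for illustration.

```python
import sqlite3

# Normalized relational schema: stored independently of any single query.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER REFERENCES authors(id), tag TEXT);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO posts VALUES (10, 1, 'nosql'), (11, 1, 'sql'), (12, 2, 'nosql');
    """
)


def posts_by_author(name: str) -> list[int]:
    # Access pattern 1, planned up front: posts written by an author.
    rows = db.execute(
        "SELECT p.id FROM posts p JOIN authors a ON a.id = p.author_id "
        "WHERE a.name = ? ORDER BY p.id",
        (name,),
    )
    return [r[0] for r in rows]


def authors_by_tag(tag: str) -> list[str]:
    # Access pattern 2, not planned up front: same tables, just a new query.
    rows = db.execute(
        "SELECT DISTINCT a.name FROM authors a JOIN posts p ON a.id = p.author_id "
        "WHERE p.tag = ? ORDER BY a.name",
        (tag,),
    )
    return [r[0] for r in rows]


# Document-style layout shaped for pattern 1: each author embeds its posts.
author_docs = {
    "ann": {"posts": [{"id": 10, "tag": "nosql"}, {"id": 11, "tag": "sql"}]},
    "bob": {"posts": [{"id": 12, "tag": "nosql"}]},
}


def authors_by_tag_docs(tag: str) -> list[str]:
    # Without a secondary index, pattern 2 means scanning every document.
    return sorted(
        name
        for name, doc in author_docs.items()
        if any(p["tag"] == tag for p in doc["posts"])
    )
```

The relational version answers the second, unplanned question with a new query over the same tables; the document layout answers it only by scanning every document, unless the data is redesigned or indexed for that access pattern.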


  1. Don White: senior development manager at Progress Software Inc., responsible for all feature development and engineering support for ObjectStore  

Original title and link: Comparing NoSQL Databases with Object-Oriented Databases (NoSQL databases © myNoSQL)


NoSQL Databases in Grails and Spring Data

While checking the GitHub repository for Grails/GORM entity support for Riak MapReduce, I noticed multiple NoSQL integration projects[1]:

Grails/GORM and NoSQL databases:

  • Gemfire
  • JCR
  • MongoDB
  • Redis
  • Riak

Spring Data and NoSQL databases:

  • AppEngine
  • Cassandra
  • Gemfire
  • JCR
  • MongoDB
  • Redis
  • Riak

What seems to be missing is Neo4j support in Spring Data, but maybe there’s a separate repository for it.

Finally, lots of love for NoSQL databases in Java land.


  1. At this time I’m not sure about the status of each of these projects.

Original title and link: NoSQL Databases in Grails and Spring Data (NoSQL databases © myNoSQL)


SQL and NoSQL In the Cloud

Options for running RDBMSs in the cloud:

  • Install and Manage – in this “traditional” model the developer or sysadmin selects their DBMS, creates instances in their cloud, installs it, and is then responsible for all administration tasks (backups, clustering, snapshots, tuning, and recovering from a disaster). […]
  • Use a Cloud-Managed DBaaS Instance – in this model the cloud provider offers a DBMS service that developers just use. All physical administration tasks (backup, recovery, log management, etc.) are performed by the cloud provider and the developer just needs to worry about structural tuning issues (indices, tables, query optimization, etc). […]
  • Use an External Cloud-Agnostic DBaaS Solution – this is very much like the cloud-based DBaaS, but has a value of cloud-independence – at least in theory. In the long run you might expect to be able to use an independent DBaaS to provide multi-cloud availability and continuous operations in the event of a cloud failure.

I guess these are equivalent to applying the Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Database-as-a-Service (DBaaS, which can be seen as a more specialized PaaS) models to persistence. The same approach applies to NoSQL databases, as these models are orthogonal to the persistence problem.

Original title and link: SQL and NoSQL In the Cloud (NoSQL databases © myNoSQL)

via: http://www.cloudbzz.com/sql-in-the-cloud/


Why Comcast is Interested in NoSQL databases

Fantastic presentation from Jon Moore on why Comcast (enterprises?) is interested in NoSQL databases:

Summarizing:

  • it is not for massive scale
  • it is not for high performance
  • it is not for handling Big Data
  • NoSQL databases still carry risks and require more ramp-up and investment
  • it is for the distributed nature of NoSQL databases, including multi-data center support
  • it is for operational scalability and operational friendliness of NoSQL databases

You can get the PDF here.

Original title and link: Why Comcast is Interested in NoSQL databases (NoSQL databases © myNoSQL)


NoSQL databases Can Make or Break Your Project

Jeremy Pinkham:

[…] the differences between each of these can sometimes be subtle, but they can also make or break your project. […] The result of those subtle differences is that, unlike the SQL world, you can’t architect a NoSQL based system without having extreme confidence in the specific platform you’ve chosen. There is no reasonable migration path from Cassandra to Membase, for example.

Very, very true.
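Part of the reason there is no easy migration path is that the data models differ, not just the APIs. The sketch below is a heavily simplified, hypothetical model and not the Cassandra or Membase client API: a wide-column row keeps its columns sorted so a range of columns can be read directly, while a key-value store only maps a key to an opaque value. `WideRow`, `kv_put_row`, and `kv_slice` are illustrative names.

```python
import json
from bisect import bisect_left, bisect_right


class WideRow:
    """Simplified wide-column row: column names kept sorted inside the row."""

    def __init__(self) -> None:
        self.names: list[str] = []
        self.values: dict[str, str] = {}

    def insert(self, column: str, value: str) -> None:
        # Keep names sorted so range reads need no client-side sorting.
        if column not in self.values:
            self.names.insert(bisect_left(self.names, column), column)
        self.values[column] = value

    def slice(self, start: str, end: str) -> list[tuple[str, str]]:
        """Return columns with start <= name <= end, in sorted order."""
        lo = bisect_left(self.names, start)
        hi = bisect_right(self.names, end)
        return [(n, self.values[n]) for n in self.names[lo:hi]]


# Simplified key-value store: the store only sees key -> opaque bytes.
kv: dict[str, bytes] = {}


def kv_put_row(row_key: str, row: WideRow) -> None:
    # One possible mapping: serialize the whole row into a single value.
    kv[row_key] = json.dumps(row.values).encode()


def kv_slice(row_key: str, start: str, end: str) -> list[tuple[str, str]]:
    # The same range read now fetches the whole value and filters client-side.
    columns = json.loads(kv[row_key])
    return sorted((n, v) for n, v in columns.items() if start <= n <= end)
```

Serializing each row into one value is only one possible mapping; storing one key per column avoids rewriting whole rows but loses the ability to read a range of columns without maintaining an extra index. Either way the application’s access code has to change.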

Original title and link: NoSQL, NoProblem (Not Really… but it’s still awesome) (NoSQL databases © myNoSQL)

via: http://jeremypinkham.com/post/1587967351/nosql-noproblem-not-really-but-its-still-awesome