rdbms: All content tagged as rdbms in NoSQL databases and polyglot persistence

MongoDB Work Queues: Techniques to Easily Store and Process Complex Jobs

David Berube’s article debuts with a very good overview of the different approaches for creating and managing work queues:

There are many approaches to creating work queues. One option, though naive, is to use a relational database management system (RDBMS). This is simple to implement because many architectures already have a database system such as MySQL. However, performance is less than optimal compared with other approaches. The atomicity, consistency, isolation, and durability (ACID) compliance required for RDBMS is not necessary for this scenario and negatively impacts performance. A simpler system can perform better.

One system that has gained in popularity for this use is Redis. It’s a key-value data store, like the highly popular memcached, but with more features. For example, Redis has support for pushing and popping elements off lists in a highly scalable and efficient way. Resque, often used with Ruby on Rails, is a system built on top of Redis (see Resources for more details). However, Redis supports only simple primitives. You can’t insert complex objects into the lists, and it has relatively limited support for managing items in those lists.

Alternatively, many systems use a message broker such as Apache ActiveMQ or RabbitMQ. Although these systems are fast and scalable, they’re designed for simple messages. If you want to perform nontrivial reporting on your work queues or modify items in the queues, you are stuck because message brokers rarely offer those features. Fortunately, a powerful, scalable solution is available: MongoDB.

MongoDB allows you to create queues that contain complex nested data. Its locking semantics guarantee you won’t experience problems with concurrency, and its scalability ensures you can run large systems. Because MongoDB is a powerful database, you can also run robust reporting on your queue and prioritize by complex criteria. However, MongoDB is not a traditional RDBMS. For instance, it does not support Structured Query Language (SQL) queries.

MongoDB has many appealing features in addition to excellent performance for work queues, such as a flexible, schemaless approach. It supports nested data structures, meaning you can even store subdocuments. Because it is a more full-featured data store than Redis, it provides a richer set of management functions so you can easily view, query, update, and delete jobs on any arbitrary criteria.

Using MongoDB as a queueing system is in many regards as good, and as wrong, as using a relational database for this type of functionality. Both completely lack the semantics and features required by queues and pub/sub. Redis (and obviously the dedicated MOMs) natively supports both queue and pub/sub semantics.

So even though the article lists a couple of reasons why MongoDB could be used as a queueing system, consider this solution if and only if MongoDB is the only system you are allowed to run in your environment.
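
To make the discussion concrete, here is a minimal sketch of what a document-style work queue has to provide: jobs with nested payloads, an atomic "claim" step, and ad hoc reporting. It is an in-memory stand-in written for illustration, not MongoDB's API; the class `InMemoryJobQueue`, its methods, and the field names are all assumptions, and the lock only imitates the atomic update a real store would perform.

```python
import itertools
import threading


class InMemoryJobQueue:
    """Hypothetical in-memory stand-in for a job collection (not a MongoDB API)."""

    def __init__(self):
        self._jobs = []
        self._lock = threading.Lock()   # imitates the store's atomic update
        self._seq = itertools.count()   # insertion order, used to break ties

    def enqueue(self, payload, priority=0):
        job = {
            "seq": next(self._seq),
            "payload": payload,         # arbitrary nested data
            "priority": priority,
            "status": "queued",
            "worker": None,
        }
        with self._lock:
            self._jobs.append(job)

    def claim(self, worker):
        """Atomically pick the best queued job and mark it as running."""
        with self._lock:
            queued = [j for j in self._jobs if j["status"] == "queued"]
            if not queued:
                return None
            # Highest priority first; oldest first among equal priorities.
            job = min(queued, key=lambda j: (-j["priority"], j["seq"]))
            job["status"] = "running"
            job["worker"] = worker
            return job

    def report(self, predicate):
        """Return every job matching an arbitrary criterion."""
        with self._lock:
            return [j for j in self._jobs if predicate(j)]


queue = InMemoryJobQueue()
queue.enqueue({"task": "resize", "image": {"id": 7, "sizes": [64, 128]}}, priority=1)
queue.enqueue({"task": "email", "to": ["a@example.com"]}, priority=5)

first = queue.claim("worker-1")    # the higher-priority email job
second = queue.claim("worker-2")   # the resize job
third = queue.claim("worker-3")    # nothing left to claim
running = queue.report(lambda j: j["status"] == "running")
```

Because the claim step holds the lock while it both selects and updates the job, two workers can never receive the same job, which is the property any store used as a queue must guarantee in some form.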

Original title and link: MongoDB Work Queues: Techniques to Easily Store and Process Complex Jobs (NoSQL database©myNoSQL)

via: http://www.ibm.com/developerworks/opensource/library/os-mongodb-work-queues/


The time for NoSQL is now

Andrew C. Oliver:

The transition to NoSQL databases will take time. We still don’t have TOAD, Crystal Reports, query language standardization and other essential tools needed for mass adoption. There will be missteps (i.e. I may need a different type of database for reporting than for my operational system), but I truly think this is one technology that isn’t just marketing.

This comes from someone who was happy to discover all the knobs in Oracle back in 1998.

Original title and link: The time for NoSQL is now (NoSQL database©myNoSQL)

via: http://osintegrators.com/node/76


Doug Cutting About Hadoop, Its Adoption and Future, and Its Relationship With Relational Databases

Jaikumar Vijayan (Computerworld) interviews Doug Cutting:

Q: How would you describe Hadoop to a CIO or a CFO? Why should enterprises care about it?

A: At a really simple level, it lets you affordably save and process vastly more data than you could before. With more data and the ability to process it, companies can see more, they can learn more, they can do more. [With Hadoop] you can start to do all sorts of analyses that just weren’t practical before. You can start to look at patterns over years, over seasons, across demographics. You have enough data to fill in patterns and make predictions and decide, “How should we price things?” and “What should we be selling now?” and “How should we advertise?” It is not only about having data for longer periods, but also richer data about any given period.

The interview covers topics like the reasons for the interest in Hadoop, Hadoop adoption inside and outside the enterprise world, and the limitations of relational databases. It is a must read, if only they had added some newlines here and there.

Original title and link: Doug Cutting About Hadoop, Its Adoption and Future, and Its Relationship With Relational Databases (NoSQL database©myNoSQL)

via: http://www.computerworld.com/s/article/9222758/The_Grill_Doug_Cutting


MySQL MEMORY as Poor Man’s Memcached Replacement

ServerFault Q&A:

Q: Copy MySQL to RAM as a poor man’s memcached replacement?

A: Use the MEMORY storage engine on a read-only slave to do your reads from; that is exactly what you really want and a sane setup. Forget “dumping it to disk” (?!) or other strange things.

You can even put the slave as another instance on your existing server if you can’t afford to set up a dedicated slave, but properly tuning the MySQL parameters for mostly read workloads will bring a significant performance enhancement too!

Jiminy

Original title and link: MySQL MEMORY as Poor Man’s Memcached Replacement (NoSQL database©myNoSQL)


The Wonderful Wizard of Oz Through a Polyglot Persistence Glass

Adrian Giordani:

Relational databases are the Yellow Brick Road of managing large structured data globally. […] In the 1939 film of The Wizard of Oz, a red brick road is intertwined with the yellow one. Similarly, a new type of database might soon offer a different path: NoSQL, or Not-Only-SQL, first coined in 2008, is promising a faster and more scalable database architecture, at least for some cases.

But who’s the Tin Woodman?

Original title and link: The Wonderful Wizard of Oz Through a Polyglot Persistence Glass (NoSQL database©myNoSQL)

via: http://www.isgtw.org/feature/following-red-brick-road-data-management


NoSQL Databases Best Practices and Emerging Trends

Jans Aasman (CEO AllegroGraph) interviewed by Srini Penchikala:

InfoQ: What best practices and architecture patterns should the developers and architects consider when using a solution like this one in their software applications?

Jans: If your application requires simple straight joins and your schema hardly changes then any RDBMS will do.

If your application is mostly document based, where a document can be looked at as a pre-joined nested tree (think a Facebook page, think a nested JSON object) and where you don’t want to be limited by an RDB schema then key-value stores and document stores like MongoDB are a good alternative.

If you want what is described in the previous paragraph but you have to perform complex joins or apply graph algorithms then the MongoGraph approach might be a viable solution.

Thinking about the products and projects I’ve been working on, most of them have had to deal with all these aspects in different areas of the applications and with different importance to the final solution. Mistakenly though, in most of the cases they ended up using a relational database only. With polyglot persistence here, this shouldn’t happen anymore. That’s not to say though that every project must use all of these technologies just because they are available. But it could use any of them or all combined.
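
To illustrate what "a document as a pre-joined nested tree" means in practice, the sketch below starts from three flat, relational-style tables and performs the joins once, producing the nested object a document store would keep as a single record. The table contents and the helper `build_page` are made up for this example.

```python
# Relational-style rows: three flat "tables" linked by ids.
users = [{"id": 1, "name": "Ada"}]
posts = [
    {"id": 10, "user_id": 1, "text": "Hello"},
    {"id": 11, "user_id": 1, "text": "Graphs!"},
]
comments = [{"post_id": 10, "author": "Bob", "text": "Hi"}]


def build_page(user_id):
    """Join users -> posts -> comments once and return one nested document."""
    user = next(u for u in users if u["id"] == user_id)
    return {
        "name": user["name"],
        "posts": [
            {
                "text": post["text"],
                "comments": [
                    {"author": c["author"], "text": c["text"]}
                    for c in comments
                    if c["post_id"] == post["id"]
                ],
            }
            for post in posts
            if post["user_id"] == user_id
        ],
    }


# The whole page is now one self-contained object; reading it needs no joins.
page = build_page(1)
```

The trade-off shows up as soon as a question cuts across documents (for example, "all comments by Bob on any page"), which is where joins or graph traversals come back into play.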

InfoQ: What are the emerging trends in combining the NoSQL data stores?

Jans: From the perspective of a Semantic Web - Graph database vendor what we see is that nearly all graph databases now perform their text indexing with Lucene based indexing (Solr or Elastic Search) and I wouldn’t be surprised that most vendors soon will allow JSON objects as first class objects for graph databases. It was surprisingly straightforward to mix the JSON and triple/graph paradigm. We are also experimenting with key-value stores to see how that mixes with the triple/graph paradigm.

This topic was also discussed during my NoSQL Applications panel, but due to the panel’s time constraints we couldn’t reach a conclusion. It’s definitely an interesting perspective.

Original title and link: NoSQL Databases Best Practices and Emerging Trends (NoSQL database©myNoSQL)

via: http://www.infoq.com/news/2011/12/mongograph-qa


MapReduce vs Parallel DBMS: Where Does Map Reduce Shine

From Jim Kaskade’s great post about MapReduce’s advantages:

One of the big attractive qualities of the MR programming model (and maybe its key attraction to the new generation of data scientists and application programmers) is its simplicity; an MR program consists of only two functions – Map and Reduce – written to process key/value data pairs. Therefore, the model is easy to use, even for programmers without experience with parallel and distributed systems.

It also hides the details of parallelization, fault-tolerance, locality optimization, and load balancing.
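
The "only two functions" point can be shown with the classic word count. The sketch below is a single-process toy, not Hadoop code: `run_mapreduce` is a hypothetical driver that stands in for the framework, doing the grouping ("shuffle") step that a real system would distribute across machines along with fault tolerance and load balancing.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit a (word, 1) pair for every word in the input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: collapse all values emitted for one key into a single result."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Toy driver: apply the mapper, group values by key, apply the reducer."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)   # the "shuffle" step
    return dict(reducer(k, v) for k, v in sorted(groups.items()))


lines = [(0, "the quick brown fox"), (1, "the lazy dog"), (2, "The fox")]
word_counts = run_mapreduce(lines, map_fn, reduce_fn)
```

The programmer writes only `map_fn` and `reduce_fn`; everything inside `run_mapreduce` is what the framework takes care of.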

Original title and link: MapReduce vs Parallel DBMS: Where Does Map Reduce Shine (NoSQL database©myNoSQL)

via: http://jameskaskade.com/?p=2253


Distributed Caches, NoSQL Databases, and RDBMS

Greg Luck[1], following up on his article Ehcache: Distributed Cache or NoSQL Store?, talks about the architectural differences between distributed caches, NoSQL databases, and RDBMS, and about where distributed caches fit:

NoSQL and RDBMS are generally on disk. Disks are mechanical devices and exhibit large latencies due to seek time as the head moves to the right track and read or write times dependent on the RPM of the disk platter. NoSQL tends to optimise disk use, for example, by only appending to logs with the disk head in place and occasionally flushing to disk. By contrast, caches are principally in memory. […] With RDBMS a cache is added to avoid these scale out difficulties. For NoSQL, scale out is built-in, so the cache will get used when lower latencies are required.


  1. Greg Luck: Founder and CTO, Ehcache  
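
The role of a cache in front of a disk-based store can be sketched as a read-through cache. This is a minimal single-process illustration, not Ehcache's API: `SlowStore`, `ReadThroughCache`, the latency value, and the TTL are all assumptions chosen for the example.

```python
import time


class SlowStore:
    """Stand-in for a disk-based database: every read pays a latency cost."""

    def __init__(self, data, latency_s=0.001):
        self._data = dict(data)
        self._latency_s = latency_s
        self.reads = 0

    def get(self, key):
        self.reads += 1
        time.sleep(self._latency_s)   # simulated seek and read time
        return self._data.get(key)


class ReadThroughCache:
    """Serve reads from memory; fall back to the store on a miss or expiry."""

    def __init__(self, store, ttl_s=60.0, clock=time.monotonic):
        self._store = store
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}            # key -> (value, expires_at)
        self.hits = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        value = self._store.get(key)  # slow path: go to the backing store
        self._entries[key] = (value, now + self._ttl_s)
        return value


store = SlowStore({"user:1": {"name": "Ada"}})
cache = ReadThroughCache(store)
first_read = cache.get("user:1")    # miss: goes to the store
second_read = cache.get("user:1")   # hit: served from memory
```

Only the first read reaches the store; later reads within the TTL are answered from memory, which is the latency benefit the quote describes.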

Original title and link: Distributed Caches, NoSQL Databases, and RDBMS (NoSQL database©myNoSQL)

via: http://www.infoq.com/news/2011/11/distributed-cache-nosql-data-sto


Apache Sqoop: What, When, Where, How

The other day I posted about Sqoop’s first release under the Apache umbrella, so I thought I would provide a few more details about where Sqoop fits in the picture. I’ve embedded below 3 presentations that answer questions like what Sqoop is, when and where to use it, and how to use it.


Traditional SQL DaaS vs NewSQL

Mike Hogan (CEO ScaleDB) raises some very valid issues with traditional relational databases operating as Databases-as-a-Service:

When moving from a self-managed database—either in the cloud or on premise—to a DaaS, the “DBA-in-the-cloud” doesn’t have that visibility into the business requirements, performance requirements, development schedule, and more. This lack of visibility turns the already challenging task of hand-tuning the database into a near impossibility using traditional databases.

And these are just the most visible ones.

On the other hand, I totally agree with Markus ‘maol’ Perdrizat pointing out that NewSQL is not the only solution to these problems:

I agree with the problem positioning, but feel strongly that NewSQL is not a requirement to address the problem here, you can equally work a little services layer and put all the control into the hands of the user, essentially replacing (a lot of) the DBA tasks with automation and APIs.

What NewSQL gives you though, and we see that with Xeround and supposedly also ScaleDB, is the elasticity and transparent sharding that’s difficult to achieve with the more traditional Oracle, Sybase or SQL Server databases that are still often required in the enterprise space.

Original title and link: Traditional SQL DaaS vs NewSQL (NoSQL database©myNoSQL)

via: http://scaledb.blogspot.com/2011/09/lack-of-business-visibility-cripples.html


Is MongoDB a Good Alternative to RDBMs Databases?

Gijs Mollema summarizes the lessons learned after attending Brendan McAdams’ MongoDB workshop at Devoxx—embedded below:

I have to say I was pleasantly surprised by the ease of use and the features of this product. […] Of course, using a NoSQL technology like MongoDB involves some trade-offs and a different mindset than the traditional RDBMS. The main advantages as mentioned before are flexibility, scalability and performance. As the noSQL principle looks promising it is not (yet) the holy grail and therefore currently cannot replace the RDBMs for each situation. It is a different type of database which can be a solution, based on the requirements of the situation. It will not replace RDBMs databases but I reckon it might run well side-by-side in the future (delegating model / functionality at which MongoDB is good at).

The only thing that made me wonder is how having no SQL or Hibernate queries (complex joins) could be seen as an advantage.

For reference, below is Brendan McAdams’ presentation:

via: http://blog.iprofs.nl/2011/11/25/is-mongodb-a-good-alternative-to-rdbms-databases-like-oracle-and-mysql/


Graph Databases and the World Wide Web

Sir Tim Berners-Lee:

Inventing the World Wide Web involved my growing realization that there was a power in arranging ideas in an unconstrained, web-like way.  And that awareness came to me through precisely that kind of process.

Let’s think how the different data models require us to arrange data:

  1. hierarchical model: free form, single-type of relationship (parent-child)
  2. relational model: strict form, (limited) multiple-types of relationships
  3. document model: free form, dual relationship types: logical and hierarchical
  4. star schema: strict form, (limited) multiple-types of relationships

Now think about graph databases: free form (nodes can have any number of properties) and an unlimited number of uni- or bi-directional relationships. So the question is, why aren’t network/graph databases used more these days?
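
As a rough illustration of the graph model described above, the sketch below stores nodes with arbitrary properties and typed relationships that can be one-way or two-way. It is a toy in-memory structure, not any graph database's API; `PropertyGraph` and its method names are invented for the example.

```python
from collections import deque


class PropertyGraph:
    """Toy property graph: free-form nodes plus typed, directed relationships."""

    def __init__(self):
        self.nodes = {}   # node id -> properties (any keys, any number)
        self.edges = {}   # node id -> list of (rel_type, target, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, source, rel_type, target, bidirectional=False, **props):
        self.edges.setdefault(source, []).append((rel_type, target, props))
        if bidirectional:
            self.edges.setdefault(target, []).append((rel_type, source, props))

    def neighbors(self, node_id, rel_type=None):
        return [
            target
            for rel, target, _ in self.edges.get(node_id, [])
            if rel_type is None or rel == rel_type
        ]

    def reachable(self, start):
        """Breadth-first traversal over every relationship type."""
        seen = {start}
        pending = deque([start])
        while pending:
            for target in self.neighbors(pending.popleft()):
                if target not in seen:
                    seen.add(target)
                    pending.append(target)
        return seen - {start}


web = PropertyGraph()
web.add_node("home", title="Home")
web.add_node("about", title="About", lang="en")           # different properties per node
web.add_node("blog", title="Blog", tags=["nosql", "graphs"])
web.relate("home", "links_to", "about")                    # one-way link
web.relate("about", "links_to", "blog")
web.relate("home", "related", "blog", bidirectional=True)  # two-way relationship

from_blog = web.reachable("blog")
```

Nothing in the structure fixes which properties a node has or how many kinds of relationships exist, which is the "unconstrained, web-like" arrangement the quote refers to.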

Original title and link: Graph Databases and the World Wide Web (NoSQL database©myNoSQL)