
Topology: The Architecture of Distributed Systems [sponsor]

Words from the special myNoSQL sponsor, Couchbase:


You can’t judge a book by its cover, but you can judge the architecture of a distributed system by its topology.

If two distributed systems are equally effective, is the one with the simpler topology the one with the better architecture? This article compares the architecture of two document databases and two wide column stores by looking at their topologies.

Document Databases

Topology #1

[diagram: document database topology #1]

Wow. There is a lot going on here. There are four node types and two layers of logical groupings.

Topology #2

[diagram: document database topology #2]

Nice. Simple. There is one node type.

Which document database would you choose?

  • Which one is going to be easier to deploy?
  • Which one is going to be easier to maintain?
  • Which one is going to be easier to scale?
  • Which one is going to be more resilient?

I believe the fewer moving parts, the better.

Read more on system topologies.

Original title and link: Topology: The Architecture of Distributed Systems [sponsor] (NoSQL database©myNoSQL)


Snapdeal selects Aerospike to improve shopper satisfaction over MongoDB, Couchbase and Redis [sponsor]

Words from the long time myNoSQL supporter, Aerospike, reporting on a success story of a customer deploying Aerospike to deal with massive demand growth:


After experiencing 500% growth in 2013, Snapdeal, India’s largest online marketplace, switched from 10 MongoDB servers to just two Linux servers on Amazon EC2 with Aerospike, and reduced response times to less than a millisecond.

Read the case study to learn more.

Original title and link: Snapdeal selects Aerospike to improve shopper satisfaction over MongoDB, Couchbase and Redis [sponsor] (NoSQL database©myNoSQL)


Integrating D3 with CouchDB

A 4-part series by Mike Bostock describing various integration paths for D3 and CouchDB:

  1. Part 1: saving a D3 app in CouchDB
  2. Part 2: storing the D3 library in CouchDB and storing data in CouchDB
  3. Part 3: accessing CouchDB data from D3
  4. Part 4: data import

Original title and link: Integrating D3 with CouchDB (NoSQL database©myNoSQL)


The NoSQL KISS [sponsor]

In the words of the special sponsor, Couchbase:


Kelly knew it. The U.S. Navy knows it. You know it.

Keep it Simple, Stupid (KISS)

The Problem

We categorized NoSQL implementations. The categories include distributed caches, key / value stores, and document databases. However, what if application requirements span multiple categories? Do you add Redis, Riak, and MongoDB? The result would not be simple, stupid.

[illustration: "what the hell have you built?"]

The Solution

Let distributed caching, key / value storage, and document handling be use cases. The solution is a single NoSQL implementation that supports multiple use cases. In fact, Viber recently solved this problem. Their previous architecture relied on MongoDB for document processing and Redis for distributed caching. Their current architecture relies on Couchbase Server as a single replacement for both MongoDB and Redis. Read the full story.

Original title and link: The NoSQL KISS [sponsor] (NoSQL database©myNoSQL)


Big doubts on big data: Why I won't be sharing my medical data with anyone

Jo Best (ZDNet) talking about the privacy concerns of having centralized, non-regulated, non-anonymised healthcare data:

If ever there was an open goal for big data, healthcare should be it.

By gathering information from doctors, patients, drug companies, insurers, and charities, and putting the big data machinery to work on analysing it, we should be able to get better insights into a range of conditions and then come up with better ways to treat them.

I’m happy I’m not the only one concerned about all this.

Original title and link: Big doubts on big data: Why I won’t be sharing my medical data with anyone (NoSQL database©myNoSQL)

via: http://www.zdnet.com/uk/big-doubts-on-big-data-why-i-wont-be-sharing-my-medical-data-with-anyone-yet-7000026497/


MapR product strategy

Maria Deutscher (SiliconAngle) quoting MapR CMO Jack Norris:

The MapR strategy centers on what chief marketing officer Jack Norris described in an interview as a “proven business model of really focusing on a product, selling a product, making a product enterprise grade, utilizing the innovations of the community but providing some [additional] advantages so customers can be even more successful.”

I thought that part of a proven business model is innovating on the product, and less so utilizing the innovations of the community. Or at least finding some way to pay back for those community innovations.

Original title and link: MapR product strategy (NoSQL database©myNoSQL)

via: http://siliconangle.com/blog/2014/02/24/mapr-continues-on-aggressive-expansion-path-with-new-asia-pacific-office/


Quick guide to CRDTs in Riak 2.0

Joel Jacobson provides a quick intro to using the new CRDT counters, sets, and maps in the Riak 2.0 preview:

Riak Data Types (also referred to as CRDTs) adds counters, sets, and maps to Riak – allowing for better conflict resolution. They enable developers to spend less time thinking about the complexities of vector clocks and sibling resolution and, instead, focusing on using familiar, distributed data types to support their applications’ data access patterns.
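To make the "conflict-free" part concrete, here is a minimal, hypothetical sketch of a G-Counter (grow-only counter), the simplest CRDT, in Python. This is not Riak's API — just an illustration of the merge semantics that let developers stop reasoning about vector clocks and sibling resolution: concurrent replicas always converge to the same value, in any merge order.

```python
class GCounter:
    """A grow-only counter CRDT: each replica only increments its own slot."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # per-replica increment totals

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        # The counter's value is the sum of every replica's contribution.
        return sum(self.counts.values())

    def merge(self, other):
        # Conflict-free merge: take the per-replica maximum.
        # Commutative, associative, and idempotent -- no siblings to resolve.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas accept writes concurrently...
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)

# ...and converge to the same state regardless of merge order.
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5
```

Riak's counters, sets, and maps apply the same principle with more elaborate merge functions; the point is that the merge is defined by the data type, not by application code.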

✚ An extra point for everyone recognizing the data sample used in the post.

Original title and link: Quick guide to CRDTs in Riak 2.0 (NoSQL database©myNoSQL)

via: http://blog.joeljacobson.com/riak-2-0-data-types/


Stranger in a strange land: HPC and Big Data

Paul Mineiro sharing his notes and thoughts after attending an HPC event:

My plan was to observe the HPC community, try to get a feel how their worldview differs from my internet-centric “Big Data” mindset, and broaden my horizons. Intriguingly, the HPC guys are actually busy doing the opposite. They’re aware of what we’re up to, but they talk about Hadoop like it’s some giant livin’ in the hillside, comin down to visit the townspeople. Listening to them mapping what we’re up to into their conceptual landscape was very enlightening, and helped me understand them better.

No more ivory towers.

Original title and link: Stranger in a Strange Land: HPC and Big Data (NoSQL database©myNoSQL)

via: http://www.machinedlearnings.com/2014/02/stranger-in-strange-land.html?m=1


From IBM to… IBM: The short, but complicated history of CouchDB, Cloudant, and a lot of other companies and projects

Damien Katz created CouchDB after working at IBM on Lotus Notes: CouchDB and Me. CouchDB went the Apache way. Then things got complicated…

On the West coast, Damien Katz and a team of committers created Couchio, later renamed to CouchOne, later merged with Membase to become Couchbase, which finally dropped CouchDB. Damien Katz left Couchbase.

A confusing history with a very complicated genealogy of projects (don’t worry, this goes on) and companies. And this was only West Coast.

On the East coast, Cloudant took CouchDB and made it BigCouch. I thought that Cloudant would be the CouchDB company — and in a way it was. Cloudant put BigCouch on the cloud as a service and on GitHub as open source. BigCouch is supposed to get back into Apache CouchDB, but many months later this hasn’t materialized yet.

To complete the circle, today IBM announced signing an agreement to acquire Cloudant — news coverage on GigaOm, BostInno, TechCrunch. Which probably makes sense considering Cloudant’s relationship with SoftLayer and IBM’s $1 billion Platform-as-a-Service investment, but less so if you consider the IBM and 10gen/MongoDB collaboration.

Anyways, the future of Apache CouchDB is bright. Yep.

Original title and link: From IBM to… IBM: The short, but complicated history of CouchDB, Cloudant, and a lot of other companies and projects (NoSQL database©myNoSQL)


How SQL-on-JSON analytics bolstered a business

Alex Woodie (Datanami) reporting on BitYota, a SQL-based data warehouse on top of JSON:

BitYota says it designed its own hosted data warehouse from scratch, and that it’s differentiated by having a JSON access layer atop the data store. “We have some uniqueness where we operate SQL directly on JSON,” says BitYota CEO Dev Patel. “We don’t need to translate that data into a structured format like a CSV. We believe that if you transform the data, you will lose some of the data quality. And once that’s transformed, you won’t get it back.”
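BitYota's engine is proprietary, but the "SQL directly on JSON, no flattening to CSV" idea can be illustrated with a tiny, hypothetical sketch using SQLite's JSON1 functions (available in recent SQLite builds, including the one bundled with modern Python): raw documents stay in a single TEXT column and SQL extracts, filters, and sorts on their fields at query time.

```python
import sqlite3

# Store raw JSON documents in one TEXT column -- no fixed schema, no
# lossy transformation of the documents into rows and columns up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (doc TEXT)")
conn.executemany(
    "INSERT INTO events (doc) VALUES (?)",
    [
        ('{"user": "ana", "amount": 12.5, "tags": ["mobile"]}',),
        ('{"user": "bo", "amount": 40.0, "tags": ["web", "promo"]}',),
    ],
)

# SQL operates directly on the JSON: extract fields and sort on them.
rows = conn.execute(
    "SELECT json_extract(doc, '$.user'), json_extract(doc, '$.amount') "
    "FROM events ORDER BY json_extract(doc, '$.amount') DESC"
).fetchall()
print(rows)  # [('bo', 40.0), ('ana', 12.5)]
```

The original documents remain intact in the `doc` column, which is the data-quality point Patel makes: nothing is lost to an upfront transformation.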

✚ BitYota’s tagline is Analytics for mongoDB, so I assume it’s safe to say the backend is mongoDB and they are building a SQL layer on top of it. What SQL flavor they support and how they handle SQL’s quirks would be a very interesting story.

✚ This relates to my earlier Do all roads lead back to SQL?

Original title and link: How SQL-on-JSON analytics bolstered a business (NoSQL database©myNoSQL)

via: http://www.datanami.com/datanami/2014-02-12/how_sql-on-json_analytics_bolstered_a_business.html


Do all roads lead back to SQL? Some might and some might not

Seth Proctor for Dr.Dobb’s:

Increasingly, NewSQL systems are showing scale, schema flexibility, and ease of use. Interestingly, many NoSQL and analytic systems are now putting limited transactional support or richer query languages into their roadmaps in a move to fill in the gaps around ACID and declarative programming. What that means for the evolution of these systems is yet to be seen, but clearly, the appeal of Codd’s model is as strong as ever 43 years later.

Spend a bit of time reading (really reading) the above paragraph—there are quite a few different concepts put together to make the point of the article.

SQL is indeed getting closer to the NoSQL databases, but mostly to Hadoop. I still stand by my thoughts in The premature return to SQL.

Most NoSQL databases already offer some limited ACID guarantees. And some flavors of transactions are supported or are being added. But only as long as the core principles can still be guaranteed or the trade-offs are made obvious and offered as clear choices to application developers.

The relational model stays with the relational databases. If some of its principles can be applied (e.g. data type integrity, optional schema enforcement), I see nothing wrong with supporting them. Good technical solutions know both what is needed and what is possible.

Original title and link: Do All Roads Lead Back to SQL? | Dr Dobb’s (NoSQL database©myNoSQL)

via: http://www.drdobbs.com/architecture-and-design/do-all-roads-lead-back-to-sql/240162452


When should I use Greenplum Database versus HAWQ?

Jon Roberts about the use cases for Greenplum and HAWQ, both technologies offered by Pivotal:

Greenplum is a robust MPP database that works very well for Data Marts and Enterprise Data Warehouses that tackles historical Business Intelligence reporting as well as predictive analytical use cases. HAWQ provides the most robust SQL interface for Hadoop and can tackle data exploration and transformation in HDFS.

The first questions that popped into my mind:

  1. why isn’t HAWQ good for reporting?
  2. why isn’t HAWQ good for predictive analytics?

I don’t have a good answer for any of these. For the first, I assume that the implied answer is Hadoop’s latency. On the other hand, what I know is that Microsoft and Hortonworks are trying to bring Hadoop data into Excel with HDInsight. This is not traditional reporting, but if that’s acceptable from a latency point of view, I’m not sure why it wouldn’t work for reporting too.

For the second question, Hadoop and the tools built around it are well known for predictive analytics. So maybe this separation is due only to HAWQ. Another explanation could be product positioning.

This last part seems to be confirmed by the rest of the post, which makes the point that data stored in HDFS is temporary and, once processed with HAWQ, is moved into Greenplum.

[diagram: Greenplum and HAWQ]

In other words, HAWQ is just for ETL/ELT on Hadoop.

✚ I’m pretty sure that many traditional data warehouse companies that are forced to come up with coherent proposals for architectures based on their core products and Hadoop are facing the same product positioning problem — it’s difficult to admit in front of customers that Hadoop might be capable of replacing core functionality of the products you are selling.

What is the best answer to this positioning dilemma?

  1. Find a spot for Hadoop that is not hurting your core products. Let’s say ETL.
  2. Propose an architecture where your core products and Hadoop are fully complementing and interacting with each other.

You already know my answer.

Original title and link: When should I use Greenplum Database versus HAWQ? (NoSQL database©myNoSQL)

via: http://www.pivotalguru.com/?p=642