
Hadoop and big data: Where Apache Slider slots in and why it matters

Arun Murthy for ZDNet about Apache Slider:

Slider is a framework that allows you to bridge existing always-on services and makes sure they work really well on top of YARN without having to modify the application itself. That’s really important.

Right now it’s HBase and Accumulo but it could be Cassandra, it could be MongoDB, it could be anything in the world. That’s the key part.

I couldn’t find the project on the Incubator page.

Original title and link: Hadoop and big data: Where Apache Slider slots in and why it matters (NoSQL database©myNoSQL)

via: http://www.zdnet.com/hadoop-and-big-data-where-apache-slider-slots-in-and-why-it-matters-7000028073/


Price Comparison for Big Data Appliance and Hadoop

The main differences between Oracle Big Data Appliance and a DIY approach are:

  1. A DIY system - at list price with basic installation but no optimization - is a staggering $220 cheaper as an initial purchase
  2. A DIY system - at list price with basic installation but no optimization - is almost $250,000 more expensive over 3 years.
  3. The support for the DIY system includes five (5) vendors. Your hardware support vendor, the OS vendor, your Hadoop vendor, your encryption vendor as well as your database vendor. Oracle Big Data Appliance is supported end-to-end by a single vendor: Oracle
  4. Time to value. While we trust that your IT staff will get the DIY system up and running, the Oracle system allows for a much faster “loading dock to loading data” time. Typically a few days instead of a few weeks (or even months)
  5. Oracle Big Data Appliance is tuned and configured to take advantage of the software stack, the CPUs and InfiniBand network it runs on
  6. Any issue we, you or any other BDA customer finds in the system is fixed for all customers. You do not have a unique configuration, with unique issues on top of the generic issues.

This is coming from Oracle. Now, without nitpicking prices (I’m pretty sure you’ll find better numbers for the different components), how do you sell Hadoop to the potential customer who took a look at this?

Original title and link: Price Comparison for Big Data Appliance and Hadoop (NoSQL database©myNoSQL)

via: https://blogs.oracle.com/datawarehousing/entry/updated_price_comparison_for_big


Hadoop analytics startup Karmasphere sells itself to FICO

Derrick Harris (GigaOm):

The Fair Isaac Corporation, better known as FICO, has acquired the intellectual property of Hadoop startup Karmasphere. Karmasphere launched in 2010, and was one of the first companies to push the idea of an easy, visual interface for analyzing Hadoop data, and even analyzing it using traditional SQL queries.

Original title and link: Hadoop analytics startup Karmasphere sells itself to FICO (NoSQL database©myNoSQL)

via: http://gigaom.com/2014/04/17/hadoop-analytics-startup-karmasphere-sells-itself-to-fico/


Why the clock is ticking for MongoDB

Robert Haas takes a comparative look at PostgreSQL and the MongoDB features emphasized by MongoDB’s CEO in an interview:

Schireson also mentions another advantage of document stores: schema flexibility. Of course, he again ignores the possible advantages, for some users, of a fixed schema, such as better validity checking. But more importantly, he ignores the fact that relational databases such as PostgreSQL have had similar capabilities since before MongoDB existed. PostgreSQL’s hstore, which provides the ability to store and index collections of key-value pairs in a fashion similar to what MongoDB provides, was first released in December of 2006, the year before MongoDB development began. True JSON capabilities were added to the PostgreSQL core as part of the 9.2 release, which went GA in September of 2012. The 9.4 release, expected later this year, will greatly expand those capabilities. In today’s era of rapid innovation, any database product whose market advantage is based on the format in which it is able to store data will not retain that advantage for very long.
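To make the hstore/JSON point concrete, here is a minimal sketch (mine, not from Haas’s post) of document-style storage in stock PostgreSQL via psycopg2. It assumes PostgreSQL 9.3 or later and a local database named demo; the table and field names are invented for illustration.

    # Sketch of "document-style" storage in plain PostgreSQL via psycopg2.
    # Assumes PostgreSQL 9.3+ and a local database named "demo"; the table
    # and field names are invented for illustration.
    import psycopg2
    from psycopg2.extras import Json

    conn = psycopg2.connect(dbname="demo")
    cur = conn.cursor()

    # One json column holds arbitrary documents, i.e. schema flexibility.
    cur.execute("CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, body json)")

    cur.execute("INSERT INTO docs (body) VALUES (%s)",
                [Json({"user": "alice", "tags": ["nosql", "postgres"], "age": 31})])

    # ->> extracts a field as text; expression indexes (or jsonb plus a GIN
    # index in 9.4) make such lookups indexable.
    cur.execute("SELECT body FROM docs WHERE body ->> 'user' = %s", ["alice"])
    print(cur.fetchone())

    conn.commit()
    cur.close()
    conn.close()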

It’s difficult, if not impossible, to debate or contradict the majority of facts and arguments the author is making. But in order to understand the history and future of developer tools, it’s worth emphasizing one aspect that has been almost completely ignored for way too long, and which the author mentions only briefly.

Developers want to get things done. Fast and Easy.

For too long, vendors thought that a tool that had a feature covered was enough. Even if the user had to read a book or two, hire an army of consultants, postpone the deadlines, and finally make three incantations to get it working. This strategy worked well for decades. It worked especially well in the database space, where buying decisions were made at the top level due to the humongous costs.

MySQL became one of the most popular databases because it was free and perceived to be easier than any of the alternatives. Not because it was first. Not because it was feature complete. And definitely not because it was technically superior: PostgreSQL was always technically superior, but never got the install base MySQL got.

MongoDB replays this story by the book. It’s free. It promises features that were missing or are considered complicated in the other products. And it’s perceived as the easiest database to use; a look at MongoDB’s history immediately reveals its primary focus on ease of use: great documentation, friendly setup, fast getting-started experience. For a lot of people, it really doesn’t matter anymore that there are technically superior alternatives. They’ve got their things done. Fast and Easy. Tomorrow is another day.

Original title and link: Why the clock is ticking for MongoDB (NoSQL database©myNoSQL)

via: http://rhaas.blogspot.nl/2014/04/why-clock-is-ticking-for-mongodb.html


We will find the author of the Bitcoin whitepaper even if he doesn’t want us to

Nermin Hajdarbegovic (CoinDesk):

A group of forensic linguistics experts from Aston University believe the real creator of bitcoin is former law professor Nick Szabo.

Dr. Grieve explained:

The number of linguistic similarities between Szabo’s writing and the bitcoin whitepaper is uncanny, none of the other possible authors were anywhere near as good of a match.

Privacy is all gone.

Original title and link: We will find the author of the Bitcoin whitepaper even if he doesn’t want us to (NoSQL database©myNoSQL)


How Americans Die

Great example of datavis and investigative analysis.

How Americans Die

Original title and link: How Americans Die (NoSQL database©myNoSQL)

via: http://www.bloomberg.com/dataview/2014-04-17/how-americans-die.html


Hortonworks: the Red Hat of Hadoop

However, John Furrier, founder of SiliconANGLE, posits that Hortonworks, with their similar DNA being applied in the data world, is, in fact, the Red Hat of Hadoop. “The discipline required,” he says, “really is a long game.”

It looks like Hortonworks’s positioning has been successful in that they are now perceived as the true (and only) open sourcerers.

Original title and link: Hortonworks: the Red Hat of Hadoop (NoSQL database©myNoSQL)

via: http://siliconangle.com/blog/2014/04/16/hortonworks-the-red-hat-of-hadoop-rhsummit/


Riak: Entropy detection, correction, and conflict resolution

John Daily covers Riak’s mechanisms for bringing data in sync across the nodes:

Riak’s overarching design goal is simple: be maximally available. […] In order to make sure your data can survive server failures, Riak retains multiple copies (replicas) and allows lock-free, uncoordinated updates. […] This then opens up the possibility that data will be out of sync across a cluster. Riak manages this issue in three distinct stages: entropy detection, correction, and conflict resolution.
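The last of those stages, conflict resolution, is the one that lands in the application’s lap: Riak can hand back divergent sibling values and let the caller merge them and write the result back. Here is a toy, self-contained sketch of such a merge (deliberately not the Riak client API; the shopping-cart data is invented):

    # Toy illustration of application-level conflict resolution: two replicas
    # accepted writes independently, so the key now has divergent siblings.
    def merge_siblings(siblings):
        """Merge shopping-cart siblings by taking the union of their items.

        A union never loses an addition, which is why it is a common choice
        for grow-only data; handling deletions needs something richer, such
        as tombstones or CRDT sets.
        """
        merged = set()
        for cart in siblings:
            merged |= cart
        return merged

    sibling_a = {"book", "coffee"}        # write accepted by one replica
    sibling_b = {"book", "headphones"}    # concurrent write accepted by another

    resolved = merge_siblings([sibling_a, sibling_b])
    print(sorted(resolved))               # ['book', 'coffee', 'headphones']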

You’ll read pitches from products promising both maximal availability and no out-of-date data. Those are just that: promises.

Original title and link: Riak: Entropy detection, correction, and conflict resolution (NoSQL database©myNoSQL)

via: https://basho.com/entropy-in-riak/


NoSQL meets Bitcoin and brings down two exchanges

Most of Emin Gün Sirer’s posts end up linked here, as I usually enjoy the way he combines a real-life story with something technical, all of it ending with a pitch for HyperDex.

The problem here stemmed from the broken-by-design interface and semantics offered by MongoDB. And the situation would not have been any different if we had used Cassandra or Riak. All of these first-generation NoSQL datastores were early because they are easy to build. When the datastore does not provide any tangible guarantees besides “best effort,” building it is simple. Any masters student in a top school can build an eventually consistent datastore over a weekend, and students in our courses at Cornell routinely do. What they don’t do is go from door to door in the valley, peddling the resulting code as if it could or should be deployed.

Unfortunately, in this case, the jump from the real problem, which was caused purely by incompetence, to declaring “first-generation NoSQL databases” bad and pitching HyperDex’s features is both too quick and incorrect¹.


  1. 1) ACID guarantees wouldn’t have solved the issue; 2) all three NoSQL databases mentioned actually offer a solution for this particular scenario (a sketch of one such solution follows below).
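As an illustration of point 2 for MongoDB specifically: the classic lost-update race on an account balance goes away if the read-modify-write is replaced by a single conditional, atomic update. A minimal PyMongo sketch, with invented database, collection, and field names:

    # Sketch: avoiding a lost-update race on a balance with one atomic,
    # conditional update (PyMongo; database/collection/field names invented).
    from pymongo import MongoClient

    accounts = MongoClient().demo.accounts    # assumes a local mongod

    def withdraw(user_id, amount):
        # The filter and the $inc are applied atomically on the server, so two
        # concurrent withdrawals cannot both spend the same funds.
        result = accounts.update_one(
            {"_id": user_id, "balance": {"$gte": amount}},
            {"$inc": {"balance": -amount}},
        )
        return result.modified_count == 1     # False: insufficient funds

    accounts.insert_one({"_id": "alice", "balance": 100})
    print(withdraw("alice", 60))    # True
    print(withdraw("alice", 60))    # False, the condition rejects the overdraft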

Original title and link: NoSQL meets Bitcoin and brings down two exchanges (NoSQL database©myNoSQL)

via: http://hackingdistributed.com/2014/04/06/another-one-bites-the-dust-flexcoin/


GigaOM Interviews Aerospike at Structure Data 2014 on Application Scalability [sponsor]

An interview from Structure Data 2014, featuring Aerospike:


Aerospike Technical Marketing Director, Young Paik explains how you can add rocket fuel to your big data application by running the Aerospike database on top of Hadoop for lightning fast user-profile lookups.

Original title and link: GigaOM Interviews Aerospike at Structure Data 2014 on Application Scalability [sponsor] (NoSQL database©myNoSQL)


Apache Hadoop 2.4.0 released with operational improvements

From the Hortonworks announcement:

Hadoop 2.4.0 continues that momentum, with additional enhancements to both HDFS & YARN:

  • Support for Access Control Lists in HDFS
  • Native support for Rolling Upgrades in HDFS
  • Smooth operational upgrades with protocol buffers for HDFS FSImage
  • Full HTTPS support for HDFS
  • Support for Automatic Failover of the YARN ResourceManager (a.k.a Phase 1 of YARN ResourceManager High Availability)
  • Enhanced support for new applications on YARN with Application History Server and Application Timeline Server
  • Support for strong SLAs in YARN CapacityScheduler via Preemption

Original title and link: Apache Hadoop 2.4.0 released with operational improvements (NoSQL database©myNoSQL)

via: http://hortonworks.com/blog/apache-hadoop-2-4-0-released/


Your Big Data Is Worthless if You Don’t Bring It Into the Real World

Building on the (exact) same premise as last week’s FT.com article Big data: are we making a big mistake?, Mikkel Krenchel and Christian Madsbjerg write for Wired:

Not only did Google Flu Trends largely fail to provide an accurate picture of the spread of influenza, it will never live up to the dreams of the big-data evangelists. Because big data is nothing without “thick data,” the rich and contextualized information you gather only by getting up from the computer and venturing out into the real world. Computer nerds were once ridiculed for their social ineptitude and told to “get out more.” The truth is, if big data’s biggest believers actually want to understand the world they are helping to shape, they really need to do just that.

While the authors actually mean the above literally, I think the valid point the article could have made is that looking at a data set alone without considering:

  1. possibly missing data,
  2. context data and knowledge,
  3. and field know-how

can lead to incorrect conclusions — the most obvious examples being the causal fallacy and the correlation-causation confusions.

✚ Somewhat related to the “possibly missing data” point, the article How politics makes us stupid brings up some other very interesting points.

Original title and link: Your Big Data Is Worthless if You Don’t Bring It Into the Real World (NoSQL database©myNoSQL)

via: http://www.wired.com/2014/04/your-big-data-is-worthless-if-you-dont-bring-it-into-the-real-world/