Choice of NoSQL databases from Cloudera

Adam Fowler[1] looks at the potential confusion for Cloudera’s customers when talking about NoSQL databases:

As for Cloudera customers I’m not too sure. It may confuse people asking Cloudera about NoSQL. Below is a potential conversation that, as a sales engineer for NoSQL vendor MarkLogic, I can see easily happening:

This announcement struck me as over-publicized. It’s normal for companies with similar interests to partner, but a fair amount of care should go into clearing up any possible confusion, and I don’t think this happened.

Just to summarize: Cloudera provides support for HBase and Accumulo, and it has deals with MongoDB and Oracle. I assume that in the sales process Cloudera will go with: “we work with whatever you already have in place”. As for recommending a NoSQL solution to its customers, it will probably go as in Adam Fowler’s post, to which we could probably add Oracle too.


  1. Adam Fowler works for MarkLogic. 

Original title and link: Choice of NoSQL databases from Cloudera (NoSQL database©myNoSQL)

via: http://adamfowlerml.wordpress.com/2014/05/05/choice-of-nosql-databases-from-cloudera/


White House Report Warns Of 'Big Data' Abuses

Devin Coldewey for NBC News:

To that end, the report offers six major policy recommendations:

  • A Consumer Privacy Bill of Rights that codifies what people can expect when opting in or out of data collection programs.
  • Stringent requirements on preventing and reporting data breaches.
  • Privacy protection for more than just U.S. citizens as a global gesture of good faith.
  • Ensure data collected in schools is used only for educational purposes.
  • Prevent big data from being used as a method of discrimination (so-called “digital redlining”).
  • Update the Electronic Communications Privacy Act (ECPA) to be consonant with an age of cloud computing, mobile data, and email.

Who would oppose such clear recommendations and what would be their arguments?

Original title and link: White House Report Warns Of ‘Big Data’ Abuses (NoSQL database©myNoSQL)

via: http://www.nbcnews.com/tech/security/white-house-report-warns-big-data-abuses-n95081


The White House report recommends that the president take new steps to enhance consumer privacy in the age of big data

Zeke J. Miller for Time:

There are also three recommendations that Podesta is encouraging Obama to order the federal government to take up, including extending existing privacy protections to non-U.S. citizens and people not in the country, and ensuring that data collected in schools is only used for educational purposes. Additionally, the report calls on the federal government to build up the capability to be able to spot discriminatory uses of “big data” by companies and the government. “The detailed personal profiles held about many consumers, combined with automated, algorithm-driven decision-making, could lead—intentionally or inadvertently—to discriminatory outcomes, or what some are already calling “digital redlining,” Podesta warned.

Original title and link: The White House report recommends that the president take new steps to enhance consumer privacy in the age of big data (NoSQL database©myNoSQL)

via: http://time.com/84338/obama-eyes-enhanced-privacy-protections-in-big-data-era/


Findings of the Big Data and Privacy Working Group Review

John Podesta, the leader of the group assigned by the White House to look at the present and future of Big Data and privacy:

No matter how quickly technology advances, it remains within our power to ensure that we both encourage innovation and protect our values through law, policy, and the practices we encourage in the public and private sector. To that end, we make six actionable policy recommendations in our report to the President

Original title and link: Findings of the Big Data and Privacy Working Group Review (NoSQL database©myNoSQL)

via: http://www.whitehouse.gov/blog/2014/05/01/findings-big-data-and-privacy-working-group-review


The future of Big Data and its impact on privacy

Tom Simonite summarized the 5 (big) concerns detailed in a White House report about the potential and risks of big data:

The 68-page report was published today and repeatedly emphasizes that big data techniques can advance the U.S. economy, government, and public life. But it also spends a lot of time warning of the potential downsides, saying in the introduction that:

“A significant finding of this report is that big data analytics have the potential to eclipse longstanding civil rights protections in how personal information is used in housing, credit, employment, health, education, and the marketplace.”

I can only hope that having such clear warning signs raised at the right level will lead to similarly clear legislation protecting everyone’s privacy.

Original title and link: The future of Big Data and its impact on privacy (NoSQL database©myNoSQL)

via: http://www.technologyreview.com/view/527071/five-things-obamas-big-data-experts-warned-him-about/


Docker, Hadoop and YARN

Jack Clark (The Register) covers the work done to integrate Docker with Hadoop:

“Where Docker makes perfect sense for YARN is that we can use Docker Images to fully describe the entire unix filesystem image for any YARN container,” explained Arun Murthy, a founder and architect at Hortonworks, to El Reg in an email.

Original title and link: Docker, Hadoop and YARN (NoSQL database©myNoSQL)

via: http://www.theregister.co.uk/2014/05/02/docker_hadoop/


MapReduce jobs profiling with R

Only good things can come out of this combination. And the code is available on GitHub:

At SequenceIQ in order to profile MapReduce jobs, understand (job) internal statistics and create useful graphs many times we rely on R. The metrics are collected from Ambari and the YARN History Server.

In this blog post we would like to explain and guide you through a simple process of collecting MapReduce job metrics, calculate different statistics and generate easy to understand charts.
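To give a rough idea of the "calculate different statistics" step, here is a minimal sketch in Python rather than R. It is not SequenceIQ's code: the `summarize_durations` helper and the sample task durations are hypothetical, and in a real setup the numbers would be pulled from Ambari or the YARN History Server.

```python
import statistics


def summarize_durations(durations_ms):
    """Return basic summary statistics for a list of task durations (ms)."""
    ordered = sorted(durations_ms)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
    }


# Hypothetical map task durations in milliseconds (illustrative values only).
map_task_durations_ms = [1200, 1350, 1100, 4800, 1250]
summary = summarize_durations(map_task_durations_ms)
print(summary)
```

A large gap between the mean and the median, as in this made-up sample, is the kind of signal that would point to a straggler task worth charting.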

Original title and link: MapReduce jobs profiling with R (NoSQL database©myNoSQL)

via: http://blog.sequenceiq.com/blog/2014/05/01/mapreduce-job-profiling-with-R/


The essence of Pig

I love this line from Wes Floyd’s slidedeck:

“Essence of Pig: Map-Reduce is too low a level, SQL too high”

Original title and link: The essence of Pig (NoSQL database©myNoSQL)


Big Data lessons from Netflix

Phil Simon (Wired) covers some details of Netflix’s “Big Data Platform as a Service @ Netflix” presentation (alternatively titled “Watching Pigs Fly with the Netflix Hadoop Toolkit”):

At Netflix, comparing the hues of similar pictures isn’t a one-time experiment conducted by an employee with far too much time on his hands. It’s a regular occurrence. Netflix recognizes that there is tremendous potential value in these discoveries. To that end, the company has created the tools to unlock that value. At the Hadoop Summit, Magnusson and Smith talked about how data on titles, colors, and covers helps Netflix in many ways. For one, analyzing colors allows the company to measure the distance between customers. It can also determine, in Smith’s words, the “average color of titles for each customer in a 216-degree vector over the last N days.”

While quite fascinating, I’m wondering how one could prove the value of such details. There’s no way you can run an A/B test or a predictive model or a historic model analysis.
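To make the "distance between customers" idea from the quote a bit more concrete, here is a small sketch in Python. It is only an illustration under loud assumptions: colors are reduced to 3-component RGB tuples instead of the much larger vector mentioned in the talk, Euclidean distance is my choice of metric, and the customer names and viewing data are invented. Nothing here reflects Netflix's actual implementation.

```python
import math


def average_color(colors):
    """Component-wise mean of a list of equal-length color vectors."""
    n = len(colors)
    return [sum(c[i] for c in colors) / n for i in range(len(colors[0]))]


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical data: RGB colors of the covers each customer watched recently.
watched = {
    "alice": [(255, 0, 0), (245, 10, 10)],
    "bob": [(0, 0, 255), (10, 10, 245)],
    "carol": [(250, 5, 5)],
}

# One "average color" profile per customer.
profiles = {name: average_color(colors) for name, colors in watched.items()}

d_alice_carol = euclidean_distance(profiles["alice"], profiles["carol"])
d_alice_bob = euclidean_distance(profiles["alice"], profiles["bob"])
print(d_alice_carol, d_alice_bob)
```

In this toy data set the two customers who watch mostly red covers end up closer to each other than to the one who watches blue covers, which is the sort of similarity signal the quote hints at.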

Original title and link: Big Data lessons from Netflix (NoSQL database©myNoSQL)

via: http://www.wired.com/2014/03/big-data-lessons-netflix/


Amazon Web Services Global Infrastructure Graph

Super-smart and impressive application of a graph database to a real domain:

Wouldn’t it be nice if you could slice and dice through the entire AWS domain of services, data centres and prices all in one spot to optimise your AWS bill? Enter the AWS Global Infrastructure Graph!

Original title and link: Amazon Web Services Global Infrastructure Graph (NoSQL database©myNoSQL)

via: http://gist.neo4j.org/?8526106


Spark for Data Science: A Case Study

A great practical intro to Apache Spark by Casey Stella of Hortonworks:

This sounds like a great challenge and an even greater opportunity to try out a new (to me) analytics platform, Apache Spark. So, I’m going to take you through a little journey doing some simple analysis and illustrate the general steps. We’re going to cover

  • Data Gathering
  • Data Engineering
  • Data Analysis
  • Presentation of Results and Conclusions

Original title and link: Spark for Data Science: A Case Study (NoSQL database©myNoSQL)

via: http://hortonworks.com/blog/spark-data-science-case-study/


Project Secor: Long-term S3 storage for Kafka logs

A new project open sourced by Pinterest, Secor:

Project Secor was born from the need to persist messages logged to Kafka to S3 for long-term storage. Data lost or corrupted at this stage isn’t recoverable so the greatest design objective for Secor is data integrity.

Original title and link: Project Secor: Long-term S3 storage for Kafka logs (NoSQL database©myNoSQL)

via: http://engineering.pinterest.com/post/84276775924/introducing-pinterest-secor