Hybrid Logical Clocks: Logical clocks and Physical time

Murat Demirbas

In our recent work (in collaboration with Sandeep Kulkarni at Michigan State University), we introduce Hybrid Logical Clocks (HLC). HLC captures the causality relationship like LC, and enables easy identification of consistent snapshots in distributed systems. Dually, HLC can be used in lieu of PT clocks since it maintains its logical clock to be always close to the PT clock.

Many distributed systems depend on ordering events and in many cases time is the way this ordering is achieved. Spanner’s TrueTime is probably the most “famous” example.
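To make the idea concrete, here is a minimal Python sketch of the HLC update rules as I understand them from the paper. Each timestamp is a pair (l, c): l tracks the largest physical time seen so far, and c is a counter that breaks ties when l cannot advance. The class name, the method names `tick` and `update`, and the injectable `now` function are my own choices for illustration, not the authors' code.

```python
import time
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (l, c): logical part close to physical time, tie-break counter


def wall_clock_ms() -> int:
    """Physical time (PT) in milliseconds; any monotonic-ish source would do."""
    return int(time.time() * 1000)


class HybridLogicalClock:
    """Illustrative HLC; `now` is injectable so the rules can be tested."""

    def __init__(self, now: Callable[[], int] = wall_clock_ms) -> None:
        self.now = now
        self.l = 0
        self.c = 0

    def tick(self) -> Timestamp:
        """Timestamp a local or send event."""
        prev_l = self.l
        self.l = max(prev_l, self.now())
        # If physical time did not move l forward, bump the counter instead.
        self.c = self.c + 1 if self.l == prev_l else 0
        return (self.l, self.c)

    def update(self, remote: Timestamp) -> Timestamp:
        """Timestamp a receive event, merging the sender's timestamp."""
        remote_l, remote_c = remote
        prev_l = self.l
        self.l = max(prev_l, remote_l, self.now())
        if self.l == prev_l and self.l == remote_l:
            self.c = max(self.c, remote_c) + 1
        elif self.l == prev_l:
            self.c += 1
        elif self.l == remote_l:
            self.c = remote_c + 1
        else:
            # Physical time is ahead of both clocks, so the counter resets.
            self.c = 0
        return (self.l, self.c)
```

Timestamps compare as ordinary tuples, so if one event happened before another, its (l, c) pair is smaller. At the same time l never runs ahead of the largest physical clock value observed, which is what keeps HLC close to PT.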

via: http://muratbuffalo.blogspot.com.tr/2014/07/hybrid-logical-clocks.html


Building a self-serve platform for Hadoop

What big users, in this case Pinterest, would get, ideally, from Hadoop:

Though Hadoop is a powerful processing and storage system, it’s not a plug and play technology. Because it doesn’t have cloud or elastic computing, or non-technical users in mind, its original design falls short as a self-serve platform. Fortunately there are many Hadoop libraries/applications and service providers that offer solutions to these limitations. Before choosing from these solutions, we mapped out our Hadoop setup requirements.

If you go through the 7 items listed in this post, you’ll have to agree that none sounds unreasonable. Some of these requirements might be Pinterest specific, or at least derived from their size, but I can see how each of them would simplify things. On the other hand, I’m not aware of work being done in any of these areas (nb: security is a hairy topic and everyone wants exactly what they are using).

Original title and link: Building a self-serve platform for Hadoop (NoSQL database©myNoSQL)

via: http://engineering.pinterest.com/post/92742371919/powering-big-data-at-pinterest


Introducing Ark: A Consensus Algorithm For TokuMX and MongoDB

Zardosht Kasheff from Tokutek:

Ark is an implementation of a consensus algorithm (also known as elections) similar to Paxos and Raft that we are working on to handle replica set elections and failovers in TokuMX. It has many similarities to Raft, but also has some big differences.

The paper is unfortunately not very readable as it’s constructed as “the patched version of the current protocol”.

via: http://www.tokutek.com/2014/07/introducing-ark-a-consensus-algorithm-for-tokumx-and-mongodb/


Whitepaper Clarifies ACID Support in Aerospike [sponsor]

Aerospike, myNoSQL’s long time supporter, has published a new paper about ACID support in Aerospike. Check out the details below:


In our latest whitepaper, author and Aerospike VP of Engineering & Operations, Srini Srinivasan, defines ACID support in Aerospike, and explains how Aerospike maintains high consistency by using techniques to reduce the possibility of partitions.

Read the whitepaper: http://www.aerospike.com/docs/architecture/assets/AerospikeACIDSupport.pdf

Original title and link: Whitepaper Clarifies ACID Support in Aerospike [sponsor] (NoSQL database©myNoSQL)


7 books for Machine Learning with R

Jason Brownlee put together a list of 7 machine learning books that make use of R:

In this post I want to point out some resources you can use to get started in R for machine learning.

Original title and link: 7 books for Machine Learning with R (NoSQL database©myNoSQL)

via: http://machinelearningmastery.com/books-for-machine-learning-with-r/


SQL-on-Hadoop: Pivotal HAWQ benchmark.

The results bore out Pivotal’s statement that HAWQ is the world’s fastest SQL query engine on Hadoop […] The paper, titled “Orca: A Modular Query Optimizer Architecture for Big Data,” includes benchmark results based on the TPC-DS, a well-known decision support benchmark that models several generally applicable aspects of a decision support system.

Pivotal’s SQL-on-Hadoop solution is based on a cost-based query optimizer.

via: http://blog.gopivotal.com/pivotal/products/pivotal-hawq-benchmark-demonstrates-up-to-21x-faster-performance-on-hadoop-queries-than-sql-like-solutions


Spark Summit 2014 roundup

I wasn’t at the Spark Summit, and even though the complete event was streamed online, my schedule hasn’t allowed me to watch more than a couple of keynotes. Thomas Dinsmore’s notes gave me a good idea of what happened there.

One thing that caught my attention immediately:

Last December, the 2013 Spark Summit pulled 450 attendees for a two-day event. Six months later, the Spark Summit 2014 sold out at more than a thousand seats for a three-day affair.

Original title and link: Spark Summit 2014 roundup (NoSQL database©myNoSQL)

via: http://thomaswdinsmore.com/2014/07/03/spark-summit-2014-roundup/


The expanding alternative universe of Hadoop

Merv Adrian:

Hadoop has moved from a coarse-grained blunt instrument for largely ETL-style workloads to an expanding stack for virtually any IT task big data professionals will want to undertake. What is Hadoop now? It’s a candidate to be the alternative universe for data processing, with over 20 components that span a wide array of functions.

As the Hadoop alternative universe expands, its complexity continues to grow too. The whole purpose of the “Big data platforms” from Cloudera and Hortonworks is to make this universe navigable, but it feels like the majority of travelers still need a lot of patience and courage to explore it.

Original title and link: The expanding alternative universe of Hadoop (NoSQL database©myNoSQL)

via: http://blogs.gartner.com/merv-adrian/2014/06/28/what-is-hadoop-now/


Benchmark(et)ing

Mark Callaghan:

Benchmarketing is a common activity for many DBMS products whether they are closed or open source. Most products need new users to maintain viability and marketing is part of the process. The goal for benchmarketing is to show that A is better than B. Either by accident or on purpose good benchmarketing results focus on the message A is better than B rather than A is better than B in this context. Note that the context can be critical and includes the hardware, workload, whether both systems were properly configured and some attempt to explain why one system was faster.

He’s right about every aspect in the post.

The only small edit I’d make is to emphasize once more that the context is critical: a benchmark that leaves it out loses its value.
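One way to hold yourself to that is to make the context a required part of the result. The sketch below is purely illustrative: the `BenchmarkContext` fields mirror the items Mark lists (hardware, workload, configuration, an explanation), while the names and the `claim` helper are hypothetical and not taken from any benchmarking tool.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BenchmarkContext:
    hardware: str        # what the test ran on
    workload: str        # what was measured
    configuration: str   # how both systems were configured
    explanation: str     # some attempt to explain the difference


def claim(faster: str, slower: str, context: BenchmarkContext) -> str:
    """Build an 'A is better than B in this context' statement, or refuse."""
    missing = [f.name for f in fields(context)
               if not getattr(context, f.name).strip()]
    if missing:
        raise ValueError(f"claim lacks context: {', '.join(missing)}")
    return (f"{faster} is better than {slower} on {context.hardware} "
            f"for {context.workload} ({context.configuration}); "
            f"likely reason: {context.explanation}")
```

A claim with an empty field raises an error instead of producing the bare “A is better than B” message.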

Original title and link: Benchmark(et)ing (NoSQL database©myNoSQL)

via: http://smalldatum.blogspot.com/2014/06/benchmarketing.html


Dell and Cloudera and Intel join forces for appliances

Me, in “Intel kills a Hadoop and feeds another”:

As for Intel, what if this investment also sealed an exclusive deal for Hadoop-centric Cloudera-supported Intel-powered appliance?

I didn’t know about the existing Dell-Cloudera-Intel partnership, but it is reinforced by the recent announcement of an in-memory appliance.

Since 2011, Cloudera, Dell and Intel have built pre-validated reference architectures for Hadoop. […]

The Dell In-Memory Appliances for Cloudera Enterprise is yet another proof point of the collaboration and synergies between the three companies. As the first of a family of appliances, it includes leading Dell hardware, Cloudera’s enterprise data hub -based on Cloudera Enterprise, Intel architecture for fast processing, and ScaleMP’s Versatile SMP (vSMP) architecture to aggregate multiple x86 servers into a single virtual machine to create large memory pools for in-memory processing.

Original title and link: Dell and Cloudera and Intel join forces for appliances (NoSQL database©myNoSQL)


Beating the CAP Theorem Checklist

Your ( ) tweet ( ) blog post ( ) marketing material ( ) online comment advocates a way to beat the CAP theorem. Your idea will not work. Here is why it won’t work:

Andrei Savu

Original title and link: Beating the CAP Theorem Checklist (NoSQL database©myNoSQL)

via: http://ferd.ca/beating-the-cap-theorem-checklist.html


Aerospike: One week of being open source

Brian Bulkowski, co-founder and CTO of Aerospike[1], about the recent announcement of open sourcing Aerospike (and a new round of funding):

We didn’t want to open source too early and lose the benefits of focus – nor too late and lose the benefits of broad adoption.

[…]

I believe Aerospike’s unique open source strategy has the opportunity to deliver a higher quality open source project than has been delivered in the past.

Earlier this week I was trying to remember another project going this route[2].


  1. Disclaimer: Aerospike has been a long-time supporter of myNoSQL (and I’m very thankful for that). 

  2. I’m not talking here of TextMate open source abandonware

Original title and link: Aerospike: One week of being open source (NoSQL database©myNoSQL)

via: http://www.aerospike.com/blog/aerospike-open-source-the-first-week/