The evolution of Cassandra

Robbie Strickland (Weather Channel’s software development manager):

“It used to be that Cassandra was something of a beast,” recalled Robbie Strickland, the Weather Channel’s software development manager. “It was like an old space shuttle cockpit with a million knobs.” In fact, prior to joining the Weather Channel, Strickland says using Cassandra required spending “most of the day in the Cassandra IRC channel talking to developers.”

Some databases evolve into more powerful and friendlier tools, while others are just “familiar giant hairballs growing bigger”.

Original title and link: The evolution of Cassandra (NoSQL database©myNoSQL)

via: http://data-informed.com/weather-channel-manages-data-deluge-combination-platforms/


BackMyDB

BackMyDB:

We support the following databases to help you backup and restore your database in case of any failure: MySQL, MongoDB, Redis, PostgreSQL.

Sounds interesting, but I couldn’t find any details about how it works, how secure (as in encrypted) the backups are, what redundancy is guaranteed, whether there’s an SLA for restores, etc.

Original title and link: BackMyDB (NoSQL database©myNoSQL)

via: https://backmydb.com/


Project Rhino goal: at-rest encryption for Apache Hadoop

Although network encryption has been provided in the Apache Hadoop platform for some time (since Hadoop 2.0.2-alpha/CDH 4.1), at-rest encryption, the encryption of data stored on persistent storage such as disk, has not. To meet that requirement in the platform, Cloudera and Intel are working with the rest of the Hadoop community under the umbrella of Project Rhino — an effort to bring a comprehensive security framework for data protection to Hadoop, which also now includes Apache Sentry (incubating) — to implement at-rest encryption for HDFS (HDFS-6134 and HADOOP-10150).

Looks like I got this wrong: Apache Sentry will become part of Project Rhino.

Original title and link: Project Rhino goal: at-rest encryption for Apache Hadoop (NoSQL database©myNoSQL)

via: http://blog.cloudera.com/blog/2014/06/project-rhino-goal-at-rest-encryption/


Hadoop security: unifying Project Rhino and Sentry

One result of Intel’s investment in Cloudera is putting together the teams to work on the same projects:

As the goals of Project Rhino and Sentry to develop more robust authorization mechanisms in Apache Hadoop are in complete alignment, the efforts of the engineers and security experts from both companies have merged, and their work now contributes to both projects. The specific goal is “unified authorization”, which goes beyond setting up authorization policies for multiple Hadoop components in a single administrative tool; it means setting an access policy once (typically tied to a “group” defined in an external user directory) and having it enforced across all of the different tools that this group of people uses to access data in Hadoop – for example access through Hive, Impala, search, as well as access from tools that execute MapReduce, Pig, and beyond.

A great first step.

You know what would be even better? A single security framework for Hadoop instead of two.

Original title and link: Hadoop security: unifying Project Rhino and Sentry (NoSQL database©myNoSQL)

via: http://vision.cloudera.com/project-rhino-and-sentry-onward-to-unified-authorization/


Hortonworks’ Hadoop secret weapon is... Yahoo

Derrick Harris:

Hortonworks was working right alongside Yahoo all through that process. They’ve also worked together on things like rolling upgrades so Hadoop users can upgrade software without taking down a cluster.

  1. who didn’t know about the Hortonworks and Yahoo collaboration?
  2. what company and product management team would choose not to work with one of the largest users of the technology it is working on?

    This is the perfect example of testing and validating new ideas and learning about the pain your customers are facing in real life. Basically, by-the-book product/market fit.

Original title and link: Hortonworks’ Hadoop secret weapon is… Yahoo (NoSQL database©myNoSQL)

via: http://gigaom.com/2014/06/16/when-it-comes-to-hadoop-yahoo-is-still-hortonworks-secret-weapon/


Storing, processing, and computing with graphs

Marko Rodriguez is on a roll with yet another fantastic article about graphs:

To the adept, graph computing is not only a set of technologies, but a way of thinking about the world in terms of graphs and the processes therein in terms of traversals. As data is becoming more accessible, it is easier to build richer models of the environment. What is becoming more difficult is storing that data in a form that can be conveniently and efficiently processed by different computing systems. There are many situations in which graphs are a natural foundation for modeling. When a model is a graph, then the numerous graph computing technologies can be applied to it.
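Rodriguez’s point that graph models pair naturally with traversals can be illustrated with a minimal sketch. This is plain Python with made-up data, not any particular graph database’s API:

```python
from collections import deque

# A tiny graph as an adjacency map: each vertex maps to the vertices
# it points at. The names are illustrative.
graph = {
    "alice": ["bob", "carol"],
    "bob":   ["dave"],
    "carol": ["dave"],
    "dave":  [],
}

def traverse(graph, start):
    """Breadth-first traversal: visit every vertex reachable from `start`."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order

print(traverse(graph, "alice"))  # ['alice', 'bob', 'carol', 'dave']
```

Once the model is a graph, a question like “who is reachable from alice?” becomes a traversal rather than a join.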

✚ If you missed it, the other recent article I’m referring to is “Knowledge representation and reasoning with graph databases”.

Original title and link: Storing, processing, and computing with graphs (NoSQL database©myNoSQL)

via: http://www.javacodegeeks.com/2014/06/on-graph-computing.html


Consensus-based replication in HBase

Konstantin Boudnik (WANdisco):

The idea behind consensus-based replication is pretty simple: instead of trying to guarantee that all replicas of a node in the system are synced post-factum to an operation, such a system will coordinate the intent of an operation. If a consensus on the feasibility of an operation is reached, it will be applied by each node independently. If consensus is not reached, the operation simply won’t happen. That’s pretty much the whole philosophy.
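The coordinate-the-intent idea can be sketched in a few lines. This is a toy majority vote, not WANdisco’s actual protocol and not Paxos itself; all names are illustrative:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.log = []  # operations this node has applied, independently

    def vote(self, op):
        """Is this operation feasible from this node's point of view?"""
        return op.get("feasible", True)

    def apply(self, op):
        self.log.append(op["name"])

def replicate(nodes, op):
    """Coordinate the *intent* of an operation: apply it on every node
    only if a majority agrees it is feasible; otherwise it never happens."""
    votes = sum(1 for n in nodes if n.vote(op))
    if votes > len(nodes) // 2:      # consensus reached
        for n in nodes:              # each node applies independently
            n.apply(op)
        return True
    return False                     # no consensus: the operation simply won't happen

cluster = [Node(f"n{i}") for i in range(3)]
assert replicate(cluster, {"name": "put x=1"}) is True
assert replicate(cluster, {"name": "bad op", "feasible": False}) is False
assert all(n.log == ["put x=1"] for n in cluster)
```

The contrast with sync-after-the-fact replication is that agreement happens before the write, so replicas never diverge and need reconciling.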

Not enough details, but doesn’t this sound like Paxos applied earlier?

Original title and link: Consensus-based replication in HBase (NoSQL database©myNoSQL)

via: http://blogs.wandisco.com/2014/06/16/consunsus-based-replication-hbase/


Where to look for Hadoop reliability problems

Dan Woods (Forbes) gets a list of 10 possible problems in Hadoop from Raymie Stata (Altiscale CEO) that can be summarized as:

  1. using default configuration options
  2. doing no tuning
  3. not understanding Amazon Elastic MapReduce’s behavior

Original title and link: Where to look for Hadoop reliability problems (NoSQL database©myNoSQL)

via: http://www.forbes.com/sites/danwoods/2014/06/16/solving-the-mystery-of-hadoop-reliability/


A Story of graphs, DBs, and graph databases

After Marko Rodriguez’s “Knowledge representation and reasoning with graph databases”, another great introductory resource on graph databases is Joshua Shinavier’s presentation.


Knowledge representation and reasoning with graph databases

A graph database and its ecosystem of technologies can yield elegant, efficient solutions to problems in knowledge representation and reasoning. To get a taste of this argument, we must first understand what a graph is.
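As a concrete starting point: a graph is just a set of vertices plus a set of edges between them, and “reasoning” can be as simple as following labeled edges. A minimal sketch, with illustrative names and no graph-database API:

```python
# A directed, labeled graph: the basic structure behind knowledge representation.
# Each edge is a (source, label, target) triple.
edges = {
    ("socrates", "is_a", "man"),
    ("man", "is_a", "mortal"),
}

def infer_is_a(edges, start):
    """Trivial reasoning: follow 'is_a' edges transitively from `start`."""
    known = set()
    frontier = {start}
    while frontier:
        v = frontier.pop()
        for (src, label, dst) in edges:
            if src == v and label == "is_a" and dst not in known:
                known.add(dst)
                frontier.add(dst)
    return known

print(infer_is_a(edges, "socrates"))  # {'man', 'mortal'}
```

The classic syllogism falls out of a two-hop traversal: socrates is_a man, man is_a mortal, therefore socrates is_a mortal.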

And Marko Rodriguez delivers a dense but very readable intro to modeling with graphs.

Original title and link: Knowledge representation and reasoning with graph databases (NoSQL database©myNoSQL)

via: http://www.javacodegeeks.com/2014/06/knowledge-representation-and-reasoning-with-graph-databases.html


Dude, missing indexes? Seriously….


The problem is not the tool itself

I didn’t know about CommitStrip. Until now.

Original title and link: Dude, missing indexes? Seriously…. (NoSQL database©myNoSQL)


Apache Kafka: Next generation distributed messaging system

Abhishek Sharma, in a 3,000-word article on InfoQ:

Its architecture consists of the following components:

  • A stream of messages of a particular type is defined as a topic. A Message is defined as a payload of bytes and a Topic is a category or feed name to which messages are published.
  • A Producer can be anyone who publishes messages to a Topic.
  • The published messages are then stored at a set of servers called Brokers or Kafka Cluster.
  • A Consumer can subscribe to one or more Topics and consume the published Messages by pulling data from the Brokers.

Producers can choose their favorite serialization method to encode the message content. For efficiency, a producer can send a set of messages in a single publish request. The following code example shows how to create a Producer to send messages.
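Since the article’s code isn’t reproduced here, the components in the list above can be sketched as a toy in-memory model. This is not the real Kafka client API; the class and method names are illustrative:

```python
class Broker:
    """Toy broker: each topic is an append-only log of messages."""

    def __init__(self):
        self.topics = {}  # topic name -> list of message payloads (bytes)

    def publish(self, topic, messages):
        """A producer may batch several messages in one publish request."""
        self.topics.setdefault(topic, []).extend(messages)

    def pull(self, topic, offset):
        """Consumers pull from an offset; the broker pushes nothing.
        Returns the new messages and the next offset to pull from."""
        log = self.topics.get(topic, [])
        return log[offset:], len(log)

broker = Broker()
broker.publish("weather", [b"rain", b"sun"])      # batched publish
msgs, next_offset = broker.pull("weather", 0)     # consumer pulls from offset 0
assert msgs == [b"rain", b"sun"] and next_offset == 2
msgs, next_offset = broker.pull("weather", next_offset)
assert msgs == []                                  # nothing new yet
```

The sketch also hints at why Kafka is often described as a log rather than a queue: messages aren’t removed on consumption, consumers just track their own offsets.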

Kafka is an amazing system. I just wish the article would have actually looked into what makes it unique and how it compares to systems like RabbitMQ or ActiveMQ.

✚ Cameron Purdy in one of the comments:

If you carefully read the article, you’ll note that Kafka is not actually a message queue. It’s just a specialized database with some messaging semantics in its API. That means if you need the behaviors that you would associate with a message queue, you can’t get them with Kafka (or if you can, the performance will plummet.)

Original title and link: Apache Kafka: Next generation distributed messaging system (NoSQL database©myNoSQL)

via: http://www.infoq.com/articles/apache-kafka