mongodb: All content tagged as mongodb in NoSQL databases and polyglot persistence

10gen Transitioning From Startup to Corporation

I’ve spent most of my career in startups or small companies that sometimes interacted with large corporations. I’ve also worked for a couple of years within a large corporation. But I’ve never been through the transition from startup to corporation.

This is the phase 10gen, the company behind MongoDB, is in right now, and they are hiring for positions like VP of business development (Ed Albanese, ex-Cloudera), VP of corporate strategy (Matt Asay, ex-Nodeable, Alfresco, Canonical), and VP of services and product management (Ron Avnur, ex-MarkLogic).

In his first post for 10gen, Matt Asay cites 10gen president Max Schireson:

By far our most important competitor is Oracle. After that it’s Oracle, Oracle and Oracle. I see other NoSQL players such as DataStax [distributor of Apache’s Cassandra] and CouchDB as comrades in arms in the battle to persuade people that the answer does not have to be Oracle.

Original title and link: 10gen Transitioning From Startup to Corporation (NoSQL database©myNoSQL)


5 Things to Monitor in MongoDB

In his “what I’ve learned while using MongoDB for a year” post, Simon Maynard recommends 5 metrics to always monitor:

  1. index sizes
  2. current ops
  3. index misses
  4. replication lag
  5. I/O performance

The 1st and the 3rd are about making sure your entire MongoDB working set (including indexes) fits in RAM.
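
A minimal sketch of that RAM check, assuming you have already fetched the output of MongoDB's dbStats command (its `dataSize` and `indexSize` fields, in bytes) into a plain dict. The function name, the headroom factor, and the sample numbers are assumptions made for illustration, and total data size is only a rough upper bound for the real working set:

```python
def fits_in_ram(db_stats, ram_bytes, headroom=0.8):
    """Return True if data plus indexes fit within a fraction of RAM.

    db_stats: dict with 'dataSize' and 'indexSize' in bytes, shaped like
              the output of MongoDB's dbStats command.
    headroom: assumed fraction of RAM left for MongoDB (hypothetical value).
    """
    needed = db_stats["dataSize"] + db_stats["indexSize"]
    return needed <= ram_bytes * headroom


# Made-up numbers for illustration: 5 GiB of data, 1 GiB of indexes.
GIB = 1024 ** 3
sample_stats = {"dataSize": 5 * GIB, "indexSize": 1 * GIB}

print(fits_in_ram(sample_stats, ram_bytes=8 * GIB))  # True: 6 GiB <= 6.4 GiB
print(fits_in_ram(sample_stats, ram_bytes=4 * GIB))  # False: 6 GiB > 3.2 GiB
```

A check like this only tells you when to start worrying; index misses and I/O performance are what confirm that the working set has actually outgrown memory.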

via: http://snmaynard.com/2012/10/17/things-i-wish-i-knew-about-mongodb-a-year-ago/


Pig the Big Data Duct Tape: Examples for MongoDB, HBase, and Cassandra

A three-part article from Hortonworks showing how Pig can be used with MongoDB, HBase, and Cassandra:

Pig has emerged as the ‘duct tape’ of Big Data, enabling you to send data between distributed systems in a few lines of code. In this series, we’re going to show you how to use Hadoop and Pig to connect different distributed systems, to enable you to process data from wherever and to wherever you like.


Designing a MongoDB Schema for a Monitoring System

Rick Copeland describes the results of his experiment designing and testing a MongoDB-based monitoring system. Some of his findings are quite interesting:

  1. So documents that grow and grow and grow are a real performance-killer with MongoDB.
  2. BSON actually stores documents as an association list thus lookups are not constant-speed

One aspect that the post doesn’t cover is how MongoDB behaves when there’s a lot of data contention, which is usually the case for tracking systems.
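
To see why the second finding matters, here is a small sketch that models a document as an ordered list of (name, value) pairs and counts the comparisons a lookup needs. It is a toy model of an association list, not MongoDB's actual BSON code, and the per-minute document layout is an assumption made for illustration:

```python
def alist_get(pairs, key):
    """Look up key in an ordered list of (name, value) pairs.

    Returns (value, comparisons) so the cost of the scan is visible.
    """
    for comparisons, (name, value) in enumerate(pairs, start=1):
        if name == key:
            return value, comparisons
    raise KeyError(key)


# A hypothetical per-minute metrics document with 1440 fields, kept in order.
doc = [(str(minute), 0) for minute in range(1440)]

_, first_cost = alist_get(doc, "0")
_, last_cost = alist_get(doc, "1439")
print(first_cost, last_cost)  # 1 1440
```

The last field costs 1440 comparisons while the first costs one, so a flat document with many fields penalizes whatever sits at the end of it.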

via: http://blog.pythonisito.com/2012/09/mongodb-schema-design-at-scale.html?spref=tw


I'll Give MongoDB Another Try. In Ten Years... Or Maybe Just Read the Docs First

Diego Basch:

So, one gem install and two lines of code later I was happily inserting documents into a MongoDB server on my puny AWS Micro instance somewhere in Oregon. It worked just fine for all of three weeks.

It is to MongoDB’s credit if they convinced people that all you need to run MongoDB is to install it, install a Ruby library, and start inserting data without ever needing to look at the documentation.

via: http://diegobasch.com/ill-give-mongodb-another-try-in-ten-years


From MongoDB to Riak at Shareaholic

Robby Grossman spoke at the Boston Riak meetup about Shareaholic’s migration from MongoDB to Riak, their requirements, and their evaluation of the top contenders: HBase, Cassandra, and Riak.

Why not MongoDB?

  • working set needs to fit in memory
  • global write lock blocks all queries despite not having transactions/joins
  • standbys not “hot”

Pros and cons for HBase, Cassandra, and Riak, in bullet-point format, are in the slides.

via: http://blog.shareaholic.com/2012/08/migrating-to-riak-at-shareaholic/


10Gen: That's the Type of Company We Want to Build

10gen President Max Schireson for PandoDaily:

10Gen’s vision is to build a software platform company akin to Redhat or Oracle, Schireson says. “That’s the type of company we want to build,” he says. “Those companies don’t get acquired.”

via: http://pandodaily.com/2012/09/03/how-10gen-is-pulling-engineers-from-wall-street-to-become-an-anchor-of-nyc-tech/


Big Data at Aadhaar With Hadoop, HBase, MongoDB, MySQL, and Solr

It’s unfortunate that the post focuses mostly on the usage of Spring and RabbitMQ and the slide deck doesn’t dive deeper into the architecture, data flows, and data stores, but the diagrams below should give you an idea of this truly polyglot persistence architecture:

Architecture of Big Data at Aadhaar

Big Data at Aadhaar Data Stores

The slide deck presenting architecture principles and numbers about the platform is after the break.


MongoDB GridFS Over HTTP With mod_gridfs

Aristarkh Zagordnikov wrote me an email describing the reasons that led his company to create and open source mod_gridfs.

Some time ago we were looking for a way to serve files to the web right from the GridFS database. We considered different options, including IIS handler (we use .NET on Windows as a backend) that requires a Windows machine to serve files (we planned to use Windows as backend only), nginx-gridfs that was too slow (because it’s synchronous and nginx isn’t, and uses the not-very-much-up-to-date MongoDB C driver that doesn’t do connection pooling, etc.) and does not support slaveOk (horizontal sharding).

At last I decided to roll our own method: a module for Apache 2.2 or higher that uses MongoDB’s own C++ driver. It supports replica sets, slaveOk reads, proper output caching headers (Last-Modified, Etag, Cache-Control, Expires), properly responds to conditional requests (If-Modified-Since/If-None-Match), and uses Apache brigade API to serve large files with less in-memory copying.

While Apache isn’t the most resource-friendly server for a high-load environment (it consumes too much memory per connection and does not yet support production-quality event-based I/O), it really shines as a backend for something like nginx+proxy_cache with optional SSD as proxy_cache storage that does the heavy lifting.

Serving a 4KiB file over a gigabit network on modern hardware, 100 concurrent requests, MongoDB replica set of 3 machines as a backend:

  • NGINX + nginx-gridfs: 1.2kr/s
  • Apache + mod_gridfs: 6.6kr/s
  • Apache + mod_gridfs with slaveOk: 12.1kr/s

I didn’t test with larger files, because this way I’ll be benchmarking OS I/O performance instead of user-mode code.

The public Mercurial repo is here. It uses Simplified 2-clause BSD license, and contains installation instructions and docs in the README file (building might seem hard, but after building if you have to mass-deploy, you just install dependent libraries like boost and copy the mod_gridfs.so file around).
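
The conditional-request handling mentioned in the email can be sketched in a few lines. This is a simplified illustration of standard HTTP semantics, not mod_gridfs's actual code: the function name and the sample ETag are hypothetical, weak validators are ignored, and malformed dates are not handled:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200.

    If-None-Match takes precedence over If-Modified-Since when both
    are present (standard HTTP conditional-request semantics).
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in candidates or "*" in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200


# Hypothetical stored file: upload time and an ETag derived from its content.
uploaded = datetime(2012, 10, 17, 12, 0, 0, tzinfo=timezone.utc)
etag = '"d41d8cd9"'
http_date = format_datetime(uploaded, usegmt=True)

print(conditional_status({"If-None-Match": etag}, etag, uploaded))           # 304
print(conditional_status({"If-Modified-Since": http_date}, etag, uploaded))  # 304
print(conditional_status({}, etag, uploaded))                                # 200
```

Answering 304 without touching the file body is what lets a caching proxy in front of the file server absorb most repeat requests.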


Klout Data Architecture: MySQL, HBase, Hive, Pig, Elastic Search, MongoDB, SSAS

Just found a slide deck (embedded below) describing the data workflow at Klout. Their architecture includes many interesting pieces, combining both NoSQL and relational databases with Hadoop, Hive, Pig, and traditional BI. Even Excel gets a mention in the slides:

  1. Pig and Hive
  2. HBase
  3. Elastic Search
  4. MongoDB
  5. MySQL

Klout Data Architecture


Which Is Better for Programmers: SQL vs. NoSQL?

Jeff Cogswell compares some short code samples in an attempt to answer the much bigger question:

But what about the programmers, who write the client code that access the databases? Where do the disagreements leave them? From a programming perspective, is SQL really that horrible and outdated? Or is the new NoSQL really that awful to work with? Perhaps they both have strengths and good points.

I confess that reading the above made me curious about what the article would conclude. Unfortunately, by the time I had read the first comparison (JavaScript in NodeJS using SQL vs Mongo), I realized my expectations were too high, for a few reasons:

  1. it would have been impossible to compare the APIs of all relevant NoSQL databases with a relational database;
  2. it would have been very difficult to choose a generic, representative enough use case;
  3. the results would have always been heavily influenced by the quality of drivers and libraries used.

Last but not least, many of the merits of the NoSQL databases are related to operational complexity and not programming complexity. As someone who has done a fair amount of coding and close to zero operations, I would probably feel OK accepting a bit of programming complexity for simplified operations. But that might be just a biased opinion.

via: http://slashdot.org/topic/bi/sql-vs-nosql-which-is-better/


From MongoDB to Cassandra: Why Atlas Platform Is Migrating

Sergio Bossa tells the story of migrating the Atlas platform from using MongoDB to Cassandra emphasizing the reasons behind their decision:

  • It works on the JVM, and we have lots of in-house experience on it.
  • It scales in terms of processing and storage capacity.
  • Its column-based data model gives us some advanced capabilities we will talk about in a few minutes.
  • Its tunable consistency levels provide greater control over high availability and consistency requirements.
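
The last point is easiest to see with the usual replica arithmetic: with N replicas, a read that contacts R of them is guaranteed to overlap a write acknowledged by W of them whenever R + W > N. A minimal sketch with hypothetical function names; the replication factor of 3 is an assumed example, not a figure from the Atlas post:

```python
def quorum(replication_factor):
    """Size of a majority of replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n, w, r):
    """True when every read set overlaps every write set: R + W > N."""
    return r + w > n


n = 3  # assumed replication factor, for illustration only
print(quorum(n))                                            # 2
print(reads_see_latest_write(n, w=quorum(n), r=quorum(n)))  # True
print(reads_see_latest_write(n, w=1, r=1))                  # False
```

Choosing quorum reads and writes buys consistency at the cost of latency and availability, while single-replica reads and writes favor availability, and that choice can be made per operation.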

As regards what made them look into a different solution:

  • We need higher resiliency to faults: MongoDB provides replica sets, but we’re experiencing lots of problems with replication lags and during replica synchronization.
  • We need higher scalability: MongoDB global lock and huge memory requirements aren’t already going to cope well with our growing data set.

via: http://metabroadcast.com/blog/looking-with-cassandra-into-the-future-of-atlas