mongodb: All content tagged as mongodb in NoSQL databases and polyglot persistence
Monday, 14 January 2013
Short Demo of MongoDB Text Search and Hashed Shard Keys
Staying on the subject of MongoDB full text search (see here and here), a 10-minute demo of the new feature:
Original title and link: Short Demo of MongoDB Text Search and Hashed Shard Keys (©myNoSQL)
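For readers who have not met hashed shard keys yet, here is a minimal sketch of why they help with monotonically increasing keys. It is an illustration only: the shard count, the 1000-key ranges, and the `shard_for` helper are all made up, and real MongoDB range-partitions the hashed values into chunks rather than taking a modulo.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key: int, hashed: bool) -> int:
    """Pick a shard for a key, by key range or by hash of the key."""
    if not hashed:
        # Range-based: each block of 1000 consecutive keys lives on one shard
        # (keys past the last block are clamped to the last shard).
        return min(key // 1000, NUM_SHARDS - 1)
    # Hash-based: hash the key first, then place it by the hash value.
    # Simplification: MongoDB does not place documents with a modulo.
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# The most recent inserts usually carry the highest keys.
recent_keys = range(3000, 3100)
ranged = Counter(shard_for(k, hashed=False) for k in recent_keys)
hashed = Counter(shard_for(k, hashed=True) for k in recent_keys)

print("range-based:", dict(ranged))
print("hash-based:", dict(hashed))
```

With the range-based placement all recent writes hit a single shard, while the hash-based placement spreads them out, at the cost of losing efficient range queries on that key.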
Full Text Search in MongoDB: Details About Languages and Queries
Another post about the upcoming MongoDB full text search, this one adds some more details about supported languages and queries:
- Support for Latin-based languages initially, with plans for other character sets later. Initially this will be: Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish.
- Support for advanced queries, similar to the Google search syntax, e.g. negation and phrase matching.
It’s worth emphasizing that the post refers to character sets when speaking about supported languages, but says nothing about stemming, which differs for many of those.
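To make the query syntax a bit more concrete, here is a rough sketch of what negation and phrase matching look like as command documents. It assumes the experimental `text` command shape as I understand it from the 2.4 line; the `articles` collection and the `text_command` helper are hypothetical, and nothing here talks to a server.

```python
def text_command(collection, search, language=None, limit=None):
    """Build a document for the experimental `text` command (assumed shape)."""
    # The command name must be the first key; dicts keep insertion order.
    cmd = {"text": collection, "search": search}
    if language is not None:
        cmd["language"] = language
    if limit is not None:
        cmd["limit"] = limit
    return cmd


# Phrase matching: wrap the phrase in double quotes inside the search string.
phrase = text_command("articles", '"full text search"')

# Negation: prefix a term with a minus sign to exclude documents containing it.
negated = text_command("articles", "mongodb -mysql", language="english", limit=10)

print(phrase)
print(negated)
```

With a driver such a document would be handed to whatever "run command" call the driver offers; the language option is where the stemming rules would come into play.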
via: http://blog.serverdensity.com/full-text-search-in-mongodb/
MongoDB Full Text Search Explained
Tobias Trelle explains the features planned for the full text support coming in MongoDB 2.4: stop words, (basic) stemming, full text indexes, and API:
The upcoming release 2.4 of MongoDB will include a first, experimental support for full text search (FTS). This feature was requested early in the history of MongoDB as you can see from this JIRA ticket: SERVER-380. FTS is first available with the developer release 2.3.2.
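If stop words, stemming, and full text indexes are new to you, the toy sketch below shows how the three fit together. Everything in it is an assumption made for illustration: the stop word list is tiny, the suffix stripping is far cruder than a real stemmer, and none of it reflects how MongoDB implements the feature.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "the", "of", "in", "is", "and", "for"}  # tiny sample list
SUFFIXES = ("ing", "es", "s")  # naive suffix stripping, not a real stemmer


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, drop punctuation and stop words, then stem what is left."""
    words = (w.strip(".,:;!?()\"'").lower() for w in text.split())
    return [stem(w) for w in words if w and w not in STOP_WORDS]


def build_index(docs):
    """Map each stemmed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


docs = {
    1: "Indexing the documents",
    2: "An index for searching documents",
}
index = build_index(docs)

# A query goes through the same tokenizer, so "indexes" finds both documents.
query_terms = tokenize("the indexes")
matches = set.intersection(*(index[t] for t in query_terms))
print(query_terms, matches)
```

The important design point is that documents and queries pass through the same stop word and stemming steps; otherwise the terms in the index and the terms in the query would never line up.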
A couple of reasons for MongoDB including full text search:
- highly requested feature (239 votes, 193 watchers, 42 participants)
- (high level) feature parity with MySQL
- NIH
The majority of databases support full text indexing, but almost everyone needing good full text search ends up using Lucene, Solr, Elasticsearch, or Sphinx.
via: http://blog.codecentric.de/en/2013/01/text-search-mongodb-stemming/
Friday, 11 January 2013
MongoMem: Memory Usage by Collection in MongoDB
MongoMem, a Python tool by the Wish tech team:
Today, we’re releasing the first of these tools, MongoMem. MongoMem solves the age-old problem of figuring out how much memory each collection is using. In MongoDB, keeping your working set in memory is pretty important for most apps. The problem is, there’s not really a way to get visibility into the working set or what’s in memory beyond looking at resident set size or page faults rate.
via: http://eng.wish.com/mongomem-memory-usage-by-collection-in-mongodb/
Storing Tree Like Hierarchy Structures With MongoDB
Vyacheslav Voronenko expands a bit on the "Model Tree Structures in MongoDB" article and provides some code snippets for common operations:
In real life, almost any project deals with tree structures. Different kinds of taxonomies, site structures, etc. require modelling of hierarchy relations. In this article I will illustrate the first three of five typical approaches to operating on hierarchy data, using the MongoDB database as an example. Those approaches are:
- Model Tree Structures with Child References
- Model Tree Structures with Parent References
- Model Tree Structures with an Array of Ancestors
- Model Tree Structures with Materialized Paths
- Model Tree Structures with Nested Sets
The 2nd part of the article is available here and all the code is on GitHub.
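As a quick orientation, the first three approaches from the list above can be sketched with plain Python dicts standing in for documents. The category names, field names, and helper functions are my own assumptions rather than the article's code, and the list comprehensions stand in for what would be simple equality queries on the `parent` or `ancestors` field.

```python
# 1. Child references: each node lists its direct children.
child_refs = [
    {"_id": "Electronics", "children": ["Cameras", "Phones"]},
    {"_id": "Cameras", "children": []},
    {"_id": "Phones", "children": ["Smartphones"]},
    {"_id": "Smartphones", "children": []},
]

# 2. Parent references: each node points at its parent.
parent_refs = [
    {"_id": "Electronics", "parent": None},
    {"_id": "Cameras", "parent": "Electronics"},
    {"_id": "Phones", "parent": "Electronics"},
    {"_id": "Smartphones", "parent": "Phones"},
]

# 3. Array of ancestors: each node stores its full path from the root.
ancestor_refs = [
    {"_id": "Electronics", "ancestors": []},
    {"_id": "Cameras", "ancestors": ["Electronics"]},
    {"_id": "Phones", "ancestors": ["Electronics"]},
    {"_id": "Smartphones", "ancestors": ["Electronics", "Phones"]},
]


def children_of(nodes, node_id):
    """Direct children with parent references: one equality match."""
    return [n["_id"] for n in nodes if n["parent"] == node_id]


def path_to_root(nodes, node_id):
    """Ancestors with parent references: one lookup per level of the tree."""
    by_id = {n["_id"]: n for n in nodes}
    path = []
    parent = by_id[node_id]["parent"]
    while parent is not None:
        path.append(parent)
        parent = by_id[parent]["parent"]
    return path


def descendants_of(nodes, node_id):
    """All descendants with an ancestors array: a single match, no recursion."""
    return [n["_id"] for n in nodes if node_id in n["ancestors"]]


print(children_of(parent_refs, "Electronics"))
print(path_to_root(parent_refs, "Smartphones"))
print(descendants_of(ancestor_refs, "Electronics"))
```

The trade-off shows up in the helpers: parent references keep writes cheap but need one lookup per level to walk up the tree, while the ancestors array answers subtree questions in one pass at the price of rewriting paths when a node moves.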
via: http://www.codeproject.com/Articles/521713/Storing-Tree-like-Hierarchy-Structures-With-MongoD
Monday, 7 January 2013
Oops Replication - MongoDB Secondary Node Data Loss
I have two mongod instances without replication each having same collection name but different data.Now initialized replication between them.Secondary machine copies all data from primary machine and looses it’s original data.Can I recover original data present in secondary machine ?
Leaving aside the typos in the question (and any resentments they might generate), would you consider this the expected behavior? To me this sounds like a conflict in the setup, and the database should raise an error.
Exploring Google Analytics Data With Clojure, Incanter, and MongoDB
Arnold Matyasi posted 4 articles (with Clojure code, charts, and explanations) on how to analyze Google Analytics data locally with Clojure, Incanter, and MongoDB:
- Part 1: exporting data, setup, Clojure helper functions
- Part 2: first charts
- Part 3: grouping data
- Part 4: implementing weighted sort
Thursday, 3 January 2013
A SO Answer About MongoDB Memory Usage
MongoDB isn’t a write-to-disk application unlike SQL. So it writes to a fsync queue first which actually gets managed by the OS (this is fundamentally NoSQL here to not write straight to disk, “Eventually Consistent”), since MongoDB does no memory management of its own.
The OS itself will then decide when paged in data should be removed using the LRU algorithm: http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used
Added ontop of this is that you must load a part of the _id btree each time you insert as well which is ever growing due to insertions. This means that actually if you do mass inserts in a short period of time you could end up loading more of the btree than you need to because the older section of the btree have not been seen as stale yet.
How many “strange” explanations have you counted in the above answer?
Thursday, 25 October 2012
YCSB Benchmark Results for Cassandra, HBase, MongoDB, MySQL Cluster, and Riak
Put together by the team at Altoros Systems Inc., this time run on Amazon EC2 and including Cassandra, HBase, MongoDB, MySQL Cluster, sharded MySQL and Riak:
After some of the results had been presented to the public, some observers said MongoDB should not be compared to other NoSQL databases because it is more targeted at working with memory directly. We certainly understand this, but the aim of this investigation is to determine the best use cases for different NoSQL products. Therefore, the databases were tested under the same conditions, regardless of their specifics.
Teaser: HBase got the best results in most of the benchmarks (with flush turned off though). And I’m not sure the setup included the latest HBase read improvements from Facebook.
Wednesday, 24 October 2012
MongoHQ Raises More Funding for MongoDB as Service Engine
Alex Williams for TechCrunch:
MongoHQ has raised $6 million from Trinity Ventures and a host of investors for its database service for developers. The company will use the funds to expand its public cloud offering and improve its management tools for MongoDB, the popular NoSQL database.
Undeniably there’s a lot of demand for MongoDB. 10gen is also growing fast.
But how safe is it to build a business around a product that is not under your control?
via: http://techcrunch.com/2012/10/18/mongohq-raises-6-million-for-database-as-service-engine/
10gen Transitioning From Startup to Corporation
I’ve spent most of my career in startups or small companies that sometimes interacted with large corporations. I’ve also worked a couple of years within a large corporation. But I’ve never been through the transition from startup to corporation.
This is the phase 10gen, the company behind MongoDB, is in right now, and they are hiring for positions like VP of business development (Ed Albanese, ex-Cloudera), VP of corporate strategy (Matt Asay, ex-Nodeable, Alfresco, Canonical), and VP of services and product management (Ron Avnur, ex-MarkLogic).
In his first post for 10gen, Matt Asay cites 10gen president Max Schireson:
By far our most important competitor is Oracle. After that it’s Oracle, Oracle and Oracle. I see other NoSQL players such as DataStax [distributor of Apache’s Cassandra] and CouchDB as comrades in arms in the battle to persuade people that the answer does not have to be Oracle.
Monday, 22 October 2012
5 Things to Monitor in MongoDB
In his “what I’ve learned while using MongoDB for a year” post, Simon Maynard recommends 5 metrics to always monitor:
- index sizes
- current ops
- index misses
- replication lag
- I/O performance
The 1st and the 3rd are about making sure your entire MongoDB working set (including indexes) fits in RAM.
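Two of these checks are easy to sketch once you have the server's status documents in hand. The snippet below assumes dicts shaped like the output of the `dbStats` and `replSetGetStatus` commands (fields `indexSize`, `members`, `stateStr`, `optimeDate`); the sample values, host names, and helper functions are made up, and nothing here connects to a server.

```python
from datetime import datetime


def index_fits_in_ram(db_stats, ram_bytes):
    """Compare the total index size of a database with the available RAM."""
    return db_stats["indexSize"] <= ram_bytes


def replication_lag_seconds(repl_status):
    """Seconds each secondary trails the primary, based on last applied op time."""
    members = repl_status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Made-up sample documents, shaped like the commands' output.
sample_stats = {"dataSize": 6 * 1024**3, "indexSize": 2 * 1024**3}
sample_status = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2012, 10, 22, 12, 0, 30)},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2012, 10, 22, 12, 0, 18)},
    ]
}

print(index_fits_in_ram(sample_stats, ram_bytes=4 * 1024**3))
print(replication_lag_seconds(sample_status))
```

Keep in mind that index size is only a lower bound: the working set also includes the hot part of the data, which these numbers alone cannot tell you.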
via: http://snmaynard.com/2012/10/17/things-i-wish-i-knew-about-mongodb-a-year-ago/