


geo: All content tagged as geo in NoSQL databases and polyglot persistence

New Geo Features in MongoDB 2.4

The primary conceptual difference (though there are also many functional differences) between the 2d and 2dsphere indexes is the type of coordinate system that they consider. Planar coordinate systems are useful for certain applications, and can serve as a simplifying approximation of spherical coordinates. As you consider larger geometries, or consider geometries near the meridians and poles, however, the requirement to use proper spherical coordinates becomes important.
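To make the planar versus spherical distinction concrete, here is a minimal Python sketch. It is not MongoDB code: the function names are made up for illustration, the Earth is assumed to be a sphere with a mean radius of 6371 km, and the spherical distance uses the haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def planar_distance_deg(lon1, lat1, lon2, lat2):
    """Euclidean distance in degrees, treating (lon, lat) as flat x/y."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


def great_circle_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Each pair is (lon1, lat1, lon2, lat2).
equator = (0.0, 0.0, 10.0, 0.0)         # 10 degrees of longitude at the equator
arctic = (0.0, 80.0, 10.0, 80.0)        # the same 10 degrees at 80 degrees north
antimeridian = (179.0, 0.0, -179.0, 0.0)  # neighbours across the 180th meridian

for name, pair in (("equator", equator), ("80N", arctic),
                   ("antimeridian", antimeridian)):
    print(name, planar_distance_deg(*pair), round(great_circle_km(*pair), 1))
```

The planar measure reports the same 10 degrees for the first two pairs even though the arctic pair is several times closer on the ground, and it reports 358 degrees for two points that sit only 2 degrees apart across the antimeridian.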

I don’t know anything about geo, so I’ll leave it to the experts to comment.

✚ There’s actually something I like about this announcement: the fact that MongoDB decided to use an existing standard instead of coming up with its own custom solution.
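The standard in question is GeoJSON. As a rough sketch of what that looks like, the snippet below builds a GeoJSON point and a polygon query document as plain Python dictionaries. The field names `name` and `loc`, the place, and the coordinates are made up for illustration. The `$geoWithin` / `$geometry` shape follows MongoDB's documentation for 2dsphere queries, but nothing here talks to a server.

```python
import json


def geojson_point(longitude, latitude):
    """Build a GeoJSON Point; GeoJSON orders coordinates as [longitude, latitude]."""
    return {"type": "Point", "coordinates": [longitude, latitude]}


def geojson_polygon(ring):
    """Build a GeoJSON Polygon from one outer ring, closing the ring if needed."""
    ring = [list(p) for p in ring]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON rings must end where they start
    return {"type": "Polygon", "coordinates": [ring]}


# Hypothetical document with a GeoJSON location field.
place = {"name": "Some park", "loc": geojson_point(-73.9654, 40.7829)}

# Query document asking for everything inside a rough bounding polygon.
area = geojson_polygon([(-74.0, 40.7), (-73.9, 40.7), (-73.9, 40.8), (-74.0, 40.8)])
within_query = {"loc": {"$geoWithin": {"$geometry": area}}}

print(json.dumps(place))
print(json.dumps(within_query, indent=2))
```

Because the documents are ordinary GeoJSON, the same structures can be exchanged with other tools that understand the format, which is the benefit of adopting an existing standard.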

Original title and link: New Geo Features in MongoDB 2.4 (NoSQL database©myNoSQL)


GIS Tools for Hadoop by Esri

Interesting project, GIS Tools for Hadoop:

GIS Tools for Hadoop is an open source toolkit intended for Big Spatial Data Analytics. The toolkit provides different libraries:

  • Esri Geometry API for Java: A generic geometry library, can be used to extend Hadoop core with vector geometry types and operations, and enables developers to build MapReduce applications for spatial data.
  • Spatial Framework for Hadoop: Extends Hive and is based on the Esri Geometry API, to enable Hive Query Language users to leverage a set of analytical functions and geometry types. In addition to some utilities for JSON used in ArcGIS.
  • Geoprocessing Tools for Hadoop: Contains a set of ready to use ArcGIS Geoprocessing tools, based on the Esri Geometry API and Spatial Framework for Hadoop. Developers can download the source code of the tools and customize it; they can also create new tools and contribute it to the open source project. Through these tools ArcGIS users can move their spatial data and execute a pre-defined workflow inside Hadoop.

I recently learned about GeoJSON — JSON Geometry and Feature Description, but the two don’t seem to be related.



Neo4J Spatial and Gephi for Smart Data Analysis

As I often run the same course, it would be interesting to calculate my average pace at specific locations. When combining the data of all of my courses, I could deduct frequently encountered locations. Finally, could there be a correlation between my average pace and my distance from home? In order to come up with answers to these questions, I will import my running data into a Neo4J Spatial datastore. Neo4J Spatial extends the Neo4J Graph Database with the necessary tools and utilities to store and query spatial data in your graph models. For visualizing my running data, I will make use of Gephi, an open-source visualization and manipulation tool that allows users to interactively browse and explore graphs.

This looks like a great application of a graph database for analyzing geo data. And it’s very practical.
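The first question in the quote, average pace at specific locations, can be approximated without a graph database at all. The sketch below swaps in plain grid bucketing in Python rather than Neo4j Spatial: the GPS samples are invented, and the cell size of 0.001 degrees is an arbitrary assumption.

```python
import math
from collections import defaultdict

# Hypothetical GPS samples: (latitude, longitude, pace in minutes per km).
samples = [
    (51.0501, 3.7301, 5.0),
    (51.0502, 3.7302, 5.4),
    (51.0601, 3.7401, 6.0),
]


def average_pace_by_cell(samples, cell_size_deg=0.001):
    """Group samples into square lat/lon cells and average the pace in each."""
    buckets = defaultdict(list)
    for lat, lon, pace in samples:
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        buckets[cell].append(pace)
    return {cell: sum(paces) / len(paces) for cell, paces in buckets.items()}


averages = average_pace_by_cell(samples)
for cell, pace in sorted(averages.items()):
    print(cell, round(pace, 2))
```

What the graph model adds over this kind of bucketing is the ability to follow relationships, for example from a location to the runs that passed through it, which is where a tool like Neo4j Spatial earns its keep.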



MongoDB, Data Modeling, and Adoption

Micheal Shallop describes in this post how he “built and re-built” a geospatial table, replacing several tables in MySQL with MongoDB:

The mongo geospatial repository will be replacing several tables in the legacy mySQL system – as you may know, mongodb comes with full geospatial support so executing queries against a collection (table) built in this manner is shocking in terms of its response speeds, especially when you compare those speeds to the traditional mySQL algorithms for extracting geo-points based on distance ranges for lat/lon coordinates.  The tl;dr for this paragraph is: no more hideous trigonometric mySQL queries!

But what actually caught my attention was this paragraph:

What I learned in this exercise was that the key to architecting a mongo collection requires you to re-think how data is stored.  Mongo stores data as a collection of documents.  The key to successful thinking, at least in terms of mongo storage, is denormalization of your data objects.
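As a rough illustration of the denormalization the quote describes, the Python sketch below folds three hypothetical relational tables into one self-contained document per city. The table layouts and field names are invented for this example and are not taken from the original post.

```python
# Hypothetical normalized rows, as they might sit in three relational tables.
cities = [{"id": 1, "name": "Lisbon", "country_id": 10}]
countries = [{"id": 10, "code": "PT", "name": "Portugal"}]
coordinates = [{"city_id": 1, "lat": 38.7223, "lon": -9.1393}]


def denormalize(cities, countries, coordinates):
    """Join the rows once, up front, into documents that need no joins to read."""
    country_by_id = {c["id"]: c for c in countries}
    coords_by_city = {c["city_id"]: c for c in coordinates}
    docs = []
    for city in cities:
        country = country_by_id[city["country_id"]]
        point = coords_by_city[city["id"]]
        docs.append({
            "name": city["name"],
            # The country is embedded instead of referenced by a foreign key.
            "country": {"code": country["code"], "name": country["name"]},
            # Coordinate pair stored as [longitude, latitude] on the document.
            "loc": [point["lon"], point["lat"]],
        })
    return docs


documents = denormalize(cities, countries, coordinates)
print(documents[0])
```

The trade-off is the usual one: reads become a single lookup, while a change to a country name now has to be applied to every document that embeds it.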

This made me realize that MongoDB adoption benefits hugely from the fact that its data model and querying are the closest to those of relational databases: neither requires a radical mind shift from developers who have touched a database at least once. It is like knowing one programming language and learning a second that follows almost the same paradigms.

The same cannot be said about key-value stores, multi-dimensional maps, MapReduce algorithms, or graph databases. Any of these would require one to dismiss pretty much everything learned in the relational model and completely remodel the world. It’s a tougher job, but when used right the effort pays off.


A Big Data Angle of the Carrier IQ Scandal

These days when talking about Big Data, the discussion is almost always about architectures like Twitter, Facebook, DataSift, or at most the Large Hadron Collider grid.

But think for a second about the Carrier IQ scandal. This private company is offering phone carriers a solution to track details from and about every device out in the wild. To put this in some context, here is what IBM Chief Scientist Jeff Jonas wrote 2 years ago about mobile location data:

Mobile devices in America are generating something like 600 billion geo-spatially tagged transactions per day.  Every call, text message, email and data transfer handled by your mobile device creates a transaction with your space-time coordinate (to roughly 60 meters accuracy if there are three cell towers in range), whether you have GPS or not. 

Even if not all this data is reaching Carrier IQ servers, it still sounds like a lot. And this makes me wonder how many other such unknown/hidden big data stories are out there and when will we start hearing more about their architectures.
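To put the quoted figure in perspective, a back-of-the-envelope calculation in Python converts 600 billion transactions per day into a sustained per-second rate. The only input is the number from the quote; the rest is unit conversion.

```python
TRANSACTIONS_PER_DAY = 600e9          # figure quoted by Jeff Jonas
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400 seconds

per_second = TRANSACTIONS_PER_DAY / SECONDS_PER_DAY
print(f"{per_second:,.0f} transactions per second")
```

That works out to roughly 6.9 million geo-tagged transactions every second, averaged over the day.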


Easy IP Geotargeting with Geokit and MongoMapper

There are several cases in which it might make sense to tailor your app’s content based on a user’s physical location. But asking them directly is a bit of a pain. Luckily, it’s extremely simple to find a user’s location knowing only something you will always know about a visitor: their IP address. Today I’ll walk you through how to use IPs to geolocate your visitors in a Rails application using Geokit and MongoDB’s geospatial indexing with MongoMapper.

And a couple of days ago it was Rails with Geocoder and MongoDB with Mongoid.



Geolocation, Rails and MongoDB- a recipe for success

‘Geolocation’ seems to be the best dish being served today. Every web-portal, every mobile app wants to be sensitive to a person’s location. Everyone wants to see information that is ‘relative’ or location sensitive. Whether it’s a deal portal, travel portal, social network – giving users information that is relevant to their location not only brings a personalized touch but also keeps them tuned in to the portal.

The recipe combines Rails with Geocoder, MongoDB with Mongoid and mongoid-geo, and the Google Maps JavaScript API.



Intro to MongoDB Geospatial Queries

The post will walk you through creating the needed 2d index and executing geo queries using $near, $maxDistance, $within, and $box:

MongoDB has supported geospatial queries for a while, and in the upcoming 1.7 release it’ll get even better. Let’s take a look at how easy it is to query MongoDB in an idiomatic geospatial manner.

In the upcoming release (remember that production-ready MongoDB versions use even numbers: 1.2, 1.4, 1.6, 1.8), the geo model will ☞ not be flat anymore.
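For readers who have not seen these operators, the Python sketch below shows the query documents in the legacy coordinate-pair shape used with a 2d index, together with small pure-Python functions that emulate what the queries select under the flat model. The sample documents and the helper names `near` and `within_box` are invented for illustration, and nothing here connects to a MongoDB server.

```python
import math

# Hypothetical documents with legacy [x, y] coordinate pairs in a "loc" field.
places = [
    {"name": "a", "loc": [1, 1]},
    {"name": "b", "loc": [4, 5]},
    {"name": "c", "loc": [9, 9]},
]

# Query documents in the shape used against a 2d index on "loc".
near_query = {"loc": {"$near": [0, 0], "$maxDistance": 7}}
box_query = {"loc": {"$within": {"$box": [[0, 0], [5, 5]]}}}


def near(docs, center, max_distance):
    """Emulate $near with $maxDistance on a flat plane: nearest matches first."""
    def dist(doc):
        return math.hypot(doc["loc"][0] - center[0], doc["loc"][1] - center[1])
    return sorted((d for d in docs if dist(d) <= max_distance), key=dist)


def within_box(docs, box):
    """Emulate $within with $box: keep points inside the given two corners."""
    (x1, y1), (x2, y2) = box
    return [d for d in docs
            if x1 <= d["loc"][0] <= x2 and y1 <= d["loc"][1] <= y2]


near_names = [d["name"] for d in near(places, near_query["loc"]["$near"],
                                      near_query["loc"]["$maxDistance"])]
box_names = [d["name"] for d in within_box(places,
                                           box_query["loc"]["$within"]["$box"])]
print(near_names, box_names)
```

The emulation treats coordinates as points on a plane, which is exactly the flat model that the upcoming release is said to move away from.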



Neo4j Gets Geo Support

I guess that’s what has kept Neo Technology (the guys behind Neo4j) busy lately:

The Neo4j Spatial project supports the use of geographic data by providing utilities that simplify and support advanced capabilities like:

  • Storage of geographic features like points, lines and polygons as graphs
  • Indexing and querying based on location with R-trees, Quad-trees and other structures
  • Spatial operations for GIS and Location Based Services
  • Import/Export from existing industry standard formats like shapefiles
  • Exposure of any Neo4j traversal as dynamic layers with points, multilines or polygons
  • Construction of geographic operations from arbitrary combinations of traversals and relevant properties in the Neo4j graph
  • Support for well known libraries and applications
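The list above mentions quad-trees as one of the index structures. As a generic illustration of the idea, and not a description of how Neo4j Spatial implements it, here is a minimal point quadtree in Python. The class name, the node capacity of 4, and the depth limit of 16 are all assumptions made for this sketch.

```python
class QuadTree:
    """Minimal point quadtree over the half-open box [x0, x1) x [y0, y1)."""

    def __init__(self, x0, y0, x1, y1, capacity=4, depth=0, max_depth=16):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth
        self.points = []      # points held while this node is a leaf
        self.children = None  # four sub-quadrants once the node has split

    def _contains(self, x, y):
        x0, y0, x1, y1 = self.bounds
        return x0 <= x < x1 and y0 <= y < y1

    def insert(self, x, y):
        """Insert a point; returns False if it lies outside this node."""
        if not self._contains(x, y):
            return False
        if self.children is None:
            # Leaves at the depth limit keep growing instead of splitting.
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((x, y))
                return True
            self._split()
        return any(child.insert(x, y) for child in self.children)

    def _split(self):
        """Turn this leaf into an inner node and push its points down."""
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(x0, y0, mx, my, *args), QuadTree(mx, y0, x1, my, *args),
            QuadTree(x0, my, mx, y1, *args), QuadTree(mx, my, x1, y1, *args),
        ]
        for px, py in self.points:
            for child in self.children:
                if child.insert(px, py):
                    break
        self.points = []

    def query(self, qx0, qy0, qx1, qy1):
        """Return all points with qx0 <= x <= qx1 and qy0 <= y <= qy1."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return []  # the query box misses this node, so skip its subtree
        found = [(x, y) for x, y in self.points
                 if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        for child in self.children or []:
            found.extend(child.query(qx0, qy0, qx1, qy1))
        return found


# Usage: index a 10 x 10 grid of points, then ask for one corner of the map.
tree = QuadTree(0, 0, 100, 100)
grid = [(i * 10 + 5, j * 10 + 5) for i in range(10) for j in range(10)]
for x, y in grid:
    tree.insert(x, y)
print(sorted(tree.query(0, 0, 30, 30)))
```

The payoff is in `query`: whole quadrants that do not overlap the search box are skipped without looking at any of their points, which is what makes location-based lookups cheaper than a full scan.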


Even a GISer agreed in the past that graph databases are the obvious direction that spatial-enabled databases should take.

So far in the NoSQL space we’ve had CouchDB geo support with GeoCouch and 2D geo support in MongoDB since version 1.4.


NoSQL Graph Databases and the Future of GIS

Coming from a GISer:

[…] I think this type of database (nb graph databases) is the obvious direction that spatial-enabled databases should take. A lot of our spatial analysis tasks involve searching the relationships between data. This could really expand those functions, and potentially make them quicker.


One is topology. What is topology to us but the relationship between different geometries?


The other possibility that I see with this, is relationships between metadata. Metadata in a GIS is boring. Yes it is important, but no one seems to use it, and it is tedious to create. FGDC is a pain. Metadata through relationships sounds a lot more interesting to me.


GeoCouch: Geo Support for CouchDB

When I covered the MongoDB 1.4 release, I noticed the community’s excitement around its geospatial support. A similar initiative was started two years ago to bring geo support to CouchDB:

Why should someone want to put his geodata into a big mess of thousands of documents instead of a nicely structured RDBMS? You don’t have to be a computer scientist to know that retrieving data out of a RDBMS is damn fast and a DODB approach sounds like a slow, “I grep through a long list of files”.

This might partly be true, but high performance shouldn’t be a use case for DODBs. Their flexibility and ease of usage is what makes them perform great. You have the choice between being fast or being flexible.

And now ☞ GeoCouch is here:

An idea has become reality. Exactly two years after the blog post with the initial vision, a new version of GeoCouch is finished. It’s a huge step forward. The first time the dependencies were narrowed down to CouchDB itself. No Python, no SpatiaLite any longer, it’s pure Erlang. GeoCouch is tightly integrated with CouchDB, so you’ll get all the nice features you love about CouchDB.

Release: Production Ready MongoDB 1.4 Released

Judging by the number of posts I’ve seen around, I’d guess you’ve already heard about the MongoDB 1.4 release[1]. Anyway, I definitely had to include it here, as myNoSQL covers all major NoSQL projects and closely follows all things related to the NoSQL ecosystem.

While some MongoDB users seemed quite excited about the addition of ☞ geospatial indexing, others about some ☞ query language improvements, the things that caught my attention were:

  • background indexing and indexing improvements
  • concurrency improvements
  • the lack of autosharding (still alpha, still pushing, still…)
  • the lack of improvements or alternatives for the MongoDB durability tradeoff

Speaking of performance, the 10gen people[2] have run some benchmarks comparing MongoDB 1.2 with MongoDB 1.4. With a couple of exceptions, performance hasn’t improved radically, so I’d speculate that there is still a lot of locking involved. The benchmark source code was made available[3] so you can dig deeper into it.

All in all, good and exciting news for the NoSQL world!