Geo: All content tagged as Geo in NoSQL databases and polyglot persistence
Micheal Shallop describes in this post how he “built and re-built” a geospatial table, replacing several tables in MySQL with MongoDB:
The mongo geospatial repository will be replacing several tables in the legacy mySQL system – as you may know, mongodb comes with full geospatial support so executing queries against a collection (table) built in this manner is shocking in terms of it’s response speeds — especially when you compare those speeds to the traditional mySQL algorithms for extracting geo-points based on distance ranges for lat/lon coordinates. The tl;dr for this paragraph is: no more hideous trigonometric mySQL queries!
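For context, the “trigonometric queries” mentioned in the quote usually boil down to computing a great-circle distance for every candidate row. The sketch below is not the author's code; it is a minimal Python illustration of that computation (the haversine formula) plus a linear scan, which is roughly the work a geospatial index lets the database avoid. The function names and the sample cities are assumptions made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def points_within(points, center, radius_km):
    """Linear scan: keep the points whose distance to center is at most radius_km."""
    clat, clon = center
    return [p for p in points if haversine_km(clat, clon, p["lat"], p["lon"]) <= radius_km]


# Hypothetical sample data, for illustration only.
cities = [
    {"name": "Paris", "lat": 48.8566, "lon": 2.3522},
    {"name": "London", "lat": 51.5074, "lon": -0.1278},
    {"name": "Berlin", "lat": 52.5200, "lon": 13.4050},
]
nearby = points_within(cities, (48.8566, 2.3522), 400)
print([c["name"] for c in nearby])
```

Every row has to go through the sines and cosines here, which is why this approach gets slow on large tables unless the candidates are first narrowed down by an index.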
But what actually caught my attention was this paragraph:
What I learned in this exercise was that the key to architecting a mongo collection requires you to re-think how data is stored. Mongo stores data as a collection of documents. The key to successful thinking, at least in terms of mongo storage, is denormalization of your data objects.
This made me realize that MongoDB adoption is benefiting hugely from the fact that its data model and querying are the closest to those of relational databases, with neither requiring a radical mind shift from developers who have touched a database at least once. It is like knowing one programming language and learning a second one that follows almost the same paradigms.
The same cannot be said about key-value stores, multi-dimensional maps, MapReduce algorithms, or graph databases. Any of these would require one to dismiss pretty much everything learned in the relational model and completely remodel the world. It’s a tougher job, but when these are used right, the effort pays off.
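The denormalization mentioned in the quoted paragraph can be sketched as follows. This is only an illustration under stated assumptions: the three “tables”, their field names, and the `denormalize` helper are all hypothetical, and the resulting shape is one plausible document layout, not the one used in the original post.

```python
# Hypothetical normalized rows, as they might sit in three relational tables.
places = [{"id": 1, "name": "Lisbon", "country_id": 10}]
countries = [{"id": 10, "name": "Portugal"}]
coordinates = [{"place_id": 1, "lat": 38.7223, "lon": -9.1393}]


def denormalize(places, countries, coordinates):
    """Fold the joined rows into one self-contained document per place."""
    country_by_id = {c["id"]: c for c in countries}
    coords_by_place = {c["place_id"]: c for c in coordinates}
    docs = []
    for place in places:
        xy = coords_by_place[place["id"]]
        docs.append({
            "_id": place["id"],
            "name": place["name"],
            # The country name is copied into the document instead of referenced.
            "country": country_by_id[place["country_id"]]["name"],
            # Longitude first, then latitude: the ordering assumed here for coordinate pairs.
            "loc": [xy["lon"], xy["lat"]],
        })
    return docs


documents = denormalize(places, countries, coordinates)
print(documents[0])
```

The trade-off is the usual one: reads need no joins because each document carries everything it needs, but a renamed country would have to be updated in every document that embeds it.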
Original title and link: MongoDB, Data Modeling, and Adoption ( ©myNoSQL)
But think for a second about the Carrier IQ scandal. This private company is offering phone carriers a solution to track details from and about every device out in the wild. To put this in some context, here is what IBM Chief Scientist Jeff Jonas wrote 2 years ago about mobile location data:
Mobile devices in America are generating something like 600 billion geo-spatially tagged transactions per day. Every call, text message, email and data transfer handled by your mobile device creates a transaction with your space-time coordinate (to roughly 60 meters accuracy if there are three cell towers in range), whether you have GPS or not.
Even if not all of this data is reaching Carrier IQ servers, it still sounds like a lot. And this makes me wonder how many other such unknown or hidden big data stories are out there, and when we will start hearing more about their architectures.
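To get a feel for the 600 billion figure quoted above, it helps to convert it into a sustained rate. The snippet is plain arithmetic on the quoted number; it says nothing about how the load is actually distributed over the day or across carriers.

```python
TRANSACTIONS_PER_DAY = 600_000_000_000  # figure quoted from Jeff Jonas
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

per_second = TRANSACTIONS_PER_DAY / SECONDS_PER_DAY
print(f"{per_second:,.0f} transactions per second")  # prints "6,944,444 transactions per second"
```

That is close to seven million geo-tagged events per second, averaged over a full day.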
Original title and link: A Big Data Angle of the Carrier IQ Scandal ( ©myNoSQL)
I guess that’s what has kept Neo Technology, the guys behind Neo4j, busy lately:
The Neo4j Spatial project supports the use of geographic data by providing utilities that simplify and support advanced capabilities like:
- Storage of geographic features like points, lines and polygons as graphs
- Indexing and querying based on location with R-trees, Quad-trees and other structures
- Spatial operations for GIS and Location Based Services
- Import/Export from existing industry standard formats like shapefiles
- Exposure of any Neo4j traversal as dynamic layers with points, multilines or polygons
- Construction of geographic operations from arbitrary combinations of traversals and relevant properties in the Neo4j graph
- Support for well known libraries and applications
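The list above mentions indexing with R-trees and quad-trees. As a rough illustration of why such structures help location queries, here is a minimal point quadtree in Python. It is emphatically not Neo4j Spatial's implementation or API; the class name, the parameters, and the half-open bounds convention are all assumptions chosen for this sketch.

```python
class QuadTree:
    """Point quadtree over the box [x0, x1) x [y0, y1); a leaf splits when it overflows."""

    def __init__(self, x0, y0, x1, y1, capacity=4, depth=0, max_depth=12):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth  # stops endless splitting on duplicate points
        self.points = []            # points held by this node while it is a leaf
        self.children = None        # four sub-quadrants once the node has split

    def _contains(self, x, y):
        x0, y0, x1, y1 = self.bounds
        return x0 <= x < x1 and y0 <= y < y1

    def insert(self, x, y):
        """Insert a point; returns False if it lies outside this node's bounds."""
        if not self._contains(x, y):
            return False
        if self.children is None:
            self.points.append((x, y))
            if len(self.points) > self.capacity and self.depth < self.max_depth:
                self._split()
            return True
        # The quadrants partition the box, so exactly one child accepts the point.
        return any(child.insert(x, y) for child in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(x0, y0, mx, my, *args),
            QuadTree(mx, y0, x1, my, *args),
            QuadTree(x0, my, mx, y1, *args),
            QuadTree(mx, my, x1, y1, *args),
        ]
        points, self.points = self.points, []
        for x, y in points:
            self.insert(x, y)  # redistribute the stored points to the children

    def query(self, qx0, qy0, qx1, qy1):
        """Return all points with qx0 <= x <= qx1 and qy0 <= y <= qy1."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return []  # the query box misses this quadrant: prune the whole subtree
        found = [(x, y) for x, y in self.points if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        if self.children is not None:
            for child in self.children:
                found.extend(child.query(qx0, qy0, qx1, qy1))
        return found


# Usage: x is longitude, y is latitude; the upper edges (180, 90) are excluded.
tree = QuadTree(-180, -90, 180, 90)
grid = [(lon, lat) for lon in range(-170, 180, 20) for lat in range(-80, 90, 20)]
for lon, lat in grid:
    tree.insert(lon, lat)
print(sorted(tree.query(0, 0, 50, 50)))
```

The pruning step in `query` is the whole point: quadrants that cannot overlap the search box are skipped without looking at any of the points stored beneath them. An R-tree follows the same idea but groups objects by bounding rectangles rather than by a fixed subdivision of space.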
When I covered the MongoDB 1.4 release, I noticed the community’s excitement about its geospatial support. A similar initiative was started two years ago to bring geo support to CouchDB:
Why should someone want to put his geodata into a big mess of thousands of documents instead of a nicely structured RDBMS? You don’t have to be a computer scientist to know that retrieving data out of a RDBMS is damn fast and a DODB approach sounds like a slow, “I grep through a long list of files”.
This might partly be true, but high performance shouldn’t be a use case for DODBs. Their flexibility and ease of usage is what they make them perform great. You have the choice between being fast or being flexible.
And now GeoCouch is here:
An idea has become reality. Exactly two years after the blog post with the initial vision, a new version of GeoCouch is finished. It’s a huge step forward. The first time the dependencies were narrowed down to CouchDB itself. No Python, no SpatiaLite any longer, it’s pure Erlang. GeoCouch is tightly integrated with CouchDB, so you’ll get all the nice features you love about CouchDB.
Judging by the number of posts I’ve seen around, I’d guess you’ve already heard about the MongoDB 1.4 release. Anyway, I definitely had to include it here, as myNoSQL covers all major NoSQL projects and closely follows all things related to the NoSQL ecosystem.
- background indexing and indexing improvements
- concurrency improvements
- the lack of autosharding (still alpha, still pushing, still…)
- the lack of improvements or alternatives for the MongoDB durability tradeoff
Speaking of performance, the 10gen people have run some benchmarks comparing MongoDB 1.2 with MongoDB 1.4. With a couple of exceptions, performance hasn’t improved radically, so I’d speculate that there is still a lot of locking involved. The benchmark source code was made available, so you can dig deeper into it.
All in all, good and exciting news for the NoSQL world!