ElasticSearch: All content tagged as ElasticSearch in NoSQL databases and polyglot persistence
Monday, 15 April 2013
Apache Solr Versus ElasticSearch - the Feature Smackdown
A pretty thorough comparison of the feature sets of Solr and ElasticSearch, put together by Kelvin Tan. It has 5 main sections (API, indexing, searching, customizability, distributed), with many features considered in each of them.
✚ The complete website source is on GitHub so if one would like to improve it, it’s easy.
✚ Feature checklists should not be used to make final technical decisions. But they are extremely useful in the early stages of the decision process, when you have to go through a lot of options.
✚ I know this Solr vs ElasticSearch comparison will evolve over time, so I’ve starred the project on GitHub and also saved the current version as a PDF.
Original title and link: Apache Solr Versus ElasticSearch - the Feature Smackdown (©myNoSQL)
Friday, 16 March 2012
Real-Time Search With MongoDB and Elasticsearch
Interesting usage of the MongoDB oplog to make up for the lack of storage notifications:
ElasticSearch has a built in feature of Rivers, which are essentially plugins for specific services to constantly stream in new updates for indexing. Unfortunately, there’s no MongoDB River (probably due to the lack of built-in database triggers), so I did some research and realized that I could use the MongoDB oplog to continually capture updates to our main databases.
Kristina Chodorow has two posts—here and here—detailing what’s stored in the oplog.
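To make the oplog idea more concrete, here is a minimal sketch of how oplog entries could be mapped to indexing actions. It assumes the classic entry shape (`op` for the operation type, `ns` for the namespace, `o` for the document or change, `o2` for the update selector); the function name and the action tuples are hypothetical, and this is not the code from the linked post.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical action tuple: (action, namespace, document id, document body or None)
Action = Tuple[str, str, Any, Optional[Dict[str, Any]]]


def oplog_entry_to_action(entry: Dict[str, Any]) -> Optional[Action]:
    """Map one oplog entry to an indexing action, or None if it can be skipped."""
    op = entry.get("op")
    ns = entry.get("ns", "")
    doc = entry.get("o", {})

    if op == "i":  # insert: "o" holds the full document
        return ("index", ns, doc["_id"], doc)

    if op == "d":  # delete: "o" holds the _id of the removed document
        return ("delete", ns, doc["_id"], None)

    if op == "u":  # update: "o2" identifies the document, "o" is the change
        doc_id = entry["o2"]["_id"]
        if any(key.startswith("$") for key in doc):
            # Modifier update ($set, $inc, ...): the oplog carries only the delta,
            # so the full document has to be re-read from MongoDB before indexing.
            return ("refetch", ns, doc_id, None)
        return ("index", ns, doc_id, doc)  # full-document replacement

    return None  # no-ops ("n") and commands ("c") are ignored in this sketch
```

A real tailer would read these entries from the oplog collection with a tailable cursor and hand the resulting actions to the search engine; that part is left out here because it needs a running replica set.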
Original title and link: Real-Time Search With MongoDB and Elasticsearch (©myNoSQL)
via: http://amberonrails.com/real-time-search-with-mongodb-and-elasticsearch/
Tuesday, 6 March 2012
Scaling Solr Indexing With SolrCloud, Hadoop and Behemoth
Grant Ingersoll:
Instead of doing all the extra work of making sure instances are up, etc., however, I am going to focus on using some of the new features of Solr4 (i.e. SolrCloud whose development effort has been primarily led by several of my colleagues: Yonik Seeley, Mark Miller and Sami Siren) which remove the need to figure out where to send documents when indexing, along with a convenient Hadoop-based document processing toolkit, created by Julien Nioche, called Behemoth that takes care of the need to write any Map/Reduce code and also handles things like extracting content from PDFs and Word files in a Hadoop friendly manner (think Apache Tika run in Map/Reduce) while also allowing you to output the results to things like Solr or Mahout, GATE and others as well as to annotate the intermediary results.
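The quote does not say how SolrCloud decides where a document goes, so the following is only a toy illustration of the general idea of hash-based routing: the cluster, not the client, derives the target shard from the document id. The function names and the MD5-based hash are assumptions made for the sketch.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the document id (toy scheme, not Solr's own)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def route(doc_ids, num_shards):
    """Group document ids by the shard they hash to."""
    shards = {n: [] for n in range(num_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards
```

Because the mapping depends only on the id, any node can compute it, which is what lets the indexing client stop caring about where documents end up.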
I have to agree with Karussell:
Scaling Solr means using Solr AND X AND Y AND… Scaling ElasticSearch means using ElasticSearch
Original title and link: Scaling Solr Indexing With SolrCloud, Hadoop and Behemoth (©myNoSQL)
Wednesday, 8 February 2012
Fulltext search your CouchDB in Ruby
When having to choose a library for full text indexing of CouchDB data in a Ruby application, Taylor Luk looked at Sphinx, Lucene, Ferret, and Xapian and decided to go with Xapian with Xapit. Although Xapian with Xapit offers a clean interface and customization of the indexing process, there seem to be quite a few important limitations:
- Xapit is still under active development
- You need to trigger index updates manually
- It doesn’t support incremental index updates at the moment
I know some are afraid of managing a Java stack, but in the land of indexing, Lucene, Solr, ElasticSearch, and IndexTank are the most powerful tools.
Original title and link: Fulltext search your CouchDB in Ruby (©myNoSQL)
via: http://taylorluk.com/post/17255656638/fulltext-search-your-couchdb-in-ruby
Wednesday, 1 February 2012
Getting off the CouchDB... or Lessons Learned while Experimenting in Production
The move to CouchDB went well. Pages in our web application that would occasionally time out were now loading in a couple of seconds. And, our MySQL database was much, much happier. We liked CouchDB so much that we started planning a feature that would make heavy use of CouchDB’s schema-less nature.
And that’s when the wheels came off.
Word of caution: this is not the “CouchDB sucks so we went with MongoDB” type of post. It’s more of a “we thought CouchDB could solve one of our problems, but then got confused and thought it could solve world hunger. So we decided to throw a bunch of data at it to see if it sticks. Surprise! It didn’t.”
Just to be clear, I’m not defending CouchDB and everything John Wood writes about it is correct. It’s just that experimenting with CouchDB in a non-production environment or at least reading myNoSQL would have already offered all those answers.
Original title and link: Getting off the CouchDB… or Lessons Learned while Experimenting in Production (©myNoSQL)
via: http://blog.signalhq.com/2012/01/24/getting-off-the-couchdb/
Wednesday, 15 June 2011
Choosing Technologies: The Library of Congress and the Twitter Archive
Remember when everyone was suggesting solutions for Twitter architecture? Now the Library of Congress is trying to figure out what technologies to use to store the Twitter archive:
The project is still very much under construction, and the team is weighing a number of different open source technologies in order to build out the storage, management and querying of the Twitter archive. While the decision hasn’t been made yet on which tools to use, the library is testing the following in various combinations: Hive, ElasticSearch, Pig, Elephant-bird, HBase, and Hadoop.
Note that in terms of storage only HBase is mentioned—Twitter’s main tweet storage is MySQL though.
Original title and link: Choosing Technologies: The Library of Congress and the Twitter Archive (NoSQL database©myNoSQL)
Tuesday, 24 May 2011
ThriftDB: The Amazon Web Services of Search
ThriftDB presented today at TechCrunch Disrupt:
Technically speaking, ThriftDB is a flexible key-value datastore with search built in that has the flexibility, scalability, and performance of a NoSQL datastore with the capabilities of full-text search. Essentially, what this means is that, by combining the datastore and the search engine, ThriftDB is offering a service that makes it easy for developers to build fast, horizontally-scalable applications with integrated search.
The website says ThriftDB is a document database built on top of Thrift with full-text search support. I’m not really sure about the “Amazon Web Services of Search” part, but it sounds like it would compete against MarkLogic, ElasticSearch, Solr, and so on.
Original title and link: ThriftDB: The Amazon Web Services of Search (NoSQL databases © myNoSQL)
via: http://techcrunch.com/2011/05/24/thriftdb-wants-to-be-the-amazon-web-services-of-search/
Tuesday, 16 November 2010
Full text search with MongoDB and Lucene analyzers
Johan Rask:
It is important to understand that for a full fledged full text search engine, Lucene or Solr is still your choice since it has many other powerful features. This example only includes simple text searching and not i.e phrase searching or other types of text searches, nor does it include ranking of hits. But, for many occasions this is all you need but then you must be aware of that especially write performance will be worse or much worse depending on the size of the data your are indexing. I have not yet done any search performance tests for this so I am currently totally unaware of this but I will publish this as soon as I can.
Just a couple of thoughts:
- Besides Lucene and Solr, ElasticSearch is another option you should keep in mind
- Your application will have to deal with maintaining the index (adding, updating, removing). MongoDB currently lacks a notification mechanism that would help you decouple this, something à la the CouchDB _changes feed or Riak post-commit hooks (nb: leaving aside that starting with version 0.13 Riak Search is available)
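To show what maintaining the index yourself involves, here is a minimal in-memory sketch. The `analyze` function is a crude stand-in for a Lucene analyzer, and `KeywordIndex` with its method names is hypothetical; it supports only simple term matching, with no phrase search and no ranking, in the spirit of the limits described in the quote.

```python
import re
from collections import defaultdict


def analyze(text):
    """Lowercase and split on non-alphanumerics; a stand-in for a real analyzer."""
    return {token for token in re.split(r"[^a-z0-9]+", text.lower()) if token}


class KeywordIndex:
    """In-memory inverted index that the application must keep in sync itself."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of documents containing it
        self._terms_by_doc = {}            # doc id -> terms indexed for that document

    def add(self, doc_id, text):
        self.remove(doc_id)                # makes add() double as update()
        terms = analyze(text)
        self._terms_by_doc[doc_id] = terms
        for term in terms:
            self._postings[term].add(doc_id)

    def remove(self, doc_id):
        for term in self._terms_by_doc.pop(doc_id, set()):
            self._postings[term].discard(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term (no ranking)."""
        terms = analyze(query)
        if not terms:
            return set()
        return set.intersection(*(self._postings.get(term, set()) for term in terms))
```

Every write path in the application has to call `add` or `remove` at the right moment, which is exactly the coupling a notification mechanism would remove.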
Original title and link: Full text search with Mongodb and Lucene analyzers (NoSQL databases © myNoSQL)
via: http://blog.jayway.com/2010/11/14/full-text-search-with-mongodb-and-lucene-analyzers/
Wednesday, 10 November 2010
Cassandra and ElasticSearch backends for Django-nonrel in development
Django continues its path towards NoSQL:
Rob Vaterlaus has started working on a Cassandra backend and Alberto Paro is working on an ElasticSearch backend for Django-nonrel.
The Cassandra backend is still experimental and lacks support for ListField (from djangotoolbox.fields), but overall it already looks very very interesting. This backend comes with experimental secondary indexes support for Cassandra and requires a recent Cassandra 0.7 build.
Currently supported: App Engine and MongoDB.
Original title and link: Cassandra and ElasticSearch backends for Django-nonrel in development (NoSQL databases © myNoSQL)
Tuesday, 9 November 2010
Why Redis? And Memcached, Cassandra, Lucene, ElasticSearch
Why do we keep jumping from one storage engine to another? Can’t we make up our minds already and settle with the “best” storage engine that meets our needs?
In short, No.
A common misconception is the belief that all storage engines are created equal, all designed to simply “store stuff” and provide fast access to your data. Unless your application performs one clearly defined simple task, it is a dire mistake to expect a single storage engine will effectively fulfill all of your data warehousing and processing needs.
I don’t think I need to say that I’m a proponent of polyglot persistence. And that I believe in the Unix tools philosophy. But while adding more components to your system, you should realize that the system’s complexity “explodes” and operational costs grow with it (nb: do you remember why Twitter started to look into using Cassandra?). Not to mention that the more components your system has, the more attention and care must be invested in figuring out critical aspects like overall system availability, latency, throughput, and consistency.
Original title and link: Why Redis? And Memcached, Cassandra, Lucene, ElasticSearch (NoSQL databases © myNoSQL)
Thursday, 4 November 2010
Terrastore and ElasticSearch to Replace MySQL, Memcached and Sphinx
Currently we are using PHP, MySQL, Sphinx, and Memcached to serve up pages so quick. […]
[…] Our (MY) final decision was to use Terrastore. I’m not sure if it is the fastest, but it is fast. The main reason is how easy it is to scale with growth, how well it protects the data and keeps multiple copies always available, and the fast release cycle which means it is always improving.
As a replacement for Sphinx, we have considered many, but have landed on ElasticSearch, which just so happens to have a direct integration with Terrastore. A no-brainer for us to choose ElasticSearch for our search and ranking algorithms.
While each piece is important, sometimes it is also about the combo.
Original title and link: Terrastore and ElasticSearch to Replace MySQL, Memcached and Sphinx (NoSQL databases © myNoSQL)
Tuesday, 19 October 2010
Integrating ElasticSearch and CouchDB
This tutorial explains the process of setting up ElasticSearch to automatically index data in CouchDB and make it search-able. ElasticSearch 0.11 introduced a feature named The River, which allows it to connect to external systems and listen for documents updates. On receiving a notification, Elasticsearch indexes the data and makes it available for search.
In a nutshell, the solution uses what I’ve mentioned in previous posts: a combination of CouchDB _changes and an ElasticSearch automatic pull mechanism.
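As a rough sketch of what any consumer of the CouchDB _changes feed has to do (this is not the River's implementation), the function below turns feed rows into index and delete actions and tracks the last sequence seen so the next pull can resume from there. The row fields (`seq`, `id`, `deleted`, and `doc` when the feed is read with `include_docs=true`) follow the CouchDB feed format; the function name and action tuples are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def changes_to_actions(rows: Iterable[Dict[str, Any]], since: Any = 0) -> Tuple[List[tuple], Any]:
    """Translate _changes rows into index/delete actions and return the new checkpoint."""
    actions = []
    last_seq = since
    for row in rows:
        if row.get("deleted"):
            actions.append(("delete", row["id"], None))
        elif "doc" in row:  # present only when the feed is read with include_docs=true
            actions.append(("index", row["id"], row["doc"]))
        last_seq = row["seq"]  # remember how far the feed has been consumed
    return actions, last_seq
```

Persisting the returned checkpoint and passing it back as `since` on the next request is what keeps the indexer from reprocessing the whole database after a restart.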
Original title and link: Integrating ElasticSearch and CouchDB (NoSQL databases © myNoSQL)
via: http://github.com/elasticsearch/elasticsearch/wiki/Couchdb-integration