


Splunk: All content tagged as Splunk in NoSQL databases and polyglot persistence

Boundary for Splunk app for correlating alerts

Alex Williams for TechCrunch:

Boundary’s application performance monitoring technology is now integrated into Splunk’s enterprise platform, providing a window into apps that increasingly are distributed across cloud and on-premise virtualized environments.

At first I thought this meant Boundary would use Splunk as the backend for the data. But Boundary is a service, so that’s not the case. Plus, Splunk can already be used for network management and monitoring.

According to the post, “Splunk real-time alerts are tagged as annotations in Boundary’s time-series graphs. Customers can then correlate alerts against application flow and performance data.” So basically this is monitoring your monitoring system, right?

Original title and link: Boundary for Splunk app for correlating alerts (NoSQL database©myNoSQL)


Hadoop and Splunk Use Cases

Good post from Splunk about the use cases where Hadoop and Splunk coexist and cooperate:

The Splunk and Hadoop communities can benefit from each other’s strengths. Below are several examples of customers that use both environments.

  1. Splunk then Hadoop
    • Splunk: collects, visualizes, and analyzes the data
    • Hadoop: ETL and other batch processing
  2. Hadoop then Splunk
    • Hadoop: collects the data
    • Splunk: visualization
  3. Bi-directional: Splunk and Hadoop collect different artifacts and share the data — Hadoop gets what it needs for ETL or batch analytics, Splunk gets what it needs for real-time analysis and visualization
  4. Splunk monitors Hadoop
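As a concrete illustration of pattern 1 (“Splunk then Hadoop”), the hand-off can be as simple as exporting a Splunk search to CSV and loading the file into HDFS for batch processing. The sketch below only builds the shell commands rather than running them, and the search query, file paths, and HDFS directory are illustrative assumptions, not details from the original post:

```python
# Sketch of the "Splunk then Hadoop" hand-off (pattern 1): export a
# Splunk search to CSV, then load the file into HDFS for batch ETL.
# Commands are only constructed here, never executed; the query and
# all paths are hypothetical.

def build_handoff(query, local_csv, hdfs_dir):
    """Return the shell commands for the Splunk-then-Hadoop pattern."""
    # Splunk's CLI can run a search and emit CSV via `-output csv`.
    export = f"splunk search '{query}' -output csv > {local_csv}"
    # `hadoop fs -put` copies a local file into HDFS for batch processing.
    load = f"hadoop fs -put {local_csv} {hdfs_dir}"
    return [export, load]

cmds = build_handoff(
    "index=web status=500 earliest=-1d",  # hypothetical error search
    "/tmp/errors.csv",
    "/data/splunk_exports/",
)
for c in cmds:
    print(c)
```

In a real deployment the export step would more likely be a scheduled saved search or a forwarder, but the shape of the hand-off is the same.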

Original title and link: Hadoop and Splunk Use Cases (NoSQL database©myNoSQL)


Splunk Surges After Pricing Above Range in Web Data IPO

The stock, listed on the Nasdaq Stock Market under the symbol SPLK, climbed to $35.48 at the close in New York. The San Francisco-based company raised $229.5 million in its IPO, selling 13.5 million shares at $17 apiece, it said in a statement yesterday. Splunk’s market value of $1.57 billion at the time of the sale jumped to $3.28 billion.

This is a good answer to my question of why Splunk filed for an IPO. Plus, it’s a solid sign that investors see the potential of the Big Data market.

Original title and link: Splunk Surges After Pricing Above Range in Web Data IPO (NoSQL database©myNoSQL)


Big Data Market Analysis: Vendors Revenue and Forecasts

I think this is the first extensive Big Data report I’ve read that includes enough relevant and quite exhaustive data about the majority of players in the Big Data market, plus some captivating forecasts.

As of early 2012, the Big Data market stands at just over $5 billion based on related software, hardware, and services revenue. Increased interest in and awareness of the power of Big Data and related analytic capabilities to gain competitive advantage and to improve operational efficiencies, coupled with developments in the technologies and services that make Big Data a practical reality, will result in a super-charged CAGR of 58% between now and 2017.

2011 Big Data Pure-Play Vendors: Yearly Big Data Revenue

While there are many stories behind these numbers and many things to think about, here is what I’ve jotted down while studying the report:

  • it’s no surprise that “megavendors” (IBM, HP, etc.) account for the largest part of today’s Big Data market revenue
  • still, the revenue ratio of pure-players vs megavendors feels quite unbalanced: $311mil out of $5.1bil
    • the pure-player category includes: Vertica, Aster Data, Splunk, Greenplum, 1010data, Cloudera, Think Big Analytics, MapR, Digital Reasoning, Datameer, Hortonworks, DataStax, HPCC Systems, Karmasphere
    • there are a couple of names that position themselves in the Big Data market that do not show up anywhere (e.g. 10gen, Couchbase)
  • this could lead to the conclusion that the companies that include hardware in their offer benefit from larger revenues
    • I’m wondering, though, what the margin is in the hardware market segment. While not having any data at hand, I think I’ve read reports about HP and Dell not doing so well due precisely to lower margins
    • see bullet point further down about revenue by hardware, software, and services
  • this could explain why so many companies are trying their hand at appliances
  • by looking at the various numbers you can see that those selling appliances usually have a large corporation behind them supporting the production costs for hardware and probably the cost of the sales force
  • in the Big Data revenue by vendor you can find quite a few well-known names from the consulting segment
  • the revenue-by-type pie lists services as accounting for 44%, hardware for 31%, and software for 13%, which might give an idea of what makes up the megavendors’ sales packages
    • most of the NoSQL database and Hadoop companies are in the software and services segments

Great job done by the Wikibon team.

Original title and link: Big Data Market Analysis: Vendors Revenue and Forecasts (NoSQL database©myNoSQL)


Polyglot Persistence Architecture at Socialize: Splunk for MapReduce & Big Data Analysis

Very informative post on the Socialize blog about their data flow and the data analysis stack used to process it. The post is missing an architecture diagram, so I took the time to reconstruct one based on the details in the article:

Socialize polyglot persistence architecture

Click to view full size diagram of Socialize architecture

The traditional solution is to use aggregate functions in the RDBMS, such as count(), to get the aggregate results, but this presents a few problems at large scale:

  1. Aggregating rows in a database creates unneeded load on the server
  2. Data could be stored across multiple sharded databases, making the aggregated results inaccurate.
  3. Data could be stored in another datastore, like a NoSQL database, or even in flat log files.
  4. Data is stored in an uncommon format across many sources.
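Problem 2 is easy to demonstrate: summing per-shard count() results double-counts any entity that appears on more than one shard, so a correct answer requires merging the raw values first, which is exactly the kind of full scan that creates the load described in problem 1. A minimal sketch (the shard contents are made up):

```python
# Two hypothetical shards, each holding the user IDs that generated
# events. A user can appear on both shards (e.g. data is routed by
# event, not by user).
shard_a = ["u1", "u2", "u3", "u2"]
shard_b = ["u3", "u4", "u3"]

# Naive approach: run a distinct count on each shard and sum the results.
naive_unique = len(set(shard_a)) + len(set(shard_b))  # double-counts u3

# Correct approach: merge the raw values first, then deduplicate --
# accurate, but it requires scanning everything in one place.
true_unique = len(set(shard_a) | set(shard_b))

print(naive_unique, true_unique)
```

This is the gap that pushes architectures like Socialize’s toward a dedicated analysis layer instead of leaning on the operational database.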

Original title and link: Polyglot Persistence Architecture at Socialize: Splunk for MapReduce & Big Data Analysis (NoSQL database©myNoSQL)

Splunk, the Search Engine for Machine Data Company, Files for IPO. Why?

Splunk, the company which recently announced Shep, a solution combining Hadoop with Splunk’s tool for collecting, monitoring, analyzing, searching, and reporting on massive streams of real-time and historical machine data, has filed for an IPO.

Given the following facts:

  1. you are in a market (Big Data, Web of Things) that is confirmed to see tremendous growth
  2. you have over 3,300 customers, including a majority of the Fortune 100
  3. your revenues almost doubled year-over-year

the real question to be answered is: why file for an IPO?

None of the posts I’ve read (TechCrunch, GigaOM, CTO Vision) gives an answer.

The very next question is who will be next to rush to capitalize on the growing Big Data trend. Many names spring to mind, but first: what are your bets?

Original title and link: Splunk, the Search Engine for Machine Data Company, Files for IPO. Why? (NoSQL database©myNoSQL)

Combining Splunk and Hadoop: Introducing Shep

Shep is what will enable seamless two-way data-flow across the systems (nb: Hadoop and Splunk), as well as opening up two-way compute operations across data residing in both systems.

Shep = Splunk and Hadoop

  • Query both Splunk and Hadoop data, using Splunk as a “single-pane-of-glass”
  • Data transformation utilizing Splunk search commands
  • Real-time analytics of data streams going to multiple destinations
  • Splunk as data warehouse/marts for targeted exploration of HDFS data
  • Data acquisition from logs and APIs via the Splunk Universal Forwarder
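For that last bullet, data acquisition with the Universal Forwarder is typically configured with a monitor stanza in inputs.conf; a minimal sketch, where the log path, sourcetype, and index name are illustrative assumptions rather than anything from the Shep announcement:

```ini
# inputs.conf on a Splunk Universal Forwarder (hypothetical paths/names)
[monitor:///var/log/app/*.log]
sourcetype = app_logs
index = main
disabled = false
```

The forwarder then ships these events to the indexers, where Shep could pick them up for the Splunk-to-Hadoop leg of the data flow.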

And in case you don’t know much about Splunk, here’s a short interview with Erik Swan, CTO and co-founder of Splunk, recorded at Hadoop World 2011 by Barton George of Dell:

What I don’t understand, though, is why announce an open source project but keep it behind a private beta.

Original title and link: Combining Splunk and Hadoop: Introducing Shep (NoSQL database©myNoSQL)


Explaining Hadoop to Your CEO

Dan Woods (Forbes):

The answer is, yes, Hadoop could be helpful, but there are other technologies as well. For example, technologies such as Splunk allow you to explore big data sets in a way that’s more interactive than most Hadoop implementations. Splunk not only lets you play with big data; you can also distill it and visualize it. Pervasive’s DataRush allows you to write parallel programs using a simplified programming model, and then process lots of data at scale. 1010data allows you to look at a spreadsheet that has a trillion rows, as well as handle time series data. EMC Greenplum and Teradata Aster Data and SAP HANA will also want a crack at your business. If you take any of these technologies and combine them with QlikView, Tableau, or TIBCO Spotfire, you can figure out what a big data set means to your business very quickly. So if your job is understanding the business value of the data, Hadoop is one of many things that you should analyze.


Blah blah blah Big Data, blah blah blah list of vendors, blah blah blah Big Data

It might even work for a dummy CEO.

Original title and link: Explaining Hadoop to Your CEO (NoSQL database©myNoSQL)