


Oracle: All content tagged as Oracle in NoSQL databases and polyglot persistence

Data Is the New Currency. But Who’s Leading the Way?

In 2005, Tim O’Reilly said: “data is the next Intel Inside”. Today IDC’s Mario Morales (VP of semiconductor research) says data is the new currency. All’s good until you read the continuation:

And the companies that understand this are the ones already developing the analytics and infrastructure to extract that value—companies like IBM, HP, Intel, Microsoft, TI, Freescale and Oracle.

The article (nb: may require registration) continues by looking at what each of these companies is doing in the Big Data space, but devotes a large part to IBM Watson.

Going back to the question “who’s leading the Big Data way”, let’s take a quick look at the technology behind Watson. According to “Jeopardy Goes to Hadoop” and “About Watson”, Watson’s technology is based on Apache Hadoop, uses IBM language technology built on the Apache UIMA platform[1], and runs on Linux on IBM boxes.

To me it looks like open source is leading the advances in Big Data and these large organizations are just connecting the dots (as in packaging these technologies for enterprise environments and contributing missing pieces here and there)[2]. When did this happen before?

  1. Dmitriy Ryaboy taught me that UIMA came out of IBM in the first place and they’ve been critical in its development.  

  2. Or they are very secretive about their internal initiatives and research.  

Original title and link: Data Is the New Currency. But Who’s Leading the Way? (NoSQL database©myNoSQL)

IBM DB2 to Include NoSQL Features

It didn’t take long for IBM to follow Oracle’s foray into the NoSQL space by announcing that IBM DB2 and Informix will include NoSQL features.

Mark Brunelli quoting Curt Cotner, IBM VP and CTO for database servers:

So, we actually took one of these NoSQL triplestores from the open source [community and] we modified it to sit on top of DB2 so that it can use DB2’s indexing, DB2’s logging, DB2’s solution for high availability [and] all the things you would expect.

Reports are not very clear yet, but it seems that DB2’s NoSQLish features are based on IBM’s Rational Jazz triplestore solution, an approach similar to Oracle’s NoSQL Database 11G, which is based on Oracle’s Berkeley DB Java Edition.

When speculating about Oracle’s future in the NoSQL market, I wrote that I expected Oracle to extend the support for NoSQLish interfaces to its core database products. And it looks like IBM is taking exactly this route:

Curt Cotner: “All of the DB2 and IBM Informix customers will have access to that and it will be part of your existing stack and you won’t have to pay extra for it. We’ll put that into our database products because we think that this is [something] that people want from their application programming experience, and it makes sense to put it natively inside of DB2.”

Looking back at these events (Oracle’s NoSQL database, Oracle Big Data appliance, IBM DB2 and Informix supporting NoSQL features) makes me wonder whether and how they are related to the new Enterprise NoSQL trend I’ve mentioned earlier.


Oracle, Big Data, Hadoop...There Is Nothing to See Here

Rob Thomas:

Anyone that has spent any time looking at Hadoop/Big Data and has actually talked to a client, knows a few basic things:

  1. Big Data platforms enable ad-hoc analytics on non-relational (ie unmodelled data). This allows you to uncover insights to questions that you never think to ask. This is simply not possible in a relational database.

  2. You cannot deliver true analytics of Big Data relying only on batch insights. You must deliver streaming and real-time analytics. That is not possible if you are biased towards putting everything in a database, before doing anything.

  3. Clients will demand that Big Data platforms connect to their existing infrastructure. Clients don’t think that Big Data platforms exist solely for the purpose of populating existing relational systems. Big difference.

As I pointed out before, Oracle is neither the first nor the last to use this strategy. But I don’t think this “let them believe we are providing Hadoop integration, but all we want is to push our hardware and databases” approach will sell very well.



Hadoop: It's Still a Niche Technology

In an otherwise generic but interesting post about Hadoop and its integration with data analytics and data warehouse solutions, Jessica Twentyman writes:

It’s still a niche technology, but Hadoop’s profile received a serious boost over the past year, thanks in part to start-up companies such as Cloudera and MapR that offer commercially licensed and supported distributions of Hadoop. Its growing popularity is also the result of serious interest shown by EDW vendors like EMC, IBM and Teradata. EMC bought Hadoop specialist Greenplum in June 2010; Teradata announced its acquisition of Aster Data in March 2011; and IBM announced its own Hadoop offering, Infosphere, in May 2011.

Unfortunately she got this all wrong. It is the open source community, developers, data scientists, and Cloudera that helped popularize Hadoop.

These data analytics and data warehouse vendors are just capitalizing on Hadoop delivering results. They haven’t been knocking at doors asking: “Have you heard of Hadoop? Do you want to try it?”. They’ve run into Hadoop in most of the places they went and that made them realize it is a business opportunity.

So, I’ll say it again: Hadoop is popular thanks to the open source community, developers, data scientists and Cloudera.



Oracle Big Data Appliance Roundup: What, Why, How

Oracle Big Data Appliance Sales Pitch

The Oracle Database Insider Blog:

Offering customers an end-to-end solution for Big Data, the Oracle Big Data Appliance, in conjunction with Oracle Exadata Database Machine and the new Oracle Exalytics Business Intelligence Machine, delivers everything customers need to acquire, organize, analyze and maximize the value of Big Data within their enterprise.

What’s in the box?

The Oracle Database Insider Blog:

  • Oracle Big Data Appliance: The Oracle Big Data Appliance is an engineered system optimized for acquiring, organizing and loading unstructured data into Oracle Database 11g.
  • Oracle Data Integrator Application Adapter for Hadoop: The new Hadoop adapter simplifies data integration from Hadoop and an Oracle Database through Oracle Data Integrator’s easy to use interface.
  • Oracle Loader for Hadoop: Oracle Loader for Hadoop enables customers to use Hadoop MapReduce processing to create optimized data sets for efficient loading and analysis in Oracle Database 11g. Unlike other Hadoop loaders, it generates Oracle internal formats to load data faster and use less database system resources.
  • Oracle R Enterprise: Oracle R Enterprise integrates the open-source statistical environment R with Oracle Database 11g. Analysts and statisticians can run existing R applications and use the R client directly against data stored in Oracle Database 11g, vastly increasing scalability, performance and security. The combination of Oracle Database 11g and R delivers an enterprise-ready deeply-integrated environment for advanced analytics.

The Oracle Big Data Appliance official page is here.

Oracle Big Data Appliance Market Positioning

Ashok Bindra:

Engineered to work together, the Oracle Big Data Appliance is easily integrated with Oracle Database 11g, Oracle Exadata Database Machine, and Oracle Exalytics Business Intelligence Machine. In essence, said Oracle, it is designed to deliver extreme analytics on all data types, with enterprise-class performance, availability, supportability and security.

Shaun Nichols:

Mendelsohn said the company would pitch the Big Data Appliance as a companion to the Exadata platform and an additional tool for understanding customer behaviour rather than just another repository for information.

“Big is interesting, but traditional warehouses deal with that quite well,” he explained.

Jaikumar Vijayan quoting James Kobielus (Forrester Research):

Today’s announcement is likely to put pressure on rivals such as Teradata, IBM, SAP, Microsoft and EMC to ramp up their own offerings. The onus is on them to “match and surpass Oracle in their roadmaps, offerings and partnerships,” Kobielus said. “Forrester expects M&A activity in these arenas to ramp up now that Oracle has made these aggressive moves.”

Chris Kanaracus:

Pricing and a release date for the machine weren’t immediately available on Monday. When available, it will compete with products such as Aster Data, Netezza and Greenplum.

Oracle Big Data Appliance Technical Details

Oracle Big Data Appliance

Alex Gorbachev:

A rack with InfiniBand, full of 2U servers similar to Exadata Storage. No flash storage needed so couple sockets and a dozen of disks will do. Maybe more ram than Exadata storage cells themselves. I suspect you could have as many servers as you want in a configuration but since Hadoop clusters are usually dozens and more nodes, full rack seems reasonable with about 20 Hadoop compute nodes to start with. Real deployments should easily go into multiple racks stacked together.

Timothy Prickett Morgan:

The underlying hardware for the Big Data Appliance is Oracle’s Exadata x86 clusters, which support a parallel implementation of the Oracle 11g R2 database running on top of Oracle’s RHEL-ish clone of Linux. Oracle Enterprise Linux and Oracle’s twist on the open source Xen hypervisor are the appliance’s underlying layer.

Shaun Nichols:

The rack-based appliance will house 18 server systems and will hold up to 432TB of data and 864GB of memory. The appliance will form the basis of the company’s push into the big data management and analysis space.

Gwen Shapira:

The Big Data Appliance (BDA) has 18 Sun x4270 M2 servers per rack. As usual, you can add racks together for larger clusters. Each node has 48G RAM, 12 intel cores and 24Tb of storage. Less memory than in the Exadata 2×2 nodes and no SSD indicates that the plan is to hit the spinning magnetic devices a lot for data storage and processing. Not a big deal in Hadoop where this is the design assumption, but not optimal for the NoSQL portion of the device.

In addition there is 40gb/s infiniband and 10g/s Ethernet. The choice of infiniband for Hadoop machine is a bit odd, since Hadoop was designed to do most of the processing on the machine that holds the data and avoid overloading the network. On the other hand, connecting the Hadoop cluster to an Exadata machine with infiniband will allow for fast data loading. Which is exactly what Oracle is after.
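The rack totals quoted by Shaun Nichols follow directly from the per-node figures quoted by Gwen Shapira. A quick arithmetic check, as a minimal sketch: it uses only the numbers quoted in this roundup, reads “24Tb” as 24 terabytes (an assumption), and the names are mine.

```python
# Per-node figures as quoted by Gwen Shapira (assuming "24Tb" means 24 terabytes).
NODES_PER_RACK = 18
RAM_GB_PER_NODE = 48
STORAGE_TB_PER_NODE = 24


def rack_totals(nodes=NODES_PER_RACK):
    """Return (total RAM in GB, total raw storage in TB) for one rack."""
    return nodes * RAM_GB_PER_NODE, nodes * STORAGE_TB_PER_NODE


ram_gb, storage_tb = rack_totals()
print(f"{ram_gb} GB RAM and {storage_tb} TB raw storage per rack")
```

Both results match the 864GB and 432TB quoted for a full rack. These are raw figures: with HDFS’s default three-way replication, the usable capacity for unique data would be roughly a third of the storage total.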

Thomas Kurian (Oracle EVP):

ETL can deploy on the Hadoop cluster and you can model that using Oracle Integrator ETL tool and then deploy that on Hadoop MapReduce platform. We provide load balancing and after preprocessing is done, [the loader moves] the data set into Oracle. The finished data set then can be piped into Exalytics for analytic dashboards and reports.

Oracle Big Data Appliance: What does it mean to the market and competitors?

Billy Bosworth (DataStax):

I have been around databases for 20 years, and have tons of respect for Oracle. When someone of their caliber releases a NoSQL solution, it takes us beyond the era of speculation and “niche” and squarely into the mainstream. It validates our work and our passion and paints a very exciting future for big data databases.

Edd Dumbill:

Whether you use Oracle or not, today’s announcement moves the big data world forward. We have de facto agreement on Hadoop and R as core infrastructure, and we have healthy competition at the database and NoSQL layer.

Max Schireson (10gen):

In my opinion this is a good thing for alternative database vendors. Competition is already thriving in the sector and I don’t think one more competitor, even one as large as Oracle, will alter the dynamics dramatically. But many customers will take Oracle’s arrival in the space as a sign that this trend is significant and it is a space they should look at. If Oracle’s offering is strong, we may lose some market share to them, but their presence will make it a bigger market.

Klint Finley:

One of the big issues at play here is whether enterprises want expensive Oracle appliances, open core software running on commodity hardware or pay-as-you-go public cloud services. As Wikibon analyst Jeff Kelley notes, “Ellison knows Oracle needs to have some Hadoop/NoSQL offering, but the open source/commodity hardware/scale-out approach to Big Data is the antithesis of the Oracle way: closed source/Sun-only hardware/scale-up.”

French Caldwell (Gartner):

Got big data problems? Got cloud angst? Just put all your worries in a big iron box. At least that’s what I took away after two hours of keynotes from Oracle and EMC executives this morning. Big data and the cloud are euphemisms for huge information management and business challenges, but listening to the keynotes, you’d think it’s just a technical problem. The proliferation of vast amounts of unstructured content and a revolution in IT provisioning models, and even digital dependent revenue streams are not issues to be trifled with. But at the opening of Open World, the dumbing down of these challenges is exactly what happened. The vision communicated is that the solution is that you can put it all in a big data box, or a BI machine.


Ashok Bindra:

According to Oracle, the Big Data Appliance is a new system that includes an open source distribution of Apache Hadoop, Oracle NoSQL Database, Oracle Data Integrator Application Adapter for Hadoop, Oracle Loader for Hadoop, and an open source distribution of R.


My predictions came true. Almost all of them.


Hadoop and NoSQL Mythbusting

Gwen Shapira:

With all the buzz in OOW about the big data machine, there was also a lot of non-sense flying around. I love it that the Oracle community is finally interested in Hadoop and NoSQL, but I hate it when people sound authoritative without having an actual clue. I’ve left a few presentations with smoke coming out of my ears.

The one that seems to be an integral part of the Oracle Big Data Appliance message is that “Hadoop can only be used for basic ETL transformations. Real data analysis has to be done in Oracle and BI tools”. Unfortunately I’ve heard the same thing coming from IBM Netezza.



The Oracle NoSQL Database 11G

Shortly after posting my predictions about the Oracle NoSQL database, I received a link to a PDF introducing it. The main points:


  • based on Berkeley DB Java Edition, thus it is a key-value store
  • it’s a commercial product available as a Community edition and an Enterprise edition
  • single master with multiple replicas
  • Paxos-based master election for automated failover
  • supports configurable consistency policies
  • auto-sharding
  • update: there’s no download available yet; the date mentioned is mid-October

Update: There’s an official product page: Oracle NoSQL Database Technical Overview.

Oracle NoSQL database key features:

  • Simple Data Model
      • Key-value pair data structure, keys are composed of Major & Minor keys
      • Easy-to-use Java API with simple Put, Delete and Get operations
  • Scalability
      • Automatic, hash-function based data partitioning and distribution
      • Intelligent NoSQL Database driver is topology and latency aware, providing optimal data access
  • Predictable behavior
      • ACID transactions, configurable globally and per operation
      • Bounded latency via B-tree caching and efficient query dispatching
  • High Availability
      • No single point of failure
      • Built-in, configurable replication
      • Resilient to single and multi-storage node failure
      • Disaster recovery via data center replication
  • Easy Administration
      • Web console or command line interface
      • System and node management
      • Shows system topology, status, current load, trailing and average latency, events and alerts
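The major/minor key split and the hash-based partitioning listed above can be illustrated with a small sketch. It assumes that only the major key is hashed to choose a partition, so that all records sharing a major key land together. The `partition_for` helper, the `ToyStore` class, the MD5 choice, and the partition count are all hypothetical and are not taken from Oracle’s API.

```python
import hashlib

NUM_PARTITIONS = 8  # hypothetical value; a real store would configure this


def partition_for(major_key, num_partitions=NUM_PARTITIONS):
    """Map a major key (a tuple of strings) to a partition using a stable hash."""
    digest = hashlib.md5("/".join(major_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


class ToyStore:
    """In-memory stand-in for a partitioned key-value store (illustration only)."""

    def __init__(self, num_partitions=NUM_PARTITIONS):
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition(self, major_key):
        # Only the major key decides placement (assumption stated in the lead-in).
        return self.partitions[partition_for(major_key, len(self.partitions))]

    def put(self, major_key, minor_key, value):
        self._partition(major_key)[(major_key, minor_key)] = value

    def get(self, major_key, minor_key):
        return self._partition(major_key).get((major_key, minor_key))

    def delete(self, major_key, minor_key):
        return self._partition(major_key).pop((major_key, minor_key), None) is not None


store = ToyStore()
user = ("users", "42")                      # major key
store.put(user, ("email",), "a@example.com")  # minor keys distinguish the records
store.put(user, ("name",), "Alice")
print(store.get(user, ("email",)))
```

Because placement depends only on the major key in this sketch, both records for `("users", "42")` sit in the same partition, which is what would make multi-record operations on one major key cheap.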

The Oracle NoSQL Database and Big Data Appliance

There’s been a lot of speculation about the announcements coming from Oracle’s OpenWorld event. A first part was revealed during the keynote in the form of an in-memory analytics appliance called Exalytics [2]. But there’s talk about a Big Data Appliance and an Oracle NoSQL database.

Here are my predictions[1]:

  1. Oracle became very aggressive in selling products based on hardware, software, and services. So they’ll announce a Hadoop appliance integrated with an existing Oracle product. It could be either the Oracle Exadata or even the newly announced Exalytics.

    This appliance will place Oracle in competition with all other Hadoop appliance sellers: EMC, NetApp, IBM. Also these days most of the analytics databases try to integrate with Hadoop.

  2. Oracle already has a couple of non-relational solutions in its portfolio: Berkeley DB, TimesTen, Coherence. And they’ve already started to test the NoSQL market by announcing the MySQL and MySQL Cluster NoSQL hybrid systems.

    I don’t expect the Oracle NoSQL database to be a new product, just a rebranding or repackaging of one of the above-mentioned ones. Probably TimesTen.

  3. Oracle will invest more into integrating its line of products with Hadoop. Having both a Hadoop and an in-memory analytics appliance will make them very competitive in this space.

  4. Oracle will extend the support for NoSQLish interfaces (memcached) to its other database products.

What are your predictions?

  1. or speculations  

  2. I’m currently gathering more details about Exalytics.  


Will Oracle Win the NoSQL Competition

I agree this title is misleading, but the problem is clear: today Oracle does not provide any product that can compete with the new cloud computing needs and with the NoSQL movement. It is not possible to think that Oracle’s RAC technology can actually be used in a cloud environment, and a cloud service cannot be deployed on an Exadata either.

The real question, though, is whether Oracle is really interested in the market currently served by NoSQL databases and/or hybrid solutions. And judging by the latest versions of MySQL and MySQL Cluster[1], it looks like they are testing the waters.

  1. Latest versions of MySQL and MySQL Cluster are adding support for using the Memcached protocol. See NoSQL to MySQL with Memcached  



Enterprise Big Data Stack vs Open Source Big Data Stack

Goldmacher estimated that YouTube consumption—user uploads of 48 hours of video a minute and 3 billion videos a day along with roughly 45 petabytes of viewed videos a day—would require at least 9 full-rack Exadata machines at $1.5 million each. There would be at least 18 Exadata machines to handle spikes. Those machines would add up to 14 Exalogic devices to serve data at $1.1 million per system. The software stack under Oracle would include WebLogic middleware, Oracle databases, Exadata optimized storage and Oracle as operating system. The open source comparison included JBoss middleware, MySQL, Hadoop and Red Hat Enterprise Linux as the OS.

Big Data Enterprise Stack

Big Data Open Source Stack

Credit Peter Goldmacher (Cowen & Co. analyst)
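For a rough sense of scale, the hardware line items in Goldmacher’s estimate can be totalled directly. This is a minimal sketch that uses only the unit counts and prices quoted above (18 Exadata machines at $1.5 million, 14 Exalogic systems at $1.1 million); it ignores software licenses, support, and everything else in the stack, and the names are mine.

```python
# Unit prices in US dollars, as quoted from Goldmacher's estimate.
EXADATA_UNIT_USD = 1_500_000
EXALOGIC_UNIT_USD = 1_100_000


def hardware_cost(exadata_units, exalogic_units):
    """Total price of the quoted appliances only (no software, no support)."""
    return exadata_units * EXADATA_UNIT_USD + exalogic_units * EXALOGIC_UNIT_USD


total = hardware_cost(exadata_units=18, exalogic_units=14)
print(f"${total:,} for appliances alone")
```

Under these assumptions the appliances alone come to $42.4 million, before any of the software in the enterprise stack is counted.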

Two comments (the only ones I have):

  1. what advantages would the enterprise stack offer to justify a 5x cost?
  2. in case all numbers are completely wrong, what’s the advantage of the enterprise stack?



Oracle and IBM May Not Know Big Data, but Neither Does Ballmer

The echo chamber is reacting:

Specifically, for a data processing and analytics project to qualify as Big Data, it must encompass not just internal corporate data, but also third-party data that resides outside the firewall, according to Ballmer. He said IBM and Oracle limit their Big Data approaches to internal data, thus they are not in fact Big Data by his definition.


IBM, Oracle and now Microsoft are jockeying to position each of their approaches to Big Data as the industry standard, and Ballmer is clearly trying to steer the Big Data conversation towards Microsoft’s strengths and away from its weaknesses. That means talking up Microsoft’s ability to integrate third-party data with relatively large volumes of corporate data inside Microsoft’s SQL Server R2 Parallel Data Warehouse and away from its lack of petabyte-scale data processing power.

I guess there will be no end to the Oracle-IBM-Microsoft triangle love, so I’ll stop here until real facts are added to the story.
