oracle: All content tagged as oracle in NoSQL databases and polyglot persistence

The time for NoSQL is now

Andrew C. Oliver:

The transition to NoSQL databases will take time. We still don’t have TOAD, Crystal Reports, query language standardization and other essential tools needed for mass adoption. There will be missteps (i.e. I may need a different type of database for reporting than for my operational system), but I truly think this is one technology that isn’t just marketing.

This comes from someone who was happy to discover all the knobs in Oracle back in 1998.

Original title and link: The time for NoSQL is now (NoSQL database©myNoSQL)

via: http://osintegrators.com/node/76


12 Hadoop Vendors to Watch in 2012

My list of the 8 most interesting companies for the future of Hadoop didn’t try to include everyone with a product that has the word Hadoop in it. But the list from InformationWeek does. To save you 15 clicks, here’s their list:

  • Amazon Elastic MapReduce
  • Cloudera
  • Datameer
  • EMC (with EMC Greenplum Unified Analytics Platform and EMC Data Computing Appliance)
  • Hadapt
  • Hortonworks
  • IBM (InfoSphere BigInsights)
  • Informatica (for HParser)
  • Karmasphere
  • MapR
  • Microsoft
  • Oracle

Original title and link: 12 Hadoop Vendors to Watch in 2012 (NoSQL database©myNoSQL)


Oracle Database or Hadoop? And What Led to NoSQL Databases

In a follow-up post to SQL or Hadoop: What Tools Should I Use to Process My Data?, Gwen Shapira presents some reasons why, even if many things that fit Hadoop better could be done with Oracle, doing so is not a good idea:

But, do you really want to use Oracle to store millions of emails and scanned documents?[1] I have few customers who do it, and I think it causes more problems than it solves. After you stored them, do you really want to use your network and storage bandwidth so the application servers will keep reading the data from the database? Big data is… big. It is best not to move it around too much and run the processing on the servers that store the data. After all, the code takes fewer packets than the data. But, Oracle makes cores very expensive. Are you sure you want to use them to run processing-intensive data mining algorithms?

Then there’s the issue of actually programming the processing code. If your big data is in Oracle and you want to process it efficiently, PL/SQL is pretty much the only option. […]

All these are very solid arguments.

Generalizing Gwen’s point a bit, I would say that this is exactly the history of relational databases and what made them successful. By providing decent solutions, up to a point, to a wide range of problems, and by covering more scenarios than the alternative storage solutions that existed at the time, relational databases became the de facto storage for the last 30 years[2]. But over the last few years, more and more problems have crossed the boundaries of what could be considered a decent solution, leading to the need for specialized alternatives that are better than good enough. And thus NoSQL databases.


  1. Interestingly, when presented with a Hadoop and Solr solution for archiving emails, I’ve also wondered if that is the best solution.  

  2. This is a bit of an oversimplification to make the point, as there were other obvious technical advantages of relational databases over some of the alternative solutions.  

Original title and link: Oracle Database or Hadoop? And What Led to NoSQL Databases (NoSQL database©myNoSQL)

via: http://www.pythian.com/news/30009/oracle-database-or-hadoop/


Comparing Hadoop Appliances: Oracle’s Big Data Appliance, EMC Greenplum DCA, NetApp Hadooplers

Great post from Gwen Shapira over at Pythian diving into the pros and cons of Hadoop appliances vs building your own Hadoop clusters. Plus a comparison of existing Hadoop appliances: Oracle Big Data Appliance, EMC Greenplum DCA, and NetApp Hadooplers.

Another good reason to roll your own is the flexibility: Appliances are called that way because they have a very specific configuration. You get a certain number of nodes, cpus, RAM and storage. Oracle’s offering is an 18 node rack. What if you want 12 nodes? or 23? tough luck. What if you want less RAM and more CPU? you are still stuck.

Original title and link: Comparing Hadoop Appliances: Oracle’s Big Data Appliance, EMC Greenplum DCA, NetApp Hadooplers (NoSQL database©myNoSQL)

via: http://www.pythian.com/news/29955/comparing-hadoop-appliances/


Partnerships in the Hadoop Market

Just a quick recap:

Amazon doesn’t partner with anyone for its Amazon Elastic MapReduce. And IBM is walking alone with the software-only InfoSphere BigInsights.

Original title and link: Partnerships in the Hadoop Market (NoSQL database©myNoSQL)


Oracle Big Data Appliance Released Features Cloudera Distribution of Hadoop: What You Need to Know

Oracle Big Data Appliance hardware specification

Klint Finley for ServicesANGLE:

18 Oracle Sun servers with a total of:

  • 864 GB main memory;
  • 216 CPU cores;
  • 648 TB of raw disk storage;
  • 40 Gb/s InfiniBand connectivity between nodes and other Oracle engineered systems; and,
  • 10 Gb/s Ethernet data center connectivity.
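As a rough sanity check on these totals, the per-node figures can be derived by dividing each total by the 18 servers. The sketch below assumes the rack is made of identical servers, so that memory, cores, and disk are spread evenly; the quoted sources give only the rack-wide totals. The variable and function names are illustrative.

```python
# Rack-wide totals quoted above for one Oracle Big Data Appliance.
NODES = 18
TOTAL_MEMORY_GB = 864
TOTAL_CPU_CORES = 216
TOTAL_RAW_DISK_TB = 648


def per_node(total, nodes=NODES):
    """Divide a rack-wide total evenly across nodes (assumes identical servers)."""
    return total / nodes


per_node_specs = {
    "memory_gb": per_node(TOTAL_MEMORY_GB),
    "cpu_cores": per_node(TOTAL_CPU_CORES),
    "raw_disk_tb": per_node(TOTAL_RAW_DISK_TB),
}

for name, value in per_node_specs.items():
    print(f"{name}: {value:g} per node")
```

Under that even-split assumption, each server works out to 48 GB of memory, 12 cores, and 36 TB of raw disk.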

Joab Jackson for PCWorld Business Center:

The package includes 40Gb/s InfiniBand connectivity among the nodes, a rarity among Hadoop deployments, many of which use Ethernet to connect the nodes. Lumpkin said InfiniBand would speed data transfers within the system. Multiple racks can be tethered together in a cluster configuration. There is no theoretical limit to how many racks can be clustered together, though configurations of more than eight racks would require additional switches, Lumpkin said.

Oracle Big Data Appliance software specification

  • Cloudera’s Distribution including Apache Hadoop
  • Cloudera Manager
  • Open source distribution of R
  • Oracle NoSQL Database Community Edition
  • Oracle Big Data Connectors
  • Oracle Linux

Joab Jackson for PCWorld Business Center:

Along with the release, Oracle also released Oracle Big Data Connectors, a set of drivers for exchanging data between the Big Data Appliance and other Oracle products, such as the Oracle Database 11g, the Oracle Exadata Database Machine, Oracle Exalogic Elastic Cloud and Oracle Exalytics In-Memory Machine.

Derrick Harris for GigaOm:

However, Oracle isn’t blind to the fact that not everyone will be gung ho about buying an appliance. Its custom-built Big Data Connectors are available as separate products for those customers wanting to connect existing Hadoop clusters to Oracle database environments or R statistical-analysis environments.

Klint Finley for ServicesANGLE:

According to Oracle’s announcement “The integrated Oracle and Cloudera architecture has been fully tested and validated by Oracle, who will also collaborate with Cloudera to provide support for Oracle Big Data Appliance.”

Oracle Big Data Appliance Services

George Lumpkin, Oracle’s vice president of data warehousing product management:

Oracle will provide first-line support for the appliance and all software (including the Hadoop distribution and Cloudera Manager) through its case-tracking support infrastructure. But when particularly tough support cases arise, Oracle will tap Cloudera’s expertise.

What’s more, Oracle will refer customers to Cloudera for Hadoop training and consulting engagements.

Oracle Big Data Appliance Positioning

George Lumpkin, Oracle’s vice president of data warehousing product management:

We are positioning this as something that runs alongside other Oracle-based systems. Big data is more than just a cluster of hardware running Hadoop. It is an overall information architecture for enabling companies to analyze data and make decisions.

Doug Henschen for InformationWeek:

Oracle highlighted the Big Data Appliance as a complement to a growing family of “engineered systems” that now includes Exadata, Exalogic, and the Exalytics In-Memory Machine.

Merv Adrian (Gartner analyst) cited by InformationWeek:

But what’s more remarkable is the fact that Oracle is finally looking beyond its core database. Oracle’s TimesTen and Essbase databases, which were recently upgraded for use in the Exalytics appliance, and BerkeleyDB, which was Oracle’s development starting point for the new NoSQL database, are examples of that shift.

Oracle is suddenly beginning to act as a data-management portfolio company, not just a company with a big brother and a bunch of starving siblings.

Joab Jackson for PCWorld Business Center:

Oracle is positioning the appliance for managing and analyzing large sets of data that may be too large, or otherwise unsuitable for keeping in databases, such as telemetry data, click-stream data or other log data. “You may not want to keep the data in a database, but you do want to store it and analyze it,” Lumpkin said. The appliance is intended for those organizations that want to undertake Big Data-style analysis but may not have the in-house expertise to assemble large Hadoop or NoSQL-based systems.

Pricing

Kurt Dunn, Cloudera’s chief operating officer, told InformationWeek:

Oracle has put together a very comprehensive product that is priced very well.

Brian Proffitt for ITworld:

The cost of the Big Data Appliance is what will really stand out. At $500,000, this may not seem like a bargain, but in reality it is. Typically, commoditized Hadoop systems run at about $4,000 a node. To get this much data storage capacity and power, you would need about 385 nodes… which puts the price tag at around $1.54 million—three times the price of Oracle’s Cloudera-based offering (which, I should add, excludes things like support costs and power).

Doug Henschen for InformationWeek:

The hardware and software combined will sell for $450,000, with an annual support fee for both hardware and software of 12%. That’s highly competitive, working out to less than $700 per terabyte and being in line with the low costs big data practitioners expect from deployments built on commodity hardware.
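The numbers in the two quotes above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the per-terabyte figure is computed against the 648 TB of raw disk listed in the hardware specification (the quote does not say which capacity it uses), and it takes the node price, node count, and the two different appliance prices exactly as the quoted articles state them. All names are illustrative.

```python
# Figures from the InformationWeek quote.
LIST_PRICE_USD = 450_000
ANNUAL_SUPPORT_PERCENT = 12
RAW_STORAGE_TB = 648  # assumption: raw capacity from the hardware spec

# Figures from the ITworld quote (note it uses a $500,000 appliance price).
ITWORLD_APPLIANCE_PRICE_USD = 500_000
COMMODITY_NODE_PRICE_USD = 4_000
COMMODITY_NODE_COUNT = 385

# Price per raw terabyte and yearly support fee for the appliance.
price_per_tb = LIST_PRICE_USD / RAW_STORAGE_TB
annual_support_usd = LIST_PRICE_USD * ANNUAL_SUPPORT_PERCENT // 100

# Cost of the commodity cluster ITworld compares against.
commodity_total_usd = COMMODITY_NODE_PRICE_USD * COMMODITY_NODE_COUNT
commodity_to_appliance_ratio = commodity_total_usd / ITWORLD_APPLIANCE_PRICE_USD

print(f"price per raw TB: ${price_per_tb:,.2f}")
print(f"annual support:   ${annual_support_usd:,}")
print(f"commodity total:  ${commodity_total_usd:,}")
print(f"ratio:            {commodity_to_appliance_ratio:.2f}x")
```

With these inputs the appliance comes to roughly $694 per raw terabyte, which is consistent with the "less than $700 per terabyte" claim, and the commodity cluster to $1.54 million, about three times the $500,000 figure.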

Oracle - Cloudera Partnership

I wrote earlier about what this partnership means to both Oracle and Cloudera.

Doug Henschen for InformationWeek:

But by releasing the product early in the year in partnership with Cloudera, which has more customers and years in the market than any other Hadoop software and services provider, Oracle has made it clear that it is wasting no time and taking no chances with unproven technology.

“Cloudera brings us a couple of very important missing pieces, including its management software and assistance for a deeper second- and third-tier level of support,” said George Lumpkin, Oracle’s vice president of product management, data warehousing.

Speculations about the future of the Oracle - Cloudera partnership

Brian Proffitt for ITworld:

Students of Linux history will well remember that’s exactly what happened when Oracle partnered with Red Hat to introduce commoditized Oracle offerings… and then Larry Ellison and crew decided to roll their own Oracle Enterprise Linux in 2006 when they decided to cut Red Hat out of the stack.

This is strong historical evidence that Oracle will do the same with Cloudera, because frankly the big data market is too big for Oracle not to want to own. Big Data Appliance customers should note this, and be very prepared that future versions may not be tied to Cloudera at all, but rather Oracle’s version of Hadoop.

A few people suggested on Twitter that this partnership is a sign of a possible acquisition of Cloudera by Oracle. TechCrunch’s Leena Rao links to an old post by Matt Asay suggesting this acquisition.

Media coverage of Oracle Big Data Appliance

Original title and link: Oracle Big Data Appliance Released Features Cloudera Distribution of Hadoop: What You Need to Know (NoSQL database©myNoSQL)


Cloudera Distribution of Hadoop Powers Oracle’s Big Data Appliance

The announcement of the Oracle Big Data Appliance has been out for only a couple of hours and has already hit all the media sites. Before looking at the details of the announcement, let’s try to understand what it means for the parties involved.

What does it mean for Oracle?

  • Oracle enters a very busy Hadoop market associated with the best known company in the Hadoop ecosystem
  • With this partnership, Oracle didn’t have to make a huge investment in software development or services
  • Not having to build its own distribution of Hadoop, Oracle could focus on developing the Oracle Big Data Connectors
  • Oracle will delegate everything Hadoop to Cloudera thus it won’t have to deal with a very fast evolving open source project that might see some interesting events due to the
  • Oracle seems to have changed the message about Hadoop being used only for basic ETL.

What does it mean for Cloudera?

  • Cloudera gets access to a pool of customers (many of them possibly very large customers)
  • Cloudera will not need a big sales force to reach these potential customers. Even if Cloudera knew about them, Oracle’s sales force will do the job
  • If Oracle mentions Cloudera’s name in every sales pitch, Cloudera will see a huge publicity bump that will sooner or later lead to more customers

Truth is I was expecting yet another distribution of Hadoop. And even if Oracle’s Big Data Appliance doesn’t feature the official Apache Hadoop distribution, I think that by choosing an existing distribution, Oracle did the right thing. For them and for their customers.

Original title and link: Cloudera Distribution of Hadoop Powers Oracle’s Big Data Appliance (NoSQL database©myNoSQL)


Data Is the New Currency. But Who’s Leading the Way?

In 2005, Tim O’Reilly said: “data is the next Intel Inside”. Today, IDC’s Mario Morales (VP of semiconductor research) says data is the new currency. All’s good until you read the continuation:

And the companies that understand this are the ones already developing the analytics and infrastructure to extract that value—companies like IBM, HP, Intel, Microsoft, TI, Freescale and Oracle.

The article (nb: may require registration) continues by looking at what each of these companies is doing in the Big Data space, but focuses in large part on IBM Watson.

Going back to the question “who’s leading the Big Data way”, let’s take a quick look at the technology behind Watson. According to Jeopardy Goes to Hadoop and About Watson, Watson technology is based on Apache Hadoop, using an IBM language technology built on the Apache UIMA platform[1] and running Linux on IBM boxes.

To me it looks like open source is leading the advances in Big Data and these large organizations are just connecting the dots (as in packaging these technologies for enterprise environments and contributing missing pieces here and there)[2]. When did this happen before?


  1. Dmitriy Ryaboy taught me that UIMA came out of IBM in the first place and they’ve been critical in its development.  

  2. Or they are very secretive about their internal initiatives and research.  

Original title and link: Data Is the New Currency. But Who’s Leading the Way? (NoSQL database©myNoSQL)


IBM DB2 to Include NoSQL Features

It didn’t take long for IBM to follow Oracle’s foray into the NoSQL space by announcing that IBM DB2 and Informix will include NoSQL features.

Mark Brunelli quoting Curt Cotner, IBM VP and CTO for database servers:

So, we actually took one of these NoSQL triplestores from the open source [community and] we modified it to sit on top of DB2 so that it can use DB2’s indexing, DB2’s logging, DB2’s solution for high availability [and] all the things you would expect.

Reports are not very clear yet, but it seems that DB2’s NoSQLish features are based on IBM’s Rational Jazz triplestore solution, an approach similar to Oracle’s NoSQL Database 11g, which is based on Oracle’s BerkeleyDB Java Edition.

When speculating about Oracle’s future in the NoSQL market, I wrote that I expected Oracle to extend the support for NoSQLish interfaces to its core database products. And it looks like IBM is taking exactly this route:

Curt Cotner: “All of the DB2 and IBM Informix customers will have access to that and it will be part of your existing stack and you won’t have to pay extra for it. We’ll put that into our database products because we think that this is [something] that people want from their application programming experience, and it makes sense to put it natively inside of DB2.”

Looking back at these events (Oracle’s NoSQL database, Oracle Big Data appliance, IBM DB2 and Informix supporting NoSQL features) makes me wonder whether and how they are related to the new Enterprise NoSQL trend I’ve mentioned earlier.

Original title and link: IBM DB2 to Include NoSQL Features (NoSQL database©myNoSQL)


Oracle, Big Data, Hadoop...There Is Nothing to See Here

Rob Thomas:

Anyone that has spent any time looking at Hadoop/Big Data and has actually talked to a client, knows a few basic things:

  1. Big Data platforms enable ad-hoc analytics on non-relational (ie unmodelled data). This allows you to uncover insights to questions that you never think to ask. This is simply not possible in a relational database.

  2. You cannot deliver true analytics of Big Data relying only on batch insights. You must deliver streaming and real-time analytics. That is not possible if you are biased towards putting everything in a database, before doing anything.

  3. Clients will demand that Big Data platforms connect to their existing infrastructure. Clients don’t think that Big Data platforms exist solely for the purpose of populating existing relational systems. Big difference.

As I pointed out before, Oracle is neither the first nor the last to use this strategy. But I don’t think this “let them believe we are providing Hadoop integration, but all we want is to push our hardware and databases” approach will sell very well.

Original title and link: Oracle, Big Data, Hadoop…There Is Nothing to See Here (NoSQL database©myNoSQL)

via: http://www.robdthomas.com/2011/09/please-dispersethere-is-nothing-to-see.html


Hadoop: It's Still a Niche Technology

In an otherwise generic but interesting post about Hadoop and its integration with data analytics and data warehouse solutions, Jessica Twentyman writes:

It’s still a niche technology, but Hadoop’s profile received a serious boost over that past year, thanks in part to start-up companies such as Cloudera and MapR that offer commercially licensed and supported distributions of Hadoop. Its growing popularity is also the result of serious interest shown by EDW vendors like EMC, IBM and Teradata. EMC bought Hadoop specialist Greenplum in June 2010; Teradata announced its acquisition of Aster Data in March 2011; and IBM announced its own Hadoop offering, Infosphere, in May 2011.

Unfortunately, she got this all wrong. It is the open source community, developers, data scientists, and Cloudera that helped popularize Hadoop.

These data analytics and data warehouse vendors are just capitalizing on Hadoop delivering results. They haven’t been knocking at doors asking: “Have you heard of Hadoop? Do you want to try it?”. They’ve run into Hadoop in most of the places they went and that made them realize it is a business opportunity.

So, I’ll say it again: Hadoop is popular thanks to the open source community, developers, data scientists and Cloudera.

Original title and link: Hadoop: It’s Still a Niche Technology (NoSQL database©myNoSQL)

via: http://searchdatamanagement.techtarget.co.uk/feature/Hadoop-for-big-data-puts-architects-on-journey-of-discovery