IBM: All content tagged as IBM in NoSQL databases and polyglot persistence

Jaql: Query Language for JSON in IBM InfoSphere BigInsights

Jaql was created and is used by IBM InfoSphere BigInsights, the IBM Apache Hadoop distribution:

Jaql’s query language was inspired by many programming and query languages that include: Lisp, SQL, XQuery, and PigLatin. Jaql is a functional, declarative query language that is designed to process large data sets. For parallelism, Jaql rewrites high-level queries when appropriate into a “low-level” query consisting of Map-Reduce jobs that are evaluated using the Apache Hadoop project. Interestingly, the query rewriter produces valid Jaql queries which illustrates a departure from the rigid, declarative-only approach (but with hints!) of most relational databases. Instead, developers can interact with the “low-level” queries if needed and can add in their own low-level functionality such as indexed access or hash-based joins that are missing from Map-Reduce platforms.
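To make the rewriting idea a bit more concrete, here is a minimal sketch in Python (not Jaql) of the same computation written twice: once as a high-level filter-and-aggregate, and once as explicit map and reduce functions. Everything in it is hypothetical and for illustration only: the sample records, the function names, and the tiny in-memory `run_map_reduce` driver standing in for Hadoop. It does not show how Jaql's rewriter actually works.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical JSON-like records, invented for this illustration.
records = [
    {"dept": "sales", "amount": 10},
    {"dept": "sales", "amount": 5},
    {"dept": "ops", "amount": 7},
    {"dept": "ops", "amount": -1},
]


def high_level(records):
    """High-level form: keep positive amounts, then sum them per department."""
    totals = defaultdict(int)
    for record in records:
        if record["amount"] > 0:
            totals[record["dept"]] += record["amount"]
    return dict(totals)


def map_phase(record):
    """Low-level map step: emit (key, value) pairs for records that pass the filter."""
    if record["amount"] > 0:
        yield record["dept"], record["amount"]


def reduce_phase(key, values):
    """Low-level reduce step: combine all values that share a key."""
    return key, sum(values)


def run_map_reduce(records, mapper, reducer):
    """Toy single-process driver (an assumption, standing in for Hadoop)."""
    groups = defaultdict(list)
    # Shuffle: collect every emitted value under its key.
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(high_level(records))
    print(run_map_reduce(records, map_phase, reduce_phase))
```

Both forms compute the same per-department totals. The point of the quoted passage is that in Jaql the low-level form is itself a valid query, so a developer can inspect it or swap in a custom step, which in this sketch would amount to passing a different `mapper` or `reducer`.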

Original title and link: Jaql: Query Language for JSON in IBM InfoSphere BigInsights (NoSQL database©myNoSQL)

IBM Hadoop Commitment

The company also cemented its commitment to the Hadoop open source data analytics tool, identifying it as “the cornerstone of [IBM’s] big data strategy” in a statement.

IBM is the latest in a line of enterprises to stress their commitment to Hadoop. Enterprise storage vendor EMC put a tweaked Hadoop distribution at the heart of a recently updated range of data analytics Greenplum appliances, while business intelligence company Jaspersoft announced plans to better integrate its products with Hadoop in February.

Sometimes I don’t get the meaning of the words commitment and investment, but this makes me believe others have the same problem.



The Data Processing Platform for Tomorrow

In the blue corner we have IBM with Netezza as the analytic database, Cognos for BI, and SPSS for predictive analytics. In the green corner we have EMC with Greenplum and the partnership with SAS[1]. And in the open source corner we have Hadoop and R.

Update: there’s also another corner, one I don’t know how to color, where Teradata and its recently acquired Aster Data partner with SAS.

Who is ready to bet on which of these platforms will be processing more data in the coming years?

  1. GigaOm has a good article on this subject here  


Types of Big Data Work

Mike Minelli: Working with big data can be classified into three basic categories […] One is information management, a second is business intelligence, and the third is advanced analytics

Information management captures and stores the information, BI analyzes data to see what has happened in the past, and advanced analytics is predictive, looking at what the data indicates for the future.

There’s also a list of tools for BigData: AsterData (acquired by Teradata), Datameer, Paraccel, IBM Netezza, Oracle Exadata, EMC Greenplum.



About Watson

Watson is powered by 10 racks of IBM Power 750 servers running Linux, has 15 terabytes of RAM and 2,880 processor cores, and is capable of operating at 80 teraflops. Watson was written mostly in Java, but significant chunks of code are written in C++ and Prolog; all components are deployed and integrated using UIMA.

Watson contains state-of-the-art parallel processing capabilities that allow it to run multiple hypotheses – around one million calculations – at the same time.

Hadoop inside™



Jeopardy Goes to Hadoop

Did you know that Hadoop was the knowledge base behind the Watson supercomputer? I didn’t:

Hadoop was used to create Watson’s “brain,” or the database of knowledge and facilitation of Watson’s processing of enormously large volumes of data in milliseconds. Watson depends on 200 million pages of content and 500 gigabytes of preprocessed information to answer Jeopardy questions. That huge catalog of documents has to be searchable in seconds.

I’d love to read what other open source tools were used when building Watson. For example, has Watson used the Python-based Natural Language Toolkit?

Update: Jeroen Latour points out in a comment a presentation about Watson’s DeepQA Project and an article available in PDF format:



Netezza Acquired by IBM

Netezza, the data warehousing appliance maker, has been acquired by IBM for approximately $1.7 billion. While I haven’t covered Netezza before, this acquisition is interesting from the perspective of the BigData market.

Update: Daniel Abadi wrote here about a possible Netezza acquisition by IBM over a year ago.

