hypergraphdb: All content tagged as hypergraphdb in NoSQL databases and polyglot persistence
Pere Urbón-Bayes’ must-check slide deck on graph databases and their applicability. I like this graph database products slide most:
- Neo4j: open source database NoSQL graph
- Dex: the high performance graph database
- HyperGraphDB: an AI and semantic web graph database
- Infogrid: the Internet graph database
- Sones: SaaS .NET graph database
- VertexDB: high performance database server
By the way I’ve heard Pere (@purbon) is currently looking for a job ;-).
Recently we’ve seen a lot of activity in the graph database world. Better understanding the space will help us make smarter decisions, so I’ve decided to reach out to the main players in the market and run a series of interviews about their projects and goals. The first in this series is about HyperGraphDB, and its creator, Borislav Iordanov, has been kind enough to answer my questions.
myNoSQL: What is HyperGraphDB?
Borislav Iordanov: HyperGraphDB is a storage framework based on generalized hypergraphs as its underlying data model. The unit of storage is a tuple made up of 0 or more other tuples. Each such tuple is called an atom. One could think of the data model as relational where higher-order, n-ary relationships are allowed or as graph-oriented where edges can point to an arbitrary set of nodes and other edges. Each atom has an arbitrary, strongly-typed value associated with it. The type system managing those values is embedded as a hypergraph and customizable from the ground up. HyperGraphDB itself is an embedded database with an XMPP-based distribution framework and it relies on a key-value store underneath, currently BerkeleyDB. In its present form, it is a full-fledged object-oriented database for Java as well. Storage layout, indexing and caching are designed to support graph traversals and pattern matching.
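To make the data model described above more concrete, here is a minimal in-memory sketch in Python. It is not HyperGraphDB's API (HyperGraphDB is a Java library): the `Atom` class, its fields and the sample values are hypothetical names chosen for illustration, and the type system, persistence and indexing are left out entirely.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass(eq=False)  # atoms compare by identity, not by value
class Atom:
    """Hypothetical atom: a value plus a tuple of 0 or more target atoms."""
    value: Any
    targets: Tuple["Atom", ...] = ()
    # Atoms that point at this one, kept so a traversal can walk "upwards".
    incidence: List["Atom"] = field(default_factory=list, repr=False)

    def __post_init__(self) -> None:
        # Register this atom with every atom it points to.
        for target in self.targets:
            target.incidence.append(self)

    @property
    def arity(self) -> int:
        return len(self.targets)


# Arity-0 atoms behave like plain nodes.
alice, bob, carol, acme = (Atom(name) for name in ("Alice", "Bob", "Carol", "Acme"))

# An n-ary relation: one atom pointing at three others.
introduced = Atom("introduced", (alice, bob, acme))

# A higher-order relation: one of its targets is itself a relation.
reported = Atom("reported-by", (introduced, carol))
```

In this sketch a "node" and an "edge" are the same kind of thing, which is why an edge can point to other edges without any special casing.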
myNoSQL: How would you position HyperGraphDB inside the NoSQL space?
Boris: I think it is quite apart and I don’t see it fit into any particular category. Because of the term “hypergraph”, it’s been categorized as a “graph database”, but strictly speaking it is not. The focus is highly complex data and knowledge representation problems. It originated from an AI project (http://www.opencog.org) and its power is partly in its data model and in its open-architecture framework.
myNoSQL: Would you mind explaining a bit more why you are placing HyperGraphDB closer to object databases than to graph databases?
Boris: Probably because object structures have the same kind of generality: arbitrary nesting, n-ary relations, and if you model a relation as an identifiable object, it’s in effect reified, so you can have higher-order relationships. In addition, OO databases have well-developed type systems, as HyperGraphDB does (but HyperGraphDB’s is more general because you could model functional-style type systems in it; you could also have types of types of types etc. ad infinitum).
Standard graphs are really just one kind of data structure that is conceptually simple and that happens to be very well studied mathematically, so people use them a lot in modeling. A graph database is probably very good at dealing with graph-oriented problems with large datasets, but for general programming one would want a more versatile data model, and HyperGraphDB offers that, as do OO databases.
Obviously, one could model object structures as well as hypergraphs with classical graphs, but that doesn’t mean much - one could translate a C program into a Turing machine, and this doesn’t make the Turing machine a good choice for the problem the C program is solving.
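As a rough illustration of what modeling hypergraphs with classical graphs involves, the Python sketch below turns each n-ary relation into an extra node plus one labelled binary edge per argument. The function name, the `type`/`argN` edge labels and the sample data are assumptions made for this example and are not taken from HyperGraphDB or any other product.

```python
from typing import Dict, List, Tuple

# name -> (relation label, ordered members); a member may name another relation
Hyperedges = Dict[str, Tuple[str, List[str]]]


def to_binary_edges(hyperedges: Hyperedges) -> List[Tuple[str, str, str]]:
    """Encode n-ary relations as (source, label, destination) binary edges."""
    edges = []
    for name, (label, members) in hyperedges.items():
        # The relation itself becomes a node that records its own label.
        edges.append((name, "type", label))
        # One binary edge per argument, with the position kept in the label.
        for position, member in enumerate(members):
            edges.append((name, f"arg{position}", member))
    return edges


hyperedges = {
    "e1": ("introduced", ["alice", "bob", "acme"]),
    "e2": ("reported-by", ["e1", "carol"]),  # points at relation e1
}
binary_edges = to_binary_edges(hyperedges)
```

The encoding works, but two relations turn into seven binary edges, and any query over them has to reassemble the original tuples from the `argN` labels.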
myNoSQL: What are other solutions in this category/space?
Boris: I don’t know of any. The topic maps formalism (an RDF rival that sadly is not very popular) is very close to the HyperGraphDB data model. RDF itself, named graphs etc. are close. Then graph and OO databases obviously touch on some of the functionality, with OO databases probably being closer. The database behind freebase.com is very similar in architecture, but relations have fixed arity there too.
myNoSQL: Could you identify a couple of unique features that are differentiating HyperGraphDB from the other solutions?
Boris: Probably the two most interesting ones are:
- Higher-order, n-ary relations are unique to HyperGraphDB
- Open architecture: there’s a very strong “frameworky” aspect to HyperGraphDB; it’s not a black box with a fixed, restrictive data model. The storage layout is open and documented. One can plug in customized indexing, customized type handling, customized back-end storage, customized distribution algorithms etc.
myNoSQL: What’s coming next on HyperGraphDB’s roadmap and why?
Boris: The next release will be 1.1 within the next month or so, containing many bug fixes and polishing of the APIs. In addition, it will contain an MVCC implementation to increase transaction throughput, out-of-the-box replication, and some optimizations for querying and graph traversals.
Following that, we will be focusing on developing a query language geared towards HyperGraphDB’s unique data model and developing more distribution algorithms for truly massive scalability.
People have also asked about full-text search so an integration with Lucene might happen some time within the next couple of months.
Nested graphs, RAM only graphs and a C++ port are also desirable features on our radar, as time and resources allow. We are an open-source, LGPL project and it all depends on how many people are willing to contribute and how much time they are willing to put in, so no definite dates yet.
myNoSQL: Thanks a lot Boris!
After triggering our quick review of graph databases, Pere Urbón came up with a nice comparison of these — Neo4j, HyperGraphDB, DEX, InfoGrid, Sones, VertexDB — in terms of License, Schema, Querying, Storage implementation, Utilities, Language and Operating system support.
Pere has made this very interesting NoSQL graph database matrix available as a ☞ PDF on his blog.
Pere Urbón ☞ published a short review of a couple of existing graph databases. For your reference, below are the ones reviewed in the post and a couple more that we’ve previously mentioned here on myNoSQL:
☞ Neo4j is an embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables.
☞ DEX is a high performance library to manage very large graphs or networks
☞ HyperGraphDB: a general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism.
☞ InfoGrid: an Internet Graph Database with many additional software components that make the development of REST-ful web applications on a graph foundation easy.
☞ vertexdb: a high performance graph database server that supports automatic garbage collection.
Note: by checking the project homepage I cannot tell if the project is still active or not.
☞ AllegroGraph RDFStore: a modern, high-performance, persistent RDF graph database.
Note: AllegroGraph seems to be positioned in the RDF stores space, which features some other solutions too.
☞ Filament: a graph persistence framework and associated toolkits based on a navigational query style.
☞ Sones GraphDS provides an inherent support for high-level data abstraction concepts (graph structures, walks, consistency, editions, revisions, copies), its own Graph Query Language, an underlying distributed file system and various interfaces like SOAP, REST or WebDAV.
And I’m not sure these are all …
Update: make sure you check the NoSQL Graph Database Matrix