Recently we’ve seen a lot of activity in the graph database world. Understanding the space better will help us make smarter decisions, so I’ve decided to reach out to the main players in the market and run a series of interviews about their projects and goals. The first in this series is about HyperGraphDB, and Borislav Iordanov, its creator, has been kind enough to answer my questions.
myNoSQL: What is HyperGraphDB?
Borislav Iordanov: HyperGraphDB is a storage framework based on generalized hypergraphs as its underlying data model. The unit of storage is a tuple made up of 0 or more other tuples. Each such tuple is called an atom. One could think of the data model as relational where higher-order, n-ary relationships are allowed or as graph-oriented where edges can point to an arbitrary set of nodes and other edges. Each atom has an arbitrary, strongly-typed value associated with it. The type system managing those values is embedded as a hypergraph and customizable from the ground up. HyperGraphDB itself is an embedded database with an XMPP-based distribution framework and it relies on a key-value store underneath, currently BerkeleyDB. In its present form, it is a full-fledged object-oriented database for Java as well. Storage layout, indexing and caching are designed to support graph traversals and pattern matching.
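To make the data model described above more concrete, here is a minimal sketch in Python. It is an illustration only and not HyperGraphDB's API (HyperGraphDB is a Java library): the `Atom` class, the `links_pointing_to` helper and the sample names are all hypothetical, and the type system, storage and indexing are not modeled. It shows only the core idea that an atom carries a value and points to zero or more other atoms, which may themselves be links.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(eq=False)  # eq=False: atoms are compared by identity, not by value
class Atom:
    """Toy atom (hypothetical): a value plus an ordered tuple of target atoms."""
    value: Any
    targets: Tuple["Atom", ...] = ()

    @property
    def arity(self) -> int:
        # Number of atoms this atom points to; 0 means a plain node.
        return len(self.targets)


def links_pointing_to(atom: Atom, universe: List[Atom]) -> List[Atom]:
    """Return every atom in `universe` that has `atom` among its targets."""
    return [a for a in universe if any(t is atom for t in a.targets)]


# Plain nodes: atoms with no targets.
alice, bob, acme, carol = Atom("Alice"), Atom("Bob"), Atom("Acme"), Atom("Carol")

# An n-ary relation: a single atom pointing to three others.
intro = Atom("introduced", (alice, bob, acme))

# A higher-order relation: one of its targets is itself a link.
claim = Atom("reported-by", (intro, carol))

universe = [alice, bob, acme, carol, intro, claim]
```

In this sketch `intro.arity` is 3, and `links_pointing_to(intro, universe)` returns the single atom `claim`, i.e. a relation about a relation.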
myNoSQL: How would you position HyperGraphDB inside the NoSQL space?
Boris: I think it stands quite apart, and I don’t see it fitting into any particular category. Because of the term “hypergraph”, it’s been categorized as a “graph database”, but strictly speaking it is not. The focus is highly complex data and knowledge representation problems. It originated from an AI project (http://www.opencog.org), and its power lies partly in its data model and in its open-architecture framework.
myNoSQL: Would you mind explaining a bit more why you are placing HyperGraphDB closer to object databases than to graph databases?
Boris: Probably because object structures have the same kind of generality: arbitrary nesting and n-ary relations. And if you model a relation as an identifiable object, it’s in effect reified, so you can have higher-order relationships. In addition, OO databases have well-developed type systems, as HyperGraphDB does (but HyperGraphDB’s is more general because you could model functional-style type systems in it; you could also have types of types of types, etc., ad infinitum).
Standard graphs are really just one kind of data structure that is conceptually simple and that happens to be very well studied mathematically, so people use them a lot in modeling. A graph database is probably very good at dealing with graph-oriented problems with large datasets, but for general programming one would want a more versatile data model, and HyperGraphDB offers that, as do OO databases.
Obviously, one could model object structures as well as hypergraphs with classical graphs, but that doesn’t mean much: one could translate a C program into a Turing machine, and this doesn’t make the Turing machine a good choice for the problem the C program is solving.
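The point about modeling hypergraphs with classical graphs can be illustrated with a small sketch. The code below is not from the interview and is not how HyperGraphDB or any particular graph database works; the `reify` function, the `argN` edge labels and the sample data are assumptions chosen for illustration. It encodes each n-ary relation as a helper node plus one binary edge per member, which is one common way to do the translation.

```python
from typing import List, Tuple

# A classical labeled binary edge: (source, label, target).
Edge = Tuple[str, str, str]


def reify(relation_id: str, name: str, members: List[str]) -> List[Edge]:
    """Encode one n-ary relation as binary edges through a helper node.

    `relation_id` names the helper node standing in for the relation;
    the "type" and "argN" labels are an arbitrary, hypothetical convention.
    """
    edges: List[Edge] = [(relation_id, "type", name)]
    for position, member in enumerate(members):
        edges.append((relation_id, f"arg{position}", member))
    return edges


# A ternary relation becomes one helper node and four binary edges.
edges = reify("r1", "introduced", ["Alice", "Bob", "Acme"])

# A relation about that relation has to refer to the helper node "r1".
edges += reify("r2", "reported-by", ["r1", "Carol"])
```

The encoding works, but two relations have turned into seven edges, and every query has to know the helper-node and label conventions. That overhead is arguably what the Turing machine analogy is getting at.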
myNoSQL: What are other solutions in this category/space?
Boris: I don’t know of any. The topic maps formalism (an RDF rival that sadly is not very popular) is very close to the HyperGraphDB data model. RDF itself, named graphs, etc. are close. Then graph and OO databases obviously touch on some of the functionality, with OO databases probably being closer. The database behind freebase.com is very similar in architecture, but relations have fixed arity there too.
myNoSQL: Could you identify a couple of unique features that are differentiating HyperGraphDB from the other solutions?
Boris: Probably the two most interesting ones are:
- Higher-order, n-ary relations are unique to HyperGraphDB
- Open architecture: there’s a very strong “frameworky” aspect to HyperGraphDB; it’s not a black box with a fixed, restrictive data model. The storage layout is open and documented. One can plug in customized indexing, customized type handling, customized back-end storage, customized distribution algorithms, etc.
myNoSQL: What’s coming next on HyperGraphDB’s roadmap and why?
Boris: The next release will be 1.1 within the next month or so, containing many bug fixes and polishing of the APIs. In addition, it will contain an MVCC implementation to increase transaction throughput, out-of-the-box replication, and some optimizations for querying and graph traversals.
Following that, we will be focusing on developing a query language geared towards HyperGraphDB’s unique data model and developing more distribution algorithms for truly massive scalability.
People have also asked about full-text search, so an integration with Lucene might happen some time within the next couple of months.
Nested graphs, RAM-only graphs and a C++ port are also desirable features on our radar, as time and resources allow. We are an open-source, LGPL project and it all depends on how many people are willing to contribute and how much time they are willing to put in, so no definite dates yet.
myNoSQL: Thanks a lot, Boris!