Sharding: All content tagged as Sharding in NoSQL databases and polyglot persistence
What I’ve noticed in these posts:
- RavenDB requires defining the actual shard servers in your sharding implementation (i.e. in your source code)
when performing writes there’re a bunch of round trips (for id generation)
Original title and link: RavenDB Sharding ( ©myNoSQL)
Q: Considering performance only, can a MySQL Cluster beat a custom data sharding MySQL solution?
A: I would say that MySQL Cluster could achieve higher throughput/host than sharded MySQL+InnoDB provided that :
- Queries are simple
- All data fits in-memory
In terms of latency, MySQL Cluster should have more stable latency than sharded MySQL. Actual latencies for purely in-memory data could be similar. As queries become more complex, and data is stored on disk, the performance comparison becomes more confusing.
Make sure you read the complete answer as it covers some more MySQL Sharding vs MySQL Cluster pros and cons.
Original title and link: MySQL Sharding vs MySQL Cluster ( ©myNoSQL)
Shard-Query is an open source tool kit which helps improve the performance of queries against a MySQL database by distributing the work over multiple machines and/or multiple cores. This is similar to the divide and conquer approach that Hive takes in combination with Hadoop. Shard-Query applies a clever approach to parallelism which allows it to significantly improve the performance of queries by spreading the work over all available compute resources.
And how it applies the relational algebra to distribute operations:
Every SQL statements breaks down into a relational algebra equation. In Algebra you learn that some operations are “distributable”, that is, you can split them up into smaller units, and when you put those units back together the result is the same as if you didn’t take it apart. Projection,
MAX* are distributable. With a little elbow grease, all non-distributable aggregate functions (
VARIANCE, etc) can be decomposed into distributable functions using simple substitution rules.
Jim Webber has published a series of posts — here, here, and here — discussing generic graph sharding solutions, the route Neo4j is taking for addressing this problem, and, in the last post, a simple strategy — suggested by Mark Harwood — for deciding a graph database scaling approach:
Dataset size: Many tens of Gigabytes
Strategy: Fill a single machine with RAM
Dataset size: Many hundreds of Gigabytes
Dataset size: Terabytes and above
Strategy: Domain-specific sharding
The cache sharding approach — described here — suggests replacing the problem of graph sharding with “the simpler problem of consistent routing”. But I’m not sure how this solution works effectively:
- if there is a way to provide consistent routing isn’t that equivalent with having an solution for sharding the graph?
- considering graph databases are chatty — in the sense that most of the time there is a complex traversal happening — how smart should the client and the router be to work effectively and efficiently?
For now I feel that being able to solve the consistent routing in a graph implies having a strategy for sharding a graph. The reverse applies too: having a sharding strategy would provide a routing solution. Thus scaling graph databases remains a subject open for research.
Original title and link: Scaling Graph Databases: Sharding and Consistent Routing (NoSQL databases © myNoSQL)
I’m seeing many questions asked
 about MongoDB replication and sharding, so I thought it might be an idea to try to gather the most interesting ones and submit them to the MongoDB/10gen people to get some answers. So, after reading the documentation on MongoDB replica sets
 and MongoDB sharding
, what questions would you want answered?
Please post a comment with your questions and I’ll be adding them to the list. After accumulating a couple of good questions I’ll forward them to MongoDB/10gen people for answers.
To start with, here are a couple from me:
- What may be causing replica sets to lag behind the master or become “disconnected” (except network partitions)?
- How would one determine the best size of sharding “chunks”?
- Is it possible and what would lead to an unbalanced cluster?
What are yours? Feel free to forward it to any friends using MongoDB that are looking into using replica sets and sharding.
Note: I’m not planning to create yet another forum or Q&A site, so I’ll make sure that once we get some answers these will be published in a place where everyone interested will find them easily.
Note: If you are interested in getting answers about other NoSQL database, please let me know and I’ll create the initial list.
- Not only on the ☞ MongoDB group, but also on Quora.com, StackOverflow.com and blogs (↩)
- MongoDB replica sets resources (↩):
- MongoDB sharding resources (↩):