Itamar Syn-Hershko explains the indexing process in RavenDB:
RavenDB has a background process that is handed new documents and document
updates as they come in, right after they were stored in the Document Store,
and it passes them in batches through all the indexes in the system. For
write operations, the user gets an immediate confirmation on their
transaction—even before the indexing process started processing these
updates—without waiting for indexing, but being 100 percent certain the
changes were recorded in the database. Queries do not wait for indexing
either—they just use the indexes that exist at the time the query was
issued. This ensures both smooth operation on all fronts, and that no
documents are left behind.
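To make the quoted description concrete, here is a minimal sketch of the same idea in Python. It is not RavenDB's implementation or API: `ToyDocumentStore` and all of its methods are hypothetical names, the "document store" is a plain dictionary, and the index is a simple term-to-ids map that only handles new documents (updates would also need to remove old terms).

```python
import queue
import threading


class ToyDocumentStore:
    """Hypothetical in-memory store with asynchronous indexing (not RavenDB's API)."""

    def __init__(self, batch_size=2):
        self.documents = {}            # doc_id -> text; stands in for durable storage
        self.index = {}                # term -> set of doc_ids; updated in the background
        self._pending = queue.Queue()  # doc_ids stored but not yet indexed
        self._batch_size = batch_size

    def put(self, doc_id, text):
        """Store the document and acknowledge immediately, without waiting for indexing."""
        self.documents[doc_id] = text
        self._pending.put(doc_id)
        return True

    def query(self, term):
        """Answer from the index as it exists right now; results may be stale."""
        return set(self.index.get(term, ()))

    def start_indexer(self):
        """Start the background thread that feeds pending documents to the index."""
        threading.Thread(target=self._index_forever, daemon=True).start()

    def wait_for_indexing(self):
        """Block until every document queued so far has been indexed."""
        self._pending.join()

    def _index_forever(self):
        while True:
            batch = [self._pending.get()]  # block until there is work
            while len(batch) < self._batch_size:
                try:
                    batch.append(self._pending.get_nowait())
                except queue.Empty:
                    break
            for doc_id in batch:
                for term in self.documents[doc_id].split():
                    self.index.setdefault(term, set()).add(doc_id)
                self._pending.task_done()
```

In this sketch `put` returns before the index has seen the document, so a `query` issued right away can miss it; only after `wait_for_indexing()` is the document guaranteed to show up.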
Asynchronous indexing is tricky. While it appears to remove the indexing performance penalty from both reads and writes, it has a few drawbacks:
- immediate inconsistency: with asynchronous indexes, a query can run before a recent write has been indexed, so there is no guarantee that results reflect the latest data.
- impossibility of defining unique indexes: by the time the index is updated, the write has already been acknowledged, so it is too late to tell the client that the uniqueness constraint was violated.
- complicated crash recovery: with async indexing, the server must be able to resume the indexing process from where it left off. If that position is not persisted, a crash can leave the indexes permanently inconsistent with the data.
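The crash recovery point can be illustrated with a small sketch. This is one possible approach, not how RavenDB does it: the indexer persists the index together with the sequence number of the last write it processed, so a restarted process knows where to resume. `CheckpointedIndexer`, the JSON file layout, and the `(seq, doc_id, text)` log format are all assumptions made for the example.

```python
import json
import os


class CheckpointedIndexer:
    """Hypothetical indexer that persists its position alongside the index."""

    def __init__(self, path):
        self.path = path
        self.last_seq = 0   # sequence number of the last indexed write
        self.index = {}     # term -> list of doc_ids
        if os.path.exists(path):
            # Recover both the index and the resume position after a restart.
            with open(path) as f:
                state = json.load(f)
            self.last_seq = state["last_seq"]
            self.index = state["index"]

    def _save(self):
        # Write to a temp file and rename, so index and checkpoint change together.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"last_seq": self.last_seq, "index": self.index}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def catch_up(self, log, max_entries=None):
        """Index log entries newer than the checkpoint; return how many were processed.

        `log` is a list of (seq, doc_id, text) tuples in increasing seq order.
        """
        done = 0
        for seq, doc_id, text in log:
            if seq <= self.last_seq:
                continue  # already indexed before the restart
            if max_entries is not None and done >= max_entries:
                break
            for term in text.split():
                ids = self.index.setdefault(term, [])
                if doc_id not in ids:
                    ids.append(doc_id)
            self.last_seq = seq
            self._save()
            done += 1
        return done
```

Creating a second `CheckpointedIndexer` on the same path simulates a restart: it picks up `last_seq` from disk and skips the entries that were already indexed. Without the persisted position, the restarted indexer would have to either rebuild everything or guess, which is where the inconsistencies come from.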
Any other obvious ones I’ve missed?
Original title and link: RavenDB document indexing process ( ©myNoSQL)