IndexedDB seems inefficient for bulk mutations, such as dumping a huge list of items into an object store. At least that was my impression at first sight of the MDN docs. It provides no explicit API for the job as SQL does [1], so all we can do is loop from the client side, which cannot benefit from any database-internal optimization (if there is any). The mutation requests, in addition, appear to be issued sequentially: the tutorial recommends raising a request within the success event callback of the previous one, which is in effect sequential execution. Such code is bound to be slow.

We can run a quick benchmark of this approach:

;(async () => {
  // Start from a clean slate.
  await new Promise((resolve) => {
    const r = indexedDB.deleteDatabase("test")
    r.onsuccess = r.onerror = resolve
  })
  const items = Array.from({ length: 100000 }, (_, i) => ({ id: i }))
  // Resolve with the store while the versionchange transaction is still
  // active, so the subsequent adds run inside that transaction.
  const store = await new Promise((resolve) => {
    indexedDB.open("test", 1).onupgradeneeded = (event) => {
      const db = event.target.result
      const store = db.createObjectStore("store", { keyPath: "id" })
      store.createIndex("id", "id")
      resolve(store)
    }
  })
  console.time("bulkAdd")
  await bulkAdd(store, items)
  console.timeEnd("bulkAdd")
})()

function bulkAdd(store, items) {
  const failures = []
  return new Promise((resolve) => {
    function _perform(idx) {
      const req = store.add(items[idx])
      const next = () => {
        if (idx === items.length - 1) resolve(failures)
        else _perform(idx + 1)
      }
      req.onsuccess = next
      req.onerror = (event) => {
        // Prevent a failed add from aborting the whole transaction,
        // then continue with the next item.
        event.preventDefault()
        failures.push(items[idx].id)
        next()
      }
    }
    _perform(0)
  })
}

In practice, we care more about the records that fail than about those inserted successfully, so we record only the ids of the failed ones, which improves efficiency at least a little.

The timing is rather unstable, but on average it takes 30~40 seconds to insert 100k records, i.e. 2000~3000 records per second, which is not promising.

So why must the requests be arranged like this? I found a statement in the MDN docs:

Note that commit() doesn’t normally have to be called — a transaction will automatically commit when all outstanding requests have been satisfied and no new requests have been made. commit() can be used to start the commit process without waiting for events from outstanding requests to be dispatched.

Unlike many other databases, an IndexedDB transaction commits implicitly: changes are committed at the instant when all callbacks of outstanding requests have been invoked (and have returned) and no new request has been made. Hence a later request must be queued before the previous one completes, or while its success or error callback is executing; otherwise the transaction will already be closed. Raising new requests in callbacks is safe, but not very efficient.
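As a minimal sketch of the pitfall (assuming db is an already-opened IDBDatabase with an object store named "store"): deferring a request with setTimeout lets the transaction commit in the meantime, so the late request throws.

const tx = db.transaction("store", "readwrite")
const objectStore = tx.objectStore("store")
objectStore.add({ id: 1 }) // fine: queued while the transaction is active
setTimeout(() => {
  // By now the callback of the request above has run and no request is
  // pending, so the transaction has auto-committed; this add() throws a
  // "TransactionInactiveError" DOMException.
  objectStore.add({ id: 2 })
}, 0)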

Can we do better? Absolutely. We can perform the mutations “concurrently”, that is, raise all the requests at once.

function bulkAdd(store, items) {
  const failures = []
  const promises = []
  // Raise all requests at once; each one settles independently.
  for (const item of items) {
    const req = store.add(item)
    promises.push(
      new Promise((resolve) => {
        req.onsuccess = resolve
        req.onerror = (event) => {
          // Record the failure without aborting the transaction.
          event.preventDefault()
          failures.push(item.id)
          resolve()
        }
      })
    )
  }
  return Promise.all(promises).then(() => failures)
}

The execution time is now ~15 seconds, around 2x faster than the first implementation.

But wait … is this implementation reliable? Is it possible that at some point all previous requests have completed while there are still new requests to be raised? The answer is yes, but the transaction won’t be closed because of this, since the callbacks have not yet been invoked.

Recall that JavaScript has a single-threaded execution model. All callbacks reside in a queue and are executed sequentially when the thread is idle. This mechanism ensures that the mutation callbacks won’t be executed before all requests have been raised, as the raising loop and the callbacks run on the same thread. So even if some requests complete early, their success or error callbacks will still be scheduled after all requests are created. The implementation is therefore reliable.
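As a toy illustration of this ordering (using setTimeout in place of IndexedDB requests, but the principle is the same: queued callbacks cannot run until the current synchronous code finishes):

for (let i = 0; i < 3; i++) {
  setTimeout(() => console.log("callback", i), 0) // queued as a task
  console.log("raised", i) // runs synchronously, before any callback
}
// Output: raised 0, raised 1, raised 2, callback 0, callback 1, callback 2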

Now we have reached a speed of ~6000 records per second. Can we make it even better? Of course!

If you read the code carefully, you may notice that req.onsuccess does nothing useful. A resolve function is passed as the success callback merely to signal when the request completes. However, we are not interested in the completion time of each individual request; only the completion of the last one matters.

We may expect the callbacks to be executed in the order their requests were raised, and they actually are. But weirdly, the MDN docs do not advertise this point, and we have to dive into the IndexedDB specification to seek clues. After struggling with the verbose document, I caught something in the section Asynchronously executing a request. Briefly speaking,

  • A request won’t be processed before the processed flags of all previous requests are set. [2]
  • A request queues a task immediately after its processed flag is set, within which its callbacks are invoked. [3]

which means the tasks firing the events are ordered the same way as their corresponding requests. Consequently, we can know when all requests have completed by listening only to the success/error events of the last one:

function bulkAdd(store, items) {
  return new Promise((resolve) => {
    const failures = []
    let req
    for (const item of items) {
      req = store.add(item)
      req.onerror = (event) => {
        // Record the failure without aborting the transaction.
        event.preventDefault()
        failures.push(item.id)
      }
    }
    // Events are dispatched in request order, so the last request settling
    // (either way) means every request has settled.
    req.addEventListener("success", () => resolve(failures))
    req.addEventListener("error", () => resolve(failures))
  })
}

This time we get a better running time of 10~12 seconds, i.e. 8000~10000 records per second, nearly 4x faster than the first implementation! The table below summarizes how the performance improved along the way:

| impl.                        | time (s) | speed (r/s) |
|------------------------------|----------|-------------|
| sequential                   | 30~40    | 2000~3000   |
| concurrent                   | ~15      | ~6000       |
| concurrent + fewer listeners | 10~12    | 8000~10000  |

The last implementation is also how Dexie.js handles efficient bulk mutations. The library makes a statement on its homepage:

Its bulk operations utilize an often-overlooked feature in IndexedDB, ignoring success callbacks when possible.
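For comparison, here is a minimal sketch of the same bulk insert through Dexie.js (assuming the dexie package is installed; the database and store names follow our benchmark, not Dexie’s docs):

import Dexie from "dexie"

;(async () => {
  const db = new Dexie("test")
  db.version(1).stores({ store: "id" }) // declare "id" as the primary key
  const items = Array.from({ length: 100000 }, (_, i) => ({ id: i }))
  console.time("dexie bulkAdd")
  await db.store.bulkAdd(items) // skips per-request success listeners internally
  console.timeEnd("dexie bulkAdd")
})()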

References

  1. You may append as many records as you like after the INSERT INTO ... VALUES clause.
  2. See 5.6.5, step 1: “Wait until request is the first item in transaction’s request list that is not processed.”
  3. See 5.6.5, step 6: “Queue a task to run these steps”.

Author: hsfzxjy.
Link: .
License: CC BY-NC-ND 4.0.
All rights reserved by the author.
Commercial use of this post in any form is NOT permitted.
Non-commercial use of this post should be attributed with this block of text.