Write ingest: a firehose of new keys
Event logs, metrics, and bulk loads that write new keys fast. LSM engines like pebble and badger absorb writes without rewriting a tree.
This is the ingest workload: you are writing a stream of new keys as fast as they arrive. Event logs, metrics pipelines, crawl output, bulk imports. Reads happen later or elsewhere; right now the only thing that matters is keeping up with the write rate.
Writing new keys favours engines that do not rewrite a tree on every insert. That is the LSM trade: a write is an in-memory buffer insert plus a small log append, and the sorting happens later in the background.
These numbers are with the disk flush off, so they measure the engine's structural write speed, not the disk. If every write must survive a crash, that is a different question with different winners, on the durable-writes page.
The numbers
Writing 100,000 fresh random keys, 1 KB values, 8 concurrent clients:
| Engine | Shape | M4 | EPYC 4-core | EPYC 6-core | EPYC 8-core |
|---|---|---|---|---|---|
| badger | LSM | 239,000 | 61,000 | 22,000 | 32,000 |
| buntdb | in-memory B-tree | 230,000 | 85,000 | 44,000 | 63,000 |
| pogreb | hash-log | 190,000 | 84,000 | 50,000 | 54,000 |
| pebble | LSM | 97,000 | 120,000 | 92,000 | 72,000 |
| goleveldb | LSM | 92,000 | 54,000 | 30,000 | 37,000 |
| tamnd/kv | hash-log | 83,000 | 25,000 | 12,000 | 17,000 |
| bbolt | B+tree | 38,000 | 20,000 | 11,000 | 10,000 |
| sqlite | B-tree | 29,000 | 8,000 | 3,000 | 5,000 |
There is a twist here worth noticing. badger is fastest on the M4 laptop, but pebble is fastest on every Linux server and barely slows as the data grows, because its compaction scales across cores where badger's value-log GC does not. If your ingest runs on a server, pebble is the safer bet; on a laptop or a few cores, badger edges it.
bbolt and sqlite sit at the bottom because a B-tree insert can rewrite a page, the exact cost the LSM shape avoids.
What to pick
- pebble for sustained ingest on a server. It is the most consistent writer across machines and compresses the result smallest on disk (see footprint).
- badger for ingest on a laptop or a few cores, where its in-memory write path is fastest, as long as you can afford its disk footprint (it uses 22x the raw data until its background GC catches up).
- buntdb if the dataset fits in RAM and you want fast writes and fast reads from one engine.
What to avoid
- bbolt and sqlite for write-heavy ingest. The B-tree page rewrite caps them well below the LSM engines.
- badger if disk space is tight. Its write speed comes with the highest space amplification measured here.
- tamnd/kv if ingest is the main job. It is mid-pack on fresh writes and its strength is on the read side, not ingest.