Storage Engine

Architecture

  • Append-only file -- Documents stored as [status:u8][length:u32 LE][payload]. Soft-delete flips the status byte in place.
  • Write-Ahead Log (WAL) -- CRC32 checksums per entry. Transaction ID tagging. 3-fsync protocol: WAL → data → checkpoint.
  • Zstd compression -- Level 3 by default. Transparent per-document. Thread-local compressor/decompressor reuse.
  • AES-256-GCM encryption -- Optional. Random 12-byte nonce per document. Applied after compression.
  • LRU document cache -- Per-collection in-memory cache. JSON deserialized once, then Arc-refcounted. Configurable capacity.
  • Lock-free reads -- Separate read-only file handle uses pread. Writes are serialized via Mutex.
  • Lazy sync mode -- Background thread batches fsyncs at a configurable interval. Reduces write latency at the cost of a durability window: writes made since the last fsync can be lost on a crash.
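
The record layout from the first bullet can be illustrated with a minimal sketch. It assumes an uncompressed, unencrypted payload and uses hypothetical names throughout: the STATUS_LIVE and STATUS_DELETED values and the append_record, soft_delete, and read_record helpers are illustrative, not the engine's actual API. It works on any Write + Seek target, so an in-memory buffer can stand in for the data file.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

// Hypothetical status values; the source only states that a status byte exists.
const STATUS_LIVE: u8 = 0;
const STATUS_DELETED: u8 = 1;

/// Append one record as [status:u8][length:u32 LE][payload].
/// Returns the offset at which the record starts.
fn append_record<W: Write + Seek>(out: &mut W, payload: &[u8]) -> io::Result<u64> {
    let offset = out.seek(SeekFrom::End(0))?;
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "payload too large"))?;
    out.write_all(&[STATUS_LIVE])?;
    out.write_all(&len.to_le_bytes())?;
    out.write_all(payload)?;
    Ok(offset)
}

/// Soft-delete: overwrite only the status byte; length and payload stay in place.
fn soft_delete<W: Write + Seek>(out: &mut W, offset: u64) -> io::Result<()> {
    out.seek(SeekFrom::Start(offset))?;
    out.write_all(&[STATUS_DELETED])
}

/// Read the record at `offset`. Returns None if it has been soft-deleted.
fn read_record<R: Read + Seek>(src: &mut R, offset: u64) -> io::Result<Option<Vec<u8>>> {
    src.seek(SeekFrom::Start(offset))?;
    let mut header = [0u8; 5];
    src.read_exact(&mut header)?;
    if header[0] == STATUS_DELETED {
        return Ok(None);
    }
    let len = u32::from_le_bytes([header[1], header[2], header[3], header[4]]) as usize;
    let mut payload = vec![0u8; len];
    src.read_exact(&mut payload)?;
    Ok(Some(payload))
}
```

Because the header is a fixed 5 bytes, a soft-delete is a single one-byte write at a known offset and never moves the records after it.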

Collection Isolation

Each collection has its own storage file, WAL, indexes, and cache. A per-collection RwLock lets reads proceed concurrently both across different collections and within the same collection.
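
The locking scheme can be sketched roughly as follows. The Collection and Database types and the insert and count helpers are hypothetical stand-ins, not the engine's real types; the point is only that each collection sits behind its own std::sync::RwLock, so a lock taken on one collection never touches another.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for one collection's state
/// (storage file, WAL, indexes, and cache in the real engine).
#[derive(Default)]
struct Collection {
    docs: Vec<String>,
}

/// Hypothetical registry holding one independent RwLock per collection.
#[derive(Default)]
struct Database {
    collections: HashMap<String, Arc<RwLock<Collection>>>,
}

impl Database {
    /// Get the named collection, creating it on first use.
    fn collection(&mut self, name: &str) -> Arc<RwLock<Collection>> {
        self.collections.entry(name.to_string()).or_default().clone()
    }
}

/// Writers take the exclusive side of that collection's lock only.
fn insert(col: &RwLock<Collection>, doc: &str) {
    col.write().unwrap().docs.push(doc.to_string());
}

/// Readers take the shared side, so many readers can hold it at once.
fn count(col: &RwLock<Collection>) -> usize {
    col.read().unwrap().docs.len()
}
```

In this sketch a long-running read on one collection delays only writers to that same collection; readers and writers of every other collection are unaffected.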

Cluster mode persistence (v0.28.18)

When the server is built with --features cluster and OXIDB_NODE_ID is set, each node also writes its Raft state inside OXIDB_DATA:

  • raft_meta.json -- small file (~400 B): vote, last committed log id, last purged log id, last applied log id, current membership. Rewritten on metadata changes.
  • raft_log.jsonl -- append-only log: one openraft Entry per line. append_to_log is O(1) per entry; only conflict resolution and snapshot purges rewrite the file.
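
The difference between the cheap append path and the rewrite path can be sketched as below. This is not the code in log_store.rs and does not use openraft's API: entries are treated as already-serialized JSON strings, and append_entries, read_entries, and truncate_to are hypothetical helper names. The rewrite via a temporary file plus rename is likewise an assumption about one reasonable way to do it.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;

/// Append pre-serialized JSON entries, one per line, then fsync.
/// Cost is proportional to the new entries only, not to the log size.
fn append_entries(path: &Path, entries: &[String]) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    for entry in entries {
        debug_assert!(!entry.contains('\n'), "a JSONL entry must fit on one line");
        file.write_all(entry.as_bytes())?;
        file.write_all(b"\n")?;
    }
    file.sync_all()
}

/// Read every entry back, one String per line.
fn read_entries(path: &Path) -> io::Result<Vec<String>> {
    BufReader::new(File::open(path)?).lines().collect()
}

/// Keep only the first `keep` entries (for example, after a log conflict).
/// This is the expensive path: the whole file is rewritten and swapped in.
fn truncate_to(path: &Path, keep: usize) -> io::Result<()> {
    let kept: Vec<String> = read_entries(path)?.into_iter().take(keep).collect();
    let tmp = path.with_extension("jsonl.tmp");
    let mut file = File::create(&tmp)?;
    for entry in &kept {
        file.write_all(entry.as_bytes())?;
        file.write_all(b"\n")?;
    }
    file.sync_all()?;
    fs::rename(&tmp, path)
}
```

A purge of entries from the front of the log after a snapshot would follow the same rewrite pattern, keeping the tail instead of the head.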

This is what allows a node to come back as a Follower after a restart instead of as a fresh Learner at term=0. Verified end-to-end at 1M records under mid-stream failover; see oxidb-server/src/raft/log_store.rs for the implementation.