Declarative bit-level binary parsing for Go
Generated: —
What was measured and what it means
nibble is a Go library that lets you describe binary packet layouts using struct tags (bits:"N"), then marshal and unmarshal bit-packed data declaratively — no hand-written bit arithmetic required.
This report benchmarks nibble against two alternatives: manual bit arithmetic (the theoretical performance ceiling — raw shifts and masks, zero reflection, zero allocations) and go-bitfield (a comparable struct-tag library that provides unmarshal-only parsing). The test packet is a 64-bit game-state struct (8 bytes, 8 fields with widths from 1 to 16 bits).
Key findings: After schema-caching was added to nibble, unmarshal throughput improved 11.6× (from 2102 ns/op to 182 ns/op per packet). nibble is now ~10× faster than go-bitfield and sustains ~5 million packets/second on a single core — sufficient for most production game servers, IoT hubs, and security tooling workloads. The remaining gap versus manual code (~27–40×) is the cost of reflection-based field dispatch.
When to use nibble: correctness-critical protocol work, rapid protocol iteration, debugging (Explain/Diff/Validate APIs), and any workload under ~5 M pkt/s. When to use manual: hot-path code requiring >5 M pkt/s, HFT/raw-networking, and cases where you own the bit-math and never change the protocol.
Unmarshal performance across five dataset sizes — ns per packet (lower is better)
| Dataset | nibble (ns) | manual (ns) | go-bitfield (ns) | nibble / manual | nibble / go-bitfield |
|---|---|---|---|---|---|
| 100 | 212.7 | 5.3 | 2235.0 | 40.1× | 10.5× faster |
| 1K | 244.2 | 11.7 | 2065.5 | 20.9× | 8.5× faster |
| 10K | 182.5 | 5.2 | 1725.6 | 35.1× | 9.5× faster |
| 100K | 184.9 | 5.7 | 1705.1 | 32.4× | 9.2× faster |
| 1M | 181.7 | 6.7 | 1824.7 | 27.1× | 10.0× faster |
Marshal performance — ns per packet (lower is better) · go-bitfield has no Marshal API
| Dataset | nibble (ns) | manual in-place (ns) | nibble / manual |
|---|---|---|---|
| 100 | 237.9 | 10.4 | 22.9× |
| 1K | 162.6 | 5.8 | 28.0× |
| 10K | 157.2 | 6.0 | 26.2× |
| 100K | 155.7 | 5.9 | 26.4× |
| 1M | 225.8 | 35.0 | 6.5× |
Throughput in millions of packets per second (M pkt/s) · higher is better · measured on Intel i7-10510U @ 1.80 GHz · single core
| Dataset | nibble unmarshal | manual unmarshal | go-bitfield | nibble marshal | manual marshal |
|---|---|---|---|---|---|
| 100 | 4.7 | 187.5 | 0.45 | 4.2 | 95.7 |
| 1K | 4.1 | 85.5 | 0.48 | 6.1 | 173.9 |
| 10K | 5.5 | 191.7 | 0.58 | 6.4 | 167.7 |
| 100K | 5.4 | 174.3 | 0.59 | 6.4 | 170.2 |
| 1M | 5.5 | 149.2 | 0.55 | 4.4 | 28.6 |
Heap allocations per single operation — measured with testing.AllocsPerRun(1000, …)
nibble (2 allocs/op): Each call allocates two small objects on the heap — one for the parsed struct layout and one for the byte slice result. At 5 M pkt/s this is ~10 M allocs/s, increasing GC frequency. Target: 0 allocs/op via object pooling in a future release.
manual (0 allocs/op): Pure stack arithmetic — no heap involvement. ManualMarshalInto writes directly into a caller-supplied buffer. Zero GC pressure regardless of throughput.
After 10 × 1M-packet batches the post-GC live heap stays at a constant 305 MiB — confirming nibble allocations are transient and the GC reclaims them fully.
Performance isn't everything — correctness and developer safety matter more in most codebases
| Scenario | nibble | manual | go-bitfield |
|---|---|---|---|
| Truncated packet input | ✅ ErrInsufficientData | ❌ index panic | ❌ index panic |
| Field overflow (value > bit-width max) | ✅ ErrFieldOverflow | ❌ silent truncation | ❌ silent truncation |
| Protocol format change | ✅ 1-line struct edit | ❌ risky bit-math refactor | ✅ 1-line struct edit |
| Code readability | ✅ Declarative struct tags | ❌ Opaque bit arithmetic | ⚠️ Verbose (no Marshal) |
| Marshal support | ✅ Yes | ✅ Yes | ❌ No |
| Explain() debug tool | ✅ Yes — byte/bit breakdown | ❌ No | ❌ No |
| Validate() before marshal | ✅ Yes | ❌ Manual | ❌ No |
| Diff() struct comparison | ✅ Yes — field-level diff | ❌ No | ❌ No |
| Signed integer support | ✅ With sign extension | ✅ Manual sign extension | ⚠️ Only unsigned is safe |
| Bool field support | ✅ Native bool | ✅ Manual comparison | ❌ Must use uint8 |
Schema caching eliminated repeated reflection on every call
A practical guide to choosing the right approach
All numbers used in this report
Unmarshal benchmarks
| Benchmark | ns/op (loop) | ns/pkt | MB/s | allocs/op | B/op |
|---|---|---|---|---|---|
| Nibble/Tiny_100 | 22,460 | 212.7 | 35.6 | 200 | 1700 |
| Nibble/Small_1K | 244,200 | 244.2 | 32.7 | 2000 | 17000 |
| Nibble/Medium_10K | 1,825,000 | 182.5 | 43.8 | 20000 | 170000 |
| Nibble/Large_100K | 18,490,000 | 184.9 | 43.3 | 200000 | 1700000 |
| Nibble/XLarge_1M | 181,700,000 | 181.7 | 44.0 | 2000000 | 17000000 |
| Manual/Tiny_100 | 530 | 5.3 | 1270 | 0 | 0 |
| Manual/Small_1K | 11,700 | 11.7 | 546 | 0 | 0 |
| Manual/Medium_10K | 52,000 | 5.2 | 1231 | 0 | 0 |
| Manual/Large_100K | 570,000 | 5.7 | 1122 | 0 | 0 |
| Manual/XLarge_1M | 6,700,000 | 6.7 | 954 | 0 | 0 |
| GoBitfield/Tiny_100 | 223,500 | 2235.0 | 3.6 | 100 | 800 |
| GoBitfield/Small_1K | 2,065,500 | 2065.5 | 3.9 | 1000 | 8000 |
| GoBitfield/Medium_10K | 17,256,000 | 1725.6 | 4.6 | 10000 | 80000 |
| GoBitfield/Large_100K | 170,510,000 | 1705.1 | 4.7 | 100000 | 800000 |
| GoBitfield/XLarge_1M | 1,824,700,000 | 1824.7 | 4.4 | 1000000 | 8000000 |
Marshal benchmarks
| Benchmark | ns/op (loop) | ns/pkt | MB/s | allocs/op | B/op |
|---|---|---|---|---|---|
| Nibble/Tiny_100 | 23,790 | 237.9 | 33.6 | 200 | 1600 |
| Nibble/Small_1K | 162,600 | 162.6 | 49.2 | 2000 | 16000 |
| Nibble/Medium_10K | 1,572,000 | 157.2 | 50.9 | 20000 | 160000 |
| Nibble/Large_100K | 15,570,000 | 155.7 | 51.4 | 200000 | 1600000 |
| Nibble/XLarge_1M | 225,800,000 | 225.8 | 35.4 | 2000000 | 16000000 |
| Manual/Tiny_100 | 1,040 | 10.4 | 770 | 0 | 0 |
| Manual/Small_1K | 5,800 | 5.8 | 1379 | 0 | 0 |
| Manual/Medium_10K | 60,000 | 6.0 | 1333 | 0 | 0 |
| Manual/Large_100K | 590,000 | 5.9 | 1356 | 0 | 0 |
| Manual/XLarge_1M | 35,000,000 | 35.0 | 229 | 0 | 0 |
Reproducing these results:

```shell
git clone https://github.com/PavanKumarMS/nibble-benchmark
cd nibble-benchmark
go mod tidy
go test -bench=. -benchmem -benchtime=10s -count=3 ./...
go run cmd/runner/main.go --full --open
```