Controlled variable: only the diffusion-model precision changes
(fp8_scaled → int8 ConvRot). Same model, same er_sde sampler, same 8 steps, seed,
prompt, and resolution — so the difference is attributable to precision alone.
1.92× faster
int8 renders the same prompt and seed in 48% less time than fp8 on this RTX 3090
+97% throughput (it/s)
7.1 s saved / image
+0.3 GB VRAM vs fp8
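The headline figures above are related by simple arithmetic, which the following minimal sketch makes explicit. It assumes the 1.92× speedup and the 7.1 s saving both refer to end-to-end seconds per image; the function names are illustrative, and the per-image times it prints are implied by those two numbers, not separately reported measurements.

```python
def time_reduction(speedup: float) -> float:
    """Fraction of wall time saved when a job runs `speedup` times faster."""
    return 1.0 - 1.0 / speedup


def implied_times(speedup: float, seconds_saved: float) -> tuple[float, float]:
    """Solve t_slow / t_fast = speedup and t_slow - t_fast = seconds_saved."""
    t_fast = seconds_saved / (speedup - 1.0)
    t_slow = t_fast * speedup
    return t_slow, t_fast


# Inputs taken from the summary: 1.92x faster, 7.1 s saved per image.
reduction = time_reduction(1.92)
fp8_s, int8_s = implied_times(1.92, 7.1)

print(f"time reduction: {reduction:.1%}")
print(f"implied fp8: {fp8_s:.1f} s/image, implied int8: {int8_s:.1f} s/image")
```

The +97% throughput figure is not reproduced here: it/s and end-to-end seconds need not give the same ratio, for example when fixed per-image overhead is counted in one and not the other.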
[Charts: throughput in it/s (higher = faster); seconds per image (lower = faster); mean peak VRAM; average power draw; per-run consistency (clean runs)]
Flat, separated lines = stable measurement; the gap between them is the speedup.
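One way to put numbers on "flat" and "separated" is sketched below, assuming per-run seconds-per-image are available as plain lists. The helper names and the sample values are hypothetical placeholders, not measurements from this benchmark: flatness is taken as a low coefficient of variation within each precision, and separation as non-overlapping ranges between the two.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples: list[float]) -> float:
    """Sample standard deviation divided by the mean; lower means a flatter line."""
    return stdev(samples) / mean(samples)


def lines_separated(a: list[float], b: list[float]) -> bool:
    """True when the two series never overlap, i.e. every run of one beats every run of the other."""
    return max(a) < min(b) or max(b) < min(a)


# Hypothetical placeholder timings in seconds per image (NOT data from this page).
slow_runs = [10.0, 10.2, 9.9, 10.1]
fast_runs = [5.0, 5.1, 4.9, 5.0]

cv_slow = coefficient_of_variation(slow_runs)
cv_fast = coefficient_of_variation(fast_runs)
separated = lines_separated(slow_runs, fast_runs)

print(f"CV slow: {cv_slow:.2%}, CV fast: {cv_fast:.2%}, separated: {separated}")
```

What counts as "flat enough" is a judgment call; a threshold of a few percent is a common rule of thumb, not something this page specifies.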
Same prompt · same seed (42) · fp8 vs int8
Identical seed on both sides: visual quality is comparable, so the practical difference is speed, not image quality.