Technical Glossary

The terms behind every recommendation, in plain English.

VRAM (Video RAM)
Dedicated memory on a GPU. Loading an AI model means copying its weights into VRAM. If a model's weights don't fit, it can't run (or runs painfully slowly with offloading).
Parameter Count
The number of learned weights in a model, usually given in billions (B). A 70B model has 70 billion weights. More parameters generally mean more capability, and more VRAM.
Quantization (FP16, INT8, Q4)
Compression that stores each weight in fewer bits. FP16 uses 16 bits per weight; INT8 uses 8; Q4 uses 4. Q4 cuts VRAM by ~4× versus FP16 but slightly degrades quality.
VRAM Required Formula
VRAM ≈ (Parameters × Bits) / 8 bytes, plus a 20% overhead buffer for activations, KV cache, and CUDA workspace.
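A minimal sketch of the formula in Python (assuming GB means 10^9 bytes and reusing the same 20% overhead buffer; real usage also varies with context length and runtime):

```python
def vram_required_gb(params_billions: float, bits_per_weight: float) -> float:
    """Estimate VRAM (GB) needed to load a model, per the formula above."""
    weight_gb = params_billions * 1e9 * bits_per_weight / 8 / 1e9  # bits -> GB
    return weight_gb * 1.20  # +20% for activations, KV cache, CUDA workspace

print(vram_required_gb(70, 16))  # 70B at FP16 -> ~168 GB
print(vram_required_gb(70, 4))   # 70B at Q4   -> ~42 GB
```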
Memory Bandwidth (GB/s)
How fast the GPU can read weights from VRAM. The single biggest predictor of LLM token generation speed once a model fits.
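The reason bandwidth dominates: at batch size 1, generating each token requires streaming essentially all of the model's weights out of VRAM once, so bandwidth divided by model size gives a rough ceiling on tokens per second. A back-of-envelope sketch (ignores compute limits, caching, and batching, so real throughput is lower):

```python
def tokens_per_sec_ceiling(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on single-stream generation speed:
    each new token reads all weights from VRAM once."""
    return bandwidth_gb_s / model_size_gb

print(tokens_per_sec_ceiling(1000, 42))  # ~24 tok/s for 42 GB of weights at 1 TB/s
```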
Future-Proof Score (FPS)
Our proprietary value metric: (VRAM × Bandwidth ÷ Price) × 0.08 × (1 − Obsolescence Risk). Normalized to 0–100. Higher = better long-term value.
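A sketch of the score as written, with assumptions labeled: units are taken to be GB, GB/s, and USD, and "normalized to 0–100" is read here as a simple clamp, since the exact normalization is proprietary.

```python
def future_proof_score(vram_gb: float, bandwidth_gb_s: float,
                       price_usd: float, obsolescence_risk: float) -> float:
    """Future-Proof Score per the formula above; obsolescence_risk in [0, 1].
    Clamping to 0-100 is an assumed reading of 'normalized'."""
    raw = (vram_gb * bandwidth_gb_s / price_usd) * 0.08 * (1 - obsolescence_risk)
    return max(0.0, min(100.0, raw))

# Hypothetical card: 24 GB, 1000 GB/s, $1600, 20% obsolescence risk
print(future_proof_score(24, 1000, 1600, 0.20))  # 0.96 with these inputs
```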
BUY / RENT / WAIT
BUY = strong fit and high FPS. RENT = doesn't fit, tight fit, or only moderate value (use cloud GPUs first). WAIT = fits but poor value or flagged data; a better option is likely coming.
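The labels reduce to a simple decision rule. One illustrative encoding follows; the fit categories and FPS thresholds are hypothetical, not the site's actual cutoffs.

```python
def recommend(fit: str, fps: float, flagged: bool,
              high_fps: float = 70.0, low_fps: float = 40.0) -> str:
    """BUY / RENT / WAIT logic as described above.
    fit: 'strong', 'tight', or 'none'. Thresholds are illustrative."""
    if fit in ("none", "tight"):
        return "RENT"  # doesn't fit or tight fit: use cloud GPUs first
    if flagged or fps < low_fps:
        return "WAIT"  # fits, but poor value or flagged data
    if fps >= high_fps:
        return "BUY"   # strong fit and high FPS
    return "RENT"      # fits, but only moderate value

print(recommend("strong", fps=85.0, flagged=False))  # BUY
```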