Technical Glossary
The terms behind every recommendation, in plain English.
- VRAM (Video RAM)
- Dedicated memory on a GPU. Loading an AI model means copying its weights into VRAM. If a model's weights don't fit, it can't run (or runs painfully slowly with offloading).
- Parameter Count
- The number of learned weights in a model, usually given in billions (B). A 70B model has 70 billion weights. More parameters generally mean more capability, and more VRAM.
- Quantization (FP16, INT8, Q4)
- Compression that stores each weight in fewer bits. FP16 uses 16 bits per weight; Q4 uses 4. Q4 cuts VRAM by ~4x but slightly degrades quality.
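The bit widths above translate directly into bytes per weight. A minimal sketch of the arithmetic (the format names and bit counts come from the entry above; nothing else is assumed):

```python
# Bits per weight for each storage format named above
BITS_PER_WEIGHT = {"FP16": 16, "INT8": 8, "Q4": 4}

def bytes_per_weight(fmt: str) -> float:
    """Convert a format's bit width to bytes per stored weight."""
    return BITS_PER_WEIGHT[fmt] / 8

# Going from FP16 to Q4 shrinks weight storage by 16 / 4 = 4x
shrink_factor = BITS_PER_WEIGHT["FP16"] / BITS_PER_WEIGHT["Q4"]
print(shrink_factor)  # 4.0
```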
- VRAM Required Formula
- VRAM ≈ (Parameters × Bits) / 8 bytes, plus a 20% overhead buffer for activations, KV cache, and CUDA workspace.
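The formula above can be turned into a small calculator. This is a sketch that follows the formula literally (bits to bytes, then a 20% buffer); the function name and the decimal-gigabyte convention are my choices, not the guide's:

```python
def vram_required_gb(params_billion: float, bits: int, overhead: float = 0.20) -> float:
    """Estimate VRAM needed to load a model, per the glossary formula.

    params_billion: parameter count in billions (70 for a 70B model)
    bits: bits per weight (16 for FP16, 8 for INT8, 4 for Q4)
    overhead: buffer for activations, KV cache, and CUDA workspace
    """
    weight_bytes = params_billion * 1e9 * bits / 8   # bits -> bytes
    return weight_bytes * (1 + overhead) / 1e9       # bytes -> GB

# 70B at Q4: 70e9 weights * 0.5 bytes = 35 GB, plus 20% = 42 GB
print(round(vram_required_gb(70, 4), 1))  # 42.0
```

So a 70B model that needs 42 GB at Q4 will not fit on a single 24 GB card, which is exactly the fit check the recommendations are built on.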
- Memory Bandwidth (GB/s)
- How fast the GPU can read weights from VRAM. The single biggest predictor of LLM token generation speed once a model fits.
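The reason bandwidth predicts speed: generating one token requires reading roughly every weight from VRAM once, so bandwidth divided by model size gives a rough upper bound on tokens per second. A sketch under that assumption (the ~1008 GB/s figure is the commonly quoted RTX 4090 spec, used here only as an illustration):

```python
def est_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on generation speed: each token reads ~all
    weights from VRAM once. Real-world throughput is lower."""
    return bandwidth_gb_s / model_size_gb

# ~1008 GB/s over a 35 GB Q4 70B model -> ~28.8 tokens/s at best
print(round(est_tokens_per_sec(1008, 35), 1))  # 28.8
```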
- Future-Proof Score (FPS)
- Our proprietary value metric: (VRAM × Bandwidth ÷ Price) × 0.08 × (1 − Obsolescence Risk). Normalized to 0–100. Higher = better long-term value.
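The formula above, written out as code. The exact normalization to 0–100 isn't specified here, so this sketch just clamps the raw value; the input units (GB, GB/s, USD, risk as a 0–1 fraction) are assumptions:

```python
def future_proof_score(vram_gb: float, bandwidth_gb_s: float,
                       price_usd: float, obsolescence_risk: float) -> float:
    """Future-Proof Score per the glossary formula, clamped to 0-100.
    obsolescence_risk is a fraction in [0, 1]."""
    raw = (vram_gb * bandwidth_gb_s / price_usd) * 0.08 * (1 - obsolescence_risk)
    return max(0.0, min(100.0, raw))
```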
- BUY / RENT / WAIT
- BUY = strong fit and high FPS. RENT = doesn't fit, tight fit, or only moderate value (use cloud GPUs first). WAIT = fits but poor value or flagged data, meaning a better option is likely coming.
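The three verdicts above form a simple decision rule. A sketch of that rule; the numeric thresholds (FPS ≥ 70 as "high", FPS &lt; 40 as "poor", a 2 GB margin as a comfortable fit) are illustrative assumptions, not the site's actual cutoffs:

```python
def recommend(fits: bool, fit_margin_gb: float, fps: float, flagged: bool) -> str:
    """BUY / RENT / WAIT per the glossary definitions, with
    hypothetical thresholds standing in for the real ones."""
    if not fits or fit_margin_gb < 2.0:
        return "RENT"   # doesn't fit or tight fit: use cloud GPUs first
    if flagged or fps < 40:
        return "WAIT"   # fits, but poor value or flagged data
    if fps < 70:
        return "RENT"   # fits, but only moderate value
    return "BUY"        # strong fit and high FPS
```

Example: a card where the model fits with 8 GB to spare and FPS is 85 returns "BUY"; the same card with flagged pricing data returns "WAIT".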