Technical Glossary
The terms behind every recommendation, in plain English.
- VRAM (Video RAM)
- Dedicated memory on a GPU. Loading an AI model means copying its weights into VRAM. If a model's weights don't fit, it can't run (or runs painfully slowly with offloading).
- Parameter Count
- The number of learned weights in a model, usually given in billions (B). A 70B model has 70 billion weights. More parameters generally mean more capability, and more VRAM.
- Quantization (FP16, INT8, Q4)
- Compression that stores each weight in fewer bits. FP16 uses 16 bits per weight; Q4 uses 4. Q4 cuts VRAM by ~4x but slightly degrades quality.
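The bit widths above translate directly into bytes per weight. A minimal sketch of the arithmetic (the format names and bit counts come from the entry above; nothing else is assumed):

```python
# Bits per weight for each storage format named above
BITS_PER_WEIGHT = {"FP16": 16, "INT8": 8, "Q4": 4}

def bytes_per_weight(fmt: str) -> float:
    """Convert a format's bit width to bytes per stored weight."""
    return BITS_PER_WEIGHT[fmt] / 8

# Going from FP16 to Q4 shrinks weight storage by 16 / 4 = 4x
shrink_factor = BITS_PER_WEIGHT["FP16"] / BITS_PER_WEIGHT["Q4"]
print(shrink_factor)  # 4.0
```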
- VRAM Required Formula
- VRAM ≈ (Parameters × Bits) / 8 bytes, plus a 20% overhead buffer for activations, KV cache, and CUDA workspace.
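The formula above can be turned into a small calculator. This is a sketch that follows the formula literally (bits to bytes, then a 20% buffer); the function name and the decimal-gigabyte convention are my choices, not the guide's:

```python
def vram_required_gb(params_billion: float, bits: int, overhead: float = 0.20) -> float:
    """Estimate VRAM needed to load a model, per the glossary formula.

    params_billion: parameter count in billions (70 for a 70B model)
    bits: bits per weight (16 for FP16, 8 for INT8, 4 for Q4)
    overhead: buffer for activations, KV cache, and CUDA workspace
    """
    weight_bytes = params_billion * 1e9 * bits / 8   # bits -> bytes
    return weight_bytes * (1 + overhead) / 1e9       # bytes -> GB

# 70B at Q4: 70e9 weights * 0.5 bytes = 35 GB, plus 20% = 42 GB
print(round(vram_required_gb(70, 4), 1))  # 42.0
```

So a 70B model that needs 42 GB at Q4 will not fit on a single 24 GB card, which is exactly the fit check the recommendations are built on.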
- Memory Bandwidth (GB/s)
- How fast the GPU can read weights from VRAM. The single biggest predictor of LLM token generation speed once a model fits.
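The reason bandwidth predicts speed: generating one token requires reading roughly every weight from VRAM once, so bandwidth divided by model size gives a rough upper bound on tokens per second. A sketch under that assumption (the ~1008 GB/s figure is the commonly quoted RTX 4090 spec, used here only as an illustration):

```python
def est_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on generation speed: each token reads ~all
    weights from VRAM once. Real-world throughput is lower."""
    return bandwidth_gb_s / model_size_gb

# ~1008 GB/s over a 35 GB Q4 70B model -> ~28.8 tokens/s at best
print(round(est_tokens_per_sec(1008, 35), 1))  # 28.8
```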
- Future-Proof Score (FPS)
- Our proprietary value metric: (VRAM × Bandwidth ÷ Price) × 0.08 × (1 − Obsolescence Risk). Normalized to 0–100. Higher = better long-term value.
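The formula above, written out as code. The exact normalization to 0–100 isn't specified here, so this sketch just clamps the raw value; the input units (GB, GB/s, USD, risk as a 0–1 fraction) are assumptions:

```python
def future_proof_score(vram_gb: float, bandwidth_gb_s: float,
                       price_usd: float, obsolescence_risk: float) -> float:
    """Future-Proof Score per the glossary formula, clamped to 0-100.
    obsolescence_risk is a fraction in [0, 1]."""
    raw = (vram_gb * bandwidth_gb_s / price_usd) * 0.08 * (1 - obsolescence_risk)
    return max(0.0, min(100.0, raw))
```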
- BUY / RENT / WAIT
- BUY = strong fit and high FPS. RENT = doesn't fit, tight fit, or only moderate value (use cloud GPUs first). WAIT = fits but poor value or flagged data, meaning a better option is likely coming.
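The three verdicts above form a simple decision rule. A sketch of that rule; the numeric thresholds (FPS ≥ 70 as "high", FPS &lt; 40 as "poor", a 2 GB margin as a comfortable fit) are illustrative assumptions, not the site's actual cutoffs:

```python
def recommend(fits: bool, fit_margin_gb: float, fps: float, flagged: bool) -> str:
    """BUY / RENT / WAIT per the glossary definitions, with
    hypothetical thresholds standing in for the real ones."""
    if not fits or fit_margin_gb < 2.0:
        return "RENT"   # doesn't fit or tight fit: use cloud GPUs first
    if flagged or fps < 40:
        return "WAIT"   # fits, but poor value or flagged data
    if fps < 70:
        return "RENT"   # fits, but only moderate value
    return "BUY"        # strong fit and high FPS
```

Example: a card where the model fits with 8 GB to spare and FPS is 85 returns "BUY"; the same card with flagged pricing data returns "WAIT".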