Fine-Tuning vs. Training in the Foundation Model Era
TL;DR
- Most problems: prompt or fine-tune. Training from scratch is for edge cases.
- Fine-tuning is faster and cheaper than training from scratch. You need enough quality data and a clear task.
- Training from scratch: when the foundation doesn't fit (domain, modality, constraints) and you have the resources.
The old playbook was "collect data, train a model." The new one is "try the foundation model first, then fine-tune if needed, train only if necessary." Most teams never get past step 2.
The Decision Tree
Prompt first:
- Can you get 80% there with a good prompt? Try it. No training. No infra. Iterate fast.
- Use for: classification, extraction, summarization, Q&A. Often enough.
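What prompt-first looks like in code: a minimal sketch, assuming the OpenAI Python SDK and a hypothetical support-ticket task with made-up labels. Any chat-completions provider works the same way.

```python
# Prompt-only classification: no training, no infra.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["billing", "bug", "feature_request", "other"]  # hypothetical label set

def classify(ticket: str) -> str:
    """Classify a support ticket with a single prompt call."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": f"Classify the ticket as one of: {', '.join(LABELS)}. "
                        "Reply with the label only."},
            {"role": "user", "content": ticket},
        ],
        temperature=0,  # keep labels stable across calls
    )
    return response.choices[0].message.content.strip()

print(classify("I was charged twice this month."))  # expected: billing
```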
Fine-tune when:
- Prompting isn't good enough. You have 100s–1000s of labeled examples. The task is consistent.
- Fine-tuning adapts the foundation to your domain. Fewer examples than training from scratch. Faster iteration.
Train from scratch when:
- No foundation model fits (exotic modality, strict latency, on-device, regulatory).
- You have large-scale data and the budget for compute and expertise.
- Rare. Most teams shouldn't start here. Research still does novel architectures; production usually doesn't.
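The tree is small enough to encode directly. A sketch; the thresholds are illustrative, not prescriptive:

```python
def choose_approach(prompt_accuracy: float,
                    labeled_examples: int,
                    foundation_model_fits: bool) -> str:
    """The decision tree above as code. Thresholds are illustrative."""
    if prompt_accuracy >= 0.80:                  # "80% there with a good prompt"
        return "prompt-only"
    if not foundation_model_fits:
        return "train from scratch"              # exotic modality, latency, on-device
    if labeled_examples >= 100:
        return "fine-tune"                       # 100s-1000s of consistent examples
    return "collect more data, keep prompting"

print(choose_approach(0.72, 500, True))          # -> fine-tune
```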
Fine-Tuning: What You Need
Data:
- Quality > quantity. 500 good examples beat 50K noisy ones.
- Format: input-output pairs. Consistent task definition.
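Concretely, "input-output pairs" for a chat-style fine-tune usually means JSONL records like the ones this sketch writes. The field names follow the common messages convention; check your provider's exact spec.

```python
# Input-output pairs as JSONL, one record per line.
import json

examples = [  # hypothetical labeled pairs: (ticket text, label)
    ("I was charged twice this month.", "billing"),
    ("The export button crashes the app.", "bug"),
]

with open("train.jsonl", "w") as f:
    for text, label in examples:
        record = {"messages": [
            {"role": "system", "content": "Classify the ticket."},
            {"role": "user", "content": text},
            {"role": "assistant", "content": label},  # the target output
        ]}
        f.write(json.dumps(record) + "\n")
```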
Compute:
- LoRA, QLoRA reduce cost (sketch after this list). Fine-tuning is cheaper than training from scratch. Still not free.
- Cloud or on-prem. You provision.
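A minimal LoRA sketch with Hugging Face transformers and peft, assuming a small open-weights base and data in the train.jsonl shape above. The base model, label count, and hyperparameters are placeholders; QLoRA is the same idea with a 4-bit quantized base.

```python
# Minimal LoRA fine-tune with Hugging Face transformers + peft.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model

base = "distilbert-base-uncased"          # assumption: any small open-weights base
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=4)

# LoRA trains small low-rank adapters instead of all the weights.
lora = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_lin", "v_lin"],    # attention projections; names are model-specific
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()        # typically ~1% of parameters are trainable

args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-4)
# trainer = Trainer(model=model, args=args,
#                   train_dataset=...,    # your tokenized train.jsonl examples
#                   eval_dataset=...)     # and a holdout split
# trainer.train()
```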
Evaluation:
- How do you know it worked? Holdout set, production metrics. Define before you tune.
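The comparison that matters is fine-tuned vs. prompt-only on the same holdout set. A sketch, reusing the `classify` function from the prompt-first example; `predict_finetuned` is a hypothetical wrapper around the tuned model.

```python
# Compare fine-tuned vs. prompt-only on the same holdout before shipping.
from sklearn.metrics import accuracy_score

holdout = [  # hypothetical (text, label) pairs never seen in training
    ("Refund hasn't arrived.", "billing"),
    ("App freezes on login.", "bug"),
]
texts, labels = zip(*holdout)

prompt_preds = [classify(t) for t in texts]                # prompt-only baseline
# finetuned_preds = [predict_finetuned(t) for t in texts]  # hypothetical predictor

print("prompt-only accuracy:", accuracy_score(labels, prompt_preds))
# print("fine-tuned accuracy:", accuracy_score(labels, finetuned_preds))
# Ship the fine-tune only if the lift justifies the extra ops burden.
```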
The Pitfalls
- Overfitting — Small data + aggressive tuning = model that memorizes. Use validation and early stopping (sketch after this list).
- Catastrophic forgetting — Fine-tuning can hurt general ability. Monitor both task performance and base capabilities.
- Vendor lock-in — Fine-tuning via API ties you to a provider. Fine-tune open weights if portability matters.
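For the first pitfall, transformers has early stopping built in. A sketch, reusing the Trainer setup from the LoRA example; for forgetting, also keep a small fixed suite of general prompts and re-run it after every tune.

```python
# Early stopping + best-checkpoint rollback with transformers' Trainer.
from transformers import EarlyStoppingCallback, TrainingArguments

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=10,                  # upper bound; early stopping cuts it short
    eval_strategy="epoch",                # "evaluation_strategy" in older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,          # roll back to the best checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
# Stop after 2 evaluations with no improvement on the validation set.
callbacks = [EarlyStoppingCallback(early_stopping_patience=2)]
# trainer = Trainer(model=model, args=args, callbacks=callbacks, ...)
```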
The old workflow: collect data, train from scratch, months of compute, one model per task. With foundation models the workflow inverts: start from a pretrained model, adapt it (often in hours, not months), and reuse one base across many tasks.
Quick Check
You need a model for a domain-specific classification task. What's the right order?
Answer: prompt first. Fine-tune if prompting falls short and you have hundreds of consistent labeled examples. Train from scratch only if no foundation model fits (it almost certainly does).
Do This Next
- Map your use cases — For each: prompt-only, fine-tune, or train? Document the decision and the threshold.
- Run one fine-tuning experiment — Pick a task. Get 100+ examples. Fine-tune a small model. Compare to prompt-only. Document the lift.
- Define your "train from scratch" criteria — When would you? List the conditions. Makes the default (don't) explicit.