
NLP and CV When General Models Exist

5 min read

ML Eng

Use foundation models for 80%. Build custom for the 20% that matters.

AI Research

Research pushes the boundary. Production uses what works.


TL;DR

  • Foundation models handle most NLP and CV tasks out of the box. Or with light fine-tuning.
  • Specialists add value at the edges: domain, scale, latency, constraints, and quality bar.
  • Your job shifts from "build the model" to "integrate, adapt, and fix what breaks."

You specialized in NLP or computer vision. Now there are models that do translation, summarization, object detection, and image generation without you training anything. It's disorienting. It's also an opportunity: the 80% is commoditized. The 20% — the hard cases — still need you.

Where Foundation Models Win

NLP:

  • Translation, summarization, Q&A, classification, extraction — GPT, Claude, Llama, etc. handle these well.
  • For many products, an API call is enough. Or a local deployment of an open model.
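
For a sense of how little glue code the standard cases need, here is a minimal sketch of two of those tasks via the Hugging Face transformers pipeline API. The model names, labels, and example text are illustrative choices, not recommendations.

```python
# Minimal sketch: standard NLP tasks with off-the-shelf models, no training.
# Assumes the `transformers` library is installed; model names are illustrative.
from transformers import pipeline

# Zero-shot classification: supply candidate labels, get a ranked prediction.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
ticket = "My card was charged twice for the same order."
result = classifier(ticket, candidate_labels=["billing", "shipping", "account access"])
print(result["labels"][0], result["scores"][0])  # top label and its confidence

# Summarization follows the same pattern: a task string and a model name.
article = (
    "Support tickets doubled after the pricing change. Most concern duplicate "
    "charges, a smaller share ask about refund timelines, and a handful report "
    "login failures after the forced password reset."
)
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
print(summarizer(article, max_length=30, min_length=10)[0]["summary_text"])
```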

CV:

  • Object detection, segmentation, image classification — YOLO, Segment Anything, ViT-based models. Off-the-shelf (sketch below).
  • Image generation — DALL-E, Midjourney, Stable Diffusion. Available.
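
The CV story is similar: pretrained weights plus a few lines of glue. Here is a sketch using a pretrained torchvision detector; the image path is a placeholder and the score threshold is arbitrary.

```python
# Minimal sketch: off-the-shelf object detection with pretrained torchvision weights.
# The image path is a placeholder; the 0.8 score threshold is arbitrary.
import torch
from torchvision.io import read_image
from torchvision.models.detection import (
    FasterRCNN_ResNet50_FPN_Weights,
    fasterrcnn_resnet50_fpn,
)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
preprocess = weights.transforms()

img = read_image("warehouse_cam.jpg")            # placeholder image path
with torch.no_grad():
    pred = model([preprocess(img)])[0]           # dict with boxes, labels, scores

for label_idx, score in zip(pred["labels"], pred["scores"]):
    if score > 0.8:
        print(weights.meta["categories"][label_idx], float(score))
```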

The pattern:

  • Standard task + general domain = use a foundation model.
  • Don't reinvent. Integrate.

Where Specialists Still Matter

Domain-specific:

  • Legal, medical, scientific — Terminology, accuracy requirements, regulatory constraints. General models make mistakes. You adapt.
  • Fine-tune. Use retrieval. Add validation layers.
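
One common shape for that adaptation is retrieval plus a validation layer: ground the general model in your own domain documents, then check the output before it ships. A rough sketch; the retriever, the model client, and the banned-phrase check are all stand-ins for whatever your domain actually requires.

```python
# Rough sketch: retrieval + a validation layer around a general model.
# `retrieve` and `generate` are hypothetical stand-ins for your domain retriever
# and LLM client; the banned-phrase check is deliberately simplistic.
from typing import Callable, List

def answer_with_domain_context(
    question: str,
    retrieve: Callable[[str], List[str]],   # your domain retriever (stand-in)
    generate: Callable[[str], str],         # your LLM client (stand-in)
) -> str:
    # 1. Retrieval: ground the model in domain documents it wasn't trained on.
    passages = retrieve(question)
    prompt = (
        "Answer using only the context below.\n\n"
        + "\n\n".join(passages)
        + f"\n\nQuestion: {question}"
    )
    # 2. Generation: the general model still does the language work.
    draft = generate(prompt)
    # 3. Validation layer: a crude, illustrative output check before shipping.
    banned = {"guaranteed cure", "no known side effects"}
    if any(phrase in draft.lower() for phrase in banned):
        raise ValueError("Draft failed domain validation; route to human review.")
    return draft
```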

Scale and latency:

  • Millions of requests. Sub-100ms. On-device. Foundation models might not fit.
  • Distillation, smaller models, custom architectures. You optimize.
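
If distillation is the route, the core of it is small: train a compact student to match a large teacher's softened outputs while still learning from labels. A sketch of the standard loss, assuming PyTorch; the temperature and weighting are illustrative hyperparameters.

```python
# Sketch of a standard knowledge-distillation loss in PyTorch.
# `teacher_logits` come from the large (frozen) model, `student_logits` from the
# small model you actually deploy; T and alpha are illustrative hyperparameters.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: match the teacher's softened distribution (KL divergence).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: still learn directly from the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```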

Edge cases:

  • Ambiguity. Rare classes. Adversarial inputs. Production finds the cracks. You fix them.
  • Evaluation design. Red-teaming. Guardrails.
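
Finding those cracks is an evaluation problem before it is a modeling problem: keep a labeled set of the inputs that hurt, and score per slice rather than in aggregate. A sketch; the data shape and slice idea are illustrative, and `predict` stands in for your model call.

```python
# Sketch: per-slice evaluation, because aggregate accuracy hides edge-case failures.
# The (text, label, slice) shape is illustrative; `predict` stands in for your model.
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

def evaluate_by_slice(
    examples: List[Tuple[str, str, str]],   # (input_text, true_label, slice_name)
    predict: Callable[[str], str],
) -> Dict[str, float]:
    correct, total = defaultdict(int), defaultdict(int)
    for text, label, slice_name in examples:
        total[slice_name] += 1
        correct[slice_name] += int(predict(text) == label)
    # Accuracy per slice: rare classes, ambiguous inputs, adversarial phrasings
    # each get their own number instead of disappearing into the average.
    return {name: correct[name] / total[name] for name in total}
```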

Multimodal and novel setups:

  • Video + text. Sensor fusion. Real-time. Research territory. You push the boundary.

The New Playbook

  1. Start with the foundation — Can it do the task? Try it. Measure.
  2. Identify the gap — Where does it fail? Wrong domain? Too slow? Bad edge cases?
  3. Fix the gap — Fine-tune, add layers, switch model, add human-in-the-loop. Targeted intervention.
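
One concrete form of targeted intervention is confidence-based routing: let the foundation model handle what it is sure about and escalate the rest to a specialist model or a human. A sketch; the threshold and both callables are illustrative stand-ins.

```python
# Sketch: confidence-based routing as a targeted fix.
# `foundation` and `fallback` are stand-ins; the 0.9 threshold is arbitrary and
# should come from your own error analysis.
from typing import Callable, Tuple

def route(
    text: str,
    foundation: Callable[[str], Tuple[str, float]],  # returns (label, confidence)
    fallback: Callable[[str], str],                   # specialist model or human queue
    threshold: float = 0.9,
) -> str:
    label, confidence = foundation(text)
    if confidence >= threshold:
        return label          # the commoditized majority
    return fallback(text)     # the hard cases that still need you
```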

The old way: build a custom NLP/CV model for every task. Train, evaluate, deploy. Months per project. The playbook above inverts that: integrate first, then spend your time on the gaps that matter.

Quick Check

Foundation models handle most NLP tasks. Where do specialists still add value?

Do This Next

  1. List your current NLP/CV work — For each task: foundation model sufficient? If not, why? Document the gap.
  2. Run one task through a foundation model — Your data. Your metrics. What's the delta from your current solution? (Sketch below.)
  3. Build an "edge case" backlog — Where does the foundation model fail? That's your differentiation. Own it.
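
For step 2, the comparison itself can stay small: run both systems over the same labeled sample and report the delta on a metric you already track. A sketch, with accuracy standing in for that metric and both callables standing in for your models.

```python
# Sketch: same data, same metric, two systems; report the delta.
# Accuracy stands in for whatever metric you already report; both callables are
# stand-ins for your current model and the foundation-model baseline.
from typing import Callable, List, Tuple

def metric_delta(
    examples: List[Tuple[str, str]],       # (input_text, true_label)
    current: Callable[[str], str],         # your existing solution
    foundation: Callable[[str], str],      # the foundation-model baseline
) -> float:
    def accuracy(predict: Callable[[str], str]) -> float:
        return sum(predict(x) == y for x, y in examples) / len(examples)
    return accuracy(foundation) - accuracy(current)   # positive = foundation wins
```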