AI · Models
The right model is the one that pays for itself.
Open-weight vs hosted, fine-tuning vs RAG vs prompt engineering, evaluation benchmarks tailored to your workflows, and cost/latency engineering.
Overview
Most production systems combine prompting, retrieval, and fine-tuning. We pick the dominant strategy on evidence, then layer the others where they pay rent.
What it is
Choosing the model that earns its keep.
"Picking a model" is a stand-in for several different decisions: do you need a hosted frontier model or an open-weight one you can run yourself; do you need to teach the model new behavior or just retrieve facts at query time; can you live with prompt engineering or do you actually need a fine-tune.
We help you answer those questions with measurement, not opinion — and design the system so the answer can change cleanly when models, prices, or terms do.
Workflow
Fine-tune vs RAG vs prompt — a decision tree.
- If the task is knowledge-bound (needs facts not in the model): with static knowledge, fine-tune; with dynamic knowledge, use RAG.
- If the task is not knowledge-bound but has behavioral or format constraints: with a repeatable pattern, fine-tune a small model with a low-rank adapter; otherwise use prompt engineering.
- If neither knowledge-bound nor strongly constrained, prompt engineering is sufficient.
- Most production systems combine all three approaches; the tree picks the dominant strategy.
Deliverables
What you walk away with.
- Model shortlist with measured quality, latency, and unit cost on your own evaluation set.
- Decision memo: prompt vs RAG vs fine-tune for each task, with the reasoning written down.
- Fine-tuning plan when warranted: dataset spec, base model, adapter strategy, and eval thresholds.
- Cost and latency model for the chosen approach, including p50/p99 and headroom for growth.
- Operational plan: model upgrades, deprecations, and a rollback path that does not require an outage.
Pitfalls
How we don't do it.
- Defaulting to the largest hosted model for every task and calling cost "an optimization for later".
- Fine-tuning to fix a problem that better retrieval would have solved for a tenth of the cost.
- Picking a model from a leaderboard instead of an evaluation on your own task.
- Locking the architecture to a single provider with no plan for the day pricing or terms change.
Engagement
How we work with you.
-
01
Discover
Tasks, success criteria, latency budget, and the cost ceiling that matters.
-
02
Evaluate
Run candidates on your own data; score quality, latency, and cost together.
-
03
Decide
Prompt, RAG, fine-tune — or a combination — chosen on evidence, written down.
-
04
Operate
Upgrade cadence, regression eval, and a rollback path you have actually rehearsed.
Choose the model on evidence.
Tell us the task and your constraints. We'll evaluate candidates on your own data and recommend the combination that pays its way.
Related