
AI · Models

The right model is the one that pays for itself.

Open-weight vs hosted, fine-tuning vs RAG vs prompt engineering, evaluation benchmarks tailored to your workflows, and cost/latency engineering.

Overview

Most production systems combine prompting, retrieval, and fine-tuning. We pick the dominant strategy on evidence, then layer the others where they pay rent.

What it is

Choosing the model that earns its keep.

"Picking a model" is a stand-in for several different decisions: do you need a hosted frontier model or an open-weight one you can run yourself; do you need to teach the model new behavior or just retrieve facts at query time; can you live with prompt engineering or do you actually need a fine-tune.

We help you answer those questions with measurement, not opinion — and design the system so the answer can change cleanly when models, prices, or terms do.

Workflow

Fine-tune vs RAG vs prompt — a decision tree.

Figure: Fine-tune vs RAG vs prompt-engineering decision tree. The tree starts from whether the task is knowledge-bound (needs facts not in the model), branches on static vs dynamic knowledge, then on behavioral or format constraints and pattern repeatability, and terminates in a fine-tune, RAG, or prompt-engineering recommendation. Caption: The right model is the one that pays for itself in production. Most systems combine all three approaches.
  1. If the task is knowledge-bound (needs facts not in the model): with static knowledge, fine-tune; with dynamic knowledge, use RAG.
  2. If the task is not knowledge-bound but has behavioral or format constraints: with a repeatable pattern, fine-tune a small model with a low-rank adapter; otherwise use prompt engineering.
  3. If neither knowledge-bound nor strongly constrained, prompt engineering is sufficient.
  4. Most production systems combine all three approaches; the tree picks the dominant strategy.
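The tree above can be written down as a small function, which makes the decision reviewable and testable. This is a minimal sketch, not a definitive implementation: the names `Task`, `Strategy`, and `dominant_strategy`, and the four boolean fields, are hypothetical and simply mirror the four questions in the tree.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    """The four leaves of the decision tree."""
    FINE_TUNE = "fine-tune"
    RAG = "RAG"
    FINE_TUNE_LORA = "fine-tune (small model + low-rank adapter)"
    PROMPT = "prompt engineering"


@dataclass(frozen=True)
class Task:
    # Hypothetical field names; each one answers a question in the tree.
    knowledge_bound: bool            # needs facts that are not in the model
    static_knowledge: bool = False   # that knowledge rarely changes
    constrained: bool = False        # behavioral or format constraints apply
    repeatable: bool = False         # the pattern repeats across inputs


def dominant_strategy(task: Task) -> Strategy:
    """Return the dominant strategy only; real systems often layer the others."""
    if task.knowledge_bound:
        # Static knowledge goes to fine-tuning, dynamic knowledge to RAG.
        return Strategy.FINE_TUNE if task.static_knowledge else Strategy.RAG
    if task.constrained and task.repeatable:
        # Constrained and repeatable: small model with a low-rank adapter.
        return Strategy.FINE_TUNE_LORA
    # Unconstrained, or constrained without a repeatable pattern.
    return Strategy.PROMPT
```

For example, `dominant_strategy(Task(knowledge_bound=True))` returns `Strategy.RAG`, because the knowledge is treated as dynamic unless `static_knowledge` is set.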

Deliverables

What you walk away with.

Pitfalls

How we don't do it.

Engagement

How we work with you.

  1. Discover

    Tasks, success criteria, latency budget, and the cost ceiling that matters.

  2. Evaluate

    Run candidates on your own data; score quality, latency, and cost together.

  3. Decide

    Prompt, RAG, fine-tune, or a combination: chosen on evidence, written down.

  4. Operate

    Upgrade cadence, regression eval, and a rollback path you have actually rehearsed.
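One way to score quality, latency, and cost together in the Evaluate step is to treat the latency budget and cost ceiling from Discover as hard limits, then rank the surviving candidates by quality. The sketch below assumes that approach; `RunResult`, `Summary`, `summarize`, and `rank` are hypothetical names, and the nearest-rank p95 is one choice of latency statistic among several.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RunResult:
    correct: bool      # did the output meet the task's success criterion
    latency_s: float   # wall-clock seconds for this request
    cost_usd: float    # cost of this request


@dataclass(frozen=True)
class Summary:
    name: str
    quality: float               # fraction of runs judged correct
    p95_latency_s: float         # nearest-rank 95th percentile latency
    cost_per_request_usd: float  # mean cost per request


def summarize(name: str, runs: list[RunResult]) -> Summary:
    """Collapse one candidate's runs on your own data into three numbers."""
    if not runs:
        raise ValueError("need at least one run per candidate")
    latencies = sorted(r.latency_s for r in runs)
    # Nearest-rank percentile: smallest value with at least 95% of runs at or below it.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return Summary(
        name=name,
        quality=mean(1.0 if r.correct else 0.0 for r in runs),
        p95_latency_s=latencies[p95_index],
        cost_per_request_usd=mean(r.cost_usd for r in runs),
    )


def rank(summaries: list[Summary], latency_budget_s: float,
         cost_ceiling_usd: float) -> list[Summary]:
    """Drop candidates over budget, then sort by quality (ties: cheaper first)."""
    eligible = [
        s for s in summaries
        if s.p95_latency_s <= latency_budget_s
        and s.cost_per_request_usd <= cost_ceiling_usd
    ]
    return sorted(eligible, key=lambda s: (-s.quality, s.cost_per_request_usd))
```

Treating budget and ceiling as filters rather than folding everything into one weighted score keeps the result easy to explain: a candidate is either within constraints or it is not, and quality decides among those that are.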

Choose the model on evidence.

Tell us the task and your constraints. We'll evaluate candidates on your own data and recommend the combination that pays its way.
