Transcription speed and cost

The finished transcript is always produced on your machine, and speaker labels are always on-device and free. Your hardware changes how fast that happens, but the only piece that may not keep up on a lighter machine is the live caption while you record, and even there you have options (see below). This article covers how long the work takes and what kind of laptop you need, so you can decide whether Daisy is a fit before you commit.

All numbers below come from our own benchmark machines (June 2026). Your own hardware will land in the same ballpark, but treat them as approximate — speed scales with your CPU and GPU.

What "finalize" means

The live transcript you see during a call is a preview. When the call ends, Daisy keeps the text it already has and uses the time after the meeting to:

Sort out who said what.
Remove duplicates between Me and Them.
Write the summary and chapter table.
Save everything to your library and update the search index.

That post-call work is what people mean by "finalize." It runs once, in the background, after the meeting.

How fast transcription runs — by hardware

Turning speech into text is the heaviest job, and it scales with your machine. We benchmark three rough classes, measured as a realtime factor — "60×" means a 60-minute meeting transcribes in 1 minute:

Your setup	Speed	1-hour meeting transcribes in
Apple Silicon Mac, or a desktop with a dedicated GPU	~60–175× realtime	~20 seconds to 1 minute
A desktop CPU with no GPU	~20–30× realtime	~2–3 minutes
An Intel laptop	~9–17× realtime	~3.5–7 minutes
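
To estimate the wait for a meeting of any length, divide its duration by the realtime factor. The sketch below converts the benchmark ranges above into wait times. The function and variable names are ours, chosen for illustration; they are not part of Daisy.

```python
def transcribe_seconds(meeting_minutes: float, realtime_factor: float) -> float:
    """Seconds needed to transcribe a meeting at a given realtime factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return meeting_minutes * 60 / realtime_factor


# Benchmark ranges from the table above: (slow end, fast end) of each class.
CLASSES = {
    "Apple Silicon / dedicated GPU": (60, 175),
    "Desktop CPU, no GPU": (20, 30),
    "Intel laptop": (9, 17),
}


def estimate_range(meeting_minutes: float, slow: float, fast: float) -> tuple[float, float]:
    """Return (best case, worst case) transcription time in seconds."""
    return (
        transcribe_seconds(meeting_minutes, fast),
        transcribe_seconds(meeting_minutes, slow),
    )


for name, (slow, fast) in CLASSES.items():
    best, worst = estimate_range(60, slow, fast)
    print(f"{name}: {best / 60:.1f} to {worst / 60:.1f} min for a 60-minute meeting")
```

Change the 60 in the loop to your usual meeting length. A 30-minute meeting takes half as long in every class.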

Speaker labeling adds a little on top and is fast on all three. After that, if you've set up AI, Daisy writes the summary and chapters — see the AI timing below.

A couple of things worth saying out loud:

It's genuinely usable. A few minutes after a 1-hour meeting ends, you have a complete, searchable record — with speaker labels, summary, and chapters. No spinning loader, no "your transcript is being prepared."
You can keep working in the meantime. Finalize runs in the background. Closing Daisy mid-finalize is fine; a corner toast catches you up when it's done.
The finished transcript costs $0. No per-meeting bill, no API account to set up. (Live captions on a laptop can use a paid Deepgram key — that's optional, see below.)

Choosing the transcription model

Daisy ships with base.en and runs it out of the box — no setup, no download. If you'd rather trade speed for accuracy (or add languages), pick a different model under Settings → Providers → Advanced → Local transcription model. Daisy downloads the one you choose and uses it from then on.

The trade-off is simple: larger models are more accurate but slower, and use more memory. Rough peak RAM while transcribing:

Model	Peak RAM	Notes
base.en (default)	~640 MB	Fast; solid for most meetings.
small.en	~1.5 GB	We saw occasional hallucinations (text that was never spoken) with this size; your mileage may vary.
large-v3-turbo	~4 GB	Most accurate; still quick on Apple Silicon / a GPU.

We recommend the base size or large-v3-turbo. (Models with the .en suffix, like base.en, transcribe English only; the multilingual models cover 99 languages. See Transcription language.)
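
If you are unsure which model your machine can hold, compare the peak RAM figures above against your free memory. The sketch below is a rough rule of thumb, not how Daisy chooses a model. The 1.5× headroom factor and the function name are our own assumptions, and the check ignores speed, which matters just as much on a slower laptop.

```python
from typing import Optional

# Approximate peak RAM while transcribing, in MB, from the table above.
MODEL_PEAK_RAM_MB = {
    "base.en": 640,
    "small.en": 1500,
    "large-v3-turbo": 4000,
}

# The two recommended models, most accurate first.
RECOMMENDED = ("large-v3-turbo", "base.en")


def suggest_model(free_ram_mb: float, headroom: float = 1.5) -> Optional[str]:
    """Pick the most accurate recommended model that fits in free RAM.

    `headroom` is an assumed safety margin (hypothetical, not a Daisy setting)
    so the model does not consume every last megabyte of free memory.
    """
    for name in RECOMMENDED:
        if MODEL_PEAK_RAM_MB[name] * headroom <= free_ram_mb:
            return name
    return None  # Not enough free memory even for the default model.


print(suggest_model(8000))
print(suggest_model(2000))
```

With about 8 GB free the larger model fits comfortably. With about 2 GB free the sketch falls back to base.en.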

Live captions: where they run

Live captions are the only part where your machine determines what is practical, because they have to keep up in real time:

Desktops and Apple Silicon Macs run live captions on-device — private, free, nothing leaves the machine.
Lighter laptops choose one of: on-device (heavier on the CPU — more fan and battery), a bring-your-own Deepgram realtime key (lighter on the machine, billed to your own Deepgram account), or skip live captions and take the full transcript at the end.

Either way the finished transcript is finalized locally and diarization stays on-device — the Deepgram path only affects the live overlay. More in Live and real-time transcription.

Hardware notes

Daisy was designed to run on the kind of laptop you already have.

A modern Intel or AMD laptop CPU is enough to finish a meeting comfortably. No GPU required; it just lands in the slowest of the three classes above.
A desktop with a dedicated GPU, or an Apple Silicon Mac, is dramatically faster (the GPU does the work) and keeps up with on-device live captions easily.
An older laptop still finalizes every meeting fine; for live captions it leans on the Deepgram option or the take-it-at-the-end path.

How fast the AI features run

The AI features — summary, chapters, coaching, and asking questions across your meetings — are separate from transcription, and their speed depends on how you run them:

A local model (LM Studio or Ollama) on a machine with a capable GPU returns a meeting summary in about 12–21 seconds. On a CPU-only machine the same work can take anywhere from tens of seconds to a few minutes, depending entirely on your hardware.
A cloud key returns results in a few seconds regardless of your hardware; there the variable is cost, not wait.

What the AI costs

The finished transcript and speaker labels are always local and free. Apart from the optional Deepgram key for live captions, the AI features are the only place a cost can show up, and only if you use a cloud key:

Local model or copy-paste into your own ChatGPT/Claude: free.
Cloud key: running all the AI features on one meeting (summary, chapters, coaching, goals) costs roughly 4¢ with a balanced mix of efficient models, and up to about 30¢ if you force a top-tier model for everything. These figures are approximate and vary by provider; the charge goes straight to your own key, and Daisy never proxies the request or marks up the price.
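
To turn the per-meeting figures into a monthly budget, multiply by how many meetings you record. This is a back-of-the-envelope sketch using the approximate 4¢ and 30¢ bounds quoted above. The function name and the example of 40 meetings are ours, and your provider's actual pricing will differ.

```python
# Approximate per-meeting cloud AI cost in US cents, from the figures above.
BALANCED_MIX_CENTS = 4   # balanced mix of efficient models
TOP_TIER_CENTS = 30      # top-tier model forced for everything


def monthly_ai_cost_range(meetings_per_month: int) -> tuple[float, float]:
    """Return the (low, high) estimated monthly cloud AI cost in US dollars."""
    if meetings_per_month < 0:
        raise ValueError("meetings_per_month cannot be negative")
    low = meetings_per_month * BALANCED_MIX_CENTS / 100
    high = meetings_per_month * TOP_TIER_CENTS / 100
    return low, high


low, high = monthly_ai_cost_range(40)
print(f"40 meetings a month: about ${low:.2f} to ${high:.2f}")
```

A local model or the copy-paste route keeps this at zero regardless of meeting count.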

See meeting summaries and AI providers & keys for the full set of options.