Live and real-time transcription

Daisy shows the transcript while you record. The finished transcript is always produced on your machine, and speaker labels are always on-device and free. The one part that depends on your hardware is the live caption — the words appearing as people talk — because it has to keep up in real time.

What you see while recording

The live transcript labels speech by source: your microphone is Me and your system audio is Them. Words appear as they're spoken and settle into clean text a few seconds later — with punctuation, capitalization, and fewer mishears. Interim words look slightly faded until they settle.
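
One way to picture the interim-then-settled behavior is a small buffer keyed by segment. This is a minimal sketch, not Daisy's implementation: the names Segment, LiveTranscript, on_interim, and on_settled are hypothetical, and the "(interim)" marker stands in for the faded styling.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    source: str            # "Me" (microphone) or "Them" (system audio)
    text: str
    settled: bool = False  # False while the words are still interim (shown faded)


@dataclass
class LiveTranscript:
    # Hypothetical buffer: one entry per spoken segment, keyed by segment id.
    segments: dict[int, Segment] = field(default_factory=dict)

    def on_interim(self, seg_id: int, source: str, text: str) -> None:
        """Show or update the fast, unpolished words for a segment."""
        current = self.segments.get(seg_id)
        if current is not None and current.settled:
            return  # never overwrite settled text with a late interim update
        self.segments[seg_id] = Segment(source, text, settled=False)

    def on_settled(self, seg_id: int, source: str, text: str) -> None:
        """Replace the interim words with the cleaned-up text."""
        self.segments[seg_id] = Segment(source, text, settled=True)

    def render(self) -> list[str]:
        """Return display lines in segment order, marking interim ones."""
        lines = []
        for seg_id in sorted(self.segments):
            seg = self.segments[seg_id]
            marker = "" if seg.settled else " (interim)"
            lines.append(f"{seg.source}: {seg.text}{marker}")
        return lines
```

In this sketch a settled segment is final for display purposes: a late interim update for the same segment is ignored, so the text never flickers back to its unpolished form.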

Captions appear in the recording view and in the floating mini-window if you're using it.

Where live captions run

Live captions are private by default and on-device wherever the machine can keep up:

  • Desktops and Apple Silicon Macs — live captions run on-device. Your audio stays on your machine, with no account needed.
  • Lighter laptops — you choose:
    • On-device — same private path, but it works your CPU harder (more fan and battery).
    • Bring-your-own Deepgram key — stream live captions through your own Deepgram realtime key for a lighter footprint on the machine. This is the only place the cloud can touch transcription, and only for the live overlay.
    • Skip live captions — turn the live transcript off and take the full transcript when the call ends. Same finished result.

Whichever you pick, the saved transcript is finalized locally and diarization stays on-device — choosing the Deepgram live path does not send your finished transcript or your speaker labels to the cloud.
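
The rules above can be summarized as a small decision function. This is only an illustrative sketch: CaptionPath, resolve_live_path, and final_transcript_path are hypothetical names, and the fallback to on-device when no key is present is an assumption, not documented behavior.

```python
from enum import Enum
from typing import Optional


class CaptionPath(Enum):
    ON_DEVICE = "on-device"
    DEEPGRAM = "deepgram"  # bring-your-own Deepgram realtime key
    OFF = "off"            # skip live captions


def resolve_live_path(lighter_laptop: bool,
                      choice: CaptionPath,
                      deepgram_key: Optional[str] = None) -> CaptionPath:
    """Pick where live captions run (hypothetical helper, not Daisy's API)."""
    if choice is CaptionPath.OFF:
        return CaptionPath.OFF
    if not lighter_laptop:
        # Desktops and Apple Silicon Macs: live captions run on-device.
        return CaptionPath.ON_DEVICE
    if choice is CaptionPath.DEEPGRAM and deepgram_key:
        return CaptionPath.DEEPGRAM
    # Assumption: without a key, fall back to the private on-device path.
    return CaptionPath.ON_DEVICE


def final_transcript_path(live_path: CaptionPath) -> CaptionPath:
    """The saved transcript and speaker labels stay local whatever the live path."""
    return CaptionPath.ON_DEVICE
```

The second function ignores its argument on purpose: it encodes the guarantee that the live choice never changes where the finished transcript is produced.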

Behavior settings

Open Settings → Behavior to change how the live transcript behaves:

  • Show transcript live in the recording window — turns the live transcript on or off entirely during a recording. Recording itself is unaffected.
  • Live captions source — on a laptop, choose on-device or your Deepgram realtime key. Add the key under Settings → Providers (see Configuring AI providers).

You may see the same line twice — that's normal

During a call you might notice the same sentence appear under both Me and Them. This is expected and not a bug.

Daisy captures two independent audio streams while you record:

  • Me — your microphone.
  • Them — the system audio coming out of your speakers (the other side of the call).

Your microphone can pick up the other person's voice through your speakers, and the system stream carries it too — so the line gets transcribed twice, once on each side. This is a side effect of recording locally without joining the call as a bot.

You don't need to do anything about it. When the meeting ends, Daisy runs a cleanup pass over the whole recording:

  • The two streams are matched against your voiceprints to label each speaker.
  • Duplicates across Me and Them are merged into a single attributed turn.
  • The final transcript, summary, and search index all use that cleaned-up version.
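
To make the merge step concrete, here is a rough sketch of how duplicate lines across the two streams could be detected. It is not Daisy's method: the real pass attributes speakers with voiceprints, while this example swaps in a simpler rule (overlapping timestamps plus similar text, keeping the Them copy). The Line type, the function names, and the thresholds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Line:
    source: str   # "Me" or "Them"
    start: float  # seconds from the start of the recording
    end: float
    text: str


def is_echo(a: Line, b: Line,
            min_similarity: float = 0.8, max_offset: float = 1.0) -> bool:
    """True when two lines from different streams look like the same utterance."""
    if a.source == b.source:
        return False
    # Time ranges overlap, allowing a small offset between the two streams.
    overlaps = a.start <= b.end + max_offset and b.start <= a.end + max_offset
    similarity = SequenceMatcher(None, a.text.lower(), b.text.lower()).ratio()
    return overlaps and similarity >= min_similarity


def merge_duplicates(lines: list[Line]) -> list[Line]:
    """Drop the microphone copy of any line that also appears in system audio."""
    them = [line for line in lines if line.source == "Them"]
    kept = [
        line for line in lines
        if not (line.source == "Me" and any(is_echo(line, t) for t in them))
    ]
    return sorted(kept, key=lambda line: line.start)
```

For example, a Them line "Can you send the report by Friday?" at 10.0 to 12.0 seconds and a Me line "can you send the report by friday" at 10.2 to 12.1 seconds would collapse into the single Them line, while an unrelated Me reply a second later is kept.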

The live transcript is a preview — the saved transcript is the source of truth.

Using headphones eliminates almost all of this at the source: the other side's audio never leaks back into your mic.

What changes when you stop

When you press Finish & summarize, Daisy doesn't start over. It keeps the text it already has, then sorts out who said what, removes the duplicates between Me and Them, and writes the summary. A 12-minute meeting is usually ready in about four minutes. See Transcription speed and cost for the numbers.