Cloud Dictation Privacy: What to Read in the Fine Print Before You Trust Any App

May 2026 · 7 min read

When a dictation tool runs in the cloud, your voice is leaving your machine. The audio goes to a server, gets transcribed, and the text comes back. That round-trip is what makes the service work — and it is also what creates a long list of privacy questions you would never have to ask if the tool ran locally.

This article is a field guide to the provisions that actually matter in a cloud dictation tool's terms of service and privacy policy. Read these before you trust any vendor with your voice: Wispr Flow, Otter, Otter for Business, the Whisper-based competitors, the major-platform built-ins, anyone. The provisions vary and the language hides in different sections, but the pattern of what to look for is the same.

The Five Provisions That Matter

Every cloud dictation service's policy can be read against five questions. If you cannot find clear answers to all of them, that is itself the answer.

1. Where is your audio processed?

Look for a section titled "Sub-processors," "Service Providers," or "Where We Process Data"; sometimes the list is buried in "How We Use Your Information." It names the cloud providers involved, usually some mix of AWS, Google Cloud, Azure, and AI-specific vendors such as OpenAI, Anthropic, or Deepgram.

The crucial thing to check is jurisdiction. If you are in the EU and the tool's audio processing is in the United States, the data has crossed a regulatory boundary. The EU has GDPR. The US does not have a federal equivalent. The legal frameworks for what a vendor can do with your data — and what a foreign government can compel them to disclose — are different.

For sensitive professional work (legal, medical, journalism, anything under NDA), this is not theoretical. It changes what your data is exposed to.
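
One way to make the jurisdiction check concrete is to copy the vendor's sub-processor list into a small table and flag every entry that processes data outside your own region. The sketch below is illustrative only: the vendor names, purposes, and regions are hypothetical placeholders, and the `SubProcessor` and `outside_region` names are our own, not anything a vendor publishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubProcessor:
    name: str     # vendor name as listed in the policy
    purpose: str  # what the policy says the vendor does
    region: str   # processing location as stated in the policy


# Hypothetical entries for illustration only. Replace them with the
# entries from the real sub-processor list you are reviewing.
SUB_PROCESSORS = [
    SubProcessor("ExampleCloud", "audio storage", "EU"),
    SubProcessor("ExampleSpeech", "transcription", "US"),
    SubProcessor("ExampleLLM", "text post-processing", "US"),
]


def outside_region(processors, home_region):
    """Return the sub-processors whose stated region differs from home_region."""
    home = home_region.upper()
    return [p for p in processors if p.region.upper() != home]


if __name__ == "__main__":
    for p in outside_region(SUB_PROCESSORS, "EU"):
        print(f"{p.name} ({p.purpose}) processes data in {p.region}")
```

A region label is only as precise as the policy's wording. A vague entry such as "global" does not match your home region, so this check flags it as well, which is the conservative reading.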

2. What rights do they grant themselves over your content?

Look for "License You Grant Us," "Our Use of Your Content," "Customer Content," or similar. You are looking for language like:

"You grant us a worldwide, non-exclusive, royalty-free license to use, reproduce, modify, adapt, publish, translate, and distribute your content..."

Grants like this are usually qualified with a phrase such as "to provide the service," which sounds limited but is broader than it reads. It typically permits the vendor to make your audio available to support staff, run it through analysis pipelines, derive metadata from it, and store it for as long as their retention policy allows.

The more concerning variants extend the license to product improvement or research purposes — language that authorizes the vendor to feed your dictations into model training. Some vendors offer a way to opt out of this; many do not. Some opt-outs are honored only for paid tiers.
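
A quick first pass over a long policy is to search it for the kinds of phrases described above. The following sketch assumes you have pasted the policy into a string. The pattern list and the `flag_license_language` name are our own illustrative choices, and a keyword match is a prompt to read the surrounding clause, not a legal finding.

```python
import re

# Illustrative patterns based on the wording discussed above.
# They are assumptions, not an exhaustive or authoritative list.
BROAD_LICENSE_PATTERNS = {
    "broad license grant": r"worldwide,\s+non-exclusive,\s+royalty-free",
    "product improvement": r"improv\w*\s+(?:our\s+)?(?:products?|services?|models?)",
    "research use": r"research\s+purposes?",
    "model training": r"\btrain(?:s|ed|ing)?\b",
}


def flag_license_language(policy_text):
    """Return {label: [matched snippets]} for every pattern found in the text."""
    hits = {}
    for label, pattern in BROAD_LICENSE_PATTERNS.items():
        found = [m.group(0) for m in re.finditer(pattern, policy_text, re.IGNORECASE)]
        if found:
            hits[label] = found
    return hits


if __name__ == "__main__":
    # Made-up sample sentence, not a quote from any real policy.
    sample = (
        "You grant us a worldwide, non-exclusive, royalty-free license to use "
        "your content to provide the service and to improve our products."
    )
    for label, snippets in flag_license_language(sample).items():
        print(label, "->", snippets)
```

An empty result does not mean the policy is clean. Vendors phrase these grants in many ways, so treat the output as a list of places to start reading.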

3. How long is your audio retained?

Look for "Data Retention" or "How Long We Keep Your Information," or check the privacy policy's appendix. Typical patterns:

Indefinitely — until the user deletes it. (Many SaaS dictation tools.)
30 to 90 days — common for "transient" processing of free-tier audio.
Immediately discarded after transcription — the strongest claim, and the one to verify. If they say this, look for a separate provision about backups; deleted-but-in-backup data may persist for an additional 30–90 days.

Retention described as "until you delete your account" means the audio is stored for the lifetime of the relationship. If you dictate a sensitive draft and then forget about it, the audio is still on someone else's hard drive years later.
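
Retention windows and backup windows add up. As a rough sketch of the arithmetic, assuming the policy states both periods in days (the function name and the example figures are ours, chosen from the ranges mentioned above):

```python
from datetime import date, timedelta


def latest_persistence_date(recorded_on, retention_days, backup_days):
    """Worst-case date by which a recording is gone from live storage and backups.

    retention_days is None when the policy keeps audio indefinitely;
    in that case there is no end date and the function returns None.
    """
    if retention_days is None:
        return None
    return recorded_on + timedelta(days=retention_days + backup_days)


if __name__ == "__main__":
    recorded = date(2026, 5, 1)
    # 30-day retention plus a 90-day backup roll-off (illustrative figures).
    print(latest_persistence_date(recorded, 30, 90))
    # Indefinite retention: no date can be computed.
    print(latest_persistence_date(recorded, None, 90))
```

A 30-day retention period plus a 90-day backup window means the audio can persist for up to 120 days after you record it. Even a policy that discards audio immediately still leaves the backup window, if the policy has one.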

4. What survives account deletion?

Look for "Account Deletion," "Termination," or "Your Rights." When you delete your account, what is irrecoverable and what is kept?

Most policies disclose at least one of these carve-outs:

Anonymized data is retained. Once "anonymized," the data is no longer considered personal under most privacy laws and can be kept indefinitely. Whether anonymization actually prevents re-identification from voice recordings is an open research question.
Aggregated metrics. Usage statistics, error logs, performance data — these are typically kept regardless of account status.
Backups. Even if the live database deletes your record, backups may roll off only at the next backup cycle (often quarterly).
Legal hold material. If any of your dictations were flagged or referenced in a legal dispute, they will be retained until that resolves.

The realistic mental model: an account deletion is not a guarantee of erasure. It is a guarantee that the live system stops returning your data to you.

5. What permissions does the app demand?

This is not in the TOS. It is in the install dialog and in the permission prompts the operating system shows on the app's behalf. A cloud dictation app's permission footprint reveals what data flows are technically possible.

The minimum a dictation app needs is microphone access. Anything beyond that warrants a question:

Screen recording. The app can capture everything visible on your screen. Some tools do this for "context awareness." It also means your IDE, your email, your client documents, and your terminal are accessible to the app's processes — and possibly transmitted.
Accessibility access. The app can read text in any field and inject text into any app. Necessary for tools that "type for you," but it also enables broad observation of what you are typing.
Full disk access. The app can read any file on your machine. Almost no dictation app has a legitimate reason for this.
Network requests during dictation. Verifiable with Activity Monitor, Little Snitch, or lsof. A pure on-device tool will make zero outbound connections during transcription.
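
The permission items above reduce to a simple rule: microphone access is expected, and everything else needs a justification. A minimal sketch of that rule follows. The permission labels are informal names used for illustration, not operating-system identifiers, and the questions are paraphrased from the list.

```python
# The only permission the list above treats as strictly necessary.
REQUIRED = {"microphone"}

# Question to ask for each extra permission (paraphrased from the list above).
QUESTIONS = {
    "screen_recording": "Is screen content captured, and is it transmitted?",
    "accessibility": "Is it used only to insert text, or also to read other apps?",
    "full_disk_access": "What legitimate reason is there to read arbitrary files?",
}

DEFAULT_QUESTION = "Why does a dictation app need this?"


def audit_permissions(requested):
    """Return {permission: question} for every permission beyond the minimum."""
    extras = sorted(set(requested) - REQUIRED)
    return {p: QUESTIONS.get(p, DEFAULT_QUESTION) for p in extras}


if __name__ == "__main__":
    for perm, question in audit_permissions(
        ["microphone", "accessibility", "screen_recording"]
    ).items():
        print(f"{perm}: {question}")
```

An app that requests only the microphone produces an empty result. Anything the sketch does not recognize falls back to the generic question.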

We covered the implications of screen capture in a separate post; see "Why your AI coding tool shouldn't see your screen" for the developer-specific case.

What Wispr Flow's Architecture Implies

We are not going to quote Wispr Flow's terms back at you — they are public, and any specific provision can be revised without notice. But the architecture of the product implies what their terms have to permit.

Wispr Flow processes audio in the cloud. That is core to how the product works. The Wispr application records audio on your Mac and uploads it to Wispr's servers for transcription and post-processing through a language model. The text comes back. That round-trip means the policy must, at minimum, permit transmission and processing of your audio on their servers.

Wispr Flow also captures screenshots of your active window and uses them as context for the language model. That means the policy must permit capture, transmission, and processing of screen content — including, potentially, anything visible behind your active window.

Whether their stated retention is "transient" or "30 days" or "indefinite," the technical architecture is what determines exposure. Audio that has been transmitted to a third-party server is, for any privacy-sensitive purpose, no longer private — regardless of what their policy says they do with it next.

What "On-Device" Actually Eliminates

A truly on-device dictation tool sidesteps most of the provisions above, because the data flows that would require them never happen.

SpeakUp processes audio on your Mac's GPU using Metal acceleration. The application has microphone permission and nothing else. It has no screen recording capability. It has no network capability — verifiable with any network monitor. There is no account, no login, no telemetry, no analytics beacon during dictation.
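
If you want to run that kind of check yourself without a dedicated network monitor, one approach is to list open internet sockets with `lsof -i -n -P` and look for connections that have a remote endpoint. The sketch below assumes the usual nine-column `lsof` layout with the connection in the NAME column. `remote_endpoints` and `check_process` are hypothetical helper names, and the process name you pass in is whatever `lsof` shows for the app on your machine.

```python
import subprocess


def remote_endpoints(lsof_output, command_prefix):
    """Extract remote endpoints of connected sockets from `lsof -i -n -P` output.

    command_prefix is matched against the COMMAND column. lsof may truncate
    long command names, so a short prefix is safer than the full name.
    """
    endpoints = []
    for line in lsof_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 9 or not fields[0].startswith(command_prefix):
            continue
        name = fields[8]  # e.g. 10.0.0.2:52344->203.0.113.7:443
        if "->" in name:  # only connected sockets have a remote side
            endpoints.append(name.split("->", 1)[1])
    return endpoints


def check_process(command_prefix):
    """Run lsof once and return the remote endpoints for matching processes."""
    result = subprocess.run(
        ["lsof", "-i", "-n", "-P"],
        capture_output=True,
        text=True,
        check=False,  # lsof exits non-zero when it finds nothing to list
    )
    return remote_endpoints(result.stdout, command_prefix)
```

Call `check_process` with the app's process name while a dictation is in progress. An empty list is consistent with on-device processing, but it is a point-in-time snapshot, so run it several times or use a monitor that logs continuously before drawing a conclusion.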

Because the audio never leaves the device:

There is no sub-processor list to audit.
There is no license you grant for cloud usage, because there is no cloud usage.
There is no retention policy for audio, because the audio is discarded as soon as transcription completes.
There is no deletion process to verify, because there is nothing on a remote server to delete.
There is no permission scope question beyond the microphone.

That is what people mean when they say on-device is a different category. It is not a "privacy mode." It is a different architecture, where the privacy questions have already been answered by what the application is technically capable of.

What This Means for Your Decision

If your dictation use is casual — a tweet, a search query, a text message — cloud dictation is fine. Most people will never read the TOS, and most never need to.

If your dictation involves anything that has obligations attached — client work, employer code, patient notes, legal drafts, anything covered by NDA, anything you would not paste into a public webpage — the right answer is to use a tool where the audio physically cannot leave the device. Then you do not need to interpret any provision in any policy.

SpeakUp is built on that constraint. Microphone in, text out, nothing transmitted. €29 once.