Lexicons: Teaching SpeakUp Your Field's Vocabulary
SpeakUp is great at everyday speech. Meetings, emails, Slack messages, journal entries: it handles those out of the box. Where it used to stumble was the moment you said something only people in your field say. A doctor rattling off a drug name. A developer dropping a framework name into a code review. A legal professional reading out a case citation.
For the last few months we've been building something called Lexicons to fix that. This post is a status update: what Lexicons are, what we've shipped so far, and what we've learned about when they actually help.
What a Lexicon is, in one paragraph
A Lexicon is a free add-on that teaches SpeakUp the vocabulary of a specific field. You turn one on in Settings → Lexicons and it quietly helps with the technical words in that area. Everything still runs on your Mac: no internet, no cloud, no account. And because a Lexicon only activates when you dictate in the language it's built for, it won't meddle with dictation in your other languages.
We treat each Lexicon as its own small project. Before we ship one, we build a test set of realistic sentences, run it with and without the Lexicon, and check whether the Lexicon actually makes things better — and, just as important, whether it makes anything worse.
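The with-and-without comparison can be sketched in a few lines of Python. This is a rough illustration, not SpeakUp's actual test harness: the function names `term_hits` and `gain_per_100`, the sample sentences, and the case-insensitive substring matching are all assumptions made for the example.

```python
def term_hits(transcripts, expected_terms):
    """Count how many expected specialist terms show up in their transcript.

    transcripts: list of transcribed sentences (strings)
    expected_terms: list of lists, the terms each sentence should contain
    Returns (hits, total).
    """
    hits = 0
    total = 0
    for text, terms in zip(transcripts, expected_terms):
        lowered = text.lower()
        for term in terms:
            total += 1
            # Assumption: a term counts as right if it appears verbatim.
            if term.lower() in lowered:
                hits += 1
    return hits, total


def gain_per_100(baseline, with_lexicon, expected_terms):
    """Extra terms recognised per 100 dictated when the Lexicon is on."""
    base_hits, total = term_hits(baseline, expected_terms)
    lex_hits, _ = term_hits(with_lexicon, expected_terms)
    return 100 * (lex_hits - base_hits) / total


# Made-up example data, not taken from the real test set.
expected = [["Appendizitis"], ["Cholezystektomie", "Sepsis"], ["Pneumonie"]]
baseline = [
    "Verdacht auf Appendizitis",
    "Nach Kohle Zystektomie keine Sepsis",
    "Patient mit Pneumonie",
]
with_lexicon = [
    "Verdacht auf Appendizitis",
    "Nach Cholezystektomie keine Sepsis",
    "Patient mit Pneumonie",
]

gain = gain_per_100(baseline, with_lexicon, expected)
print(f"{gain:.1f} more terms right per 100")
```

The "more terms right for every 100 you dictate" figures later in this post are this kind of difference: the hit rate with the Lexicon on minus the hit rate with it off, measured on the same recordings.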
What we've shipped
Medical — German
This is the one we've put the most work into. The German medical Lexicon is built from three well-known reference sources:
- ICD-10-GM (2026 edition) — the official German version of the World Health Organization's disease classification, published by BfArM, the German federal institute that maintains medical codings.
- OPS 2026 — the German procedure classification (surgeries, interventions, diagnostics), also from BfArM.
- German MeSH — the medical subject headings used by German biomedical libraries, maintained by ZB MED.
Together these gave us roughly 180,000 German medical terms — diagnoses, procedures, anatomy, medication names.
On our medical-German test set, turning the Lexicon on got about 15 more medical terms right for every 100 you dictate. That's a big jump — the difference between "mostly correct, some re-reading" and "I can trust this". Feedback from a medical professional using SpeakUp matched the numbers: "Medical word recognition works very well."
We also ran a careful safety check. We took 200 everyday German sentences — groceries, weather, politics, small talk — and recorded them with eight different German voices, including regional accents. With the medical Lexicon turned on, not a single one of those everyday sentences came out worse. That was the bar we wanted to clear before shipping: fix the specialist words, don't break the normal ones.
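A safety check of this kind boils down to comparing each recording's transcript against its reference text twice, once with the Lexicon off and once with it on, and flagging any recording that picked up extra errors. Here is a minimal sketch, assuming word-level edit distance as the error measure; the function names and sample sentences are hypothetical and the real check may work differently.

```python
def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between two sentences."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # previous[j] holds the edits needed to turn the reference words seen
    # so far into the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def regressions(references, baseline, with_lexicon):
    """Indices of recordings that have more word errors with the Lexicon on."""
    worse = []
    for index, (ref, base, lex) in enumerate(zip(references, baseline, with_lexicon)):
        if word_errors(ref, lex) > word_errors(ref, base):
            worse.append(index)
    return worse


# Made-up everyday sentences for illustration only.
references = ["Ich kaufe heute Brot und Milch", "Morgen soll es regnen"]
baseline = ["Ich kaufe heute Brot und Milch", "Morgen soll es regnen"]
lexicon_clean = ["Ich kaufe heute Brot und Milch", "Morgen soll es regnen"]
lexicon_broken = ["Ich kaufe heute Brot und Milz", "Morgen soll es regnen"]

print(regressions(references, baseline, lexicon_clean))   # nothing got worse
print(regressions(references, baseline, lexicon_broken))  # first sentence got worse
```

The second call shows the failure this check is meant to catch: an everyday word ("Milch") pulled toward a medical one ("Milz") because the Lexicon is on.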
Medical — Italian
The Italian medical Lexicon, which ships in SpeakUp 1.0.26, is built from a different set of sources — we used the best freely available Italian medical references we could find:
- ICD-10 Italian translation — the Italian version of the WHO disease classification, published by the Italian National Institute of Health (ISS).
- AIFA active ingredients — the official list of drug active ingredients from AIFA, the Italian medicines agency.
- AIFA drug classification (ATC) — the Italian version of the international drug-class taxonomy, also from AIFA.
That's about 17,500 Italian medical terms.
The improvement on our Italian test set is smaller than the German one: roughly 2 to 3 more medical terms right for every 100 you dictate. That's a modest but real lift. It won't feel as dramatic as the German version, partly because Whisper (the engine underneath SpeakUp) already handles spoken Italian quite well on its own. If Italian medical dictation is your daily work, this will help; if you only dictate in Italian casually, you probably won't notice much.
Software engineering — English
Developers dictate a very specific kind of English. Framework names, command-line tools, cloud services, acronyms that were never meant to be spoken. "Pydantic" is not a word Whisper is likely to have heard often in training; to the engine it's just a surprisingly specific arrangement of sounds.
For this Lexicon we didn't use an external source. We wrote the vocabulary by hand — roughly 1,000 terms covering the tools, languages, and concepts that come up in code reviews, commit messages, and engineering chat. Git, Kubernetes, SQL dialects, JavaScript frameworks, infrastructure-as-code names, the usual alphabet soup.
On our software-engineering test set, turning the Lexicon on got about 4 more technical terms right for every 100 you dictate. Smaller than the German medical jump, but for a developer writing a PR description or a code review, that's the difference between "I have to fix Hetzner three times a week" and "I don't think about it anymore".
What we decided not to ship (yet)
We also built a medical Lexicon for English, using ICD-10-CM and the English version of MeSH — roughly 247,000 medical terms. We tested it the same way as the others.
The result was a bit of a surprise: on English, Whisper already gets over 93 out of 100 medical terms right on its own. Adding the Lexicon only moved that by a fraction of a percentage point, while introducing small regressions elsewhere. The remaining errors, roughly 6 in every 100 terms, were a specific kind that a dictionary can't fix: confident mishearings ("denosumab" becoming "Dinosumab"), not word-choice slips.
So we held it back. A Lexicon is only worth shipping if it clearly makes things better for most users, and in this case it didn't. English medical accuracy is a harder problem that we'll tackle with a different approach down the line.
This is the principle we're trying to hold: ship a Lexicon when it clearly helps, skip it when it doesn't.
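Written down as a rule, that principle might look like the sketch below. The thresholds (`min_gain`, `max_regressions`) and the English numbers are placeholders invented for illustration; only the German figures (about 15 more terms per 100, zero everyday regressions) come from this post.

```python
from dataclasses import dataclass


@dataclass
class LexiconResult:
    name: str
    gain_per_100: float  # extra specialist terms right per 100 dictated
    regressions: int     # everyday sentences that came out worse


def should_ship(result, min_gain=1.0, max_regressions=0):
    """Ship only if the gain is clear and nothing everyday got worse.

    Both thresholds are hypothetical defaults, not SpeakUp's real criteria.
    """
    return result.gain_per_100 >= min_gain and result.regressions <= max_regressions


# German figures are from this post; the English ones are placeholders
# standing in for "a fraction of a percent" and "small regressions".
german_medical = LexiconResult("Medical German", gain_per_100=15.0, regressions=0)
english_medical = LexiconResult("Medical English", gain_per_100=0.5, regressions=3)

for result in (german_medical, english_medical):
    verdict = "ship" if should_ship(result) else "hold back"
    print(f"{result.name}: {verdict}")
```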
How to try one
Open SpeakUp → Settings → Lexicons. Pick the one that matches your work. Turn it on. Dictate as usual. That's the whole setup.
If you dictate in more than one language, you'll see a small note when a Lexicon is attached to a language you haven't enabled — it'll stay quiet until you add that language under General → Languages. That's on purpose: we don't want an Italian medical Lexicon trying to second-guess your English dictation.
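The language gating described above can be pictured as a simple filter. This is a sketch of the behaviour as described, not SpeakUp's code; the `Lexicon` type, the language codes, and both function names are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexicon:
    name: str
    language: str  # hypothetical language code, e.g. "de", "it", "en"


def active_lexicons(turned_on, enabled_languages, dictation_language):
    """Lexicons that should influence the current dictation."""
    if dictation_language not in enabled_languages:
        return []
    return [lex for lex in turned_on if lex.language == dictation_language]


def dormant_lexicons(turned_on, enabled_languages):
    """Lexicons that are turned on but tied to a language that isn't enabled."""
    return [lex for lex in turned_on if lex.language not in enabled_languages]


medical_it = Lexicon("Medical Italian", "it")
software_en = Lexicon("Software engineering", "en")
turned_on = [medical_it, software_en]
enabled = {"en"}  # Italian has not been added under General → Languages

print(active_lexicons(turned_on, enabled, "en"))  # only the English Lexicon
print(dormant_lexicons(turned_on, enabled))       # the Italian one stays quiet
```

In this picture, the small note in Settings corresponds to the dormant list: a Lexicon that is switched on but has no enabled language to work with.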
Lexicons are free. They're included in SpeakUp and always will be.
What we're working on next
A few threads are open:
- Medical German Lexicon refinements — we keep discovering specialist terms that aren't in the official references but come up in clinical dictation. We're collecting those and expanding carefully.
- A better approach for English medical — since a dictionary alone doesn't help here, we're looking at techniques that work at the speech-recognition layer itself. More on that in a future post.
- A legal Lexicon — case citations, court procedure vocabulary, statute names. Still in early research.
- User-added terms — a way for you to add your own specialist vocabulary (client names, product names, internal jargon) without waiting for us to ship an official Lexicon. This is the one we're most excited about.
If you have a field you'd like to see a Lexicon for, or you've hit a specific word that SpeakUp keeps getting wrong, send us an email. Our Lexicon decisions are driven by what our users actually dictate, not by what looks good in a marketing slide.
Thanks for reading. If you're already on 1.0.26, the new Lexicons are in Settings waiting for you. If you're not, the latest version is here.