PII Scrubbing
Defence-in-depth on dictation and chat input
What gets removed
In main chat, portfolio chat, fact-checking, task labels, and ordinary dictation, ClinicQuest runs your text through a server-side scrubber before it is persisted, sent to the AI subprocessor, or surfaced in logs. The scrubber strips five categories of structured identifier:
- NHS numbers: 10-digit values that pass the NHS Mod-11 checksum. Random 10-digit runs (order IDs, reference numbers) are left intact because they fail the checksum.
- Full UK postcodes: outward + inward (e.g. SW1A 1AA). Lone outward codes such as SW1A are kept because they are commonly non-identifying and produce false positives in clinical context.
- UK phone numbers: 11-digit national form (07…, 02…, 01…) and the international +44 form.
- Email addresses: anything matching a standard email shape.
- Dates of birth: only when surrounded by a date-of-birth cue. See the next section for how this is detected.
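To illustrate why a random 10-digit run survives while an NHS number does not, here is a minimal sketch of the Mod-11 check in Python. It is not ClinicQuest's production code: the function names, the candidate regex, and the decision to accept the 3-3-4 spaced form are assumptions made for the example.

```python
import re

def is_valid_nhs_number(digits: str) -> bool:
    """Return True if `digits` is a 10-digit string that passes the Mod-11 check."""
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weight the first nine digits 10, 9, ..., 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 is never valid.
    return check != 10 and check == int(digits[9])

# Assumption: candidates are plain 10-digit runs or the 3-3-4 spaced form,
# and must not be part of a longer digit run.
_NHS_CANDIDATE = re.compile(r"(?<!\d)\d{3} ?\d{3} ?\d{4}(?!\d)")

def redact_nhs_numbers(text: str) -> str:
    """Replace only those candidates that pass the checksum."""
    def _replace(match: re.Match) -> str:
        digits = match.group().replace(" ", "")
        return "[redacted-nhs]" if is_valid_nhs_number(digits) else match.group()
    return _NHS_CANDIDATE.sub(_replace, text)
```

With this sketch, "order 1234567890" is left alone because its computed check digit is 10, which is never valid, while a value that passes the checksum is replaced with [redacted-nhs].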
When the scrubber removes something, you'll see a placeholder like [redacted-nhs] or [redacted-dob] in both the message that the AI receives and the message that lands in your history. The placeholder tells you which category was removed.
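The purely pattern-based categories (postcodes, phone numbers, email addresses) can be pictured as a small table of regular expressions, each mapped to a placeholder. The sketch below is illustrative only. The regexes are simplified assumptions, not the patterns ClinicQuest runs, and the placeholder names other than [redacted-nhs] and [redacted-dob] are hypothetical. NHS numbers and dates of birth are left out because they need a checksum and a cue check respectively.

```python
import re
from collections import Counter

# Hypothetical, simplified patterns. Real UK postcode and phone rules are stricter.
PATTERNS = {
    # Full postcode only: outward code, optional space, inward code (digit + 2 letters).
    "postcode": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}\b"),
    # 0 or +44, then a leading 1, 2 or 7, then nine more digits with optional spaces.
    "phone": re.compile(r"(?<![\d+])(?:\+44 ?|0)[127](?: ?\d){9}(?!\d)"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}

def scrub_patterns(text: str) -> tuple[str, dict[str, int]]:
    """Return the scrubbed text plus a per-category count of replacements."""
    counts: Counter = Counter()
    for category, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[redacted-{category}]", text)
        if n:
            counts[category] += n
    return text, dict(counts)

scrubbed, counts = scrub_patterns(
    "Lives at SW1A 1AA, call 07700 900123 or email jane.doe@example.com. "
    "Practice is in SW1A."
)
print(scrubbed)
print(counts)
```

Note that the lone outward code at the end of the sample sentence is kept, because the postcode pattern requires the inward part as well.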
Why we do this
ClinicQuest's Terms §4.1 require all clinical cases discussed or recorded in your portfolio to be fully anonymised. The scrubber does not replace that obligation — you are still the Data Controller and the GMC's confidentiality guidance applies to everything you input.
What the scrubber does do is reduce the impact if a structured identifier slips through accidentally. This happens most commonly during voice dictation, when a clinician dictates an NHS number or postcode out loud without thinking. Without the scrubber, that identifier would land in our database and reach the AI provider (under contractual Zero Data Retention, but still in transit). With the scrubber, the identifier is replaced with a placeholder before either of those things happens.
AKT Teacher and the Notebook Agent are educational study panels. They do not run this pre-scrub, including when AI dictation is launched from those panels, so do not enter patient identifiers there.
This matches the defence-in-depth approach described in Terms §4.1B and Privacy Policy §6B.
How dates of birth are detected
Clinical dictation is full of dates that are not identifying — investigation dates, admission dates, last-review dates, prescription start dates. Stripping every date would break the product. ICO and Caldicott guidance is also clear that a date is identifying primarily when it is a date of birth combined with other identifiers; a standalone clinical event date in an otherwise anonymised note is not.
So the scrubber only redacts a date when a date-of-birth cue appears within ±10 whitespace tokens of the date itself. Cues include:
- DOB, D.O.B.
- date of birth, birth date, birthdate
- year of birth
- born, born on
Worked examples:
- "DOB 25/06/1972, attended today" → date is redacted.
- "date of birth: 25th June 1972, confirmed" → date is redacted.
- "patient's last eGFR was 36 on 25/06/2025" → date is left intact (no cue nearby).
- "admitted 25/06/2025 with chest pain" → date is left intact.
- "started ramipril 12/04/2024, last review 25/06/2025" → both dates are left intact.
One known false positive: when a clinician writes "DOB 25/06/1972, last bloods 25/06/2025", the cue window covers both dates, so the second date is also redacted. We've accepted that trade-off because erring on the side of redacting clinical-event dates near a DOB cue is safer than leaking the DOB.
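The cue-window rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production detector: the date and cue regexes are simplified, the function name is hypothetical, and distance is measured here between the nearest whitespace tokens of the cue and the date, which is one reasonable reading of "within ±10 tokens".

```python
import re
from bisect import bisect_right

# Assumed cue pattern; the trailing lookahead stops "dob" matching inside "dobutamine".
CUE = re.compile(
    r"\b(?:d\.?o\.?b\.?|date\s+of\s+birth|birth\s?date|year\s+of\s+birth"
    r"|born(?:\s+on)?)(?![A-Za-z])",
    re.IGNORECASE,
)
_MONTH = r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*"
# Assumed date shapes: 25/06/1972 style, or "25th June 1972" style.
DATE = re.compile(
    r"\b(?:\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}"
    r"|\d{1,2}(?:st|nd|rd|th)?\s+" + _MONTH + r"\s+\d{4})\b",
    re.IGNORECASE,
)

def redact_dobs(text: str, window: int = 10) -> str:
    """Redact a date only if a DOB cue lies within `window` whitespace tokens."""
    token_starts = [m.start() for m in re.finditer(r"\S+", text)]

    def token_index(pos: int) -> int:
        # Index of the whitespace token that contains character offset `pos`.
        return bisect_right(token_starts, pos) - 1

    cues = [(token_index(m.start()), token_index(m.end() - 1))
            for m in CUE.finditer(text)]

    def near_cue(match: re.Match) -> bool:
        first, last = token_index(match.start()), token_index(match.end() - 1)
        return any(first - c_last <= window and c_first - last <= window
                   for c_first, c_last in cues)

    return DATE.sub(lambda m: "[redacted-dob]" if near_cue(m) else m.group(), text)

print(redact_dobs("DOB 25/06/1972, attended today"))
print(redact_dobs("admitted 25/06/2025 with chest pain"))
print(redact_dobs("DOB 25/06/1972, last bloods 25/06/2025"))
```

The third call reproduces the known false positive: both dates fall inside the window around the single cue, so both are replaced.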
What this does NOT do
The scrubber is pattern-based. It does not:
- Detect free-text patient names. "Mr Patel reviewed today" is left unchanged — name detection requires a classifier or NER pass and is on the roadmap.
- Catch unusual identifier formats outside the categories listed in "What gets removed" (e.g. hospital MRNs, GP-practice codes, full home addresses).
- Replace your duty under §4.1. If you wouldn’t read it aloud in a case presentation, don’t dictate it or type it here.
If you spot a category that should be added, contact support@clinicquest.uk.
What you see in chat
On scrubbed chat and dictation surfaces, the scrubbed text is the canonical version of your message. That means three places are guaranteed to match:
- The chat bubble in your message history.
- The text stored in our UK-hosted database.
- The text that reached the AI subprocessor.
If the scrubber removed anything, a small toast appears at the top of the chat confirming the count and category. You can dismiss the toast or open this help page from it. Re-loading the chat thread shows the redacted message immediately — no separate "raw" copy of your text exists anywhere in the system once the scrub completes.
See also: Terms §4.1B, Privacy Policy §2.4 and §6B, Subprocessors.
AI Hygiene Score
Each scrub also writes a content-free metadata event — counts only, no prompt text — that feeds your AI Hygiene Score on the profile page. The score starts at 50 (neutral) and moves with the last 30 days of your prompts: clean prompt rate, identifier severity, high-risk prompt count, clean streak, and a 14-day trend.
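As a rough picture of what "content-free" means, the sketch below builds an event that carries only a timestamp, the surface name, and per-category counts. The class, field names, and surface label are hypothetical and chosen for illustration. The real event schema and the scoring formula are not described on this page, so neither is reproduced here.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ScrubEvent:
    """Hypothetical metadata record: counts only, never the prompt text."""
    occurred_at: str          # ISO 8601 timestamp in UTC
    surface: str              # e.g. "main_chat" (assumed label)
    counts: dict[str, int]    # redactions per category, e.g. {"nhs": 1}

    @property
    def total(self) -> int:
        return sum(self.counts.values())

def build_scrub_event(surface: str, counts: dict[str, int]) -> ScrubEvent:
    # Drop zero counts so the event records only categories that actually fired.
    fired = {category: n for category, n in counts.items() if n > 0}
    return ScrubEvent(
        occurred_at=datetime.now(timezone.utc).isoformat(),
        surface=surface,
        counts=fired,
    )

event = build_scrub_event("main_chat", {"nhs": 1, "dob": 2, "email": 0})
print(asdict(event))
```

Because the record has no field for message text, there is nothing in it that could carry an identifier into analytics.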
It is feedback, not compliance. There is no league table and the dashboard is private to you. The intent is to make "anonymise by default" a visible habit while clinical AI tools are still new.
The dashboard also surfaces your most common redaction category and a targeted nudge — e.g. if NHS numbers dominate, you'll see a reminder to use a label like "the patient" instead. See the Patient Data & AI Hygiene guide for the full walk-through.