A/B testing conversations in healthcare: small, ethical tweaks and simple measurement (PDSA)

TL;DR: This article shows how to run small, ethical A/B tests in patient conversations using the PDSA cycle (Plan–Do–Study–Act). It focuses on tiny changes to wording and order—without touching clinical standards—and on lightweight measurement. You’ll find step-by-step guidance, sample scripts, and safety criteria to stop a test the moment something goes off track.

  • Change one thing at a time, for a short period.
  • Set a clear goal and a behavioral hypothesis.
  • Measure teach-back and “time to understanding.”
  • Capture a minimal data set from conversations.
  • Decide: adopt, adapt, or abandon.

Key takeaway

Short micro-lessons help you keep a learning rhythm without stepping away from clinical work for hours. Em analyzes each person's communication style, so internal training is matched precisely to their needs. The virtual coach is available before every difficult conversation, so no one has to wait for an open mentor slot.


Small, ethical conversation tests (PDSA) — what this means

Here, an A/B test is a short, safe comparison of two versions of a message to see which one better supports patient understanding. It’s not a medical experiment on the patient; it’s a test of how you say things—fully within clinical standards. You change one element at a time, briefly and in controlled conditions, such as sentence order or adding a teach-back request. Before you begin, set a one-sentence goal like, “After the visit, the patient can repeat the three key instructions.” If the test concerns treatment information, the content must follow guidelines and approved medical information—the change is only in delivery. This approach enables incremental improvements instead of big-bang overhauls. Safety is paramount: the standard of care does not change, and you can stop the test at any time.

PLAN: goal, hypothesis, success criteria, and a clear stop rule

In planning, pick the 3–5 key terms the patient must take away and write a single behavioral hypothesis, for example: “If we condense the instructions to three steps and ask for a teach-back, more patients will correctly describe the dosing.” Define simple success criteria: the share of patients who can describe, in their own words, what they’ll do at home, and the “time to understanding,” counted in rounds of clarification. Specify two versions, A and B, without changing clinical content, e.g., A: “Take 2 tablets in the morning and 1 in the evening, that’s 2-0-1”; B: “Two tablets in the morning, one in the evening. Could you say that back in your own words?” Set a stop rule: if patient anxiety rises, understanding drops, or misunderstandings appear, stop immediately and return to standard phrasing. Also decide in advance how many cases you’ll test and over what timeframe (e.g., one shift, one clinic). Good planning protects patients and the team while making results easy to interpret.
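To make the plan concrete before the shift, it can help to write it down as one small structured record. Below is a minimal sketch in Python; the `TestPlan` fields, thresholds, and defaults are illustrative assumptions, not a prescribed schema.

```python
# A minimal sketch of a PDSA test plan as a plain data structure.
# All field names and thresholds are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class TestPlan:
    goal: str                      # one-sentence behavioral goal
    hypothesis: str                # single behavioral hypothesis
    version_a: str                 # standard phrasing
    version_b: str                 # phrasing under test
    success_teachback_rate: float  # target share of accurate teach-backs
    max_clarification_rounds: int  # "time to understanding" ceiling
    stop_conditions: list = field(default_factory=list)
    sample_size: int = 20          # decided in advance
    timeframe: str = "one shift"   # decided in advance

plan = TestPlan(
    goal="After the visit, the patient can repeat the three key instructions.",
    hypothesis=("If we condense the instructions to three steps and ask for "
                "a teach-back, more patients will correctly describe the dosing."),
    version_a="Take 2 tablets in the morning and 1 in the evening, that's 2-0-1.",
    version_b=("Two tablets in the morning, one in the evening. "
               "Could you say that back in your own words?"),
    success_teachback_rate=0.8,
    max_clarification_rounds=2,
    stop_conditions=["patient anxiety rises", "understanding drops",
                     "misunderstandings appear"],
)
```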

DO: small sample, stable conditions, minimal data

Run the test on a small, similar group under consistent conditions so results aren’t random. Record a minimal data set for clarity: context (clinic/shift), version used (A or B), patient reactions (questions, concerns), teach-back outcome (accurate/incomplete/inaccurate), and approximate conversation time. Note whether extra clarification was needed and what triggered it. Evaluate the content and process—not the people. The sentences are on trial, not the staff. If there’s a “novelty effect,” plan a second short cycle a week later when curiosity has faded. The goal is a repeatable signal, not a one-off spike. That way, your findings are reliable and actionable.
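Capturing the minimal data set can be as light as appending one row per conversation to a shared file. A minimal sketch follows, assuming the fields listed above and a hypothetical `log_case` helper; no patient identifiers are stored.

```python
# A minimal sketch of capturing one conversation record to a CSV file.
# Field names mirror the minimal data set described above; the file name
# and helper are assumptions for illustration. No patient identifiers.
import csv
from pathlib import Path

FIELDS = ["context", "version", "reactions", "teachback",
          "minutes_approx", "clarification_trigger"]

def log_case(record: dict, path: str = "pdsa_cases.csv") -> None:
    """Append one anonymous case; write a header if the file is new."""
    is_new = not Path(path).exists()
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(record)

log_case({
    "context": "morning clinic",
    "version": "B",
    "reactions": "asked about evening dose",
    "teachback": "accurate",          # accurate / incomplete / inaccurate
    "minutes_approx": 6,
    "clarification_trigger": "none",
})
```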

STUDY: find the moments where things break down

Don’t just look at averages—look for the points where patients lose the thread or tension rises. Identify unclear words, check whether numbers landed, and note when extra questions surfaced. Simple visuals help: a run chart (teach-backs correct over successive cases) or a brief A/B table with comments. Add a quick qualitative check: ask, “What was most clear today, and what was least clear?” One question often pinpoints a fixable fragment instead of yielding vague reflections. Examine distributions, not only means, to see whether issues cluster at specific moments in the conversation. The “STUDY” outcome should be a precise, small adjustment ready for the next cycle.
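For the analysis itself, a few lines of code are enough to get per-version teach-back accuracy and a crude run chart. A minimal sketch, assuming the CSV layout from the DO step; the symbols are illustrative.

```python
# A minimal sketch of the STUDY step on the records captured above:
# teach-back accuracy per version plus a text run chart over successive
# cases, so clusters of misses stand out. Assumes the CSV from the DO step.
import csv
from collections import defaultdict

with open("pdsa_cases.csv", newline="") as f:
    cases = list(csv.DictReader(f))

# Per-version breakdown: look at distributions, not only one average.
counts = defaultdict(lambda: defaultdict(int))
for c in cases:
    counts[c["version"]][c["teachback"]] += 1

for version in sorted(counts):
    total = sum(counts[version].values())
    accurate = counts[version]["accurate"]
    print(f"Version {version}: {accurate}/{total} accurate "
          f"({100 * accurate / total:.0f}%), breakdown: {dict(counts[version])}")

# Text run chart: one symbol per case, in order of occurrence.
run = "".join("+" if c["teachback"] == "accurate" else "-" for c in cases)
print("Run chart (+ accurate, - not):", run)
```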

ACT: adopt, adapt, or abandon—and make it stick

After analysis, choose one: Adopt, Adapt, or Abandon. Capture the decision in one operational sentence, e.g., “Adopt: We end every high-risk medication discussion with a teach-back and a contingency plan.” To lock in the change, create a mini-template in documentation (e.g., three fields: what, how much, when), a cue card at the workstation, or a short system cheat sheet. Hold a two-minute team huddle to share one example of a “good sentence” and when to use it. Check in a week later to see whether the habit stuck, and if needed, run a small follow-up cycle. Without sustainment, even good changes fade fast. The real outcome of “ACT” is a visible, everyday habit, not a document on a shelf.
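If you want the adopt/adapt/abandon call to be consistent across cycles, you can encode it as a simple rule. A minimal sketch with a hypothetical `decide` helper; the minimum-gain threshold is an assumption, and the real decision also weighs qualitative notes and the stop rule.

```python
# A minimal sketch of turning STUDY results into an adopt/adapt/abandon
# call. The 10-point minimum gain is an illustrative assumption.
def decide(rate_a: float, rate_b: float, min_gain: float = 0.10) -> str:
    """Compare teach-back accuracy rates for versions A and B."""
    if rate_b >= rate_a + min_gain:
        return "adopt"    # clear improvement: lock it in with templates
    if rate_b >= rate_a:
        return "adapt"    # weak signal: refine the phrasing, run another cycle
    return "abandon"      # version B did worse: return to standard phrasing

print(decide(rate_a=0.60, rate_b=0.85))  # -> adopt
print(decide(rate_a=0.60, rate_b=0.65))  # -> adapt
print(decide(rate_a=0.60, rate_b=0.50))  # -> abandon
```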

What to test first—and what not to test

Start with low-cost changes in areas that most often derail understanding: information order, sentence length, using numbers instead of adjectives (“2 tablets in the morning”), and a short “red-flag symptoms” checklist with a clear contingency plan. Replacing words like “increase/decrease” or “often/rarely” with specific values frequently boosts clarity. Avoid tests that could alter clinical decisions or create unequal access to information; when in doubt, review the plan with quality or ethics leadership. A “micro-script library” helps—5–10 sentences for common scenarios—each with a “simple” and a “more detailed” version. The team picks the variant based on the patient and time available, and always requests a teach-back. This library speeds up future tests and shortens the path to a stable habit.
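A micro-script library can start as nothing more than a small lookup table. A minimal sketch follows; the scenarios and wording are illustrative placeholders, not approved clinical scripts.

```python
# A minimal sketch of a micro-script library as a plain dictionary:
# one scenario per key, each with a "simple" and a "detailed" variant.
# The entries are illustrative placeholders, not approved scripts.
MICRO_SCRIPTS = {
    "dosing": {
        "simple": "Two tablets in the morning, one in the evening.",
        "detailed": ("Take two tablets with breakfast and one with dinner. "
                     "Could you say that back in your own words?"),
    },
    "red_flags": {
        "simple": "If the rash spreads or you get a fever, call us.",
        "detailed": ("Watch for a spreading rash, fever, or trouble "
                     "breathing. If any appear, call the clinic; after "
                     "hours, go to the emergency department."),
    },
}

def pick_script(scenario: str, detailed: bool = False) -> str:
    """Choose a variant based on the patient and time available."""
    variant = "detailed" if detailed else "simple"
    return MICRO_SCRIPTS[scenario][variant]

print(pick_script("dosing", detailed=True))
```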

Small A/B tests within a PDSA cycle help improve communication without risking the standard of care. The keys: a clear goal, one hypothesis, a brief test, and simple measures—teach-back and the number of clarification rounds. Collect lean but consistent data and close analysis with one qualitative question. Make an operational decision and hardwire it with small templates and a short team brief. After 2–3 cycles, capture one-page “conversation rules” and train new staff on real examples. Track one process metric weekly (e.g., teach-back: yes/no) and one outcome metric (e.g., fewer dosing-clarification calls). This cadence closes the learning loop and drives steady, safe improvement.

Empatyzer in A/B conversation tests and closing the PDSA loop

The “Em” assistant in Empatyzer helps teams quickly craft two versions of a key sentence and choose simpler words for short conversation tests. In practice, it also supports selecting 3–5 keywords and shaping teach-back prompts so the plan is ready before a shift. After the conversation, Em helps draft a brief PDSA note and pinpoint where the patient lost the thread, so the next cycle starts with a concrete fix. Teams can view aggregated communication patterns in Empatyzer (without access to individual data), making it easier to agree on a shared micro-script library. Short micro-lessons twice a week reinforce asking for teach-back and ending with a contingency plan. Empatyzer doesn’t replace clinical training or guidelines, but it lowers the friction of planning and running small tests day to day. Privacy-by-design and a quick, integration-light setup make it feasible to start a pilot at the unit level on a predictable timeline.

Author: Empatyzer
