“Prove it”: how to fairly measure the impact of communication assistants in healthcare
TL;DR: This guide shows how to move from an enthusiastic pilot to solid evidence of how a communication assistant affects care. It offers a simple metric pyramid, an honest comparative design, clear success and safety criteria, and reporting rules that avoid finger‑pointing. The approach stays practical even under time pressure.
- Separate piloting from proof and write down your hypotheses.
- Build a metric pyramid: behaviors, experience, outcomes.
- Log tool usage, not conversation content.
- Plan a comparison and blunt the novelty effect.
- Set success thresholds and safety criteria up front.
Pilot first, then proof: hypotheses and hard metrics
A pilot checks feasibility and acceptance; proof shows real impact on measurable outcomes. Start with a short hypothesis table: what should improve, for whom, and by when. Example: “Among patients discharged from general medicine, the share answering ‘I understood the plan’ increases by 10 percentage points within 4 weeks.” Choose metrics that cannot be manufactured by ad‑hoc claims: PREMs (patient experience measures), communication‑related complaints, callbacks/contacts within 7 and 30 days, adherence to agreed plans, and staff workload (e.g., minutes per visit). Specify which visits are in scope (e.g., follow‑ups vs. first visits) and how you exclude atypical cases. This structure lets you test feasibility quickly and then check whether the tool adds measurable value.
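To make the headline hypothesis concrete, here is a minimal sketch (Python, with purely illustrative numbers) of how the percentage‑point change in the share answering “I understood the plan” could be computed, together with a rough confidence interval. The function name and the counts are assumptions for illustration, not part of any particular tool.

```python
from math import sqrt

def pp_change(before_yes, before_n, after_yes, after_n):
    """Percentage-point change in the share answering 'I understood the plan',
    with a rough 95% confidence interval (normal approximation)."""
    p1, p2 = before_yes / before_n, after_yes / after_n
    diff = p2 - p1
    se = sqrt(p1 * (1 - p1) / before_n + p2 * (1 - p2) / after_n)
    return diff * 100, (diff - 1.96 * se) * 100, (diff + 1.96 * se) * 100

# Illustrative only: a 4-week baseline window vs. a 4-week intervention window.
change, low, high = pp_change(before_yes=62, before_n=100, after_yes=74, after_n=100)
print(f"Change: {change:.1f} pp (95% CI {low:.1f} to {high:.1f} pp)")
```

Writing the calculation down before the data arrive, even at this level of simplicity, makes the threshold hard to renegotiate afterwards.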
The metric pyramid: behaviors → patient experience → consequences
Arrange metrics into a simple pyramid to see where the effect gets lost. The base is clinician behaviors: Was there a brief visit summary, did the patient paraphrase the plan in their own words, was a fallback plan set in case of deterioration? The middle is patient experience: Did they feel heard, do they understand the plan, do they know when and whom to contact? The top is consequences: adherence, complaint volume, repeat contacts, avoidable returns. In a minimal study, log only “was the tool used” and “when” (timestamp and visit type), not the conversation content; this keeps the study ethical and data‑minimal. If there is no effect at the top, check whether the base happened at all (e.g., few visits with paraphrasing). That tells you whether the issue is low uptake or an ineffective technique.
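As a rough illustration of how little needs to be captured, the sketch below defines a data‑minimal usage log. The field names are hypothetical; the point is that one row per visit, with no conversation content, no patient identifiers, and no clinician names, is enough to measure uptake at the base of the pyramid.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ToolUsageEvent:
    """One row per visit: whether the assistant was used and when.
    No conversation content, no patient identifiers, no clinician names."""
    timestamp: datetime   # when the visit took place
    visit_type: str       # e.g., "follow-up" or "first visit"
    tool_used: bool       # was the assistant used during the visit?
    team_id: str          # team-level identifier only

events = [
    ToolUsageEvent(datetime(2024, 5, 6, 9, 30), "follow-up", True, "ward-A"),
    ToolUsageEvent(datetime(2024, 5, 6, 10, 15), "first visit", False, "ward-A"),
]
uptake = sum(e.tool_used for e in events) / len(events)
print(f"Uptake: {uptake:.0%}")   # low uptake explains a missing effect at the top
```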
Novelty effect and a fair comparative design
Treat the first 2–3 weeks as implementation and exclude them from the impact analysis. If possible, use a before/after design with a parallel control group (a unit or clinic without the intervention) or a staged rollout (a stepped‑wedge cluster design). This is ethical, since everyone eventually gets the intervention, and it still gives you a comparison. Plan time windows to avoid seasonality (e.g., holidays) and organizational changes that could skew results. Where feasible, model time trends to separate natural learning from the tool’s effect. A minimal protocol covers: who starts when; which metrics are collected in which weeks; and what “standard care without the tool” means.
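A minimal sketch of how the burn‑in exclusion and the before/after‑with‑control comparison could be wired up in practice. The dates and proportions are invented for illustration, and the simple difference‑in‑differences shown here assumes the two groups would otherwise have moved in parallel.

```python
from datetime import date, timedelta

ROLLOUT_START = date(2024, 9, 2)   # hypothetical go-live date
BURN_IN = timedelta(weeks=3)       # implementation period excluded from the impact analysis

def analysis_period(visit_date):
    """Classify a visit for the analysis, dropping the novelty/implementation weeks."""
    if visit_date < ROLLOUT_START:
        return "before"
    if visit_date < ROLLOUT_START + BURN_IN:
        return None                # burn-in: excluded
    return "after"

def diff_in_diff(intervention, control):
    """Before/after change in the intervention unit minus the same change in the control unit."""
    return ((intervention["after"] - intervention["before"])
            - (control["after"] - control["before"]))

print(analysis_period(date(2024, 9, 10)))   # None: falls inside the burn-in window
effect = diff_in_diff(
    intervention={"before": 0.62, "after": 0.74},   # share who understood the plan
    control={"before": 0.60, "after": 0.63},
)
print(f"Estimated effect: {effect * 100:.1f} percentage points")
```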
Set success and safety criteria before you begin
Define thresholds for success before you see the data. For example: +10 percentage points on “I understood the plan,” −20% in communication‑related complaints, and no increase in visit time of more than 1 minute. Add safety criteria: no rise in dropped handoffs (e.g., a missing “what to do if worse” instruction), and no signs of a worse experience in vulnerable groups. Pre‑write which results trigger which decision: “scale,” “iterate” (e.g., extra training on paraphrasing), or “shut down.” This protects against post‑hoc narratives and builds team trust.
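One way to make the decision rules tamper‑proof is to encode them before the data arrive. The sketch below (thresholds taken from the examples above, field names invented) applies the safety gate first, then the success gate.

```python
def decision(results):
    """Pre-registered decision rule: safety criteria first, then success thresholds."""
    safety_ok = (
        results["dropped_handoffs_change"] <= 0           # no rise in dropped handoffs
        and results["vulnerable_experience_change"] >= 0  # no worse experience in vulnerable groups
    )
    success = (
        results["understood_plan_pp"] >= 10               # +10 pp on "I understood the plan"
        and results["complaints_change_pct"] <= -20       # -20% communication-related complaints
        and results["visit_time_change_min"] <= 1         # visits no more than 1 minute longer
    )
    if not safety_ok:
        return "shut down"
    return "scale" if success else "iterate"

print(decision({
    "understood_plan_pp": 12, "complaints_change_pct": -25, "visit_time_change_min": 0.5,
    "dropped_handoffs_change": 0, "vulnerable_experience_change": 0.02,
}))   # -> "scale"
```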
Evaluate the tool, not people: team‑level, anonymous reporting
Report at team and process level, not by name. If you analyze differences in use, rely on anonymized IDs solely to support implementation (e.g., extra coaching or brief checklist‑based supervision). Be explicit: the goal is to improve the tool and the process, not to grade individuals. In practice, this lowers resistance, improves data quality, and reduces gaming. A short script for leaders: “We’re looking at the process, not for culprits; individual data is only for support, not evaluation.” That encourages people to use the tool and report usage honestly.
Intervention description, versions, real‑world data, and the go/no‑go call
Document the intervention precisely: tool description, version, human role, context of use, and “standard of care” without the tool. If the tool changes midstream, treat each major change as a new version and mark its active period. Monitor side effects (e.g., longer visits, confusing patient instructions) and record how you respond. Bring in real‑world data (RWD/RWE) after the controlled phase, but guard quality: source, cleaning, gaps, and potential biases. The best path is: first a controlled rollout (directional evidence), then real‑world monitoring (stability and safety). Finally, stick to the decision plan: when to scale, refine, or sunset; prepare a brief, plain‑language “pilot report” for stakeholders.
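A small sketch of how versions and their active periods might be recorded so that every result can be tied to the version of the tool that was actually in use. The class and field names are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class InterventionVersion:
    """One record per major change to the tool or to how it is used."""
    version: str
    active_from: date
    active_to: Optional[date]   # None means still active
    change_summary: str

versions = [
    InterventionVersion("1.0", date(2024, 9, 2), date(2024, 10, 14), "initial rollout"),
    InterventionVersion("1.1", date(2024, 10, 14), None, "reworded fallback-plan prompts"),
]

def version_at(visit_date, versions):
    """Return the version active on a given date, so visits can be attributed to it."""
    for v in versions:
        if v.active_from <= visit_date and (v.active_to is None or visit_date < v.active_to):
            return v.version
    return None

print(version_at(date(2024, 11, 1), versions))   # -> "1.1"
```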
A fair “prove it” starts by separating pilot from proof and setting clear hypotheses. The metric pyramid helps pinpoint where effects vanish: behaviors, experience, or outcomes. Comparative design and damping the novelty effect protect against illusions. Predefined success and safety thresholds prevent narrative spin. Team‑level reporting reduces pushback and strengthens data quality. Solid versioning and prudent use of real‑world data lead to a clear call: scale, iterate, or stop.
Empatyzer: bridging the gap from pilot to hard data
In organizations intent on moving honestly from pilot to hard evidence, Empatyzer helps teams quickly align on hypotheses and simple behavior markers like summarizing, paraphrasing, and setting a fallback plan. The assistant “Em” runs 24/7 and suggests short, ready‑to‑use phrases for the very next shift, making standardization easier and reducing variation across people. After a visit, “Em” can guide a brief self‑reflection with minimal logging (“used?” “when?” “what got in the way?”) without storing conversation content, reinforcing data‑minimization principles. The organization sees only aggregated results, which eases pressure on individuals and supports candid feedback; Empatyzer is not used for hiring or performance evaluation. Twice‑weekly micro‑lessons help lock in communication habits needed to deliver on the metric pyramid amid daily rush. Teams reach the point where impact can be assessed rigorously sooner—and can decide to scale or iterate. In addition, a personal communication‑style snapshot helps leaders tailor support to team needs, often cutting “implementation noise” and improving data quality.
Author: Empatyzer