Q: Can the agent run on-premise / air-gapped?

Yes. For organizations that can't allow cloud LLM inference (typically VHA, certain Catholic Health systems, and specific state Medicaid integrators), we deploy fine-tuned Llama 3.3 (or Qwen 2.5 for multilingual environments) on the client's on-premise NVIDIA H100 cluster, with ambient scribe / RAG agents running entirely within the facility network. Performance is competitive with frontier models for narrow clinical tasks (we measure within 4-7% on internal SOAP-note quality benchmarks vs Claude Sonnet on the same prompts); engineering effort is meaningfully higher (typically +30-40% engagement scope for the on-prem variant, primarily for inference infrastructure setup and model serving optimization with vLLM or TensorRT-LLM).

Question 1

Can I use OpenAI's standard API for healthcare AI agents?

Accepted Answer

Not for any system touching PHI. OpenAI's standard endpoints don't include a Business Associate Agreement (BAA) required by HIPAA: OpenAI offers BAAs only on the Enterprise / ChatGPT Enterprise tier (and only after a separate vendor risk review). For most BearPlex healthcare deployments we use Bedrock + Anthropic Claude under the AWS BAA, deployed in the client's VPC. Azure OpenAI Service is the alternative for Microsoft-stack clients, with the equivalent Microsoft BAA. Both options keep PHI within the BAA-covered cloud account end-to-end.

Question 2

How do you handle multi-step agent attribution for medical board review?

Accepted Answer

Every consequential action in the agent trace is logged to immutable storage (we use AWS QLDB or PostgreSQL with append-only audit tables) keyed to the clinician's NPI on each approval step. When a state medical board reviews a clinical decision the AI participated in, we can produce the full agent trace: which model version drafted, what evidence it retrieved, which clinician approved each step, and at what timestamp. This satisfies the attribution requirement that Texas, Florida, and California medical boards have all published guidance on for AI-augmented clinical workflows.

Question 3

Do BearPlex healthcare agents require FDA SaMD clearance?

Accepted Answer

Most don't, because they keep a clinician in the loop on consequential decisions: this is the FDA's 'augmented' category under [the Clinical Decision Support guidance](https://www.fda.gov/medical-devices/software-medical-device-samd/clinical-decision-support-software), exempt from premarket SaMD clearance. If you want fully autonomous clinical decision-making (no human review), you're in regulated SaMD territory and clearance becomes part of the engagement (we've supported one client through a 510(k) pathway for an AI triage tool, which adds 12-18 months and $400-800K to the program).

Question 4

Can the agent integrate with Epic without going through App Orchard?

Accepted Answer

Yes for read access via FHIR R4 (most BearPlex healthcare agents use this path): Epic exposes FHIR endpoints to authenticated apps without requiring App Orchard listing. App Orchard listing becomes important for write access at scale, distribution to other Epic customers, and access to certain non-FHIR APIs. For a single-customer deployment writing back via FHIR DocumentReference (e.g., ambient scribe inserting SOAP notes), Orchard listing is typically not required, but the customer's Epic account team must approve the integration. We've shipped 4 healthcare agents via the non-Orchard path; 2 have since pursued Orchard listing for cross-organization deployment.

Question 5

How do you handle hallucinations in clinical content?

Accepted Answer

Three layers because no single defense is sufficient in clinical context. (1) RAG with citation tracking: every clinical claim must reference a source document (we typically use [Anthropic's Citations API](https://docs.anthropic.com/en/docs/build-with-claude/citations) when on Claude, custom citation extraction on other models). (2) Constitutional AI-style guardrails specific to the clinical context: refuses to fabricate dosages, lab values, or diagnostic codes; flags uncertainty explicitly. (3) Mandatory clinician review on any output that affects patient care, with the agent's reasoning trace surfaced to the reviewer. Pure prompt-engineering defenses against hallucination aren't sufficient when malpractice liability is at stake: the [Mata v. Avianca (S.D.N.Y. 2023)](https://storage.courtlistener.com/recap/gov.uscourts.nysd.575368/gov.uscourts.nysd.575368.54.0_4.pdf) precedent in legal contexts is being actively cited in physician-AI cases.

Question 6

How do you handle SOC 2 + HITRUST + HIPAA evidence collection across one engagement?

Accepted Answer

We sequence them: HIPAA BAA chain validation in weeks 1-2 (covers the basics: covered entity, business associate, subcontractor BAA flow), HITRUST CSF v11.4 evidence collection in weeks 3-6 (the AI Risk Management track requires specific documentation around model selection, prompt evaluation, and inference logging), SOC 2 Type II evidence collection runs continuously throughout the engagement and produces the audit-ready evidence package by week 12. We work with the customer's existing GRC tooling (Vanta, Drata, Tugboat) where present, and ship Markdown evidence dossiers when no GRC platform exists. Roughly 25% of total engagement effort goes to compliance documentation in heavily-regulated deployments.

Question 7

What's the ROI math on ambient scribe vs prior authorization automation?

Accepted Answer

Ambient scribe ROI is faster but smaller per deployment: 2.4-3.1 hours per physician per day reclaimed (our measurements), translating to 1-2 additional patient visits per day per physician at typical RVU economics. For a 50-physician medical group, that's $1.8-3.6M annual additional revenue against engagement cost of $200-350K. Prior auth automation ROI is slower to realize but larger per organization: cycle time from 14 days to <24 hours reduces denial-related write-offs by 8-15% (per our 3 deployments) and frees 0.5-1.5 FTE per 1,000 monthly PAs. For a 250-bed hospital running ~3,000 monthly PAs, savings land at $400-900K annually against engagement cost of $300-600K. Most clients sequence: ambient scribe first to build physician trust and quick win, prior auth second once governance scaffolding is in place.

Question 8

Can the agent run on-premise / air-gapped?

Accepted Answer

Yes. For organizations that can't allow cloud LLM inference (typically VHA, certain Catholic Health systems, and specific state Medicaid integrators), we deploy fine-tuned Llama 3.3 (or Qwen 2.5 for multilingual environments) on the client's on-premise NVIDIA H100 cluster, with ambient scribe / RAG agents running entirely within the facility network. Performance is competitive with frontier models for narrow clinical tasks (we measure within 4-7% on internal SOAP-note quality benchmarks vs Claude Sonnet on the same prompts); engineering effort is meaningfully higher (typically +30-40% engagement scope for the on-prem variant, primarily for inference infrastructure setup and model serving optimization with vLLM or TensorRT-LLM).

Application	Description	Timeline	Tech stack
Prior authorization automation	Agentic workflow retrieves payor policy, extracts clinical evidence via FHIR, drafts PA submissions, and cuts PA cycle time from 14 days to under 24 hours.	10-14 weeks	LangGraph · Anthropic Claude (Bedrock + BAA) · Epic / Cerner FHIR R4 + Da Vinci PAS · Sovereign deployment in client VPC
Ambient clinical documentation (AI scribe)	Listens to clinical encounters, generates SOAP notes with ICD-10 and CPT codes for EHR sign-off, and cuts 2.4-3.1 hours of daily charting per physician.	8-12 weeks	Whisper Large v3 (sovereign on-prem) · Fine-tuned Llama 3.3 70B with clinical SFT · Epic / Cerner FHIR write APIs · On-prem GPU inference (NVIDIA H100 cluster)
Insurance claims processing	Multi-agent system intakes claims via X12 837 EDI, flags denial risks before submission, routes complex claims to adjusters, and cuts clean-claim cycles 38%.	12-16 weeks	LangGraph · RAG over policy documents (Pinecone) · AWS Bedrock with HIPAA BAA · X12 837/835 EDI integration · Adjuster workflow embed in Pega / Guidewire
Patient navigation and triage	Patient-facing agent for scheduling, refills, and follow-up via FHIR, triaging clinical questions per ESI criteria with a clear escalation path.	10-14 weeks	LangGraph + tool use · Anthropic Claude with BAA · EHR scheduling FHIR API integration · RAG over patient education library + ICD-10 SNOMED CT mappings
Drug discovery research assistance	Pharma agent searches PubMed, ClinicalTrials.gov, and internal libraries, drafting research summaries with PRISMA-style attribution for hit-to-lead work.	8-12 weeks	LlamaIndex for biomedical RAG · Claude Opus or fine-tuned Llama 3.3 · PubMed E-utilities / FAERS / ClinicalTrials.gov API integration · Sovereign deployment

AI Agents for Healthcare: HIPAA-Compliant Workflow Automation

Why Autonomous AI Agents matters in Healthcare (Providers, Pharma, Medical Devices)

The compliance surface rules out most managed AI services

The accuracy bar is unforgiving

Typical autonomous ai agents use cases in healthcare (providers, pharma, medical devices)

What we've learned deploying autonomous ai agents in healthcare (providers, pharma, medical devices)

1. Scope ruthlessly: autonomy fantasies don't survive malpractice risk

2. Ambient scribe is the highest-ROI starter use case

3. Sovereign deployment ≠ unsupported deployment

Healthcare (Providers, Pharma, Medical Devices) compliance considerations

Two compliance challenges unique to agents (vs RAG)

Frameworks that apply to every healthcare agent deployment

Common questions

This service in other industries

Other services for Healthcare

Featured case studies

Ready to deploy autonomous ai agents in healthcare (providers, pharma, medical devices)?