State of AI in Clinical Development: Reflections from SCOPE X

This year SCOPE held its first conference dedicated entirely to AI, called SCOPE X. Roughly 500 of us gathered at the Omni in the Boston Seaport, and the decision to carve a standalone AI program out of the broader SCOPE portfolio was itself worth noting. After thirty years in clinical data sciences — at Serono, at Eliassen, and across a decade at Pfizer — I have learned to read that kind of structural choice as a signal. AI in clinical development has outgrown its slot as one track among many. It now warrants its own room.

What struck me over the two days was not the volume of AI enthusiasm, which was considerable, but the quality of the realism underneath it. The people in those rooms have run pilots. Many have been burned. The conversation has matured accordingly. What follows is a fairly complete account of what I heard and what I think it means — organized roughly the way the themes built on each other across the event.

Value comes from more than the model

The opening message set the tone for everything after it: realizing value from AI takes more than technical experimentation. It requires three things working together: aligned strategic direction, strong operational execution, and sustained domain expertise. If any one of them is missing, the technology stalls. This is not what most buyers expect to hear; the instinct is still to look for a smarter algorithm. But the model is increasingly a commodity. What separates the organizations getting value from those spinning their wheels is everything around the model.

A point raised early stayed with me because it cuts against the usual framing. AI systems can actually be more predictable than people, because they apply rules consistently. When a model is wrong, you can train it and correct it. Human interpretation, by contrast, is inconsistent, because so much of it is subjective. That does not argue for removing people — quite the opposite, as I will get to — but it reframes the reliability conversation in a useful way. The challenge with AI is not that it is erratic. It is that it needs context, training, and oversight to stay accurate and useful in practice.

Piloting is easy. Scaling is the entire problem.

If one theme ran through every session, it was this. The industry has become very good at running AI pilots and remains quite bad at scaling them responsibly.

Novo Nordisk offered a vivid illustration. Faisal M Khan, PhD, Vice President, AI & Analytics at Novo Nordisk, described using AI for SDTM programming and clinical study report generation – concrete, valuable applications – while also noting that the organization already has thousands of AI applications under consideration or in use. The moment you reach that scale, the question stops being “can the model do it?” and becomes “how do we govern this?” Which use cases are regulated and which are not. What the operational impact actually is. And the harder question underneath: a model that performs well today may not hold up a year from now, which means you have to reassess models continuously to confirm they remain useful. AI needs context and oversight, not a one-time validation and a shrug.
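To make "reassess models continuously" concrete, here is a minimal sketch of a recurring check, written under assumptions of my own rather than anything Novo Nordisk described. It assumes a sample of model outputs is reviewed by a person each period, and that an accuracy figure was recorded at initial validation. The names `ReviewResult` and `needs_reassessment`, and the five-point tolerance, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReviewResult:
    """One model output that a human reviewer has checked (hypothetical record)."""
    period: str    # label for the review window, e.g. "2026-01"
    correct: bool  # True if the reviewer accepted the model output


def accuracy(results):
    """Fraction of reviewed outputs that the reviewer accepted."""
    if not results:
        raise ValueError("no reviewed outputs to score")
    return sum(r.correct for r in results) / len(results)


def needs_reassessment(baseline_accuracy, recent_results, tolerance=0.05):
    """Flag the model when recent accuracy falls more than `tolerance`
    below the accuracy measured at initial validation."""
    return accuracy(recent_results) < baseline_accuracy - tolerance


# Illustrative data only: 17 of 20 recent outputs were accepted (85%).
recent = [ReviewResult("2026-01", i >= 3) for i in range(20)]
flag = needs_reassessment(baseline_accuracy=0.95, recent_results=recent)
print(flag)
```

The useful part is not the arithmetic but the habit: the comparison runs on a schedule, against a recorded baseline, with a named owner who acts when the flag is raised.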

The lesson is easy to state and hard to absorb: a pilot can prove a use case, but it cannot prove an operating model. The two are different achievements, and confusing them is how organizations end up with a hundred promising experiments and no scaled capability.

Data quality is the ceiling on everything else

A panel of leaders (Chelsea Gallagher, Head, Design, Modeling and Simulation for Development, Sanofi; Gian Prakash, Director, Data & Analytics, Information Research, AbbVie, Inc.; and Arnab Roy, Associate Partner, ZS Associates) returned to the truth every veteran of clinical data management already knows in their bones: clinical development data is complicated, and data access is a genuine challenge. AI does not solve this. AI inherits it, and frequently amplifies it.

One subtle point deserves more attention than it usually gets. Data interpretation is shaped during planning, by clinical development operations decisions about how data will be created. If those interpretations shift during the hectic phases of a trial, particularly study startup, the downstream automation built on them will no longer be accurate. The practical failure is mundane and common: teams resolve issues without updating the metadata that documents what they did, and the automation quietly breaks on assumptions that are no longer true. The fix is equally mundane: use more metadata, and treat it as part of the work rather than an afterthought. And the panel was blunt about the human side of this: teams that do not upskill are not ready for what is coming.

The encouraging counter-example came from Angela Radcliffe, Founder, Intelligence Applied AI, who drew on her earlier years at Bristol Myers Squibb. Asked what had delivered the biggest payoff, she pointed to protocol digitization – work begun in 2018, now numbering around 400 digitized protocols, that measurably accelerated everything built on top of it. That is the kind of long-horizon, deeply unglamorous investment that quietly determines whether your AI initiatives have anything solid to stand on. The next frontier she named is harder still: cleaning up site master data and reliably associating each HCP with the right sites to track performance. The investment there has not paid off yet, and it remains one of the thornier data problems in the field.

One related observation, almost an aside but telling: it is currently easier to get access to synthetic data than to real clinical data. That single fact says a great deal about where our data access and governance frameworks still need work.

Workflow redesign matters more than the AI itself

I want to give this its own heading, because I think it is the single most important takeaway from SCOPE X and the hardest to act on.

Speakers from sponsors and technology providers kept arriving at the same place: redesigning workflows is the key to success. The value of AI in clinical development is not in making existing processes faster. It is in rethinking what those processes should be when AI can do structured work under supervision. Layering AI on top of a workflow that was designed when humans had to do everything by hand produces, at best, a faster version of an outdated process.

The sharpest framing I heard all conference was this: do not chase automation, chase quality. And there was a useful test offered for any pilot – afterward, ask whether the work has genuinely become simpler, or whether you have merely added a new dependency that someone now has to manage. A great many AI deployments fail this test without anyone admitting it. They add capability while quietly adding complexity. The honest measure of success is whether the net result is less work, not more tools.

This is intellectually obvious and operationally very hard. It asks an organization to rethink a process from scratch – to ask what the workflow would look like if designed today, knowing what AI can structurally do better, faster, and with higher consistency – rather than to bolt automation onto the familiar. Easy to pilot a use case; far harder to scale and maintain one. Success requires thinking strategically, technically, and operationally at the same time.

There were concrete signs of what becomes possible when this is done well. Sabrina Steffen, Vice President, Head of Data Sciences Innovation & Data Strategy, IQVIA, described a capability that reads the digital protocol, designs the forms, and creates the basic edit checks automatically – making CRF development roughly sixty-two percent faster against a standard eight-week EDC build time. The lesson she drew was unambiguous: standards matter, and you have to use them, because automation of this kind only works on a standardized foundation. Having run many use cases, the IQVIA team pointed to the three that consistently added the most value – medical writing, data reconciliation, and the detection of safety signals. That is a useful map for any organization wondering where to concentrate early effort rather than scattering it across two hundred experiments at once.
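A note on the sixty-two percent figure above: "faster" can be read two ways, and the two readings give different build times. The sketch below works out both against the eight-week baseline. The function names are my own, and which reading was intended is not something the text here settles.

```python
def weeks_if_time_reduced(baseline_weeks, pct):
    """Reading 1: elapsed time is cut by `pct` percent."""
    return baseline_weeks * (1 - pct / 100)


def weeks_if_throughput_increased(baseline_weeks, pct):
    """Reading 2: work proceeds `pct` percent faster, so time is divided by (1 + pct/100)."""
    return baseline_weeks / (1 + pct / 100)


reading_1 = weeks_if_time_reduced(8, 62)
reading_2 = weeks_if_throughput_increased(8, 62)
print(round(reading_1, 2), round(reading_2, 2))
```

Under the first reading the build takes about three weeks; under the second, just under five. Either is a meaningful gain, but it is worth asking which one a vendor means.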

Teams have to work differently, and that means watching for drift

A consistent thread was that AI does not let teams stop thinking – it requires them to think differently. Staff need to apply critical thinking and actively watch for model drift and hallucinations rather than trusting outputs at face value. This is where change management genuinely matters, and several speakers wrestled with how to embed AI into functions like RBQM and centralized monitoring services, where elements have to be correlated across sources and much of the work still happens in silos. Getting varied use out of these tools, rather than isolated point applications, is still an unsolved operational problem for most organizations.

The push toward cleaner data at the source connects directly here. The FDA’s newly announced real-time clinical trials initiative is pressing the industry toward high-quality data captured at the source, with information flowing to regulators in near real time. Ivy Altomare, MD, Vice President of Clinical Research, Paradigm Health, described how the company is working with the FDA on a proof-of-concept study alongside an AI patient-matching recruiting solution. The implication is that this shift has to be proactive rather than reactive. Organizations waiting to clean up their data until a regulator forces the issue will find themselves badly behind, because the initiative will expose every weak link in upstream data discipline.

Technology is outpacing adoption

One of the most candid talks came from David Carruthers, Vice President, BioPharmaceuticals R&D Clinical Operations, AstraZeneca, whose central point was that technology is outpacing adoption. The problem is no longer capability; it is how we operationalize the tools we already have. AstraZeneca, like many large sponsors, has tended to develop pilots for discrete use cases and then struggled to scale the change across the organization. The scale of the integration burden is staggering when you look at it from the site’s perspective – AstraZeneca alone has fifteen different systems that sites are expected to work with, which is plainly not sustainable.

Two of their responses are instructive. They are exploring a digital twin approach: modeling how the clinical study report would come out, then redesigning the protocol backward from what the CSR ultimately needs to contain. And they have deliberately moved away from their posture of two years ago, when they had something like two hundred pilots running simultaneously. The proliferation itself was the failure. A scatter of disconnected proofs-of-concept produces integration debt, vendor fatigue, and no compounding learning. Volume of experimentation is not progress, though it often masquerades as it.

Build skilled teams, not just deploy tools

The talk that gave me the most to think about came from Angela Radcliffe. Her focus was on building skilled teams and how to upskill people on AI – and she made a provocative case that if AI is taught correctly, your teams do not need traditional change management at all.

Her argument was that users themselves should build the agents – that tools should be built by the people closest to the work, through citizen development. The key, in her view, is to teach the skills rather than the platform. Platforms like the various copilots change roughly every sixty days; if you train people on a specific interface, the training is obsolete almost immediately. Teach the underlying skills and cultivate curiosity instead, and people can adapt as the tools shift beneath them. She broke down building an agent into roughly four steps, beginning with mapping the workflow before building anything, and argued that restricting tools without teaching people why is exactly the wrong approach. What you need instead is a governance model that gives people a decision tree rather than a rule book – clear guidance on what can and cannot go into a tool, and the judgment to navigate the rest.

Radcliffe used the framing of hiring “AI teammates” – treating an AI agent much as you would an intern or a new hire. You give it a role, define its tasks and constraints, provide examples and formats, even give it a name; you onboard it, and if it does not work out, you let it go. It is a memorable way to make the point that an AI agent needs the same clarity of expectations and the same oversight you would give a person, rather than being treated as a piece of software you brief once and forget.
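The "job description" idea can be written down as a small structure. What follows is a sketch under my own assumptions, not Radcliffe's method or any particular platform's format: the class `AgentRole`, its field names, and the example values are all hypothetical. It captures the elements listed above (name, role, tasks, constraints, examples, format) and reports which ones are still missing before the agent is "onboarded."

```python
from dataclasses import dataclass, field


@dataclass
class AgentRole:
    """Hypothetical 'job description' for an AI teammate."""
    name: str
    role: str
    tasks: list = field(default_factory=list)
    constraints: list = field(default_factory=list)
    examples: list = field(default_factory=list)
    output_format: str = ""

    def gaps(self):
        """Return the parts of the brief that are still empty."""
        checks = {
            "tasks": self.tasks,
            "constraints": self.constraints,
            "examples": self.examples,
            "output_format": self.output_format,
        }
        return [label for label, value in checks.items() if not value]

    def brief(self):
        """Render the role as plain text to hand to the agent."""
        lines = [f"You are {self.name}, {self.role}.", "Tasks:"]
        lines += [f"- {t}" for t in self.tasks]
        lines.append("Constraints:")
        lines += [f"- {c}" for c in self.constraints]
        lines.append("Examples:")
        lines += [f"- {e}" for e in self.examples]
        lines.append(f"Output format: {self.output_format}")
        return "\n".join(lines)


# Illustrative values only; not taken from the talk.
teammate = AgentRole(
    name="Filer",
    role="a document-filing assistant for a study team",
    tasks=["Suggest a filing location for each incoming document"],
    constraints=["Escalate to a human reviewer when unsure"],
)
print(teammate.gaps())
```

An incomplete brief shows up as a list of gaps, which is roughly the question you would ask before a new hire's first day: what have we not told them yet?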

She also made the point about procurement that several others echoed: procurement cycles are so long that by the time you finish a pilot, the product has already changed underneath you – which led her to a deliberately contrarian position of not running drawn-out pilots at all.

The contrast she drew between two operating philosophies was the part I keep returning to. On one side is conventional change management: a top-down rollout, training plans, adoption metrics, resistance to be managed, tools imposed on people, and a centralized AI team acting as gatekeeper. On the other is what she called applied AI literacy: bottom-up enablement, building plans rather than rollout plans, the user as the builder, agency to expand, tools built by the people, citizen developers working closest to the pain. Having watched many top-down technology rollouts succeed only partially over the years, I find the second philosophy more convincing – though I would add that it only works when paired with the kind of governance and oversight that keeps execution accountable.

A real case, including the part most case studies leave out

Paulius Ojeras, Vice President, Clinical Operations, Perceive Biotherapeutics, presented one of the more honest case studies of the conference, on trial master file management, which the company had built in partnership with Tilda Research. With human oversight in place, he reported a reduction of more than ninety percent in manual file handling: documents that previously took considerable time to handle were processed in under a minute, with issues identified and escalated immediately and no backlog accumulating. They intended to extend the same approach into study startup, site management, finance operations, and data monitoring.

But the part that mattered most was the part most case studies omit: they ran into real problems with staff not adopting it. That single detail validates everything Angela Radcliffe argued about upskilling and bottom-up enablement. Capability without adoption produces no value. A tool that works brilliantly and sits unused is not a success – it is a more expensive version of the status quo. The lesson the speaker drew was the right one: rethink, do not automate. Ask how the workflow would look if you designed it from scratch knowing what AI can now do – and chase quality rather than automation for its own sake.

“Human in the loop” is being retired, and the replacement matters

A vocabulary shift was underway across the conference, and I think it is more than cosmetic. Speaker after speaker pushed back on the phrase “human in the loop,” not because human oversight is receding, but because the phrasing frames the human as a checkpoint on the machine’s activity rather than as the directing intelligence in charge of it. The alternatives raised included “human in charge,” “human on the loop,” and “human in the lead.”

The distinction is real. “Human in the loop” implies the AI proposes and the human approves. “Human in the lead” implies the human sets the strategy, governs the workflow, and directs AI agents to perform defined work within defined boundaries — with accountability sitting squarely with the human and the AI executing under supervision. This is the framing I expect to hold up best as regulators, sponsors, and CROs work out what governance means in practice.

It is also, candidly, why I joined the advisory board at Maxis AI. The company frames its approach as an AI Workforce — a supervised execution layer where AI agents perform defined operational work across clinical workflows, with human experts directing and validating that work and audit traceability built into the process rather than bolted on afterward. I have spent enough of my career watching well-intentioned automation create unaccountable black boxes to value that distinction. The point of an AI Workforce is not to remove the human from clinical development operations. It is to put experienced people in charge of more than they could otherwise govern, and to keep the execution traceable to withstand the scrutiny our industry rightly demands.

The economics, and the honest questions about cost

One frustration surfaced repeatedly: the gap between pilot pricing and production pricing. There is real fear of technical debt. You run a pilot at one number, then discover at implementation that integration, customization, and ongoing maintenance push the cost to double or triple the original. Speakers were direct about what this means in practice. You need transparency about cost, and a clear-eyed view of build versus buy.

The framing one speaker offered was useful: in times of certainty, you can have a vendor build for you; in times of uncertainty, you bet on yourself. That is part of why so many large pharmas are currently choosing to build internal AI capability rather than depend on vendor pricing they cannot forecast. It is a rational hedge against an unpredictable market.

The broader economic backdrop is sobering, and worth stating plainly. Most pharma organizations are operating with flat budgets, which raises the real question of appetite for risk. And most AI investment today is going into discovery and manufacturing, not clinical development. Within clinical operations, the honest truth is that many people still love their spreadsheets. Any realistic assessment of AI adoption in our field has to account for that starting point rather than assuming an organization eager and resourced to transform overnight.

What scaling AI in clinical trials actually requires

I will close with the conviction I left Boston holding more firmly than when I arrived. Successful AI adoption in clinical development is not, at its core, a technology problem. It is a workforce and operating model problem. Standards matter. Data foundations matter. But above all, the way people and AI work together matters.

The organizations getting real value are the ones building hybrid teams — clinical experts working alongside structured, supervised AI execution under clear governance — rather than treating AI as a tool individual functions adopt in isolation. This is the shift I am working on with the team at Maxis AI, and it is the shift I believe will separate the organizations that compound their AI investments from those that keep funding pilots that never quite scale. The framing matters because it places AI in the right relationship to the clinical organization: not as a replacement for expert judgment, and not as an unaccountable autonomous system, but as a governed execution capability that lets experienced people direct far more work than they could by hand.

The conferences will keep coming and the announcements will keep arriving. The genuine signal underneath the noise is that our industry has started asking a more mature question. We have moved on from “can AI do this?” to “how do we govern AI doing this at scale, and how do we build the teams capable of running it?” SCOPE X felt like the first time the industry sat down together to take those questions seriously. That alone made it worth the trip.

Sessions referenced from SCOPE X 2026:

Novo Nordisk, AbbVie, Sanofi, ZS Associates, and IQVIA presentations – AI and Data for Clinical Trial Optimization: https://www.scopesummit.com/scopex/clinical-trials-ai-data-foundations

AstraZeneca, Intelligence Applied AI, Perceive Biotherapeutics, and Paradigm Health presentations – AI Strategy and Business Value in Clinical Development: https://www.scopesummit.com/scopex/clinical-trials-ai-strategy-use-cases