17 Apr 2019 Blog
De-mystifying data management with automation through metadata

Things seem to be set in motion in the clinical industry when programmers ready to perform the analysis get their code-happy hands on the study data. But once the CROs drop the data off, how does it actually make the journey to the statistical programmer's desktop? This is a critical process, yet compressed deadlines too often produce systems built from a collection of one-off programs that grow more complicated over time, leaving the data workflow lifecycle fragmented. This article discusses the challenges of supporting data from multiple vendors, including the problems that arise without a consistent framework. It then discusses the benefits of a common platform: breaking data management activities down into "actions", driving those actions with object-oriented metadata, and rapidly building capability through automated processes.


The problem with outsourced data collection

Most clinical organizations need to receive data from outside vendors and prepare this data for statistical processing. Whether because they deal with multiple outside vendors or because the underlying science differs from one study to another, the format of the data delivered by the CROs will vary (one vendor delivering SAS datasets, another delivering Excel spreadsheets), as will the preparatory activities that ready the data for analysis (supplying passwords to extract a zipped archive, applying coding and edit checks).

Another element complicating this ecosystem is the organizational structure of the business. Inevitably, the strategic responsibility for defining the company's systems will be shared among functional groups, such as informatics and IT.

To receive data from CROs, there will be a system exposed to the outside world through the corporate firewall (likely managed by IT), and data consumers (such as informatics) will have to use these shared systems rather than build their own.

Many organizations lack a standard operating system and run a mixed environment of UNIX, Linux, Windows, and various file servers. A formidable enterprise consideration is scalability: the ability of the system to accommodate future business without disrupting ongoing or legacy activities.

Speaking in the language of an applications architect, the system must:

  • Support varying input data formats from CROs
  • Externalize configuration details such as user credentials and file server paths
  • Support varying data preparation activities
  • Adapt to changes in the infrastructure and remain portable across operating systems
  • Be easily extensible as the business places new demands on the framework
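To make the requirement about externalized configuration concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration: the vendor names, the JSON layout, the directory paths, and the convention of storing only the name of an environment variable that holds an archive password. The point is simply that delivery format, file server location, and credentials live outside the program logic.

```python
import json
import os
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical vendor settings. In a real deployment this text would be read
# from an external file or configuration service, not embedded in code.
VENDOR_CONFIG_JSON = """
{
  "vendor_a": {"format": "sas7bdat", "inbound_dir": "/data/inbound/vendor_a",
               "password_env": "VENDOR_A_ZIP_PASSWORD"},
  "vendor_b": {"format": "xlsx", "inbound_dir": "/data/inbound/vendor_b",
               "password_env": null}
}
"""


@dataclass(frozen=True)
class VendorConfig:
    name: str
    format: str                  # delivery format, e.g. SAS dataset or Excel
    inbound_dir: str             # file server path, kept out of program logic
    password_env: Optional[str]  # name of an environment variable, never the secret

    def password(self) -> Optional[str]:
        """Resolve the archive password at run time, if this vendor needs one."""
        if self.password_env is None:
            return None
        return os.environ.get(self.password_env)


def load_vendor_configs(text: str) -> Dict[str, VendorConfig]:
    """Parse the externalized settings into one VendorConfig per vendor."""
    raw = json.loads(text)
    return {
        name: VendorConfig(
            name=name,
            format=entry["format"],
            inbound_dir=entry["inbound_dir"],
            password_env=entry.get("password_env"),
        )
        for name, entry in raw.items()
    }


configs = load_vendor_configs(VENDOR_CONFIG_JSON)
```

With this arrangement, moving a vendor's inbound directory or rotating a password becomes a configuration change rather than a code change, which also helps when the same program has to run on different servers.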


The chaotic approach to defining workflows

Many organizations function day to day within the chaotic environment of getting data from multiple vendors into the analysis pipeline. When the group had to manage data for only a study or two, it may have been possible to write a unique program for each workflow. As the organization supports more studies, and correspondingly more data, the one-off program approach stops scaling.

As people leave the company or move on to other roles, the knowledge about how to maintain the systems is lost. Fewer and fewer people are capable of fixing problems within each data flow, meaning there are numerous single points of failure within the data management lifecycle where communicating program failures and quickly implementing solutions becomes quite taxing.

Over time, the tidy collection of custom programs dealing with one or two studies evolves into an unmanageable assortment of code and scripts. Challenged to support multiple legacy workflows and new business, the staff finds itself running in circles trying to overcome breakdowns in the system just to keep the business moving. Some organizations feel this pain acutely, having long outgrown the ad-hoc approach to data management; others are at an early stage of growth and are recognizing the need for something more robust while the pain is still manageable.


Automation through Metadata

Metadata is a nebulous term, and an overused one in the industry. In this article, metadata means the elements of data management that differentiate one process from another. Capturing those elements externally makes it possible to describe each workflow as a sequence of reusable actions driven by that metadata, and therefore to automate it. The important thing is that the technology is flexible enough to let you define and group metadata in a useful way. A metadata ontology or syntax needn't be verbose; it simply needs to provide a vehicle for defining data elements and their values, which the system can then interpret in whatever way is helpful.
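As a rough illustration of a workflow described as a sequence of reusable actions driven by externally captured metadata, the Python sketch below registers small actions by name and lets a per-study list of steps decide which ones run, in what order, and with which parameters. The action names, the step format, and the list-of-dicts representation of a dataset are all assumptions made to keep the example self-contained; a real system would operate on SAS datasets or data frames.

```python
from typing import Any, Callable, Dict, List

# A "dataset" is modeled as a list of row dictionaries to keep the sketch
# self-contained; real pipelines would use SAS datasets or data frames.
Rows = List[Dict[str, Any]]
Action = Callable[[Rows, Dict[str, Any]], Rows]

# Registry of reusable actions, keyed by the name used in workflow metadata.
ACTIONS: Dict[str, Action] = {}


def action(name: str) -> Callable[[Action], Action]:
    """Decorator that registers a function as a named, reusable action."""
    def register(fn: Action) -> Action:
        ACTIONS[name] = fn
        return fn
    return register


@action("rename_columns")
def rename_columns(rows: Rows, params: Dict[str, Any]) -> Rows:
    """Rename columns according to a vendor-specific mapping."""
    mapping = params["mapping"]
    return [{mapping.get(col, col): val for col, val in row.items()} for row in rows]


@action("drop_missing")
def drop_missing(rows: Rows, params: Dict[str, Any]) -> Rows:
    """Drop rows whose key column is empty or missing."""
    column = params["column"]
    return [row for row in rows if row.get(column) not in (None, "")]


def run_workflow(rows: Rows, steps: List[Dict[str, Any]]) -> Rows:
    """Apply each metadata-described step, in order, to the data."""
    for step in steps:
        rows = ACTIONS[step["action"]](rows, step.get("params", {}))
    return rows


# Hypothetical workflow metadata for one study: another vendor or study gets
# a different list of steps, not a different program.
STUDY_101_WORKFLOW = [
    {"action": "rename_columns", "params": {"mapping": {"SUBJ": "SUBJECT_ID"}}},
    {"action": "drop_missing", "params": {"column": "SUBJECT_ID"}},
]
```

For example, run_workflow([{"SUBJ": "101-001", "AGE": 54}, {"SUBJ": "", "AGE": 61}], STUDY_101_WORKFLOW) returns [{'SUBJECT_ID': '101-001', 'AGE': 54}]. Supporting a new study then means writing a new step list; only a genuinely new kind of transformation requires a new action.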

The art is in finding the right tradeoff between automation and flexibility. Certain elements of the system are inevitably not going to be standard; for example, some vendors may provide data that requires transposition. The system can support such custom tasks while still preserving consistency by capturing metadata about the locations and names of the custom programs being run. This provides insight into the "unique" pieces of the system over time and may offer an opportunity to gain additional efficiencies by integrating those features into the framework as they stabilize. At a minimum, the framework maintains a "single source of the truth" regarding both the unique and the standard components of each flow.
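One way to keep custom programs visible, sketched below in Python under assumed conventions, is to mark each step in the flow metadata as standard or custom and to record where each custom program lives. The flow names, program paths, and the "standard" flag are hypothetical. A simple report over that metadata then lists the unique components, and a count of custom actions across flows hints at which ones might be worth folding into the standard framework.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Flows = Dict[str, List[Dict[str, Any]]]

# Hypothetical flow metadata. Standard steps name a built-in action; custom
# steps also record where the one-off program lives and why it exists.
FLOWS: Flows = {
    "study_101_vendor_a": [
        {"action": "unzip_archive", "standard": True},
        {"action": "transpose_labs", "standard": False,
         "program": "custom/study_101/transpose_labs.sas",
         "reason": "vendor delivers lab results in wide format"},
        {"action": "load_sas", "standard": True},
    ],
    "study_202_vendor_b": [
        {"action": "read_excel", "standard": True},
        {"action": "load_sas", "standard": True},
    ],
    "study_303_vendor_a": [
        {"action": "unzip_archive", "standard": True},
        {"action": "transpose_labs", "standard": False,
         "program": "custom/study_303/transpose_labs.sas",
         "reason": "same wide lab format as study 101"},
    ],
}


def custom_components(flows: Flows) -> List[Tuple[str, str, str]]:
    """List (flow, action, program) for every non-standard step."""
    found = []
    for flow_name in sorted(flows):
        for step in flows[flow_name]:
            if not step.get("standard", True):
                found.append((flow_name, step["action"], step["program"]))
    return found


def custom_action_counts(flows: Flows) -> Counter:
    """Count how many custom steps share each action name.

    An action that recurs across flows may be stable enough to standardize.
    """
    return Counter(action for _flow, action, _program in custom_components(flows))
```

Here transpose_labs appears as a custom step in two flows, which is exactly the kind of signal that suggests promoting it to a standard action.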

Metadata is one of those buzzwords that gets thrown around in many contexts and takes on vastly different meanings. If the business has needs that could benefit from a metadata-driven solution, it is entirely possible to build them into any redesign without forcing a technology or product choice. For example, normal-range checks and unit conversions are metadata-driven processes: by associating a specific type of data transformation with a column of a table or a row of data, the system can capture the calculation lifecycle for given data points.
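The sketch below shows how a unit conversion and a normal-range check might be driven by column-level metadata rather than hard-coded per study. The ColumnMeta structure and the GLUC entry are illustrative assumptions; the conversion factor (about 0.0555 mmol/L per mg/dL for glucose) and the range are example values, not a clinical reference. Each result also records how it was derived, which is one small way to capture the calculation lifecycle.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ColumnMeta:
    source_unit: str
    target_unit: str
    factor: float            # multiply a source value by this to get the target unit
    low: Optional[float]     # lower normal limit, in the target unit
    high: Optional[float]    # upper normal limit, in the target unit


# Hypothetical column-level metadata; values are illustrative only.
LAB_METADATA: Dict[str, ColumnMeta] = {
    "GLUC": ColumnMeta("mg/dL", "mmol/L", 0.0555, 3.9, 5.6),
}


def process_value(column: str, value: float) -> Dict[str, object]:
    """Convert a value to the target unit and flag it against the normal range."""
    meta = LAB_METADATA[column]
    converted = round(value * meta.factor, 2)
    if meta.low is not None and converted < meta.low:
        flag = "LOW"
    elif meta.high is not None and converted > meta.high:
        flag = "HIGH"
    else:
        flag = "NORMAL"
    return {
        "column": column,
        "value": converted,
        "unit": meta.target_unit,
        "flag": flag,
        # Record how the value was derived so the calculation can be traced.
        "derivation": f"{value} {meta.source_unit} x {meta.factor}",
    }
```

For instance, process_value("GLUC", 126) yields a value of 6.99 mmol/L flagged HIGH. Adding a lab test or adjusting a range then means updating LAB_METADATA, not editing each study program that handles that column.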

Likewise, study-level metadata, such as the name of the trial, can easily be associated with studies. Processing-level metadata (such as the location of directories) is often already used by existing scripts, but could be used more aggressively to loosen the coupling between the infrastructure and the systems. Data itself answers a clearly defined question: what is this value? Metadata can capture details that the data alone does not reveal, such as where the value came from, which edit checks or range tests were applied to it, and where else it appears in the data. Managed centrally, this kind of information can pay off in document generation and process automation. Further, a comprehensive view of the data and metadata together can inform business decisions and clinical analysis. Metadata can also be used to drive jobs and processes, making system automation and configuration less painful.
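To illustrate the kind of details metadata can carry beyond the value itself, here is a minimal Python sketch that attaches provenance to a single data point: its source file, the delivering vendor, the checks it passed, and where else it appears. The class and field names and the sample values are hypothetical; the describe method hints at how centrally managed metadata could feed generated documentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Provenance:
    source_file: str                                        # where the value came from
    vendor: str                                             # who delivered it
    checks_passed: List[str] = field(default_factory=list)  # edit/range checks applied
    also_in: List[str] = field(default_factory=list)        # other places it appears


@dataclass
class TrackedValue:
    name: str
    value: object
    provenance: Provenance

    def describe(self) -> str:
        """Render the value and its metadata as a one-line documentation entry."""
        p = self.provenance
        checks = ", ".join(p.checks_passed) or "none"
        elsewhere = ", ".join(p.also_in) or "nowhere else"
        return (f"{self.name}={self.value!r} from {p.source_file} ({p.vendor}); "
                f"checks: {checks}; also in: {elsewhere}")


# Hypothetical example data point.
age = TrackedValue(
    name="AGE",
    value=54,
    provenance=Provenance(
        source_file="inbound/vendor_a/dm.sas7bdat",
        vendor="vendor_a",
        checks_passed=["not_missing", "range_18_90"],
        also_in=["adsl.AGE"],
    ),
)
```

Here age.describe() returns "AGE=54 from inbound/vendor_a/dm.sas7bdat (vendor_a); checks: not_missing, range_18_90; also in: adsl.AGE".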


To conclude

Data management is a task common to all clinical organizations. Supporting multiple vendors while building scalable systems requires attention to automation. The key benefit of abstracting your processes into definable tasks configured via metadata is that programming logic and business logic are separated. By compartmentalizing these two facets of the system, each can evolve without putting undue strain on the other. The success of any new technology depends heavily on understanding the needs of the user community and building a system that addresses the pain points of existing approaches. By automating data management through metadata, the laborious task of getting data to statistical programmers becomes much simpler and, as a result, accessible to a wider community than any ad-hoc system allows. This creates a collaborative environment in which data management, IT, applications development, and statistical programming can work together to get things done and, most importantly, share a common framework, and therefore a common language, for this task.

MaxisIT

At MaxisIT, we understand the strategic priorities within clinical R&D, and we draw on our experience implementing solutions that improve the Clinical Development Portfolio through an integrated, platform-based approach. This approach delivers timely access to study-specific as well as standardized and aggregated clinical trial operations and patient data, and allows efficient trial oversight via remote monitoring, statistically assessed controls, data quality management, clinical reviews, and statistical computing.

Moreover, it provides capabilities for planned vs. actual trending and optimization, as well as for fraud detection and risk-based monitoring. MaxisIT's Integrated Technology Platform is a purpose-built solution that helps the Pharmaceutical & Life Sciences industry by “Empowering Business Stakeholders with Integrated Computing, and Self-service Dashboards in the strategically externalized enterprise environment with major focus on the core clinical operations data as well as clinical information assets; which allows improved control over externalized, CROs and partners driven, clinical ecosystem; and enable in-time decision support, continuous monitoring over regulatory compliance, and greater operational efficiency at a measurable rate”.
