23 Jul 2019 Blog
Leveraging Big Data in Clinical Trials

Industry-wide collaborative efforts in clinical trials offer a significant improvement over siloed individual databases in delivering superior patient outcomes. However, those efforts have so far been limited to rare disease categories and a restricted range of data sources, resulting in limited clinical analyses and insight. A clinical data repository built on Big Data will enable pharmaceutical companies to apply new analytic techniques and optimize the patient journey from drug discovery to bedside treatment.

Mining Big Data enables us to:

  • Learn from historical data to optimize Study Design, Conduct and Analysis.
  • Perform simulations to mitigate the risk of time delay for clinical trials.
  • Perform predictive modelling with EHR and genomic datasets across numerous data providers.
  • Glean insights from clinical data, including unstructured patient notes, scans and pathology reports.
  • Empower government agencies, payers, and providers to make decisions about drug discovery, patient access, and marketing.



Let us consider some possible use cases for Big Data techniques in the domain of Clinical Research.


The more informed a company is about the conduct of its operations, the better placed it is to realize efficiencies. Large amounts of data can become an impediment to good reporting, because the time required to produce results can scale badly. Big Data techniques make it simpler to generate stratified reports quickly. Using a map function to partition a transactional feed by date makes it simple to summarize transactions by hour, day, week, month or quarter, and the Lambda Architecture provides the capability for continual real-time reporting. Many vendors today offer reporting solutions that leverage Big Data, but none as comprehensive as MaxisIT’s Analytics & Reporting platform, which satisfies the decision-making requirements of diverse clinical business functions in a self-service manner. It is a completely web-based, scalable platform designed for viewing large sets of data, and it provides a range of simple to advanced analytics and reporting functions with drag-n-drop configuration.
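As a rough illustration of the partitioning idea, the sketch below maps each record of a small feed to a time-bucket key and then counts records per key. It runs on a single machine with plain Python; the feed contents, the `KEY_FUNCTIONS` table and the `summarize` helper are all hypothetical names chosen for this example, and a production system would run the same map step on a distributed framework.

```python
from collections import Counter
from datetime import datetime

# Hypothetical transactional feed: (ISO 8601 timestamp, event name) pairs.
FEED = [
    ("2019-07-22T09:15:00", "query_raised"),
    ("2019-07-22T09:48:00", "page_reviewed"),
    ("2019-07-22T14:02:00", "page_reviewed"),
    ("2019-07-23T08:30:00", "query_closed"),
]

# Map step: each granularity turns a timestamp into a partition key.
KEY_FUNCTIONS = {
    "hour": lambda ts: ts.strftime("%Y-%m-%dT%H"),
    "day": lambda ts: ts.strftime("%Y-%m-%d"),
    "week": lambda ts: "{}-W{:02d}".format(*ts.isocalendar()[:2]),
    "month": lambda ts: ts.strftime("%Y-%m"),
    "quarter": lambda ts: f"{ts.year}-Q{(ts.month - 1) // 3 + 1}",
}


def summarize(feed, granularity):
    """Count transactions per time bucket at the requested granularity."""
    key_of = KEY_FUNCTIONS[granularity]
    # Map every record to its partition key, then reduce by counting keys.
    keys = map(lambda record: key_of(datetime.fromisoformat(record[0])), feed)
    return Counter(keys)


if __name__ == "__main__":
    for level in KEY_FUNCTIONS:
        print(level, dict(summarize(FEED, level)))
```

Because only the key function changes between granularities, the same feed can be re-summarized at any level without reshaping the source data.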


Market intelligence can draw on an ever-expanding selection of data sources when looking at the impact of a drug. One of the most valuable is social media, where consumers often reference a drug or indication. Using tools like the Lambda Architecture, we can mine these large feeds to pull out interesting messages that can form the basis of impression analyses. Given the large amounts of information that people share, these techniques may also serve in trial recruitment. A recent study of cancer trials showed that clinical trials were offered to patients only 20% of the time, but that 75% of those offered accepted. Further investigation found that 32% said they would be very willing to participate in a clinical trial if asked. If these techniques make it possible to identify candidates for inclusion, then studies can be brought to subjects; the statistics suggest a good probability of a successful match that benefits the patients.


Simple transformations can be achieved using standard libraries with MapReduce. Examples include mapping a raw date string to ISO 8601 format, or populating standard and raw variables according to the datatype. Provided that rows are independent, the mapping process can be parallelized across many workers. Minimizing the time required to generate the transformed dataset reduces the need to hold many copies of intermediate datasets, because the required datasets can be regenerated from the source data on demand (or even continuously). The MapReduce approach can also assist ETL processes: mapper nodes can be configured to execute SQL queries against a database, making it possible to parallelize the extraction of data. An open source tool called Apache Sqoop uses this technique to extract data from an RDBMS into HDFS. If an enterprise service bus or equivalent messaging layer is attached as a Spout (a stream source, in Apache Storm terms), then real-time transformations are possible, feeding data directly into standardized analysis platforms.
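The row-level mapping described above might look like the following minimal sketch. It assumes three raw date formats and a column named `visit_date_raw`, both invented for illustration, and it uses a local thread pool as a stand-in for the distributed mapper workers a MapReduce cluster would provide.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime

# Assumed raw formats; a real study would take these from its data specification.
RAW_FORMATS = ("%d/%m/%Y", "%Y%m%d", "%Y-%m-%d")


def to_iso8601(raw):
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    for fmt in RAW_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def map_row(row):
    """Mapper: keep the raw variable and add a standardized one beside it."""
    return {**row, "visit_date_std": to_iso8601(row["visit_date_raw"])}


def transform(rows, workers=4):
    """Apply the mapper to every row; rows are independent, so order of execution is free."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input rows.
        return list(pool.map(map_row, rows))


if __name__ == "__main__":
    sample = [{"visit_date_raw": "23/07/2019"}, {"visit_date_raw": "20190724"}]
    for out in transform(sample):
        print(out)
```

Since the mapper has no shared state, the transformed dataset can be regenerated from the source rows whenever it is needed instead of being stored as an intermediate copy.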


Much of the data in clinical studies needs to be processed to provide overall metrics for the study; for example, how many CRF pages need reviewing? These status metrics need to be computed from the point at which changes are made (the CRF level) up to the subject, site or study level. The status at any level is made up of the status of its children (and their children, and so on). Viewing the hierarchy as a tree, it is possible to split the overall calculation into a series of sub-calculations at the branch points, each of which can be processed independently. The child calculations are computed recursively upwards, with the results at each level giving the status of the parent. Such hierarchical review is possible with MaxisIT’s Analytics & Reporting platform, which offers features such as drag-n-drop analytical data modelling, a statistical and data-driven matrix-based analytics configurator, a reports portfolio manager, support for multiple visualizations and reporting structures, template management, role-based access controls, version controls, interactive dashboards, and an analytical sandbox environment with automated refresh, all designed for configurability, reusability and usability.
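The recursive roll-up can be sketched as below. The `Node` class, the `rollup` function and the study, site and subject identifiers are hypothetical and exist only to show the shape of the calculation: each leaf is a CRF page flagged as needing review or not, and each parent's count is the sum of its children's counts.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    needs_review: bool = False  # only meaningful for leaf nodes (CRF pages)
    children: list = field(default_factory=list)


def rollup(node, totals):
    """Return the number of pages needing review under node, recording each level in totals."""
    if not node.children:
        count = int(node.needs_review)
    else:
        # Each child subtree is independent, so these calls could run on separate workers.
        count = sum(rollup(child, totals) for child in node.children)
    totals[node.name] = count
    return count


if __name__ == "__main__":
    study = Node("STUDY-1", children=[
        Node("SITE-A", children=[
            Node("SUBJ-001", children=[
                Node("SUBJ-001/page-1", True),
                Node("SUBJ-001/page-2", False),
            ]),
            Node("SUBJ-002", children=[Node("SUBJ-002/page-1", True)]),
        ]),
        Node("SITE-B", children=[
            Node("SUBJ-003", children=[Node("SUBJ-003/page-1", False)]),
        ]),
    ])
    totals = {}
    rollup(study, totals)
    for name in ("STUDY-1", "SITE-A", "SITE-B", "SUBJ-001"):
        print(name, totals[name])
```

When a single CRF page changes, only the counts on the path from that page up to the study root need to be recomputed, which is what keeps this kind of metric cheap to refresh.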


Big Data is a valuable approach, and as an industry we should have a strategy for incorporating it into our data practices. The advantages to be gained from processing large amounts of data rapidly, and from using the results to make informed decisions, are significant. As we see with the Apache Hadoop project, many tools have been built up around the central platform, and they lower the cost of adoption for organizations seeking to embrace these new technologies. Defining Big Data as a problem that necessitates continual innovation opens up many opportunities to improve what can be achieved with the data generated by clinical studies, benefitting the industry and those who depend on what it produces. Data analysis becomes less bound by data logistics, both in terms of storage and in terms of the time needed to generate datasets. This opens the organization to a wider range of approaches to data processing and to the types of analytics that can be attempted. The process becomes data driven, which is where it should be: subjects have contributed their data, and we should be able to get maximum value from it for their sake.


At MaxisIT, we clearly understand the strategic priorities within clinical R&D, and they resonate with our own experience of implementing solutions that improve the clinical development portfolio through an integrated, platform-based approach. This approach delivers timely access to study-specific as well as standardized and aggregated clinical trial operations and patient data, and it allows efficient trial oversight via remote monitoring, statistically assessed controls, data quality management, clinical reviews, and statistical computing. Moreover, it provides capabilities for planned vs. actual trending, optimization, fraud detection and risk-based monitoring. MaxisIT’s Integrated Technology Platform is a purpose-built solution that helps the pharmaceutical and life sciences industry by “Empowering Business Stakeholders with Integrated Computing, and Self-service Dashboards in the strategically externalized enterprise environment with major focus on the core clinical operations data as well as clinical information assets; which allows improved control over externalized, CROs and partners driven, clinical ecosystem; and enable in-time decision support, continuous monitoring over regulatory compliance, and greater operational efficiency at a measurable rate”.
