Data Curation For AI Development In Healthcare

Artificial Intelligence (AI) refers to machines that imitate human intelligence to perform tasks and that can iteratively improve based on the data fed to them. Machine learning is a subset of AI in which data-driven algorithms enable software applications to predict outcomes accurately without being explicitly programmed. An AI algorithm is only as good as the data used to train its models. Hence, data must be complete, comprehensive, consistent, accurate, authentic and unique. As the volume of such data given to an AI system increases, it can learn new patterns, which generally improves its accuracy.

Artificial Intelligence and Predictive Analytics are redefining healthcare, making patient care better and more affordable. The world has witnessed how COVID-19 data pushed technology to develop Artificial Intelligence (AI) and Machine Learning (ML) models to screen for, diagnose and treat the disease. However, data utilisation does not stop at patient care; it continues to play a big role in drug development, healthcare operations, procurement, inventory, finance, human resources, staffing, insurance, support services and healthcare investments.

With the rapid growth of AI, digital health and technology innovations in healthcare, researchers have found that the business of healthcare data is projected to grow faster than that of any other industry. Data Readiness Condition (DATCON) is an index that assesses data management, utilisation and monetisation to determine the level of data-readiness in different industries. According to IDC's DATCON index in 2021, the healthcare data explosion will approach the 4 ZB level and exceed 10 ZB by 2025. Although healthcare and life science organisations currently manage on average 21 petabytes of data (i.e., 25 per cent less than the industry average), they also retain data almost 20 per cent longer for regulatory purposes. However, the question is whether this retained data can be accessed in real time and used directly for AI development in healthcare.

Data preparation for AI in the healthcare system primarily involves five major steps, the 5 Ds:

Data Acquisition

Data Anonymisation 

Data Curation

Data Storage

Data Training

An enormous amount of structured and unstructured healthcare data is generated from EHR systems, paper-based reports, insurance claims, lab reports, clinical trials, wearable devices, scientific literature, safety data and so on. However, these records cannot be directly integrated for the development and training of AI algorithms, because the stored data is currently heterogeneous, lacks standardisation and, most importantly, has not been anonymised, which is a critical step in preserving patient privacy. It is of utmost importance to comply with national or international regulations such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA) in protecting patient privacy.
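As a rough illustration of what the anonymisation step can involve, the Python sketch below drops direct identifiers from a record and replaces the patient ID with a keyed hash. The field names, the sample record and the secret key are hypothetical, and this is pseudonymisation only: on its own it would not make a dataset GDPR or HIPAA compliant.

```python
import hashlib
import hmac

# Hypothetical field names; a real EHR export will use different ones.
DIRECT_IDENTIFIERS = {"name", "phone", "address"}


def pseudonymise(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record without direct identifiers and with a keyed-hash patient ID."""
    # Keep every field that is not a direct identifier.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Replace the patient ID with an HMAC-SHA256 token so that records of the
    # same patient can still be linked without exposing the original ID.
    token = hmac.new(
        secret_key, str(record["patient_id"]).encode("utf-8"), hashlib.sha256
    ).hexdigest()
    cleaned["patient_id"] = token
    return cleaned


# Made-up sample record for illustration only.
record = {
    "patient_id": "P-001",
    "name": "Jane Doe",
    "phone": "555-0100",
    "address": "1 Main St",
    "diagnosis": "hypertension",
}
safe = pseudonymise(record, secret_key=b"replace-with-a-managed-secret")
print(safe)
```

Using a keyed hash rather than a plain hash means the mapping cannot be rebuilt by simply hashing a list of known patient IDs, provided the key is stored separately from the data.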

Data goes through multiple transformations throughout the AI development lifecycle. Data curation is the process of aggregating data from multiple sources and then standardising, cleaning, verifying, validating, extracting, integrating and converting it, while assuring its quality and usability. The important role of curation in AI and ML is to train the algorithm with relevant, error-free, high-quality data sets. Data curation makes data easier to access, removes duplication and bias, and accelerates AI model development. The main challenges in AI data curation are data accuracy and security. When working with large volumes of unstructured data, inaccurate data can creep into the system, thereby defeating the aim of AI development. In addition, with data leaks and hacking on the rise, data security is of utmost concern.
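A minimal Python sketch of three of these curation steps (standardisation, validation and de-duplication) is shown below. The record layout, the accepted date formats and the sample lab values are assumptions made up for illustration, not taken from any real system.

```python
from datetime import datetime

# Hypothetical date formats that different source systems might use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def standardise_date(raw):
    """Convert a date string to ISO format, or return None if it cannot be parsed."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def curate(records):
    """Standardise, validate and de-duplicate records; return (curated, rejected)."""
    curated, rejected, seen = [], [], set()
    for rec in records:
        visit_date = standardise_date(str(rec.get("visit_date") or ""))
        test = str(rec.get("test") or "").strip().lower()
        try:
            value = float(rec.get("value"))
        except (TypeError, ValueError):
            value = None
        # Validation: set aside records with missing or unparseable fields.
        if visit_date is None or not test or value is None:
            rejected.append(rec)
            continue
        # De-duplication on the standardised key.
        key = (rec.get("patient_id"), visit_date, test)
        if key in seen:
            continue
        seen.add(key)
        curated.append(
            {"patient_id": rec.get("patient_id"), "visit_date": visit_date,
             "test": test, "value": value}
        )
    return curated, rejected


# Made-up sample data: the second record duplicates the first in another
# format, and the third has an unusable date.
records = [
    {"patient_id": "P-001", "visit_date": "2021-03-05", "test": "HbA1c ", "value": "6.1"},
    {"patient_id": "P-001", "visit_date": "05/03/2021", "test": "hba1c", "value": "6.1"},
    {"patient_id": "P-002", "visit_date": "unknown", "test": "hba1c", "value": "7.4"},
]
curated, rejected = curate(records)
print(len(curated), "curated;", len(rejected), "rejected")
```

Rejected records are kept rather than silently dropped, so they can be reviewed and corrected at the source instead of quietly shrinking or biasing the training set.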

The need of the hour is a collaborative platform for real-time data collection, storage, curation, indexing and retrieval.

More than 80 per cent of the world's population has no access to reliable and affordable healthcare, and in India the overall doctor-to-population ratio is 1:1000. Innovation in AI and ML in healthcare would help bridge this gap. From drug development to delivering value-based care, there is a huge opportunity for Real-World Data (RWD) and Real-World Evidence (RWE) to improve patient outcomes and the healthcare ecosystem. The key to the digital revolution in healthcare lies in the quick adoption of digital technology at healthcare centres. In light of this, data regulations should favour and enable easy data sharing.

Many healthcare centres still use paper-based systems, and this data comes in different formats and has missing information. This is a huge limitation for AI-based predictive and cohort analysis. The bigger challenge lies in understanding digitisation, as the majority of healthcare centres are of the opinion that scanned paper-based reports stored as PDFs are digital data. Our healthcare systems are yet to scale up the integration of Electronic Health Record (EHR) systems at all stages of patient interaction and hospital administration.

Through the National Digital Health Mission (NDHM), the government has taken a major step towards the digitalisation of patient health records stored on a central database. However, healthcare data digitalisation is only the tip of the iceberg; the biggest battle that the healthcare industry is fighting is against unclear data privacy laws. For the past several years, the National Health Authority (NHA), policy makers, healthcare professionals and industry have been debating healthcare data privacy and the requirements of the Personal Data Protection Bill, under which healthcare data is considered sensitive personal data. In December 2020, the government went on record to say that the Personal Data Protection Bill would not be passed in its current form. Laws and policy should be enablers for development, yet at present, considering the disease burden that we carry, the struggle between the individual's right to privacy and the right to save lives looks never-ending. It is high time that regulatory bodies and healthcare stakeholders take a stand on this: a choice between privacy and progress.

Swetha Jonnalagadda

Guest Author. The author is a Marketing Consultant at Healthminds Consulting.
