Navigating Data Governance for AI in Healthcare: Control, Provenance, and Responsible Use
Artificial intelligence (AI) is rapidly transforming healthcare, offering significant potential for improved diagnostics, personalized treatment, and streamlined operations. Realizing this potential, however, depends on robust data governance. You need to be able to answer four questions with confidence: how data is used to train AI, how to track which data was used, how to control its use in clinical decisions, and how to identify AI-generated outputs. This article gives an overview of each of these considerations, drawing on existing standards and best practices to guide your association.
1. Controlling Data Access for AI Training: Permission & Consent
The question isn’t whether data can be used to train AI, but how you control which data may be used. You need mechanisms to authorize some data for AI training while protecting other sensitive information. This applies both at the dataset level (such as an entire Electronic Health Record, or EHR) and at the individual patient level.
Here’s how to approach it:
Dataset-Level Restrictions: Implement policies that define permissible data subsets for AI training. This might involve excluding specific data types (e.g., genetic information) or limiting access to de-identified datasets.
Patient-Specific Consent: Empower patients to control whether their data is included in AI training. This requires clear, granular consent mechanisms.
Layered Approach: Combine dataset-level policies with patient consent. A patient can always override a broader organizational permission, ensuring individual autonomy.
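The layered approach above can be sketched as a single eligibility check. This is a minimal illustration under stated assumptions, not a reference implementation: the `DatasetPolicy` and `Record` classes, their fields, and the `may_use_for_training` function are all hypothetical. In this sketch, dataset-level exclusions act as hard limits, while an explicit patient choice overrides the organization's default permission.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical data model: the field names and data-type labels are
# illustrative and not taken from any particular EHR or consent standard.
@dataclass
class DatasetPolicy:
    allow_ai_training: bool  # organization's default permission
    excluded_data_types: set[str] = field(default_factory=set)  # e.g. {"genetic"}
    require_deidentified: bool = False


@dataclass
class Record:
    patient_id: str
    data_type: str
    deidentified: bool


def may_use_for_training(record: Record, policy: DatasetPolicy,
                         patient_consent: dict[str, bool]) -> bool:
    """Return True if this record may be included in an AI training set."""
    # 1. Dataset-level restrictions are hard limits in this sketch.
    if record.data_type in policy.excluded_data_types:
        return False
    if policy.require_deidentified and not record.deidentified:
        return False
    # 2. An explicit patient choice, if recorded, overrides the default.
    choice: Optional[bool] = patient_consent.get(record.patient_id)
    if choice is not None:
        return choice
    # 3. Otherwise fall back to the organization's default permission.
    return policy.allow_ai_training


# Example: p2 has opted out; p3's genetic data is excluded by policy.
policy = DatasetPolicy(allow_ai_training=True, excluded_data_types={"genetic"})
consent = {"p2": False}
records = [Record("p1", "lab", True), Record("p2", "lab", True),
           Record("p3", "genetic", True)]
print([r.patient_id for r in records if may_use_for_training(r, policy, consent)])
# prints ['p1']
```

Checking exclusions before consent reflects one reading of the layered approach; an organization could instead decide that consent may only narrow, never widen, what its policy allows.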
2. Establishing AI Model Data Provenance: Knowing Your Roots
Once an AI model is built, it’s vital to maintain a detailed record of the data used to train it. This “data provenance” is crucial for accountability, auditing, and addressing potential biases or other concerns. If an issue arises, you need to be able to determine quickly whether it relates to the data used to train your AI.
Think of it as a complete audit trail. Key elements of data provenance include:
Specific Datasets: Identify exactly which datasets were used.
Data Versions: Track the version of the data used at the time of training.
Preprocessing Steps: Document any data cleaning, conversion, or feature engineering applied.
Training Parameters: Record the specific algorithms and parameters used during training.
This information allows you to understand the AI’s “lineage” and assess its reliability.
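A provenance record covering the elements listed above could be captured as structured data and stored alongside the model. The sketch below is illustrative only: the class and field names, the model name, and the sample bytes are hypothetical, and hashing each dataset with SHA-256 is one assumed way to pin an exact data version, not something the text prescribes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


def fingerprint(data: bytes) -> str:
    """SHA-256 digest that pins the exact bytes of a dataset version."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class DatasetRef:
    name: str     # which dataset
    version: str  # which version of it
    sha256: str   # content hash, so a silent change is detectable


@dataclass
class TrainingProvenance:
    model_name: str
    model_version: str
    datasets: list[DatasetRef]
    preprocessing_steps: list[str]  # cleaning, conversion, feature engineering
    algorithm: str
    parameters: dict[str, float] = field(default_factory=dict)

    def to_json(self) -> str:
        # asdict() also converts the nested DatasetRef objects.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Example with placeholder bytes standing in for an exported training extract.
extract = b"patient_id,hba1c\np1,6.1\n"
prov = TrainingProvenance(
    model_name="readmission-risk",
    model_version="1.0.0",
    datasets=[DatasetRef("ehr-extract", "2024-01", fingerprint(extract))],
    preprocessing_steps=["drop rows with missing values", "normalize units"],
    algorithm="logistic regression",
    parameters={"l2_penalty": 1.0},
)
print(prov.to_json())
```

Serializing to JSON keeps the record tool-neutral, so it can be archived with the model artifact and compared across training runs.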
3. Controlling Data Use in AI-Driven Clinical Decisions: Purpose of Use
How do you ensure patient data is used appropriately when an AI assists in clinical decision-making? The key is defining a clear “Purpose of Use.” This concept allows you to control data access based on why the AI is accessing it.
Here’s how it works:
PurposeOfUse Codes: Utilize standardized codes to categorize AI access:
PMTDS: AI aiding in payment decisions.
TREATDS: AI aiding in clinical treatment decisions.
Consent & Permissions: Integrate these PurposeOfUse codes into your consent management system and organizational permissions.
Hierarchy of Rules: If no rule is defined for a specific PurposeOfUse code, the broader “payment” or “treatment” permission applies.
Openness: Ensure both consent forms and organizational policies clearly articulate these rules, allowing patients to understand and potentially override them.
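The fallback rule and the patient override described above can be expressed as a small lookup. This is a sketch under stated assumptions: the parent mapping (`TREATDS` to `TREAT`, `PMTDS` to `HPAYMT`), the rule dictionaries, and the function names are illustrative, and denying access when no rule exists at any level is a design choice rather than something the text specifies.

```python
from typing import Optional

# ASSUMPTION for illustration: each decision-support code falls back to a
# broader parent purpose (TREAT = treatment, HPAYMT = healthcare payment).
# Confirm the actual hierarchy against the published HL7 terminology.
PARENT_PURPOSE: dict[str, str] = {
    "TREATDS": "TREAT",
    "PMTDS": "HPAYMT",
}


def resolve(purpose: str, rules: dict[str, bool]) -> Optional[bool]:
    """Return the most specific rule defined for purpose, or None if none applies."""
    code: Optional[str] = purpose
    while code is not None:
        if code in rules:
            return rules[code]
        code = PARENT_PURPOSE.get(code)  # step up to the broader purpose
    return None


def is_permitted(purpose: str, patient_rules: dict[str, bool],
                 org_rules: dict[str, bool]) -> bool:
    """Patient consent is consulted first; organizational policy second."""
    decision = resolve(purpose, patient_rules)
    if decision is None:
        decision = resolve(purpose, org_rules)
    return decision is True  # nothing defined anywhere: deny by default


org = {"TREAT": True, "HPAYMT": True, "PMTDS": False}
print(is_permitted("TREATDS", {}, org))                  # True, inherited from TREAT
print(is_permitted("PMTDS", {}, org))                    # False, specific rule
print(is_permitted("TREATDS", {"TREATDS": False}, org))  # False, patient override
```

Consulting the patient's rules before the organization's mirrors the idea that a patient can override broader permissions; the most specific rule within each layer wins.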
4. Identifying AI-Generated Data: Provenance for Outputs
When an AI produces a decision or recommendation, it’s essential to clearly mark that data as AI-generated within the EHR or other data systems. This “output provenance” prevents confusion and ensures clinicians understand the source of the information.
Here are several approaches:
Data Resource/Element Tagging: Add a tag directly to the data element indicating it originated from AI.
Security Tags: Utilize existing security tagging mechanisms to flag AI-generated data.
Full Provenance Records: Create detailed provenance records that include:
AI Model Version: Which version of the AI was used?
Model Details: What specific model was employed?
Input Data: What portion of the patient’s chart was used as input?
This tagging allows for easy identification of AI contributions and facilitates auditing and quality control.
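Tagging and a full provenance record can be combined: the output itself carries a tag for quick filtering, and a separate record answers the model, version, and input questions listed above. This is a minimal sketch with hypothetical names (`ChartEntry`, `OutputProvenance`, the `ai-generated` tag value). In a FHIR-based EHR the same ideas roughly correspond to `meta.tag` or `meta.security` on a resource plus a separate `Provenance` resource, but the code below uses plain Python classes, not any FHIR library.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical tag value; a real system would use a code from whatever
# tagging or security-labeling vocabulary it already supports.
AI_GENERATED_TAG = "ai-generated"


@dataclass
class ChartEntry:
    entry_id: str
    text: str
    tags: set[str] = field(default_factory=set)


@dataclass
class OutputProvenance:
    target_id: str               # the chart entry the AI produced
    model_name: str              # which model
    model_version: str           # which version of it
    input_entry_ids: list[str]   # which parts of the chart were used as input
    recorded_at: str             # UTC timestamp, ISO 8601


def record_ai_output(entry_id: str, text: str, model_name: str,
                     model_version: str, input_ids: list[str]
                     ) -> tuple[ChartEntry, OutputProvenance]:
    """Create a tagged chart entry plus a provenance record pointing at it."""
    entry = ChartEntry(entry_id, text, {AI_GENERATED_TAG})
    prov = OutputProvenance(
        target_id=entry_id,
        model_name=model_name,
        model_version=model_version,
        input_entry_ids=list(input_ids),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
    return entry, prov


def ai_generated(entries: list[ChartEntry]) -> list[ChartEntry]:
    """Select entries carrying the AI tag, e.g. for an audit report."""
    return [e for e in entries if AI_GENERATED_TAG in e.tags]


note = ChartEntry("n1", "Clinician note")
suggestion, prov = record_ai_output("s1", "Consider HbA1c recheck",
                                    "triage-model", "2.3", ["n1"])
print([e.entry_id for e in ai_generated([note, suggestion])])  # ['s1']
print(prov.model_version, prov.input_entry_ids)                # 2.3 ['n1']
```

Keeping the tag lightweight and the detailed record separate means everyday views can filter on the tag alone, while auditors follow `target_id` to the full record.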