专栏名称: 知识图谱科技
务实的人工智能布道者。跟踪介绍国内外前沿的认知智能技术(知识图谱,大语言模型GenAI)以及医药大健康、工业等行业落地案例,产品市场进展,创业商业化等
目录
相关文章推荐
德鲁克博雅管理  ·  卓有成效的管理者 ·  昨天  
田俊国讲坛  ·  【2月26日】第32期《10天非凡心力训练营 ... ·  2 天前  
重庆校园频道  ·  2025年全国中小学生英语作文征集活动正式启 ... ·  3 天前  
重庆校园频道  ·  2025年全国中小学生英语作文征集活动正式启 ... ·  3 天前  
51好读  ›  专栏  ›  知识图谱科技

医学GraphRAG案例研究:将医生记录转换为医学时序知识图谱

知识图谱科技  · 公众号  ·  · 2024-11-13 08:05

正文

摘要

该案例研究描述了如何将医生的病历转录转换为医学记录时序知识图谱,以进行更复杂的医疗数据分析和问题解答,并强调了WhyHow.AI的平台在该过程中的独特架构和优势。

Key Takeaways:

* 本研究使用合成医疗转录数据构建了一个时序医学记录知识图谱。

* 该知识图谱包含五种节点类型(病人、观察、免疫接种、病情和遭遇类型)和六种文本块类型,这些文本块包含与三元组相关的关键信息摘要。

* WhyHow.AI平台的独特之处在于其向量搜索、独立的三元组对象和JSON数据格式支持,这使得构建和查询时间知识图谱更加高效和准确。

* 该系统能够回答关于病人治疗、历史病历分析等复杂问题,并展示了其在多病人分析和跨多份转录本分析中的优势。

* 通过Claude等LLM工具,可以将非结构化文本数据转换为WhyHow.AI平台可接受的JSON格式,方便知识图谱构建。

* 与仅限向量的RAG系统相比,该方法在多病人分析和跨多份病历分析中具有显著优势。

*  整个过程耗时25个开发小时,其中大部分时间用于迭代完善知识图谱的模式。

正文

有兴趣将医生/患者的医疗记录和成绩单转换为时序知识图谱,以便您可以跨多个病史、时间段和患者提出复杂的问题吗?

在本案例研究中,我们展示了如何将医疗成绩单转换为您可以依赖用于 RAG 和分析目的的时序知识图谱。我们展示了针对这个系统的真实问答是什么,以及你可以通过这个系统实现什么样的商业结果。据我们所知,这里的步骤组合是一种相对较新的知识图谱实现。

使用的数据

出于数据隐私原因,我们使用了根据 Synthea 数据创建的医学成绩单合成数据集:https://synthea.mitre.org/downloads。以下是用作知识图谱创建输入数据的医学转录文本之一的示例。我们将这些转录数据与 Synthea 数据中的结构化医疗记录相结合。我们有 ~75 个转录本,涵盖 10 名患者(即每个患者有 5-10 个转录本)。以下是使用的成绩单示例:

新颖的知识图谱架构概述

节点:

我们有 5 种类型的节点:患者、观察、免疫、条件和遭遇类型

Triples 三元组 (样本列表):

患者 -> 遭遇 -> 遭遇

患者 -> 患有 -> 状况

患者 -> 接受 -> 免疫接种

患者 -> 进行测量 -> 观察

块:

块是作为独立对象的文本块。Chunk 与每个 Triple 相关联,并且可以有许多 Chunk 与单个 Triple 相关联。在这种情况下,Chunk 不是 Triple 的非结构化源,而是与每个 Triple 类型相关的摘要和关键点。因此,我们有 6 种类型的块:- 患者人口统计块、病情摘要块、访问块、观察块、免疫块和条件详细信息块。

与三元组关联的不同类型的 chunk 的示例如下所示:

1. Patient -> EncounterType
Triple: (Patient) -[had_encounter]-> (EncounterType)
- Chunk_ids link to specific visit instances
- Example Chunk: "Annual physical on 2024–01–15. BP 120/80, routine screenings
updated."


2. Patient -> Condition
Triple: (Patient) -[has_condition]-> (Condition)
- Chunk_ids link to condition episodes
- Example Chunk: "Diagnosed with hypertension on 2020–03–10. Status: active.
Managed with medication."


3. Patient -> Immunization
Triple: (Patient) -[received]-> (Immunization)
- Chunk_ids link to administration records
- Example Chunk: "Influenza vaccine administered on 2024–01–15."

4. Patient -> Observation
Triple: (Patient) -[has_measurement]-> (Observation)
- Chunk_ids link to measurement instances
- Example Chunk: "2024–01–15: Blood Pressure 120/80 mmHg, Weight 70kg."

链接到创建的图表:

https://main--whyhowai.netlify.app/public/graph/673032011997e08c8849316c

使用这种特定的图形架构,您可以将关键点和摘要与三元组相关联,然后您可以专注于通过非结构化搜索找到正确的三元组,然后以结构化的方式通过链接的块引入所有相关的关键信息。

WhyHow 图谱技术架构

WhyHow 图谱基础设施有一些独特之处,使我们能够以简单的方式构建此架构。

首先,通过向量搜索嵌入和检索 Triples,避免了必须使用 Text2Cypher 来识别节点、关系,然后构建 Cypher 查询才能找到正确的 Triple 的常见检索问题。事实证明,这可以显著提高检索准确性高达 3 倍。

其次,Triples 是 WhyHow 中的独立对象,你可以将 chunk 链接到它。这样,您就可以提取每个 Triple 要检索的关键信息,并在找到正确的 Triple 后将其直接引入上下文。这避免了必须以图形格式表示关键信息和上下文(使架构构建过程复杂化),并在初始非结构化向量搜索后以结构化方式引入信息。这在过程中类似于 LinkedIn 在其系统中应用知识图谱,其中“重现步骤”等关键信息以类似的方式表示和检索,而步骤本身则表示为单独的“块”/“节点”。

第三,WhyHow 接受 JSON 格式的数据,这允许任何提取框架之间直接无缝交互到图形创建中。在本例中,我们使用 Claude 将转录数据初始转换为必要的 JSON 结构,以加载到 WhyHow 中。如果你已经有 JSON 格式的信息,那么将数据加载到 WhyHow 中会容易得多。

第四,由于 Chunks 和检索过程在 WhyHow 系统中的设计方式,您可以轻松包含可用于控制答案构建方式的临时数据。时态数据在知识图谱中一直是一个很难建模的东西(以至于领先的 KG 专家通常建议不要这样做),但它显然是工作流程的重要组成部分。现有的方法甚至尝试对时态数据进行建模,都会尝试将其摄取到知识图谱本身中,然后根据结构化的 Cypher 查询进行检索,而我们的架构则独特地使用 LLM 来帮助筛选时态数据。

将 LLM 的强大功能与知识图谱等结构化知识表示相结合是实现业务成果的重要方式 ,我们认为这种时态知识图谱架构将有助于通过成功实施时态数据来释放大量商业价值。

使用的数据转换过程

首先,我们使用 Claude 将 transcript 信息转换为基于每个 transcript 的架构对齐信息集。除了来自结构化医疗记录的信息外,成绩单还转换为 JSON 摘要,如下所示:

PATIENT SUMMARY
Name: Joseph Crona
DOB: 2022–08–29
Age: 2 years
Gender: male
MRN: #dbfbaa

CURRENT MEASUREMENTS (as of 2024–08–05)
Height: 84.1cm (50th percentile)
Weight: 14.5kg (52nd percentile)
ALLERGIES
No known allergies

IMMUNIZATIONS
- DTaP: 2022–12–05, 2023–02–06, 2023–03–06, 2024–02–05
- Hepatitis A: 2023–11–06
- Hepatitis B: 2022–08–29, 2022–10–03, 2023–03–06
- Hib: 2022–12–05, 2023–02–06, 2023–11–06
- Influenza: 2023–03–06, 2024–08–05
- MMR: 2023–11–06
- PCV13: 2022–12–05, 2023–02–06, 2023–03–06, 2023–11–06
- Polio: 2022–12–05, 2023–02–06, 2023–03–06
- Rotavirus: 2022–12–05, 2023–02–06
- Varicella: 2023–11–06

MEDICAL HISTORY
- Viral sinusitis (disorder)
Onset: 2023–03–13
Status: resolved
Outcome: Resolved

GROWTH & DEVELOPMENT
- 2023–11–06: Body Weight: 12.7 kg
- 2024–02–05: Body Height: 79 cm
- 2024–02–05: Body Weight: 13.4 kg
- 2024–08–05: Body Height: 84.1 cm
- 2024–08–05: Body Weight: 14.5 kg
Development: Age-appropriate milestones met
- Gross motor: Age appropriate
- Fine motor: Age appropriate
- Language: Age appropriate
- Social: Age appropriate

PREVENTIVE CARE
Well-Child Visits:
- 2024–08–05: 2yo well visit - Development on track
- 2024–02–05: 1yo well visit - Development on track
- 2023–11–06: 1yo well visit - Development on track
- 2023–08–07: 1yo well visit - Development on track
- 2023–05–08: 9mo well visit - Age appropriate exam completed
- 2023–02–06: 6mo well visit - Age appropriate exam completed
- 2022–12–05: 4mo well visit - Age appropriate exam completed
- 2022–10–03: 2mo well visit - Age appropriate exam completed
- 2022–08–29: Newborn visit - Normal exam

FAMILY HISTORY
Mother: Healthy
Father: Healthy
Siblings: None documented

SOCIAL HISTORY
Living Situation: Lives with parents
Development: Meeting age-appropriate milestones
Sleep: Age-appropriate pattern
Nutrition: Age-appropriate diet

其次,我们将此 JSON 架构映射到 WhyHow 架构中,然后将所有信息导入到 WhyHow.AI KG Studio 中。

下面是最终加载到 WhyHow 中的 KG 结构的示例。

Knowledge Graph Structure (Timeless):


Nodes:
1. Patient Node
Structure: {
name: str, # "John Smith"
label: "Patient",
properties: {
gender: str, # FHIR gender
patient_type: str # "adult" | "pediatric"
},
chunk_ids: List[str] # Links to demographic chunks
}


2. EncounterType Node
Structure: {
name: str, # "Well-child visit" | "Annual physical"
label: "EncounterType",
properties: {
category: str, # "preventive" | "acute" | "chronic"
specialty: str # "primary_care" | "pediatrics" | "emergency"
},
chunk_ids: List[str] # Links to visit pattern chunks
}


3. Condition Node
Structure: {
name: str, # "Essential hypertension"
label: "Condition",
properties: {
category: str, # "chronic" | "acute" | "resolved"
system: str, # "respiratory" | "cardiovascular" | etc
is_primary: bool # True if primary diagnosis
},
chunk_ids: List[str] # Links to condition history chunks
}


4. Immunization Node
Structure: {
name: str, # "DTaP" | "MMR"
label: "Immunization",
properties: {
series: str, # "primary" | "booster"
target: str # "tetanus" | "measles" | etc
},
chunk_ids: List[str] # Links to immunization records
}


5. Observation Node
Structure: {
name: str, # "Blood Pressure" | "Height"
label: "Observation",
properties: {
category: str, # "vital" | "lab" | "growth"
unit: str # "mmHg" | "cm" | etc
},
chunk_ids: List[str] # Links to measurement records
}


Relations:
1. Patient -> EncounterType
Triple: (Patient) -[had_encounter]-> (EncounterType)
- Chunk_ids link to specific visit instances


2. Patient -> Condition
Triple: (Patient) -[has_condition]-> (Condition)
- Chunk_ids link to condition episodes


3. Patient -> Immunization
Triple: (Patient) -[received]-> (Immunization)
- Chunk_ids link to administration records


4. Patient -> Observation
Triple: (Patient) -[has_measurement]-> (Observation)
- Chunk_ids link to measurement instances


5. Condition -> EncounterType
Triple: (Condition) -[managed_in]-> (EncounterType)
- Links conditions to typical encounter types


6. Immunization -> EncounterType
Triple: (Immunization) -[given_during]-> (EncounterType)
- Links vaccines to visit types

然后,我们运行一个自定义提示,在每次自然语言查询后,将从知识图谱中检索到的三元组置于上下文中。

有了这个架构,一件有趣的事情是,我们现在可以轻松地继续将有关患者就诊、患者治疗和病情的信息添加到知识图谱中,因为只需向现有的三元组添加额外的块即可。如果 Patient 患有新疾病,则会将其他 Condition 节点添加到 Patient 节点。

此过程花费了 25 个开发小时,可分为以下几部分:

  • 2 小时 (8%) 用于查找和理解数据(探索性数据分析)

  • 18 小时 (72%) 用于迭代架构,并弄清楚图中应该包含哪些节点,哪些节点应该连接到什么,应该存在哪些块,它应该如何连接到各种三元组,用一组问题测试检索到的答案,并相应地迭代。

  • 2 小时 (8%) 用于编写代码以创建要加载的三元组集

  • 编写验证检查和输出检查花费了 3 小时 (12%) 来捕获任何错误

针对医疗记录知识图谱的问答

使用 WhyHow 自然语言图查询引擎从知识图谱中检索相关上下文后,用于构建答案的提示

    You are an AI assistant specializing in medical records analysis. 
Use the following information to answer the user's question.
The information is derived from a knowledge graph of patient medical records.

Relevant Nodes (these represent patients, encounters, and conditions):
{
node_context}

Relevant Relationships (these show connections between patients,
encounters, and conditions)
:
{
triple_context}

Relevant Medical Records:
{chunk_context}

Answer the question using the following steps:
1. Analyze the medical information, focusing on the specific topic of the
question
2. Identify key medical events, conditions, and temporal relationships
3. Formulate a clear and concise answer that directly addresses the
question
4. If there are any uncertainties in the medical information, mention
them explicitly
5. If you don't have enough information to answer confidently, say so
6. Consider temporal relationships between medical events
7. It's currently {datetime.now().strftime("%Y-%m-%d")}. Consider this
when interpreting dates
8. Maintain patient privacy by using only first names or initials when
necessary

Question: {question}

商业价值:患者治疗分析

历史患者分析

问题:Joseph Crona 在第一年接受了哪些类型的儿童健康检查和免疫接种?

答:

To determine the types of well-child visits and immunizations Joseph Crona 
received in his first year, we need to consider the relevant medical events
and their timing. Joseph was born in 2022, and it is currently November 9, 2024.
Therefore, his first year would span from 2022 to 2023.

**Well-Child Visits:**
- Joseph had an encounter categorized as a "Well child visit," which falls
under preventive care in pediatrics. This type of visit is typically part of
routine check-ups for children, especially in their first year.

**Immunizations:**
During his first year, Joseph received the following immunizations, all of
which are common for pediatric patients:
1. **Hep B, adolescent or pediatric** - Completed
2. **Hib (PRP-OMP)** - Completed
3. **Rotavirus, monovalent** - Completed
4. **IPV (Inactivated Poliovirus Vaccine)** - Completed
5. **DTaP (Diphtheria, Tetanus, and Pertussis)** - Completed
6. **Pneumococcal conjugate PCV 13** - Completed

These immunizations are typically administered during the first year of life
as part of the standard immunization schedule for infants.

In summary, Joseph Crona received a well-child visit and completed a series of
standard pediatric immunizations during his first year.

问题:对于中耳炎患者,记录的就诊和观察顺序是怎样的?

答:

To address the question regarding the sequence of encounters and observations 
for patients with otitis media, we will focus on the medical records of Miles
Dickinson, as he is the patient identified with this condition.

1. **Condition Onset:**
- Miles Dickinson was diagnosed with otitis media on June 10, 2024. The
condition is currently resolved.

2. **Encounters:**
- Miles had two types of encounters recorded:
- **Well Child Visit:** This is categorized as a preventive encounter in
pediatrics.
- **Encounter for Symptom:** This is categorized as a follow-up encounter in
primary care.

3. **Observations:**
- Various observations were recorded for Miles, including:
- **Pain Severity:** Recorded multiple times with scores ranging from 0 to 4.
- **Body Weight:** Recorded with values ranging from 3.2 kg to 8.2 kg.
- **Head Occipital-frontal Circumference and Percentile:** Recorded with
specific values in






请到「今天看啥」查看全文