专栏名称: GitHubStore
分享有意思的开源项目
目录
相关文章推荐
云南市场监管  ·  一次性筷子都是用二氧化硫漂白的?!还能用吗? ·  12 小时前  
烂板套利  ·  人形机器人,具有唯一性的7家公司 ·  16 小时前  
烂板套利  ·  人形机器人,具有唯一性的7家公司 ·  16 小时前  
51好读  ›  专栏  ›  GitHubStore

从任何文本中提取知识图谱的AI工具kg-gen

GitHubStore  · 公众号  ·  · 2025-02-20 09:36

正文

项目简介

欢迎! kg-gen 帮助您从任何纯文本中提取知识图谱,使用 AI。它可以处理小型和大型文本输入,还可以处理对话格式的消息。
为什么生成知识图谱? kg-gen 如果你想:


  • 创建一个图来辅助 RAG(检索增强生成)
  • 创建用于模型训练和测试的图合成数据
  • 将任何文本结构化为图
  • 分析源文本中概念之间的关系

我们通过 LiteLLM 支持基于 API 和本地模型提供商,包括 OpenAI、Ollama、Anthropic、Gemini、Deepseek 等,还使用 DSPy 进行结构化输出生成。
尝试通过运行 tests/ 中的脚本来试用。

运行我们的 KG 基准测试 MINE 的说明在 MINE/

阅读论文:KGGen:使用语言模型从纯文本中提取知识图谱
Quick

快速开始

安装模块:
pip install kg-gen

然后导入并使用 kg-gen 。您可以以两种格式之一提供您的文本输入:

A single string
消息对象列表(每个对象具有角色和内容)

以下是一些示例片段:
from kg_gen import KGGen
# Initialize KGGen with optional configurationkg = KGGen(  model="openai/gpt-4o",  # Default model  temperature=0.0,        # Default temperature  api_key="YOUR_API_KEY"  # Optional if set in environment)
# EXAMPLE 1: Single string with contexttext_input = "Linda is Josh's mother. Ben is Josh's brother. Andrew is Josh's father."graph_1 = kg.generate(  input_data=text_input,  context="Family relationships")# Output: # entities={'Linda''Ben''Andrew''Josh'# edges={'is brother of''is father of''is mother of'# relations={('Ben''is brother of''Josh'), #           ('Andrew''is father of''Josh'), #           ('Linda''is mother of''Josh')}
# EXAMPLE 2: Large text with chunking and clusteringwith open('large_text.txt''r'as f:  large_text = f.read()
# Example input text:"""# Neural networks are a type of machine learning model. Deep learning is a subset of machine learning# that uses multiple layers of neural networks. Supervised learning requires training data to learn# patterns. Machine learning is a type of AI technology that enables computers to learn from data.# AI, also known as artificial intelligence, is related to the broader field of artificial intelligence.# Neural nets (NN) are commonly used in ML applications. Machine learning (ML) has revolutionized# many fields of study.# ...# """
graph_2 = kg.generate(   input_data=large_text,  chunk_size=5000,  # Process text in chunks of 5000 chars  cluster=True      # Cluster similar entities and relations)# Output:# entities={'neural networks''deep learning''machine learning''AI''artificial intelligence'#          'supervised learning''unsupervised learning''training data', ...} # edges={'is type of''requires''is subset of''uses''is related to', ...} # relations={('neural networks''is type of''machine learning'),#           ('deep learning''is subset of''machine learning'),#           ('supervised learning''requires''training data'),#           ('machine learning''is type of''AI'),#           ('AI''is related to''artificial intelligence'), ...}# entity_clusters={#   'artificial intelligence': {'AI''artificial intelligence'},#   'machine learning': {'machine learning''ML'},#   'neural networks': {'neural networks''neural nets''NN'}#   ...# }# edge_clusters={#   'is type of': {'is type of''is a type of''is a kind of'},#   'is related to': {'is related to''is connected to''is associated with'#  ...}# }
# EXAMPLE 3: Messages arraymessages = [  {"role""user""content""What is the capital of France?"},   {"role""assistant""content""The capital of France is Paris."}]graph_3 = kg.generate(input_data=messages)# Output: # entities={'Paris''France'# edges={'has capital'# relations={('France''has capital''Paris')}
# EXAMPLE 4: Combining multiple graphstext1 = "Linda is Joe's mother. Ben is Joe's brother."
# Input text 2: also goes by Joe."text2 = "Andrew is Joseph's father. Judy is Andrew's sister. Joseph also goes by Joe."
graph4_a = kg.generate(input_data=text1)graph4_b = kg.generate(input_data=text2)
# Combine the graphscombined_graph = kg.aggregate([graph4_a, graph4_b])
# Optionally cluster the combined graphclustered_graph = kg.cluster(  combined_graph,  context="Family relationships")# Output:# entities={'Linda', 'Ben', 'Andrew', 'Joe', 'Joseph', 'Judy'} # edges={'is mother of', 'is father of', 'is brother of', 'is sister of'} # relations={('Linda', 'is mother of', 'Joe'),#           ('Ben', 'is brother of', 'Joe'),#           ('Andrew', 'is father of', 'Joe'),#           ('Judy', 'is sister of', 'Andrew')}# entity_clusters={#   'Joe': {'Joe', 'Joseph'},#   ...# }# edge_clusters={ ... }

功能

大文本分块
对于长文本,您可以指定一个 chunk_size 参数以将文本分块处理:

graph = kg.generate(  input_data=large_text,  chunk_size=5000  # Process in chunks of 5000 characters)

聚类相似实体和关系
您可以聚类相似实体和关系,无论是在生成过程中还是之后:

# During generationgraph = kg.generate(  input_data=text,  cluster=True,  context="Optional context to guide clustering")
# Or after generationclustered_graph = kg.cluster(  graph,  context="Optional context to guide clustering")

聚合多个图
您可以使用聚合方法组合多个图表:

graph1 = kg.generate(input_data=text1)graph2 = kg.generate(input_data=text2)combined_graph = kg.aggregate([graph1, graph2])

消息数组处理
处理消息数组时,kg-gen:

  1. 保留每条消息的角色信息
  2. 维护消息顺序和边界
  3. 能提取实体和关系:
  • 消息中提到的概念之间
  • 演讲者(角色)与概念之间
  • 在对话中的多条消息

例如,给定这个对话:

messages = [  {"role""user""content""What is the capital of France?"},  {"role""assistant""content""The capital of France is Paris."}]


生成的图形可能包括以下实体:

"user"
"assistant"
  • "France"
  • "Paris"


并且关系如下:

(user, "asks about", "France")
(assistant, "states", "Paris")
(Paris, "is capital of", "France")

API 参考


KGGen 类

构造函数参数


model







请到「今天看啥」查看全文