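Chroma DB is an open-source vector store for storing and retrieving vector embeddings. Its main use is to keep embeddings together with their metadata so that an LLM can use them later. It can also serve as a semantic search engine over text data.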
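Workflow:
Create a collection (similar to a table in a relational database).
Add text to the collection together with metadata and a unique ID.
Query the collection by text or by embedding. Results can also be filtered by metadata.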
Test data
employee_info = """ James Smith, a 32-year-old software engineer with 8 years of experience, is a member of the development and hiking clubs who enjoys cooking, photography, and playing guitar in his free time. He aspires to lead a tech team after gaining more expertise in his field at XYZ Corporation. """
department_info = """ The development department at XYZ Corporation is responsible for designing, implementing, and maintaining the software products of the company. It consists of skilled professionals ranging from software engineers to UX designers, working collaboratively to innovate and deliver high-quality solutions to clients' needs. """
company_info = """ XYZ Corporation, established in 2005 in Silicon Valley, is a leading software development company specializing in creating cutting-edge solutions for various industries. With over 1000 employees and offices in multiple countries, XYZ Corp is known for its commitment to excellence and innovation in the tech sector. """
# Update the document stored under "id1"
collection.update(
    ids=["id1"],
    documents=["Bob, a 28-year-old test engineer with 4 years of experience"],
    metadatas=[{"source": "employee info"}],
)
# Similarity search: return the two closest documents
results = collection.query(
    query_texts=["What is the employee's name?"],
    n_results=2
)
# The employee information in the result now refers to `Bob`
# {
#     "ids": [["id1", "id2"]],
#     "distances": [[1.4391101598739624, 1.6440917253494263]],
#     "metadatas": [[{"source": "employee info"}, {"source": "department info"}]],
#     "embeddings": None,
#     "documents": [
#         [
#             "Bob, a 28-year-old test engineer with 4 years of experience",
#             "The development department at XYZ Corporation is responsible for designing, implementing,\nand maintaining the software products of the company. It consists of skilled professionals\nranging from software engineers to UX designers, working collaboratively to innovate\nand deliver high-quality solutions to clients' needs.\n"
#         ]
#     ],
#     "uris": None,
#     "data": None
# }
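The `distances` values in the result are listed in ascending order, so a smaller number means a closer match. The pure-Python sketch below illustrates that ranking, assuming the collection uses squared L2 distance (Chroma's default, as far as I know). It is not Chroma's implementation: the helper names `squared_l2` and `rank_by_distance` and the toy 2-D vectors are made up for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_by_distance(query, embeddings, n_results):
    """Return the n_results (id, distance) pairs closest to the query."""
    scored = [(doc_id, squared_l2(query, vec)) for doc_id, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:n_results]


# Toy embeddings keyed by document ID (illustrative values only).
toy_embeddings = {"id1": [1.0, 0.0], "id2": [0.0, 1.0], "id3": [3.0, 3.0]}

ranked = rank_by_distance([1.0, 0.5], toy_embeddings, n_results=2)
print(ranked)
```

A real vector store typically replaces this exhaustive scan with an approximate nearest-neighbor index, but the ordering of the returned distances reads the same way.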