Retrieval-Augmented Generation (RAG) is a hybrid approach that combines information retrieval with generative models. By injecting external knowledge, it improves a language model's performance, making answers more accurate and factually grounded.
Basic RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/1_simple_rag.ipynb
In a simple RAG setup, we follow these steps:
- Data ingestion: load the raw text to be indexed.
- Chunking: split the data into smaller chunks to improve retrieval performance.
- Embedding creation: convert the chunks into numerical vector representations.
- Semantic search: retrieve the chunks most relevant to the user's query.
- Response generation: use a language model to produce an answer based on the retrieved text.
def chunk_text(text, n, overlap):
    chunks = []  # Initialize an empty list to store the chunks
    # Loop through the text with a step size of (n - overlap)
    for i in range(0, len(text), n - overlap):
        # Append a chunk of text from index i to i + n to the chunks list
        chunks.append(text[i:i + n])
    return chunks  # Return the list of text chunks
import os
from openai import OpenAI

# Initialize the OpenAI client with the base URL and API key
client = OpenAI(
    base_url="https://api.studio.nebius.com/v1/",
    api_key=os.getenv("OPENAI_API_KEY")  # Retrieve the API key from environment variables
)
# Define the system prompt for the AI assistant
system_prompt = "You are an AI assistant that strictly answers based on the given context. If the answer cannot be derived directly from the provided context, respond with: 'I do not have enough information to answer that.'"
def generate_response(system_prompt, user_message, model="meta-llama/Llama-3.2-3B-Instruct"):
    """
    Generates a response from the AI model based on the system prompt and user message.
    Args:
        system_prompt (str): The system prompt to guide the AI's behavior.
        user_message (str): The user's message or query.
        model (str): The model to be used for generating the response. Default is "meta-llama/Llama-3.2-3B-Instruct".
    Returns:
        dict: The response from the AI model.
    """
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ]
    )
    return response
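The remaining steps, embedding creation and semantic search, can be sketched as follows. This is a minimal sketch reusing the client above; the embedding model name is an assumption, not taken from the original text:

import numpy as np

def create_embeddings(text, model="BAAI/bge-en-icl"):  # model name is an assumption
    # Embed a string or a list of strings with the client configured above
    return client.embeddings.create(model=model, input=text)

def semantic_search(query, text_chunks, chunk_embeddings, k=3):
    # Embed the query, then score every chunk by cosine similarity
    q = np.array(create_embeddings(query).data[0].embedding)
    scores = []
    for i, emb in enumerate(chunk_embeddings):
        v = np.array(emb.embedding)
        scores.append((i, np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))))
    # Sort descending and return the k most similar chunks
    scores.sort(key=lambda x: x[1], reverse=True)
    return [text_chunks[i] for i, _ in scores[:k]]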
Semantic Chunking
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/2_semantic_chunking.ipynb
Unlike traditional fixed-length chunking, semantic chunking determines chunk boundaries from the semantic similarity between sentences.
It computes similarities between sentence embeddings and splits the text into separate chunks wherever the similarity between sentences drops below a threshold; a sliding window can be used to measure this sentence-to-sentence relatedness, for example.
import numpy as np

def compute_breakpoints(similarities, method="percentile", threshold=90):
    # Determine the threshold value based on the selected method
    if method == "percentile":
        # Calculate the Xth percentile of the similarity scores
        threshold_value = np.percentile(similarities, threshold)
    elif method == "standard_deviation":
        # Calculate the mean and standard deviation of the similarity scores
        mean = np.mean(similarities)
        std_dev = np.std(similarities)
        # Set the threshold value to mean minus X standard deviations
        threshold_value = mean - (threshold * std_dev)
    elif method == "interquartile":
        # Calculate the first and third quartiles (Q1 and Q3)
        q1, q3 = np.percentile(similarities, [25, 75])
        # Set the threshold value using the IQR rule for outliers
        threshold_value = q1 - 1.5 * (q3 - q1)
    else:
        # Raise an error if an invalid method is provided
        raise ValueError("Invalid method. Choose 'percentile', 'standard_deviation', or 'interquartile'.")
    # Identify indices where similarity drops below the threshold value
    return [i for i, sim in enumerate(similarities) if sim < threshold_value]
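Once sentence embeddings and adjacent-sentence similarities exist, the breakpoints can be turned into chunks. A minimal sketch (the sentence splitting and similarity layout are assumptions):

def split_into_chunks(sentences, breakpoints):
    # Group sentences between consecutive breakpoints into chunks
    chunks, start = [], 0
    for bp in breakpoints:
        # Breakpoint i means similarity between sentence i and i+1 dropped,
        # so the chunk boundary falls after sentence i
        chunks.append(" ".join(sentences[start:bp + 1]))
        start = bp + 1
    if start < len(sentences):
        chunks.append(" ".join(sentences[start:]))  # Remaining sentences
    return chunks

# similarities[i] compares the embeddings of sentence i and sentence i+1, e.g.:
# similarities = [cosine_similarity(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
# chunks = split_into_chunks(sentences, compute_breakpoints(similarities, "percentile", 90))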
Evaluating Chunk Size
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/3_chunk_size_selector.ipynb
- Evaluate faithfulness and relevancy: judge whether the generated answer faithfully reflects the content of the retrieved chunks and whether it is relevant to the user's query.
- Compare results across chunk sizes: contrast retrieval and generation quality under different chunk sizes to determine the best fit.
# Define strict evaluation prompt templates
FAITHFULNESS_PROMPT_TEMPLATE = """
Evaluate the faithfulness of the AI response compared to the true answer.
User Query: {question}
AI Response: {response}
True Answer: {true_answer}
Faithfulness measures how well the AI response aligns with facts in the true answer, without hallucinations.
INSTRUCTIONS:
- Score STRICTLY using only these values:
* {full} = Completely faithful, no contradictions with true answer
* {partial} = Partially faithful, minor contradictions
* {none} = Not faithful, major contradictions or hallucinations
- Return ONLY the numerical score ({full}, {partial}, or {none}) with no explanation or additional text.
"""
Context-Enriched Retrieval
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/4_context_enriched_rag.ipynb
- Chunking with overlapping context: split the text into chunks with overlapping context to preserve semantic coherence.
- Context-aware retrieval: retrieve each relevant chunk together with its neighboring chunks to make answers more complete.
import numpy as np

def cosine_similarity(vec1, vec2):
    # Helper assumed as in the notebook: cosine similarity between two vectors
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def context_enriched_search(query, text_chunks, embeddings, k=1, context_size=1):
    """
    Retrieves the most relevant chunk along with its neighboring chunks.
    """
    # Convert the query into an embedding vector
    query_embedding = create_embeddings(query).data[0].embedding
    similarity_scores = []
    # Compute similarity scores between query and each text chunk embedding
    for i, chunk_embedding in enumerate(embeddings):
        # Calculate cosine similarity between the query embedding and current chunk embedding
        similarity_score = cosine_similarity(np.array(query_embedding), np.array(chunk_embedding.embedding))
        # Store the index and similarity score as a tuple
        similarity_scores.append((i, similarity_score))
    # Sort chunks by similarity score in descending order (highest similarity first)
    similarity_scores.sort(key=lambda x: x[1], reverse=True)
    # Get the index of the most relevant chunk
    top_index = similarity_scores[0][0]
    # Define the range for context inclusion
    # Ensure we don't go below 0 or beyond the length of text_chunks
    start = max(0, top_index - context_size)
    end = min(len(text_chunks), top_index + context_size + 1)
    # Return the relevant chunk along with its neighboring context chunks
    return [text_chunks[i] for i in range(start, end)]
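A minimal usage sketch (text_chunks and embeddings assumed built as in the Basic RAG section):

# Retrieve the best-matching chunk plus one neighbor on each side
query = "What is semantic chunking?"
context_chunks = context_enriched_search(query, text_chunks, embeddings, context_size=1)
context = "\n\n".join(context_chunks)  # Feed this to generate_response as context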
Contextual Chunk Headers (CCH)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/5_contextual_chunk_headers_rag.ipynb
CCH prepends high-level context (such as the document title or section heading) to each chunk before the chunk is embedded. This markedly improves retrieval quality and helps avoid answers that ignore the surrounding context.
- Add contextual headers: before chunking the text, extract high-level context from the document, such as its title, section names, or subheadings.
- Build context-enriched chunks: prepend the extracted context to each chunk as a "header", producing context-enriched chunks.
def generate_chunk_header(chunk, model="meta-llama/Llama-3.2-3B-Instruct"):
    """
    Generates a title/header for a given text chunk using an LLM.
    """
    # Define the system prompt to guide the AI's behavior
    system_prompt = "Generate a concise and informative title for the given text."
    # Generate a response from the AI model based on the system prompt and text chunk
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": chunk}
        ]
    )
    # Return the generated header/title, stripping any leading/trailing whitespace
    return response.choices[0].message.content.strip()
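A sketch of how the headers become context-enriched chunks, reusing chunk_text from the Basic RAG section (the header/text layout is an assumption):

def chunk_text_with_headers(text, n=1000, overlap=200):
    # Prepend an LLM-generated header to every chunk before embedding
    enriched = []
    for chunk in chunk_text(text, n, overlap):
        header = generate_chunk_header(chunk)
        enriched.append({"header": header, "text": f"{header}\n\n{chunk}"})
    return enriched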
Document Augmentation (Question Generation)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/6_doc_augmentation_rag.ipynb
By generating relevant questions for each text chunk, we improve the retrieval step, which in turn lets the language model produce better, more accurate answers.
import re

def generate_questions(text_chunk, num_questions=5, model="meta-llama/Llama-3.2-3B-Instruct"):
    """
    Generates relevant questions that can be answered from the given text chunk.
    Args:
        text_chunk (str): The text chunk to generate questions from.
        num_questions (int): Number of questions to generate.
        model (str): The model to use for question generation.
    """
    # Prompts assumed here; the original snippet used these names without defining them
    system_prompt = "You are an expert at generating relevant questions from text. Create concise questions that can be answered using only the provided text."
    user_prompt = f"""
    Based on the following text, generate {num_questions} different questions that can be answered using only this text:

    {text_chunk}

    Format your response as a numbered list of questions only.
    """
    # Generate questions using the OpenAI API
    response = client.chat.completions.create(
        model=model,
        temperature=0.7,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
    # Extract and clean questions from the response
    questions_text = response.choices[0].message.content.strip()
    questions = []
    # Extract questions using regex pattern matching
    for line in questions_text.split('\n'):
        # Remove numbering and clean up whitespace
        cleaned_line = re.sub(r'^\d+\.\s*', '', line.strip())
        if cleaned_line and cleaned_line.endswith('?'):
            questions.append(cleaned_line)
    return questions
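A sketch of how the generated questions can be indexed alongside the chunks themselves (the store layout is an assumption):

def augment_and_index(text_chunks):
    # Index each chunk once per generated question, plus the chunk itself
    store = []  # list of {"text": indexable text, "chunk": source chunk}
    for chunk in text_chunks:
        store.append({"text": chunk, "chunk": chunk})
        for question in generate_questions(chunk):
            # A question that the chunk answers becomes an extra retrieval key
            store.append({"text": question, "chunk": chunk})
    return store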
Query Transformations
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/7_query_transform.ipynb
- Query Rewriting: analyze the intent behind the user's query and add more specific keywords or constraints.
- Step-back Prompting: broaden the query's scope to pull in wider background information.
- Sub-query Decomposition: split a complex query into several simpler ones, retrieve for each, then merge the results (a decomposition sketch follows the code below).
def rewrite_query(original_query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Define the system prompt to guide the AI assistant's behavior
    system_prompt = "You are an AI assistant specialized in improving search queries. Your task is to rewrite user queries to be more specific, detailed, and likely to retrieve relevant information."
    # Define the user prompt with the original query to be rewritten
    user_prompt = f"""
    Rewrite the following query to make it more specific and detailed. Include relevant terms and concepts that might help in retrieving accurate information.
    Original query: {original_query}
    Rewritten query:
    """
    # Generate the rewritten query using the specified model
    response = client.chat.completions.create(
        model=model,
        temperature=0.0,  # Low temperature for deterministic output
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
    # Return the rewritten query, stripping any leading/trailing whitespace
    return response.choices[0].message.content.strip()
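Sub-query decomposition follows the same pattern; a minimal sketch (the prompt wording is an assumption):

def decompose_query(original_query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the model to split a complex query into simpler sub-queries
    response = client.chat.completions.create(
        model=model,
        temperature=0.0,
        messages=[
            {"role": "system", "content": "You decompose complex questions into simpler sub-questions, one per line."},
            {"role": "user", "content": f"Break this query into 2-4 simpler sub-queries, one per line:\n{original_query}"}
        ]
    )
    lines = response.choices[0].message.content.strip().split("\n")
    return [l.strip("- ").strip() for l in lines if l.strip()]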
Reranking (Reranker)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/8_reranker.ipynb
Each document or chunk from the initial retrieval is scored for relevance against the user's query. The most relevant items are then moved to the front, so the highest-quality content is considered first during answer generation.
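A minimal sketch of LLM-based reranking (the scoring prompt is an assumption; a dedicated cross-encoder could be used instead):

def rerank_chunks(query, chunks, model="meta-llama/Llama-3.2-3B-Instruct", top_n=3):
    # Ask the model to rate each chunk's relevance, then sort by that score
    scored = []
    for chunk in chunks:
        response = client.chat.completions.create(
            model=model,
            temperature=0,
            messages=[
                {"role": "system", "content": "Rate how relevant the document is to the query on a 0-10 scale. Return ONLY the number."},
                {"role": "user", "content": f"Query: {query}\n\nDocument: {chunk}"}
            ]
        )
        try:
            score = float(response.choices[0].message.content.strip())
        except ValueError:
            score = 0.0  # Fall back if the model returns non-numeric text
        scored.append((score, chunk))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [chunk for _, chunk in scored[:top_n]]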
Relevant Segment Extraction (RSE)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/9_rse.ipynb
RSE identifies the segments within retrieved chunks that are highly relevant to the user's query and stitches those segments back together into contiguous passages of text.
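A sketch of the reconstruction idea: assign each chunk a value, then find the contiguous run of chunks with the highest total value (the scoring and irrelevance penalty are assumptions):

def best_segment(chunk_values, max_len=10):
    # chunk_values: per-chunk relevance minus a penalty for irrelevant chunks,
    # so a run of relevant chunks is worth extending and noise breaks it up
    best, best_range = float("-inf"), (0, 0)
    for start in range(len(chunk_values)):
        total = 0.0
        for end in range(start, min(start + max_len, len(chunk_values))):
            total += chunk_values[end]
            if total > best:
                best, best_range = total, (start, end + 1)
    return best_range  # Contiguous [start, end) span to stitch back together

# e.g. values = [sim - 0.2 for sim in chunk_similarities]  # 0.2 penalty assumed
# start, end = best_segment(values); segment = " ".join(text_chunks[start:end])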
Contextual Compression
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/10_contextual_compression.ipynb
When retrieving documents in RAG, we often get chunks that mix relevant and irrelevant information. Contextual compression filters each retrieved chunk down to only the content that matters for the query, cutting noise and saving context-window space.
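A sketch of LLM-driven compression of a single retrieved chunk (the prompt is an assumption):

def compress_chunk(chunk, query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Keep only the sentences in the chunk that help answer the query
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": "Extract ONLY the sentences relevant to the query, verbatim. If nothing is relevant, return an empty string."},
            {"role": "user", "content": f"Query: {query}\n\nChunk:\n{chunk}"}
        ]
    )
    return response.choices[0].message.content.strip()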
Feedback Loop
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/11_feedback_loop_rag.ipynb
A traditional RAG system is static: it retrieves information based purely on embedding similarity. Adding a feedback loop turns it into a dynamic system that can:
- Remember what works (and what does not): by recording user ratings of its answers, the system learns which retrieval results and responses are high quality and which need improvement.
- Adjust document relevance scores: based on user feedback, the system dynamically tunes the relevance scores of documents or chunks so future retrievals reflect user needs more precisely (see the sketch after this list).
- Fold successful Q&A pairs into the knowledge base: Q&A pairs the user approves of are added to the knowledge base as new knowledge for direct use in later interactions.
- Get smarter with every interaction: by continually learning user preferences and needs, the system progressively refines its retrieval and generation strategy to deliver better answers.
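A sketch of feedback-adjusted relevance scoring (the storage format and adjustment formula are assumptions):

import json

def load_feedback(path="feedback.json"):
    try:
        with open(path) as f:
            return json.load(f)  # list of {"query": ..., "chunk_id": ..., "rating": -1 or +1}
    except FileNotFoundError:
        return []

def adjusted_score(base_similarity, chunk_id, feedback, weight=0.1):
    # Nudge the raw similarity up or down based on accumulated user ratings
    ratings = [f["rating"] for f in feedback if f["chunk_id"] == chunk_id]
    return base_similarity + weight * sum(ratings)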
Adaptive Retrieval
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/12_adaptive_rag.ipynb
Different questions call for different retrieval strategies. Our system adapts in two steps (a classification sketch follows this list):
- Identify the type of the user's query: factual, analytical, opinion-based, or contextual.
- Dispatch the query to the retrieval strategy suited to that type.
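A sketch of the classification-and-dispatch step; the category prompt is an assumption, and the four strategy helpers are hypothetical placeholders:

def classify_query(query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the model to label the query with one of the four categories
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": "Classify the query as exactly one of: Factual, Analytical, Opinion, Contextual. Return ONLY the label."},
            {"role": "user", "content": query}
        ]
    )
    return response.choices[0].message.content.strip()

def adaptive_retrieve(query, text_chunks, chunk_embeddings):
    # Map each category to a strategy; the four helpers below are hypothetical
    strategies = {
        "Factual": factual_retrieval,        # precise, high-threshold search
        "Analytical": analytical_retrieval,  # broad, multi-aspect coverage
        "Opinion": opinion_retrieval,        # deliberately diverse viewpoints
        "Contextual": contextual_retrieval,  # folds in user context
    }
    strategy = strategies.get(classify_query(query), factual_retrieval)
    return strategy(query, text_chunks, chunk_embeddings)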
Self-RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/13_self_rag.ipynb
Self-RAG adds dynamic decision-making to both retrieval and generation: the system decides, per request, whether and how to use retrieved information. This lets it produce higher-quality, more reliable answers.
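A sketch of the first decision point, whether retrieval is needed at all (the prompt is an assumption; the full pipeline also grades relevance and support of the retrieved chunks):

def needs_retrieval(query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Let the model decide whether external context is required
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": "Answer ONLY 'Yes' if answering the query requires looking up external documents, otherwise 'No'."},
            {"role": "user", "content": query}
        ]
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")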
Proposition Chunking
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/14_proposition_chunking.ipynb
Proposition chunking breaks documents down into atomic, self-contained factual statements, enabling more precise retrieval. Unlike simple character-count chunking, it preserves the semantic integrity of each individual fact.
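A sketch of generating propositions with an LLM (the prompt wording is an assumption):

def generate_propositions(chunk, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Decompose a chunk into atomic, self-contained factual statements
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": "Rewrite the text as a list of atomic, self-contained factual statements, one per line. Resolve all pronouns."},
            {"role": "user", "content": chunk}
        ]
    )
    lines = response.choices[0].message.content.strip().split("\n")
    return [l.strip("- ").strip() for l in lines if l.strip()]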
Multi-Modal RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/15_multimodel_rag.ipynb
A Multi-Modal RAG system extracts both text and images from documents, generates descriptive captions for the images, and draws on both content types to answer user queries. By folding visual information into the knowledge base, it significantly extends what a traditional RAG system can do.
Traditional RAG systems handle only text, yet much of the key information in many documents lives in images, charts, and tables. Generating descriptions for these visual elements and adding them to the retrieval system lets us:
- Capture information from charts and diagrams: unlock information hidden in images and figures so that it can be retrieved and used.
- Understand tables and charts: use image captions to interpret the tables and charts that complement the textual content.
- Build a more comprehensive knowledge base: combine visual and textual information into a richer, more complete knowledge base.
- Answer questions that depend on visual data: handle queries that can only be resolved with visual information (a captioning sketch follows this list).
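A sketch of the captioning step, assuming the endpoint serves a vision-capable model (the model name here is an assumption):

import base64

def caption_image(image_path, model="llava-hf/llava-1.5-7b-hf"):
    # Encode the image and ask a vision model for a descriptive caption
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model=model,  # must be a vision-capable model on this endpoint
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail, including any text, data, or chart content."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}
            ]
        }]
    )
    return response.choices[0].message.content.strip()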
Fusion Retrieval
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/16_fusion_rag.ipynb
A fusion retrieval system combines the strengths of semantic vector search and keyword-based BM25 retrieval. By capturing both conceptual similarity and exact keyword matches, it markedly improves retrieval quality.
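A sketch of score fusion with a blending weight alpha (assumes the rank_bm25 package and the create_embeddings helper sketched in the Basic RAG section):

import numpy as np
from rank_bm25 import BM25Okapi  # assumed installed: pip install rank-bm25

def fusion_search(query, text_chunks, chunk_vecs, alpha=0.5, k=3):
    # BM25 keyword scores over whitespace-tokenized chunks
    bm25 = BM25Okapi([c.split() for c in text_chunks])
    bm25_scores = np.array(bm25.get_scores(query.split()))
    # Cosine similarity scores against precomputed chunk vectors
    q = np.array(create_embeddings(query).data[0].embedding)
    vec_scores = np.array([np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
                           for v in chunk_vecs])
    # Min-max normalize both score sets so they are comparable, then blend
    def norm(s):
        return (s - s.min()) / (s.max() - s.min() + 1e-9)
    combined = alpha * norm(vec_scores) + (1 - alpha) * norm(bm25_scores)
    return [text_chunks[i] for i in np.argsort(combined)[::-1][:k]]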
Graph RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/17_graph_rag.ipynb
Compared with a traditional RAG system, Graph RAG organizes knowledge as a connected graph rather than a flat collection of documents. This lets the system navigate between related concepts and retrieve information that is more contextually relevant than what standard vector-similarity methods return.
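A sketch of graph-based expansion, assuming networkx and a concept graph whose nodes carry chunk text (concept/edge extraction is left out):

import networkx as nx

def graph_retrieve(seed_nodes, graph, hops=1):
    # Start from the nodes matched by vector search, then walk the graph
    visited = set(seed_nodes)
    frontier = set(seed_nodes)
    for _ in range(hops):
        # Expand one hop: pull in neighbors connected by concept edges
        frontier = {nbr for node in frontier for nbr in graph.neighbors(node)} - visited
        visited |= frontier
    return [graph.nodes[n]["text"] for n in visited]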
Hierarchical Indices
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/18_hierarchy_rag.ipynb
Hierarchical indices improve retrieval efficiency and quality with a two-level strategy: first use summaries to identify the relevant sections of a document, then retrieve the specific details from within those sections.
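A sketch of the two-level strategy (assumes per-section summaries and embeddings were built at index time, and reuses the create_embeddings helper sketched earlier):

import numpy as np

def hierarchical_search(query, summaries, summary_vecs, sections, k_sections=2, k_chunks=3):
    q = np.array(create_embeddings(query).data[0].embedding)
    def top_k(texts, vecs, k):
        # Rank texts by cosine similarity between the query and their vectors
        sims = [np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)) for v in vecs]
        order = np.argsort(sims)[::-1][:k]
        return [texts[i] for i in order]
    # Level 1: pick the most relevant section summaries
    results = []
    for summary in top_k(summaries, summary_vecs, k_sections):
        # Level 2: search detailed chunks only within the chosen sections
        chunk_texts, chunk_vecs = sections[summary]
        results.extend(top_k(chunk_texts, chunk_vecs, k_chunks))
    return results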
Hypothetical Document Embedding (HyDE)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/19_HyDE_rag.ipynb
HyDE converts the user's query into a hypothetical document that answers it and retrieves with that document instead, bridging the semantic gap between short queries and long documents.
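A sketch reusing semantic_search from the Basic RAG sketch (the generation prompt is an assumption):

def hyde_search(query, text_chunks, chunk_embeddings, k=3, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Step 1: generate a hypothetical document that would answer the query
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": "Write a short passage that plausibly answers the question, as if quoted from a reference document."},
            {"role": "user", "content": query}
        ]
    )
    hypothetical_doc = response.choices[0].message.content.strip()
    # Step 2: retrieve with the hypothetical document instead of the raw query
    return semantic_search(hypothetical_doc, text_chunks, chunk_embeddings, k=k)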
Corrective RAG (CRAG)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/20_crag.ipynb
Corrective RAG (CRAG) dynamically evaluates the retrieved information and, when necessary, falls back to web search to correct the retrieval step, significantly improving accuracy and reliability.
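A sketch of the evaluate-then-correct flow, reusing semantic_search from the Basic RAG sketch; relevance_score and web_search are hypothetical helpers (e.g. an LLM grader like the reranker above and a search-API wrapper):

def corrective_rag_retrieve(query, text_chunks, chunk_embeddings, threshold=0.7):
    # Retrieve as usual, then grade the results before trusting them
    chunks = semantic_search(query, text_chunks, chunk_embeddings, k=3)
    graded = [c for c in chunks if relevance_score(query, c) >= threshold]
    if graded:
        return graded  # Retrieval looks good: use the graded chunks
    # Otherwise fall back to web search (web_search is a hypothetical helper)
    return web_search(query)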