Retrieval-Augmented Generation (RAG) is a hybrid approach that combines information retrieval with generative models. By injecting external knowledge, it improves a language model's performance, making answers more accurate and factually grounded.
Basic RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/1_simple_rag.ipynb
In a simple RAG setup, we follow these steps:
- Data ingestion: load the raw text to be indexed.
- Chunking: split the data into smaller chunks to improve retrieval performance.
- Embedding creation: convert the chunks into numerical vector representations.
- Semantic search: retrieve the chunks most relevant to the user's query.
- Response generation: use a language model to produce an answer based on the retrieved text.
def chunk_text(text, n, overlap):
    chunks = []  # Initialize an empty list to store the chunks
    # Loop through the text with a step size of (n - overlap)
    for i in range(0, len(text), n - overlap):
        # Append a chunk of text from index i to i + n to the chunks list
        chunks.append(text[i:i + n])
    return chunks  # Return the list of text chunks
import os
from openai import OpenAI

# Initialize the OpenAI client with the base URL and API key
client = OpenAI(
    base_url="https://api.studio.nebius.com/v1/",
    api_key=os.getenv("OPENAI_API_KEY")  # Retrieve the API key from environment variables
)
# Define the system prompt for the AI assistant
system_prompt = "You are an AI assistant that strictly answers based on the given context. If the answer cannot be derived directly from the provided context, respond with: 'I do not have enough information to answer that.'"
def generate_response(system_prompt, user_message, model="meta-llama/Llama-3.2-3B-Instruct"):
    """
    Generates a response from the AI model based on the system prompt and user message.
    Args:
        system_prompt (str): The system prompt to guide the AI's behavior.
        user_message (str): The user's message or query.
        model (str): The model to be used for generating the response. Default is "meta-llama/Llama-3.2-3B-Instruct".
    Returns:
        dict: The response from the AI model.
    """
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ]
    )
    return response
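The remaining steps, embedding creation and semantic search, can be sketched as follows. This is a minimal sketch reusing the client above; the embedding model name is an assumption, not taken from the original text:

import numpy as np

def create_embeddings(text, model="BAAI/bge-en-icl"):  # model name is an assumption
    # Embed a string or a list of strings with the client configured above
    return client.embeddings.create(model=model, input=text)

def semantic_search(query, text_chunks, chunk_embeddings, k=3):
    # Embed the query, then score every chunk by cosine similarity
    q = np.array(create_embeddings(query).data[0].embedding)
    scores = []
    for i, emb in enumerate(chunk_embeddings):
        v = np.array(emb.embedding)
        scores.append((i, np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))))
    # Sort descending and return the k most similar chunks
    scores.sort(key=lambda x: x[1], reverse=True)
    return [text_chunks[i] for i, _ in scores[:k]]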
Semantic Chunking
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/2_semantic_chunking.ipynb
Unlike traditional fixed-length chunking, semantic chunking determines chunk boundaries from the semantic similarity between sentences.
It computes similarities between sentence embeddings and splits the text into separate chunks wherever the similarity between sentences drops below a threshold; a sliding window can be used to measure this sentence-to-sentence relatedness, for example.
import numpy as np

def compute_breakpoints(similarities, method="percentile", threshold=90):
    # Determine the threshold value based on the selected method
    if method == "percentile":
        # Calculate the Xth percentile of the similarity scores
        threshold_value = np.percentile(similarities, threshold)
    elif method == "standard_deviation":
        # Calculate the mean and standard deviation of the similarity scores
        mean = np.mean(similarities)
        std_dev = np.std(similarities)
        # Set the threshold value to mean minus X standard deviations
        threshold_value = mean - (threshold * std_dev)
    elif method == "interquartile":
        # Calculate the first and third quartiles (Q1 and Q3)
        q1, q3 = np.percentile(similarities, [25, 75])
        # Set the threshold value using the IQR rule for outliers
        threshold_value = q1 - 1.5 * (q3 - q1)
    else:
        # Raise an error if an invalid method is provided
        raise ValueError("Invalid method. Choose 'percentile', 'standard_deviation', or 'interquartile'.")
    # Identify indices where similarity drops below the threshold value
    return [i for i, sim in enumerate(similarities) if sim < threshold_value]
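Once sentence embeddings and adjacent-sentence similarities exist, the breakpoints can be turned into chunks. A minimal sketch (the sentence splitting and similarity layout are assumptions):

def split_into_chunks(sentences, breakpoints):
    # Group sentences between consecutive breakpoints into chunks
    chunks, start = [], 0
    for bp in breakpoints:
        # Breakpoint i means similarity between sentence i and i+1 dropped,
        # so the chunk boundary falls after sentence i
        chunks.append(" ".join(sentences[start:bp + 1]))
        start = bp + 1
    if start < len(sentences):
        chunks.append(" ".join(sentences[start:]))  # Remaining sentences
    return chunks

# similarities[i] compares the embeddings of sentence i and sentence i+1, e.g.:
# similarities = [cosine_similarity(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
# chunks = split_into_chunks(sentences, compute_breakpoints(similarities, "percentile", 90))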
Evaluating Chunk Size
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/3_chunk_size_selector.ipynb
- Evaluate faithfulness and relevancy: judge whether the generated answer faithfully reflects the content of the retrieved chunks and whether it is relevant to the user's query.
- Compare results across chunk sizes: contrast retrieval and generation quality under different chunk sizes to determine the best fit.
# Define strict evaluation prompt templates
FAITHFULNESS_PROMPT_TEMPLATE = """
Evaluate the faithfulness of the AI response compared to the true answer.
User Query: {question}
AI Response: {response}
True Answer: {true_answer}
Faithfulness measures how well the AI response aligns with facts in the true answer, without hallucinations.
INSTRUCTIONS:
- Score STRICTLY using only these values:
* {full} = Completely faithful, no contradictions with true answer
* {partial} = Partially faithful, minor contradictions
* {none} = Not faithful, major contradictions or hallucinations
- Return ONLY the numerical score ({full}, {partial}, or {none}) with no explanation or additional text.
"""
Context-Enriched Retrieval
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/4_context_enriched_rag.ipynb
- Chunking with overlapping context: split the text into chunks with overlapping context to preserve semantic coherence.
- Context-aware retrieval: retrieve each relevant chunk together with its neighboring chunks to make answers more complete.
import numpy as np

def cosine_similarity(vec1, vec2):
    # Helper assumed as in the notebook: cosine similarity between two vectors
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def context_enriched_search(query, text_chunks, embeddings, k=1, context_size=1):
    """
    Retrieves the most relevant chunk along with its neighboring chunks.
    """
    # Convert the query into an embedding vector
    query_embedding = create_embeddings(query).data[0].embedding
    similarity_scores = []
    # Compute similarity scores between query and each text chunk embedding
    for i, chunk_embedding in enumerate(embeddings):
        # Calculate cosine similarity between the query embedding and current chunk embedding
        similarity_score = cosine_similarity(np.array(query_embedding), np.array(chunk_embedding.embedding))
        # Store the index and similarity score as a tuple
        similarity_scores.append((i, similarity_score))
    # Sort chunks by similarity score in descending order (highest similarity first)
    similarity_scores.sort(key=lambda x: x[1], reverse=True)
    # Get the index of the most relevant chunk
    top_index = similarity_scores[0][0]
    # Define the range for context inclusion
    # Ensure we don't go below 0 or beyond the length of text_chunks
    start = max(0, top_index - context_size)
    end = min(len(text_chunks), top_index + context_size + 1)
    # Return the relevant chunk along with its neighboring context chunks
    return [text_chunks[i] for i in range(start, end)]
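A minimal usage sketch (text_chunks and embeddings assumed built as in the Basic RAG section):

# Retrieve the best-matching chunk plus one neighbor on each side
query = "What is semantic chunking?"
context_chunks = context_enriched_search(query, text_chunks, embeddings, context_size=1)
context = "\n\n".join(context_chunks)  # Feed this to generate_response as context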
Contextual Chunk Headers (CCH)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/5_contextual_chunk_headers_rag.ipynb
CCH prepends high-level context (such as the document title or section heading) to each chunk before the chunk is embedded. This markedly improves retrieval quality and helps avoid answers that ignore the surrounding context.
- Add contextual headers: before chunking the text, extract high-level context from the document, such as its title, section names, or subheadings.
- Build context-enriched chunks: prepend the extracted context to each chunk as a "header", producing context-enriched chunks.
def generate_chunk_header(chunk, model="meta-llama/Llama-3.2-3B-Instruct"):
    """
    Generates a title/header for a given text chunk using an LLM.
    """
    # Define the system prompt to guide the AI's behavior
    system_prompt = "Generate a concise and informative title for the given text."
    # Generate a response from the AI model based on the system prompt and text chunk
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": chunk}
        ]
    )
    # Return the generated header/title, stripping any leading/trailing whitespace
    return response.choices[0].message.content.strip()
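A sketch of how the headers become context-enriched chunks, reusing chunk_text from the Basic RAG section (the header/text layout is an assumption):

def chunk_text_with_headers(text, n=1000, overlap=200):
    # Prepend an LLM-generated header to every chunk before embedding
    enriched = []
    for chunk in chunk_text(text, n, overlap):
        header = generate_chunk_header(chunk)
        enriched.append({"header": header, "text": f"{header}\n\n{chunk}"})
    return enriched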
Document Augmentation (Question Generation)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/6_doc_augmentation_rag.ipynb
By generating relevant questions for each text chunk, we improve the retrieval step, which in turn lets the language model produce better, more accurate answers.
import re

def generate_questions(text_chunk, num_questions=5, model="meta-llama/Llama-3.2-3B-Instruct"):
    """
    Generates relevant questions that can be answered from the given text chunk.
    Args:
        text_chunk (str): The text chunk to generate questions from.
        num_questions (int): Number of questions to generate.
        model (str): The model to use for question generation.
    """
    # Prompts assumed here; the original snippet used these names without defining them
    system_prompt = "You are an expert at generating relevant questions from text. Create concise questions that can be answered using only the provided text."
    user_prompt = f"""
    Based on the following text, generate {num_questions} different questions that can be answered using only this text:

    {text_chunk}

    Format your response as a numbered list of questions only.
    """
    # Generate questions using the OpenAI API
    response = client.chat.completions.create(
        model=model,
        temperature=0.7,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
    # Extract and clean questions from the response
    questions_text = response.choices[0].message.content.strip()
    questions = []
    # Extract questions using regex pattern matching
    for line in questions_text.split('\n'):
        # Remove numbering and clean up whitespace
        cleaned_line = re.sub(r'^\d+\.\s*', '', line.strip())
        if cleaned_line and cleaned_line.endswith('?'):
            questions.append(cleaned_line)
    return questions
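A sketch of how the generated questions can be indexed alongside the chunks themselves (the store layout is an assumption):

def augment_and_index(text_chunks):
    # Index each chunk once per generated question, plus the chunk itself
    store = []  # list of {"text": indexable text, "chunk": source chunk}
    for chunk in text_chunks:
        store.append({"text": chunk, "chunk": chunk})
        for question in generate_questions(chunk):
            # A question that the chunk answers becomes an extra retrieval key
            store.append({"text": question, "chunk": chunk})
    return store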
Query Transformations
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/7_query_transform.ipynb
- Query Rewriting: analyze the intent behind the user's query and add more specific keywords or constraints.
- Step-back Prompting: broaden the query's scope to pull in wider background information.
- Sub-query Decomposition: split a complex query into several simpler ones, retrieve for each, then merge the results (a decomposition sketch follows the code below).
def rewrite_query(original_query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Define the system prompt to guide the AI assistant's behavior
    system_prompt = "You are an AI assistant specialized in improving search queries. Your task is to rewrite user queries to be more specific, detailed, and likely to retrieve relevant information."
    # Define the user prompt with the original query to be rewritten
    user_prompt = f"""
    Rewrite the following query to make it more specific and detailed. Include relevant terms and concepts that might help in retrieving accurate information.
    Original query: {original_query}
    Rewritten query:
    """
    # Generate the rewritten query using the specified model
    response = client.chat.completions.create(
        model=model,
        temperature=0.0,  # Low temperature for deterministic output
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
    # Return the rewritten query, stripping any leading/trailing whitespace
    return response.choices[0].message.content.strip()
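Sub-query decomposition follows the same pattern; a minimal sketch (the prompt wording is an assumption):

def decompose_query(original_query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the model to split a complex query into simpler sub-queries
    response = client.chat.completions.create(
        model=model,
        temperature=0.0,
        messages=[
            {"role": "system", "content": "You decompose complex questions into simpler sub-questions, one per line."},
            {"role": "user", "content": f"Break this query into 2-4 simpler sub-queries, one per line:\n{original_query}"}
        ]
    )
    lines = response.choices[0].message.content.strip().split("\n")
    return [l.strip("- ").strip() for l in lines if l.strip()]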
Reranking (Reranker)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/8_reranker.ipynb
Each document or chunk from the initial retrieval is scored for relevance against the user's query. The most relevant items are then moved to the front, so the highest-quality content is considered first during answer generation.
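A minimal sketch of LLM-based reranking (the scoring prompt is an assumption; a dedicated cross-encoder could be used instead):

def rerank_chunks(query, chunks, model="meta-llama/Llama-3.2-3B-Instruct", top_n=3):
    # Ask the model to rate each chunk's relevance, then sort by that score
    scored = []
    for chunk in chunks:
        response = client.chat.completions.create(
            model=model,
            temperature=0,
            messages=[
                {"role": "system", "content": "Rate how relevant the document is to the query on a 0-10 scale. Return ONLY the number."},
                {"role": "user", "content": f"Query: {query}\n\nDocument: {chunk}"}
            ]
        )
        try:
            score = float(response.choices[0].message.content.strip())
        except ValueError:
            score = 0.0  # Fall back if the model returns non-numeric text
        scored.append((score, chunk))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [chunk for _, chunk in scored[:top_n]]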
Relevant Segment Extraction (RSE)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/9_rse.ipynb
RSE identifies the segments within retrieved chunks that are highly relevant to the user's query and stitches those segments back together into contiguous passages of text.
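A sketch of the reconstruction idea: assign each chunk a value, then find the contiguous run of chunks with the highest total value (the scoring and irrelevance penalty are assumptions):

def best_segment(chunk_values, max_len=10):
    # chunk_values: per-chunk relevance minus a penalty for irrelevant chunks,
    # so a run of relevant chunks is worth extending and noise breaks it up
    best, best_range = float("-inf"), (0, 0)
    for start in range(len(chunk_values)):
        total = 0.0
        for end in range(start, min(start + max_len, len(chunk_values))):
            total += chunk_values[end]
            if total > best:
                best, best_range = total, (start, end + 1)
    return best_range  # Contiguous [start, end) span to stitch back together

# e.g. values = [sim - 0.2 for sim in chunk_similarities]  # 0.2 penalty assumed
# start, end = best_segment(values); segment = " ".join(text_chunks[start:end])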
Contextual Compression
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/10_contextual_compression.ipynb
When retrieving documents in RAG, we often get chunks that mix relevant and irrelevant information. Contextual compression filters each retrieved chunk down to only the content that matters for the query, cutting noise and saving context-window space.
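A sketch of LLM-driven compression of a single retrieved chunk (the prompt is an assumption):

def compress_chunk(chunk, query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Keep only the sentences in the chunk that help answer the query
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": "Extract ONLY the sentences relevant to the query, verbatim. If nothing is relevant, return an empty string."},
            {"role": "user", "content": f"Query: {query}\n\nChunk:\n{chunk}"}
        ]
    )
    return response.choices[0].message.content.strip()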
Feedback Loop
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/11_feedback_loop_rag.ipynb
A traditional RAG system is static: it retrieves information based purely on embedding similarity. Adding a feedback loop turns it into a dynamic system that can:
- Remember what works (and what does not): by recording user ratings of its answers, the system learns which retrieval results and responses are high quality and which need improvement.
- Adjust document relevance scores: based on user feedback, the system dynamically tunes the relevance scores of documents or chunks so future retrievals reflect user needs more precisely (see the sketch after this list).
- Fold successful Q&A pairs into the knowledge base: Q&A pairs the user approves of are added to the knowledge base as new knowledge for direct use in later interactions.
- Get smarter with every interaction: by continually learning user preferences and needs, the system progressively refines its retrieval and generation strategy to deliver better answers.
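A sketch of feedback-adjusted relevance scoring (the storage format and adjustment formula are assumptions):

import json

def load_feedback(path="feedback.json"):
    try:
        with open(path) as f:
            return json.load(f)  # list of {"query": ..., "chunk_id": ..., "rating": -1 or +1}
    except FileNotFoundError:
        return []

def adjusted_score(base_similarity, chunk_id, feedback, weight=0.1):
    # Nudge the raw similarity up or down based on accumulated user ratings
    ratings = [f["rating"] for f in feedback if f["chunk_id"] == chunk_id]
    return base_similarity + weight * sum(ratings)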
Adaptive Retrieval
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/12_adaptive_rag.ipynb
Different questions call for different retrieval strategies. Our system adapts in two steps (a classification sketch follows this list):
- Identify the type of the user's query: factual, analytical, opinion-based, or contextual.
- Dispatch the query to the retrieval strategy suited to that type.
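A sketch of the classification-and-dispatch step; the category prompt is an assumption, and the four strategy helpers are hypothetical placeholders:

def classify_query(query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the model to label the query with one of the four categories
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": "Classify the query as exactly one of: Factual, Analytical, Opinion, Contextual. Return ONLY the label."},
            {"role": "user", "content": query}
        ]
    )
    return response.choices[0].message.content.strip()

def adaptive_retrieve(query, text_chunks, chunk_embeddings):
    # Map each category to a strategy; the four helpers below are hypothetical
    strategies = {
        "Factual": factual_retrieval,        # precise, high-threshold search
        "Analytical": analytical_retrieval,  # broad, multi-aspect coverage
        "Opinion": opinion_retrieval,        # deliberately diverse viewpoints
        "Contextual": contextual_retrieval,  # folds in user context
    }
    strategy = strategies.get(classify_query(query), factual_retrieval)
    return strategy(query, text_chunks, chunk_embeddings)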
Self-RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/13_self_rag.ipynb
Self-RAG adds dynamic decision-making to both retrieval and generation: the system decides, per request, whether and how to use retrieved information. This lets it produce higher-quality, more reliable answers.
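A sketch of the first decision point, whether retrieval is needed at all (the prompt is an assumption; the full pipeline also grades relevance and support of the retrieved chunks):

def needs_retrieval(query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Let the model decide whether external context is required
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": "Answer ONLY 'Yes' if answering the query requires looking up external documents, otherwise 'No'."},
            {"role": "user", "content": query}
        ]
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")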
Proposition Chunking
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/14_proposition_chunking.ipynb
Proposition chunking breaks documents down into atomic, self-contained factual statements, enabling more precise retrieval. Unlike simple character-count chunking, it preserves the semantic integrity of each individual fact.
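A sketch of generating propositions with an LLM (the prompt wording is an assumption):

def generate_propositions(chunk, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Decompose a chunk into atomic, self-contained factual statements
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": "Rewrite the text as a list of atomic, self-contained factual statements, one per line. Resolve all pronouns."},
            {"role": "user", "content": chunk}
        ]
    )
    lines = response.choices[0].message.content.strip().split("\n")
    return [l.strip("- ").strip() for l in lines if l.strip()]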
Multi-Modal RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/15_multimodel_rag.ipynb
A Multi-Modal RAG system extracts both text and images from documents, generates descriptive captions for the images, and draws on both content types to answer user queries. By folding visual information into the knowledge base, it significantly extends what a traditional RAG system can do.
Traditional RAG systems handle only text, yet much of the key information in many documents lives in images, charts, and tables. Generating descriptions for these visual elements and adding them to the retrieval system lets us:
- Capture information from charts and diagrams: unlock information hidden in images and figures so that it can be retrieved and used.
- Understand tables and charts: use image captions to interpret the tables and charts that complement the textual content.
- Build a more comprehensive knowledge base: combine visual and textual information into a richer, more complete knowledge base.
- Answer questions that depend on visual data: handle queries that can only be resolved with visual information (a captioning sketch follows this list).
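A sketch of the captioning step, assuming the endpoint serves a vision-capable model (the model name here is an assumption):

import base64

def caption_image(image_path, model="llava-hf/llava-1.5-7b-hf"):
    # Encode the image and ask a vision model for a descriptive caption
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model=model,  # must be a vision-capable model on this endpoint
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail, including any text, data, or chart content."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}
            ]
        }]
    )
    return response.choices[0].message.content.strip()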
Fusion Retrieval
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/16_fusion_rag.ipynb
A fusion retrieval system combines the strengths of semantic vector search and keyword-based BM25 retrieval. By capturing both conceptual similarity and exact keyword matches, it markedly improves retrieval quality.
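A sketch of score fusion with a blending weight alpha (assumes the rank_bm25 package and the create_embeddings helper sketched in the Basic RAG section):

import numpy as np
from rank_bm25 import BM25Okapi  # assumed installed: pip install rank-bm25

def fusion_search(query, text_chunks, chunk_vecs, alpha=0.5, k=3):
    # BM25 keyword scores over whitespace-tokenized chunks
    bm25 = BM25Okapi([c.split() for c in text_chunks])
    bm25_scores = np.array(bm25.get_scores(query.split()))
    # Cosine similarity scores against precomputed chunk vectors
    q = np.array(create_embeddings(query).data[0].embedding)
    vec_scores = np.array([np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
                           for v in chunk_vecs])
    # Min-max normalize both score sets so they are comparable, then blend
    def norm(s):
        return (s - s.min()) / (s.max() - s.min() + 1e-9)
    combined = alpha * norm(vec_scores) + (1 - alpha) * norm(bm25_scores)
    return [text_chunks[i] for i in np.argsort(combined)[::-1][:k]]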
Graph RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/17_graph_rag.ipynb
Compared with a traditional RAG system, Graph RAG organizes knowledge as a connected graph rather than a flat collection of documents. This lets the system navigate between related concepts and retrieve information that is more contextually relevant than what standard vector-similarity methods return.
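A sketch of graph-based expansion, assuming networkx and a concept graph whose nodes carry chunk text (concept/edge extraction is left out):

import networkx as nx

def graph_retrieve(seed_nodes, graph, hops=1):
    # Start from the nodes matched by vector search, then walk the graph
    visited = set(seed_nodes)
    frontier = set(seed_nodes)
    for _ in range(hops):
        # Expand one hop: pull in neighbors connected by concept edges
        frontier = {nbr for node in frontier for nbr in graph.neighbors(node)} - visited
        visited |= frontier
    return [graph.nodes[n]["text"] for n in visited]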
Hierarchical Indices
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/18_hierarchy_rag.ipynb
Hierarchical indices improve retrieval efficiency and quality with a two-level strategy: first use summaries to identify the relevant sections of a document, then retrieve the specific details from within those sections.
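A sketch of the two-level strategy (assumes per-section summaries and embeddings were built at index time, and reuses the create_embeddings helper sketched earlier):

import numpy as np

def hierarchical_search(query, summaries, summary_vecs, sections, k_sections=2, k_chunks=3):
    q = np.array(create_embeddings(query).data[0].embedding)
    def top_k(texts, vecs, k):
        # Rank texts by cosine similarity between the query and their vectors
        sims = [np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)) for v in vecs]
        order = np.argsort(sims)[::-1][:k]
        return [texts[i] for i in order]
    # Level 1: pick the most relevant section summaries
    results = []
    for summary in top_k(summaries, summary_vecs, k_sections):
        # Level 2: search detailed chunks only within the chosen sections
        chunk_texts, chunk_vecs = sections[summary]
        results.extend(top_k(chunk_texts, chunk_vecs, k_chunks))
    return results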
Hypothetical Document Embedding (HyDE)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/19_HyDE_rag.ipynb
HyDE converts the user's query into a hypothetical document that answers it and retrieves with that document instead, bridging the semantic gap between short queries and long documents.
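A sketch reusing semantic_search from the Basic RAG sketch (the generation prompt is an assumption):

def hyde_search(query, text_chunks, chunk_embeddings, k=3, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Step 1: generate a hypothetical document that would answer the query
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": "Write a short passage that plausibly answers the question, as if quoted from a reference document."},
            {"role": "user", "content": query}
        ]
    )
    hypothetical_doc = response.choices[0].message.content.strip()
    # Step 2: retrieve with the hypothetical document instead of the raw query
    return semantic_search(hypothetical_doc, text_chunks, chunk_embeddings, k=k)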
Corrective RAG (CRAG)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/20_crag.ipynb
Corrective RAG (CRAG) dynamically evaluates the retrieved information and, when necessary, falls back to web search to correct the retrieval step, significantly improving accuracy and reliability.
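A sketch of the evaluate-then-correct flow, reusing semantic_search from the Basic RAG sketch; relevance_score and web_search are hypothetical helpers (e.g. an LLM grader like the reranker above and a search-API wrapper):

def corrective_rag_retrieve(query, text_chunks, chunk_embeddings, threshold=0.7):
    # Retrieve as usual, then grade the results before trusting them
    chunks = semantic_search(query, text_chunks, chunk_embeddings, k=3)
    graded = [c for c in chunks if relevance_score(query, c) >= threshold]
    if graded:
        return graded  # Retrieval looks good: use the graded chunks
    # Otherwise fall back to web search (web_search is a hypothetical helper)
    return web_search(query)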