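RAG (Retrieval-Augmented Generation) is a technique proposed by the Facebook AI Research (FAIR) team in 2020. It combines two steps, retrieval and generation: relevant information is first retrieved from a large body of data and then supplied to a language model, helping it generate more accurate and more informative text.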
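The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration, not how LangChain or any production system does it: the word-overlap scoring, the helper names (`tokenize`, `retrieve`, `build_prompt`), and the sample documents are all assumptions made for the example, and the final call to a language model is left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, k=2):
    """Return the k documents with the most occurrences of the query's words.

    Hypothetical stand-in for a real retriever (which would use embeddings).
    """
    query_tokens = set(tokenize(query))

    def score(doc):
        counts = Counter(tokenize(doc))
        return sum(counts[token] for token in query_tokens)

    # sorted() is stable, so documents with equal scores keep their order.
    return sorted(documents, key=score, reverse=True)[:k]


def build_prompt(query, contexts):
    """Combine the retrieved passages and the question into one prompt."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )


# Made-up sample corpus for illustration only.
documents = [
    "FAISS is a library for efficient similarity search over dense vectors.",
    "Email files with the .eml extension store headers and a message body.",
    "PDF loaders turn each page of a PDF into a document with metadata.",
]
query = "How does similarity search work?"
top = retrieve(query, documents, k=1)
prompt = build_prompt(query, top)
print(prompt)  # this prompt would then be passed to a language model
```

In a real pipeline the retrieval step would be backed by a vector store, and `prompt` would be sent to the generator model.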
# LangChain 0.0.148
from langchain.document_loaders import UnstructuredWordDocumentLoader

loader = UnstructuredWordDocumentLoader("example_data/fake.docx")
data = loader.load()
data
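from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings

# `pages` is not defined in this snippet; it is assumed to be the list of
# documents produced by an earlier loading step.
faiss_index = FAISS.from_documents(pages, OpenAIEmbeddings())

# Fetch the two chunks most similar to the query and print each with its page number.
docs = faiss_index.similarity_search("How will the community be engaged?", k=2)
for doc in docs:
    print(str(doc.metadata["page"]) + ":", doc.page_content)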
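To make the idea behind `similarity_search(..., k=2)` more concrete, here is a minimal pure-Python sketch of top-k vector search. It is an assumption-laden illustration: the three-dimensional vectors are invented, the helper names `cosine_similarity` and `top_k` are hypothetical, and cosine similarity is used as a stand-in metric, whereas the actual FAISS index may rank by a different measure (such as L2 distance) and uses far more efficient data structures.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the page numbers of the k vectors most similar to query_vec.

    `indexed` is a list of (page_number, embedding) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), page) for page, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [page for _, page in scored[:k]]


# Invented embeddings: page 0 and page 1 point roughly the same way as the query.
indexed = [
    (0, [1.0, 0.0, 0.0]),
    (1, [0.7, 0.7, 0.0]),
    (2, [0.0, 0.0, 1.0]),
]
result = top_k([0.9, 0.1, 0.0], indexed, k=2)
print(result)
```

A brute-force scan like this is linear in the number of stored vectors, which is the cost that a dedicated index is meant to reduce.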
2.3.3 Online Loader
from langchain.document_loaders import OnlinePDFLoader

loader = OnlinePDFLoader("https://arxiv.org/pdf/2302.03803.pdf")
data = loader.load()
print(data)
2.3.4 PDFMiner
from langchain.document_loaders import PDFMinerLoader

loader = PDFMinerLoader("example_data/layout-parser-paper.pdf")
data = loader.load()
2.4 Email Parsing
from langchain.document_loaders import UnstructuredEmailLoader

loader = UnstructuredEmailLoader('example_data/fake-email.eml')
data = loader.load()
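For a sense of what an `.eml` file contains before a loader turns it into documents, the sketch below parses a small message with Python's standard-library `email` package. This is not how `UnstructuredEmailLoader` is implemented; the message text, the addresses, and the `record` dictionary layout are all made up for illustration.

```python
from email import message_from_string
from email.policy import default

# A made-up message in .eml format: headers, a blank line, then the body.
raw_eml = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Test message\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "Hello Bob,\n"
    "This is a test.\n"
)

# policy=default yields an EmailMessage, which provides get_body().
message = message_from_string(raw_eml, policy=default)

subject = str(message["Subject"])
sender = str(message["From"])
body = message.get_body(preferencelist=("plain",)).get_content()

# Hypothetical record shape: body text plus header metadata.
record = {
    "page_content": body,
    "metadata": {"subject": subject, "sender": sender},
}
print(record)
```

Real mail is often multipart (HTML parts, attachments), which is where a dedicated loader earns its keep.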