Exclusive | Getting Started with LangChain: A Beginner's Guide to Building LLM-Powered Applications (Part 2)

Posted by: 數據派THU | Date: 2023-07-16 | Source: Engineer

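Prompts: Managing LLM Inputs

LLMs have odd APIs. Writing a prompt in natural language should feel intuitive, but it usually takes a fair amount of tweaking before the LLM returns the output you want. This process is called prompt engineering. Once you have a good prompt, you may want to reuse it as a template for other purposes. For this, LangChain provides prompt templates, which help you build prompts from multiple components.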

from langchain import PromptTemplate

template = "What is a good name for a company that makes {product}?"
prompt = PromptTemplate(
    input_variables=["product"],
    template=template,
)
# Fill in the template to get the final prompt string.
prompt.format(product="colorful socks")
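
To make the idea concrete without installing anything, here is a minimal pure-Python sketch of what a prompt template does: it stores a string with placeholders, checks the declared input variables, and fills them in. The class `SimplePromptTemplate` is a hypothetical stand-in written for this illustration and is not LangChain's implementation.

```python
import string


class SimplePromptTemplate:
    """Hypothetical stand-in for a prompt template (not LangChain's class)."""

    def __init__(self, template, input_variables):
        # Collect the placeholder names that actually occur in the template.
        found = {name for _, name, _, _ in string.Formatter().parse(template) if name}
        if found != set(input_variables):
            raise ValueError(
                f"template uses {sorted(found)}, but {sorted(input_variables)} were declared"
            )
        self.template = template
        self.input_variables = list(input_variables)

    def format(self, **kwargs):
        # Refuse to build a prompt if a declared variable has no value.
        missing = set(self.input_variables) - set(kwargs)
        if missing:
            raise KeyError(f"missing values for: {sorted(missing)}")
        return self.template.format(**kwargs)


simple_prompt = SimplePromptTemplate(
    template="What is a good name for a company that makes {product}?",
    input_variables=["product"],
)
print(simple_prompt.format(product="colorful socks"))
```

Declaring the input variables up front lets a mismatch between the template and its inputs fail early, before any text is sent to an LLM.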

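The prompt above can be seen as a zero-shot problem setting (in zero-shot learning, a model is expected to handle things it has not explicitly seen during training): you are hoping the LLM was trained on enough relevant data to give a satisfactory result. Another trick to improve the LLM's output is to add a few examples to the prompt, which turns it into a few-shot problem setting.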

from langchain import PromptTemplate, FewShotPromptTemplate

examples = [
    {"word": "happy", "antonym": "sad"},
    {"word": "tall", "antonym": "short"},
]

example_template = """
Word: {word}
Antonym: {antonym}\n
"""

example_prompt = PromptTemplate(
    input_variables=["word", "antonym"],
    template=example_template,
)

few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Word: {input}\nAntonym:",
    input_variables=["input"],
    example_separator="\n",
)

few_shot_prompt.format(input="big")

The code above creates a prompt template and, from the provided examples and the input, composes the following prompt:

Give the antonym of every input

Word: happy
Antonym: sad



Word: tall
Antonym: short


Word: big
Antonym:
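
As a rough mental model, a few-shot prompt is the prefix, the formatted examples, and the formatted suffix joined by a separator. The sketch below shows that assembly in plain Python. The helper `build_few_shot_prompt` is hypothetical, and it uses a simplified example template without the extra blank lines, so its spacing differs slightly from the prompt shown above.

```python
def build_few_shot_prompt(prefix, examples, example_template, suffix, separator="\n", **inputs):
    """Hypothetical helper: join prefix, formatted examples, and suffix."""
    example_strings = [example_template.format(**example) for example in examples]
    pieces = [prefix, *example_strings, suffix.format(**inputs)]
    # Skip empty pieces so an empty prefix does not add a stray separator.
    return separator.join(piece for piece in pieces if piece)


few_shot_text = build_few_shot_prompt(
    prefix="Give the antonym of every input",
    examples=[
        {"word": "happy", "antonym": "sad"},
        {"word": "tall", "antonym": "short"},
    ],
    example_template="Word: {word}\nAntonym: {antonym}\n",
    suffix="Word: {input}\nAntonym:",
    input="big",
)
print(few_shot_text)
```

Ending the prompt on an open "Antonym:" line nudges the model to continue the pattern set by the examples.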

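Chains: Combining LLMs with Other Components
In LangChain, chaining simply describes the process of combining LLMs with other components to create an application. Some examples are:
- Combining an LLM with a prompt template (see this section)
- Combining multiple LLMs sequentially by using the first LLM's output as the input for the second LLM (see this section)
- Combining an LLM with external data, e.g., for question answering (see Indexes)
- Combining an LLM with long-term memory, e.g., for chat history (see Memory)

In the previous section, we created a prompt template. When we want to use it with our LLM, we can use an LLMChain as follows: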

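from langchain.chains import LLMChain

# `llm` is assumed to be the LLM set up earlier in this tutorial;
# `prompt` is the prompt template created in the previous section.
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable.
chain.run("colorful socks")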

If we want to use the output of this first LLM as the input for a second LLM, we can use a SimpleSequentialChain:

from langchain.chains import LLMChain, SimpleSequentialChain

# Define the first chain as in the previous code example
# ...

# Create a second chain with a prompt template and an LLM
second_prompt = PromptTemplate(
    input_variables=["company_name"],
    template="Write a catchphrase for the following company: {company_name}",
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)

# Combine the first and the second chain
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the chain specifying only the input variable for the first chain.
catchphrase = overall_chain.run("colorful socks")

[Figure: example result]
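
Conceptually, a simple sequential chain is function composition: each step formats a prompt, calls an LLM, and hands its output to the next step. The sketch below illustrates this with a canned `fake_llm`. All names here (`fake_llm`, `make_chain`, `sequential`) and the canned replies are hypothetical and exist only so the example runs offline.

```python
def fake_llm(prompt):
    """Hypothetical stand-in for an LLM call; returns canned text."""
    if prompt.startswith("What is a good name"):
        return "Rainbow Sox Co."
    if prompt.startswith("Write a catchphrase"):
        company = prompt.split(": ", 1)[1]
        return f"Step into color with {company}"
    return ""


def make_chain(template, input_variable, llm):
    """Bind a prompt template to an LLM, like a single-step chain."""
    def run(value):
        return llm(template.format(**{input_variable: value}))
    return run


def sequential(*chains):
    """Feed each chain's output into the next chain."""
    def run(value):
        for chain in chains:
            value = chain(value)
        return value
    return run


name_chain = make_chain(
    "What is a good name for a company that makes {product}?", "product", fake_llm
)
catchphrase_chain = make_chain(
    "Write a catchphrase for the following company: {company_name}", "company_name", fake_llm
)
overall = sequential(name_chain, catchphrase_chain)
sketch_catchphrase = overall("colorful socks")
print(sketch_catchphrase)
```

Because every step takes one string and returns one string, only the first chain's input variable has to be supplied by the caller.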

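Indexes: Accessing External Data
One limitation of LLMs is their lack of contextual information (e.g., access to specific documents or emails). You can address this by giving LLMs access to specific external data. To do so, you first need to load the external data with a document loader. LangChain provides a variety of loaders for different types of documents, ranging from PDFs and emails to websites and YouTube videos. Let's load some external data from a YouTube video. If you want to load a large text document and split it with a text splitter, you can refer to the official documentation.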

# pip install youtube-transcript-api
# pip install pytube

from langchain.document_loaders import YoutubeLoader

loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
documents = loader.load()

Now that your external data is ready as documents, you can index it in a vector database (VectorStore) using a text embedding model (see Models). Popular vector databases include Pinecone, Weaviate, and Milvus. In this article, we use Faiss because it does not require an API key.

# pip install faiss-cpu
from langchain.vectorstores import FAISS

# create the vectorstore to use as the index
# (`embeddings` is assumed to be the text embedding model set up earlier, see Models)
db = FAISS.from_documents(documents, embeddings)

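Your document (in this case, a video) is now stored as embeddings in a vector store. You can now do all kinds of things with this external data. Let's use it for a question-answering task with an information retriever: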

from langchain.chains import RetrievalQA

retriever = db.as_retriever()

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)

query = "What am I never going to do?"
result = qa({"query": query})

print(result['result'])

[Figure: example result]
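
The retrieval step can be pictured as: embed the query, rank the stored chunks by similarity, and "stuff" the best ones into the prompt. The sketch below is a toy version of that pipeline. It assumes a bag-of-words `embed` function in place of a real embedding model, and the chunks, helper names, and prompt wording are all made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (an assumption; real embeddings are dense vectors)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vector, embed(chunk)), reverse=True)
    return ranked[:k]


def stuff_prompt(query, chunks, k=2):
    """Put the retrieved chunks into one prompt, in the spirit of the 'stuff' chain type."""
    context = "\n".join(retrieve(query, chunks, k))
    return (
        "Use the context to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


chunks = [
    "The singer promises never to give you up or let you down.",
    "The two have known each other for a long time.",
    "The weather today is sunny with light wind.",
]
query = "What am I never going to do?"
top = retrieve(query, chunks, k=1)
print(stuff_prompt(query, chunks, k=1))
```

A vector store does the same ranking at scale, with learned embeddings and an index that avoids comparing the query against every chunk.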

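Memory: Remembering Previous Conversations
For applications like chatbots, it is essential that they can remember previous conversations. By default, however, LLMs have no long-term memory unless you pass in the chat history.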

[Figure: comparison of a chatbot with and without conversational memory]

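LangChain addresses this by providing several different options for handling chat history:

Keep all conversations
Keep the latest k conversations
Summarize the conversation

In this example, we will use a ConversationChain to give this application conversational memory.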

from langchain import ConversationChain
conversation = ConversationChain(llm=llm, verbose=True)
conversation.predict(input="Alice has a parrot.")
conversation.predict(input="Bob has two cats.")
conversation.predict(input="How many pets do Alice and Bob have?")

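This produces the right-hand conversation in the figure above. Without the ConversationChain to keep a conversational memory, the conversation would look like the one on the left-hand side of the figure above.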
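
The three options for handling chat history listed earlier (keep everything, keep the latest k exchanges, summarize) can be sketched in a few lines of plain Python. The classes below are hypothetical illustrations and not LangChain's memory classes. The `summarize` callable stands in for an LLM call and here simply concatenates the user messages.

```python
from collections import deque


class BufferMemory:
    """Keeps every exchange."""

    def __init__(self):
        self.turns = []

    def add(self, user, ai):
        self.turns.append((user, ai))

    def context(self):
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)


class WindowMemory(BufferMemory):
    """Keeps only the latest k exchanges; older ones are dropped."""

    def __init__(self, k):
        self.turns = deque(maxlen=k)


class SummaryMemory:
    """Keeps a running summary produced by a `summarize` callable."""

    def __init__(self, summarize):
        self.summary = ""
        self.summarize = summarize

    def add(self, user, ai):
        self.summary = self.summarize(self.summary, user, ai)

    def context(self):
        return self.summary


history = [
    ("Alice has a parrot.", "Noted."),
    ("Bob has two cats.", "Noted."),
    ("How many pets do Alice and Bob have?", "Three."),
]
buffer = BufferMemory()
window = WindowMemory(k=2)
# Naive stand-in for an LLM summarizer: append each user message.
summary = SummaryMemory(lambda so_far, user, ai: (so_far + " " + user).strip())
for user, ai in history:
    buffer.add(user, ai)
    window.add(user, ai)
    summary.add(user, ai)

print(window.context())
```

With `k=2` the window memory has already forgotten Alice's parrot, which shows the trade-off: a shorter context is cheaper to send to the model but can lose facts the model still needs.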
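Agents: Accessing Other Tools
Despite being quite powerful, LLMs have some limitations: they lack contextual information (e.g., access to specific knowledge not contained in the training data), they can become outdated quickly (e.g., GPT-4 was trained on data up to September 2021), and they are bad at math.
Because LLMs can hallucinate on tasks they cannot accomplish on their own, we need to give them access to supplementary tools such as search (e.g., Google Search), calculators (e.g., Python REPL or Wolfram Alpha), and lookups (e.g., Wikipedia). Additionally, we need agents that decide which tools to use to accomplish a task based on the LLM's output.
Note that some LLMs, such as google/flan-t5-xl, are not suitable for the following example because they do not follow the conversational-react-description template. For me, this was the point where I set up a paid account on OpenAI and switched to the OpenAI API.
Below is an example in which the agent first looks up Barack Obama's date of birth with Wikipedia and then calculates his age in 2022 with a calculator.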

# pip install wikipedia
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType

tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

agent.run("When was Barack Obama born? How old was he in 2022?")

[Figure: result]
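
The agent loop itself is simple to picture: the LLM proposes an action, the agent runs the matching tool, appends the observation to a scratchpad, and repeats until the LLM gives a final answer. The sketch below imitates that loop offline. It assumes a `scripted_llm` that returns fixed steps, a tiny `FACTS` dictionary in place of Wikipedia, and a made-up "Action: tool[input]" text format, none of which are LangChain's actual internals.

```python
import ast
import operator
import re

# Hypothetical knowledge base standing in for a Wikipedia lookup tool.
FACTS = {"Barack Obama birth year": "1961"}

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expression):
    """Evaluate +, -, *, / arithmetic by walking the parsed syntax tree (no eval)."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expression, mode="eval").body))


def lookup(query):
    return FACTS.get(query, "unknown")


TOOLS = {"lookup": lookup, "calculator": calculator}


def scripted_llm(scratchpad):
    """Hypothetical stand-in for the LLM: picks the next step from the observations so far."""
    observations = re.findall(r"Observation: (.*)", scratchpad)
    if len(observations) == 0:
        return "Action: lookup[Barack Obama birth year]"
    if len(observations) == 1:
        return f"Action: calculator[2022 - {observations[0]}]"
    return f"Final Answer: born in {observations[0]}, age {observations[1]} in 2022"


def run_agent(question, llm, tools, max_steps=5):
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(scratchpad)
        scratchpad += step + "\n"
        if step.startswith("Final Answer:"):
            return step[len("Final Answer:"):].strip()
        match = re.fullmatch(r"Action: (\w+)\[(.*)\]", step)
        if not match or match.group(1) not in tools:
            raise ValueError(f"cannot parse step: {step!r}")
        tool_name, tool_input = match.groups()
        # Run the chosen tool and record what it returned.
        scratchpad += f"Observation: {tools[tool_name](tool_input)}\n"
    raise RuntimeError("agent did not finish within max_steps")


answer = run_agent("When was Barack Obama born? How old was he in 2022?", scripted_llm, TOOLS)
print(answer)
```

Delegating the subtraction to a calculator tool, instead of asking the model to do arithmetic in text, is the point of the agent pattern: the LLM only decides which tool to call and with what input.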
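Summary
Just a few months ago, all of us (or at least most of us) were impressed by ChatGPT's capabilities. Now, new developer tools like LangChain let us build similarly impressive prototypes on our laptops within a few hours. These are truly exciting times!
LangChain is an open-source Python library that enables anyone who can write code to build LLM-powered applications. At the time of writing, the package provides a generic interface to many foundation models, enables prompt management, and acts as a central interface to other components (such as prompt templates, other LLMs, external data, and other tools) via agents. The library offers many more features than those mentioned in this article. At the current pace of development, this article may also be outdated within a month.
While writing this article, I noticed that the library and its documentation revolve around OpenAI's API. Although many examples work with the open-source foundation model google/flan-t5-xl, I switched to the OpenAI API partway through. Although it is not free, experimenting with the OpenAI API for this article cost me only about $1.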

Original title: Getting Started with LangChain: A Beginner’s Guide to Building LLM-Powered Applications
Original link: https://towardsdatascience.com/getting-started-with-langchain-a-beginners-guide-to-building-llm-powered-applications-95fc8898732c
