Runhouse
Runhouse allows remote compute and data across environments and users. See the Runhouse docs.
This example goes over how to use LangChain and Runhouse to interact with models hosted on your own GPU, or on on-demand GPUs on AWS, GCP, Azure, or Lambda.
Note: the code uses the name SelfHosted rather than Runhouse for these classes.
!pip install runhouse
from langchain.llms import SelfHostedPipeline, SelfHostedHuggingFaceLLM
from langchain import PromptTemplate, LLMChain
import runhouse as rh
INFO | 2023-04-17 16:47:36,173 | No auth token provided, so not using RNS API to save and load configs
# For an on-demand A100 with GCP, Azure, or Lambda
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1", use_spot=False)
# For an on-demand A10G with AWS (no single A100s on AWS)
# gpu = rh.cluster(name='rh-a10x', instance_type='g5.2xlarge', provider='aws')
# For an existing cluster
# gpu = rh.cluster(ips=['<ip of the cluster>'],
#                  ssh_creds={'ssh_user': '...', 'ssh_private_key': '<path_to_key>'},
#                  name='rh-a10x')
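Before building an LLM on top of the cluster, it can be worth confirming the GPU is actually reachable. A minimal sanity check, assuming your Runhouse version's Cluster object exposes a run() method for shell commands:
# Hypothetical sanity check: confirm the remote GPU is visible before sending work to it.
# Assumes Cluster.run() accepts a list of shell commands; adjust to your Runhouse version.
gpu.run(["nvidia-smi"])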
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm = SelfHostedHuggingFaceLLM(model_id="gpt2", hardware=gpu, model_reqs=["pip:./", "transformers", "torch"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "What NFL team won the Super Bowl in the year Justin Beiber was born?"
llm_chain.run(question)
INFO | 2023-02-17 05:42:23,537 | Running _generate_text via gRPC
INFO | 2023-02-17 05:42:24,016 | Time to send message: 0.48 seconds
" Let's say we're talking sports teams who won the Super Bowl in the year Justin Beiber"
You can also load more custom models through the SelfHostedHuggingFaceLLM interface:
llm = SelfHostedHuggingFaceLLM(
    model_id="google/flan-t5-small",
    task="text2text-generation",
    hardware=gpu,
)
llm("What is the capital of Germany?")
INFO | 2023-02-17 05:54:21,681 | Running _generate_text via gRPC
INFO | 2023-02-17 05:54:21,937 | Time to send message: 0.25 seconds
'berlin'
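The custom model can be plugged into the same chain pattern shown earlier; a short sketch reusing the PromptTemplate defined above:
# Reuse the flan-t5-backed LLM inside the earlier prompt/chain setup.
llm_chain = LLMChain(prompt=prompt, llm=llm)
llm_chain.run("What is the capital of Germany?")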
Using a custom load function, we can load a custom pipeline directly on the remote hardware:
def load_pipeline():
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline  # Need to be inside the fn in notebooks

    model_id = "gpt2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    pipe = pipeline(
        "text-generation", model=model, tokenizer=tokenizer, max_new_tokens=10
    )
    return pipe

def inference_fn(pipeline, prompt, stop=None):
    return pipeline(prompt)[0]["generated_text"][len(prompt):]
llm = SelfHostedHuggingFaceLLM(model_load_fn=load_pipeline, hardware=gpu, inference_fn=inference_fn)
llm("Who is the current US president?")
INFO | 2023-02-17 05:42:59,219 | Running _generate_text via gRPC
INFO | 2023-02-17 05:42:59,522 | Time to send message: 0.3 seconds
'john w. bush'
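The inference_fn above ignores the stop argument. If you want stop sequences respected, you can post-process the generated text yourself; a minimal sketch (plain Python, nothing Runhouse-specific assumed):
def inference_fn_with_stop(pipeline, prompt, stop=None):
    # Strip the prompt, then cut the generation at the first stop sequence, if any.
    text = pipeline(prompt)[0]["generated_text"][len(prompt):]
    for s in stop or []:
        idx = text.find(s)
        if idx != -1:
            text = text[:idx]
    return text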
You can send your pipeline directly over the wire to your model, but this will only work for small models (<2 GB) and will be pretty slow:
pipeline = load_pipeline()
llm = SelfHostedPipeline.from_pipeline(
    pipeline=pipeline, hardware=gpu, model_reqs=["pip:./", "transformers", "torch"]
)
Instead, we can also send it to the hardware's filesystem, which will be much faster.
import pickle

rh.blob(pickle.dumps(pipeline), path="models/pipeline.pkl").save().to(gpu, path="models")

llm = SelfHostedPipeline.from_pipeline(pipeline="models/pipeline.pkl", hardware=gpu)
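The reloaded pipeline behaves like any other LLM, so a quick check might look like:
# Query the pipeline that was loaded from the cluster's filesystem.
llm("Who is the current US president?")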