Runhouse
Runhouse allows remote compute and data across environments and users. See the Runhouse docs.
This example goes over how to use LangChain and Runhouse to interact with models hosted on your own GPU, or on on-demand GPUs on AWS, GCP, Azure, or Lambda.
Note: the code uses the name SelfHosted rather than Runhouse for these classes.
!pip install runhouse
from langchain.llms import SelfHostedPipeline, SelfHostedHuggingFaceLLM
from langchain import PromptTemplate, LLMChain
import runhouse as rh
INFO | 2023-04-17 16:47:36,173 | No auth token provided, so not using RNS API to save and load configs
# For an on-demand A100 with GCP, Azure, or Lambda
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1", use_spot=False)
# For an on-demand A10G with AWS (no single A100s on AWS)
# gpu = rh.cluster(name='rh-a10x', instance_type='g5.2xlarge', provider='aws')
# For an existing cluster
# gpu = rh.cluster(ips=['<ip of the cluster>'],
#                  ssh_creds={'ssh_user': '...', 'ssh_private_key': '<path_to_key>'},
#                  name='rh-a10x')
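Before building an LLM on top of the cluster, it can be worth confirming the GPU is actually reachable. A minimal sanity check, assuming your Runhouse version's Cluster object exposes a run() method for shell commands:
# Hypothetical sanity check: confirm the remote GPU is visible before sending work to it.
# Assumes Cluster.run() accepts a list of shell commands; adjust to your Runhouse version.
gpu.run(["nvidia-smi"])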
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm = SelfHostedHuggingFaceLLM(model_id="gpt2", hardware=gpu, model_reqs=["pip:./", "transformers", "torch"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "What NFL team won the Super Bowl in the year Justin Beiber was born?"
llm_chain.run(question)
INFO | 2023-02-17 05:42:23,537 | Running _generate_text via gRPC
INFO | 2023-02-17 05:42:24,016 | Time to send message: 0.48 seconds
" Let's say we're talking sports teams who won the Super Bowl in the year Justin Beiber"
You can also load more custom models through the SelfHostedHuggingFaceLLM interface:
llm = SelfHostedHuggingFaceLLM(
    model_id="google/flan-t5-small",
    task="text2text-generation",
    hardware=gpu,
)
llm("What is the capital of Germany?")
INFO | 2023-02-17 05:54:21,681 | Running _generate_text via gRPC
INFO | 2023-02-17 05:54:21,937 | Time to send message: 0.25 seconds
'berlin'
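The custom model can be plugged into the same chain pattern shown earlier; a short sketch reusing the PromptTemplate defined above:
# Reuse the flan-t5-backed LLM inside the earlier prompt/chain setup.
llm_chain = LLMChain(prompt=prompt, llm=llm)
llm_chain.run("What is the capital of Germany?")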
Using a custom load function, we can load a custom pipeline directly on the remote hardware:
def load_pipeline():
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline  # Need to be inside the fn in notebooks

    model_id = "gpt2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    pipe = pipeline(
        "text-generation", model=model, tokenizer=tokenizer, max_new_tokens=10
    )
    return pipe

def inference_fn(pipeline, prompt, stop=None):
    return pipeline(prompt)[0]["generated_text"][len(prompt):]
llm = SelfHostedHuggingFaceLLM(model_load_fn=load_pipeline, hardware=gpu, inference_fn=inference_fn)
llm("Who is the current US president?")
INFO | 2023-02-17 05:42:59,219 | Running _generate_text via gRPC
INFO | 2023-02-17 05:42:59,522 | Time to send message: 0.3 seconds
'john w. bush'
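The inference_fn above ignores the stop argument. If you want stop sequences respected, you can post-process the generated text yourself; a minimal sketch (plain Python, nothing Runhouse-specific assumed):
def inference_fn_with_stop(pipeline, prompt, stop=None):
    # Strip the prompt, then cut the generation at the first stop sequence, if any.
    text = pipeline(prompt)[0]["generated_text"][len(prompt):]
    for s in stop or []:
        idx = text.find(s)
        if idx != -1:
            text = text[:idx]
    return text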
You can send your pipeline directly over the wire to your model, but this will only work for small models (<2 GB) and will be pretty slow:
pipeline = load_pipeline()
llm = SelfHostedPipeline.from_pipeline(
    pipeline=pipeline, hardware=gpu, model_reqs=["pip:./", "transformers", "torch"]
)
Instead, we can also send it to the hardware's filesystem, which will be much faster.
import pickle

rh.blob(pickle.dumps(pipeline), path="models/pipeline.pkl").save().to(gpu, path="models")

llm = SelfHostedPipeline.from_pipeline(pipeline="models/pipeline.pkl", hardware=gpu)
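The reloaded pipeline behaves like any other LLM, so a quick check might look like:
# Query the pipeline that was loaded from the cluster's filesystem.
llm("Who is the current US president?")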