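Xorbits Inference (Xinference)

This page demonstrates how to use Xinference with LangChain.

Xinference is a powerful and versatile library designed to serve LLMs, speech recognition models, and multimodal models, even on your laptop. With Xorbits Inference, you can deploy and serve your own models or state-of-the-art built-in models with a single command.

Installation and Setup

Xinference can be installed from PyPI via pip: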

pip install "xinference[all]"

LLM

Xinference supports various GGML-compatible models, including chatglm, baichuan, whisper, vicuna, and orca. To view the built-in models, run the following command:

xinference list --all

Wrapper for Xinference

You can start a local instance of Xinference by running:

xinference

You can also deploy Xinference in a distributed cluster. To do so, first start an Xinference supervisor on the server that should host it:

xinference-supervisor -H "${supervisor_host}"

Then, start an Xinference worker on each of the other servers that should run models:

xinference-worker -e "http://${supervisor_host}:9997"

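Once Xinference is running, an endpoint for model management is accessible via the CLI or the Xinference client.

For a local deployment, the endpoint will be http://localhost:9997.

For a cluster deployment, the endpoint will be http://${supervisor_host}:9997.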
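The two endpoint forms above can be captured in a small helper. This is only a minimal sketch: the function name `xinference_endpoint` is hypothetical and not part of Xinference, and it assumes the default port 9997 shown above.

```python
from typing import Optional

DEFAULT_PORT = 9997  # port used in the endpoint examples above


def xinference_endpoint(supervisor_host: Optional[str] = None,
                        port: int = DEFAULT_PORT) -> str:
    """Return the model-management endpoint URL.

    With no supervisor host, fall back to the local deployment address.
    """
    host = supervisor_host or "localhost"
    return f"http://{host}:{port}"


print(xinference_endpoint())            # local deployment
print(xinference_endpoint("10.0.0.5"))  # cluster deployment; the IP is a made-up example
```

The returned string is the kind of value you would pass as the server URL when connecting to Xinference.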

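Then, you need to launch a model. You can specify the model name and other attributes, including model_size_in_billions and quantization. You can use the command line interface (CLI) to do so. For example,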

xinference launch -n orca -s 3 -q q4_0

A model uid will be returned.
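If you prefer to drive the launch step from a script, the CLI call can be wrapped with Python's standard `subprocess` module. The sketch below is an assumption-laden illustration: the helper names `build_launch_command` and `launch_model` are hypothetical, and only the `-n`, `-s`, and `-q` flags from the example above are used. Other Xinference versions may require additional flags.

```python
import subprocess
from typing import List


def build_launch_command(model_name: str, size_in_billions: int,
                         quantization: str) -> List[str]:
    """Assemble the `xinference launch` command shown above as an argv list."""
    return [
        "xinference", "launch",
        "-n", model_name,
        "-s", str(size_in_billions),
        "-q", quantization,
    ]


def launch_model(model_name: str, size_in_billions: int, quantization: str) -> str:
    """Run the command and return its raw output, which should include the model uid.

    Requires a running Xinference instance and the `xinference` CLI on PATH.
    """
    cmd = build_launch_command(model_name, size_in_billions, quantization)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Only build and show the command here; call launch_model(...) once a server is running.
print(build_launch_command("orca", 3, "q4_0"))
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` raises an error if the launch fails instead of silently returning empty output.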

Example usage:

from langchain_community.llms import Xinference

# Replace with the model UID returned when the model was launched
model_uid = "<model_uid>"

llm = Xinference(
    server_url="http://0.0.0.0:9997",
    model_uid=model_uid,
)

# generate_config is forwarded to the Xinference model
llm.invoke(
    "Q: where can we visit in the capital of France? A:",
    generate_config={"max_tokens": 1024, "stream": True},
)

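Usage

For more information and detailed examples, refer to the example for xinference LLMs.

Embeddings

Xinference also supports embedding queries and documents. See the example for xinference embeddings for a more detailed demo.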
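As a rough sketch of what embedding a query and documents might look like, the code below uses the `XinferenceEmbeddings` class from `langchain_community.embeddings`. The helper names `embed_with_xinference` and `cosine_similarity` are hypothetical, and the similarity step is only an illustrative way to compare the resulting vectors, not something this page prescribes. It assumes an embedding model has already been launched and that you have its model uid.

```python
import math
from typing import List, Sequence


def embed_with_xinference(server_url: str, model_uid: str,
                          query: str, docs: List[str]):
    """Embed one query and a list of documents through a running Xinference server."""
    # Imported lazily so the rest of this sketch runs without langchain_community installed.
    from langchain_community.embeddings import XinferenceEmbeddings

    embeddings = XinferenceEmbeddings(server_url=server_url, model_uid=model_uid)
    return embeddings.embed_query(query), embeddings.embed_documents(docs)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With a server running, you could call `embed_with_xinference("http://localhost:9997", model_uid, "a question", ["doc one", "doc two"])` and score each document vector against the query vector with `cosine_similarity`.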
