PGVector
使用
postgres
作为后端并利用pgvector
扩展实现的 LangChain 向量存储抽象。
代码位于一个名为: langchain_postgres 的集成包中。
您可以运行以下命令以启动一个带有 pgvector
扩展的 postgres 容器:
docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16
状态
此代码已从 langchain_community
移植到一个专用包 langchain-postgres
。已进行以下更改:
- langchain_postgres 仅与 psycopg3 一起使用。请将您的连接字符串从
postgresql+psycopg2://...
更新为postgresql+psycopg://langchain:langchain@...
(是的,驱动程序名称是psycopg
而不是psycopg3
,但它将使用psycopg3
)。 - 嵌入存储和集合的模式已更改,以使 add_documents 能够正确处理用户指定的 ID。
- 现在必须传递一个显式的连接对象。
目前,没有机制支持在模式更改时轻松迁移数据。因此,向量存储中的任何模式更改都将要求用户重新创建表并重新添加文档。 如果这是一个问题,请使用其他向量存储。如果没有,这个实现应该适合您的用例。
安装依赖
在这里,我们使用 langchain_cohere
进行嵌入,但您可以使用其他嵌入提供者。
!pip install --quiet -U langchain_cohere
!pip install --quiet -U langchain_postgres
初始化向量存储
from langchain_cohere import CohereEmbeddings
from langchain_core.documents import Document
from langchain_postgres import PGVector
from langchain_postgres.vectorstores import PGVector
# See docker command above to launch a postgres instance with pgvector enabled.
connection = "postgresql+psycopg://langchain:langchain@localhost:6024/langchain" # Uses psycopg3!
collection_name = "my_docs"
embeddings = CohereEmbeddings(model="embed-english-v3.0")
vectorstore = PGVector(
embeddings=embeddings,
collection_name=collection_name,
connection=connection,
use_jsonb=True,
)
删除表
如果您需要删除表(例如,将嵌入更新为不同的维度或仅更新嵌入提供者):
vectorstore.drop_tables()
添加文档
将文档添加到向量存储中
docs = [
Document(
page_content="there are cats in the pond",
metadata={"id": 1, "location": "pond", "topic": "animals"},
),
Document(
page_content="ducks are also found in the pond",
metadata={"id": 2, "location": "pond", "topic": "animals"},
),
Document(
page_content="fresh apples are available at the market",
metadata={"id": 3, "location": "market", "topic": "food"},
),
Document(
page_content="the market also sells fresh oranges",
metadata={"id": 4, "location": "market", "topic": "food"},
),
Document(
page_content="the new art exhibit is fascinating",
metadata={"id": 5, "location": "museum", "topic": "art"},
),
Document(
page_content="a sculpture exhibit is also at the museum",
metadata={"id": 6, "location": "museum", "topic": "art"},
),
Document(
page_content="a new coffee shop opened on Main Street",
metadata={"id": 7, "location": "Main Street", "topic": "food"},
),
Document(
page_content="the book club meets at the library",
metadata={"id": 8, "location": "library", "topic": "reading"},
),
Document(
page_content="the library hosts a weekly story time for kids",
metadata={"id": 9, "location": "library", "topic": "reading"},
),
Document(
page_content="a cooking class for beginners is offered at the community center",
metadata={"id": 10, "location": "community center", "topic": "classes"},
),
]
vectorstore.add_documents(docs, ids=[doc.metadata["id"] for doc in docs])
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
vectorstore.similarity_search("kitty", k=10)
[Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}),
Document(page_content='the book club meets at the library', metadata={'id': 8, 'topic': 'reading', 'location': 'library'}),
Document(page_content='the library hosts a weekly story time for kids', metadata={'id': 9, 'topic': 'reading', 'location': 'library'}),
Document(page_content='the new art exhibit is fascinating', metadata={'id': 5, 'topic': 'art', 'location': 'museum'}),
Document(page_content='ducks are also found in the pond', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}),
Document(page_content='the market also sells fresh oranges', metadata={'id': 4, 'topic': 'food', 'location': 'market'}),
Document(page_content='a cooking class for beginners is offered at the community center', metadata={'id': 10, 'topic': 'classes', 'location': 'community center'}),
Document(page_content='fresh apples are available at the market', metadata={'id': 3, 'topic': 'food', 'location': 'market'}),
Document(page_content='a sculpture exhibit is also at the museum', metadata={'id': 6, 'topic': 'art', 'location': 'museum'}),
Document(page_content='a new coffee shop opened on Main Street', metadata={'id': 7, 'topic': 'food', 'location': 'Main Street'})]
通过 ID 添加文档将覆盖任何匹配该 ID 的现有文档。
docs = [
Document(
page_content="there are cats in the pond",
metadata={"id": 1, "location": "pond", "topic": "animals"},
),
Document(
page_content="ducks are also found in the pond",
metadata={"id": 2, "location": "pond", "topic": "animals"},
),
Document(
page_content="fresh apples are available at the market",
metadata={"id": 3, "location": "market", "topic": "food"},
),
Document(
page_content="the market also sells fresh oranges",
metadata={"id": 4, "location": "market", "topic": "food"},
),
Document(
page_content="the new art exhibit is fascinating",
metadata={"id": 5, "location": "museum", "topic": "art"},
),
Document(
page_content="a sculpture exhibit is also at the museum",
metadata={"id": 6, "location": "museum", "topic": "art"},
),
Document(
page_content="a new coffee shop opened on Main Street",
metadata={"id": 7, "location": "Main Street", "topic": "food"},
),
Document(
page_content="the book club meets at the library",
metadata={"id": 8, "location": "library", "topic": "reading"},
),
Document(
page_content="the library hosts a weekly story time for kids",
metadata={"id": 9, "location": "library", "topic": "reading"},
),
Document(
page_content="a cooking class for beginners is offered at the community center",
metadata={"id": 10, "location": "community center", "topic": "classes"},
),
]
过滤支持
向量存储支持一组可以应用于文档元数据字段的过滤器。
操作符 | 意义/类别 |
---|---|
\$eq | 等于 (==) |
\$ne | 不等 (!=) |
\$lt | 小于 (<) |
\$lte | 小于或等于 (<=) |
\$gt | 大于 (>) |
\$gte | 大于或等于 (>=) |
\$in | 特殊案例 (in) |
\$nin | 特殊案例 (not in) |
\$between | 特殊案例 (between) |
\$like | 文本 (like) |
\$ilike | 文本 (不区分大小写的 like) |
\$and | 逻辑 (and) |
\$or | 逻辑 (or) |
vectorstore.similarity_search("kitty", k=10, filter={"id": {"$in": [1, 5, 2, 9]}})
[Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}),
Document(page_content='the library hosts a weekly story time for kids', metadata={'id': 9, 'topic': 'reading', 'location': 'library'}),
Document(page_content='the new art exhibit is fascinating', metadata={'id': 5, 'topic': 'art', 'location': 'museum'}),
Document(page_content='ducks are also found in the pond', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'})]
如果您提供一个包含多个字段的字典,但没有操作符,则顶级将被解释为逻辑 AND 过滤器
vectorstore.similarity_search(
"ducks",
k=10,
filter={"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}},
)
[Document(page_content='ducks are also found in the pond', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}),
Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'})]
vectorstore.similarity_search(
"ducks",
k=10,
filter={
"$and": [
{"id": {"$in": [1, 5, 2, 9]}},
{"location": {"$in": ["pond", "market"]}},
]
},
)
[Document(page_content='ducks are also found in the pond', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}),
Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'})]
vectorstore.similarity_search("bird", k=10, filter={"location": {"$ne": "pond"}})
[Document(page_content='the book club meets at the library', metadata={'id': 8, 'topic': 'reading', 'location': 'library'}),
Document(page_content='the new art exhibit is fascinating', metadata={'id': 5, 'topic': 'art', 'location': 'museum'}),
Document(page_content='the library hosts a weekly story time for kids', metadata={'id': 9, 'topic': 'reading', 'location': 'library'}),
Document(page_content='a sculpture exhibit is also at the museum', metadata={'id': 6, 'topic': 'art', 'location': 'museum'}),
Document(page_content='the market also sells fresh oranges', metadata={'id': 4, 'topic': 'food', 'location': 'market'}),
Document(page_content='a cooking class for beginners is offered at the community center', metadata={'id': 10, 'topic': 'classes', 'location': 'community center'}),
Document(page_content='a new coffee shop opened on Main Street', metadata={'id': 7, 'topic': 'food', 'location': 'Main Street'}),
Document(page_content='fresh apples are available at the market', metadata={'id': 3, 'topic': 'food', 'location': 'market'})]