Google Imagen

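Imagen on Vertex AI brings Google's state-of-the-art image generation capabilities to application developers. With Imagen on Vertex AI, developers can build next-generation AI products that turn a user's imagination into high-quality visual assets in seconds.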

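With Imagen on LangChain, you can do the following tasks:

VertexAIImageGeneratorChat : Generate novel images using only a text prompt (text-to-image AI generation).
VertexAIImageEditorChat : Edit an entire uploaded or generated image with a text prompt.
VertexAIImageCaptioning : Get text descriptions and visual captions of images.
VertexAIVisualQnAChat : Get answers to a question about an image with Visual Question Answering (VQA).
- NOTE: Currently only single-turn chat is supported for Visual Question Answering (VQA).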

Image Generation

Generate novel images using only a text prompt (text-to-image AI generation).

from langchain_core.messages import AIMessage, HumanMessage
from langchain_google_vertexai.vision_models import VertexAIImageGeneratorChat

# Create the image generation model object
generator = VertexAIImageGeneratorChat()

messages = [HumanMessage(content=["a cat at the beach"])]
response = generator.invoke(messages)

# Read the generated image part from the response
generated_image = response.content[0]

import base64
import io

from PIL import Image

# Parse the response object to get the base64 string for the image
img_base64 = generated_image["image_url"]["url"].split(",")[-1]

# Convert the base64 string to an image
img = Image.open(io.BytesIO(base64.decodebytes(bytes(img_base64, "utf-8"))))

# View the image
img
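
The decoding steps above can be folded into a small helper that also writes the image to disk. This is a minimal sketch using only the standard library. It assumes the URL has the form `data:<mime>;base64,<payload>`, as the `split(",")` above implies, and the names `data_url_to_bytes` and `save_data_url` are hypothetical helpers, not part of LangChain.

```python
import base64
from pathlib import Path


def data_url_to_bytes(url: str) -> bytes:
    """Return the raw bytes carried by a base64 data URL (or a bare base64 string)."""
    # Everything after the last comma is the base64 payload, as in the split above.
    payload = url.split(",")[-1]
    return base64.b64decode(payload)


def save_data_url(url: str, path: str) -> int:
    """Write the decoded bytes to `path` and return the number of bytes written."""
    return Path(path).write_bytes(data_url_to_bytes(url))
```

For example, `save_data_url(generated_image["image_url"]["url"], "cat.png")` would store the generated image. The `.png` extension is an assumption, so check the MIME type in the URL prefix before choosing one.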

Image Editing

Edit an entire uploaded or generated image with a text prompt.

Edit Generated Image

from langchain_core.messages import AIMessage, HumanMessage
from langchain_google_vertexai.vision_models import (
    VertexAIImageEditorChat,
    VertexAIImageGeneratorChat,
)

# Create the image generation model object
generator = VertexAIImageGeneratorChat()

# Provide a text prompt for the image
messages = [HumanMessage(content=["a cat on a beach"])]

# Call the model to generate an image
response = generator.invoke(messages)

# Read the image object from the response
generated_image = response.content[0]

# Create the image editor model object
editor = VertexAIImageEditorChat()

# Write a prompt for editing and pass the "generated_image"
messages = [HumanMessage(content=[generated_image, "a dog on a beach"])]

# Call the model to edit the image
editor_response = editor.invoke(messages)

import base64
import io

from PIL import Image

# Parse the response object to get the base64 string for the image
edited_img_base64 = editor_response.content[0]["image_url"]["url"].split(",")[-1]

# Convert the base64 string to an image
edited_img = Image.open(
    io.BytesIO(base64.decodebytes(bytes(edited_img_base64, "utf-8")))
)

# View the image
edited_img
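
The example above edits a generated image. For an uploaded image, one option is to wrap the file's bytes in a content part of the same shape as the generated one. The sketch below builds such a part. It assumes the editor accepts any `image_url` part carrying a base64 data URL, which this page does not confirm, and `image_bytes_to_part` is a hypothetical helper name.

```python
import base64


def image_bytes_to_part(data: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes in an image_url content part with a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Hypothetical usage with a local file (path and prompt are placeholders):
# with open("my_photo.png", "rb") as f:
#     part = image_bytes_to_part(f.read())
# messages = [HumanMessage(content=[part, "a dog on a beach"])]
```

The prompt stays the second element of the content list, mirroring the order used for the generated image.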

Image Captioning

from langchain_google_vertexai import VertexAIImageCaptioning

# Initialize the image captioning object
model = VertexAIImageCaptioning()

NOTE : This example reuses the image generated in the Image Generation section.

# Use the image generated in the Image Generation section
img_base64 = generated_image["image_url"]["url"]
response = model.invoke(img_base64)
print(f"Generated Caption : {response}")

# Convert the base64 string to an image
img = Image.open(
    io.BytesIO(base64.decodebytes(bytes(img_base64.split(",")[-1], "utf-8")))
)

# Display the image
img

Generated Caption : a cat sitting on the beach looking at the camera

Visual Question Answering (VQA)

from langchain_google_vertexai import VertexAIVisualQnAChat

model = VertexAIVisualQnAChat()

NOTE : This example reuses the image generated in the Image Generation section.

question = "What animal is shown in the image?"
response = model.invoke(
    input=[
        HumanMessage(
            content=[
                {"type": "image_url", "image_url": {"url": img_base64}},
                question,
            ]
        )
    ]
)

print(f"question : {question}\nanswer : {response.content}")

# Convert base64 string to Image
img = Image.open(
    io.BytesIO(base64.decodebytes(bytes(img_base64.split(",")[-1], "utf-8")))
)

# display Image
img

question : What animal is shown in the image?
answer : cat
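
Because only single-turn chat is supported for VQA, asking several questions about one image means sending a separate single-message request per question. The sketch below prepares one content list per question in the same shape as the example above. `build_vqa_turns` is a hypothetical helper name, and it only builds the payloads without calling the model.

```python
def build_vqa_turns(image_url: str, questions: list[str]) -> list[list]:
    """Return one content list per question, each pairing the image with one question."""
    image_part = {"type": "image_url", "image_url": {"url": image_url}}
    # Each inner list is the `content` of a single, independent HumanMessage.
    return [[image_part, question] for question in questions]


# Hypothetical usage with the objects defined above:
# for content in build_vqa_turns(img_base64, ["What animal is shown?", "Where is it?"]):
#     answer = model.invoke(input=[HumanMessage(content=content)])
#     print(content[1], "->", answer.content)
```

Each request is independent, so an answer to an earlier question is not available as context for a later one.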
