.Net: Add support for ImageContent to use data URIs in ChatPromptParser so templates can use base64 encoded images. (#8401)

### Motivation and Context

At present, prompt templates cannot include images as base64-encoded
data URIs. `ChatPromptParser.cs` only ever calls the `ImageContent`
constructor that requires a URI, which leads to an
`InvalidOperationException`. The required change is straightforward, and
the limitation has been discussed before, [for example
here](#7121).
Closes #7150.
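
For illustration, a base64 data URI of the kind such a template would carry can be assembled from raw image bytes. The sketch below uses only the .NET base library; `DataUriSketch` and `ToDataUri` are hypothetical names chosen for this example and are not part of Semantic Kernel.

```csharp
using System;

public static class DataUriSketch
{
    // Hypothetical helper: formats raw bytes as "data:<mime type>;base64,<payload>".
    public static string ToDataUri(byte[] bytes, string mimeType) =>
        $"data:{mimeType};base64,{Convert.ToBase64String(bytes)}";

    public static void Main()
    {
        // The eight-byte PNG file signature, used here as stand-in image data.
        byte[] pngSignature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

        // Prints: data:image/png;base64,iVBORw0KGgo=
        Console.WriteLine(ToDataUri(pngSignature, "image/png"));
    }
}
```

A string built this way is what would be passed as a template argument or placed inside an `<image>` element.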

### Description

The fix checks whether the `<image>` content starts with `data:`. If it
does, the `ImageContent` constructor that accepts a `dataUri` is used
instead of the one that requires a `Uri`.
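
The check described above can be sketched in isolation. This is a minimal illustration rather than the parser's actual code: `ImageSourceSketch`, `ImageSourceKind`, and `Classify` are hypothetical names, and only the case-insensitive `data:` prefix test is taken from the change itself.

```csharp
using System;

public static class ImageSourceSketch
{
    // Hypothetical classification of the text found inside an <image> element.
    public enum ImageSourceKind { DataUri, RemoteUri }

    // Content starting with "data:" (case-insensitive) is treated as a data URI;
    // everything else is assumed to be a regular URI.
    public static ImageSourceKind Classify(string content) =>
        content.StartsWith("data:", StringComparison.OrdinalIgnoreCase)
            ? ImageSourceKind.DataUri
            : ImageSourceKind.RemoteUri;

    public static void Main()
    {
        Console.WriteLine(Classify("data:image/png;base64,iVBORw0KGgo=")); // prints "DataUri"
        Console.WriteLine(Classify("https://example.com/image.png"));      // prints "RemoteUri"
    }
}
```

In the change itself, the two branches select between the `ImageContent` constructor that takes the data URI string and the one that takes a `Uri`.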

### Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

---------

Co-authored-by: Marcelo Garcia 🛸 <marcgarc@microsoft.com>
MarceloAGG and marcgarcms authored Aug 30, 2024
1 parent 3bfee7b commit 78289af
Showing 4 changed files with 110 additions and 2 deletions.
51 changes: 51 additions & 0 deletions dotnet/samples/Concepts/PromptTemplates/HandlebarsVisionPrompts.cs
@@ -0,0 +1,51 @@
+// Copyright (c) Microsoft. All rights reserved.
+
+using Microsoft.SemanticKernel;
+using Microsoft.SemanticKernel.PromptTemplates.Handlebars;
+
+namespace PromptTemplates;
+
+// This example shows how to use chat completion handlebars template prompts with base64 encoded images as a parameter.
+public class HandlebarsVisionPrompts(ITestOutputHelper output) : BaseTest(output)
+{
+    [Fact]
+    public async Task RunAsync()
+    {
+        const string HandlebarsTemplate = """
+            <message role="system">You are an AI assistant designed to help with image recognition tasks.</message>
+            <message role="user">
+                <text>{{request}}</text>
+                <image>{{imageData}}</image>
+            </message>
+            """;
+
+        var kernel = Kernel.CreateBuilder()
+            .AddOpenAIChatCompletion(
+                modelId: TestConfiguration.OpenAI.ChatModelId,
+                apiKey: TestConfiguration.OpenAI.ApiKey)
+            .Build();
+
+        var templateFactory = new HandlebarsPromptTemplateFactory();
+        var promptTemplateConfig = new PromptTemplateConfig()
+        {
+            Template = HandlebarsTemplate,
+            TemplateFormat = "handlebars",
+            Name = "Vision_Chat_Prompt",
+        };
+        var function = kernel.CreateFunctionFromPrompt(promptTemplateConfig, templateFactory);
+
+        var arguments = new KernelArguments(new Dictionary<string, object?>
+        {
+            {"request","Describe this image:"},
+            {"imageData", "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAAXNSR0IArs4c6QAAACVJREFUKFNj/KTO/J+BCMA4iBUyQX1A0I10VAizCj1oMdyISyEAFoQbHwTcuS8AAAAASUVORK5CYII="}
+        });
+
+        var response = await kernel.InvokeAsync(function, arguments);
+        Console.WriteLine(response);
+
+        /*
+        Output:
+        The image is a solid block of bright red color. There are no additional features, shapes, or textures present.
+        */
+    }
+}
3 changes: 2 additions & 1 deletion dotnet/samples/Concepts/README.md
@@ -142,7 +142,8 @@ Down below you can find the code snippets that demonstrate the usage of many Sem
 - [MultiplePromptTemplates](https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/PromptTemplates/MultiplePromptTemplates.cs)
 - [PromptFunctionsWithChatGPT](https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/PromptTemplates/PromptFunctionsWithChatGPT.cs)
 - [TemplateLanguage](https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/PromptTemplates/TemplateLanguage.cs)
-- [PromptyFunction](https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/PromptYemplates/PromptyFunction.cs)
+- [PromptyFunction](https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/PromptTemplates/PromptyFunction.cs)
+- [HandlebarsVisionPrompts](https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/PromptTemplates/HandlebarsVisionPrompts.cs)
 
 ## RAG - Retrieval-Augmented Generation
 
@@ -75,7 +75,14 @@ private static ChatMessageContent ParseChatNode(PromptNode node)
         {
             if (childNode.TagName.Equals(ImageTagName, StringComparison.OrdinalIgnoreCase))
             {
-                items.Add(new ImageContent(new Uri(childNode.Content!)));
+                if (childNode.Content!.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
+                {
+                    items.Add(new ImageContent(childNode.Content));
+                }
+                else
+                {
+                    items.Add(new ImageContent(new Uri(childNode.Content!)));
+                }
             }
             else if (childNode.TagName.Equals(TextTagName, StringComparison.OrdinalIgnoreCase))
             {
@@ -114,6 +114,40 @@ public void ItReturnsChatHistoryWithValidContentItemsIncludeCData()
             """, c.Content));
     }
 
+    [Fact]
+    public void ItReturnsChatHistoryWithValidDataImageContent()
+    {
+        // Arrange
+        string prompt = GetValidPromptWithDataUriImageContent();
+
+        // Act
+        bool result = ChatPromptParser.TryParse(prompt, out var chatHistory);
+
+        // Assert
+        Assert.True(result);
+        Assert.NotNull(chatHistory);
+
+        Assert.Collection(chatHistory,
+            c => Assert.Equal("What can I help with?", c.Content),
+            c =>
+            {
+                Assert.Equal("Explain this image", c.Content);
+                Assert.Collection(c.Items,
+                    o =>
+                    {
+                        Assert.IsType<TextContent>(o);
+                        Assert.Equal("Explain this image", ((TextContent)o).Text);
+                    },
+                    o =>
+                    {
+                        Assert.IsType<ImageContent>(o);
+                        Assert.Equal("data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAAXNSR0IArs4c6QAAACVJREFUKFNj/KTO/J+BCMA4iBUyQX1A0I10VAizCj1oMdyISyEAFoQbHwTcuS8AAAAASUVORK5CYII=", ((ImageContent)o).DataUri);
+                        Assert.Equal("image/png", ((ImageContent)o).MimeType);
+                        Assert.NotNull(((ImageContent)o).Data);
+                    });
+            });
+    }
+
     [Fact]
     public void ItReturnsChatHistoryWithValidContentItemsIncludeCode()
     {
@@ -210,6 +244,21 @@ Second line.
             """;
     }
 
+    private static string GetValidPromptWithDataUriImageContent()
+    {
+        return
+            """
+
+            <message role="assistant">What can I help with?</message>
+
+            <message role='user'>
+                <text>Explain this image</text>
+                <image>data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAAXNSR0IArs4c6QAAACVJREFUKFNj/KTO/J+BCMA4iBUyQX1A0I10VAizCj1oMdyISyEAFoQbHwTcuS8AAAAASUVORK5CYII=</image>
+            </message>
+
+            """;
+    }
+
     private static string GetValidPromptWithCDataSection()
     {
         return