Enhancing OpenSearch With AG-UI Protocol Support

Hey there, data enthusiasts! Ready to dive into something super cool? This article is all about bringing the AG-UI protocol to OpenSearch ml-commons. In simple terms, we're making it easier for AI agents in OpenSearch to chat with you in real-time on your website or app. This is a game-changer for building awesome, interactive experiences. Let's dig in and see how we're making it happen!

1. Introduction: What's the Buzz About AG-UI?

So, what's AG-UI? It stands for Agent-User Interaction Protocol, and it's all about making sure AI agents and your users can talk to each other smoothly. It's an open, lightweight protocol that sets the rules for how these agents can connect to your apps. By adding AG-UI support to OpenSearch ml-commons, we're opening the door for seamless, real-time convos between your frontends and OpenSearch's smart agents. Imagine having agents right in your app, using cool tools, and streaming interactions like a pro. That's the power of AG-UI!

Understanding AG-UI's Role:

  • MCP (Model Context Protocol): Gives agents the power of backend tools.
  • A2A (Agent-to-Agent Protocol): Lets agents chat with each other.
  • AG-UI: Brings agents into your apps and helps them use frontend tools.

Our plan is to make OpenSearch ml-commons agents play nicely with any app that supports AG-UI. Pretty neat, huh?

Why is AG-UI important?

Well, it's like this: AG-UI lets AI agents show up inside your app and interact with users in real time. Along the way, the agents can call on various tools; for example, if a user asks about a product, the agent can use a search tool to find it. This makes your apps more dynamic and helpful. So, by adding AG-UI support, OpenSearch agents will be able to do more, and do it directly inside your apps.

2. Motivation: Why Are We Doing This?

Let's be real, the main reason we're doing this is for website chatbot experiences. Think about it: AI agents are perfect for creating interactive chatbots on websites and apps. With AG-UI, OpenSearch ml-commons agents can power these chatbots directly. This means you can build super smart conversational interfaces that use OpenSearch's search, knowledge retrieval, and ML magic. All while keeping that real-time, streaming feel that users love.

The Problem We're Solving

Right now, OpenSearch ml-commons agents only expose their own native APIs, not the AG-UI protocol that AG-UI-compatible apps expect. This causes some headaches:

  • No standardized events: No smooth event system that follows AG-UI's rules.
  • Backend only: Tools only work on the backend, missing out on AG-UI's cool frontend tools.
  • Compatibility issues: Frontend developers can't easily use AG-UI apps with OpenSearch.

How AG-UI Solves These Problems

Implementing AG-UI solves these pain points by offering:

  • Protocol Compliance: Full compatibility with AG-UI's protocol.
  • Hybrid Tool Architecture: Support for both backend and frontend tools.
  • Ecosystem Integration: Seamless compatibility with existing AG-UI frameworks.
  • Standardized Formats: AG-UI compatible input/output formats.

In a nutshell, we want to make it super easy for developers to build interactive experiences with OpenSearch and AG-UI.

3. Proposed Solution: How We'll Make It Happen

We're proposing three key enhancements to make OpenSearch ml-commons AG-UI compatible:

3.1. AG-UI Protocol Input Support: Speaking the Right Language

First up, we'll make sure OpenSearch ml-commons understands AG-UI. This means we'll automatically detect and convert AG-UI requests into a format that ml-commons can work with. This way, any app using AG-UI can talk directly to your OpenSearch agents.

// Example AG-UI Input Format
{
  "threadId": "thread_abc123",
  "runId": "run_def456",
  "messages": [
    {"role": "user", "content": "Search for documents"},
    {"role": "assistant", "toolCalls": [...]},
    {"role": "tool", "content": "...", "toolCallId": "call_123"}
  ],
  "tools": [...],
  "context": []
}

3.2. Frontend Integration via AG-UI Client SDK: Making it Easy for Developers

Next, frontends will be able to connect through the AG-UI client SDK, backed by a Server-Sent Events (SSE) streaming endpoint on the OpenSearch side. This gives you smooth, real-time streaming of events, so you can build frontend apps without a hitch. Less plumbing, more time for creativity!

// Frontend Integration Example using HttpAgent
import { HttpAgent } from '@ag-ui/client';

const agent = new HttpAgent({
  url: 'https://your-opensearch-cluster/_plugins/_ml/agents/agent_123/_execute/stream'
});
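
If you want to see what's actually happening on the wire, or you're not using the SDK, here's a minimal TypeScript sketch that reads the SSE stream directly with fetch. Treat it as illustrative: the endpoint path and agent ID are the placeholders from the example above, the request body mirrors the AG-UI input format from section 3.1, and the exact event payloads are defined by the AG-UI spec.

// Minimal sketch: consume the agent's SSE stream with plain fetch.
// Assumptions: the /_execute/stream endpoint accepts a POST with an AG-UI
// request body and emits standard SSE frames ("data: {...}" separated by a
// blank line). URL, agent ID, and credentials are placeholders.
async function streamAgentRun(): Promise<void> {
  const response = await fetch(
    'https://your-opensearch-cluster/_plugins/_ml/agents/agent_123/_execute/stream',
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        threadId: 'thread_abc123',
        runId: 'run_def456',
        messages: [{ role: 'user', content: 'Search for documents' }],
        tools: [],
        context: [],
      }),
    }
  );

  if (!response.ok || !response.body) {
    throw new Error(`Streaming request failed: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE frames are separated by a blank line; each frame carries "data:" lines.
    const frames = buffer.split('\n\n');
    buffer = frames.pop() ?? '';
    for (const frame of frames) {
      const payload = frame
        .split('\n')
        .filter((line) => line.startsWith('data:'))
        .map((line) => line.slice('data:'.length).trim())
        .join('\n');
      if (!payload) continue;
      const event = JSON.parse(payload);
      // Handle AG-UI events here, e.g. append TEXT_MESSAGE_CONTENT deltas to the chat UI.
      console.log(event.type, event);
    }
  }
}

streamAgentRun().catch(console.error);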

3.3. AG-UI Tool Integration Model: A Hybrid Approach

Finally, we'll implement AG-UI's hybrid tool execution model. This means the agents can use both backend tools (OpenSearch's search capabilities) and frontend tools (interactive elements in your app). We're integrating this with the ReAct (Reasoning and Acting) loop.

graph TD
    A[Initial AG-UI Request] --> B[ReAct Loop 1]
    B --> C{Tool Selection}
    C -->|Backend Tool| D[Execute in OpenSearch]
    C -->|Frontend Tool| E[End ReAct Loop 1]
    
    D --> F[Tool Results]
    F --> G[Continue ReAct Loop 1]
    G --> H[Next Iteration]
    H --> I[Final Answer]
    I --> J[Stream Events]
    
    E --> K[Return Tool Call]
    K --> L[User Executes Tool]
    L --> M[New Request with Results]
    M --> N[ReAct Loop 2]
    N --> O[Continue Reasoning]
    O --> P[Final Answer]
    P --> Q[Stream Events]

This architecture lets a single agent run pause when a frontend tool is needed, hand control to the app, and then resume reasoning once the tool results come back.
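
To make the "frontend tool" half of the diagram concrete, here's what a frontend tool definition in the request's tools array could look like. This is a sketch, not the canonical schema: the highlightProduct tool is made up for illustration, and the name/description/JSON-Schema-parameters shape follows the common tool-calling convention, so double-check the AG-UI spec for the exact fields.

// Illustrative frontend tool passed in the AG-UI request's "tools" array.
// "highlightProduct" is a hypothetical tool; the name/description/parameters
// (JSON Schema) shape is a common tool-calling convention, so verify the
// exact field names against the AG-UI spec.
const frontendTools = [
  {
    name: 'highlightProduct',
    description: 'Scroll to and highlight a product card in the current page',
    parameters: {
      type: 'object',
      properties: {
        productId: {
          type: 'string',
          description: 'ID of the product card to highlight',
        },
      },
      required: ['productId'],
    },
  },
];

When the LLM picks a tool like this, the ReAct loop pauses and the tool call is returned to the app; the app runs it, then sends a new request with the result so the loop can resume (ReAct Loop 2 in the diagram).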

4. Technical Design: The Nitty-Gritty

Alright, let's get into the technical details and explore the architecture.

4.1. Core Components Architecture: Building Blocks of the System

Here are the key components that will make this all possible. This is the foundation upon which everything else is built:

Input Processing Layer
├── AGUIInputConverter - Format detection and conversion
├── AGUIConstants - Centralized constants and field definitions
└── Validation - Input structure and content validation

Agent Execution Layer
├── MLAGUIAgentRunner - AGUI-specific agent processing
├── MLAgentExecutor - Routing and agent selection
└── Context Processing - Chat history and context extraction

Streaming Event System
├── BaseEvent - Abstract event foundation
├── Event Types - 13+ specific event implementations
├── AGUIStreamingEventManager - Lifecycle and state management
└── REST Integration - SSE streaming endpoint handling

Tool Integration System
├── AGUIFrontendTool - Frontend tool representation
├── Tool Coordination - Backend/frontend tool routing
├── Function Calling - Multi-LLM interface support
└── Result Aggregation - Tool result consolidation

This structure helps keep everything organized and efficient.

4.2. AG-UI Input Processing Implementation: How It All Starts

The AGUIInputConverter is the key here. It's responsible for:

  • Format Detection: Checking whether a request uses the AG-UI format (see the sketch after this list).
  • Parameter Mapping: Mapping AG-UI fields onto ml-commons internal parameters.
  • Tool Result Extraction: Processing user messages that carry tool results.
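
To make the detection step concrete, here's a rough TypeScript sketch of the kind of check a converter can perform, keyed off the fields in the example input from section 3.1. The real AGUIInputConverter lives in Java inside ml-commons, and its exact heuristics may differ.

// Rough sketch of AG-UI format detection: treat a request as AG-UI if it
// carries the protocol's identifying fields (threadId, runId, messages[]).
// The actual Java AGUIInputConverter.isAGUIInput() may check more than this.
function looksLikeAGUIInput(body: unknown): boolean {
  if (typeof body !== 'object' || body === null) return false;
  const request = body as Record<string, unknown>;
  return (
    typeof request.threadId === 'string' &&
    typeof request.runId === 'string' &&
    Array.isArray(request.messages)
  );
}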

4.3. Event System Implementation: Keeping the Conversation Flowing

  • AG-UI Events: Based on the protocol's event types (see the example stream after this list).
      • Run Events: RUN_STARTED, RUN_FINISHED, RUN_ERROR
      • Text Events: TEXT_MESSAGE_START, TEXT_MESSAGE_CONTENT, TEXT_MESSAGE_END
      • Tool Events: TOOL_CALL_START, TOOL_CALL_ARGS, TOOL_CALL_END, TOOL_CALL_RESULT
      • State Events: MESSAGES_SNAPSHOT
  • AGUIStreamingEventManager: Handles event lifecycle state and automatic cleanup.
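
Here's roughly what a client could see for a simple text answer, shown as parsed event objects in TypeScript. The event type names come from the list above; the payload fields (threadId, runId, messageId, delta) are illustrative, so check the AG-UI spec for the authoritative shapes.

// Illustrative event sequence for a run that just streams a text answer.
// Field names beyond "type" are assumptions; consult the AG-UI spec.
const exampleEventStream = [
  { type: 'RUN_STARTED', threadId: 'thread_abc123', runId: 'run_def456' },
  { type: 'TEXT_MESSAGE_START', messageId: 'msg_1', role: 'assistant' },
  { type: 'TEXT_MESSAGE_CONTENT', messageId: 'msg_1', delta: 'Here are the documents I found: ' },
  { type: 'TEXT_MESSAGE_CONTENT', messageId: 'msg_1', delta: '...' },
  { type: 'TEXT_MESSAGE_END', messageId: 'msg_1' },
  { type: 'RUN_FINISHED', threadId: 'thread_abc123', runId: 'run_def456' },
];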

4.4. Agent Execution Flow: How the Agent Works

This is how an agent will process the information, step by step:

// Agent type routing based on input format
if (AGUIInputConverter.isAGUIInput(inputJson)) {
    return new MLAGUIAgentRunner(client, settings, clusterService, ...);
} else {
    return new MLChatAgentRunner(client, settings, clusterService, ...);
}

// MLAGUIAgentRunner Processing
@Override
public void run(MLAgent mlAgent, Map<String, String> params, ActionListener<Object> listener, TransportChannel channel) {
    // 1. Process AG-UI messages into chat history
    processAGUIMessages(mlAgent, params, llmInterface);

    // 2. Process AG-UI context into ml-commons format
    processAGUIContext(mlAgent, params);

    // 3. Delegate to MLChatAgentRunner for actual execution
    MLAgentRunner conversationalRunner = new MLChatAgentRunner(...);
    conversationalRunner.run(mlAgent, params, listener, channel);
}

Key steps include:

  • Tool Call Detection: Finding assistant messages that have toolCalls.
  • Tool Result Processing: Handling tool role messages that contain toolCallId.
  • Chat History Generation: Making a clean history by filtering out tool messages.
  • Context Transformation: Converting AG-UI context parameters.
  • LLM Format Conversion: Using FunctionCalling.formatAGUIToolCalls().

4.5. Frontend Tool Execution and ReAct Loop Integration: Bridging the Gap

We're integrating frontend tool execution with the ReAct loop for hybrid tool use:

1. ReAct Loop Tool Selection Phase:

// Inside runReAct method - when LLM selects a tool to execute
if (tools.containsKey(action)) {
    // Determine if tool is backend or frontend
    boolean isBackendTool = backendTools != null && backendTools.containsKey(action);
    boolean isFrontendTool = !isBackendTool;

    if (isFrontendTool) {
        // PAUSE REACT LOOP: Create tool delegation response for frontend
        ModelTensorOutput frontendToolResponse = createFrontendToolCallResponse(toolCallId, action, actionInput);
        listener.onResponse(frontendToolResponse); // Return control to frontend
        return; // Exit ReAct loop - frontend will execute tool and send results back
    } else {
        // CONTINUE REACT LOOP: Execute backend tool normally
        runTool(tools, toolSpecMap, tmpParameters, nextStepListener, action, actionInput, toolParams, interactions, toolCallId, functionCalling);
    }
}

2. Tool Result Processing and ReAct Resumption:

// processAGUIToolResults() - when frontend tool results return
private void processAGUIToolResults(..., String aguiToolCallResults) {
    // 1. Parse frontend tool execution results
    List<Map<String, String>> toolResults = gson.fromJson(aguiToolCallResults, listType);

    // 2. Convert to LLM message format using FunctionCalling
    List<LLMMessage> llmMessages = functionCalling.supply(formattedResults);

    // 3. Reconstruct conversation context
    List<String> interactions = new ArrayList<>();
    // Add original assistant message with tool_calls
    interactions.addAll(assistantMessages);
    // Add tool result messages
    for (LLMMessage llmMessage : llmMessages) {
        interactions.add(llmMessage.getResponse());
    }

    // 4. RESUME REACT LOOP: Continue with tool results integrated
    processUnifiedTools(mlAgent, updatedParams, listener, memory, sessionId, functionCalling, frontendTools);
}

Tool Visibility Strategy:

// Unified tool approach - both frontend and backend tools visible to LLM
Map<String, Tool> unifiedToolsMap = new HashMap<>(backendToolsMap);
unifiedToolsMap.putAll(wrapFrontendToolsAsToolObjects(frontendTools));

// LLM sees all tools but execution is differentiated at runtime
runReAct(llm, unifiedToolsMap, toolSpecMap, params, memory, sessionId, tenantId, listener, functionCalling, backendToolsMap);

This setup lets the LLM coordinate backend and frontend tools in the same conversation.

4.6. AG-UI Agent Type: Inside the Architecture

Let's break down how the AG-UI agent type works.

// AG-UI Agent Execution Flow
public void run(MLAgent mlAgent, Map<String, String> params, ActionListener<Object> listener, TransportChannel channel) {
    // 1. AG-UI Protocol Processing
    processAGUIMessages(mlAgent, params, llmInterface);    // Convert AG-UI messages to chat history
    processAGUIContext(mlAgent, params);                   // Extract and format contextual information

    // 2. Delegate to Standard Conversational Runner
    MLAgentRunner conversationalRunner = new MLChatAgentRunner(...);
    conversationalRunner.run(mlAgent, params, listener, channel);

    // 3. Streaming events are generated in RestMLExecuteStreamAction
}

Key steps:

  1. Message Array Processing: Converts AG-UI messages into chat history format.
  2. Tool Call Extraction: Identifies and processes tool calls for the LLM.
  3. Tool Result Integration: Handles frontend tool results.
  4. Context Transformation: Converts AG-UI context arrays.
  5. Chat History Generation: Creates the chat history passed to the conversational runner.

Comparison with Existing Conversational Agent:

Aspect | Conversational Agent (MLChatAgentRunner) | AG-UI Agent (MLAGUIAgentRunner)
Input Format | ml-commons native format with question parameter | AG-UI protocol format with message arrays, threadId, runId
Message Handling | Single question + optional chat history parameter | Message array processing with role-based conversation flow
Tool Execution | Backend tools only | Hybrid: backend tools + frontend tool delegation
Tool Result Processing | Direct tool execution results | Frontend tool results via AG-UI message format
Streaming Output | ml-commons response format | AG-UI event stream (RUN_STARTED, TEXT_MESSAGE_CONTENT, etc.)
LLM Integration | Direct LLM interface calls | AG-UI tool call formatting + standard LLM integration

Conclusion: What's Next?

So, what does all this mean? We're taking a big step toward making OpenSearch even more powerful and user-friendly. By adding AG-UI protocol support, we're opening up exciting possibilities for building engaging, real-time conversational experiences. It's all about making OpenSearch agents more versatile and easier to integrate into your applications. This work will also make OpenSearch a great choice for developers looking to build cutting-edge chatbot and conversational AI solutions. Stay tuned for more updates and, as always, happy coding!