feat: enhance README with VoiceAgent usage example and configuration options

2026-03-02 18:36:39 +00:00 · 2026-02-13 17:36:18 +05:30
parent 4bda956530
commit 7725f66e39
1 changed files with 77 additions and 5 deletions
--- a/README.md
+++ b/README.md
@@ -29,7 +29,60 @@ Streaming voice/text agent SDK built on AI SDK with optional WebSocket transport

 `VOICE_WS_ENDPOINT` is optional for text-only usage.

-## VoiceAgent configuration
+## VoiceAgent usage (as in the demo)
+
+Minimal end-to-end example using AI SDK tools, streaming text, and streaming TTS:
+
+```ts
+import "dotenv/config";
+import { VoiceAgent } from "./src";
+import { tool } from "ai";
+import { z } from "zod";
+import { openai } from "@ai-sdk/openai";
+
+const weatherTool = tool({
+   description: "Get the weather in a location",
+   inputSchema: z.object({ location: z.string() }),
+   execute: async ({ location }) => ({ location, temperature: 72, conditions: "sunny" }),
+});
+
+const agent = new VoiceAgent({
+   model: openai("gpt-4o"),
+   transcriptionModel: openai.transcription("whisper-1"),
+   speechModel: openai.speech("gpt-4o-mini-tts"),
+   instructions: "You are a helpful voice assistant.",
+   voice: "alloy",
+   speechInstructions: "Speak in a friendly, natural conversational tone.",
+   outputFormat: "mp3",
+   streamingSpeech: {
+      minChunkSize: 40,
+      maxChunkSize: 180,
+      parallelGeneration: true,
+      maxParallelRequests: 2,
+   },
+   endpoint: process.env.VOICE_WS_ENDPOINT,
+   tools: { getWeather: weatherTool },
+});
+
+agent.on("text", ({ role, text }) => {
+   const prefix = role === "user" ? "👤" : "🤖";
+   console.log(prefix, text);
+});
+
+agent.on("chunk:text_delta", ({ text }) => process.stdout.write(text));
+agent.on("speech_start", ({ streaming }) => console.log("speech_start", streaming));
+agent.on("audio_chunk", ({ chunkId, format, uint8Array }) => {
+   console.log("audio_chunk", chunkId, format, uint8Array.length);
+});
+
+await agent.sendText("What's the weather in San Francisco?");
+
+if (process.env.VOICE_WS_ENDPOINT) {
+   await agent.connect(process.env.VOICE_WS_ENDPOINT);
+}
+```
+
+### Configuration options

 The agent accepts:

@@ -49,6 +102,25 @@ The agent accepts:
    - `parallelGeneration`
    - `maxParallelRequests`

+### Common methods
+
+- `sendText(text)` – process text input (streamed response)
+- `sendAudio(base64Audio)` – process base64 audio input
+- `sendAudioBuffer(buffer)` – process raw audio buffer input
+- `transcribeAudio(buffer)` – transcribe audio directly
+- `generateAndSendSpeechFull(text)` – non-streaming TTS fallback
+- `interruptSpeech(reason)` – interrupt streaming speech (barge‑in)
+- `connect(url?)` / `handleSocket(ws)` – WebSocket usage
+
+### Key events (from demo)
+
+- `text` – user/assistant messages
+- `chunk:text_delta` – streaming text deltas
+- `chunk:tool_call` / `tool_result` – tool lifecycle
+- `speech_start` / `speech_complete` / `speech_interrupted`
+- `speech_chunk_queued` / `audio_chunk` / `audio`
+- `connected` / `disconnected`
+
 ## Run (text-only check)

 This validates LLM + tool + streaming speech without requiring WebSocket: