WIP

2026-03-02 18:36:39 +00:00 · 2026-02-14 12:26:47 +05:30
parent 7725f66e39
commit 8e8dd9d9f6
8 changed files with 531 additions and 81 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -0,0 +1,81 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/),
+and this project adheres to [Semantic Versioning](https://semver.org/).
+
+## [0.1.0] - 2025-07-15
+
+### Added
+
+- **Conversation history limits** — new `history` option with `maxMessages` (default 100)
+  and `maxTotalChars` (default unlimited) to prevent unbounded memory growth.
+  Oldest messages are trimmed in pairs to preserve user/assistant turn structure.
+  Emits `history_trimmed` event when messages are evicted.
+- **Audio input size validation** — new `maxAudioInputSize` option (default 10 MB).
+  Oversized or empty audio payloads are rejected early with an `error` or `warning` event
+  instead of being forwarded to the transcription model.
+- **Serial input queue** — `sendText()`, WebSocket `transcript` messages, and
+  transcribed audio are now queued and processed one at a time. This prevents
+  race conditions where concurrent calls could corrupt `conversationHistory` or
+  interleave streaming output.
+- **LLM stream cancellation** — an `AbortController` is now threaded into
+  `streamText()` via `abortSignal`. Barge-in, disconnect, and explicit
+  interrupts abort the LLM stream immediately (saving tokens) instead of only
+  cancelling TTS.
+- **`interruptCurrentResponse(reason)`** — new public method that aborts both
+  the LLM stream *and* ongoing speech in a single call. WebSocket barge-in
+  (`transcript` / `audio` / `interrupt` messages) now uses this instead of
+  `interruptSpeech()` alone.
+- **`destroy()`** — permanently tears down the agent, releasing the socket,
+  clearing history and tools, and removing all event listeners.
+  A `destroyed` getter is also exposed. Any subsequent method call throws.
+- **`history_trimmed` event** — emitted with `{ removedCount, reason }` when
+  the sliding window trims old messages.
+- **Input validation** — `sendText("")` now throws, and incoming WebSocket
+  `transcript` / `audio` messages are validated before processing.
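The pair-wise trimming described under **Conversation history limits** can be sketched as follows. This is a minimal illustration, not the agent's actual code: the `ChatMessage`, `HistoryOptions` and `TrimResult` shapes, the `trimHistory` helper, and the rule that trimming stops once two or fewer messages remain are all assumptions.

```typescript
// Hypothetical shapes; the agent's real message and option types may differ.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface HistoryOptions {
  maxMessages: number; // changelog default: 100
  maxTotalChars?: number; // undefined means unlimited (changelog default)
}

// Mirrors the `{ removedCount, reason }` payload of `history_trimmed`.
interface TrimResult {
  removedCount: number;
  reason: "maxMessages" | "maxTotalChars" | null;
}

function totalChars(history: ChatMessage[]): number {
  return history.reduce((sum, m) => sum + m.content.length, 0);
}

// Drops the oldest messages two at a time (one user/assistant turn), in place.
// Assumption: trimming stops once two or fewer messages remain.
function trimHistory(history: ChatMessage[], opts: HistoryOptions): TrimResult {
  let removedCount = 0;
  let reason: TrimResult["reason"] = null;

  while (history.length > opts.maxMessages && history.length > 2) {
    removedCount += history.splice(0, 2).length;
    reason = "maxMessages";
  }

  const limit = opts.maxTotalChars;
  while (limit !== undefined && totalChars(history) > limit && history.length > 2) {
    removedCount += history.splice(0, 2).length;
    reason = "maxTotalChars";
  }

  return { removedCount, reason };
}
```

In this sketch a caller would emit `history_trimmed` with the returned object only when `removedCount > 0`, so ordinary turns stay silent.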
+
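The **Serial input queue** entry, together with rejecting pending inputs on disconnect, amounts to a FIFO that runs one async task at a time. A minimal sketch follows; `SerialInputQueue`, `enqueue` and `rejectPending` are hypothetical names, not the agent's API.

```typescript
// A pending input: `run` never rejects; it settles the caller's promise instead.
interface PendingInput {
  run: () => Promise<void>;
  reject: (err: Error) => void;
}

// Hypothetical FIFO that processes async inputs strictly one at a time.
class SerialInputQueue {
  private pending: PendingInput[] = [];
  private running = false;

  // Queue a task; the returned promise settles with the task's own result.
  enqueue<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      this.pending.push({
        run: async () => {
          try {
            resolve(await task());
          } catch (err) {
            reject(err);
          }
        },
        reject,
      });
      void this.drain();
    });
  }

  // Reject every input that has not started yet (e.g. on disconnect).
  // A task that is already running is left to finish or be aborted separately.
  rejectPending(reason: string): void {
    for (const input of this.pending.splice(0)) {
      input.reject(new Error(reason));
    }
  }

  // Only one drain loop is active at a time, so tasks never overlap.
  private async drain(): Promise<void> {
    if (this.running) return;
    this.running = true;
    try {
      while (this.pending.length > 0) {
        const next = this.pending.shift()!;
        await next.run();
      }
    } finally {
      this.running = false;
    }
  }
}
```

Because each input awaits the previous one, two concurrent `sendText()` calls in this model can no longer append to the conversation history in interleaved order.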
+### Changed
+
+- **`disconnect()` is now a full cleanup** — aborts in-flight LLM and TTS
+  streams, clears the speech queue, rejects pending queued inputs, and removes
+  socket listeners before closing. Previously it only called `socket.close()`.
+- **`connect()` and `handleSocket()` are idempotent** — calling either when a
+  socket is already attached will cleanly tear down the old connection first
+  instead of leaking it.
+- **`sendWebSocketMessage()` is resilient** — checks `socket.readyState` and
+  wraps `send()` in a try/catch so a socket that closes mid-send does not throw
+  an unhandled exception.
+- **Speech queue completion uses a promise** — `processUserInput` now awaits a
+  `speechQueueDonePromise` instead of busy-wait polling
+  (`while (queue.length) { await sleep(100) }`), reducing CPU waste and
+  eliminating a race window.
+- **`interruptSpeech()` resolves the speech-done promise** — so
+  `processUserInput` can proceed immediately after a barge-in instead of
+  potentially hanging.
+- **WebSocket message handler uses `if/else if`** — prevents a single message
+  from accidentally matching multiple type branches.
+- **Chunk ID wraps at `Number.MAX_SAFE_INTEGER`** — avoids unbounded counter
+  growth in very long-running sessions.
+- **`processUserInput` catch block cleans up speech state** — on stream error
+  the pending text buffer is cleared and any in-progress speech is interrupted,
+  so the agent does not get stuck in a broken state.
+- **WebSocket close handler calls `cleanupOnDisconnect()`** — aborts LLM + TTS,
+  clears queues, and rejects pending input promises.
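The speech-queue entries describe replacing a polling loop with a promise that resolves when the queue drains, or immediately on barge-in. A minimal sketch under stated assumptions: `SpeechQueue`, `whenDone()`, `interrupt()` and the injected `SpeakFn` are illustrative names, and TTS playback is modelled as one abortable async call per chunk.

```typescript
// Hypothetical stand-in for synthesizing and playing one text chunk.
type SpeakFn = (text: string, signal: AbortSignal) => Promise<void>;

class SpeechQueue {
  private queue: string[] = [];
  private processing = false;
  private controller: AbortController | null = null;
  // Resolves when the queue is empty and nothing is playing.
  private done: Promise<void> = Promise.resolve();
  private resolveDone: (() => void) | null = null;
  private readonly speak: SpeakFn;

  constructor(speak: SpeakFn) {
    this.speak = speak;
  }

  enqueue(text: string): void {
    this.queue.push(text);
    // Start a fresh "done" promise if the previous one has already settled.
    if (this.resolveDone === null) {
      this.done = new Promise<void>((resolve) => {
        this.resolveDone = () => resolve();
      });
    }
    void this.process();
  }

  // Awaited instead of `while (queue.length) { await sleep(100) }`.
  whenDone(): Promise<void> {
    return this.done;
  }

  // Barge-in: drop queued chunks, abort the current one, release waiters now.
  interrupt(): void {
    this.queue.length = 0;
    this.controller?.abort();
    this.settle();
  }

  private settle(): void {
    const resolve = this.resolveDone;
    this.resolveDone = null;
    resolve?.();
  }

  private async process(): Promise<void> {
    if (this.processing) return;
    this.processing = true;
    try {
      while (this.queue.length > 0) {
        const text = this.queue.shift()!;
        this.controller = new AbortController();
        try {
          await this.speak(text, this.controller.signal);
        } catch {
          // An aborted or failed chunk is skipped; the loop re-checks the queue.
        }
      }
    } finally {
      this.processing = false;
      this.controller = null;
      this.settle();
    }
  }
}
```

Resolving the promise inside `interrupt()` is what lets a waiting caller continue right after a barge-in, rather than waiting for the aborted playback loop to unwind.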
+
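The **`sendWebSocketMessage()` is resilient** entry boils down to a `readyState` check plus a try/catch around `send()`. The sketch below assumes a minimal `SocketLike` interface and a boolean return value reporting whether the message went out; both are illustrative, and the agent's real method may simply return nothing.

```typescript
// Only the parts of a WebSocket this helper touches. Both the browser
// WebSocket and the `ws` package expose `readyState` and `send()`.
interface SocketLike {
  readyState: number;
  send(data: string): void;
}

// WebSocket.OPEN is 1 in the WHATWG WebSocket API and in `ws`.
const OPEN = 1;

// Returns true if the message was handed to the socket, false otherwise.
function sendWebSocketMessage(
  socket: SocketLike | null,
  message: Record<string, unknown>,
): boolean {
  if (socket === null || socket.readyState !== OPEN) {
    return false; // not connected, or CLOSING/CLOSED: skip instead of throwing
  }
  try {
    socket.send(JSON.stringify(message));
    return true;
  } catch {
    return false; // socket closed between the check and the send
  }
}
```

The try/catch covers the window the `readyState` check cannot: the socket can still close after the check passes but before `send()` runs.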
+### Fixed
+
+- Typo in JSDoc: `"Process text deltra"` → `"Process text delta"`.
+
+## [0.0.1] - 2025-07-14
+
+### Added
+
+- Initial release.
+- Streaming text generation via AI SDK `streamText`.
+- Multi-step tool calling with `stopWhen`.
+- Chunked streaming TTS with parallel generation and barge-in support.
+- Audio transcription via AI SDK `experimental_transcribe`.
+- WebSocket transport with full stream/tool/speech lifecycle events.
+- Browser voice client example (`example/voice-client.html`).