mirror of
https://github.com/Bijit-Mondal/VoiceAgent.git
synced 2026-03-02 18:36:39 +00:00
WIP
81
CHANGELOG.md
Normal file
@@ -0,0 +1,81 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/),
and this project adheres to [Semantic Versioning](https://semver.org/).

## [0.1.0] - 2025-07-15

### Added

- **Conversation history limits**: new `history` option with `maxMessages` (default 100)
  and `maxTotalChars` (default unlimited) to prevent unbounded memory growth.
  Oldest messages are trimmed in pairs to preserve user/assistant turn structure.
  Emits a `history_trimmed` event when messages are evicted.
- **Audio input size validation**: new `maxAudioInputSize` option (default 10 MB).
  Oversized or empty audio payloads are rejected early with an `error` / `warning` event
  instead of being forwarded to the transcription model.
- **Serial input queue**: `sendText()`, WebSocket `transcript` messages, and
  transcribed audio are now queued and processed one at a time. This prevents
  race conditions where concurrent calls could corrupt `conversationHistory` or
  interleave streaming output.
- **LLM stream cancellation**: an `AbortController` is now threaded into
  `streamText()` via `abortSignal`. Barge-in, disconnect, and explicit
  interrupts abort the LLM stream immediately (saving tokens) instead of only
  cancelling TTS.
- **`interruptCurrentResponse(reason)`**: new public method that aborts both
  the LLM stream *and* ongoing speech in a single call. WebSocket barge-in
  (`transcript` / `audio` / `interrupt` messages) now uses this instead of
  `interruptSpeech()` alone.
- **`destroy()`**: permanently tears down the agent, releasing the socket,
  clearing history and tools, and removing all event listeners.
  A `destroyed` getter is also exposed. Any method call after `destroy()` throws.
- **`history_trimmed` event**: emitted with `{ removedCount, reason }` when
  the sliding window trims old messages.
- **Input validation**: `sendText("")` now throws, and incoming WebSocket
  `transcript` / `audio` messages are validated before processing.

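
The pair-wise history trimming described above can be sketched roughly as follows. This is an illustration only, not the project's implementation: `trimHistory`, `ChatMessage`, `HistoryLimits`, and the `reason` strings are hypothetical names, `0` is assumed to mean "unlimited", and the sketch assumes the most recent pair is always kept.

```typescript
// Hypothetical shapes; the project's real message and option types may differ.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface HistoryLimits {
  maxMessages: number;   // assumption: 0 means unlimited
  maxTotalChars: number; // assumption: 0 means unlimited
}

interface TrimResult {
  removedCount: number;
  reason: "max_messages" | "max_total_chars" | null; // hypothetical reason values
}

function totalChars(history: ChatMessage[]): number {
  return history.reduce((sum, m) => sum + m.content.length, 0);
}

// Trims the oldest messages in place, two at a time, until both limits hold.
// Assumption: the most recent pair is never evicted.
function trimHistory(history: ChatMessage[], limits: HistoryLimits): TrimResult {
  let removedCount = 0;
  let reason: TrimResult["reason"] = null;

  while (history.length > 2) {
    const overCount = limits.maxMessages > 0 && history.length > limits.maxMessages;
    const overChars = limits.maxTotalChars > 0 && totalChars(history) > limits.maxTotalChars;
    if (!overCount && !overChars) break;

    reason = overCount ? "max_messages" : "max_total_chars";
    // Drop one user/assistant pair so the history never starts mid-turn.
    removedCount += history.splice(0, 2).length;
  }

  return { removedCount, reason };
}
```

A caller could emit `history_trimmed` with the returned `{ removedCount, reason }` whenever `removedCount` is greater than zero.
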
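
One common way to process inputs one at a time, as the serial input queue entry describes, is to chain each task onto a single promise. The sketch below assumes that approach; `SerialQueue`, `sleep`, and `demoSerialQueue` are hypothetical names, and the agent's actual queue may be built differently.

```typescript
// A minimal serial queue: tasks run strictly one at a time, in arrival order.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    // Chain onto the tail so this task starts only after earlier tasks settle.
    const result = this.tail.then(task);
    // Swallow errors on the tail so one failed task does not block later ones.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Two inputs arrive back to back; the second waits for the first to finish
// even though it would complete sooner on its own.
async function demoSerialQueue(): Promise<string[]> {
  const queue = new SerialQueue();
  const log: string[] = [];
  const handle = (text: string, ms: number) => async () => {
    log.push(`start ${text}`);
    await sleep(ms);
    log.push(`end ${text}`);
  };
  await Promise.all([
    queue.enqueue(handle("first", 20)),
    queue.enqueue(handle("second", 5)),
  ]);
  return log; // ["start first", "end first", "start second", "end second"]
}
```

Because each caller still receives its own promise from `enqueue`, a queue like this can also reject pending entries individually during cleanup.
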
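
The interplay between stream cancellation and `interruptCurrentResponse(reason)` might look roughly like the sketch below. Only `AbortController`, `interruptSpeech()`, and `interruptCurrentResponse(reason)` come from the entries above; `ResponseController`, `beginResponse`, `queueSpeech`, and `consumeTokens` are hypothetical, and the token loop is a stand-in for the real `streamText()` call, which would receive the signal through its `abortSignal` option.

```typescript
class ResponseController {
  private abortController: AbortController | null = null;
  private speechQueue: string[] = [];
  lastInterruptReason: string | null = null;

  // Called at the start of each response; the returned signal would be handed
  // to the streaming call.
  beginResponse(): AbortSignal {
    this.abortController = new AbortController();
    return this.abortController.signal;
  }

  queueSpeech(chunk: string): void {
    this.speechQueue.push(chunk);
  }

  // Stops speech only; the LLM stream keeps running.
  interruptSpeech(): void {
    this.speechQueue.length = 0;
  }

  // Stops both the LLM stream and any pending speech in one call.
  interruptCurrentResponse(reason: string): void {
    this.lastInterruptReason = reason;
    this.abortController?.abort();
    this.abortController = null;
    this.interruptSpeech();
  }

  get pendingSpeech(): number {
    return this.speechQueue.length;
  }
}

// Stand-in for a streaming consumer: stops as soon as the signal is aborted.
function consumeTokens(
  tokens: string[],
  signal: AbortSignal,
  onToken: (token: string) => void,
): string[] {
  const delivered: string[] = [];
  for (const token of tokens) {
    if (signal.aborted) break;
    delivered.push(token);
    onToken(token);
  }
  return delivered;
}
```

Aborting the signal is what stops further tokens from being requested; clearing the speech queue alone would leave the stream running in the background.
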
### Changed

- **`disconnect()` is now a full cleanup**: aborts in-flight LLM and TTS
  streams, clears the speech queue, rejects pending queued inputs, and removes
  socket listeners before closing. Previously it only called `socket.close()`.
- **`connect()` and `handleSocket()` are idempotent**: calling either when a
  socket is already attached will cleanly tear down the old connection first
  instead of leaking it.
- **`sendWebSocketMessage()` is resilient**: checks `socket.readyState` and
  wraps `send()` in a try/catch so a socket that closes mid-send does not throw
  an unhandled exception.
- **Speech queue completion uses a promise**: `processUserInput` now awaits a
  `speechQueueDonePromise` instead of busy-wait polling
  (`while (queue.length) { await sleep(100) }`), reducing CPU waste and
  eliminating a race window.
- **`interruptSpeech()` resolves the speech-done promise**, so
  `processUserInput` can proceed immediately after a barge-in instead of
  potentially hanging.
- **WebSocket message handler uses `if/else if`**: prevents a single message
  from accidentally matching multiple type branches.
- **Chunk ID wraps at `Number.MAX_SAFE_INTEGER`**: avoids unbounded counter
  growth in very long-running sessions.
- **`processUserInput` catch block cleans up speech state**: on a stream error,
  the pending text buffer is cleared and any in-progress speech is interrupted,
  so the agent does not get stuck in a broken state.
- **WebSocket close handler calls `cleanupOnDisconnect()`**: aborts LLM + TTS,
  clears queues, and rejects pending input promises.

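
The move from polling to a completion promise, including the barge-in case where `interruptSpeech()` releases the waiter, can be illustrated as below. This is a sketch under assumptions: `SpeechQueue`, `finishChunk`, `interrupt`, and `whenDone` are hypothetical names standing in for the agent's internal speech queue and its `speechQueueDonePromise`.

```typescript
class SpeechQueue {
  private queue: string[] = [];
  private donePromise: Promise<void> | null = null;
  private resolveDone: (() => void) | null = null;

  enqueue(chunk: string): void {
    this.queue.push(chunk);
    if (!this.donePromise) {
      // Create the promise lazily, once there is something to wait for.
      this.donePromise = new Promise<void>((resolve) => {
        this.resolveDone = () => resolve();
      });
    }
  }

  // Called by the playback loop after each chunk has been spoken.
  finishChunk(): void {
    this.queue.shift();
    if (this.queue.length === 0) this.settle();
  }

  // Barge-in: drop everything and release any waiter immediately.
  interrupt(): void {
    this.queue.length = 0;
    this.settle();
  }

  // Resolves when the queue drains or is interrupted; no polling involved.
  whenDone(): Promise<void> {
    return this.donePromise ?? Promise.resolve();
  }

  get length(): number {
    return this.queue.length;
  }

  private settle(): void {
    const resolve = this.resolveDone;
    this.donePromise = null;
    this.resolveDone = null;
    if (resolve) resolve();
  }
}
```

A caller replaces the polling loop with a single `await queue.whenDone()`. If `interrupt()` did not settle the promise, that await would wait forever after a barge-in.
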
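
The resilient-send behaviour described for `sendWebSocketMessage()` follows a familiar pattern: check `readyState`, then guard `send()`. A minimal sketch, assuming JSON-encoded messages: `safeSend` and `SocketLike` are hypothetical names, and the boolean return value is an addition for illustration.

```typescript
// Minimal structural type covering the parts of a WebSocket this sketch touches.
interface SocketLike {
  readyState: number;
  send(data: string): void;
}

// Numeric value of WebSocket.OPEN in the WHATWG WebSocket API.
const WS_OPEN = 1;

// Returns true if the message was handed to the socket, false otherwise.
function safeSend(socket: SocketLike | undefined, message: unknown): boolean {
  if (!socket || socket.readyState !== WS_OPEN) return false;
  try {
    socket.send(JSON.stringify(message));
    return true;
  } catch {
    // The socket can close between the readyState check and send(),
    // so a failure here is reported to the caller instead of thrown.
    return false;
  }
}
```

The `readyState` check alone is not enough, because the state can change between the check and the call; the try/catch covers that window.
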
### Fixed

- Typo in JSDoc: `"Process text deltra"` → `"Process text delta"`.

## [0.0.1] - 2025-07-14

### Added

- Initial release.
- Streaming text generation via AI SDK `streamText`.
- Multi-step tool calling with `stopWhen`.
- Chunked streaming TTS with parallel generation and barge-in support.
- Audio transcription via AI SDK `experimental_transcribe`.
- WebSocket transport with full stream/tool/speech lifecycle events.
- Browser voice client example (`example/voice-client.html`).