> Mirror of https://github.com/Bijit-Mondal/VoiceAgent.git, synced 2026-03-02 18:36:39 +00:00.
# Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## [0.1.0] - 2025-07-15

### Added

- Conversation history limits — new `history` option with `maxMessages` (default 100) and `maxTotalChars` (default unlimited) to prevent unbounded memory growth. Oldest messages are trimmed in pairs to preserve user/assistant turn structure. Emits a `history_trimmed` event when messages are evicted.
- Audio input size validation — new `maxAudioInputSize` option (default 10 MB). Oversized or empty audio payloads are rejected early with an `error`/`warning` event instead of being forwarded to the transcription model.
- Serial input queue — `sendText()`, WebSocket `transcript` messages, and transcribed audio are now queued and processed one at a time. This prevents race conditions where concurrent calls could corrupt `conversationHistory` or interleave streaming output.
- LLM stream cancellation — an `AbortController` is now threaded into `streamText()` via `abortSignal`. Barge-in, disconnect, and explicit interrupts abort the LLM stream immediately (saving tokens) instead of only cancelling TTS.
- `interruptCurrentResponse(reason)` — new public method that aborts both the LLM stream and ongoing speech in a single call. WebSocket barge-in (`transcript`/`audio`/`interrupt` messages) now uses this instead of `interruptSpeech()` alone.
- `destroy()` — permanently tears down the agent, releasing the socket, clearing history and tools, and removing all event listeners. A `destroyed` getter is also exposed. Any subsequent method call throws.
- `history_trimmed` event — emitted with `{ removedCount, reason }` when the sliding window trims old messages.
- Input validation — `sendText("")` now throws, and incoming WebSocket `transcript`/`audio` messages are validated before processing.
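The pair-wise trimming described above can be sketched as follows. This is a minimal illustration, not the library's actual internals; `Message` and `trimHistory` are hypothetical names:

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Evict the oldest messages two at a time so the surviving history always
// starts on a complete user/assistant pair, and report how many messages
// were removed (the count a `history_trimmed`-style event would carry).
function trimHistory(
  history: Message[],
  maxMessages: number
): { history: Message[]; removedCount: number } {
  const trimmed = [...history];
  let removedCount = 0;
  while (trimmed.length > maxMessages) {
    trimmed.splice(0, 2); // drop one user/assistant pair
    removedCount += 2;
  }
  return { history: trimmed, removedCount };
}
```

Trimming in pairs (rather than one message at a time) keeps every remaining user turn matched with its assistant reply, which is what preserves the turn structure the changelog mentions.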
### Changed

- `disconnect()` is now a full cleanup — aborts in-flight LLM and TTS streams, clears the speech queue, rejects pending queued inputs, and removes socket listeners before closing. Previously it only called `socket.close()`.
- `connect()` and `handleSocket()` are idempotent — calling either when a socket is already attached will cleanly tear down the old connection first instead of leaking it.
- `sendWebSocketMessage()` is resilient — checks `socket.readyState` and wraps `send()` in a try/catch so a socket that closes mid-send does not throw an unhandled exception.
- Speech queue completion uses a promise — `processUserInput` now awaits a `speechQueueDonePromise` instead of busy-wait polling (`while (queue.length) { await sleep(100) }`), reducing CPU waste and eliminating a race window.
- `interruptSpeech()` resolves the speech-done promise — so `processUserInput` can proceed immediately after a barge-in instead of potentially hanging.
- WebSocket message handler uses `if`/`else if` — prevents a single message from accidentally matching multiple type branches.
- Chunk ID wraps at `Number.MAX_SAFE_INTEGER` — avoids unbounded counter growth in very long-running sessions.
- `processUserInput` catch block cleans up speech state — on stream error the pending text buffer is cleared and any in-progress speech is interrupted, so the agent does not get stuck in a broken state.
- WebSocket close handler calls `cleanupOnDisconnect()` — aborts LLM + TTS, clears queues, and rejects pending input promises.
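The promise-based replacement for busy-wait polling can be illustrated with a small resolvable-deferred queue. This is a sketch under assumed names (`Deferred`, `SpeechQueue`, `waitUntilDrained`); the real class and method names in the agent may differ:

```typescript
// A promise that can be resolved from the outside — the pattern behind a
// `speechQueueDonePromise`-style completion signal.
class Deferred {
  readonly promise: Promise<void>;
  resolve!: () => void;
  constructor() {
    this.promise = new Promise<void>((res) => (this.resolve = res));
  }
}

class SpeechQueue {
  private queue: string[] = [];
  private done: Deferred | null = null;

  enqueue(chunk: string): void {
    this.queue.push(chunk);
    if (!this.done) this.done = new Deferred();
  }

  // Called when a TTS chunk finishes playing.
  finishChunk(): void {
    this.queue.shift();
    if (this.queue.length === 0) {
      this.done?.resolve();
      this.done = null;
    }
  }

  // Barge-in: drop pending chunks and wake any waiter immediately,
  // so callers never hang on a queue that will not drain.
  interrupt(): void {
    this.queue.length = 0;
    this.done?.resolve();
    this.done = null;
  }

  // Callers await this instead of polling `while (queue.length) { sleep }`.
  waitUntilDrained(): Promise<void> {
    return this.done ? this.done.promise : Promise.resolve();
  }
}
```

The key property is that `interrupt()` resolves the same promise as normal drain, which is why a barge-in can never leave a waiter hanging — matching the `interruptSpeech()` behavior listed above.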
### Fixed

- Typo in JSDoc: `"Process text deltra"` → `"Process text delta"`.
## [0.0.1] - 2025-07-14

### Added

- Initial release.
- Streaming text generation via AI SDK `streamText`.
- Multi-step tool calling with `stopWhen`.
- Chunked streaming TTS with parallel generation and barge-in support.
- Audio transcription via AI SDK `experimental_transcribe`.
- WebSocket transport with full stream/tool/speech lifecycle events.
- Browser voice client example (`example/voice-client.html`).
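The chunked-TTS-with-barge-in idea can be sketched with a standard `AbortController`. This is a simplified serial-playback illustration (the real agent generates chunks in parallel), and `speakChunks`/`synthesize` are hypothetical stand-ins for the agent's internals:

```typescript
// Speak pre-chunked text one piece at a time; a barge-in aborts the
// controller, and the loop stops before starting the next chunk.
async function speakChunks(
  chunks: string[],
  synthesize: (text: string, signal: AbortSignal) => Promise<void>,
  controller: AbortController
): Promise<number> {
  let spoken = 0;
  for (const chunk of chunks) {
    if (controller.signal.aborted) break; // barge-in detected
    await synthesize(chunk, controller.signal);
    spoken++;
  }
  return spoken;
}
```

Passing the signal into each `synthesize` call also lets an in-flight TTS request be cancelled mid-chunk, not just between chunks.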