📹 Video + Voice Agent
Webcam + microphone → multimodal AI (vision + speech)
Connect
Disconnect
Microphone: -- click Refresh --
🔄 Refresh
Mic level: 0.000
Input mode: Browser STT | Server Whisper (VAD) | Push-to-Talk
Frames: every 3s | every 5s | every 10s | manual only
📹🎤 Start Camera + Mic
⏹ Stop
Capture Frame Now
🎙 Hold to Talk
✋ Interrupt
Disconnected
👤 You said
—
🤖 Assistant
💭 Reasoning
🛠️ Tools
📜 Log