Hi everyone!
I’m developing Monster Bot, a professional AI-driven assistant designed for pediatric speech therapy. The goal is to create an interactive "companion" that helps children practice their language skills in a fun way within a clinical setting.
The Tech Stack:
Hardware: Raspberry Pi 5 (8GB).
Edge Processing (Node.js): I'm using the Picovoice SDK (Porcupine, Cheetah, and Orca) for local, real-time voice processing. This ensures 100% privacy for the children's biometric data as no raw audio ever leaves the device.
The "Brain" (Hybrid AI): The transcribed text is sent via a secure API to my backend (Node.js). The core is an LLM trained on a specialized medical dataset focused on speech-language pathology (SLP) protocols. It's not just a chatbot; it's designed to recognize specific phonetic patterns and provide therapeutically sound feedback.
The Challenge:
Audio Hardware for Clinical Use
Since the children might have articulation disorders or very soft voices, I need the "ears" of the bot to be extremely sensitive and the "mouth" to be crystal clear.
Current Setup:
LinQ USB gooseneck microphone (Li-M55).
USB-powered external speakers (Tekone) connected via 3.5mm jack.
I would love your expert advice on:
Audio Interface:
Since the Pi 5 lacks an onboard 3.5mm jack, I'm currently using a basic USB audio adapter. Would an I2S DAC HAT (like IQaudio or HiFiBerry) significantly reduce the noise floor and improve TTS clarity for speech exercises?
Audio Buffering in Node.js:
Has anyone experienced latency spikes when streaming audio frames from pvrecorder to a remote API while simultaneously running a local STT engine?
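To make the question concrete, here's the kind of decoupling I have in mind (a minimal sketch with hypothetical names, not my actual code): the capture loop pushes frames into a bounded queue and drops the oldest frame on overflow, so a latency spike on the remote API call never blocks reads from pvrecorder or the local STT engine.

```javascript
// Sketch: decouple audio capture from network upload with a bounded queue.
// In the real app the producer would be pvrecorder's read() loop; here we
// simulate frames so the pattern is self-contained.

class FrameQueue {
  constructor(maxFrames) {
    this.maxFrames = maxFrames;
    this.frames = [];
    this.dropped = 0;
  }
  push(frame) {
    if (this.frames.length >= this.maxFrames) {
      // Drop the oldest frame rather than stalling the capture loop.
      this.frames.shift();
      this.dropped++;
    }
    this.frames.push(frame);
  }
  drain() {
    // Hand the whole backlog to the uploader in one batch.
    const batch = this.frames;
    this.frames = [];
    return batch;
  }
}

// Simulates a 512-sample Int16 frame as returned by a recorder read().
function captureFrame(value) {
  return new Int16Array(512).fill(value);
}

// Consumer: batch frames per request so one slow round-trip to the
// (hypothetical) backend API costs one upload, not one stall per frame.
async function uploadBatch(batch /* Int16Array[] */) {
  // await fetch("https://api.example.invalid/audio", { ... }) // real call here
  return batch.length;
}
```

The key design choice is that `push()` never awaits anything: the capture side stays real-time, and the uploader drains at its own pace.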
Hardware AEC (Acoustic Echo Cancellation):
Is there a recommended hardware solution or a specific PipeWire configuration to prevent the bot from "hearing itself" without cutting off the start of the child's response?
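For reference, this is the software-side direction I mean (a sketch based on my reading of the PipeWire docs for `libpipewire-module-echo-cancel`; the exact keys and tuning would need verifying against the installed version):

```
# ~/.config/pipewire/pipewire.conf.d/echo-cancel.conf (assumed path)
context.modules = [
  { name = libpipewire-module-echo-cancel
    args = {
      # WebRTC-based cancellation; settings below are assumptions to tune
      # against the actual room, mic, and speakers.
      aec.args = {
        webrtc.gain_control = true
        webrtc.noise_suppression = true
      }
      source.props = { node.name = "echo-cancel-source" }
      sink.props   = { node.name = "echo-cancel-sink" }
    }
  }
]
```

My worry is whether software AEC like this reacts fast enough to avoid clipping the first syllable of a quiet child's response, or whether a mic with hardware AEC is the safer route.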
Any tips on hardware integration for a "speech-heavy" application would be invaluable!
Thank you
Roberto
Statistics: Posted by DeveloPress — Fri Jan 16, 2026 8:34 pm