Voculos
Our goal is to create an agent that truly combines human curiosity with seamless automation. Let's go, baby!
Project Description
Our project addresses a real communication barrier faced by deaf and mute people: sign language is expressive and complete, but most people cannot understand it, which leads to social and professional exclusion.
The solution is a smart-glasses-based AI agent that uses computer vision to read sign language gestures in real time, interprets their meaning, and converts them into natural-sounding speech. The flow runs end to end as follows: a camera on the glasses captures hand movements, an AI vision model recognizes the signs, a language layer builds coherent sentences, and a voice agent generates audible speech that the wearer hears instantly.
Key demos include live sign-to-speech translation, continuous conversation without human interpreters, and real-time audio output, demonstrating a complete gesture-to-voice agent pipeline from input to output.
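The capture, recognize, build-sentence, speak flow described above can be sketched in a few lines. This is only an illustrative outline, not the project's actual code: the function names (`smooth_predictions`, `build_sentence`, `run_pipeline`), the majority-vote smoothing, and the window sizes are all assumptions. The per-frame labels stand in for whatever the vision model outputs, and `speak` stands in for the voice agent.

```python
from collections import Counter, deque
from typing import Callable, Deque, Iterable, Iterator, List, Optional


def smooth_predictions(
    frame_labels: Iterable[Optional[str]],
    window: int = 5,
    min_votes: int = 4,
) -> Iterator[str]:
    """Turn noisy per-frame labels into a clean sequence of signs.

    A label is emitted once it holds at least `min_votes` of the last
    `window` frames. None means "no hand or no confident prediction".
    (Hypothetical smoothing step; the real model may not need it.)
    """
    recent: Deque[Optional[str]] = deque(maxlen=window)
    last_emitted: Optional[str] = None
    for label in frame_labels:
        recent.append(label)
        winner, votes = Counter(recent).most_common(1)[0]
        if votes < min_votes:
            continue  # no stable prediction yet
        if winner is None:
            last_emitted = None  # a pause allows the same sign to repeat
        elif winner != last_emitted:
            last_emitted = winner
            yield winner


def build_sentence(signs: List[str]) -> str:
    """Placeholder language layer: join signs into one capitalized sentence."""
    if not signs:
        return ""
    text = " ".join(sign.lower() for sign in signs)
    return text[0].upper() + text[1:] + "."


def run_pipeline(
    frame_labels: Iterable[Optional[str]],
    speak: Callable[[str], None],
) -> str:
    """Run recognition output through smoothing, sentence building, and speech."""
    sentence = build_sentence(list(smooth_predictions(frame_labels)))
    if sentence:
        speak(sentence)  # in the real system this would call the voice agent
    return sentence


if __name__ == "__main__":
    frames = ["HELLO"] * 5 + [None] * 5 + ["FRIEND"] * 5
    run_pipeline(frames, speak=print)
```

Keeping each stage behind a plain function makes it easy to swap the placeholder sentence builder for a language model, or the `speak` callback for a text-to-speech call, without touching the rest of the flow.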
Technologies & Reproducibility
• Computer Vision: OpenCV, MediaPipe Hands
• AI / Models: Deep learning models for sign-language recognition and sentence construction
• Voice AI: ElevenLabs Text-to-Speech API
• Backend: n8n, FastAPI
• Input/Output: Camera (smart-glasses or webcam), headphones or speaker
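As one way the computer-vision and model components above might connect: MediaPipe Hands reports 21 landmarks per detected hand, each with x, y, and z coordinates. Before they reach a recognition model, the landmarks are often normalized so that hand position and apparent size do not matter. The sketch below shows one such normalization. The function name `landmarks_to_features` and the choice of wrist-centering plus scaling by the wrist-to-middle-knuckle distance are assumptions for illustration, not the project's documented preprocessing.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

NUM_LANDMARKS = 21  # MediaPipe Hands returns 21 landmarks per hand
WRIST = 0           # landmark index of the wrist
MIDDLE_MCP = 9      # landmark index of the middle-finger knuckle (MCP joint)


def landmarks_to_features(landmarks: Sequence[Point]) -> List[float]:
    """Flatten 21 (x, y, z) landmarks into a 63-value feature vector.

    The hand is translated so the wrist sits at the origin, then scaled so
    the wrist-to-middle-knuckle distance equals 1. (Assumed preprocessing.)
    """
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")

    wx, wy, wz = landmarks[WRIST]
    centered = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]

    scale = math.dist((0.0, 0.0, 0.0), centered[MIDDLE_MCP])
    if scale == 0.0:
        raise ValueError("degenerate hand: wrist and middle knuckle coincide")

    return [coord / scale for point in centered for coord in point]
```

With the legacy `mp.solutions.hands` API, the input list would be built roughly as `[(p.x, p.y, p.z) for p in hand.landmark]` for each entry in `results.multi_hand_landmarks`. The resulting vector could then be fed to whichever classifier the project uses.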
Team
Products & Tools
Additional Links
Pitch Deck Voculos
Documentation