Created on 22nd June 2025
Short Pitch: An agentic assistant for my grandmother and India's 90%. It uses Gemini and Sarvam AI to reply in the user's own language and script, in voice as well as text. It can read PDFs and images to help people understand complex material such as legal documents or what a medicine is for, answer everyday questions like how many runs Virat Kohli scored, and translate or synthesize audio for a language you do not speak, all through an interface they already know well: WhatsApp.
A vast majority of Indians, including many elderly citizens and those navigating diverse linguistic landscapes, face significant hurdles in accessing and comprehending critical information. Essential details embedded in legal documents, medical prescriptions, financial agreements, or government communications often remain opaque due to complex language, unfamiliar jargon (frequently in English), or the sheer density of the text. This "understanding gap" means individuals can struggle to make informed decisions about their health, finances, and legal rights, potentially leading to confusion, missed opportunities, medication errors, or an inability to access entitled benefits. The difficulty is compounded when information is primarily available through digital interfaces that are not intuitive or accessible for everyone, especially those with limited digital literacy or those who are more comfortable with voice and their native language. Our assistant directly tackles this widespread challenge of making vital information truly understandable and actionable for everyone, regardless of their language or technical comfort.
Building this agentic assistant involved navigating several technical hurdles. Here are some of the key challenges and how I approached them:
WhatsApp Business API Integration Hurdle:
Developing a Robust Webhook Server & Handling Media/Text Constraints:
The save function for Text-to-Speech (TTS) audio output was not working as expected.
State Management Complexities with LangGraph:
I had initially planned for this to also be a general-purpose executive assistant that manages calendar events and reads email. However, integrating Google OAuth and the Google API libraries proved tough, so I plan to add that later and design for it from the start.
I used Google's LangGraph Quickstart repo as the base for this. The WhatsApp webhook server, the Sarvam AI based TTS/STT/language recognition/translation agents, and a task planning agent were all added during the hackathon. No work was done before the hackathon apart from starting from the LangGraph Quickstart repo.
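As a rough illustration of what the WhatsApp webhook server has to do, here is a minimal sketch in plain Python. It assumes the standard WhatsApp Cloud API conventions: a GET verification handshake using hub.mode, hub.verify_token and hub.challenge, and POST payloads nested as entry, changes, value, messages. The names VERIFY_TOKEN, verify_webhook, extract_messages and route, and the pipeline labels they return, are hypothetical and are not taken from the project's code.

```python
from typing import Any

# Hypothetical value: must match the verify token configured for the webhook.
VERIFY_TOKEN = "my-verify-token"


def verify_webhook(params: dict[str, str], expected_token: str = VERIFY_TOKEN) -> tuple[int, str]:
    """Answer the GET verification handshake.

    Returns (status_code, body). The challenge is echoed back only when the
    mode is "subscribe" and the token matches.
    """
    if params.get("hub.mode") == "subscribe" and params.get("hub.verify_token") == expected_token:
        return 200, params.get("hub.challenge", "")
    return 403, "verification failed"


def extract_messages(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a webhook POST body into a list of inbound message dicts."""
    messages: list[dict[str, Any]] = []
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            messages.extend(change.get("value", {}).get("messages", []))
    return messages


def route(message: dict[str, Any]) -> str:
    """Pick a pipeline label from the message type (labels are illustrative)."""
    kind = message.get("type")
    if kind == "text":
        return "planner"
    if kind == "audio":
        return "speech_to_text"
    if kind in ("image", "document"):
        return "document_understanding"
    return "unsupported"
```

In a real server these functions would sit behind the GET and POST handlers of whatever web framework is used; keeping them free of framework code makes the parsing and routing easy to test on their own.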
Tracks Applied (3)
Sarvam.ai
Amazon Web Services
Google Cloud Platform