Sarathi

Making sense of anything for India's 90%

Created on 22nd June 2025

The problem Sarathi solves

Short Pitch: An agentic assistant for my grandmother and India's 90%. It uses Gemini and Sarvam AI to reply in the user's own language and script, in voice as well as text. It can read PDFs and images, so users can understand complex material such as legal documents or what their medicine is for, or simply ask how many runs Virat Kohli scored. It can also translate, or synthesize audio in a language you do not speak, all through an interface they already know well: WhatsApp.

A vast majority of Indians, including many elderly citizens and those navigating diverse linguistic landscapes, face significant hurdles in accessing and comprehending critical information. Essential details embedded in legal documents, medical prescriptions, financial agreements, or government communications often remain opaque due to complex language, unfamiliar jargon (frequently in English), or the sheer density of the text. This "understanding gap" means individuals can struggle to make informed decisions about their health, finances, and legal rights, potentially leading to confusion, missed opportunities, medication errors, or an inability to access entitled benefits. The difficulty is compounded when information is primarily available through digital interfaces that are not intuitive or accessible for everyone, especially those with limited digital literacy or those who are more comfortable with voice and their native language. Our assistant directly tackles this widespread challenge of making vital information truly understandable and actionable for everyone, regardless of their language or technical comfort.

Challenges I ran into

Building this agentic assistant involved navigating several technical hurdles. Here are some of the key challenges and how I approached them:

  1. WhatsApp Business API Integration Hurdle:

    • The Challenge: My initial plan was to use a dedicated Twilio number registered with the WhatsApp Business API. However, my application for WhatsApp Business status was unfortunately rejected. This felt like a major roadblock, potentially halting the project's core interface.
    • The Solution/Workaround: Instead of abandoning the WhatsApp channel, I pivoted to using the Twilio Sandbox for WhatsApp. While this came with limitations – such as Twilio branding, the inability to proactively message users (they must initiate the chat), and a slower message throughput (approximately 1 message every 3 seconds) – it allowed me to proceed with development and demonstrate the concept. I was happy to adapt and work within these constraints to bring the idea to life.
  2. Developing a Robust Webhook Server & Handling Media/Text Constraints:

    • The Challenge: Creating a reliable webhook server capable of managing asynchronous responses, maintaining chat history, and correctly processing various media types (audio, images, PDFs) presented several difficulties.
      • Audio Processing: WhatsApp has specific requirements for audio files (OGG format with the Opus codec). Ensuring incoming audio was correctly processed and outgoing TTS audio met these specifications was crucial.
      • Character Limits & Chunking: Both WhatsApp and the Sarvam AI platform have character limits for messages. This necessitated implementing a chunking mechanism to break down longer pieces of text (from document analysis or AI responses) into manageable segments for delivery.
      • Sarvam AI Library Bug: I encountered a bug in the Sarvam AI library where the built-in save function for Text-to-Speech (TTS) audio output was not working as expected.
    • The Solution/Workaround: I addressed these by carefully designing the webhook logic. For the Sarvam AI TTS issue, I had to replace the problematic library function with a custom implementation to ensure audio files were saved and could be sent correctly.
  3. State Management Complexities with LangGraph:

    • The Challenge: Managing state effectively within LangGraph proved to be tricky and led to several intermittent bugs.
      • Sometimes the system would fail to send the generated audio response.
      • On other occasions, the response would not be delivered in the sender's detected language, despite that being a core feature.
      • There were also instances where the webhook had trouble reliably receiving attached files (like PDFs or images) from the user.
    • The Solution/Workaround: This required careful debugging and refinement of the LangGraph flows. Through persistent troubleshooting and adjustments to the state management logic, I was able to resolve most of these issues, ensuring more consistent and reliable behavior.
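
The chunking mechanism mentioned under the second challenge can be illustrated with a minimal sketch. This is not the project's actual code: the function name chunk_text and the default limit of 1500 characters are assumptions chosen for illustration, and the real limits should be taken from the WhatsApp/Twilio and Sarvam AI documentation. The idea is to cut at the last space that fits within the limit, and to fall back to a hard split only when a single word is longer than the limit.

```python
def chunk_text(text: str, limit: int = 1500) -> list[str]:
    """Split text into pieces of at most `limit` characters.

    `limit` is an illustrative placeholder, not an official platform value.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")

    chunks: list[str] = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Prefer the last space that still keeps the chunk within the limit.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            # No space in the window: fall back to a hard split.
            cut = limit
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each returned chunk can then be sent as its own message, in order. Splitting on spaces keeps words intact, which also matters for text-to-speech, since a word cut in half would be pronounced incorrectly.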

Things not done

I had initially planned for this to also be a general-purpose executive assistant that manages calendar events and reads email. However, integrating Google OAuth and the Google API libraries proved difficult, so I plan to add this later and design for it properly from the start.

Progress made before hackathon

I used Google's LangGraph Quickstart repo as the base for this project. The WhatsApp webhook server, the Sarvam AI based TTS/STT/language recognition/translation agents, and the task planning agent were all added during the hackathon. No other work was done before the hackathon.

Tracks Applied (3)

Sarvam AI Track

Used Sarvam AI's TTS/STT/Language Recognition/Translation APIs.
Sarvam.ai

HealthTech: The Vernacular Health Record Navigator

Addresses this track by letting people understand health records, as well as other documents, in their vernacular language.
Amazon Web Services

Google Cloud Platform Usage

Uses Gemini 2.0 Flash and 2.5 Flash as the primary LLMs for reasoning, planning, and related tasks.
Google Cloud Platform
