Building an AI Coaching Service: The Key Ingredients
Part 3 in our series on MAIA's architecture
---
With AI tools becoming more accessible, more organizations are exploring AI-powered services such as coaching bots, advisory tools, and training apps. The technology holds a lot of promise. But what does it actually take to go from "we'll use AI" to a service that works reliably for real users?
Based on our experience building MAIA across five countries over the past 2.5 years, here are the core capabilities that a well-functioning AI coaching service needs, and a rough sense of how much of the work each one represents (based on lines of code, which is an interesting but imperfect measure).
1. Knowing What to Say — and How to Say It (~26%)
The largest piece of the puzzle is instructing the model. To be useful across a variety of use cases and contexts, you actually need a library of prompts, not just one, because a coaching service needs to behave differently depending on who it's talking to, what stage they're at, and what they need. A first-time user in Panama exploring formal registration needs different guidance than a returning user in Kenya refining her pricing strategy. If your chatbot will do more than generic "talk to your documents" advice, this also includes decision logic that routes each user question to the right capability and prompt. Should this trigger a web search? Generate a marketing image? Offer a diagnostic? Simply answer? This library is never "done." It evolves constantly as you learn what works.
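To make the routing step concrete, here is a minimal sketch. The capability names and keyword rules are hypothetical, not MAIA's actual logic; a production router would typically use an LLM classifier rather than keyword matching.

```python
from enum import Enum, auto

class Capability(Enum):
    WEB_SEARCH = auto()
    IMAGE_GENERATION = auto()
    DIAGNOSTIC = auto()
    DIRECT_ANSWER = auto()

def route(message: str) -> Capability:
    """Toy keyword router: map a user message to a capability.

    Illustrative only; real systems classify with a model, not keywords.
    """
    text = message.lower()
    if any(k in text for k in ("latest", "current price", "news")):
        return Capability.WEB_SEARCH
    if any(k in text for k in ("poster", "flyer", "logo")):
        return Capability.IMAGE_GENERATION
    if "assess" in text or "diagnostic" in text:
        return Capability.DIAGNOSTIC
    return Capability.DIRECT_ANSWER
```

Once a capability is chosen, the corresponding prompt is pulled from the library, which is how one service can behave like many specialized ones.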
2. Reliability and Delivery (~22%)
The gap between a demo and a product is reliability. Messages need to arrive once and only once, even when services go down. When an upstream provider has an outage, the system needs to gracefully switch to another provider rather than go silent. When a user sends a voice note, it needs to be transcribed and processed. When two messages arrive at the same instant, they can't corrupt each other's state, and when users multi-message (common in WhatsApp) you need your tool to behave like a human and combine them, rather than behave like a chatbot and respond one at a time. This is the plumbing. No one sees it, but you definitely notice when it's not right, and it significantly impacts usage.
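The deduplication half of "once and only once" can be sketched as follows. This is a simplified in-memory version with made-up names; a real system would persist seen message IDs so retried webhook deliveries are skipped even across restarts.

```python
def process_once(message_id: str, payload: dict, seen: set, handler) -> bool:
    """Run handler(payload) only if this message ID hasn't been seen before.

    Returns True if processed, False if skipped as a duplicate.
    """
    if message_id in seen:
        return False  # duplicate delivery (e.g., a webhook retry): ignore
    seen.add(message_id)
    handler(payload)
    return True
```

The same pattern, keyed on outbound message IDs, keeps the service from double-sending when a delivery attempt times out but actually succeeded.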
3. Proactive Coaching (~19%)
A service that only responds when spoken to isn't really coaching. Nearly a fifth of MAIA's codebase is dedicated to reaching out: personalized weekly outreach, smart follow-ups that review the recent conversation to decide whether a check-in is useful, engagement logic that adjusts frequency based on how active each user is, structured multi-week sprints, and onboarding sequences tailored to each country program. This entire subsystem exists independently of the regular response flow, and has to be built around the unique economics of WhatsApp.
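The frequency-adjustment idea can be illustrated with a toy rule. The thresholds and intervals here are invented for illustration, not MAIA's actual engagement logic.

```python
from datetime import timedelta

def outreach_interval(messages_last_30_days: int) -> timedelta:
    """More active users get more frequent check-ins; dormant users fewer.

    Illustrative thresholds only.
    """
    if messages_last_30_days >= 20:
        return timedelta(days=7)   # highly engaged: weekly
    if messages_last_30_days >= 5:
        return timedelta(days=14)  # moderately engaged: biweekly
    return timedelta(days=30)      # dormant: monthly nudge
```

On WhatsApp this logic also has to respect the platform's messaging-window and template rules, which is part of why the subsystem is its own layer.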
4. Memory and Understanding (~11%)
AI models don't inherently remember users between conversations. Building persistent memory means extracting structured information from each interaction, particularly when chats span months and years, as some of ours do. This includes things like what the user's business does, their financial situation, their marketing channels, and their goals, and maintaining an evolving profile that gets better over time. Unlike generic chatbots, which can struggle with coherent, focused, relevant memory, MAIA's business focus allows for much smarter and more targeted memory. Moreover, MAIA goes beyond recall to a structured diagnostic that assesses users across seven business pillars (like sales, finance, operations, and digital tools), so that advice is calibrated to where someone actually is and what their next key priority is.
5. Context Assembly (~7%)
Even with a good model and good instructions, the quality of an AI response depends on what information it can see. Before every interaction, the system assembles a context window: the user's business profile, their diagnostic results, recent conversation history, any active coaching program, and relevant knowledge base references. Think of it like briefing a human coach before each session. The better the briefing, the better the session.
6. Conversation Handling (~10%)
Real users don't behave like demos. They send three messages in a row before the bot can respond. They reply, out of order, to messages from last week. They forward something from another chat. They go quiet for two months and come back. Handling all of this gracefully (batching rapid messages, transcribing audio, threading quoted replies, managing session state, preventing the bot from talking to itself in edge cases) is another significant layer of work.
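The rapid-message batching behavior can be sketched as a quiet-gap grouping rule: messages that arrive close together are treated as one thought. The three-second gap is an arbitrary illustrative value, and a real implementation would work on a live stream with timers rather than a finished list.

```python
def batch_messages(messages: list[tuple[float, str]],
                   quiet_seconds: float = 3.0) -> list[str]:
    """Group (timestamp, text) messages; a new batch starts after a quiet gap."""
    batches, current = [], []
    last_ts = None
    for ts, text in messages:
        if last_ts is not None and ts - last_ts > quiet_seconds:
            batches.append(" ".join(current))  # gap exceeded: close the batch
            current = []
        current.append(text)
        last_ts = ts
    if current:
        batches.append(" ".join(current))
    return batches
```

Responding once per batch, instead of once per message, is a big part of what makes a bot feel like a person on WhatsApp.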
7. The AI Model Call (~7%)
Here is the part that most people picture when they think "AI service": sending a request to a language model and getting a response. It includes connecting to the model provider, parsing the response, and having fallback options when one provider is unavailable.
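A provider-fallback loop can be as simple as the following sketch. The providers here are stand-in callables, not real SDK clients; a production version would also add timeouts, retries with backoff, and response validation.

```python
def call_with_fallback(prompt: str, providers: list) -> str:
    """Try each provider callable in order; return the first successful response."""
    last_error = None
    for call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # provider outage, timeout, rate limit, etc.
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```

Usage: pass the primary provider first and the backup second, so the fallback only runs during an outage.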
8. Localization and Cultural Fit (~5%)
Operating across countries isn't just translation. Coaching advice about formal business registration is completely different in Peru versus Kenya. The relevant government agencies, legal frameworks, financial products, and market conditions are all different. The system needs to resolve the right language, the right country context, and the right institutional references for each user, and allow per-country overrides without requiring a new deployment.
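Per-country overrides layered on shared defaults might be sketched like this. The config shape is illustrative (the agency names are real institutions, but this is not MAIA's actual configuration); the key property is that adding or changing a country's entries is a data change, not a code deployment.

```python
DEFAULTS = {"language": "es", "registration_agency": None}

# Per-country overrides; anything not overridden falls through to DEFAULTS
COUNTRY_OVERRIDES = {
    "KE": {"language": "en", "registration_agency": "eCitizen"},
    "PE": {"registration_agency": "SUNARP"},
}

def resolve_config(country_code: str) -> dict:
    """Layer a country's overrides on top of the shared defaults."""
    config = dict(DEFAULTS)
    config.update(COUNTRY_OVERRIDES.get(country_code, {}))
    return config
```

In practice the overrides would live in a database or config store, so a new institutional reference for Kenya never requires touching the Peru experience.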
Conclusion
Each of these capabilities is essential, and they overlap. Your proactive coaching engine needs your memory system to know who to reach out to. Your context assembly depends on your reliability layer to ensure profiles are current. Your instruction library depends on your localization system to serve the right version. The LLM is the engine, but a well-functioning service is the whole car: navigation, fuel system, chassis, dashboard. Each capability is a product design decision that affects user satisfaction, tool usage, and ultimately impact.
