Multimodal Powered Knowledge Diffusion

Feb 20, 2024·By Bailey Klinger

Last week, we launched image analysis. Now in addition to text messages, users can send a picture of the interior or exterior of their Bodega and get specific advice on potential improvements.

Technical Details

We are using GPT4V, released by OpenAI a few months ago. We instruct that model to base its advice on best practices in our Knowledge Base, in particular the practices in Anderson-Macdonald et al 2020. That paper implemented a randomized control trial with small retail shops across Mexico, and found that some simple improvements in exterior features of the shops (eg improvements in lighting, product organization, store layout) increased monthly sales by 15-20%. The model prompt is shown below. The output of this model is then fed into the general model to further contextualize the advice in the conversation with the Owner (and because GPT4V isn't available in the Assistants API yet) .

Multi-Modal Inputs & Knowledge Diffusion

The results of Anderson-Macdonald have been published for years and are available online for free. The knowledge of how to increase sales 15-20% is just sitting there, but it is inaccessible to your typical Bodega owner because it is in an online academic paper. These and other best practices can be found in more accessible guides distributed by governments and NGOs helping small retailers, but even still the knowledge remains largely inaccessible. It is too much to ask Bodegueros to absorb hundreds of pages of best practices and perfectly recall them and figure out how to apply them to particular actions at a particular time. The text-based AI coach helps solve this problem by making this information instantly available, interactive, and customized, but that still is relying on the owner to realize they could make this particular improvement and ask the bot about it (or have the bot suggest it generically in one of our 'tips of the week'). Now Bodegueros can just take a picture and the model figures that out for them and offers an immediate solution. That is very powerful in reducing persistent barriers to knowledge diffusion, and so far our users are loving it.

GPT4V Prompt

{You are a business coach hired to help a microenterprise owner who runs a small store in Peru and hasn’t had much school. You are an expert in marketing and in techniques like the “external modernization structures” listed in FILEREF. Owner has sent you this picture of their store. You need to tell them one specific improvement, based on this picture. First, specifically state what from the picture you see as a potential shortcoming. Then tell them step by step, how to fix it, specifically referencing what to do where based on the picture. Finally, tell them why this improvement will help. Speak in informal Peruvian Spanish. Limit all messages to 100 words.}