Grounding GPT-4o Responses with Perplexity

Bailey Klinger
Jan 24, 2025

One interesting thing we've noticed is that someone with very little online experience can often have an easier time getting started with an LLM than someone with some technical experience, particularly when the front end is WhatsApp. A user with no technical experience talks to the LLM as if it were a person, which is a good way to get started. But someone with some technical experience usually talks to the LLM as if it were a Google search, which is a good way to get disappointed. LLMs have a knowledge cutoff date and can hallucinate very specific details like addresses and phone numbers.

Given that MAIA is the on-ramp to AI for most of our users, we've seen it as our job to teach users the difference, often through experience, akin to how we've learned through experience when something is a ChatGPT question vs. a Perplexity question. But teaching this is hard, and it loses us a lot of first-time users. That's why ChatGPT has integrated search, and we've been working for a while to do the same.

Seems like it should be easy, but it's taken some work, at least for us. The search capability of ChatGPT isn't available when using their models via the API, seemingly for legal reasons. As discussed in that same thread, you can recreate it yourself with function calling, but it seems to me that this not only requires a lot of work but would also give pretty weak performance at reasonable levels of complexity. Google seems to have an out-of-the-box option for grounding Gemini completions with Google Search, built into their API, but OpenAI doesn't.
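
For reference, a rough sketch of what that function-calling approach looks like with the OpenAI Python SDK. The web_search function here is hypothetical: OpenAI only asks your code to call it, and implementing the actual search, fetching, and the second round-trip back to the model is all on you:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # A hypothetical search tool. The model can request this call, but your
    # code has to implement it against some search provider.
    tools = [{
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web and return the top results as text.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query"},
                },
                "required": ["query"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is today's spot price for coffee?"}],
        tools=tools,
    )

    # If the model decides it needs a search, it returns a tool call that you
    # must execute and feed back in a second request. That loop, plus the
    # search implementation itself, is where the work piles up.
    if response.choices[0].message.tool_calls:
        call = response.choices[0].message.tool_calls[0]
        print(call.function.name, call.function.arguments)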

We decided to give our microenterprise users the same workflow we use ourselves: GPT-4o + Perplexity. The first attempt was to add instructions to the main model to trigger background research by Perplexity if the situation called for it. But this didn't work well: as with other experiments, trying to get our main model prompt to do too many things degrades its performance at the key thing it's supposed to do best. So instead, we added a first check of the user message and chat history with GPT-4o mini, whose only job is to determine whether additional context from Perplexity is needed.
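
A rough sketch of that first check (not our production code; ROUTER_PROMPT stands in for the mini model instructions shown at the end of this post, and the @contact.message / @getthread_response placeholders are filled in the same way our flow does):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def needs_web_search(message: str, chat_history: str, router_prompt: str) -> str | None:
        """Ask GPT-4o mini whether the main model needs web context.

        Returns a self-contained research question if a search is needed,
        otherwise None. Assumes the prompt tells the model to answer either
        'LLM' or 'websearch <research question>'.
        """
        result = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": router_prompt.replace("@contact.message", message)
                                        .replace("@getthread_response", chat_history),
            }],
        )
        answer = result.choices[0].message.content.strip()
        if answer.lower().startswith("websearch"):
            return answer[len("websearch"):].strip()
        return None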

The cases where it is needed come from real user chats over the past year where the LLM either couldn't give users the information they wanted or, in some cases, hallucinated. This is a combination of things that are after the knowledge cutoff date (e.g., today's spot price for coffee, the largest POS providers in Peru this year) and things too specific to be reliably retrieved by an LLM from its training data (e.g., the address and phone number of a small supplier in a midsized Peruvian town).

In most cases, that mini model determines that additional context isn't needed and the user message continues on to a response from the main LLM. When additional context is needed, the mini model generates a self-contained research question, which we then submit to Perplexity via their API, and we provide the result as additional context for the responding LLM.
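
The Perplexity call itself is simple, since their API is OpenAI-compatible and the same SDK works against their endpoint. A minimal sketch, with PPLX_SYSTEM_PROMPT standing in for the system message shown at the end of this post:

    import os
    from openai import OpenAI

    # Perplexity exposes an OpenAI-compatible endpoint, so the same client works.
    pplx = OpenAI(
        api_key=os.environ["PERPLEXITY_API_KEY"],
        base_url="https://api.perplexity.ai",
    )

    def fetch_context(research_question: str, system_prompt: str) -> str:
        """Run the router's research question through Perplexity's sonar model
        and return the answer text to use as context for the main model."""
        result = pplx.chat.completions.create(
            model="sonar",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": f"Research question: {research_question}"},
            ],
        )
        return result.choices[0].message.content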

Seems simple enough, but it's taking a lot of adjusting to get right, and we're still not quite there. This is for two main reasons:

1) Compared to adjusting one model prompt that answers user questions, adjusting a prompt in a chain of models is trickier. Things cascade and amplify when multiple models are talking to each other, which is probably why agents are so hard to get right. Here you need to get your mini model to trigger in the right situations and to formulate the right kind of question for Perplexity. You need your Perplexity prompt to get the key information for the main model, and only that. And you need your main model prompt to use the right information in the right way and ignore the rest.

2) The context of Peruvian microenterprises is a small, small corner of GPT-4o's training data, but you can pretty reliably limit the model's responses to the relevant corner of its galaxy brain. In my experience, though, this can't be done as reliably for LLMs incorporating real-time info from the web. For example, our main model prompt will do a great job giving advice relevant to a typical small neighborhood restaurant in Peru. But a model responding with grounded search about average 'menu' prices, even when prompted explicitly, can't limit itself to what's relevant for small neighborhood restaurants in Peru. It usually can't get there at all, and is swamped by what makes up 99% of search results about Peruvian restaurant prices: tourist restaurants. This is the case for what seem to be the best of the best, both Perplexity and Gemini grounded in Google Search. They aren't great at staying relevant for microenterprises in developing countries.

This is still a work in progress, because it's tough to get all these dials tuned just right, but it is improving the user experience in our tests and getting better with each adjustment. If anyone has experience grounding LLM responses in search in a similar context and would like to swap notes, please do reach out to me at [email protected]

Here are the model instructions for those first two steps (our main model prompt can be seen here; the results from Perplexity are added to the context field in [[[]]] there).

Mini model evaluating the message:

    An LLM is going to respond to the following message from a Peruvian microentrepreneur. But first, your job is to determine whether that LLM needs specific or up-to-date information from the Internet to respond to the message.

    Most messages don't need a web search, since the LLM is capable. But the cases where specific or up-to-date information from a web search is needed are:

      - If the user needs a specific address, phone number, or website address
      - If the user needs a particular company for software, supplies, services, etc., which could be outdated in an LLM
      - If the user asks about current prices, volumes, or values that could be outdated in an LLM
      - Anything else that would benefit from real-time information

    Your only response should be 'LLM' if no web search is needed.

    If a web search is needed, you must respond 'websearch' followed by a clear, concise, self-contained online research question to obtain the useful information. This is not necessarily the user's question; it is a question to obtain up-to-date information from the Internet, which the LLM will then use to answer the user's question. You must formulate this research question based not only on the user's question but also on the previous messages as context, including details about the user's business activity. This research question must contain all the information the search engine needs to gather the necessary information. Write your research question in Spanish.

    Here is the user's message, enclosed in [[[ ]]]: [[[ @contact.message ]]]

    Here is the user's chat history, an unparsed OpenAI thread, enclosed in [[ ]]: [[ @getthread_response ]]

------------------

The request to Perplexity:

    {
      "model": "sonar",
      "messages": [
        {
          "role": "system",
          "content": "You are a search engine for another LLM that will answer a question from a Peruvian microenterprise. That LLM is your user, and it needs up-to-date, complete context from the Internet to answer the microenterprise's question, which is why it has given you this research question. Answer this research question for the LLM and efficiently provide useful context; include links directly in the text. Remember that this is for a Peruvian microenterprise, so questions about things like suppliers should consider companies that are based in Peru or that explicitly state on their websites that they sell to Peru. Remember, the LLM will answer the microenterprise's question; you are just the search engine giving that LLM context."
        },
        {
          "role": "user",
          "content": "Research question: @contact.message"
        }
      ]
    }
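
To close the loop, a sketch of how the pieces fit together, reusing the client and helper functions from the sketches above; MAIN_PROMPT stands in for our main model prompt, with [[[]]] marking its context field:

    def respond(message: str, chat_history: str) -> str:
        """End-to-end: route the message, search only if needed, then answer."""
        context = ""
        question = needs_web_search(message, chat_history, ROUTER_PROMPT)
        if question:
            context = fetch_context(question, PPLX_SYSTEM_PROMPT)

        # Splice the Perplexity result into the main prompt's context field.
        system_prompt = MAIN_PROMPT.replace("[[[]]]", f"[[[ {context} ]]]")

        final = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": message},
            ],
        )
        return final.choices[0].message.content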