BloombergGPT, Fine-Tuning & Moat Building
A super interesting study as summarized by Ethan Mollick:
"Bloomberg trained a GPT-3.5 class AI on their own financial data last year… …only to find that GPT-4 8k, without specialized finance training, beat it on almost all finance tasks. Hard to beat the frontier models."
Many organizations are applying AI in International Development, and they have sensible arguments for fine-tuning their own models: a particular country's language may be poorly represented in LLM training data, the application may be highly specialized (e.g., agricultural extension), government rules may restrict where data goes, and a fine-tuned open-source model may look much more attractive when making financial projections out to millions and millions of users.
There are many valid reasons, but we should also be on guard against the risk of some motivated reasoning here: non-profits still have incentives to 'build a moat' to survive and thrive. An NGO working to be a leader in AI and _ will, just like many AI start-ups, not want to look like a thin GPT "wrapper", and one way to avoid that is to fine-tune your own model instead of using off-the-shelf models.
We are taking the opposite approach, trying to be the thinnest wrapper possible that still achieves significant impact for MSMEs. Much as unconditional cash transfers set an impact benchmark that more heavily engineered development interventions must beat, our results set an impact benchmark that proprietary models should beat, and one that any institution can quickly implement and achieve on its own. It's important to avoid spending donor money to train another BloombergGPT.
----------
An aside on the other issues raised above:
COST: Yes, it's true that GPT-4 is expensive now, GPT-4V has a low rate limit, etc. But look at the prices of GPT-3.5 a year ago compared to now, and project prices forward the 12-18 months it takes for a solution to be fully tested and optimized, before which it isn't ready for massive scale-up anyway.
LANGUAGE: To figure out the right intervention model and UX, we're starting with a relatively easy language (Spanish) that still gives access to millions and millions of low-income MSMEs needing help.
SPECIALIZATION: The services sector absorbs a massive amount of labor in low-income countries, has a generic production process that is relatively easy for LLMs to address, and shows persistent productivity gaps, so it's the obvious place to start. But we will test how specialized a production process has to get before the simple approach of a frontier model supplemented with basic RAG stops outperforming a less capable fine-tuned model, potentially in the coffee sector.
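To make concrete what "a frontier model supplemented with basic RAG" means, here is a minimal sketch of the pattern: retrieve the most relevant snippet from a small domain corpus and prepend it as context to the prompt sent to a general-purpose model. Everything here is illustrative, not our production system: the example documents, the naive term-overlap scoring (a real deployment would use embeddings), and the prompt template are all assumptions for the sake of the sketch.

```python
def score(query: str, doc: str) -> int:
    # Naive relevance score: count shared lowercase terms.
    # Real RAG systems would use embedding similarity instead.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def build_prompt(query: str, corpus: list[str], top_k: int = 1) -> str:
    # Rank the corpus by relevance and prepend the top_k snippets as context.
    ranked = sorted(corpus, key=lambda d: score(query, d), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Tiny illustrative corpus of domain snippets.
corpus = [
    "Coffee rust is a fungal disease; resistant varieties reduce losses.",
    "Small retail shops can improve margins by tracking daily sales.",
    "Drip irrigation lowers water use for smallholder vegetable farms.",
]

prompt = build_prompt("How do I protect my coffee plants from rust?", corpus)
# The assembled prompt (with the coffee snippet as grounding context) would
# then be sent to a frontier model's chat API; no model call is made here.
```

The point of the benchmark question above is exactly this: if retrieval plus a frontier model answers specialized queries well enough, the marginal value of fine-tuning a weaker model has to be demonstrated, not assumed.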