The recent release of OpenAI's ChatGPT has transformed the public conversation around AI, with justifiable enthusiasm. Packaging large language models for public consumption is a significant milestone.
However, these models perform poorly on low-resource languages. Organizations like Masakhane have repeatedly documented that "many multilingual language models are trained on dangerous, offensive and frankly garbage data," and that performance on African languages specifically is weak.
We tested ChatGPT's Zulu capabilities with mixed results and proceeded to benchmark Africa-centric models against GPT3.5 across two NLP tasks.
Named Entity Recognition
This task extracts entities such as personal names, places, and dates from text, which is valuable for conversational AI and social media analysis. The MasakhaNER 2.0 dataset covers 20 African languages and four entity categories: person, organization, location, and date.
Three Africa-centric models were compared: AfroXLMR-Base, AfroXLMR-Large, and AfroLM. On the Zulu benchmark, AfroXLMR-Large achieved an F1 score of 91.5, outperforming GPT3.5's 87. Even AfroLM, with only 236 million parameters to GPT3.5's 175 billion, remained competitive at 86.3.
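The F1 scores above are computed at the entity level: a prediction counts only if both the span and the entity type match the gold annotation exactly. The sketch below illustrates the metric with a minimal standard-library implementation over IOB2 tags; it is an illustration of how such scores are computed, not the benchmark's actual evaluation script (NER benchmarks typically use the seqeval library).

```python
from collections import Counter

def extract_spans(tags):
    """Convert IOB2 tags (e.g. B-PER, I-PER, O) into (start, end, type) spans.

    Simplification: a stray I- tag with no matching B-/I- before it is dropped.
    """
    spans = []
    start = etype = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes the last span
        continues_span = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not continues_span:
            spans.append((start, i, etype))  # end index is exclusive
            start = etype = None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans

def entity_f1(gold_sents, pred_sents):
    """Micro-averaged entity-level F1 over parallel lists of tag sequences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = set(extract_spans(gold)), set(extract_spans(pred))
        tp += len(g & p)   # exact span + type matches
        fp += len(p - g)   # predicted entities not in gold
        fn += len(g - p)   # gold entities that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

# A model that finds the person but misses the location: P = 1.0, R = 0.5
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
print(round(entity_f1(gold, pred), 3))  # 0.667
```

The exact-match requirement is why the metric is stricter than token accuracy: a span that is one token too short scores zero for that entity, costing both precision and recall.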
Machine Translation
This challenging task showed significant disparities. ChatGPT struggled dramatically with Zulu translation. Comparisons revealed that GPT3.5 scored 0 BLEU for both English-to-Zulu and Zulu-to-English, while M2M100 scored 21.2 and 38.0 respectively.
GPT3.5 generated completely unrelated content. When asked to translate "ngibona ukuthi kuningi okungenziwa ukusiza abantu kulesi sikhungo," it produced "i skipped around trying to use as much of the plant material as i could" rather than the correct "i feel like there could be more done to help people within this industry."
In another example, GPT3.5 mysteriously transformed Tanzania into South Africa mid-translation, producing nonsensical output despite clear context.
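These scores make sense once you see how BLEU works: it rewards n-gram overlap between the system output and a reference translation, so a completely unrelated output like the ones above shares no n-grams with the reference and scores 0. A minimal standard-library sketch of corpus-level BLEU follows; the published benchmarks almost certainly used a standard implementation such as sacrebleu, which adds smoothing and tokenization details omitted here.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(references, hypotheses, max_n=4):
    """Unsmoothed corpus BLEU: clipped n-gram precision times a brevity penalty."""
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # n-grams proposed by the hypotheses, per order
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g])
                                  for g, c in ngrams(hyp, n).items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if 0 in matches:
        return 0.0  # any zero precision zeroes the geometric mean
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty discourages translations shorter than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)

ref = "i feel like there could be more done to help people".split()
print(corpus_bleu([ref], [ref]))                              # 100.0
print(corpus_bleu([ref], ["skipped around trying".split()]))  # 0.0
```

Because the geometric mean collapses to zero as soon as any n-gram order has no matches, GPT3.5's off-topic Zulu outputs score 0 even when an occasional word happens to overlap.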
Summary
While OpenAI's models are impressive in general, they underperform specialized Africa-centric and multilingual models on both straightforward and complex linguistic tasks, underscoring the substantial value of context-specific AI development.
