Welcome to Issue #77 of One Minute AI, your daily AI news companion. This issue discusses a recent announcement from NVIDIA.
Introducing ChatQA-2
NVIDIA's new model, Llama3-ChatQA-2-70B, rivals GPT-4-Turbo in handling long contexts of up to 128,000 tokens and excels at retrieval-augmented generation (RAG) tasks. Built on Meta's Llama3, the model is competitive across long-context, medium-length, and short-context tasks. The researchers extended the context window from 8,000 to 128,000 tokens with a two-step approach: continued pre-training followed by a three-stage instruction-tuning process. Llama3-ChatQA-2-70B outperformed many state-of-the-art models, marking a significant advance for open-source language models.
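To make the RAG setup mentioned above concrete, here is a minimal sketch of the retrieve-then-generate pattern. The toy corpus, the keyword-overlap retriever, and the prompt layout are illustrative assumptions, not NVIDIA's actual pipeline; a real system would use a learned retriever and feed the prompt to the model.

```python
# Minimal RAG sketch: retrieve relevant passages, then build a prompt that
# a long-context model would answer from. Retriever and corpus are toy stand-ins.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by word overlap with the query and return the top k."""
    q_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Prepend retrieved passages as context for the model to answer from."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "ChatQA-2 extends the context window to 128,000 tokens.",
    "The model is built on Meta's Llama3.",
    "Bananas are rich in potassium.",
]
passages = retrieve("context window ChatQA-2", corpus)
prompt = build_prompt("What context window does ChatQA-2 support?", passages)
print(prompt)
```

The irrelevant banana passage scores zero overlap and is ranked last, so only the on-topic passages reach the prompt.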
In evaluations, Llama3-ChatQA-2-70B edged out GPT-4-Turbo on InfiniteBench long-context tasks, averaging 34.11 to GPT-4-Turbo's 33.16. On medium-length tasks within 32,000 tokens it scored 47.37, slightly below GPT-4-Turbo's 51.93, and on short-context tasks within 4,000 tokens it achieved 54.81, again outperforming GPT-4-Turbo. This development highlights the potential of open-source models to match the capabilities of proprietary ones, contributing valuable technical recipes and evaluation benchmarks to the community.
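The head-to-head numbers quoted above can be summarized in a few lines. The figures are the ones reported in this issue; GPT-4-Turbo's short-context score was not given here, so that row is omitted rather than guessed.

```python
# Reported average scores from the ChatQA-2 evaluation, as quoted in this issue.
# Tuples are (Llama3-ChatQA-2-70B, GPT-4-Turbo).
scores = {
    "long-context (InfiniteBench)": (34.11, 33.16),
    "medium (within 32K tokens)": (47.37, 51.93),
}

for task, (chatqa, gpt4t) in scores.items():
    diff = chatqa - gpt4t
    leader = "ChatQA-2" if diff > 0 else "GPT-4-Turbo"
    print(f"{task}: {leader} leads by {abs(diff):.2f} points")
```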
Want to help?
If you liked this issue, help spread the word and share One Minute AI with your peers and community.
You can also join our chat on Substack to share feedback and to suggest AI news you’d like to see featured.