Welcome to Issue #17 of One Minute AI, your daily AI news companion. This issue covers extended-context versions of Llama-3 released by Gradient and Abacus.AI.
Gradient releases the Llama-3 8B Gradient Instruct 1048k model
Gradient has released its Llama-3 8B Gradient Instruct 1048k model, which extends Llama-3 8B's context length from 8K tokens to over 1048K.
The model was trained with the EasyContext Blockwise RingAttention library, which makes training on contexts of up to 1048K tokens scalable and efficient, on Crusoe Energy's high-performance L40S cluster.
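If you want to try the model yourself, a minimal sketch along these lines should work with the Hugging Face transformers library, assuming the repo id gradientai/Llama-3-8B-Instruct-Gradient-1048k (check the model card for the exact id) and enough GPU memory for the context you feed it:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repo id; verify against the model card.
model_id = "gradientai/Llama-3-8B-Instruct-Gradient-1048k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # long contexts need plenty of memory
    device_map="auto",
)

# A long-document prompt; the extended model accepts inputs far beyond
# the original 8K-token window, memory permitting.
prompt = "Summarize the following report:\n" + "..."  # your long document here
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
))
```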
Abacus.AI presents its longer-necked variant of Llama-3 70B
Abacus.AI has released an extended version of Llama-3 70B with an effective context length of approximately 128k.
The training methodology combines PoSE (Positional Skip-wise Training) with dynamic-NTK interpolation; the model was trained on roughly 1B tokens across eight H100 GPUs with DeepSpeed ZeRO Stage 3.
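Dynamic-NTK interpolation rescales the rotary (RoPE) base frequency as the input grows past the pretraining window, so position encodings are interpolated rather than extrapolated. Here is a minimal sketch of that rescaling rule, following the variant implemented in Hugging Face transformers' dynamic NTK rotary embedding; the scaling factor and sequence length below are illustrative, and PoSE itself is a separate fine-tuning recipe not shown here:

```python
def dynamic_ntk_base(base: float, dim: int, seq_len: int,
                     orig_max_pos: int, factor: float) -> float:
    """Return the rescaled RoPE base frequency for a given sequence length."""
    if seq_len <= orig_max_pos:
        return base  # within the pretraining window: no change
    scale = (factor * seq_len / orig_max_pos) - (factor - 1)
    return base * scale ** (dim / (dim - 2))

# Example with Llama-3's defaults (rope base 500000, head dim 128,
# 8192-token pretraining window); factor=4.0 is an illustrative value.
print(dynamic_ntk_base(500_000.0, 128, 131_072, 8_192, factor=4.0))
```

The key property is that the base only changes once the input exceeds the original window, which is why this variant is called "dynamic": short inputs keep the exact pretraining position encodings.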
Want to help?
If you liked this issue, help spread the word and share One Minute AI with your peers and community.
You can also share feedback, as well as news from the AI world that you’d like to see featured, by joining our chat on Substack.