Issue #30: Vision model added to Phi-3 family
Microsoft introduces Phi-3-vision at Microsoft Build
Welcome to Issue #30 of One Minute AI, your daily AI news companion. This issue discusses the new Phi-3-vision model announced by Microsoft.
Microsoft adds a vision model to the Phi-3 family
At this year's Microsoft Build, Microsoft unveiled Phi-3-vision, the first multimodal model in the Phi-3 series. The model accepts both text and images as input, enabling it to analyze real-world images and to extract and interpret text within them. With 4.2 billion parameters and a context length of 128,000 tokens, it is designed for broad commercial and research use in English.
The Phi-3-vision model supports general-purpose AI systems and applications that require both visual and text input, particularly:

- memory- and compute-constrained environments
- latency-sensitive scenarios
- general image understanding
- optical character recognition (OCR)
- understanding of charts and tables
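For readers who want to try the model, a minimal sketch of the chat-prompt format Phi-3-vision expects is below. The template (numbered `<|image_k|>` placeholders inside a `<|user|>` turn) follows the publicly documented model card and is an assumption on our part; the Build announcement itself does not specify it.

```python
# Sketch of the Phi-3-vision chat-prompt format (assumed from the
# public model card, not stated in the announcement). Each attached
# image is referenced by a numbered <|image_k|> placeholder.

def build_prompt(question: str, n_images: int = 1) -> str:
    """Return a Phi-3-vision-style prompt referencing n_images images."""
    image_tags = "".join(f"<|image_{k}|>\n" for k in range(1, n_images + 1))
    return f"<|user|>\n{image_tags}{question}<|end|>\n<|assistant|>\n"

print(build_prompt("What does this chart show?"))
```

In practice, a prompt like this is paired with the actual image(s) and passed to the model, e.g. via Hugging Face `transformers` using `AutoProcessor` and `AutoModelForCausalLM` with `trust_remote_code=True` (again an assumption about tooling, not part of the announcement).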
Want to help?
If you liked this issue, help spread the word and share One Minute AI with your peers and community.
You can also share feedback with us, or AI news you’d like to see featured, by joining our chat on Substack.