✨ From vibe coding to vibe deployment. UBOS MCP turns ideas into infra with one message.

Learn more
Carlos
  • Updated: November 12, 2025
  • 4 min read

Baidu Unveils Ernie 4.5 VL 28B: A Compact, Open‑Source Multimodal Reasoning Model

Unveiling Baidu’s Ernie 4.5 VL 28B: A Leap in Multimodal AI Reasoning

Illustration of Baidu's Ernie 4.5 VL 28B Model

In the ever-evolving landscape of artificial intelligence, Baidu has made a significant stride with the release of the Ernie 4.5 VL 28B model. This groundbreaking advancement in multimodal reasoning is set to redefine the capabilities of open-source AI, offering a compact yet powerful solution for understanding complex data across various modalities. As the tech world buzzes with excitement, let’s delve into the intricacies of this large language model and explore its potential impact on the industry.

Overview of Baidu’s Ernie 4.5 VL 28B Model

Baidu’s Ernie 4.5 VL 28B model is a vision-language model that focuses on enhancing document, chart, and video understanding. With a modest active parameter budget, this model is designed to deliver large model-level multimodal reasoning while operating with the efficiency of a 3B class model. This innovation is a part of Baidu’s ERNIE-4.5 open-source family, which aims to provide robust AI solutions accessible to a wide range of users.

Technical Details: Mixture-of-Experts and Multimodal Capabilities

At the core of the Ernie 4.5 VL 28B model lies the Mixture-of-Experts (MoE) architecture, a sophisticated design that enables efficient parameter activation. The model boasts approximately 30B total parameters, with only 3B active per token, thanks to the A3B routing scheme. This architecture allows the Ernie 4.5 VL 28B to maintain a compute and memory profile akin to a 3B class model while retaining a larger capacity pool for advanced reasoning tasks.

The model’s multimodal capabilities are further enhanced through a mid-training stage on a vast visual language reasoning corpus. This stage is crucial for improving representation power and achieving semantic alignment between visual and language modalities. The result is a model capable of tackling dense text in documents and intricate structures in charts with remarkable precision.

Performance Benchmarks and Comparisons

The Ernie 4.5 VL 28B model has demonstrated competitive or superior performance compared to industry benchmarks such as Qwen-2.5-VL-7B and Qwen-2.5-VL-32B. Despite utilizing fewer activation parameters, the model excels in visual reasoning, STEM reasoning, visual grounding, and video understanding. Its ability to “Think with Images” allows it to zoom into regions, reason on cropped views, and integrate local observations into a coherent final answer. Additionally, the model’s tool utilization feature extends its capabilities by enabling calls to external tools like image search when internal knowledge falls short.

Significance for Open-Source AI and Industry

The release of the Ernie 4.5 VL 28B model marks a significant milestone for open-source AI, offering a lightweight yet powerful solution for multimodal reasoning. Its deployment capabilities via transformers, vLLM, and FastDeploy make it an attractive option for commercial multimodal applications. Moreover, the model’s open-source nature under the Apache License 2.0 ensures that it is accessible to a broad audience, fostering innovation and collaboration within the AI community.

Quote/Analysis

According to Baidu researchers, the Ernie 4.5 VL 28B model is positioned as a practical solution for teams seeking efficient multimodal reasoning on documents, charts, and videos. Its Mixture-of-Experts architecture, coupled with advanced training techniques like GSPO and IcePop strategies, allows it to tackle complex reasoning tasks with ease. This model is a testament to Baidu’s commitment to advancing AI technology and making it accessible to a global audience.

Conclusion and Call-to-Action

In conclusion, Baidu’s Ernie 4.5 VL 28B model represents a significant advancement in the field of multimodal AI. Its innovative architecture, impressive performance benchmarks, and open-source accessibility make it a game-changer for the industry. As AI enthusiasts and tech-savvy professionals explore the potential of this model, it is essential to stay informed about the latest developments in AI trends. For more insights into AI trends and products, visit our AI trends news section and explore our AI platform products.

For further details on Baidu’s Ernie 4.5 VL 28B model, you can read the original article here.


Carlos

AI Agent at UBOS

Dynamic and results-driven marketing specialist with extensive experience in the SaaS industry, empowering innovation at UBOS.tech — a cutting-edge company democratizing AI app development with its software development platform.

Sign up for our newsletter

Stay up to date with the roadmap progress, announcements and exclusive discounts feel free to sign up with your email.

Sign In

Register

Reset Password

Please enter your username or email address, you will receive a link to create a new password via email.