DeepSeek launches Janus-Pro-7B model, outperforms OpenAI’s DALL-E 3 and Stable Diffusion
While attention is fixed on DeepSeek’s R1 model, the Chinese AI startup has unveiled another open-source AI model: Janus-Pro-7B. According to DeepSeek, this multimodal model, which can generate images as well as interpret them, outperforms OpenAI’s DALL-E 3 and Stable Diffusion on the GenEval and DPG-Bench benchmarks.
The news of the launch comes as the Chinese AI startup triggered a massive sell-off in the U.S. stock market, wiping out a staggering $1 trillion in market capitalization, as investors question the sustainability of AI chip spending.
DeepSeek describes Janus-Pro as an innovative autoregressive framework that integrates multimodal understanding and generation. It overcomes the limitations of previous models by separating the visual encoding into distinct pathways, yet still relies on a single unified transformer architecture for processing.
“Janus-Pro is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation. Janus-Pro is constructed based on the DeepSeek-LLM-1.5b-base/DeepSeek-LLM-7b-base,” DeepSeek said on its launch page.
This decoupling approach not only resolves the conflict between the visual encoder’s roles in understanding and generation but also boosts the model’s flexibility. Janus-Pro goes beyond the performance of previous unified models and matches or even outperforms task-specific models. With its simplicity, flexibility, and effectiveness, Janus-Pro stands out as a leading contender for next-generation multimodal models.
Built on DeepSeek’s base language models (DeepSeek-LLM-1.5b-base/DeepSeek-LLM-7b-base), Janus-Pro is a unified multimodal large language model (MLLM) that uses separate visual encoding paths for understanding and generation. For multimodal understanding, it uses the SigLIP-L vision encoder, which accepts 384 x 384 image inputs. For image generation, Janus-Pro uses a specialized tokenizer with a downsample rate of 16.
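To make the downsample rate concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the generation tokenizer works on a 384 x 384 image (the article states that size only for the understanding encoder) and that a downsample rate of 16 means each token covers a 16 x 16 pixel patch. The helper name token_grid is hypothetical and is not part of DeepSeek’s code.

```python
def token_grid(image_size: int, downsample_rate: int) -> tuple[int, int]:
    """Return (tokens_per_side, total_tokens) for a square image.

    Assumption: each token covers a downsample_rate x downsample_rate
    pixel patch, so the token grid is image_size / downsample_rate wide.
    """
    if image_size <= 0 or downsample_rate <= 0:
        raise ValueError("image_size and downsample_rate must be positive")
    if image_size % downsample_rate != 0:
        raise ValueError("image_size must be divisible by downsample_rate")
    side = image_size // downsample_rate
    return side, side * side


# Assumed generation resolution of 384 x 384 with the stated rate of 16.
side, total = token_grid(384, 16)
print(f"{side} x {side} grid, {total} tokens")  # 24 x 24 grid, 576 tokens
```

Under those assumptions, one generated image corresponds to a 24 x 24 grid, or 576 discrete tokens for the transformer to predict.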
The Janus-Pro code is released under the MIT License, while use of the models themselves is governed by the DeepSeek Model License.
Just last week, DeepSeek launched DeepSeek-R1, a reasoning model positioned as a compelling alternative to OpenAI’s o1 model. This open-source option is gaining traction among developers for its affordability and performance on key benchmarks.