Alibaba launches AI models that understand images and have more complex conversations
The artificial intelligence (AI) space is heating up. Just yesterday, South Korea’s Naver announced the launch of HyperClova X, a new generative AI service to compete with ChatGPT. Now, Chinese internet giant Alibaba is unveiling two open-source AI models that can understand images and hold more complex conversations.
Alibaba unveiled the new models on Friday, saying they are designed to comprehend images and engage in more intricate conversations than its previous offerings. The release comes at a time of intense global competition for technological leadership.
The Chinese tech powerhouse said the two new models, called Qwen-VL and Qwen-VL-Chat, will be released as open-source tools, meaning researchers, educators, and businesses around the world can use them to build their own AI applications without having to train systems from scratch, an approach that saves both time and money.
The news comes just a month after Alibaba launched Tongyi Wanxiang, an AI image-generation tool that competes with OpenAI’s DALL-E and with Midjourney. Tongyi Wanxiang, launched by Alibaba’s cloud division, lets users input text prompts in either Chinese or English and generates corresponding images in a range of styles, such as sketches or 3D cartoons. For now, the tool is available in beta only to enterprise customers in China.
The two new AI language models were also developed by the company’s cloud unit, Alibaba Cloud. According to the tech giant, Qwen-VL is a more advanced evolution of its 7-billion-parameter model, Tongyi Qianwen. It accepts both image and text prompts and can carry out multiple tasks at once, from answering open-ended questions about different images to generating captions for them.
But the real star of the show is Qwen-VL-Chat. This model handles more intricate interactions, such as comparing multiple images and answering several rounds of questions. Alibaba also says it can write stories, create images based on photos users submit, and even solve math problems presented in pictures.
One example Alibaba gave involves a photo of a hospital sign written in Chinese: Qwen-VL-Chat can read the sign and answer questions about where the different hospital departments are located.
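For developers curious how an open-source model like this might be used in practice, here is a minimal sketch of a multi-round, image-grounded conversation. It assumes the checkpoint is published on Hugging Face under the name Qwen/Qwen-VL-Chat and that it ships its own chat helpers loaded via trust_remote_code; the image file and questions are hypothetical examples, not details from Alibaba’s announcement.

```python
# Minimal sketch: multi-round image question answering with Qwen-VL-Chat.
# Assumes the model is available as "Qwen/Qwen-VL-Chat" and exposes the
# custom from_list_format/chat helpers when loaded with trust_remote_code.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen-VL-Chat"  # assumed model name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", trust_remote_code=True
).eval()

# First round: an open-ended question about an image (hypothetical file path).
query = tokenizer.from_list_format([
    {"image": "hospital_sign.jpg"},   # e.g. a photo of a hospital sign
    {"text": "Which departments are listed on this sign?"},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)

# Second round: a follow-up question that relies on the earlier context.
response, history = model.chat(
    tokenizer, "Which floor is the radiology department on?", history=history
)
print(response)
```

The history object returned by each call carries the earlier exchange forward, which is what allows the follow-up question to refer back to the sign without re-attaching the image.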
Until now, much of AI’s “genius” has centered on text. But that is starting to change. Qwen-VL-Chat and the latest version of OpenAI’s ChatGPT can respond to images with text, a sign that AI is learning to speak a new visual language.