Alibaba Cloud launches open-source vision language model

Alibaba Cloud has introduced two open-source large vision language models (LVLMs): Qwen-VL and Qwen-VL-Chat. The models understand images, text, and prompts, and support multi-round question answering in English and Chinese. By sharing the models with the open-source community and commercial institutions, Alibaba Cloud aims to democratize AI technology.

Facts

  • Alibaba Cloud launches two open-source large vision language models (LVLMs), Qwen-VL and Qwen-VL-Chat.
  • Qwen-VL is a multimodal model that understands both image inputs and text prompts in English and Chinese, and can perform tasks such as answering open-ended queries and generating image captions.
  • Qwen-VL-Chat supports more complex interactions, such as comparing multiple images and multi-round question answering, and also demonstrates creative capabilities.
  • Alibaba Cloud has shared the models’ code, weights, and documentation with academics, researchers, and commercial institutions worldwide to democratize AI technologies.
  • The models are available for commercial use via Alibaba’s AI model community ModelScope and via Hugging Face.
  • The models could change how people interact with visual content, for example by helping visually impaired individuals while shopping online.
  • Qwen-VL handles image input at a resolution of 448×448, which Alibaba Cloud says improves image recognition and comprehension.
  • Qwen-VL performed strongly on a range of visual language tasks and benchmarks.
  • Qwen-VL-Chat achieved leading results on benchmarks for text-image dialogue and alignment with human preferences.
  • Alibaba Cloud previously open-sourced Qwen-7B and Qwen-7B-Chat, two 7-billion-parameter LLMs, with over 400,000 downloads within a month of their launch.
