Google Introduces PaliGemma, A New Visual Language Model
Written by Kay Ewbank
Monday, 20 May 2024
Last week's Google I/O saw the introduction of PaliGemma, an open vision-language model (VLM), together with some details of what's coming in Gemma 2. Gemma is Google's family of lightweight open models, built from the same research and technology used to create Google's Gemini models. The existing Gemma models are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants.
PaliGemma is described as a powerful open VLM inspired by PaLI-3. The model is built on open components, including the SigLIP vision model and the Gemma language model, and is designed for a wide range of vision-language tasks, including image and short video captioning, visual question answering, understanding text in images, object detection, and object segmentation. Google is providing both pre-trained and fine-tuned checkpoints at multiple resolutions, as well as checkpoints specifically tuned to a mixture of tasks for immediate exploration.
PaliGemma is being released through various platforms and resources, with free options including Kaggle and Colab notebooks. Academic researchers seeking to push the boundaries of vision-language research can also apply for Google Cloud credits to support their work. The model joins CodeGemma and RecurrentGemma, which were released earlier in the year.
Google also shared some information about Gemma 2, which is based on a new architecture and will be available in new sizes. These include a 27B-parameter model that Google says outperforms models twice its size and runs on a single TPUv5e, delivering performance comparable to Llama 3 70B at less than half the size. According to Google, the updated design means it can fit on less than half the compute of comparable models, and the 27B model is optimized to run on NVIDIA's GPUs. Google also says Gemma 2 will provide developers with "robust tuning capabilities across a diverse ecosystem of platforms and tools".
PaliGemma is available now.
Last Updated ( Monday, 20 May 2024 )