
What is CoDi, Microsoft's AI to generate text, images, audio, videos all at once?

Microsoft's CoDi is presented as the first model able to analyse and generate several types of content simultaneously and combine them into one coherent result.


Microsoft is building an all-at-once AI model called CoDi (Composable Diffusion) in an effort to broaden what generative AI can do. CoDi could change how we interact with computers, since it can analyse and create several media types at the same time, including text, images, video, and audio.

CoDi is based on a novel approach that creates a shared multimodal space, allowing related modalities to be generated in sync with one another, such as video with matching audio.

This capability addresses an earlier concern: unimodal streams that are generated independently are often inconsistent when combined. A latent diffusion model (LDM) was first trained individually for each modality, giving strong single-modality generation. The inputs were then projected into the same semantic space, enabling the LDM of each modality to work from any combination of simultaneous inputs.


CoDi's main innovation is its ability to handle many-to-many generation, producing any mix of output modalities from any mix of inputs. It accomplishes this without having to train on every possible combination of modalities, by pairing a cross-attention module in each generator with an environment encoder that maps the different models into the shared space.
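To see why avoiding every combination matters, a short Python sketch can count the cases. It is only an illustration of the arithmetic, not CoDi's training procedure: the function names are hypothetical, and the choice of text as a single "bridge" modality that the others are aligned to is an assumption made here for the sake of the comparison.

```python
from itertools import combinations

# The four media types named in the article.
MODALITIES = ["text", "image", "video", "audio"]


def non_empty_subsets(items):
    """Return every non-empty subset of items as a tuple."""
    return [
        subset
        for size in range(1, len(items) + 1)
        for subset in combinations(items, size)
    ]


def count_many_to_many_tasks(items):
    """Count (input set, output set) pairs if each were handled separately."""
    subsets = non_empty_subsets(items)
    return len(subsets) * len(subsets)


def count_bridge_pairings(items, bridge="text"):
    """Count pairings when every other modality is aligned to one bridge.

    Assumption for illustration: a single bridge modality (text here).
    """
    return len([m for m in items if m != bridge])


if __name__ == "__main__":
    print("separate input/output combinations:", count_many_to_many_tasks(MODALITIES))
    print("pairings with one bridge modality:", count_bridge_pairings(MODALITIES))
```

With four modalities there are 15 non-empty input sets and 15 non-empty output sets, so treating each pairing as its own task would mean 225 cases, while aligning everything to one shared reference grows only with the number of modalities.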

Capabilities of CoDi

In demonstrations, CoDi produced synchronised video and audio output from a combination of text, audio, and image prompts. This shows that CoDi can combine data from many sources and return coherent, aligned results.

These capabilities open up a wide range of practical uses, particularly in accessible technology and education. CoDi can provide dynamic, engaging content that supports different learning styles and improves access for people with disabilities. It is expected to improve human-computer interaction considerably and to mark a new stage for generative AI.
