A brief history of diffusion, the technology at the heart of modern image-generating AI
Text-to-image AI has exploded this year as technological advances have vastly improved the fidelity of the art AI systems can create. As controversial as systems like Stable Diffusion and OpenAI’s DALL-E 2 are, platforms like DeviantArt and Canva have adopted them to power creative tools, personalize branding, and even develop new products.
But the technology at the heart of these systems can create far more than art. Called diffusion, it’s used by some intrepid research groups to produce music, synthesize DNA sequences, and even discover new drugs.
So what exactly is diffusion, and why is it such a huge leap over the prior state of the art? As the year draws to a close, it’s worth taking a look at diffusion’s origins and how it evolved over time into the influential force it is today. The story of diffusion is not over – the techniques are refined every month – but the last year or two has seen particularly notable advances.
The Birth of Diffusion
You may remember the wave of deepfake apps a few years ago – apps that inserted portraits of people into existing images and videos to create realistic-looking substitutions of the original subjects. Using AI, the apps would “insert” a person’s face – or in some cases their entire body – into a scene, often convincingly enough to fool someone at first glance.
Most of these apps relied on an AI technology called Generative Adversarial Networks, or GANs for short. GANs consist of two parts: a generator that creates synthetic samples (e.g. images) from random data, and a discriminator that tries to distinguish the synthetic samples from real samples in a training data set. (Typical GAN training datasets consist of hundreds to millions of examples of the things the GAN is expected to eventually capture.) Both the generator and the discriminator improve their respective abilities until the discriminator can no longer tell the real examples from the synthesized ones any better than the 50% accuracy expected from chance.
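To make that adversarial loop concrete, here is a minimal sketch in PyTorch. The tiny networks, random stand-in data, and hyperparameters are illustrative assumptions, not a real image GAN:

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64
# Toy generator and discriminator; real image GANs use convolutional networks.
generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(32, data_dim)   # stand-in for a batch of real training samples
    fake = generator(torch.randn(32, latent_dim))

    # Discriminator step: label real samples 1, synthetic samples 0.
    # detach() keeps this update from flowing gradients back into the generator.
    d_loss = bce(discriminator(real), torch.ones(32, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(32, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: try to make the discriminator score fakes as real.
    g_loss = bce(discriminator(fake), torch.ones(32, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```

The two losses pull in opposite directions, which is exactly the simultaneous tug-of-war that, as described below, makes GAN training hard to stabilize.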
Harry Potter and Hogwarts sand sculptures generated by Stable Diffusion. Image credit: Stability AI
For example, powerful GANs can create snapshots of fictitious apartment buildings. StyleGAN, a system Nvidia developed a few years ago, can generate high-resolution headshots of fictional people by learning attributes such as facial pose, freckles, and hair. Beyond image generation, GANs have been applied to the 3D modeling space and to vector sketches, showing an ability to output video clips as well as speech, and even to loop instrument samples into songs.
In practice, however, GANs suffered from a number of shortcomings owing to their architecture. Simultaneous training of the generator and discriminator models was inherently unstable; sometimes the generator would “collapse” and output many similar-looking samples. GANs also required lots of data and processing power to run and train, making them difficult to scale.
Enter diffusion.
How diffusion works
Diffusion draws its name from physics: the process by which something moves from a region of higher concentration to one of lower concentration, like a sugar cube dissolving in coffee. Grains of sugar in coffee initially concentrate at the top of the liquid but gradually disperse.

Diffusion systems borrow specifically from diffusion in non-equilibrium thermodynamics, where the process increases the entropy – or randomness – of the system over time. Imagine a gas – it eventually spreads out and evenly fills an entire space through random movement. Likewise, data such as images can be transformed into a uniform distribution by randomly adding noise.
Diffusion systems slowly destroy the structure of data by adding noise until only noise is left.

In physics, diffusion is spontaneous and irreversible – sugar diffused in coffee cannot be reassembled into a cube. Diffusion systems in machine learning, however, aim to learn a kind of “reverse diffusion” that undoes the destruction, giving them the ability to recover data from noise.
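In code, the forward (noise-adding) half of that process is surprisingly simple. Below is a minimal sketch in the style of DDPM-type diffusion models; the linear noise schedule and tensor shapes are illustrative assumptions, not the recipe of any particular system:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # noise schedule (illustrative values)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0) # cumulative signal retained at each step

def add_noise(x0, t):
    """Jump straight to step t of the forward process (it has a closed form)."""
    noise = torch.randn_like(x0)
    signal = alphas_cumprod[t].sqrt()
    sigma = (1.0 - alphas_cumprod[t]).sqrt()
    return signal * x0 + sigma * noise, noise

x0 = torch.randn(1, 3, 64, 64)                # stand-in for an image tensor
x_noisy, target_noise = add_noise(x0, t=999)  # by the final step, almost pure noise

# Training the reverse process amounts to teaching a network to predict
# `target_noise` given `x_noisy` and t, then subtracting it out step by step.
```

The key design point is that destruction is cheap and exact, while the learned reverse direction is where all of the model’s capacity goes.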
Image credit: OpenBioML
Diffusion systems have been around for almost a decade. But a relatively recent innovation from OpenAI called CLIP (short for Contrastive Language-Image Pre-Training) made them much more practical in everyday applications. CLIP classifies data—images, for example—to “score” each step of the diffusion process based on how likely it is to be classified under a given text prompt (e.g., “a sketch of a dog on a flowering lawn”).
Initially, the data has a very low CLIP score because it is mostly noise. But as the diffusion system reconstructs data from the noise, it slowly comes closer to matching the prompt. A useful analogy is uncut marble: just as a master sculptor tells a novice where to carve, CLIP guides the diffusion system toward an image that yields a higher score.
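A rough sketch of how such guidance can work: take the gradient of the CLIP similarity score with respect to the current image and nudge each denoising step in that direction. The stub encoder, toy denoiser, and `guidance_scale` value below are hypothetical placeholders; a real system would load pretrained CLIP image/text towers and a trained diffusion model:

```python
import torch
import torch.nn.functional as F

# Stand-ins for CLIP's pretrained encoders (placeholders, not the real model).
image_encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 512))
text_embedding = torch.randn(1, 512)  # pretend encoding of "a sketch of a dog on a flowering lawn"

def clip_score(image):
    """Cosine similarity between image and prompt embeddings: the 'score'."""
    return F.cosine_similarity(image_encoder(image), text_embedding).mean()

def guided_step(x, denoise_fn, guidance_scale=1.0):
    x = x.detach().requires_grad_(True)
    grad = torch.autograd.grad(clip_score(x), x)[0]  # direction of a higher score
    return (denoise_fn(x) + guidance_scale * grad).detach()  # denoise, then nudge toward the prompt

x = torch.randn(1, 3, 64, 64)                      # start from pure noise
x = guided_step(x, denoise_fn=lambda z: 0.98 * z)  # toy denoiser, repeated over many steps in practice
```

In effect, the prompt never touches the denoiser directly here; it only steers each step, which is what makes the sculptor analogy apt.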
OpenAI introduced CLIP alongside the image-generating system DALL-E. It has since found its way into DALL-E’s successor, DALL-E 2, as well as open-source alternatives such as Stable Diffusion.
What can diffusion do?
So what can CLIP-guided diffusion models achieve? Well, as mentioned, they’re pretty good at creating art – from photorealistic images to sketches, drawings, and paintings in the style of just about any artist. In fact, there is evidence that they problematically regurgitate some of their training data.
But the models’ talent — controversial as it may be — doesn’t end there.
Researchers have also experimented with using guided diffusion models to compose new music. Harmonai, an organization funded by Stability AI, the London-based startup behind Stable Diffusion, released a diffusion-based model that can output clips of music after training on hundreds of hours of existing songs. More recently, developers Seth Forsgren and Hayk Martiros created a hobby project called Riffusion, which uses a diffusion model cleverly trained on spectrograms – visual representations of audio – to produce ditties.
Beyond the music realm, several labs are attempting to apply diffusion technology to biomedicine in hopes of discovering new treatments for diseases. Startup Generate Biomedicines and a team from the University of Washington trained diffusion-based models to create designs for proteins with specific properties and functions, MIT Tech Review reported earlier this month.
The models work in different ways. Generate Biomedicines’ model adds noise by unraveling the chains of amino acids that make up a protein, then stitches random chains together to form a new protein, guided by constraints set by the researchers. The University of Washington’s model, on the other hand, starts with a scrambled structure and uses information about how the pieces of a protein should fit together, provided by a separate AI system trained to predict protein structure.
Credit: PASIEKA/SCIENCE PHOTO LIBRARY/Getty Images
The models have already achieved some success. The one developed by the University of Washington group was able to find a protein that binds to parathyroid hormone – the hormone that controls blood calcium levels – better than existing drugs.
Meanwhile, researchers at OpenBioML, a Stability AI-backed effort to bring machine learning approaches to biochemistry, have engineered a system called DNA Diffusion to generate cell-type-specific regulatory DNA sequences – segments of nucleic acid molecules that influence the expression of certain genes in an organism. If everything goes according to plan, DNA Diffusion will generate regulatory DNA sequences from text prompts such as “A sequence that activates a gene in cell type X to its maximum expression level” and “A sequence that activates a gene in the liver and heart, but not in the brain.”
What might the future hold for diffusion models? The sky may well be the limit. Researchers have already used them to generate videos, compress images, and synthesize speech. That’s not to say diffusion won’t eventually be replaced by a more efficient, more performant machine learning technique, just as GANs were by diffusion. But it’s the architecture of the day for a reason; diffusion is nothing if not versatile.