
Benj Edwards / Ars Technica
Last week, Swiss software engineer Matthias Bühlmann discovered that the popular image synthesis model Stable Diffusion can, albeit with significant caveats, compress existing bitmap images with fewer visual artifacts than JPEG or WebP at high compression ratios.
Stable Diffusion is an AI image synthesis model that typically creates images based on text descriptions (called "prompts"). The AI model learned this skill by studying millions of images pulled from the Internet. During the training process, the model makes associations between images and related words, creates a smaller representation of the key information about each image, and stores that information as "weights," which are mathematical values that represent what the AI image model knows, so to speak.
Once images are "compressed" into weight form, they reside in what is called "latent space," meaning they exist as a kind of fuzzy potential that can be transformed into images once decoded. With Stable Diffusion 1.4, the weights file is about 4GB, but it represents knowledge of hundreds of millions of images.

While most people use Stable Diffusion with text prompts, Bühlmann cut out the text encoder and instead forced his images through Stable Diffusion's image encoder, which takes a full-resolution 512×512 image and converts it into a lower-resolution 64×64 latent-space representation. At that point, the image exists at a much smaller data size than the original, but it can still be expanded (decoded) back into a 512×512 image with fairly good results.
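The size reduction described above can be sketched with simple arithmetic. The 64×64, 4-channel latent shape comes from Stable Diffusion v1's autoencoder; the byte counts below are illustrative back-of-the-envelope figures, not Bühlmann's exact pipeline (his write-up describes additional steps to reach the final file sizes):

```python
# Rough size arithmetic for the 512x512 image -> 64x64 latent conversion.
# Assumes Stable Diffusion v1's VAE: an 8x spatial downsample to a
# 4-channel latent tensor.

def rgb_bytes(width, height, channels=3, bytes_per_sample=1):
    """Uncompressed bitmap size in bytes (8 bits per sample)."""
    return width * height * channels * bytes_per_sample

def latent_bytes(width=64, height=64, channels=4, bytes_per_sample=4):
    """Latent tensor size in bytes; float32 samples by default."""
    return width * height * channels * bytes_per_sample

original = rgb_bytes(512, 512)                 # 786,432 bytes raw
latent_f32 = latent_bytes()                    # 65,536 bytes as float32
latent_u8 = latent_bytes(bytes_per_sample=1)   # 16,384 bytes if quantized to 8 bits

print(original, latent_f32, latent_u8)
print(f"{original // latent_u8}x smaller than the raw bitmap")  # 48x
```

Even the float32 latent is a fraction of the raw bitmap's size; quantizing it further (as discussed below) shrinks it again before any entropy coding is applied.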
In his tests, Bühlmann found that images compressed with Stable Diffusion looked subjectively better at higher compression ratios (smaller file sizes) than JPEG or WebP. In one example, he shows a photo of a candy shop compressed to 5.68KB using JPEG, 5.71KB using WebP, and 4.98KB using Stable Diffusion. The Stable Diffusion image appears to have more resolved detail than the images compressed in the other formats, with fewer obvious compression artifacts.
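Getting from a float32 latent down to a few kilobytes requires lossy quantization of the latent values. The sketch below is a generic uniform 8-bit quantizer, not Bühlmann's exact scheme; the `quantize_u8`/`dequantize_u8` helpers and the random stand-in latent are hypothetical illustrations:

```python
import numpy as np

def quantize_u8(latents):
    """Uniformly map float latents to uint8, keeping scale/offset for decoding."""
    lo, hi = float(latents.min()), float(latents.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((latents - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize_u8(q, lo, scale):
    """Approximately recover the float latents from the uint8 codes."""
    return q.astype(np.float32) * scale + lo

# Stand-in for a real 4x64x64 latent tensor (not an actual encoded image).
rng = np.random.default_rng(0)
latents = rng.normal(size=(4, 64, 64)).astype(np.float32)

q, lo, scale = quantize_u8(latents)
restored = dequantize_u8(q, lo, scale)
print(q.nbytes)                                   # 16384 bytes vs 65536 float32
print(float(np.abs(latents - restored).max()))    # worst-case rounding error
```

The quantized tensor is a quarter of the float32 size at the cost of a bounded per-value rounding error; a real codec would add entropy coding on top of this to approach the file sizes quoted above.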

Bühlmann’s method currently comes with significant limitations, however: it doesn’t do well with faces or text, and in some cases it can hallucinate detailed features in the decoded image that aren’t present in the source image. (You probably don’t want your image compressor inventing details that don’t exist in the original.) Also, decoding requires the 4GB Stable Diffusion weights file and extra decoding time.
While this use of Stable Diffusion is more of an unconventional and fun hack than a practical solution, it could potentially point toward future uses of image synthesis models. Bühlmann’s code can be found on Google Colab, and more technical details about his experiment appear in his write-up on Towards AI.