Overall, JPEG XL at default cjxl speed outperforms AVIF even when using a very slow libaom setting (s1, >30 times slower). At a more reasonable libaom s7 (about half as fast as default cjxl), the improvement JPEG -> AVIF is comparable to the improvement AVIF -> JPEG XL.
Of course, behind the overall picture there are differences depending on the image contents. For example, for images of sports or rooms, AVIF actually does (slightly) better than JPEG XL (if you don't mind the extra encode time).
For landscapes or portraits, on the other hand, JPEG XL has a clearer advantage.
The video-based image formats (WebP and AVIF) particularly struggle with images containing subtle textures, like clothing or cloudy skies. For those, they can be even worse than (moz)JPEG. Overly aggressive deblocking filters are probably to blame for this.
Encoder consistency is another aspect, very important for deployment. "mozjpeg q80" has an average subjective quality (DMCOS) of 85, with a standard deviation of 5, so it can easily be 80 or 90. More complicated encoders tend to have less consistent results. Except for JPEG XL.
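To make "consistency" concrete, here is a minimal sketch of how you could summarize one encoder setting as a mean and standard deviation over per-image subjective scores. The function name `consistency` and the scores in `example_scores` are made up for illustration; they are not data from the study.

```python
from statistics import mean, pstdev


def consistency(scores):
    """Return (mean, standard deviation) of per-image quality scores.

    A small standard deviation means a given encoder setting lands at
    roughly the same visual quality no matter which image it is fed.
    """
    return mean(scores), pstdev(scores)


# Hypothetical per-image scores for one encoder setting (NOT real data).
example_scores = [80, 85, 90, 85, 80, 90]

avg, spread = consistency(example_scores)
print(f"mean={avg:.1f} stdev={spread:.1f}")
```

Comparing `spread` across encoders at settings with a similar `avg` is one simple way to see which encoder is more predictable.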
If you try to improve encoder consistency, or just evaluate encoders, using objective metrics (as opposed to subjective testing, which is of course much harder to do), be careful which metric you use. Simple metrics correlate only poorly with subjective quality.
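One common way to check how well a metric tracks subjective scores is a rank correlation. Below is a rough sketch of Spearman's rank correlation in plain Python; the helper names (`ranks`, `pearson`, `spearman`) and the two score lists are assumptions for illustration, not the analysis or data behind this thread.

```python
def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for the same five images (NOT real data).
metric_scores = [31.2, 35.0, 33.1, 38.4, 29.9]   # some objective metric
subjective_scores = [62, 80, 85, 88, 55]         # subjective quality

print(f"Spearman rho = {spearman(metric_scores, subjective_scores):.2f}")
```

A value near 1 means the metric orders images the same way viewers do; values much lower than that suggest the metric is not a safe stand-in for subjective testing.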
What about HEIC? And better, proprietary AVIF encoders? We also have data on that. At half the encode speed of cjxl, HEIC (x265) more or less matches JPEG XL. Aurora, at one third the speed of cjxl, matches it at the high end but not at the low end (somewhat surprisingly).
One category the video-codec-derived formats are particularly good at is (lossy) non-photo images: logos, text, diagrams etc. On such images they get excellent results. JXL has some catching up to do there (which it can: there is still significant room for encoder improvements).
Note that hardware encoders (not tested here) will almost certainly perform significantly worse than software encoders, since hw design is inherently about cutting corners. E.g. hw JPEG is a lot worse than mozjpeg, Apple's hw HEIC encode is likely a lot worse than x265, etc.