Some discussion suggests that Windows is simply wrong in its presentation of SDR sRGB content over HDR signals because it does not use the reference display’s 2.2 power function to decode the sRGB content before re-encoding it as an HDR signal. But this isn’t the whole picture. My view is that this extreme position, that the Windows behaviour is absolutely “wrong”, stems from the sRGB standard IEC 61966-2-1:1999 being locked behind a paywall, which makes it difficult for the average person to examine it for themselves. Thankfully, someone posted it in a GitHub thread, so I was able to read it myself and discover that the behaviour Windows exhibits in HDR mode is exactly what the sRGB standard says to do, even though this causes a visually undesirable mismatch with some SDR displays.
tldr; the sRGB standard is a problematic standard.
In section 5.2, the sRGB standard states that transformations from nonlinear piecewise encoded sRGB code values to linear CIE 1931 XYZ values should use the piecewise decoding functions. These linear XYZ values are needed to encode an HDR signal, as HDR does not support direct transmission of nonlinear sRGB code values. Although these decoding transformations are listed under the Encoding transformations section, their result is described in relation to the reference display as follows:
These CIE 1931 XYZ values represent optimum image colorimetry when viewed on the reference display, in the reference viewing conditions, by the reference observer, and as measured on the faceplate of the display, which assumes the absence of any significant veiling glare.
This statement is simply incorrect: a 2.2 power function must be used to transform the piecewise encoded RGB values into CIE XYZ values that represent colorimetry when viewed on the reference display.
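To make the disagreement concrete, here is a minimal sketch (Python with NumPy, purely for illustration) comparing the standard’s piecewise decode against a pure 2.2 power decode. Because CIE XYZ is linear in linear RGB, the ratio between the two decodes for a grey code value carries straight through to XYZ, so comparing the decoded RGB values is enough to show the colorimetric mismatch:

```python
import numpy as np

def srgb_piecewise_decode(v):
    """The section 5.2 piecewise decode: sRGB code values (0..1) -> linear values."""
    v = np.asarray(v, dtype=np.float64)
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)

def power_22_decode(v):
    """What a 2.2 power function reference display physically does to the signal."""
    return np.asarray(v, dtype=np.float64) ** 2.2

cv = np.array([0.05, 0.10, 0.25, 0.50])
ratio = srgb_piecewise_decode(cv) / power_22_decode(cv)
print(ratio)  # ~[2.9, 1.6, 1.07, 0.98]: the two decodes diverge most in the darks
```

The deep shadows are where the disagreement is largest: at a code value of 0.05, the piecewise decode yields roughly three times the luminance of the 2.2 power decode, which is exactly the raised-blacks appearance people report for SDR content in Windows HDR mode on 2.2 power displays.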
To make this incorrect statement even more confusing, the standard itself recognizes the mismatch it creates, and the authors evidently considered it a reasonable trade-off at the time:
One impact of this encoding specification is the creation of a mismatch between theoretical reference display tristimulus values and those generated from the encoding implementation. The advantages of optimising encoding outweigh the disadvantages of this mismatch. A linear portion of the transfer function of the dark-end signal is integrated into the encoding specification to optimise encoding implementations.
All that to say, Windows follows the sRGB standard to the letter when presenting SDR sRGB content over an HDR signal, using the prescribed transformations to attain reference display colorimetry as CIE XYZ values. This also exposes a notable problem with the sRGB standard: the piecewise decoding to CIE XYZ values and the 2.2 power function display decoding are treated as equally correct, even though they produce visibly different, mismatched results.
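As a rough sketch of what this composition looks like in practice, assuming an HDR10 (PQ) output and the sRGB reference display luminance of 80 cd/m² as the SDR white level; the real Windows compositor pipeline (scRGB intermediates, a user-adjustable SDR content brightness) is more involved, but the decode step is the crux:

```python
import numpy as np

def srgb_piecewise_decode(v):
    # The decode the sRGB standard prescribes (and Windows uses), not a 2.2 power.
    v = np.asarray(v, dtype=np.float64)
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)

def pq_encode(luminance):
    # SMPTE ST 2084 (PQ) inverse EOTF: absolute luminance in cd/m2 -> PQ signal.
    m1, m2 = 2610 / 16384, 2523 / 4096 * 128
    c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32
    y = np.clip(np.asarray(luminance, dtype=np.float64) / 10000.0, 0.0, 1.0)
    return ((c1 + c2 * y**m1) / (1 + c3 * y**m1)) ** m2

SDR_WHITE = 80.0  # cd/m2: sRGB reference display luminance; an assumed SDR mapping

sdr_code_values = np.array([0.0, 0.25, 0.5, 1.0])
linear = srgb_piecewise_decode(sdr_code_values)  # prescribed piecewise decode
hdr_signal = pq_encode(linear * SDR_WHITE)       # re-encode for the HDR display
```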
Discussion of Intent
There is a remaining question of intent behind the mismatch between the piecewise encoding function and the 2.2 power function reference display. Is the mismatch an unfortunate compromise or an important and deliberate feature?
Most explanations of why this mismatch was the original intent of the standard seem to rest on the assumption that sRGB is similar to broadcast standards such as BT.709 paired with BT.1886.
Charles Poynton’s book Digital Video and HD: Algorithms and Interfaces gives a great explanation in the Gamma section of why the mismatch between BT.709 encoding and BT.1886 decoding was intentionally introduced: it compensates for the low maximum luminance and dynamic range of consumer displays relative to a bright daylight physical scene. In fact, when a power function equivalent to the BT.709 piecewise function is used for encoding and the BT.1886 2.4 power function is used for decoding, a 1.2 power function mismatch emerges, which underscores the deliberate intent behind it. The ITU has numerous reports describing this important mismatch as the “Opto-Optical Transfer Function (OOTF)” (see ITU-R BT.2408, ITU-R BT.2390, and ITU-R BT.2446).
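The 1.2 figure is easy to reproduce numerically. Here is a small sketch (again Python with NumPy) that fits a pure power function to BT.709 encoding followed by a simplified BT.1886 decode, taken here as a pure 2.4 power with zero black lift; the exact fitted value shifts slightly with the chosen range, but lands near 1.2 over the mid-tones:

```python
import numpy as np

def bt709_oetf(l):
    # BT.709 piecewise encoding: scene-linear light -> video signal.
    l = np.asarray(l, dtype=np.float64)
    return np.where(l < 0.018, 4.5 * l, 1.099 * l ** 0.45 - 0.099)

def bt1886_eotf(v):
    # BT.1886 display decode, simplified to a pure 2.4 power (zero black lift).
    return np.asarray(v, dtype=np.float64) ** 2.4

# Fit y = x^g to the end-to-end (scene -> display) curve over mid-tones,
# via least squares in log-log space.
x = np.linspace(0.1, 1.0, 512)
y = bt1886_eotf(bt709_oetf(x))
g = np.polyfit(np.log(x), np.log(y), 1)[0]
print(f"effective end-to-end exponent: {g:.2f}")  # ~1.2: the deliberate OOTF
```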
But what about sRGB? Unlike the ITU reports, sRGB has no supporting documents suggesting that the 2.2 power function should be used for decoding to reproduce an OOTF or anything equivalent. As outlined at the beginning of this post, the standard seems to treat the mismatch as something with only downsides, and even goes so far as to instruct that the piecewise functions be used for decoding to linear XYZ space to produce the intended stimulus of a reference display. A Standard Default Color Space for the Internet – sRGB likewise describes no benefit to the mismatch between the encoding and reference display functions; the only related discussion concerns displaying BT.709 encoded content on an sRGB reference display, where the mismatch is intended and beneficial because BT.709 encoded content depends on it. It seems that the primary reasons for choosing a 2.2 power function reference display were to support playback of BT.709 content without further processing and to match common consumer displays of the time, not to create a mismatch between sRGB encoded content and the reference display.
Unlike the end-to-end OOTF mismatch of BT.709 with BT.1886, which is equivalent to a 1.2 power, or the mismatch of BT.709 with an sRGB reference display, which is equivalent to a 1.1 power, the end-to-end mismatch of sRGB is negligible when the same power-function approximations are used. And this makes sense: the sRGB standard was designed to cover the display of abstract computer graphics, such as user interfaces and text, rather than primarily the display of bright outdoor physical scenes captured by a camera during a live broadcast.
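Using the usual power-function stand-ins for the piecewise encodings (the BT.709 OETF ≈ a 0.5 power, per Poynton; the sRGB encoding ≈ a 1/2.2 power), the three end-to-end exponents fall out directly; these are approximations for comparison, not normative values:

```python
# Approximate power-function equivalents of the piecewise encoding curves.
gamma_709_enc = 0.5       # BT.709 OETF, per Poynton's approximation
gamma_srgb_enc = 1 / 2.2  # sRGB piecewise encoding

print(f"BT.709 -> BT.1886 (2.4 power):      {gamma_709_enc * 2.4:.2f}")   # ~1.2
print(f"BT.709 -> sRGB display (2.2 power): {gamma_709_enc * 2.2:.2f}")   # ~1.1
print(f"sRGB   -> sRGB display (2.2 power): {gamma_srgb_enc * 2.2:.2f}")  # ~1.0
```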
I believe there is a good argument that the sRGB mismatch is an unfortunate compromise, but comparison with other standards suggests that this 2.2 power function mismatch is not an important and deliberate feature in the way that the BT.709 and BT.1886 pairing is.