METHOD, APPARATUS AND SYSTEM FOR DETERMINING A LUMA VALUE
REFERENCE TO RELATED PATENT APPLICATION(S)
This application claims the benefit under 35 U.S.C. §119 of the filing date of Australian Patent Application No. 2015275320, filed Dec. 23, 2015, hereby incorporated by reference in its entirety as if fully set forth herein.
TECHNICAL FIELD
The present invention relates generally to digital video signal processing and, in particular, to a method, apparatus and system for determining a luma value from 4:4:4 RGB video data.
BACKGROUND
Development of standards for conveying high dynamic range (HDR) and wide colour gamut (WCG) video data, and development of displays capable of displaying HDR video data, is underway. Standards bodies such as the International Organisation for Standardisation/International Electrotechnical Commission Joint Technical Committee 1/Subcommittee 29/Working Group 11 (ISO/IEC JTC1/SC29/WG11), also known as the Moving Picture Experts Group (MPEG), the International Telecommunications Union-Radiocommunication Sector (ITU-R), the International Telecommunications Union-Telecommunication Sector (ITU-T), and the Society of Motion Picture and Television Engineers (SMPTE) are investigating the development of standards for representation and coding of HDR video data. HDR video data covers a wide range of luminance intensities, far beyond that used in traditional standard dynamic range (SDR) services. For example, the Perceptual Quantizer (PQ) Electro-Optical Transfer Function (EOTF), standardised as SMPTE ST.2084, is defined to support a peak luminance of up to 10,000 candela/metre² (nits), whereas traditional television services are defined with a 100 nit peak brightness (although more modern sets increase the peak brightness beyond this). The minimum supported luminance is zero nits, but for the purposes of calculating the dynamic range the lowest non-zero luminance is used (i.e. 4×10⁻⁵ nits for PQ quantised to 10 bits). The physical intensity of a light source is measured in candela/metre² and is also referred to as ‘luminance’ or ‘linear light’. When luminance is encoded using PQ (or another transfer function) the encoded space is referred to as ‘luma’. Luma is intended to be more perceptually uniform (i.e. a given change in the luma value results in the same perceived change in brightness regardless of the starting point). Traditional power functions, such as the ‘gamma’ of SDR television, are somewhat perceptually uniform. Transfer functions such as PQ are designed according to models of human visual perception to be more perceptually uniform. In any case, the relationship between luma and luminance is highly non-linear. Video data generally includes three colour components, where each frame comprises three planes of samples and each plane corresponds to one colour component. The relationship between the sampling rates of the planes is known as a ‘chroma format’. When each plane is sampled at the same rate, the video data is said to be in a ‘4:4:4’ chroma format. In the 4:4:4 chroma format, each triplet of collocated samples forms a ‘pixel’, having a colour and luminance resulting from the values of the triplet of collocated samples. When referring to a sample to which a gamma-correction or a transfer function has already been applied, the colour component is referred to as ‘chroma’ and the luminance component is referred to as ‘luma’ to reflect the fact that the components' values are not ‘true’ colour and luminance. The prime symbol (′) is sometimes used after the variable name to indicate a luma value (e.g. Y′).
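By way of illustration, the PQ transfer function pair may be sketched as follows using the published SMPTE ST.2084 constants; the function names are illustrative and not taken from the source:

    # Minimal sketch of the SMPTE ST.2084 (PQ) transfer function pair.
    # Constants are the published ST.2084 values; luminance is in cd/m^2 (nits).
    M1 = 2610.0 / 16384.0          # 0.1593017578125
    M2 = 2523.0 / 4096.0 * 128.0   # 78.84375
    C1 = 3424.0 / 4096.0           # 0.8359375
    C2 = 2413.0 / 4096.0 * 32.0    # 18.8515625
    C3 = 2392.0 / 4096.0 * 32.0    # 18.6875

    def pq_inverse_eotf(luminance_nits):
        """Map linear luminance (0..10000 nits) to a non-linear PQ luma value in [0, 1]."""
        y = max(luminance_nits, 0.0) / 10000.0
        return ((C1 + C2 * y ** M1) / (1.0 + C3 * y ** M1)) ** M2

    def pq_eotf(value):
        """Map a non-linear PQ luma value in [0, 1] back to linear luminance in nits."""
        p = value ** (1.0 / M2)
        return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1.0 / M1)

Evaluating pq_inverse_eotf illustrates the non-linearity described above: roughly half of the code range is spent on luminances below approximately 100 nits.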
When the second and third of the three planes are sampled at half the rate horizontally and vertically compared to the first plane, the video data is said to be in a ‘4:2:0’ chroma format. As the use of the 4:2:0 chroma format results in fewer samples being processed compared to 4:4:4, the result is lower complexity in the video codec. Then, each pixel has one luma sample and groups of four pixels share a pair of chroma samples. Moreover, in such a case, typically the ‘YCbCr’ colour space is used, with the luma (Y) channel stored in the first plane, where the sampling rate is highest, and the chroma channels (Cb and Cr) stored in the second and third planes respectively, where the lower sampling rate for chroma information results in a lower data rate with little subjective impact for viewers of the decoded video data. When displaying the video data, a conversion back to 4:4:4 is required to map the video data onto modern display technology, such as an LCD panel. As such, a pair of chroma samples (i.e. Cb and Cr samples) are combined with four luma (Y) samples. Any residual luminance information present in the Cb and Cr samples is known to interfere with the luminance information present in each Y sample, resulting in shifts in the 4:4:4 output from the 4:2:0 to 4:4:4 conversion process. In earlier ‘standard dynamic range’ (SDR) systems, which use a transfer function that is a power function for encoding of luma and chroma samples (i.e. a ‘gamma function’), the nonlinearity of the transfer function was less than is the case for the Perceptual Quantizer (PQ) Electro-Optical Transfer Function (EOTF). It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
SUMMARY
According to one aspect of the present disclosure, there is provided a method of determining a luma value from 4:4:4 RGB video data for encoding chroma downsampled 4:2:0 YCbCr video data into a bitstream. The method begins by determining a location, in a colour space defined by linear luminance and non-linear 4:2:0 chroma values, from the RGB video data. The method continues by determining a region that contains the determined location, the region being one region of a plurality of regions located in the colour space and having a plurality of associated coefficients. The method then selects one or more of the coefficients associated with the determined region, the selected coefficients being used to map the linear luminance and non-linear 4:2:0 chroma values to a luma value that compensates for a luminance shift introduced by chroma downsampling of the non-linear 4:2:0 chroma values, and determines the luma value for encoding into the bitstream according to a function of the selected coefficients and the determined location. Other aspects are also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS
At least one embodiment of the present invention will now be described with reference to the following drawings and appendices.
DETAILED DESCRIPTION INCLUDING BEST MODE
Where reference is made in any one or more of the accompanying drawings to steps and/or features which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
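The luminance shift discussed in the background arises from the round trip through the 4:2:0 chroma format. The following is a minimal sketch of that round trip, assuming simple 2×2 box averaging for downsampling and nearest-neighbour replication for upsampling; the particular filters are a free choice in practice, and the names are illustrative:

    import numpy as np

    def downsample_chroma_420(plane):
        """Reduce a 4:4:4 chroma plane to 4:2:0 by averaging each 2x2 block."""
        return (plane[0::2, 0::2] + plane[1::2, 0::2] +
                plane[0::2, 1::2] + plane[1::2, 1::2]) / 4.0

    def upsample_chroma_420(plane):
        """Return a 4:4:4 chroma plane by nearest-neighbour replication."""
        return plane.repeat(2, axis=0).repeat(2, axis=1)

After such a round trip, each 2×2 group of pixels shares a single (Cb, Cr) pair; because Cb and Cr still carry some luminance information, neighbouring pixels of different brightness leak into one another, producing the shifts the described arrangements compensate for.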
As discussed above, although the possibility of brightness and colour distortion resulting from chroma upconversion is present in SDR systems, the less nonlinear transfer function reduces the extent of such artefacts compared to the case where the Perceptual Quantizer (PQ) Electro-Optical Transfer Function (EOTF) is used. Methods to alleviate such artefacts operate at the ‘pixel rate’ of the video processing system and, as such, relatively low complexity, or at least fixed complexity, methods are required. In modern video processing systems, the pixel rate is very high (e.g. 4K at 60 frames per second), where 3840×2160×60 ≈ 498×10⁶ pixels per second need to be processed. As such, for real time processing, a need exists for implementations that are feasible for hardware implementation. The encoding device 110 encodes source material 112. The source material 112 may be obtained from a complementary metal oxide semiconductor (CMOS) imaging sensor of a video camera with a capability to receive a wider range of luminance levels than traditional SDR imaging sensors. Additionally, the source material 112 may also be obtained using other technologies, such as charge-coupled device (CCD) technology, or generated from computer graphics software, or some combination of these sources. Also, the source material 112 may simply represent previously captured and stored video data. The source material 112 includes a sequence of frames 122. Collectively, the frames 122 form uncompressed video data 130. In the context of preparing video bitstreams for distribution, the source material 112 is generally present in the 4:4:4 chroma format and requires downconversion to the 4:2:0 chroma format prior to encoding. For example, if the source material 112 is obtained from an imaging sensor, a ‘debayering’ process is applied that results in 4:4:4 video data. Moreover, the video data is sampled in RGB. The video data 130 includes codewords for the frames 122, such that three planes of codewords are present for each frame. The source material 112 is generally sampled as tri-stimulus values in the RGB domain, representing linear light levels. Conversion of linear light RGB to a more perceptually uniform space is achieved by the application of a non-linear transfer function and results in an R′G′B′ representation comprising R′G′B′ values. The transfer function may be an opto-electrical transfer function (OETF), in which case the R′G′B′ values represent physical light levels of the original scene. In arrangements where the transfer function is an OETF, the video processing system 100 may be termed a ‘scene-referred’ system. Alternatively, the transfer function may be the inverse of an electro-optical transfer function (EOTF), in which case the R′G′B′ values represent physical light levels to be displayed. In arrangements where the transfer function is the inverse of an EOTF, the video processing system 100 may be termed a ‘display-referred’ system. The video encoder 118 encodes each frame as a sequence of square regions, known as ‘coding tree units’, producing an encoded bitstream 132. The video encoder 118 conforms to a video coding standard such as high efficiency video coding (HEVC), although other standards such as H.264/AVC, VC-1 or MPEG-2 may also be used. The encoded bitstream 132 can be stored, e.g. in a non-transitory storage device or similar arrangement 140, prior to transmission over the communication channel 150.
The encoded bitstream 132 is conveyed (e.g. transmitted or passed) to the display device 160. Examples of the display device 160 include an LCD television, a monitor or a projector. The display device 160 includes a video decoder 162 that decodes the encoded bitstream 132 to produce decoded codewords 170. The decoded codewords 170 correspond approximately to the codewords of the uncompressed video data 130. The decoded codewords 170 may not be exactly equal to the codewords of the uncompressed video data 130 due to lossy compression techniques applied in the video encoder 118. The decoded codewords 170 are passed to a chroma upsampler module 164 to produce decoded 4:4:4 video data 172. The chroma upsampler module 164 applies a particular set of filters to perform the upsampling from 4:2:0 to 4:4:4, as described further below. The relationship between a given codeword of the decoded codewords 170 and the corresponding light output emitted from the corresponding pixel in the panel device 166 is nominally the inverse of the transfer function. For a display-referred system, the inverse of the transfer function is the EOTF. For a scene-referred system, the inverse of the transfer function is the inverse OETF. For systems using ‘relative luminance’, the light output is not controlled only by the codeword and the inverse of the transfer function. The light output may be further modified by user control of the display's contrast or brightness settings. In one arrangement of the video processing system 100, the EOTF in use is the PQ-EOTF (i.e. SMPTE ST.2084), described further below. Notwithstanding the example devices mentioned above, each of the source device 110 and display device 160 may be configured within a general purpose computing system, typically through a combination of hardware and software components. The computer module 201 typically includes at least one processor unit 205, and a memory unit 206. For example, the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 201 also includes a number of input/output (I/O) interfaces including: an audio-video interface 207 that couples to the video display 214, loudspeakers 217 and microphone 280; an I/O interface 213 that couples to the keyboard 202, mouse 203, scanner 226, camera 227 and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and printer 215. The signal from the audio-video interface 207 to the computer monitor 214 is generally the output of a computer graphics card. In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The computer module 201 also has a local network interface 211, which permits coupling of the computer system 200 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN). The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used.
An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g. CD-ROM, DVD, Blu-Ray™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the computer system 200. Typically, any of the HDD 210, optical drive 212, and networks 220 and 222 may also be configured to operate as a source of the source material 112, or as a destination for decoded video data to be stored for reproduction via the display 214. The capture device 110 and the display device 160 of the system 100 may be embodied in the computer system 200. The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner that results in a conventional mode of operation of the computer system 200 known to those in the relevant art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun SPARCstations, Apple Mac™ or similar computer systems. Where appropriate or desired, the video encoder 118 and the video decoder 162, as well as the methods described below, may be implemented using the computer system 200, wherein the video encoder 118, the video decoder 162 and the methods to be described may be implemented as one or more software application programs 233 executable within the computer system 200. In particular, the video encoder 118, the video decoder 162 and the steps of the described methods are effected by instructions 231 in the software 233. The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 200 from the computer readable medium, and then executed by the computer system 200. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 200 preferably effects an advantageous apparatus for implementing the video encoder 118, the video decoder 162 and the described methods. The software 233 is typically stored in the HDD 210 or the memory 206. The software is loaded into the computer system 200 from a computer readable medium, and executed by the computer system 200. Thus, for example, the software 233 may be stored on an optically readable disk storage medium (e.g. CD-ROM) 225 that is read by the optical disk drive 212. In some instances, the application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 200 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 200 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray Disc™, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 201.
Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of the software, application programs, instructions and/or video data or encoded video data to the computer module 201 include radio or infra-red transmission channels, as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like. The second part of the application programs 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 214. Through manipulation of typically the keyboard 202 and the mouse 203, a user of the computer system 200 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 217 and user voice commands input via the microphone 280. When the computer module 201 is initially powered up, a power-on self-test (POST) program 250 executes. The POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206. The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process. The application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions. The program 233 may also include data 232 which is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending upon the relative size of the instructions 231 and the memory locations 228-230, a particular instruction may be stored in a single memory location, as depicted by the instruction shown in the memory location 230. Alternately, an instruction may be segmented into a number of parts, each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229. In general, the processor 205 is given a set of instructions which are executed therein. The processor 205 waits for a subsequent input, to which the processor 205 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 202, 203, data received from an external source across one of the networks 220, 222, data retrieved from one of the storage devices 206, 209, or data retrieved from a storage medium 225 inserted into the corresponding reader 212. The video encoder 118, the video decoder 162 and the described methods may use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257. The video encoder 118, the video decoder 162 and the described methods produce output variables 261, which are stored in the memory 234 in corresponding memory locations 262, 263, 264. Intermediate variables 258 may be stored in memory locations 259, 260, 266 and 267.
The processor 205 executes instructions in a sequence of cycles, each cycle comprising: (a) a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230; (b) a decode operation in which the control unit 239 determines which instruction has been fetched; and (c) an execute operation in which the control unit 239 and/or the ALU 240 execute the instruction. Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 239 stores or writes a value to a memory location 232. Other locations for the chroma sample are also possible. The encoded bitstream 132 includes a packet known as the ‘video usability information’ (VUI). The VUI includes information instructing the display device 160 how to interpret the decoded samples from the video decoder 162. The VUI may include a syntax element ‘chroma_sample_loc_type_top_field’ that indicates the chroma sampling location. Although generally used when the video data is in ‘interlaced’ format, the above syntax element may also be used when the video data is in progressive format. When chroma_sample_loc_type_top_field is not present, the chroma samples are assumed to be non-co-sited. The encoded bitstream 132 may also contain a packet known as a Chroma resampling filter hint Supplementary Enhancement Information (SEI) message. The Chroma resampling filter hint SEI message enables signalling of the chroma filters used to resample video data from one chroma format to another chroma format. Predefined filters can also be signalled, such as those defined in Rec. ITU-T T.800|ISO/IEC 15444-1. Considering normalised input, each of R′, G′ and B′ lies within the interval [0, 1]. This results in a cubic volume in the R′G′B′ space that contains the video data. One aspect of the chroma subsampling issue is that, by performing the filtering in the perceptual domain (i.e. R′G′B′) rather than the linear domain (i.e. RGB), the highly non-linear nature of the transfer function results in shifts in intensity from the filtering operation. When R′G′B′ video data is subsampled (e.g. G′ is assigned to the primary colour component and R′ and B′ are subsampled), multiple samples in B′ and R′, at the higher sampling rate of the 4:4:4 chroma format, are filtered to produce samples at the lower sampling rate of the 4:2:0 chroma format. Then, interference occurs between neighbouring pixels due to brightness information present in the B′ and R′ samples at the higher sampling rate being combined into B′ and R′ samples at the lower sampling rate. From Equation (1), which forms Y′ as a weighted sum of R′, G′ and B′, the severity of the interference can be seen from the relative contribution of R′ and B′ to Y′. Even though Y′CbCr is considered to ‘decorrelate’ luma (Y′) from chroma (Cb and Cr), the decorrelation is not complete and some luminance information is still present in the Cb and Cr values. For example, when applying Equation (1) to produce Y′CbCr values, the volume of valid R′G′B′ values results in a different volume of valid Y′CbCr values, such that the range of valid Cb and Cr values converges at the minimum and maximum valid Y′ values. At the middle Y′ value, the range of permissible Cb and Cr values reaches a maximum. Thus, it can be seen that Y′CbCr has not fully decorrelated colour information from the luminance information. Then, when chroma subsampling is applied independently to the Cb and Cr samples, interference in terms of luminance occurs
across samples, which may have quite different brightness. This interference can even result in ‘out of gamut’ colours, i.e. in 4:2:0 Y′CbCr values that, when converted back to 4:4:4 and then to R′G′B′, have values outside the interval [0, 1]. The RGB to YCbCr conversion may be performed with linear light RGB as input rather than R′G′B′ as input. However, in many systems it is impractical to do so, as either the linear light data is not available, or the complexity of dealing with linear light data, which requires a floating point representation, is not feasible given considerations of complexity and/or real time operation. The application of the RGB to YCbCr colour conversion using R′G′B′ instead of RGB video data is known as a ‘non-constant luminance’ approach. Also, the colour space conversion module 114 operates in the 4:4:4 chroma format, with downconversion to 4:2:0 as a subsequent step (prior to encoding). Although the luma sample adjuster 600 is shown operating on linear light RGB input, the luma sample adjuster 600 may also operate on Y′CbCr, e.g. resulting from the colour space conversion module 114. In such a case, the non-constant luminance representation of Y′CbCr has introduced some distortions that cannot be undone, hence deriving a Ylinear that exactly matches the Ylinear from application of the RGB to XYZ colour space transform on the source material 112 is not possible. However, a close match is still achieved. Then, a modified luma sample adjuster is possible that operates as a function of three inputs (Y, Cb and Cr) and produces a revised Y value, Yfinal. Although the result of the luma sample adjuster 600 could be encapsulated into a look-up table for all possible input values, the size of such a look-up table would be prohibitive for real-time implementation. A model may be associated with each of the blocks of the structure 750. Alternatively, it can be said that the model is associated with regions of the colour space 700. A single model may be associated with more than one block and hence with more than one region. The model is used to map the input values (i.e. Ylinear, Cb and Cr) to an output Y value. Each model includes a number of coefficients. With a ‘second order’ model, seven coefficients are input to the model according to the following Equation (2):
Yfinal = a·Ylinear² + b·Ylinear + c·Cb² + d·Cb + e·Cr² + f·Cr + g        (2)
where Yfinal is the output Y luma value; a, b, c, d, e, f and g are the model input coefficients associated with a region; and Ylinear, Cb and Cr are the coordinates of a point within the region. A linear model (i.e. with the coefficients a, c and e always set to zero) provides a relatively poor approximation of the function resulting from the luma sample adjuster 600, due to the highly nonlinear nature of the transfer function present in the iterative loop. The introduction of the second order (i.e. quadratic) terms allows for improved modelling. As such, the number of regions required to obtain a close approximation of the luma sample adjuster 600 using a second-order model is less than would be the case for a linear model. The trade-off is the presence of additional multiplier logic to implement the quadratic terms.
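By way of illustration, evaluating such a model may be sketched as follows, assuming the quadratic form of Equation (2) reconstructed above; the function name is illustrative:

    def apply_second_order_model(coeffs, y_linear, cb, cr):
        """Evaluate a seven-coefficient second-order model (Equation (2)).

        coeffs is the tuple (a, b, c, d, e, f, g) associated with a region.
        Setting a, c and e to zero degenerates to the linear model discussed
        in the text, at the cost of needing more regions for the same accuracy.
        """
        a, b, c, d, e, f, g = coeffs
        return (a * y_linear ** 2 + b * y_linear +
                c * cb ** 2 + d * cb +
                e * cr ** 2 + f * cr + g)

The three multiplications for the squared terms correspond to the additional multiplier logic referred to above.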
Experiments were also performed using higher order models, including third order and fourth order; however, the additional improvement was deemed insufficient to justify the increase in complexity, both in terms of additional multiplication logic and in terms of storage for the increased number of coefficients associated with each model. The method 800 may be performed as an initialisation step, or may even be performed at a separate time and location (e.g. at design time), with the result stored in the memory 206 for later use at run time. The method 800 begins by considering the entire colour space as one cube (i.e. a ‘current region’ occupies the entire colour space). The method 800 begins at a generating step 802, where a model is generated for the current region. A set of points occupying the considered region is tested under control of the processor 205 at step 802. Due to the large total number of possible points, the region is sparsely tested (e.g. by testing points spaced equally in each dimension). Note that the set of points may not occupy the full extent of the region, because some YlinearCbCr points are invalid (i.e. cannot result from valid R′G′B′ input) and thus are unreachable, so the invalid points are excluded from the model generation. A least mean squares algorithm is used at step 802 to derive the coefficients of the region, using the tested points and using the result of the luma sample adjuster 600 as the desired values for Yfinal. Control in the processor 205 then passes to a determining step 804. At determining step 804, a model error is determined for the model generated at step 802, under control of the processor 205. The model error is a measure of the discrepancy between values produced by the model and the result of the iterative search described above. At step 806, the error from step 804 is compared with a threshold value under control of the processor 205. The threshold value is a predetermined value indicative of the maximum permitted error for a given model and region. In one arrangement, the threshold value is selected to be equal to the just noticeable distortion (JND) value. If the threshold is exceeded at step 806, it is deemed that the model is inadequate to generate usable values for Yfinal. In this case, control passes to a region subdividing step 808. Otherwise, control passes to a last region testing step 810. To avoid excessive subdivisions, a limit may be placed on the maximum permitted depth of subdivisions, in which case the smallest-permitted regions (e.g. 64×64×64) may have error values that exceed the threshold value. At the region subdividing step 808, the currently considered region is subdivided into eight smaller regions, in accordance with an ‘octant hierarchy’. The eight smaller regions are realised by dividing the current cubic region in half in each dimension. It is also possible to have subdivisions other than halving, in which case region thresholds are also output by the region subdividing step 808, for storage in the memory 206, to enable later identification of the region to which a given point belongs. The method 800 is invoked for each of the resulting regions. Once step 808 completes, control in the processor 205 passes to the step 810.
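The recursive structure of the method 800 so far may be sketched compactly as follows, assuming caller-supplied fit_model and model_error callables standing in for steps 802 and 804; all names here are illustrative rather than taken from the source:

    def split_into_octants(region):
        """Halve a cubic region in each of the Ylinear, Cb and Cr dimensions (step 808)."""
        mids = [(lo + hi) / 2.0 for lo, hi in region]
        octants = []
        for i in range(8):
            octants.append(tuple(
                (lo, mid) if (i >> axis) & 1 == 0 else (mid, hi)
                for axis, ((lo, hi), mid) in enumerate(zip(region, mids))))
        return octants

    def build_octant_hierarchy(region, depth, models, fit_model, model_error,
                               max_depth=4, threshold=1.0):
        """Recursively fit per-region models, subdividing while the error is too large.

        region is ((y_lo, y_hi), (cb_lo, cb_hi), (cr_lo, cr_hi)); models maps an
        accepted region to its coefficients; threshold plays the role of the
        just noticeable distortion limit of step 806.
        """
        coeffs = fit_model(region)                    # step 802: least mean squares fit
        if model_error(region, coeffs) <= threshold or depth == max_depth:
            models[region] = coeffs                   # model adequate, or depth limit reached
            return
        for sub in split_into_octants(region):        # step 808: octant subdivision
            build_octant_hierarchy(sub, depth + 1, models, fit_model, model_error,
                                   max_depth, threshold)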
At step 810, if regions of the colour space remain to be processed, the method 800 continues with the next region; otherwise, the method 800 terminates. As the method 800 progresses through determining models and regions, a map is maintained (e.g. within the memory 206) enabling any point in the colour space to be mapped to the region containing that point. In one arrangement of the method 800, each region in the octant hierarchy has an associated ‘active region’ flag, including regions that are further subdivided into smaller regions. The active region flag indicates that a model is associated with the active region. If a point is determined to lie within a region that does not have an associated model, then the model of the first ancestor region to have an associated model is used. As such, if the recursion process determines that a region should be split, it is possible that some of the resulting regions offer no substantial improvement upon the model derived for the earlier region. In this case, the child regions can have the ‘active region’ flag left unset, indicating that an ancestor region's model is used instead. This situation can occur because the error from step 804 is often concentrated in small portions of a region, so the requirement to split the region into subregions may only result in an improvement in a small portion of the region (e.g. in one sub-region). Use of ancestor regions reduces the requirement to store many more models (e.g. for the other seven sub-regions) in the memory 206, as a model associated with a region may also be used by its sub-regions. Such arrangements therefore afford a reduction in the memory consumption for storage of coefficients, due to the reduced number of required models. A region identifier module 902 determines which region a given chroma sample belongs to in the YlinearCbCr colour space. The inputs to the region identifier module 902 are the linear light value from the luminance deriver 901 and subsampled chroma samples from a chroma subsampler module 912. The chroma subsampler module 912 can provide various filters with different degrees of ‘sharpness’ (i.e. roll-off in the frequency domain). The subsampled chroma output from the chroma subsampler module 912 is then passed to a chroma upsampler module 914. The chroma upsampler module 914 applies interpolation filters to upsample the chroma from the 4:2:0 chroma format to the 4:4:4 chroma format. The particular filters applied are signalled in the encoded bitstream 132 using the Chroma resampling filter hint SEI message. Consequently, the upsampled chroma samples output from the chroma upsampler module 914 correspond to the decoded 4:4:4 video data 172. Then, the upsampled chroma, along with the linear light value Ylinear, is used to identify a point in the octant hierarchy. For uniformly-spaced subdivision in the octant hierarchy, a point in the octant hierarchy is identified by inspecting the most significant bits, in descending order, of each codeword to determine the region to which the sample belongs. The result is a region identifier (ID), which is passed to a model mapper module 904. The region identifier module 902 also produces a region boundary. The region boundary is simply the co-ordinates of the region in the three dimensional space afforded by the bit depth of the codewords. The model mapper module 904 provides a look-up table (LUT) mapping a region ID to a model identifier (ID). The LUT allows for multiple regions to have the same model. In some arrangements, the relationship between a region ID and a model ID is a one-to-one relationship.
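For the uniformly-spaced case, the most-significant-bit inspection may be sketched as follows, assuming integer codewords of a common bit depth and a fixed subdivision depth; names are illustrative:

    def region_id(y_code, cb_code, cr_code, depth, bit_depth=10):
        """Identify the octant containing a point from the codewords' MSBs.

        At each level of the hierarchy, one bit from each of the three
        codewords (taken from the most significant end downwards) selects
        one of eight child octants, yielding a 3*depth-bit region ID.
        """
        rid = 0
        for level in range(depth):
            shift = bit_depth - 1 - level           # next most significant bit
            octant = ((((y_code >> shift) & 1) << 2) |
                      (((cb_code >> shift) & 1) << 1) |
                      ((cr_code >> shift) & 1))
            rid = (rid << 3) | octant
        return rid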
In arrangements where the relationship between a region ID and a model ID is a one-to-one relationship, the model mapper module 904 is effectively bypassed, or not present, as no functionality is provided by the module 904. In other arrangements, the region ID corresponds to the smallest supported octant (or cube) size, and larger effective regions are achieved by having multiple region ID values mapping to a single model ID value. Arrangements where the region ID corresponds to the smallest supported octant (or cube) size provide the benefit of reduced memory consumption, as fewer models are stored compared to storing a model for each of the smallest octant size. In an arrangement of the module 904, if a region does not have a model associated with it, then the module 904 will use a model from a parent region. A model LUT module 906 contains the coefficients for each model, selectable via the model ID. Generally, the model LUT module 906 is configured within the memory 206, although the model LUT module 906 may also be configured within on-chip SRAM for fast access (e.g. at the pixel rate of the encoding device 110). A region offset module 908 determines the offsets of the YlinearCbCr point within the selected region (i.e. relative to the region boundary as provided by the region identifier module 902). The offsets are the offset in each dimension (i.e. Yoffset, Cboffset and Croffset) relative to the region (cube) boundary in each dimension (i.e. Yregion, Cbregion and Crregion). The inputs are the subsampled chroma samples and Ylinear. The resulting offsets are passed to a luma sample deriver 910, where the resulting offsets are applied with the model coefficients from the model LUT 906. Use of a second order model results in Equation (3), as follows, for deriving the output Yfinal:
Yfinal = a·Yoffset² + b·Yoffset + c·Cboffset² + d·Cboffset + e·Croffset² + f·Croffset + g        (3)
Generally, the luma sample deriver 910 is implemented using integer arithmetic for reduced complexity, and so bit shifts are also present to rescale the final value to the normalised range afforded by the bit depth of the codewords. The bit depth of Yfinal matches the bit depth of the video encoder 118. However, the bit depth for intermediate data (i.e. resulting from evaluation of Equation (3) above) may be wider than the bit depth of the video encoder 118, to reduce errors resulting from loss of precision of intermediate results. When using the 4:2:0 chroma format, the chroma downsampler is generally applied to sets of 2×2 Cb and 2×2 Cr input samples to produce a pair of output samples (i.e. one Cb and one Cr sample). The chroma downsampler may take source sets of Cb and Cr samples from an area wider than 2×2, depending on the selected filter tap coefficients of the subsampling operation. The remaining modules in the chroma downsampler module 116 operate at the pixel rate, which in the case of YCbCr in 4:2:0 corresponds to the rate at which Y samples arrive. In the chroma downsampler 116, the ‘true’ or intended luminance of each pixel is known, and this information is used to adjust the final Y sample output to compensate for residual luminance information present in the downsampled Cb and Cr samples, which is shared amongst four luma samples. The method 1000 is described by way of example with reference to the hardware modules described above and to the frame 300. The method 1000 begins with a determining linear luminance step 1001, where a linear luminance value, Ylinear, is determined.
The linear luminance is determined by the luminance deriver 901, under control of the processor 205. The linear luminance is determined for a given pixel input to the chroma downsampler 116 (e.g. as YCbCr data in the 4:4:4 chroma format). The linear luminance may be determined by converting the YCbCr data back to R′G′B′, then applying the PQ-EOTF to each component to produce RGB. The RGB is then converted to CIE 1931 XYZ. Of the final XYZ values, only the luminance (Y) is further considered. Control in the processor 205 then passes to a determining subsampled chroma values step 1002. At step 1002, the chroma subsampler 912, under control of the processor 205, performs a subsampling operation on the Cb and Cr samples in the 4:4:4 chroma format to produce subsampled chroma values in the form of Cb and Cr samples in the 4:2:0 chroma format. Step 1002 may be achieved through application of a simple linear filter. Alternatively, more complex filters (i.e. with more filter taps) may be used at step 1002, generally having a sharper roll-off in the frequency domain, to hasten the transition from one chroma value to another when considering a run of chroma sample values. Such hastening is beneficial to reduce the number of intermediate chroma sample values, which actually represent ‘new’ colours not present in the 4:4:4 video data. Although the method 1000 is invoked for each luma (Y) sample, step 1002 may be performed for every 2×2 group of luma samples. At step 1003, the chroma upsampler module 914, under control of the processor 205, determines upsampled chroma values. A defined set of chroma upsampling filters is applied to convert 4:2:0 video data from the chroma subsampler 912 to 4:4:4 video data. The 4:4:4 video data output from the chroma upsampler 914 differs from the 4:4:4 video data input to the chroma subsampler 912, due to the loss resulting from the intermediate 4:2:0 representation. The filtering applied in the chroma upsampler 914 accords with signalling present in the encoded bitstream 132 and contained in the Chroma resampling filter hint SEI message. Examples of upsampler approaches include ‘nearest neighbour’ and bilinear filtering. Regardless of the approach used, the chroma upsampler 164 in the display device 160 should use the same approach as used in the chroma upsampler module 914. Using the same approach ensures that the luma sample deriver 910 operates using the same chroma sample values as seen in the display device 160. In some application scenarios the chroma sample filters may be predefined, in which case there is no need to explicitly signal the Chroma resampling filter hint SEI message in the encoded bitstream 132. Control in the processor 205 then passes to a determining colour point step 1004. At step 1004, a colour point is determined under control of the processor 205. The Ylinear value determined at step 1001 and the subsampled chroma values from step 1002 are assembled at step 1004 to produce a value YlinearCbCr that represents a point in a space, termed here a ‘colour space’ but not corresponding directly to other well-known colour spaces such as YCbCr. The colour space may be defined by linear luminance and non-linear 4:2:0 chroma values. The Ylinear value is generally considered to have a large dynamic range, mandating use of floating-point storage. However, a compressed representation (e.g. as achieved with the use of the PQ-EOTF) enables an integer format to be used.
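The derivation of Ylinear at step 1001 may be sketched as follows, reusing the pq_eotf helper from the PQ sketch earlier. The BT.2020 weighting coefficients below are an assumption for illustration; the description does not tie the method to a particular set of primaries:

    # Assumed BT.2020 luma weights (KR, KB); for BT.2020 primaries the CIE 1931
    # luminance of linear RGB uses the same weights.
    KR, KB = 0.2627, 0.0593
    KG = 1.0 - KR - KB

    def linear_luminance_from_ycbcr(y, cb, cr):
        """Sketch of step 1001: derive Ylinear from non-constant-luminance Y'CbCr.

        Inputs are normalised: y in [0, 1], cb and cr in [-0.5, 0.5].
        """
        # Invert the Y'CbCr conversion to recover R'G'B'.
        r_p = y + 2.0 * (1.0 - KR) * cr
        b_p = y + 2.0 * (1.0 - KB) * cb
        g_p = (y - KR * r_p - KB * b_p) / KG
        # Clamp to the valid [0, 1] interval, then apply the PQ-EOTF per
        # component to obtain linear light RGB (in nits).
        r_p, g_p, b_p = (min(max(v, 0.0), 1.0) for v in (r_p, g_p, b_p))
        r, g, b = (pq_eotf(v) for v in (r_p, g_p, b_p))
        # Keep only the luminance (Y) of the CIE 1931 XYZ representation.
        return KR * r + KG * g + KB * b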
For 4:4:4 input video data and 4:2:0 output, step 1004 is applied to each of the four Ylinear values associated with the upsampled chroma samples from the chroma upsampler module 914. For arrangements supporting a ‘4:2:2’ chroma format, there are two Ylinear values associated with a pair of subsampled CbCr values, and the iteration is modified accordingly, as is the chroma subsampler module 912. Control in the processor 205 then passes to a determining region step 1006. At determining step 1006, the region identifier module 902, under control of the processor 205, determines the region, within the hierarchy of regions resulting from the method 800, which contains the value YlinearCbCr. The determined region is one region of a plurality of regions located in the colour space and has a plurality of associated coefficients, as described above. At step 1008, the model mapper 904, under control of the processor 205, selects a model identifier (ID) associated with the region identified at step 1006. The model ID is an index into a table of models configured, for example, within the memory 206. Each model includes a set of coefficients (e.g. seven coefficients for a second-order model). Steps 1006 and 1008 may be said to select coefficients for a given region, as the coefficients for the applicable model are available from the memory 206. Control in the processor 205 then passes to a determining colour point location within region step 1010. At step 1010, the region offset module 908, under control of the processor 205, determines the location (or ‘offset’) of the colour point, YlinearCbCr, within the region determined at step 1006. The offset is a vector having three dimensions corresponding to Ylinear, Cb and Cr. Generally, the unit of each component of the offset vector is one codeword. Control in the processor 205 then passes to an apply model step 1012. At the apply model step 1012, the model LUT 906 and the luma sample deriver 910, under control of the processor 205, are used to determine an output luma sample Yfinal for use by the video encoder 118. The model ID from step 1008 is used to index the selected model from the model LUT 906, determining the coefficients associated with the selected model. The determined coefficients represent a model which maps the space within the selected region from the input vector of Ylinear, Cb and Cr to the output Yfinal value. As the determined coefficients are the result of a search process (e.g. the recursive least mean squares fitting of the method 800 described above), the output Yfinal approximates the result of the luma sample adjuster 600. At encoding luma sample step 1014, the video encoder 118, under control of the processor 205, encodes the luma sample value resulting from step 1012 into the encoded bitstream 132. The associated subsampled chroma samples resulting from step 1002 are also passed to the video encoder 118. The method 1000 then terminates.
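Steps 1006 to 1012 may be tied together in a short sketch, reusing the region_id and apply_second_order_model helpers above and again assuming uniform subdivision at a fixed depth; the region_lut and model_lut tables are illustrative stand-ins for the model mapper 904 and the model LUT 906:

    def derive_final_luma(y_code, cb_code, cr_code, region_lut, model_lut,
                          depth, bit_depth=10):
        """Sketch of steps 1006-1012: map integer (Ylinear, Cb, Cr) codewords to Yfinal.

        region_lut maps a region ID to a model ID (model mapper 904);
        model_lut maps a model ID to seven coefficients (model LUT 906).
        """
        rid = region_id(y_code, cb_code, cr_code, depth, bit_depth)        # step 1006
        coeffs = model_lut[region_lut[rid]]                                # step 1008
        side = 1 << (bit_depth - depth)      # codewords per region side
        y_off, cb_off, cr_off = (v % side for v in (y_code, cb_code, cr_code))  # step 1010
        return apply_second_order_model(coeffs, y_off, cb_off, cr_off)     # step 1012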
An alternative arrangement of the method 1000 will now be described. In the alternative arrangement, at the determining colour point location within region step 1010 and the apply model step 1012, two regions 1102 and 1104 are considered. The luma sample deriver 910 applies each of the models associated with the regions 1102 and 1104 to the offset vector to obtain two resulting values, Yfinal-1 and Yfinal-2, for the regions 1102 and 1104 respectively. The model LUT 906 and the luma sample deriver 910, under control of the processor 205, then use a weighted sum according to Equation (4) below to determine the Yfinal value:
Yfinal = w·Yfinal-1 + (1 − w)·Yfinal-2        (4)
where w is a weighting factor determined according to the position of the colour point relative to the regions 1102 and 1104. The resulting value Yfinal is then passed, along with the associated Cr and Cb samples, to the video encoder 118. Control in the processor 205 then passes to an encode luma sample step 1014. Arrangements disclosed herein provide for a video system that encodes and decodes video content that has been subsampled, e.g. to the 4:2:0 chroma format, with compensation for deviations in the luminance of each pixel that would otherwise be present in a conventional chroma downsampler. For HDR applications using highly nonlinear transfer functions, such as the PQ-EOTF, the deviations are more significant than in traditional SDR applications. Moreover, the methods described herein operate with fixed complexity per pixel, and with complexity commensurate with hardware implementation (e.g. for real-time systems). Notwithstanding the above description of system operation with the source material 112 in the 4:4:4 chroma format and the video encoder 118 configured to encode video data in the 4:2:0 chroma format, source material 112 in other chroma formats, such as 4:2:2, may also be used. Moreover, source material 112 that is in an interlaced format may be used, with the chroma upsampling filter alternating between the top and bottom fields.
INDUSTRIAL APPLICABILITY
The arrangements described are applicable to the computer and data processing industries and particularly to digital signal processing for the encoding and decoding of signals such as video signals. The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive. In the context of this specification, the word “comprising” means “including principally but not necessarily solely” or “having” or “including”, and not “consisting only of”. Variations of the word “comprising”, such as “comprise” and “comprises”, have correspondingly varied meanings. A method of determining a luma value from 4:4:4 RGB video data for encoding chroma downsampled 4:2:0 YCbCr video data into a bitstream. A location, in a colour space defined by linear luminance and non-linear 4:2:0 chroma values, is determined from the RGB video data. A region that contains the determined location is determined, the region being one region of a plurality of regions located in the colour space and having a plurality of associated coefficients. One or more of the coefficients associated with the determined region are selected, the selected coefficients being used to map the linear luminance and non-linear 4:2:0 chroma values to a luma value that compensates for a luminance shift introduced by chroma downsampling of the non-linear 4:2:0 chroma values.
The method then determines the luma value for encoding into the bitstream according to a function of the selected coefficients and the determined location.
1. A method of determining a luma value from 4:4:4 RGB video data for encoding chroma downsampled 4:2:0 YCbCr video data into a bitstream, the method comprising:
determining a location, in a colour space defined by linear luminance and non-linear 4:2:0 chroma values, from the RGB video data;
determining a region that contains the determined location, the region being one region of a plurality of regions located in the colour space and having a plurality of associated coefficients;
selecting one or more of the coefficients associated with the determined region, the selected coefficients being used to map the linear luminance and non-linear 4:2:0 chroma values to a luma value that compensates for a luminance shift introduced by chroma downsampling of the non-linear 4:2:0 chroma values; and
determining the luma value for encoding into the bitstream according to a function of the selected coefficients and the determined location.
2. The method according to
3. The method according to
4. The method according to
5. The method according to
6. The method according to
7. The method according to
8. The method according to
9. The method according to
10. The method according to
11. A system for determining a luma value from 4:4:4 RGB video data for encoding chroma downsampled 4:2:0 YCbCr video data into a bitstream, the system comprising:
a memory storing data and a computer program;
a processor coupled to the memory for executing said computer program, said computer program comprising instructions for:
determining a location, in a colour space defined by linear luminance and non-linear 4:2:0 chroma values, from the RGB video data;
determining a region that contains the determined location, the region being one region of a plurality of regions located in the colour space and having a plurality of associated coefficients;
selecting one or more of the coefficients associated with the determined region, the selected coefficients being used to map the linear luminance and non-linear 4:2:0 chroma values to a luma value that compensates for a luminance shift introduced by chroma downsampling of the non-linear 4:2:0 chroma values; and
determining the luma value for encoding into the bitstream according to a function of the selected coefficients and the determined location.
12. An apparatus for determining a luma value from 4:4:4 RGB video data for encoding chroma downsampled 4:2:0 YCbCr video data into a bitstream, the apparatus comprising:
means for determining a location, in a colour space defined by linear luminance and non-linear 4:2:0 chroma values, from the RGB video data;
means for determining a region that contains the determined location, the region being one region of a plurality of regions located in the colour space and having a plurality of associated coefficients;
means for selecting one or more of the coefficients associated with the determined region, the selected coefficients being used to map the linear luminance and non-linear 4:2:0 chroma values to a luma value that compensates for a luminance shift introduced by chroma downsampling of the non-linear 4:2:0 chroma values; and
means for determining the luma value for encoding into the bitstream according to a function of the selected coefficients and the determined location.
13. A computer readable medium having a computer program stored on the medium for determining a luma value from 4:4:4 RGB video data for encoding chroma downsampled 4:2:0 YCbCr video data into a bitstream, the program comprising:
code for determining a location, in a colour space defined by linear luminance and non-linear 4:2:0 chroma values, from the RGB video data;
code for determining a region that contains the determined location, the region being one region of a plurality of regions located in the colour space and having a plurality of associated coefficients;
code for selecting one or more of the coefficients associated with the determined region, the selected coefficients being used to map the linear luminance and non-linear 4:2:0 chroma values to a luma value that compensates for a luminance shift introduced by chroma downsampling of the non-linear 4:2:0 chroma values; and
code for determining the luma value for encoding into the bitstream according to a function of the selected coefficients and the determined location.













