RECEPTION DEVICE, RECEPTION METHOD, TRANSMISSION DEVICE, AND TRANSMISSION METHOD
The present technology relates to a reception device, a reception method, a transmission device, and a transmission method, and more particularly to a reception device and the like that VR-displays a stereoscopic image. When a stereoscopic image is virtual reality (VR)-displayed, it is important for stereoscopic vision that subtitles and graphics be superimposed at a position closer to the viewer than the object in the interactively displayed area. For example, Patent Document 1 shows a technology that transmits depth information for each pixel or evenly divided block of an image together with the image data of left-eye and right-eye images, and uses the depth information for depth control when superimposing and displaying subtitles and graphics on the receiving side. However, for a wide viewing angle image, transmitting such depth information requires a large transmission band. An object of the present technology is to easily implement depth control when superimposing and displaying superimposition information, by using depth information that is efficiently transmitted.
A concept of the present technology is a reception device including: a reception unit configured to receive a video stream obtained by encoding image data of a wide viewing angle image for each of left-eye and right-eye pictures, and depth meta information including position information and a representative depth value of a predetermined number of angle areas in the wide viewing angle image for each of the pictures; and a processing unit configured to extract left-eye and right-eye display area image data from the image data of a wide viewing angle image for each of the left-eye and right-eye pictures obtained by decoding the video stream and to superimpose superimposition information data on the left-eye and right-eye display area image data for output, in which when superimposing the superimposition information data on the left-eye and right-eye display area image data, the processing unit gives parallax to the superimposition information data to be superimposed on each of the left-eye and right-eye display area image data on the basis of the depth meta information. In the present technology, the reception unit receives a video stream obtained by encoding image data of a wide viewing angle image for each of left-eye and right-eye pictures, and depth meta information including position information and a representative depth value of a predetermined number of angle areas in the wide viewing angle image for each of the pictures. For example, the reception unit may receive the depth meta information for each of the pictures by using a timed metadata stream associated with the video stream. Furthermore, for example, the reception unit may receive the depth meta information for each of the pictures, the depth meta information being inserted into the video stream. Furthermore, for example, the position information on the angle areas may be given as offset information based on a position of a predetermined viewpoint. 
The left-eye and right-eye display area image data is extracted by the processing unit from the image data of the wide viewing angle image for each of the left-eye and right-eye pictures obtained by decoding the video stream. The superimposition information data is superimposed on the left-eye and right-eye display area image data for output. Here, when superimposing the superimposition information data on the left-eye and right-eye display area image data, parallax is given to the superimposition information data superimposed on each of the left-eye and right-eye display area image data on the basis of the depth meta information. For example, the superimposition information may include subtitles and/or graphics. For example, when superimposing the superimposition information data on the left-eye and right-eye display area image data, the processing unit may give the parallax on the basis of the minimum value of the representative depth values of the predetermined number of angle areas corresponding to the superimposition range, the representative depth values being included in the depth meta information. Furthermore, for example, the depth meta information may further include position information indicating which position in each area the representative depth value of that angle area relates to. In that case, when superimposing the superimposition information data on the left-eye and right-eye display area image data, the processing unit may give the parallax on the basis of the representative depth values of the predetermined number of angle areas corresponding to the superimposition range and the position information, both being included in the depth meta information. Furthermore, the depth meta information may further include a depth value corresponding to the depth of the screen as a reference for the depth values.
Furthermore, for example, a display unit may be included that displays a three-dimensional image on the basis of the left-eye and right-eye display area image data on which the superimposition information data is superimposed. In this case, for example, the display unit may include a head mounted display. In this way, in the present technology, when superimposing the superimposition information data on the left-eye and right-eye display area image data, parallax is given to the superimposition information data superimposed on each of the left-eye and right-eye display area image data on the basis of the depth meta information including the position information and representative depth value of the predetermined number of angle areas in the wide viewing angle image. Therefore, depth control when superimposing and displaying subtitles and graphics can be easily implemented by using depth information that is efficiently transmitted. Furthermore, another concept of the present technology is a transmission device including: a transmission unit configured to transmit a video stream obtained by encoding image data of a wide viewing angle image for each of left-eye and right-eye pictures and depth meta information for each of the pictures, in which the depth meta information includes position information and a representative depth value of a predetermined number of angle areas in the wide viewing angle image. In the present technology, the transmission unit transmits the video stream obtained by encoding the image data of the wide viewing angle image for each of the left-eye and right-eye pictures, and the depth meta information for each of the pictures. Here, the depth meta information includes the position information and representative depth value of the predetermined number of angle areas in the wide viewing angle image.
In this way, in the present technology, the video stream obtained by encoding image data of a wide viewing angle image for each of left-eye and right-eye pictures, and the depth meta information including position information and a representative depth value of the predetermined number of angle areas in the wide viewing angle image for each picture, are transmitted. Therefore, depth information in the wide viewing angle image can be efficiently transmitted. According to the present technology, depth control when superimposing and displaying the superimposition information can be easily implemented by using depth information that is efficiently transmitted. Note that the advantageous effects described here are not necessarily restrictive, and any of the effects described in the present disclosure may be applied. A mode for carrying out the invention (hereinafter referred to as an embodiment) will be described below. Note that the description will be made in the following order.
1. Embodiment
2. Modification
[Configuration Example of Transmission-Reception System]
The service transmission system 100 transmits DASH/MP4, that is, an MPD file as a metafile and MP4 (ISOBMFF) including media streams such as video and audio, through a communication network transmission path or an RF transmission path. In this embodiment, a video stream obtained by encoding image data of a wide viewing angle image for each of left-eye and right-eye pictures is included as the media stream. Furthermore, the service transmission system 100 transmits depth meta information for each picture together with the video stream. The depth meta information includes position information and a representative depth value of the predetermined number of angle areas in the wide viewing angle image. In this embodiment, the depth meta information further includes position information indicating which position in each area the representative depth value relates to.
For example, the depth meta information for each picture is transmitted by using a timed metadata stream associated with the video stream, or inserted into the video stream and transmitted. The service receiver 200 receives the above-described MP4 (ISOBMFF) transmitted from the service transmission system 100 through the communication network transmission path or the RF transmission path. The service receiver 200 acquires, from the MPD file, meta information regarding the video stream, and furthermore, meta information regarding the timed metadata stream in a case where the timed metadata stream exists. Furthermore, the service receiver 200 extracts left-eye and right-eye display area image data from the image data of the wide viewing angle image for each of the left-eye and right-eye pictures obtained by decoding the video stream. The service receiver 200 superimposes superimposition information data such as subtitles and graphics on the left-eye and right-eye display area image data for output. In this case, the display area changes interactively on the basis of a user's action or operation. When superimposing the superimposition information data on the left-eye and right-eye display area image data, on the basis of the depth meta information, parallax is given to the superimposition information data superimposed on each of the left-eye and right-eye display area image data. For example, parallax is given on the basis of the minimum value of the representative depth value of the predetermined number of areas corresponding to a superimposition range included in the depth meta information. 
Furthermore, for example, in a case where the depth meta information further includes position information indicating which position in each area the representative depth value relates to, parallax is given on the basis of the representative depth values of the predetermined number of areas corresponding to the superimposition range and the position information included in the depth meta information. “Configuration Example of Service Transmission System” The control unit 101 includes a central processing unit (CPU), and controls the operation of each unit of the service transmission system 100 on the basis of a control program. The user operation unit 101 provides an interface through which a user performs various operations. The left camera 102L and the right camera 102R constitute a stereo camera. The left camera 102L captures a subject to obtain a spherical capture image (360° VR image). Similarly, the right camera 102R captures the subject to obtain a spherical capture image (360° VR image). For example, the cameras 102L and 102R perform image capturing by a back-to-back method, and each obtains, as a spherical capture image, super wide viewing angle front and rear images each having a viewing angle of 180° or more, captured using a fisheye lens. The planar packing units 103L and 103R cut out a part or all of the spherical capture images obtained by the cameras 102L and 102R, respectively, and perform planar packing to obtain rectangular projection images (projection pictures). The video encoder 104 performs encoding such as HEVC on the image data of the left-eye projection image from the planar packing unit 103L and the image data of the right-eye projection image from the planar packing unit 103R to obtain encoded image data, and generates a video stream including the encoded image data. For example, the image data of the left-eye and right-eye projection images is combined by a side-by-side method or a top-and-bottom method, and the combined image data is encoded to generate one video stream.
Furthermore, for example, the image data of each of the left-eye and right-eye projection images is encoded to generate two video streams. Cutout position information is inserted into an SPS NAL unit of the video stream. For example, in encoding of HEVC, “default_display_window” corresponds thereto. The field of “def_disp_win_left_offset” indicates the left end position of the cutout position. The field of “def_disp_win_right_offset” indicates the right end position of the cutout position. The field of “def_disp_win_top_offset” indicates the upper end position of the cutout position. The field of “def_disp_win_bottom_offset” indicates the lower end position of the cutout position. In this embodiment, the center of the cutout position indicated by the cutout position information can be set to agree with the reference point of the projection image. Here, when the center of the cutout position is O(p,q), p and q are each represented by the following formula. Furthermore, the video encoder 104 inserts an SEI message having rendering metadata (meta information for rendering) in the “SEIs” part of the access unit (AU). The 16-bit field of “rendering_metadata_id” is an ID that identifies the rendering metadata structure. The 16-bit field of “rendering_metadata_length” indicates the rendering metadata structure byte size. 
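Assuming the center O(p,q) of the cutout position is simply the midpoint of the end positions given by the cutout position information (an assumption; the exact formula is defined by the specification), it can be computed as in the following sketch, where the argument names mirror the fields above:

```python
def cutout_center(def_disp_win_left_offset, def_disp_win_right_offset,
                  def_disp_win_top_offset, def_disp_win_bottom_offset):
    """Center O(p, q) of the cutout window, assumed to be the midpoint of the
    left/right and top/bottom end positions of the cutout position."""
    p = (def_disp_win_left_offset + def_disp_win_right_offset) // 2
    q = (def_disp_win_top_offset + def_disp_win_bottom_offset) // 2
    return p, q

# Hypothetical offsets for illustration only.
print(cutout_center(0, 1920, 0, 1080))
```

When backward compatibility is set, this center is made to agree with the reference point RP (x,y) of the projection image.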
The 16-bit fields of “start_offset_sphere_latitude”, “start_offset_sphere_longitude”, “end_offset_sphere_latitude”, and “end_offset_sphere_longitude” indicate the cutout range information in a case where the spherical capture image undergoes planar packing. The 16-bit fields of “projection_pic_size_horizontal” and “projection_pic_size_vertical” indicate the size information on the projection image (projection picture). The 16-bit fields of “scaling_ratio_horizontal” and “scaling_ratio_vertical” indicate the scaling ratios from the original size of the projection image. The 16-bit fields of “reference_point_horizontal” and “reference_point_vertical” indicate the position information of the reference point RP (x,y) of the projection image. The 5-bit field of “format_type” indicates the format type of the projection image. For example, “0” indicates equirectangular, “1” indicates cross-cubic, and “2” indicates partitioned cross-cubic. The 1-bit field of “backwardcompatible” indicates whether or not backward compatibility has been set, that is, whether or not the center O(p,q) of the cutout position indicated by the cutout position information inserted in the video stream layer has been set to match the reference point RP (x,y) of the projection image. For example, “0” indicates that backward compatibility has not been set, and “1” indicates that backward compatibility has been set. The depth generation unit 105 determines a depth value, that is, depth information, for each block by using the left-eye and right-eye projection images from the planar packing units 103L and 103R. In this case, the depth generation unit 105 obtains a parallax (disparity) value by determining the sum of absolute differences (SAD) for each pixel block of 4×4, 8×8, or the like, and further converts the parallax (disparity) value into a depth value. Here, the conversion from the parallax value to the depth value will be described.
At this time, K is calculated by the following formula (1) from the ratio of S to E and the ratio of D to K. By transforming this formula, formula (2) is obtained. Formula (1) constitutes a conversion formula for converting the parallax value S into the depth value K. Conversely, formula (2) constitutes a conversion formula for converting the depth value K into the parallax value S. Here, the predetermined number of angle areas is set by the user operating the user operation unit 101. Furthermore, the representative depth value of each angle area is the minimum of the block depth values, generated by the depth generation unit 105, within that angle area. The representative depth value DPi in the angle area ARi is the minimum value among the plurality of depth values dv(j,k) included in the angle area ARi, and is represented by formula (3) below.

  DPi = min{ dv(j,k) : (j,k) ∈ ARi }  ... (3)

The position of each point is indicated by an azimuth angle φ and an elevation angle θ. The position of each angle area (not shown) is given as offset information based on the position of the viewpoint. Note that in the above description, the depth meta information generation unit 106 determines the representative depth value of each angle area by using the depth value of each block generated by the depth generation unit 105. However, the depth meta information generation unit 106 may instead use a depth map generated from information obtained by a depth sensor 111. The subtitle generation unit 107 generates subtitle data to be superimposed on the image. The subtitle encoder 108 encodes the subtitle data generated by the subtitle generation unit 107 to generate a subtitle stream. Note that, by referring to the depth value of each block generated by the depth generation unit 105, the subtitle encoder 108 adds to the subtitle data the depth value that can be used for depth control of subtitles during default view display centered on the reference point RP (x,y) of the projection image, or the parallax value obtained by converting that depth value.
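The block-based depth generation and the minimum-value selection of formula (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual encoder implementation: the disparity search is a brute-force SAD matcher, and `representative_depth` operates directly on a per-block depth map, with function names chosen here for illustration.

```python
def block_disparity(left, right, block=4, max_disp=8):
    """Per-block disparity: for each block of the left image, find the
    horizontal shift of the right image that minimises the sum of absolute
    differences (SAD). Images are 2-D lists of luma values."""
    h, w = len(left), len(left[0])
    disp = []
    for by in range(h // block):
        row = []
        for bx in range(w // block):
            y, x = by * block, bx * block
            best_sad, best_d = None, 0
            for d in range(min(max_disp, x) + 1):  # candidate shifts
                sad = sum(abs(left[y + j][x + i] - right[y + j][x - d + i])
                          for j in range(block) for i in range(block))
                if best_sad is None or sad < best_sad:
                    best_sad, best_d = sad, d
            row.append(best_d)
        disp.append(row)
    return disp

def representative_depth(depth_map, area_blocks):
    """Formula (3): DPi is the minimum of dv(j, k) over the blocks (j, k)
    belonging to the angle area ARi."""
    return min(depth_map[j][k] for (j, k) in area_blocks)
```

In practice each block disparity would then be converted to a depth value by formula (1) before the minimum is taken.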
Note that the depth value or parallax value that can be used during view display centered on each viewpoint set in the depth meta information described above may be further added to the subtitle data. Furthermore, the container encoder 109 inserts a descriptor having various types of information into the MP4 stream including the video stream, in association with the video stream. As this descriptor, a conventionally well-known component descriptor (component_descriptor) exists. The 4-bit field of “stream_content_ext” indicates details of the encoding target by being used in combination with the above-described “stream_content.” The 8-bit field of “component_type” indicates the variation in each encoding method. In this embodiment, “stream_content_ext” is set to “0x2” and “component_type” is set to “0x5” to indicate “distribution of stereoscopic VR by encoding of HEVC Main10 Profile UHD.” The transmission unit 110 puts the MP4 distribution stream STM obtained by the container encoder 109 on a broadcast wave or a network packet and transmits it to the service receiver 200. The MP4 stream (video track) has a configuration in which each random access period starts with an initialization segment (IS), which is followed by the boxes “styp”, “sidx (segment index box)”, “ssix (sub-segment index box)”, “moof (movie fragment box)”, and “mdat (media data box).” The initialization segment (IS) has a box structure based on the ISO base media file format (ISOBMFF). The rendering metadata and component descriptor are inserted in this initialization segment (IS). The “styp” box contains segment type information. The “sidx” box contains range information on each track, indicates the position of “moof”/“mdat”, and also indicates the position of each sample (picture) in “mdat”. The “ssix” box contains track classification information, and tracks are classified into I/P/B types. The “moof” box contains control information.
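As an illustration of the segment structure just described, a minimal sketch that walks the top-level ISOBMFF boxes of such a segment might look like this (a simplification: each box is assumed to carry a 32-bit big-endian size, and the 64-bit `size == 1` and to-end-of-file `size == 0` cases are not handled):

```python
import struct

def list_boxes(data):
    """Walk top-level ISOBMFF boxes: each box header is a 32-bit big-endian
    size followed by a 4-character type such as 'styp', 'sidx', 'ssix',
    'moof', or 'mdat'. Returns (type, size) pairs."""
    boxes, pos = [], 0
    while pos + 8 <= len(data):
        size, = struct.unpack_from(">I", data, pos)
        btype = data[pos + 4:pos + 8].decode("ascii")
        if size < 8:  # 64-bit and open-ended sizes not handled in this sketch
            break
        boxes.append((btype, size))
        pos += size
    return boxes
```

For example, feeding it the bytes of a media segment would yield a sequence like `[("styp", …), ("sidx", …), ("ssix", …), ("moof", …), ("mdat", …)]`.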
In the “mdat” box, the NAL units of “VPS”, “SPS”, “PPS”, “PSEI”, “SSEI”, and “SLICE” are placed. The NAL unit of “SLICE” includes the encoded image data of each picture in the random access period. Meanwhile, the MP4 stream (timed metadata track) also has a configuration in which each random access period starts with an initialization segment (IS), followed by the boxes “styp”, “sidx”, “ssix”, “moof”, and “mdat.” The “mdat” box contains the depth meta information on each picture in the random access period. The 8-bit field of “viewpoint_id” indicates the identification number of the viewpoint. The 16-bit field of “center_azimuth” indicates the azimuth angle of the view center position, that is, the viewpoint position. The 16-bit field of “center_elevation” indicates the elevation angle of the view center position, that is, the viewpoint position. The 16-bit field of “center_tilt” indicates the tilt angle of the view center position, that is, the viewpoint. This tilt angle indicates the inclination of the angle with respect to the view center. The 8-bit field of “number_of_depth_sets” indicates the number of depth sets, that is, the number of angle areas. The following information exists repeatedly for the number of depth sets. The 16-bit field of “angle_tl_horizontal” indicates the horizontal position of the upper left corner of the target angle area as an offset angle from the viewpoint. The 16-bit field of “angle_tl_vertical” indicates the vertical position of the upper left corner of the target angle area as an offset angle from the viewpoint. The 16-bit field of “angle_br_horizontal” indicates the horizontal position of the lower right corner of the target angle area as an offset angle from the viewpoint. The 16-bit field of “angle_br_vertical” indicates the vertical position of the lower right corner of the target angle area as an offset angle from the viewpoint.
The 16-bit field of “depth_reference” indicates the reference of the depth value, that is, the depth value corresponding to the depth of the screen. The MP4 stream including the video stream (video track) and the MP4 stream including the timed metadata stream (timed metadata track) are associated with each other by the MPD file. Although detailed description is omitted, the part surrounded by a dashed-dotted rectangular frame indicates information related to the video track. Furthermore, the part surrounded by a broken rectangular frame indicates information regarding the timed metadata track. This indicates an adaptation set (AdaptationSet) including the stream “preset-viewpoints.mp4” containing the meta information stream of the viewpoints. “Representation id” is “preset-viewpoints”, “associationId” is “360-video”, and “associationType” is “cdsc”, which indicates linkage to the video track. The operation of the service transmission system 100 will now be briefly described. The image data of the projection images obtained by the planar packing units 103L and 103R is supplied to the video encoder 104. The video encoder 104 encodes the image data of the projection images obtained by the planar packing units 103L and 103R, and generates a video stream including the encoded image data. In this case, the cutout position information is inserted into the SPS NAL unit of the video stream. Furthermore, the image data of the projection images obtained by the planar packing units 103L and 103R is supplied to the depth generation unit 105. The depth generation unit 105 obtains the depth value, that is, depth information, for each block by using the left-eye and right-eye projection images from the planar packing units 103L and 103R. That is, the depth generation unit 105 generates the depth map, which is a collection of block-based depth values dv(j,k), for each picture. The depth map for each picture generated by the depth generation unit 105 is supplied to the depth meta information generation unit 106.
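The depth meta information syntax described above can be mirrored by a simple in-memory structure, sketched below. The grouping of fields follows the description in the text, but the container type names and the name of the per-area representative depth field are hypothetical (the text describes the value but not its field name); actual bit widths and serialization order are defined by the stream syntax.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AngleAreaDepth:
    """One 'depth set': an angle area given as offset angles from the
    viewpoint, plus its representative depth value."""
    angle_tl_horizontal: int   # upper left corner, horizontal offset angle
    angle_tl_vertical: int     # upper left corner, vertical offset angle
    angle_br_horizontal: int   # lower right corner, horizontal offset angle
    angle_br_vertical: int     # lower right corner, vertical offset angle
    representative_depth: int  # hypothetical name for the area's depth value

@dataclass
class DepthMetaInformation:
    viewpoint_id: int
    center_azimuth: int        # azimuth angle of the view center position
    center_elevation: int      # elevation angle of the view center position
    center_tilt: int           # tilt angle with respect to the view center
    depth_reference: int       # depth value corresponding to the screen depth
    depth_sets: List[AngleAreaDepth]  # number_of_depth_sets entries
```

A receiver could then, for instance, scan `depth_sets` for the areas overlapping the current display area.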
The depth meta information generation unit 106 generates the depth meta information for each picture. The depth meta information includes the position information and representative depth value of the predetermined number of angle areas set on the projection image. The depth meta information further includes position information indicating which position in each area the representative depth value relates to. Note that the depth meta information generation unit 106 may use a depth map generated from information obtained by using the depth sensor 111, instead of the depth map for each picture generated by the depth generation unit 105. Furthermore, the subtitle generation unit 107 generates the subtitle data to be superimposed on the image. The subtitle data is supplied to the subtitle encoder 108. The subtitle encoder 108 encodes the subtitle data to generate the subtitle stream. In this case, the depth value that can be used for depth control of subtitles during default view display centered on the reference point RP (x,y) of the projection image is added to the subtitle data. The video stream generated by the video encoder 104, the subtitle stream generated by the subtitle encoder 108, and the depth meta information for each picture generated by the depth meta information generation unit 106 are supplied to the container encoder 109. The container encoder 109 generates, as the distribution stream STM, a container containing the video stream, the subtitle stream, and the timed metadata stream having the depth meta information for each picture, here an MP4 stream. In this case, the container encoder 109 also inserts the rendering metadata into the MP4 stream including the video stream. The MP4 stream obtained by the container encoder 109 is supplied to the transmission unit 110. The transmission unit 110 puts the MP4 distribution stream STM obtained by the container encoder 109 on a broadcast wave or a network packet for transmission to the service receiver 200.
Note that in the above description, the depth meta information for each picture is transmitted by using the timed metadata stream. However, the depth meta information for each picture may instead be inserted into the video stream for transmission. In this case, a PSVP/SEI message (SEI message) including the depth meta information is inserted into the “SEIs” part of the access unit (AU) of each picture. “Service Receiver” The control unit 201 includes a central processing unit (CPU), and controls the operation of each unit of the service receiver 200 on the basis of a control program. The UI unit 201 constitutes a user interface for accepting various operations by the user. The reception unit 202 receives the MP4 distribution stream STM transmitted from the service transmission system 100 on a broadcast wave or a network packet. In this case, the MP4 streams including the video stream, the subtitle stream, and the timed metadata stream are obtained as the distribution stream STM. Note that in a case where the depth meta information on each picture is inserted in the video stream and sent, no MP4 stream including the timed metadata stream exists. The container decoder 203 extracts the video stream from the MP4 stream including the video stream received by the reception unit 202, and sends the extracted video stream to the video decoder 204. Furthermore, the container decoder 203 extracts information and the like of the “moov” box from the MP4 stream including the video stream, and sends the information and the like to the control unit 201. The rendering metadata is included as one piece of the information of the “moov” box. Furthermore, the container decoder 203 extracts the subtitle stream from the MP4 stream including the subtitle stream received by the reception unit 202, and sends the subtitle stream to the subtitle decoder 205.
Furthermore, when the reception unit 202 receives the MP4 stream including the timed metadata stream, the container decoder 203 extracts the timed metadata stream from the MP4 stream, extracts the depth meta information included in the timed metadata stream, and sends the depth meta information to the control unit 201. The video decoder 204 performs decoding processing on the video stream extracted by the container decoder 203 to obtain the image data of the left-eye and right-eye projection images. Furthermore, the video decoder 204 extracts the parameter sets and SEI messages inserted in the video stream and sends them to the control unit 201. The extracted information includes the information on the cutout position “default_display_window” inserted in the SPS NAL unit, and furthermore the SEI message having the rendering metadata. The subtitle decoder 205 performs decoding processing on the subtitle stream extracted by the container decoder 203 to obtain the subtitle data, obtains the subtitle display data and subtitle superimposition position data from the subtitle data, and sends them to the renderer 207. Furthermore, the subtitle decoder 205 acquires the depth value added to the subtitle data that can be used for depth control of the subtitles during default view display, and sends the depth value to the control unit 201. The graphics generation unit 206 generates graphics display data and graphics superimposition position data related to graphics such as on screen display (OSD), application graphics, or an electronic program guide (EPG), and sends the data to the renderer 207.
The renderer 207 generates left-eye and right-eye image data for displaying a three-dimensional image (stereoscopic image) on which subtitles and graphics are superimposed, on the basis of the image data of the left-eye and right-eye projection images obtained by the video decoder 204, the subtitle display data and subtitle superimposition position data from the subtitle decoder 205, and the graphics display data and graphics superimposition position data from the graphics generation unit 206. In this case, under the control of the control unit 201, the display area is changed interactively in response to the posture and operation of the user. The scaling unit 208 performs scaling on the left-eye and right-eye image data so as to match the display size of the display unit 209. The display unit 209 displays the three-dimensional image (stereoscopic image) on the basis of the left-eye and right-eye image data that has undergone the scaling processing. The display unit 209 includes, for example, a display panel, a head mounted display (HMD), or the like. Image data VPL of the left-eye projection image is supplied from the video decoder 204 to the left-eye image data generation unit 211L. Furthermore, the display area information is supplied from the control unit 201 to the left-eye image data generation unit 211L. The left-eye image data generation unit 211L performs rendering processing on the left-eye projection image to obtain left-eye image data VL corresponding to the display area. Image data VPR of the right-eye projection image is supplied from the video decoder 204 to the right-eye image data generation unit 211R. Furthermore, the display area information is supplied from the control unit 201 to the right-eye image data generation unit 211R. The right-eye image data generation unit 211R performs rendering processing on the right-eye projection image to obtain right-eye image data VR corresponding to the display area.
Here, on the basis of information on the direction and amount of movement obtained by a gyro sensor mounted on an HMD or the like, or on the basis of pointing information from user operation or voice UI information of the user, the control unit 201 obtains information on the moving direction and speed of the display area and generates display area information for interactively changing the display area. Note that, for example, when display starts, such as when the power is turned on, the control unit 201 generates the display area information corresponding to the default view centered on the reference point RP (x,y) of the projection image. The display area information and the depth meta information are supplied from the control unit 201 to the depth processing unit 213. Furthermore, the subtitle superimposition position data and the graphics superimposition position data are supplied to the depth processing unit 213. The depth processing unit 213 obtains a subtitle depth value, that is, a depth value for giving parallax to the subtitle display data, on the basis of the subtitle superimposition position data, the display area information, and the depth meta information. For example, the depth processing unit 213 sets the depth value for giving parallax to the subtitle display data to the minimum of the representative depth values of the predetermined number of angle areas corresponding to the subtitle superimposition range indicated by the subtitle superimposition position data. Since the depth value for giving parallax to the subtitle display data is determined in this way, the subtitles can be displayed forward of the image objects existing in the subtitle superimposition range, and the consistency of perspective with each object in the image can be maintained. In the illustrated example, a display area A and a display area B are at positions including the viewpoint VpD.
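The minimum-value selection performed by the depth processing unit 213 can be sketched as follows. This is a simplified illustration: the angle areas and the subtitle superimposition range are modelled here as axis-aligned rectangles in a common angular coordinate system (an assumption for the sketch), and ranges not covered by any angle area fall back to a "far" depth value, as described for the undefined sections of the polygonal line below.

```python
def subtitle_depth(subtitle_rect, angle_areas, far_depth):
    """Subtitle depth value: the minimum representative depth value among the
    angle areas overlapping the subtitle superimposition range.
    subtitle_rect: (x0, y0, x1, y1); angle_areas: iterable of
    (x0, y0, x1, y1, representative_depth). Smaller depth = nearer."""
    sx0, sy0, sx1, sy1 = subtitle_rect
    depths = [d for (x0, y0, x1, y1, d) in angle_areas
              if x0 < sx1 and sx0 < x1 and y0 < sy1 and sy0 < y1]
    return min(depths) if depths else far_depth
```

Choosing the minimum guarantees the subtitle is rendered forward of the nearest object anywhere under its superimposition range, which is what preserves the consistency of perspective.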
In this case, the display area A and the display area B have different area sizes: the display area A is wide and the display area B is narrow. The size of the display area varies depending on how much display capability the receiver has. Since the display area A includes an object OB1 in the close-distance view, the subtitle is superimposed so as to be displayed forward of the object OB1. Meanwhile, the display area B does not include the object OB1 in the close-distance view, and therefore the subtitle is superimposed so as to be displayed behind the object OB1, that is, forward of an object OB2 located far away. Each angle area has a representative depth value, and the solid polygonal line D indicates the degree of depth according to the representative depth value. The values that the solid polygonal line D takes are as follows. That is, L0 to L1 is the representative depth value of the angle area AR1. L1 to L2, which is a part where no angle area is defined, is a depth value indicating "far". L2 to L3 is the representative depth value of the angle area AR2. L3 to L4, which is a part where no angle area is defined, is a depth value indicating "far". L4 to L5 is the representative depth value of the angle area AR3. L5 to L6 is the representative depth value of the angle area AR4. Then, L6 to L7 is the representative depth value of the angle area AR5. The broken line P indicates the depth value for giving parallax to the subtitle display data (subtitle depth value). When the display area moves, the subtitle depth value transitions so as to trace the solid polygonal line D. However, since the part L1 to L2 is narrower than the horizontal width of the subtitle, the subtitle depth value does not trace the solid polygonal line D there and becomes the depth value of L0 to L1 or the depth value of L2 to L3. 
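The solid polygonal line D, a piecewise-constant depth distribution over the display area, can be modeled with a short sketch. The one-dimensional extents, the sampling step, and the "far" default are hypothetical simplifications for illustration:

```python
def depth_distribution(angle_areas, view_start, view_end,
                       far_depth=0xFFFF, step=1.0):
    """Sample the depth distribution across the display area (the
    solid polygonal line D): inside an angle area the value is that
    area's representative depth value; where no angle area is defined
    the value indicates 'far'.

    angle_areas: list of (start, end, representative_depth) tuples."""
    samples = []
    x = view_start
    while x < view_end:
        depth = far_depth
        for (start, end, rep_depth) in angle_areas:
            if start <= x < end:
                depth = rep_depth
                break
        samples.append((x, depth))
        x += step
    return samples
```

Sampling two areas separated by an undefined gap reproduces the step pattern of line D: the gap samples take the "far" value while each area's samples take its representative depth value.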
Furthermore, when the subtitle overlaps a plurality of depth value sections of the solid polygonal line D, the subtitle depth value follows the smaller depth value. Note that S1 to S3 schematically show examples of the subtitle position and the subtitle depth value at each position. Note that in a case where the display area overlaps both the angle areas AG_2 and AG_3 in this way, as described above, besides performing weighted addition on the representative depth values of the angle areas AG_2 and AG_3 according to the ratio of the display area overlapping each angle area to obtain the subtitle depth value, it is also possible to change the depth value stepwise in the target area, for example, on the basis of position information indicating which position in the area each representative depth value relates to. In a T1 state, the display area corresponds to the angle area AG_1. Since the display area is included in the angle area AG_1 for standard display, the subtitle depth value (depth value for giving parallax to subtitle display data) is the representative depth value of the angle area AG_1. Meanwhile, since the display area extends over the angle areas AG_0 to AG_2 for wide-angle display, the subtitle depth value is the minimum value of the representative depth values of the angle areas AG_0 to AG_2. Furthermore, in a T2 state, the display area corresponds to the angle area AG_2. Since the display area is included in the angle area AG_2 for standard display, the subtitle depth value is the representative depth value of the angle area AG_2. Meanwhile, since the display area extends over the angle areas AG_1 to AG_3 for wide-angle display, the subtitle depth value is the minimum value of the representative depth values of the angle areas AG_1 to AG_3. Furthermore, in a T3 state, the display area corresponds to the angle area AG_3. 
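The weighted-addition alternative mentioned above can be sketched as follows. The one-dimensional area extents are a hypothetical simplification; only the weighting by the ratio of the display area overlapping each angle area comes from the text:

```python
def weighted_subtitle_depth(angle_areas, view_start, view_end):
    """Obtain the subtitle depth value by weighted addition of the
    representative depth values according to the ratio of the display
    area overlapping each angle area. Compared with simply taking the
    minimum, this avoids sudden jumps in the subtitle depth value
    when the display area moves across area boundaries.

    angle_areas: list of (start, end, representative_depth) tuples."""
    weighted_sum = 0.0
    covered = 0.0
    for (start, end, rep_depth) in angle_areas:
        overlap = min(end, view_end) - max(start, view_start)
        if overlap > 0:
            weighted_sum += rep_depth * overlap
            covered += overlap
    if covered == 0:
        raise ValueError("display area overlaps no angle area")
    return weighted_sum / covered
```

A display area straddling two areas equally yields the midpoint of their representative depth values, and the result slides smoothly toward one area's value as the display area moves into it.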
Since the display area is included in the angle area AG_3 for standard display, the subtitle depth value (depth value for giving parallax to subtitle display data) is the representative depth value of the angle area AG_3. Meanwhile, since the display area extends over the angle areas AG_2 to AG_4 for wide-angle display, the subtitle depth value is the minimum value of the representative depth values of the angle areas AG_2 to AG_4. Next, in step ST3, the depth processing unit 213 obtains a depth value distribution in the display area (see the solid polygonal line D). Note that the depth processing unit 213 does not necessarily have to set the minimum depth value in the subtitle superimposition range as the subtitle depth value in step ST4. In a case where the display area overlaps a plurality of depth value areas, it is possible to avoid a sudden digital change in the subtitle depth value and cause a smooth transition in the subtitle depth value by performing weighted addition on each depth value according to the overlapping ratio to obtain the subtitle depth value. The depth/parallax conversion unit 214 converts the subtitle depth value and the graphics depth value obtained by the depth processing unit 213 into parallax values to obtain a subtitle parallax value and a graphics parallax value, respectively. In this case, the conversion is performed by formula (2) described above. The superimposition unit 212 is supplied with the left-eye image data VL obtained by the left-eye image data generation unit 211L and the right-eye image data VR obtained by the right-eye image data generation unit 211R. Furthermore, the superimposition unit 212 is supplied with the subtitle display data and the subtitle superimposition position data, and the graphics display data and the graphics superimposition position data. 
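Formula (2) is defined earlier in the document and is not reproduced in this section. Purely as an illustration, a common stereoscopic depth-to-parallax conversion, which takes the depth value of the screen plane as the zero-parallax reference (in line with configuration (7) below), could look like this; the function name, parameters, and the formula itself are assumptions, not the document's formula (2):

```python
def depth_to_parallax(depth, screen_depth, eye_span_px):
    """Convert a depth value into a parallax value (in pixels), with
    the screen-plane depth as the zero-parallax reference.

    Assumed conversion (NOT the document's formula (2)):
        parallax = eye_span_px * (1 - screen_depth / depth)
    An object at screen depth gets zero parallax, an object farther
    than the screen gets positive (uncrossed) parallax, and an object
    nearer than the screen gets negative (crossed) parallax, which
    makes it appear in front of the screen."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return eye_span_px * (1.0 - screen_depth / float(depth))
```

Whatever the exact formula, the role of the depth/parallax conversion unit 214 is the same: map each depth value to a horizontal offset to apply between the left-eye and right-eye copies of the superimposition data.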
Moreover, the superimposition unit 212 is supplied with the subtitle parallax value and the graphics parallax value obtained by the depth/parallax conversion unit 214. The superimposition unit 212 superimposes the subtitle display data at the superimposition position indicated by the subtitle superimposition position data of the left-eye image data and right-eye image data. At that time, the superimposition unit 212 gives parallax on the basis of the subtitle parallax value. Furthermore, the superimposition unit 212 superimposes the graphics display data at the superimposition position indicated by the graphics superimposition position data of the left-eye image data and right-eye image data. At that time, the superimposition unit 212 gives parallax on the basis of the graphics parallax value. Note that in a case where the superimposition positions of subtitles and graphics partially overlap each other, for that part, the superimposition unit 212 overwrites the graphics display data on the subtitle display data. The superimposition unit 212 outputs left-eye image data VLD in which the left-eye subtitle display data and the left-eye graphics display data are superimposed on the left-eye image data. Furthermore, the superimposition unit 212 outputs right-eye image data VRD in which the right-eye subtitle display data and the right-eye graphics display data are superimposed on the right-eye image data. Note that, as described above, the subtitle parallax value for giving parallax to the subtitle display data can be obtained by the depth processing unit 213 obtaining the subtitle depth value on the basis of the subtitle superimposition position data, the display area information, and the depth meta information, and then the depth/parallax conversion unit 214 converting the subtitle depth value into the subtitle parallax value. However, when displaying the default view, the subtitle depth value and subtitle parallax value sent in addition to the subtitle data can also be used. 
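Giving parallax when superimposing can be sketched as shifting the superimposition position horizontally in opposite directions in the two eye images. Splitting the parallax symmetrically, and the sign convention, are assumptions for illustration; the document states only that parallax is given on the basis of the parallax value:

```python
def eye_positions(x, parallax):
    """Horizontal superimposition positions of the subtitle in the
    left-eye and right-eye images, given the subtitle parallax value.

    Assumed convention: positive (uncrossed) parallax shifts the
    left-eye copy left and the right-eye copy right, placing the
    subtitle behind the screen plane; negative (crossed) parallax
    does the opposite and places it in front of the screen."""
    left_x = x - parallax / 2.0
    right_x = x + parallax / 2.0
    return left_x, right_x
```

The superimposition unit would then blend the subtitle display data into the left-eye image at left_x and into the right-eye image at right_x, producing the stereoscopic depth cue.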
The operation of the service receiver 200 will be briefly described. The container decoder 203 extracts the video stream from the MP4 stream including the video stream, and sends the extracted video stream to the video decoder 204. Furthermore, the container decoder 203 extracts information on a "moov" block and the like from the MP4 stream including the video stream, and sends the information to the control unit 201. Furthermore, the container decoder 203 extracts the subtitle stream from the MP4 stream including the subtitle stream, and sends the subtitle stream to the subtitle decoder 205. The subtitle decoder 205 performs decoding processing on the subtitle stream to obtain the subtitle data, obtains the subtitle display data and the subtitle superimposition position data from the subtitle data, and sends them to the renderer 207. Furthermore, when the reception unit 202 receives the MP4 stream including the timed metadata stream, the container decoder 203 extracts the timed metadata stream from the MP4 stream, extracts the depth meta information included in the timed metadata stream, and sends the depth meta information to the control unit 201. The video decoder 204 performs decoding processing on the video stream to obtain the image data of the left-eye and right-eye projection images, and supplies the image data to the renderer 207. Furthermore, the video decoder 204 extracts the parameter sets and SEI messages inserted in the video stream, and sends them to the control unit 201. In a case where the depth meta information is inserted into the video stream and sent, the extracted SEI messages also include the SEI message containing the depth meta information. The graphics generation unit 206 generates the graphics display data and the graphics superimposition position data related to graphics including OSD, applications, EPG, and the like, and supplies the data to the renderer 207. 
The renderer 207 generates left-eye and right-eye image data for displaying a three-dimensional image (stereoscopic image) on which subtitles and graphics are superimposed, on the basis of the image data of the left-eye and right-eye projection images, the subtitle display data and subtitle superimposition position data from the subtitle decoder 205, and the graphics display data and graphics superimposition position data from the graphics generation unit 206. In this case, under the control of the control unit 201, the display area is changed interactively in response to the posture and operation of the user. The left-eye and right-eye image data for displaying the three-dimensional image obtained by the renderer 207 is supplied to the scaling unit 208. The scaling unit 208 performs scaling so as to match the display size of the display unit 209. The display unit 209 displays the three-dimensional image (stereoscopic image) whose display area is changed interactively on the basis of the left-eye and right-eye image data that has undergone the scaling processing. Note that the above-described embodiment has shown an example in which the container is MP4 (ISOBMFF). However, the present technology is not limited to the MP4 container, and is similarly applicable to containers of other formats such as MPEG-2 TS or MMT. Furthermore, in the description of the above-described embodiment, it is assumed that the format type of the projection image is equirectangular. Furthermore, the present technology can also have the following configurations. 
(1) A reception device including: a reception unit configured to receive a video stream obtained by encoding image data of a wide viewing angle image for each of left-eye and right-eye pictures, and depth meta information including position information and a representative depth value of a predetermined number of angle areas in the wide viewing angle image for each of the pictures; and a processing unit configured to extract left-eye and right-eye display area image data from the image data of the wide viewing angle image for each of the left-eye and right-eye pictures obtained by decoding the video stream and to superimpose superimposition information data on the left-eye and right-eye display area image data for output, in which when superimposing the superimposition information data on the left-eye and right-eye display area image data, the processing unit gives parallax to the superimposition information data to be superimposed on each of the left-eye and right-eye display area image data on the basis of the depth meta information. (2) The reception device according to (1) described above, in which the reception unit receives the depth meta information for each of the pictures by using a timed metadata stream associated with the video stream. (3) The reception device according to (1) described above, in which the reception unit receives the depth meta information for each of the pictures in a state of being inserted into the video stream. (4) The reception device according to any one of (1) to (3) described above, in which when superimposing the superimposition information data on the left-eye and right-eye display area image data, the processing unit gives the parallax on the basis of a minimum value of the representative depth value of the predetermined number of angle areas corresponding to a superimposition range, the representative depth value being included in the depth meta information. 
(5) The reception device according to any one of (2) to (3) described above, in which the depth meta information further includes position information indicating which position in the areas the representative depth value of the predetermined number of angle areas relates to, and when superimposing the superimposition information data on the left-eye and right-eye display area image data, the processing unit gives the parallax on the basis of the representative depth value of the predetermined number of areas corresponding to a superimposition range and the position information included in the depth meta information. (6) The reception device according to any one of (1) to (5) described above, in which the position information on the angle areas is given as offset information based on a position of a predetermined viewpoint. (7) The reception device according to any one of (1) to (6) described above, in which the depth meta information further includes a depth value corresponding to depth of a screen as a reference for the depth value. (8) The reception device according to any one of (1) to (7) described above, in which the superimposition information includes subtitles and/or graphics. (9) The reception device according to any one of (1) to (8) described above, further including a display unit configured to display a three-dimensional image on the basis of the left-eye and right-eye display area image data on which the superimposition information data is superimposed. (10) The reception device according to (9) described above, in which the display unit includes a head mounted display. 
(11) A reception method including: receiving a video stream obtained by encoding image data of a wide viewing angle image for each of left-eye and right-eye pictures, and depth meta information including position information and a representative depth value of a predetermined number of angle areas in the wide viewing angle image for each of the pictures; and extracting left-eye and right-eye display area image data from the image data of the wide viewing angle image for each of the left-eye and right-eye pictures obtained by decoding the video stream and superimposing superimposition information data on the left-eye and right-eye display area image data for output, in which when superimposing the superimposition information data on the left-eye and right-eye display area image data, parallax is given to the superimposition information data to be superimposed on each of the left-eye and right-eye display area image data on the basis of the depth meta information. (12) The reception method according to (11) described above, in which the depth meta information for each of the pictures is received by using a timed metadata stream associated with the video stream. (13) The reception method according to (11) described above, in which the depth meta information for each of the pictures is received in a state of being inserted into the video stream. (14) The reception method according to any one of (11) to (13) described above, in which when superimposing the superimposition information data on the left-eye and right-eye display area image data, the parallax is given on the basis of a minimum value of the representative depth value of the predetermined number of angle areas corresponding to a superimposition range, the representative depth value being included in the depth meta information. 
(15) The reception method according to any one of (11) to (14) described above, in which the depth meta information further includes position information indicating which position in the areas the representative depth value of the predetermined number of angle areas relates to, and when superimposing the superimposition information data on the left-eye and right-eye display area image data, the parallax is given on the basis of the representative depth value of the predetermined number of areas corresponding to a superimposition range and the position information included in the depth meta information. (16) The reception method according to any one of (11) to (15) described above, in which the position information on the angle areas is given as offset information based on a position of a predetermined viewpoint. (17) The reception method according to any one of (11) to (16) described above, in which the depth meta information further includes a depth value corresponding to depth of a screen as a reference for the depth value. (18) The reception method according to any one of (11) to (17) described above, in which the superimposition information includes subtitles and/or graphics. (19) A transmission device including: a transmission unit configured to transmit a video stream obtained by encoding image data of a wide viewing angle image for each of left-eye and right-eye pictures and depth meta information for each of the pictures, in which the depth meta information includes position information and a representative depth value of a predetermined number of angle areas in the wide viewing angle image. (20) A transmission method including: transmitting a video stream obtained by encoding image data of a wide viewing angle image for each of left-eye and right-eye pictures and depth meta information for each of the pictures, in which the depth meta information includes position information and a representative depth value of a predetermined number of angle areas in the wide viewing angle image. 
The major feature of the present technology is that when superimposing the superimposition information display data (subtitles and graphics) on the left-eye and right-eye display area image data, parallax is given on the basis of the depth meta information including the position information and the representative depth value of the predetermined number of angle areas in the wide viewing angle image, thereby making it possible to easily implement depth control when superimposing and displaying the superimposition information by using the depth information that is efficiently transmitted. A video stream obtained by encoding image data of a wide viewing angle image for each of left-eye and right-eye pictures, and depth meta information including position information and a representative depth value of a predetermined number of angle areas in the wide viewing angle image for each picture are received. Left-eye and right-eye display area image data is extracted from the image data of the wide viewing angle image for each of the left-eye and right-eye pictures obtained by decoding the video stream. Superimposition information data is superimposed on the left-eye and right-eye display area image data for output. When superimposing the superimposition information data on the left-eye and right-eye display area image data, parallax is given on the basis of the depth meta information. 1. A reception device comprising:
a reception unit configured to receive a video stream obtained by encoding image data of a wide viewing angle image for each of left-eye and right-eye pictures, and depth meta information including position information and a representative depth value of a predetermined number of angle areas in the wide viewing angle image for each of the pictures; and a processing unit configured to extract left-eye and right-eye display area image data from the image data of the wide viewing angle image for each of the left-eye and right-eye pictures obtained by decoding the video stream and to superimpose superimposition information data on the left-eye and right-eye display area image data for output, wherein when superimposing the superimposition information data on the left-eye and right-eye display area image data, the processing unit gives parallax to the superimposition information data to be superimposed on each of the left-eye and right-eye display area image data on a basis of the depth meta information. 2. The reception device according to claim 1, wherein the reception unit receives the depth meta information for each of the pictures by using a timed metadata stream associated with the video stream. 3. The reception device according to claim 1, wherein the reception unit receives the depth meta information for each of the pictures in a state of being inserted into the video stream. 4. The reception device according to claim 1, wherein when superimposing the superimposition information data on the left-eye and right-eye display area image data, the processing unit gives the parallax on a basis of a minimum value of the representative depth value of the predetermined number of angle areas corresponding to a superimposition range, the representative depth value being included in the depth meta information. 5. 
The reception device according to claim 1, wherein the depth meta information further includes position information indicating which position in the areas the representative depth value of the predetermined number of angle areas relates to, and when superimposing the superimposition information data on the left-eye and right-eye display area image data, the processing unit gives the parallax on a basis of the representative depth value of the predetermined number of areas corresponding to a superimposition range and the position information included in the depth meta information. 6. The reception device according to claim 1, wherein the position information on the angle areas is given as offset information based on a position of a predetermined viewpoint. 7. The reception device according to claim 1, wherein the depth meta information further includes a depth value corresponding to depth of a screen as a reference for the depth value. 8. The reception device according to claim 1, wherein the superimposition information includes subtitles and/or graphics. 9. The reception device according to claim 1, further comprising a display unit configured to display a three-dimensional image on a basis of the left-eye and right-eye display area image data on which the superimposition information data is superimposed. 10. The reception device according to claim 9, wherein the display unit includes a head mounted display. 11. A reception method comprising:
receiving a video stream obtained by encoding image data of a wide viewing angle image for each of left-eye and right-eye pictures, and depth meta information including position information and a representative depth value of a predetermined number of angle areas in the wide viewing angle image for each of the pictures; and extracting left-eye and right-eye display area image data from the image data of the wide viewing angle image for each of the left-eye and right-eye pictures obtained by decoding the video stream and superimposing superimposition information data on the left-eye and right-eye display area image data for output, wherein when superimposing the superimposition information data on the left-eye and right-eye display area image data, parallax is given to the superimposition information data to be superimposed on each of the left-eye and right-eye display area image data on a basis of the depth meta information. 12. The reception method according to claim 11, wherein the depth meta information for each of the pictures is received by using a timed metadata stream associated with the video stream. 13. The reception method according to claim 11, wherein the depth meta information for each of the pictures is received in a state of being inserted into the video stream. 14. The reception method according to claim 11, wherein when superimposing the superimposition information data on the left-eye and right-eye display area image data, the parallax is given on a basis of a minimum value of the representative depth value of the predetermined number of angle areas corresponding to a superimposition range, the representative depth value being included in the depth meta information. 15. 
The reception method according to claim 11, wherein the depth meta information further includes position information indicating which position in the areas the representative depth value of the predetermined number of angle areas relates to, and when superimposing the superimposition information data on the left-eye and right-eye display area image data, the parallax is given on a basis of the representative depth value of the predetermined number of areas corresponding to a superimposition range and the position information included in the depth meta information. 16. The reception method according to claim 11, wherein the position information on the angle areas is given as offset information based on a position of a predetermined viewpoint. 17. The reception method according to claim 11, wherein the depth meta information further includes a depth value corresponding to depth of a screen as a reference for the depth value. 18. The reception method according to claim 11, wherein the superimposition information includes subtitles and/or graphics. 19. A transmission device comprising:
a transmission unit configured to transmit a video stream obtained by encoding image data of a wide viewing angle image for each of left-eye and right-eye pictures and depth meta information for each of the pictures, wherein the depth meta information includes position information and a representative depth value of a predetermined number of angle areas in the wide viewing angle image. 20. A transmission method comprising:
transmitting a video stream obtained by encoding image data of a wide viewing angle image for each of left-eye and right-eye pictures and depth meta information for each of the pictures, wherein the depth meta information includes position information and a representative depth value of a predetermined number of angle areas in the wide viewing angle image.
![](/ipUS20210006769A1/0.png)
![](/ipUS20210006769A1/1.png)
![](/ipUS20210006769A1/2.png)
![](/ipUS20210006769A1/3.png)
![](/ipUS20210006769A1/4.png)
![](/ipUS20210006769A1/5.png)
![](/ipUS20210006769A1/6.png)
![](/ipUS20210006769A1/7.png)
![](/ipUS20210006769A1/8.png)
![](/ipUS20210006769A1/9.png)
![](/ipUS20210006769A1/10.png)
![](/ipUS20210006769A1/11.png)
![](/ipUS20210006769A1/12.png)
![](/ipUS20210006769A1/13.png)
![](/ipUS20210006769A1/14.png)
![](/ipUS20210006769A1/15.png)
![](/ipUS20210006769A1/16.png)
![](/ipUS20210006769A1/17.png)
![](/ipUS20210006769A1/18.png)
![](/ipUS20210006769A1/19.png)
![](/ipUS20210006769A1/20.png)
![](/ipUS20210006769A1/21.png)
![](/ipUS20210006769A1/22.png)
![](/ipUS20210006769A1/23.png)
![](/ipUS20210006769A1/24.png)
![](/ipUS20210006769A1/25.png)
![](/ipUS20210006769A1/26.png)
![](/ipUS20210006769A1/27.png)