Object identification in bird's-eye view reference frame with explicit depth estimation co-training