System and method of media file access and retrieval using speech recognition
(19) AUSTRALIAN PATENT OFFICE
(54) Title: System and method of media file access and retrieval using speech recognition
(51)6 International Patent Classification(s): G10L 021/00
(21) Application No: 2003272365
(22) Application Date: 2003.09.12
(87) WIPO No: WO04/025623
(30) Priority Data: (31) Number: 10/245,727 (32) Date: 2002.09.16 (33) Country: US
200511037
(43) Publication Date: 2004.04.30
(43) Publication Journal Date: 2004.05.20
(71) Applicant(s): JUNQUA, Jean-Claude; MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
(72) Inventor(s): Rigazio, Luca; Nguyen, Patrick; Junqua, Jean-Claude; Kryze, David
(11) Application No: AU2003272365 A8

An embedded device for playing media files is capable of generating a play list of media files based on input speech from a user. It includes an indexer generating a plurality of speech recognition grammars. According to one aspect of the invention, the indexer generates speech recognition grammars based on contents of a media file header of the media file. According to another aspect of the invention, the indexer generates speech recognition grammars based on categories in a file path for retrieving the media file to a user location. When a speech recognizer receives an input speech from a user while in a selection mode, a media file selector compares the input speech received while in the selection mode to the plurality of speech recognition grammars, thereby selecting the media file.

CLAIMS
What is claimed is:
1. An embedded device for playing media files and generating a play list of media files based on input speech from a user, comprising: an indexer generating a plurality of speech recognition grammars, including at least one of: (a) a first indexer generating a first speech recognition grammar based on contents of a media file header of the media file; and (b) a second indexer generating a second speech recognition grammar based on categories in a file path for retrieving the media file to a user location; a speech recognizer receiving an input speech from a user while in a selection mode; and a media file selector comparing the input speech received while in the selection mode to the plurality of speech recognition grammars, thereby selecting the media file.
2. The device of claim 1, wherein said indexer generating the plurality of speech recognition grammars includes the first indexer generating the first speech recognition grammar based on contents of the media file header of the media file.
3. The device of claim 2, wherein said indexer generating the plurality of speech recognition grammars includes the second indexer generating the second speech recognition grammar based on categories in the file path for retrieving the media file from the user location.
4. The device of claim 1, wherein said indexer generating the plurality of speech recognition grammars includes the second indexer generating the second speech recognition grammar based on categories in the file path for retrieving the media file from the user location.
5. The device of claim 1, wherein the media file contains speech, and said indexer generating the plurality of speech recognition grammars includes a third indexer recognizing speech within the media file and generating a third speech recognition grammar based on the recognized speech within the media file.
6. The device of claim 1, wherein said speech recognizer receives an input speech from the user while in a non-selection mode, wherein said indexer generates a classification based on the input speech received while in the non-selection mode, and wherein said indexer includes a fourth indexer generating a fourth speech recognition grammar based on the generated classification.
7. The device of claim 1, wherein said indexer includes a fifth indexer generating a fifth speech recognition grammar based on supplemental descriptive text associated with the media file and provided in a data store on a computer network.
8. The device of claim 1 comprising: a data link receiving the media file over a computer network; and a data store storing the received media file in association with the plurality of speech recognition grammars.
9. The device of claim 1 comprising a play list generator operable to add the media file to the play list upon selection of the media file while in an insertion mode.
10. The device of claim 1 comprising a play list generator operable to remove the media file from a play list upon selection of the media file while in a deletion mode.
11. A method of selecting a media file using input speech, comprising: generating a plurality of speech recognition grammars, including at least one of: (a) generating a first speech recognition grammar based on contents of a media file header of the media file; and (b) generating a second speech recognition grammar based on categories in a file path for retrieving the media file to a user location; receiving an input speech from a user while in a selection mode; and comparing the input speech received while in the selection mode to the plurality of speech recognition grammars, thereby selecting the media file.
12. The method of claim 11 wherein said generating a plurality of speech recognition grammars includes generating a first speech recognition grammar based on contents of the media file header of the media file.
13. The method of claim 12, wherein said generating a plurality of speech recognition grammars includes generating a second speech recognition grammar based on categories in a file path for retrieving the media file from a user location.
14. The method of claim 11, wherein said generating a plurality of speech recognition grammars includes generating a second speech recognition grammar based on categories in a file path for retrieving the media file from a user location.
15. The method of claim 11, wherein the media file contains speech, the method comprising recognizing speech within the media file, wherein said generating a plurality of speech recognition grammars includes generating a third speech recognition grammar based on the recognized speech within the media file.
16. The method of claim 11 comprising: receiving an input speech from the user while in a non-selection mode; generating a classification based on the input speech received while in the non-selection mode; and associating the generated classification with the media file, wherein said generating a plurality of speech recognition grammars includes generating a fourth speech recognition grammar based on the classification associated with the media file.
17. The method of claim 11, comprising generating a fifth speech recognition grammar based on supplemental descriptive text associated with the media file and provided in a data store on a computer network.
18. The method of claim 11 comprising: receiving the media file over a computer network; and storing the received media file in a data store in association with the plurality of speech recognition grammars.
19. The method of claim 11 comprising: entering an insertion mode; and adding the media file to a play list upon selection of the media file while in the insertion mode.
20. The method of claim 11 comprising: entering a deletion mode; and removing the media file from a play list upon selection of the media file while in the deletion mode.
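
The flow recited in claims 11, 19 and 20 can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes a "grammar" can be approximated by a set of lowercase words, that the speech recognizer has already turned the input speech into a text transcript, and that selection picks the file whose grammars share the most words with that transcript. All names (MediaFile, grammar_from_header, grammar_from_path, select, update_play_list) and the sample paths and header fields are hypothetical.

```python
import re
from dataclasses import dataclass, field
from pathlib import PurePosixPath


def _words(text):
    """Lower-case a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def grammar_from_header(header):
    """First grammar (assumed form): words found in the header field values."""
    return {w for value in header.values() for w in _words(value)}


def grammar_from_path(path):
    """Second grammar (assumed form): words found in the directory names of the path."""
    return {w for part in PurePosixPath(path).parent.parts for w in _words(part)}


@dataclass
class MediaFile:
    path: str
    header: dict
    grammars: list = field(default_factory=list)


def index(media):
    """Attach both grammars to the media file."""
    media.grammars = [grammar_from_header(media.header), grammar_from_path(media.path)]
    return media


def select(transcript, library):
    """Return the file whose grammars overlap the transcript most, or None if none overlap."""
    spoken = set(_words(transcript))
    best, best_score = None, 0
    for media in library:
        score = sum(len(spoken & grammar) for grammar in media.grammars)
        if score > best_score:
            best, best_score = media, score
    return best


def update_play_list(play_list, media, mode):
    """Add the file in insertion mode, remove it in deletion mode."""
    if mode == "insertion" and media not in play_list:
        play_list.append(media)
    elif mode == "deletion" and media in play_list:
        play_list.remove(media)
    return play_list


# Hypothetical two-file library used only for illustration.
library = [
    index(MediaFile("/music/jazz/miles_davis/so_what.mp3",
                    {"title": "So What", "artist": "Miles Davis"})),
    index(MediaFile("/music/rock/queen/bohemian_rhapsody.mp3",
                    {"title": "Bohemian Rhapsody", "artist": "Queen"})),
]

play_list = []
chosen = select("play some jazz by miles davis", library)
if chosen is not None:
    update_play_list(play_list, chosen, "insertion")
```

A real embedded recognizer would typically match against phonetic or finite-state grammars rather than word sets; the word-overlap score here only stands in for the comparison step.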