User interface/entertainment device that simulates personal interaction and augments an external database with relevant data

Publication date: 18-01-2006

Number:

CN0001237505C

Author: H. J. STRUBB, L. ESHELMAN, S. GUTTA

Assignee: Koninklijke Philips Electronics NV

Application number: 85-53-0180

Filing date: 24-10-2001

[1]

Background of the invention

[2]

1. Field of the invention

[3]

The invention relates to devices that simulate personal interaction with a user through various output modalities such as flashing lights, synthetic speech, computer-generated animation, and sound, in order to create the impression of a human presence with an attendant mood, personality, and ability to converse.

[4]

2. Background

[5]

As technology grows ever more sophisticated, the variety of features and options associated with many appliances can be daunting. Satellite and cable TV illustrate this phenomenon: in some cases the number of program choices is unwieldy. Many other examples exist, including cellular telephones, personal computer applications, and electronic commerce systems. In such environments it is useful for the machine to take some of the routine work out of choosing among an overwhelming number of options. Often, however, the solutions are not much less painful than the problems they are meant to address. For example, user interfaces that filter a large number of choices using a custom template for each user must be trained on the user's preferences. A user can enter his/her preferences by actively classifying his/her likes and dislikes ("customization"). This can also be done passively, for example by having a computer process "observe" the selections made by the user over a period of time ("personalization"). Such systems are discussed in a number of patent applications assigned to Gemstar and Philips Electronics. For example, U.S. Patent No. 5515173, "System and method for automatically recording television programs in television systems with tuners external to video recorders"; U.S. Patent No. 5673089, "Apparatus and method for channel scanning by theme"; U.S. Patent No. 5949471, "Apparatus and method for improved parental control of television use". Another example is given in U.S. Patent No. 5223924.

[6]

The user interfaces that allow preferences to be specified, whether explicitly or passively, are usually very complex, and so lose their appeal and intuitiveness. More and more, systems of this type are evolving toward so-called "smart" systems that try to seem like human helpers rather than control panels. For example, the help dialogs in complex software applications such as Microsoft Office accept natural-language sentences and give text responses quasi-synchronously with an animation. Another example of an interface that accepts natural-language questions is AskJeeves, a search engine for the Internet.

[7]

User interfaces are evolving rapidly from function-oriented systems to object-oriented interfaces. In the former, a series of steps is performed and the last step yields the desired result; the latter allow the user to manipulate the subject matter of concern and provide immediate feedback on the results. The ultimate expression of this idea is the noncommand interface, such as a virtual reality environment that lets the user manipulate controls and experience the effects rather than perform a prescribed series of steps to obtain a result. Agents are another type of noncommand interface. The ultimate aim of some research is to provide highly intelligent agents that know all about the user's needs and lifestyle and can retrieve exactly the information the user requires at any given time. Practical embodiments of such noncommand interfaces, however, are not yet ready to be introduced into real products. One of the problems is that the software technology required is highly complex and demands a great deal of programming and/or training. For now, computers still need to be told what to do, and easy-to-use interfaces that give a real impression of being "smart" remain, for most applications, a distant dream.

[8]

However, computers that look smart without actually being smart have been around for some time. A very different area of computer software development began in the mid-1960s, when the MIT computer scientist Joseph Weizenbaum created a simple program that looked smart but had no practical use. The program simulated the role of a Rogerian psychotherapist by attempting to draw the user out and encouraging him to keep talking. The program was called ELIZA. Many users found ELIZA very convincing. Even Weizenbaum's secretary asked to be left alone with the program for a while. However, ELIZA's conversation is effective only as long as one is willing to accept the very passive style of a Rogerian psychotherapist, in which the "therapist" usually responds by repeating part of what the "patient" has said or by asking the "patient" to explain or elaborate on what he has just said. What makes ELIZA seem convincing is that it does not conduct the conversation in a rote manner. For example, if the user mentions his mother, ELIZA may ask the user to tell "her" more about his family.

[9]

The following is a sample conversation generated by ELIZA:

[10]

User: Men are all alike.

[11]

ELIZA: In what way?

[12]

User: They're always bugging us about something or other.

[13]

ELIZA: Can you think of a specific example?

[14]

User: Well, my boyfriend made me come here.

[15]

ELIZA: Your boyfriend made you come here.

[16]

User: He says I'm depressed much of the time.

[17]

ELIZA: I am sorry to hear you are depressed.

[18]

User: It's true. I am unhappy.

[19]

ELIZA: Do you think coming here will help you not to be unhappy?

[20]

User: I need some help, that much seems certain.

[21]

ELIZA: What would it mean to you if you got some help?

[22]

User: Perhaps I could learn to get along with my mother.

[23]

ELIZA: Tell me more about your family.

[24]

User: My mother takes care of me.

[25]

ELIZA: Who else in your family takes care of you?

[26]

User: My father.

[27]

ELIZA: Your father.

[28]

User: You are like my father in some ways.

[29]

ELIZA: What resemblance do you see?

[30]

So far, practically all of these so-called conversation simulators have used one basic method, which generates shallow responses based on template matching (rules) rather than on any real understanding. For example, the template "I wish I were <x>" (where <x> stands for any string of words) matches the user statement "I wish I were taller." The template is associated with the machine response "Why do you wish you were <x>?" The machine therefore responds with the statement: "Why do you wish you were taller?" What distinguishes one conversation simulator from another is not so much its sophistication and complexity as the size and variety of its database of responses. More sophisticated variants have larger databases of templates and responses, including whimsical responses that make them more interesting than the passive, flat replies of ELIZA.
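The "I wish I were <x>" example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule table `RULES`, the function `respond`, and the encoding of the template as a regular expression are hypothetical and are not taken from any of the programs discussed here.

```python
import re
from typing import Optional

# Hypothetical rule table: (compiled pattern, response with an <x> slot).
# Only the wording of the rule comes from the text; the regex encoding is an assumption.
RULES = [
    (re.compile(r"^i wish i were (?P<x>.+?)[.!?]*$", re.IGNORECASE),
     "Why do you wish you were <x>?"),
]


def respond(user_input: str) -> Optional[str]:
    """Return the response of the first matching rule, or None if nothing matches."""
    for pattern, response in RULES:
        match = pattern.match(user_input.strip())
        if match:
            # Copy the words bound to <x> into the canned response.
            return response.replace("<x>", match.group("x"))
    return None
```

Calling `respond("I wish I were taller.")` fills the slot with "taller"; no understanding of the sentence is involved, which is the point the paragraph makes.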

[31]

Some conversation simulators provide information on a specific subject rather than simulating general conversation. Basically, their libraries of responses anticipate questions about a particular topic and provide "canned" answers. Some conversation simulators have been programmed to appear as if they had a life story to tell. When they cannot find a suitable template match to keep the conversation going, they talk about their own experiences.

[32]

A typical conversation simulator may be described as having two parts: a user interface shell and a database. The user interface is a computer program that remains essentially unchanged regardless of which personality database or information database is used. The database is what gives the conversation simulator its personality, knowledge, and so on. It contains the specific answers and information about questions on a given topic. The database has predefined answers linked to question templates. The realism of the conversation simulator depends on how well the creator of the database has anticipated the questions people are likely to ask and the patterns common to classes of questions that share the same answer. The user interface accepts a question from a person, searches through the templates, and returns the most appropriate answer (or a random one of the most appropriate answers). The technology requires an author to create the database; the user interface has no initial knowledge of natural language, and the system cannot learn on its own. The system is not perfect: when no good match can be found, it produces gibberish or simply gives up. But this is tolerable. In principle, a perfect database would work for every conceivable situation, but even if only 80% of questions are handled adequately, this appears to be enough to keep people interested.

[33]

Another approach to making machines capable of conversation applies more sophisticated "smart" technology, but, as mentioned above, this requires too much complexity and/or training to serve as the basis for a conversation simulator. Attempts such as Mega Hal give the impression of being virtually nonsensical. However, smart technology has its uses. A field of study called "computational linguistics", a branch of artificial intelligence, attempts to develop an algorithmic description, or grammar, of language. The technology can be used to parse sentences and to perform tasks such as identifying the most important words in a sentence or identifying the direct object and the verb. In fact, the research goes much further. Computational linguists are very interested in the technology required to make a computer truly understand what a person is saying: lexical and compositional semantics. This means determining, from speech (written or spoken), the meaning of words in isolation and from their use in narrow and broad contexts. However, programming a computer to resolve an ambiguous meaning of a word falls far short of what is required to make the computer then respond appropriately, at least with a verbal reply.

[34]

The technology that has been applied successfully in conversation simulators typically works by matching the user's input against a database of templates. Such simulators select the predefined template that "best" matches the user's statement and produce one of the responses associated with that template. To describe the mechanism in more detail, a specific example is helpful. For this purpose we use Splotch, a program created by Duane Fields at Carnegie Mellon University, whose source code is publicly available from CMU's web site. "Splotch" is a variant of "Spot", so named because it is somewhat pet-like, in other words, an ill-defined spot.

[35]

Splotch, like other such programs, works by template matching. The user's input is compared with a database of templates. Among the templates that match, the highest-ranking one is selected, and one of the responses associated with that template is then chosen as the output. A template may be a single word, a combination of words, or a phrase.

[36]

A single template can include alternative words or phrases. For example, the "money" template can also match the word "cash". There is one other way to specify such alternatives: a synonym dictionary. Before the user's input is matched against Splotch's templates, the words and phrases in the input are converted into a canonical form. This is done by comparing them with the words and phrases in the synonym dictionary and substituting the preferred form for every variant. Most of these variants are alternative spellings, including misspellings. For example, "kool" is converted to "cool" and "gotta" to "got to". This allows a single template to match many alternative but equivalent words or phrases without having to list the alternatives in every template.
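A minimal sketch of the canonicalization step follows. It assumes the synonym dictionary is a plain mapping from a variant to its preferred form and that only single-word variants are handled; the names `SYNONYMS` and `canonicalize` are hypothetical, and Splotch's actual dictionary format is not described in the text.

```python
import re

# Hypothetical synonym dictionary: variant -> preferred form.
# Both entries are the examples given in the text.
SYNONYMS = {
    "kool": "cool",
    "gotta": "got to",
}


def canonicalize(user_input: str) -> str:
    """Lower-case the input, drop punctuation, and replace known variants."""
    words = re.findall(r"[a-z']+", user_input.lower())
    return " ".join(SYNONYMS.get(word, word) for word in words)
```

Templates are then matched against the canonical string, so a template written with "got to" also covers input containing "gotta". Multi-word variants would need a longest-phrase lookup, which this sketch leaves out.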

[37]

Words or phrases in a template can be marked for necessary exclusion or necessary inclusion. If a word or phrase is marked for exclusion, the template does not match whenever that word or phrase is present. For example, Splotch will not match the "business" template if the phrase "none of your" has been marked as having to be absent by a preceding "!", as in "business:!none of your". Conversely, when a word or phrase is marked for necessary inclusion, the match fails if the specified word or phrase is absent. For example, the template "gender:sex:&what" matches successfully when the user's input includes the word "gender" or "sex", but only if it also includes the word "what".
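The inclusion and exclusion markers can be sketched as follows. The reading of the template syntax is an assumption based only on the two examples in the text: terms are separated by ":", a leading "!" marks a term that must be absent, a leading "&" marks a term that must be present, and unmarked terms are alternatives of which at least one must appear. The function names are hypothetical.

```python
import re


def parse_template(spec):
    """Split a spec such as 'gender:sex:&what' into alternatives, required and excluded terms."""
    alternatives, required, excluded = [], [], []
    for term in spec.split(":"):
        term = term.strip().lower()
        if term.startswith("!"):
            excluded.append(term[1:].strip())
        elif term.startswith("&"):
            required.append(term[1:].strip())
        elif term:
            alternatives.append(term)
    return alternatives, required, excluded


def contains(text, phrase):
    """Whole-word phrase test on normalized, space-separated text."""
    return f" {phrase} " in f" {text} "


def matches(spec, user_input):
    """Apply the exclusion test, then the inclusion test, then the alternatives."""
    text = " ".join(re.findall(r"[a-z']+", user_input.lower()))
    alternatives, required, excluded = parse_template(spec)
    if any(contains(text, phrase) for phrase in excluded):
        return False  # an excluded phrase is present
    if not all(contains(text, phrase) for phrase in required):
        return False  # a required phrase is missing
    return any(contains(text, phrase) for phrase in alternatives)
```

With these definitions, "That is none of your business" fails the "business:!none of your" template, while "What is your gender?" satisfies "gender:sex:&what".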

[38]

Furthermore, a template can have a variable. For example, the template "Do you like <x>" has a variable as its fourth term. The variable can be passed on to the response, for example "No, I don't like <x>." In this case, all the words after "Do you like" are bound to the variable. In the template "Men are <x> than women", the words between "are" and "than" are bound to the variable.
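Binding a variable that sits either at the end or in the middle of a template might be sketched like this. The helper `bind_variable` and its translation of a template into a regular expression are assumptions made for illustration, not Splotch's implementation.

```python
import re
from typing import Optional


def bind_variable(template: str, user_input: str) -> Optional[str]:
    """Return the words bound to <x>, or None when the template does not match."""
    head, _, tail = template.partition("<x>")
    pattern = r"^\s*" + re.escape(head.strip()) + r"\s+(?P<x>.+?)"
    if tail.strip():
        # Fixed words after the variable, as in "Men are <x> than women".
        pattern += r"\s+" + re.escape(tail.strip())
    pattern += r"\s*[.!?]*\s*$"  # tolerate trailing punctuation
    match = re.match(pattern, user_input, re.IGNORECASE)
    return match.group("x") if match else None


def fill(response: str, bound: str) -> str:
    """Pass the bound words on to the response."""
    return response.replace("<x>", bound)
```

For instance, binding "Do you like going to school?" against "Do you like <x>" yields "going to school", which `fill` inserts into "No, I don't like <x>."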

[39]

Each template has a rating assigned by the implementer. After Splotch has tried to match the user's input against all of its templates, it selects the matching template with the highest rating and then replies with one of the responses listed for that template. The next time the same template is selected, a different response is chosen, until all the listed responses have been cycled through.
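The rating and response-cycling behaviour can be sketched as below. The `Template` class, the single-keyword matching, and the sample ratings and responses are all hypothetical; the text does not give Splotch's data structures.

```python
import re
from itertools import cycle


class Template:
    """A keyword template with an implementer-assigned rating and a list of responses."""

    def __init__(self, keyword, rating, responses):
        self.keyword = keyword
        self.rating = rating
        self._responses = cycle(responses)  # start over after the last response

    def matches(self, user_input):
        return self.keyword in re.findall(r"[a-z']+", user_input.lower())

    def next_response(self):
        return next(self._responses)


def reply(user_input, templates):
    """Answer with the next response of the highest-rated matching template."""
    candidates = [t for t in templates if t.matches(user_input)]
    if not candidates:
        return None
    return max(candidates, key=lambda t: t.rating).next_response()


# Hypothetical templates, ratings and responses, invented for illustration.
templates = [
    Template("money", 5, ["Money isn't everything.", "What would you buy?"]),
    Template("you", 1, ["Let's talk about you instead."]),
]
```

Asking "Do you have money?" twice against `templates` would return the two "money" responses in turn, because "money" outranks "you" and each selection advances that template's cycle.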

[40]

Besides the variables passed from the template, responses can have another type of "variable". These are placeholders that point to alternative words or phrases. For example, the response "My favorite color is @color.w" indicates that the color is to be chosen at random from a file, "color.w", containing a list of color words. A response associated with a template is thus in effect many alternative responses. The phrases in the "@" files may themselves contain pointers to other "@" files.
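A sketch of the "@" placeholder expansion follows, with the word-list files replaced by an in-memory dictionary. The dictionary contents, the nested "shade.w" list, the depth limit, and the name `expand` are assumptions introduced for illustration.

```python
import random
import re

# Hypothetical in-memory stand-in for the "@" word-list files.
WORD_FILES = {
    "color.w": ["red", "blue", "@shade.w green"],
    "shade.w": ["dark", "light"],
}

# "@" followed by a file name; a trailing sentence period is not part of the name.
PLACEHOLDER = re.compile(r"@(\w+(?:\.\w+)*)")


def expand(text, rng=random, depth=0):
    """Replace each @file placeholder with a random entry, expanding nested placeholders."""
    if depth > 10:  # guard against word lists that point at each other in a loop
        raise RecursionError("placeholder nesting too deep")

    def substitute(match):
        choice = rng.choice(WORD_FILES[match.group(1)])
        return expand(choice, rng, depth + 1)

    return PLACEHOLDER.sub(substitute, text)
```

Expanding "My favorite color is @color.w" yields one of several sentences, and the entry "@shade.w green" shows how a phrase in one list can point to another list.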

[41]

Prior art conversation simulators tend to be repetitive unless they include a very large number of installed template files, and such a large number of template files is unwieldy. Furthermore, even with a large number of alternative templates, a conversation simulator remains static. For example, real people know that the USSR (Soviet Union) has dissolved and no longer holds the romantic associations it once created in spy movies. A conversation simulator programmed well before 1989 would contain many such templates, and the responses they produce would sound particularly odd if they came from a person.

[42]

Most prior art conversation simulators perform poorly at simulating a personality, if they do so at all. Hutchens' Hex, for example, was successful because it had a sarcastic, insulting personality. Certainly, prior art conversation simulators lack the appearance of a personality with any depth. A conversation simulator cannot simulate the kind of sharing that people do in trusting relationships, because it has no history or experience to share; besides lacking the appearance of a personality, such simulators usually also lack the appearance of an identity.

[43]

Conversation simulators are often designed to encourage users to talk. That was certainly the idea behind ELIZA, the pioneer of this class of program. However, the tricks used to get users to talk quickly become tiresome and predictable. One way of making a conversation simulator interesting is to design it so that it offers factual or entertaining information. Because the conversation simulator cannot understand the semantics of the user's queries, any attempt to respond to factual questions or statements often leads to inappropriate replies. Furthermore, a conversationalist that simply recites facts is soon perceived as a know-it-all and a bore. The most convincing conversation simulators encourage the user to talk and respond on an emotional rather than a factual level, expressing opinions and reacting to (for example, supporting) the views and values of the user. This is not to say that a conversation simulator cannot be content-free and convincing at the same time. Hutchens did a fairly thorough job of providing Hex with the kind of information usually found in so-called small talk.

[44]

Another problem with conversation simulators is that they are easily thrown off the current topic by brief replies from the user. They have no sense of context, and it is difficult to create a simulation of a sense of context. One solution is to provide some persistence mechanism by bringing up an old topic raised by the user, using a template that asks the user for a response on that subject, for example a question about topic <x>. However, some conversation simulators that claim to be context-sensitive will stick to a subject even when the user wants to change the topic.

[45]

Machine learning schemes, in which new conversational content is learned from previous or sample conversations, are unlikely to be successful. Such methods usually produce novel responses, but these responses are normally nonsensical. The problem stems in part from the fact that these techniques attempt to use a large number of inputs to select from among a large number of outputs, with the accompanying need for extensive training and tolerance of unpredictability in the results.

[46]

Even highly convincing conversation simulators are, in the long run, essentially a form of entertainment, a pastime. Once they learn what the simulators do, most people ask why anyone would spend time on a conversation simulator. Many who are interested at first end up bored, so even the entertainment value of conversation simulators is limited. Except where information gathered in a conversation is used to fill in the blanks of response templates or, when computational linguistic methods are used, to form new phrase structures or ideas, all the data a user delivers to a conversation simulator is ultimately lost. All that data therefore merely leads to more chat; no new knowledge accrues and none is put to use. This reinforces the view that conversation simulators are interesting experiments with almost no practical justification.

[47]

Another problem with conversation simulators is that using them is not a very natural act. At present, no conversation simulator behaves in a way that shows much common sense, such as knowing when to invite the user to join in a dialogue, or when to stop, pause, or change the topic. Even for a conversation simulator that had something particularly useful to say, there are no known strategies, proposals, or even recognition of the need for giving a conversation simulator such abilities.

[48]

One field of research whose technology can be applied to computer programs in general is so-called "affective computing". This is the use of computers that respond to human emotions and personality in order to create better user interaction. For example, U.S. Patent No. 5987415 describes a system in which a network model of the user's emotional state and personality is inferred, and the inference is used to select among various alternative paraphrases that an application may generate. The method is inspired by troubleshooting systems, in which a user tries to obtain information about a fault, such as a computer malfunction, from a machine-based system that asks questions to help the user diagnose and solve the problem himself. The method may be summarized as follows. First, the system determines the user's mood on the basis of a network model that links the alternative paraphrases of an expected expression. The mood and personality are correlated with the desired mood and personality of the engine that generates the feedback to the user. Mood descriptors are used to infer the user's mood, and the correlation process produces mood descriptors that are used to select among the alternative paraphrases of the appropriate substantive reply. Thus, if there are two possible paraphrases of a given substantive reply by the computer (for example, "Give it up!" or "Sorry, I cannot help you!"), the application selects the one that best corresponds to the mood and personality that the programmer has determined the computer should project, given the user's mood/personality. In short, a stochastic model is used to determine the mood and personality projected by the user's reply, and a model is then used to link the user's mood and personality to the desired mood and personality to be projected by the computer. Finally, the paraphrase of the reply that best matches the desired mood and personality is selected and used to generate the reply by means of the same stochastic model in reverse.
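The last two stages of that pipeline, mapping the user's mood to a desired computer mood and then choosing a paraphrase, can be illustrated with a deliberately simplified sketch. It replaces the stochastic network model of the cited patent with fixed lookup tables, and the mood labels, table contents, and function name are hypothetical; only the two paraphrases come from the text.

```python
# Hypothetical mapping from the inferred user mood to the mood the computer should project.
DESIRED_MOOD = {
    "frustrated": "sympathetic",
    "relaxed": "blunt",
}

# Two paraphrases of the same substantive reply, each tagged with an assumed mood label.
PARAPHRASES = {
    "blunt": "Give it up!",
    "sympathetic": "Sorry, I cannot help you!",
}


def select_paraphrase(user_mood, default_mood="sympathetic"):
    """Pick the paraphrase whose mood label matches the desired computer mood."""
    desired = DESIRED_MOOD.get(user_mood, default_mood)
    return PARAPHRASES[desired]
```

The substantive content of the reply is the same in both cases; only its phrasing changes with the inferred mood, which is the separation of mood from content that the text goes on to discuss.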

[49]

The user interface described above separates mood and personality from content. Also, stochastic models are notoriously difficult to train. In the past, conversation simulators have enjoyed great power and success using rule-based systems.

[50]

Another technical approach to conveying the user's attitude to a computer is a user interface that can be set manually. The user may explicitly indicate his/her attitude, for example by using the mouse on an image of a face to change a frowning face into a smiling one. This method of creating a user interface is described in U.S. Patent No. 5977968. However, the range of emotions that can be conveyed with such an interface is limited, and conveying feelings in this way is difficult and unnatural.

[51]

Another field of application in which the emotional state of the user is determined by a computer is medical diagnosis. For example, U.S. Patent No. 5617855 describes a system that classifies characteristics of the face and voice together with EEG and other diagnostic data in order to assist diagnosis. The apparatus is directed to the fields of psychiatry and neurology.

[52]

In still other fields of application, machines automatically detect the presence of a user or specific characteristics of the user for purposes of authentication and authorization or for convenience. To this end, some prior art systems employ biometric sensing, proximity detectors, radio frequency identification tags, or other devices.

[53]

Another system that takes the emotional state of the user as input is described in JP10214024, in which a device generates scenes based on a video input. Information on the emotional state of the user is input from the user through a recognition system and is used to control the development of the story.

[54]

Summary of the invention

[55]

An interaction simulator is similar to a conversation simulator but has a wider range of possible inputs and outputs. People and machines can express themselves in ways other than speech. For example, people can use gestures, remote controls, eye movements, and sounds (hand claps). Machines can flash lights, create computer-generated animation, animate mechanical devices, and so on. Interaction simulator is a more general term that encompasses the entire range of inputs and outputs that can be used to create expressive interaction between a user and a machine. In short, the invention is an interaction simulator that is more convenient to use than prior art conversation simulators, improves the quality of the interaction between the user and the simulator, and increases the utility derived from interaction with the simulator. The invention also brings these benefits to the field of user interfaces for data storage and retrieval. To this end, the invention is built around an interaction simulator that responds to the uniqueness of each user's personality by automatically adapting itself to the particular user. Furthermore, a system and method employed by the interaction simulator provide a mechanism by which interaction initiated by the simulator is responsive to the user's situation. For example, a conversation simulator embodiment may stop talking to avoid interrupting the user's monologue, and may end the conversation once the user has fallen asleep. Moreover, the utility of the interaction simulator is extended by passively funneling useful information gleaned from conversations with the user into systems that can make use of that information. For example, an electronic program guide preference database can be augmented by extracting likes and dislikes from the dialogue and applying them to the database. Such data may be elicited from the user in response to the needs of the database. Furthermore, the interaction simulator model can be extended to a range of input and output modalities. For example, a television with audio input/output capability can generate artificial speech for the conversation, accompanied by synchronized lights or color changes on the television cabinet, or by synchronized animation on the screen, to give the impression of a television that converses. The user's expression can also be input to the interaction simulator by means of gestures, sound, body position, and manual controls. Furthermore, the substantive content of the interaction simulator's output is enhanced by providing the ability to obtain information from regularly updated data sources or from live data feeds. The extraction of such information may be guided by data the simulator has collected from conversations and/or other interaction.

[56]

Brief description of the drawings

[57]

Figure 1 is an illustration of a hardware environment in which an embodiment of the invention may be practiced.

[58]

Figure 2 is an overview functional diagram illustrating the data flow between processes in a software system that may be used to practice the invention according to an embodiment thereof.

[59]

Figures 3-5 together form a more detailed representation of the flow chart of Figure 2.

[60]

Figure 6 illustrates an example situation in which the user falls asleep and the system of Figures 3-5 responds to that situation.

[61]

Figure 7 illustrates an example situation in which the user is interrupted by another person and the system of Figures 3-5 responds to that situation.

[62]

Figure 8 illustrates an example situation in which the user is laughing and the system of Figures 3-5 responds to that situation.

[63]

Figure 9 illustrates an example situation in which the user is discussing a topic of interest and the system of Figures 3-5 responds to that situation.

[64]

Figure 10 illustrates an example situation in which the user is feeling despondent and the system of Figures 3-5 responds to that situation.

[65]

Figure 11 illustrates an example situation in which the user expresses an interest and the system of Figures 3-5 responds to that situation by augmenting the data in an external database.

[66]

Detailed description of the preferred embodiments

[67]

The present invention involves a combination of elements that represents a step toward making conversation simulator technology more useful. The prior art has shown that conversation simulators can be quite convincing. The inventive features presented here build on this strength by augmenting it with other proven technologies, for example machine recognition systems capable of classifying features of their environment. The result is an interaction simulator that appears to have more common sense, acts in a more human-like way, and is more convenient to use. The inventive features also build on the persuasiveness of conversation simulator technology by exploiting the exchange of information in useful ways, for example by augmenting a preference database or by obtaining further information from a data resource such as the Internet in order to educate or entertain. These main drivers of the invention raise other issues that must also be addressed. For example, if the conversation simulator is to become a useful staple of the electronic household or office of the future, it must fit in without a struggle. These issues are addressed first.

[68]

To be a convincing companion, a conversation simulator should preferably interact through speech and be able to respond within the social context provided by the user. Because companionship is a social relationship, a conversation simulator must be able to exhibit socially correct behavior. According to one embodiment, this can be achieved by supplying the interaction simulator with information about the particular user and with rules that constrain the simulator's behavior so that it gives an appearance of being well mannered, and by giving the conversation simulator a consistent, likeable personality. To enable the conversation simulator to respond appropriately to a particular user, it can be augmented by a system that allows it to recognize individuals, so that the conversation simulator adapts to different users and to the same user over time.

[69]

Preferably, a conversation simulator should use audible speech as its means of input and output. Splotch, like most other conversation simulators, interacts with the user through typed text. Producing speech output from text is straightforward, but the problem is that the voices of current-generation devices sound flat. Several ways of alleviating this problem may be provided. First, instead of storing the standard sentences and phrases (response templates) as simple text and outputting them through a text-to-speech converter, the inflection for these response templates may be stored together with their text. The inflection scheme may also provide a representation for variables in a phrase or sentence. Take, for example, the standard sentence EX1 from a template file:

[70]

EX1: Tell me, more' about, why" you, hate <x>.

[71]

An apostrophe indicates that the preceding word is spoken with emphasis. A quotation mark indicates still stronger emphasis, and a comma indicates reduced emphasis. The absence of a mark indicates moderate emphasis. The variable indicated by <x> is taken from a sentence spoken by the user. It carries no accent mark because it is repeated with moderate emphasis. The emphasis for the variable phrase may be derived from a formula associated with the standard template response. Since the template is a question, and would usually be expected to elicit information of an intimate and sensitive nature, the emphasis on the words in the variable may fall off at the end. Therefore, if the phrase is:

[72]

Going to school,

[73]

then the last syllable may be marked for reduced emphasis. Contrast this with how the same variable phrase would be used in template sentence EX2.

[74]

EX2: What? You don't like <go"ing to school">

[75]

Here the emphasis is sing-song and strongly marked. The system designer may choose the particulars of the rules according to his/her needs and priorities, but the rules should preferably follow the natural speech patterns of the relevant language. As the above example shows, it is possible to define rules even for variable phrases that cannot be known in advance. The variable phrase itself is unpredictable. However, the template sentence in which it is used provides information from which a better inflection rule can be formed than a single standard rule: hence the falling-emphasis rule for EX1 and the sing-song rule for EX2. Note that although only one dimension of inflection is discussed in the above examples, inflection may involve pitch, loudness, timing, and other dimensions. These may be provided for by an appropriate scheme that handles the dimensions independently, so that, for example, each syllable has a corresponding pitch-loudness pair.
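The marking scheme of EX1 can be illustrated with a minimal Python sketch. It assumes that each mark is attached directly to the word it follows, and it applies the falling-emphasis rule at word level rather than syllable level for simplicity. The function names and the numeric emphasis levels are hypothetical and are not part of the described system.

```python
# Emphasis levels from weakest to strongest (illustrative values only).
REDUCED, MODERATE, EMPHASIS, STRONG = 0, 1, 2, 3
MARKS = {",": REDUCED, "'": EMPHASIS, '"': STRONG}

def parse_template(template):
    """Split an inflection-marked template into (token, emphasis) pairs."""
    pairs = []
    for raw in template.split():
        token = raw.rstrip(".?!")              # drop sentence-final punctuation
        level = MARKS.get(token[-1:], MODERATE)  # no mark means moderate emphasis
        if token[-1:] in MARKS:
            token = token[:-1]                 # remove the mark itself
        pairs.append((token, level))
    return pairs

def fill_falling(template, phrase):
    """Fill <x> with a non-empty phrase whose last word gets reduced emphasis."""
    out = []
    for token, level in parse_template(template):
        if token == "<x>":
            words = phrase.split()
            out += [(w, MODERATE) for w in words[:-1]]
            out.append((words[-1], REDUCED))
        else:
            out.append((token, level))
    return out
```

A sing-song rule for EX2 could be added as a second fill function that is selected per template, which keeps the inflection rule tied to the template rather than to the unpredictable variable phrase.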

[76]

There are several other ways to solve the problem of inflection for speech that is not obtained from a fixed template. One way is to play an audio recording of the variable phrase back to the user. For example, if the user says "I think my English teacher is completely out of her mind", the conversation simulator could play "Why do you think your" followed by a recording of the user saying "English teacher is completely out of her mind". The quality of the voice can be digitally modified to mimic that of the conversation simulator interface. The drawback of this approach is that, in this example, it is likely to sound sardonic, because the user's sentence and the conversation simulator's sentence call for different inflection patterns. It is possible to change the inflection pattern by modifying the sound data. Another alternative is for the conversation simulator interface to note the inflection and attempt to reproduce it with its own speech generator (either identically or modified, for example to form a question rather than the user's statement).

[77]

Other issues arise in connection with speech understanding. One is the problem of determining when the user has finished speaking, so that the simulator can respond at the expected time. Prior art text-based conversation simulator systems determine when a response is expected from a simple explicit signal, for example the entry of one or two carriage returns. No such concrete indicator is normally available in spoken conversation. Yet a conversation simulator that is a suitable companion should know when the user has finished talking and should avoid barging in on the user. On the other hand, if the user interrupts while the conversation simulator is talking, the conversation simulator must be able to recognize this, stop talking, and respond appropriately. Several approaches may be used, either individually or in combination.

[78]

1) A pause that exceeds a threshold interval of time may be taken as the signal that speech has ended.

[79]

a) The pause threshold may be adjusted according to the user's pace of speech. The conversation simulator would then respond more quickly to fast talkers than to slow talkers.

[80]

b) The pause threshold may be adjusted according to a comparison of the user's speech with an inflection template. Since sentences often tail off in emphasis at the end, this may be used to shorten the delay.

[81]

2) The conversation simulator may simply make its best guess according to the above scheme (or any other) and, if it is interrupted by the user, back off and continue "listening". Preferably, if the conversation simulator has already begun to respond, it should stop speaking as quickly as possible.

[82]

3) The conversation simulator may generate a thinking sound, such as the word "Well..." or the sound "Ummmm..." or "Hmmm", to indicate that it is about to speak. If the user is still speaking, he/she would interrupt. This allows the user to stop the conversation simulator before it responds substantively. An interruption that follows such a non-substantive sound is a more innocuous pattern of conversation than an interruption that cuts off a substantive response the conversation simulator has already begun.

[83]

4) A conversation simulator program using any of the interruption-based schemes could learn from the interruption feedback and adjust its pause threshold. It could look for cues from the particular user indicating that the end of his/her response has been reached, by feeding to an internal machine-learning process the inflection and timing patterns, visual cues such as gestures or facial expressions, or other inputs that might give the conversation simulator a more reliable indication of when it should speak. These cues may also be programmed explicitly. The idea is to use interruption by the user as a feedback mechanism for a machine-learning process.

[84]

a) Various inputs may be used for such a machine-learning process: loudness patterns, pitch patterns, and other inflection patterns, as well as specific words such as "well...?" that particular users may use frequently when they grow impatient with the conversation simulator's delay.

[85]

b) Because one user's patterns are not necessarily the same as another's, the machine-learning scheme should be developed and stored separately for each user.

[86]

c) Gaze information plays an important role in identifying a person's focus of attention. This information may be used to provide communication cues in the present system. For example, it can be used to identify where a person is looking and what he/she is paying attention to. The direction of a user's gaze is determined by two factors: the orientation of the head and the orientation of the eyes. The orientation of the head determines the overall direction of the gaze, while the orientation of the eyes determines the exact gaze direction and is limited by the head orientation. Other cues may be derived from the speaker leaning forward (body posture), from facial expressions, and from the speaker's emotional state. The speaker's emotional state may be estimated from acoustic and prosodic features such as speaking rate, intonation, and intensity. Knowing the speaker's emotional state is useful for indicating when the speaker is about to finish talking.

[87]

5) The cue for when the conversation simulator should speak may come from the substantive content of the user's speech. For example, questions can be identified from the substantive text of the user's speech as well as from the inflection pattern, and the conversation simulator may rely on this as an indication that it is expected to respond. Certain statements or phrases may be classed by the conversation simulator's programming as indicating that the user has finished for the moment and wants a response. Examples are "What do you think?", "Hmmm...!", and "OK?".

[88]

a) The cue may be more subtle than a simple classification of phrases. Some sentences recognized by the conversation simulator may simply be more final than others. For example, "I don't think so." may be less final than "Yes, that is what I think.", because the former may be a prelude to an explanation, while the latter is a confirmation of something the conversation simulator said.
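The pause-threshold approach, with adjustment for speaking pace and with interruption feedback, can be sketched as follows. This is a minimal illustration under stated assumptions: the class name, the default values, and the simple additive update stand in for the machine-learning process described above and are not taken from the described system.

```python
class PauseThreshold:
    """Pause threshold (seconds) used to decide that a user has finished."""

    def __init__(self, base=1.0, floor=0.3, ceiling=3.0):
        self.base = base        # threshold at the reference speaking rate
        self.floor = floor      # never respond faster than this
        self.ceiling = ceiling  # never wait longer than this

    def threshold(self, words_per_second, reference_rate=2.5):
        """Scale the threshold inversely with the user's speaking rate."""
        rate = max(words_per_second, 1e-6)  # guard against division by zero
        scaled = self.base * reference_rate / rate
        return min(self.ceiling, max(self.floor, scaled))

    def user_finished(self, pause, words_per_second):
        """True when the observed pause is long enough for this speaker."""
        return pause >= self.threshold(words_per_second)

    def on_interrupted(self, step=0.2):
        """The user barged in: the simulator spoke too early, so wait longer."""
        self.base = min(self.ceiling, self.base + step)

    def on_clean_turn(self, step=0.05):
        """No interruption occurred: cautiously shorten the wait."""
        self.base = max(self.floor, self.base - step)
```

Because one user's patterns need not match another's, one such object could be kept per identified user, for example in a dictionary keyed by a user identifier.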

[89]

In most contexts it is preferable for the conversation simulator to be interesting and non-repetitive. This can be provided in several ways. One is for the conversation simulator to create the appearance of a personality. This, in turn, may be provided by programming it to respond on an emotional level as well as a factual level. However, to be a suitable companion, these attributes need to be qualified. The interests, personality, and emotions of the conversation simulator must be supportive of the user. In this respect a companionable conversation simulator must be supportive, as ELIZA is, but it must also be interesting. Most conversation simulators tend to be either interesting (at least for a short period of time) or supportive, but not both. Part of the problem is that interesting responses often do not encourage the user to continue talking. One way to combine interest and support is to provide double responses. The simulator may make a relevant, perhaps witty, comment about what the user has just said, and then offer support and encourage the user to continue or elaborate. This requires a companionable conversation simulator to have a large number of templates that recognize and respond to words expressing emotions, feelings, moods, and attitudes. For example, if the user says "I hate meetings", the conversation simulator needs a template that matches "I hate <x>" with a response such as "I don't like meetings very much either, they are so boring. What do you dislike most about meetings?".
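A double response of this kind can be sketched with a single template. The dictionary layout, field names, and the use of a regular expression for matching are assumptions made for illustration. They are not the template format of Splotch or of the described system.

```python
import re

# Hypothetical template: one pattern with a variable and a two-part reply.
TEMPLATE = {
    "pattern": "I hate <x>",
    "comment": "I don't like {x} very much either, they are so boring.",
    "elicitor": "What do you dislike most about {x}?",
}

def match(pattern, utterance):
    """Return the text bound to <x>, or None if the utterance does not match."""
    regex = re.escape(pattern).replace(re.escape("<x>"), r"(?P<x>.+?)")
    found = re.fullmatch(regex + r"[.!?]*", utterance.strip(), re.IGNORECASE)
    return found.group("x") if found else None

def double_response(template, utterance):
    """Build a comment followed by an encouragement to continue."""
    x = match(template["pattern"], utterance)
    if x is None:
        return None
    return template["comment"].format(x=x) + " " + template["elicitor"].format(x=x)
```

Calling `double_response(TEMPLATE, "I hate meetings")` reproduces the reply quoted in the text. An utterance that does not fit the pattern yields `None`, so another template can be tried.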

[90]

Ideally, the conversation simulator's intonation and speech should be consistent with the emotional climate of the conversation as well as with the content of its replies. The depth and adaptability of the simulator may go beyond responding to certain phrases in the user's last utterance. The conversation simulator may be given the ability to recognize the emotional state of the user and to respond to it, changing its responses as the user's emotional state changes. For example, it may recognize when the user is sad or happy, and when the user's emotional state changes from sad to happy. This can be provided by classifying various features of the audio, the speech, the image of the user, and other inputs such as the pressure he/she applies to the keys of a remote control.

[91]

The audio signal contains information about the user that is not contained in the speech. For example, the loudness and pitch of the user's voice provide useful clues about the user's emotional state. Likewise, background noises that indicate activity, particularly repetitive activity such as nervous twitching or crying, may be discernible from the audio signal. An audio signal classifier may contain classification processes corresponding to the respective audio signals, so that it can identify certain sound characteristics even when they are superposed. Similarly, a video image of the scene in which the user is located can be processed, and objects or events discernible in the video image may be classified to supply information about what the user is doing. For example, continuous repetitive movement about a room could indicate worry. Finally, of course, the content of the speech can be analyzed for clues to the user's emotional state. A text-based feature of a mood classifier may be programmed to respond to repeated use of words of a negative nature by generating a signal indicating a negative or judgmental state. A dictionary may be provided with a mood vector for each entry. The mood vector may be defined as a set of weights, one for each mood class, where each weight indicates the probability that the mood is indicated by the use of the corresponding word or phrase.
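A dictionary with one mood vector per entry can be sketched as follows. The mood class names, the dictionary entries, and their weights are invented for illustration, and averaging the vectors of the recognized words is only one possible way of combining them.

```python
MOODS = ("melancholy", "giddy", "serious", "angry", "happy")

# Hypothetical entries: each word maps to one weight per mood class.
MOOD_DICTIONARY = {
    "hate":  (0.1, 0.0, 0.1, 0.7, 0.0),
    "alone": (0.7, 0.0, 0.2, 0.1, 0.0),
    "great": (0.0, 0.2, 0.1, 0.0, 0.7),
}

def classify_text(utterance):
    """Average the mood vectors of known words; return one weight per mood."""
    words = [w.strip(".,!?").lower() for w in utterance.split()]
    vectors = [MOOD_DICTIONARY[w] for w in words if w in MOOD_DICTIONARY]
    if not vectors:
        return dict.fromkeys(MOODS, 0.0)  # no evidence in the text
    n = len(vectors)
    return {m: sum(v[i] for v in vectors) / n for i, m in enumerate(MOODS)}
```

The result is itself a vector over the mood classes, so it can be combined with the outputs of audio and video classifiers that use the same classes.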

[92]

A weight may be attached to the output to indicate the confidence level of the mood classification. Each alternative output may thus be associated with a corresponding confidence level. The output signal of the mood classifier may take the form of a vector with a confidence level for each alternative mood class. The mood class may be given a damping characteristic so that it does not change too abruptly from one dialogue exchange to the next. For example, if a user has exhibited a state of melancholy for half an hour but laughs momentarily, it may not be desirable for the mood signal to change abruptly.
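One simple way to obtain such a damping characteristic is exponential smoothing of the mood vector. This is an assumed technique chosen for illustration. The text does not specify how the damping is realized, and the function name and the value of `alpha` are hypothetical.

```python
def damp(previous, observed, alpha=0.1):
    """Blend a new observation into the running mood estimate.

    previous and observed map mood class names to confidence levels.
    A small alpha makes the estimate change slowly.
    """
    return {m: (1 - alpha) * previous[m] + alpha * observed.get(m, 0.0)
            for m in previous}
```

With `alpha=0.1`, a single laugh after a long melancholy period moves the estimate only slightly, whereas sustained happiness over many exchanges eventually dominates.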

[93]

Each response template used by the conversation simulator may be provided with a vector indicating the appropriateness of the template to the various moods. The net score of each alternative template competing to generate the next response may be weighted by the dot product of the mood class vector and the template vector. Any other scheme in which the mood indication is used to alter the choice of output template may also be used. The final decision about which output template to use in generating the conversation simulator's response may be altered by the alternative mood signals. Even if no single mood wins between two competing mood classes, the choice of template may still be improved. For example, a template that suits either of two alternative moods, each with a low but still substantial confidence level, may be a good choice even though the mood is defined with a high degree of ambiguity (i.e., the two mood classes are equally probable). The method of U.S. Patent No. 5987415 may be used to classify mood/personality.
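The dot-product weighting can be sketched in a few lines. The tuple layout of a template and the example vectors in the usage note are assumptions made for illustration.

```python
def mood_score(mood_vector, template_vector):
    """Dot product of the mood class vector and the template's mood vector."""
    return sum(m * t for m, t in zip(mood_vector, template_vector))

def choose_template(templates, mood_vector):
    """Pick the template with the highest mood-weighted net score.

    templates is a list of (text, base_score, template_mood_vector) tuples,
    where base_score is the score before the mood is taken into account.
    """
    return max(templates,
               key=lambda t: t[1] * mood_score(mood_vector, t[2]))[0]
```

For mood classes ordered as (sad, angry, happy) and an ambiguous mood vector such as (0.4, 0.4, 0.2), a template with vector (0.8, 0.8, 0.0) outscores one with (1.0, 0.0, 0.0) at equal base scores, which corresponds to the ambiguous-mood case described in the text.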

[94]

The table below gives a very cursory list of mood classes and some examples of indicators that may be developed using current technology. For example, there are video-tracking systems capable of identifying and tracking the position of the user's head in a scene. There are also systems capable of video-based face recognition for biometric authentication, which may be adapted to recognize mood classes as well. Note that the indicators that may be used include indicators that are sufficient, but not necessary, for a mood. For example, perhaps a person only rarely throws both hands in the air, but when the gesture occurs it is very likely to be associated with frustration or happiness.

Mood class	Indicators
Somber/melancholy	Video: head is relatively stationary, looking downward or moving periodically. Audio: voice is soft, pitch is high, indicating stress. Speech: words indicate mood.
Giddy	Video: repetitive or abrupt movement, shaking shoulders. Audio: rapid speech, laughter. Speech: words indicate mood.
Focused and serious	Video: still, looking directly at the conversation simulator interface if a visual UI (such as an animation) exists. Audio: normal, regularly paced speech, absence of laughter. Speech: words indicate mood.
Frustrated/angry	Video: head in hands, gestures of anger or frustration. Audio: speech is emphatic and explosive, or unusually monotone. Speech: words indicate mood.
Happy/content	Video: gestures indicating happiness. Audio: speech is sing-song and the word count is high.

[95]

Preferably, the conversation simulator should have some knowledge of the user's personality and adjust its responses accordingly. For example, the conversation simulator may be programmed to be more assertive with someone who likes interacting with a "take-charge" personality, and more tentative with someone who does not. A personality classifier may build a persistent model of a given individual, first by confirming the identity of the user in each session, and then by building on each exchange, using clues in the user's statements and in his/her reactions to the conversation simulator.

[96]

Many personality typologies have been developed. A number of them are associated with characteristic tests for determining the type to which an individual belongs. For example, Myers-Briggs is a four-dimensional model with sixteen independent personality classes. The "Big Five", or Five-Factor Model of personality, is another well-known model, with a five-dimensional basis. Although the conversation simulator could give the user a test (which might provide an interesting conversation that the user enjoys), there may also be indicators embedded in ordinary discourse that the conversation simulator can use to classify the user. As mentioned, the classification may be persistent: the conversation simulator can maintain the classification across multiple sessions and modify its conclusion only by accumulating information over time. Similar methods could be used to determine the user's interests as well as personality. Key words from the user's responses could be classified using standard learning techniques to support the classification of interests and personality. In forming searches for responses, greater emphasis may be given to some key words than to others. For example, certain simulator responses may be marked to indicate that the user's replies to them are of special relevance for determining the user's personality and interests. The key words obtained from these replies may be weighted accordingly. The weight of these key words may be increased according to how often they turn up in the user's replies (taking into account, as noted, the simulator responses that elicited them).

[97]

In addition to personality, the conversation simulator may employ objective indicators to improve its output choices. For example, the user's gender, age, height, ethnicity, socioeconomic class, intelligence, and so on may be defined and used in the selection of templates. The conversation simulator may also be programmed with other factual information about the user. For example, it may know the user's name and what the user has been talking about, so that it can stay on topic. It may also store the topics of previous conversations. Furthermore, it may be programmed to store the topics of interest to the user, and be given the ability to suggest these topics when the conversation lulls. These, too, are persistent variables, and they may be used in combination with a means of confirming identity, such as asking for the user's name or using a biometric scheme such as a voice print.

[98]

For users to be willing to talk to a conversation simulator about their emotions and feelings, the conversation simulator must build trust. One way to do this is to program the conversation simulator to appear self-disclosing, that is, to reveal things about itself, especially "experiences" that "affected" it. This can be done by giving the conversation simulator a backstory that it can tell about itself. A conversation simulator with a history will be more convincing and will be seen as having a stronger, more realistic, and more compassionate "personality". The basic idea of a backstory has been used by conversation simulators in the Loebner competition. One programmer in that contest, Whalen, created a story for his conversation simulator to reveal to the judges. The conversation simulator was programmed to keep coming back to the story, trying to catch the interest of the judges so that they would ask questions. The story was about the conversation simulator itself. This prior art strategy was a trick intended to induce the judges to ask questions. In the conversation simulator of the present invention, the backstory is a device for creating closeness and trust. The backstory may therefore be designed to engender compassion or understanding, or to allow the user to "identify with" the conversation simulator. One scheme is to define a number of backstories and to rank their usefulness for each of the different personality classes. A backstory may then be drawn from the backstory library based not only on the immediate trend of the conversation but also on the personality of the user.

[99]

Once a conversation simulator is able to respond to the user's speech, social context becomes more important. Since the user is no longer required to be at a keyboard, the conversation simulator should be programmed to respond to the user, or to initiate a dialogue, at appropriate times. To avoid interrupting the user, the conversation simulator may be programmed to respond to the event of the user breaking in, or to the user's continuous speech, and to speak only when appropriate. If the user is a child, the conversation simulator may be programmed to teach the user better manners when the user does break in. Preferably, the conversation simulator should respond to the presence or absence of the user, such as when the user enters or leaves the room. In this way the conversation simulator can greet the user when the user enters, and avoid generating speech when the user leaves the room or moves too far away to talk to without speaking loudly. The conversation simulator should also respond to whether the user is occupied or available.

[100]

To give the conversation simulator the appearance of conforming to social protocol, the various input modalities may be used together. Once again, various classes of user behavior may be identified using video, sound, and speech data. One example is the user changing his/her body position to one consistent with sleeping, such as a recumbent position, and then becoming still and possibly snoring. In such a situation, the conversation simulator may be programmed to stop speaking. Another example is the user leaving the room. These are two simple examples that can be recognized by suitable image and audio processing algorithms.

[101]

To be more realistic, the conversation simulator may be given the appearance of having knowledge of the everyday world. For example, it may be provided with variable data such as the current time, the weather, and news headlines. These data may be used together with output templates to form relevant sentences. If the TV is on, the simulator may be given the ability to respond to what is happening in the TV signal. For example, it may laugh along with a laugh track, or display the appearance of a more serious mood in response to a melancholy tone in the background music. This may be provided by giving it the ability to recognize laugh-track sounds and, for example, discordant music.

[102]

The conversation simulator may be provided with an interface to data that can be used for new templates. The data source may be provided in various ways. One way is a live feed from a formatted source. A connection through a network, a switched line, a radio-based link, or another communication resource may be provided to link the conversation simulator to a source of new templates. The templates may be created from information such as current news, stock ticker data, weather, and journal articles. They may be created manually or generated automatically from variable templates. New templates may be stored on a server and periodically delivered to, or accessed by, the conversation simulator process. The templates may be stored locally or on a server. The templates may be organized like a library, so that information about the personality profile of the user can be used to guide access to the templates, and the most appropriate new templates are accessed by a given conversation simulator client. The templates need not be stored in their entirety. It is possible to define variable templates whose blanks are filled in with data from the library. For example, a variable template may consist of the sentence "Have you heard of <x>?". The variable data may be stored in a record together with a token indicating which templates it may be used with. The conversation simulator process obtains this information and uses it to create an utterance. Other variations on this basic idea are possible and would be apparent to those of ordinary skill.
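The use of variable templates with records that carry variable data and a template token can be sketched as follows. The record layout, the token names, and the sample data are hypothetical. The text does not define a record format, and a real feed could be structured quite differently.

```python
# Variable templates known to the conversation simulator, keyed by token.
VARIABLE_TEMPLATES = {
    "heard_of": "Have you heard of {x}?",
}

# Made-up records as they might arrive from a server: variable data plus
# a token naming the template the data may be used with.
RECORDS = [
    {"token": "heard_of", "data": "the new concert hall downtown"},
    {"token": "unknown_template", "data": "ignored"},
]

def build_utterances(records, templates):
    """Fill each record's data into the template named by its token."""
    return [templates[r["token"]].format(x=r["data"])
            for r in records if r["token"] in templates]
```

Records whose token does not name a known template are skipped, so a client can receive data intended for templates it does not have without failing.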

[103]

Another way to update templates is through a feed from an unformatted source. The data used for new templates may be created not from a library whose structure and organization are intended for this purpose, but from any data source, for example an agent that searches the Internet for data relating to a particular topic. Suppose that, during a conversation, the user indicates that he particularly likes a certain composer. The conversation simulator process may be programmed to generate an agent that searches the Internet for information relating to that composer. A data parser and filter may identify sentences in the unprocessed links that relate to the composer and use them to fill in template sentences (for example, "Did you know <x>?") for use in the dialogue. The data parser may employ template-matching algorithms similar to those used in current-generation conversation simulators. Alternatively, it may employ methods from the field of computational linguistics to extract information that is relevant to the particular topic and can be used to generate a particular output pattern. Note that this process is not limited to speech. The conversation simulator may obtain graphics, music, and other media and employ them in its interactions. An example of such a use is a digital clip of the composer's music "played" by an animated character representing the conversation simulator. These non-speech multimedia variations will become clearer from the description of the conversation simulator system that follows.

[104]

What makes a conversation simulator convincing is, to a large extent, the quality of its responses, and that quality depends on the size of the database of templates (including responses). Currently, creating new templates is a rather cumbersome process. Although there are ways of specifying variations, they are quite limited. As a result, most variations on a possible expression have to be expressed as alternative templates. For example, the template "I like <x>", where "<x>" is a variable, matches "I like horses" but does not match "I really like horses". A separate template has to be created for "I really like <x>". Of course, this is very tedious. But the problem is not simply one of inefficiency. Often the template syntax is not expressive enough. For example, templates can match only one variable. A lexicon that provides such flexibility is desirable.

[105]

The template syntax may provide the ability to handle alternative necessary conditions. For example, in Splotch one can currently specify a necessary condition by prefixing it with the symbol "&". Splotch does not provide a way to specify alternatives to any condition, other than creating a separate template for each alternative, but the template syntax may be enhanced so that alternative necessary conditions can be specified. A label may be used to identify disjunctive terms, and these may be demarcated into groups with parentheses, connectors, and the like, to create complex logical conditions such as those that can be defined with a good search engine. A good example of such a scheme is the one used for searching the Lexis database. Exceptions may be provided to eliminate false matches to conditions that specify matches on non-adjacent words. This allows non-relevant words to be ignored, as in the "I really like" case above. Such a syntax ignores word order. For example, by specifying that "I" and "like" are necessary for a match, the template would match "I like", "I really like", and "I very much like", but it would also match "Like, I don't think you are making sense". The template syntax may be enhanced so that non-relevant words can be ignored without ignoring word order. For example, specific exceptions could be added to exclude "Like I" (a word-order-sensitive condition) from producing hits on the "I" & "like" template. Another possibility would be simply to have a rule specifying that the necessary conditions must be matched in order. Yet another possibility would be to have a routine, very much like the "expand" routine that substitutes canonical synonyms for variations in the user's input, that removes words that are not important.
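The idea of necessary conditions with alternatives, and of optionally requiring the conditions to appear in order, can be illustrated as follows. The representation of a template as a list of word sets is an assumption made for this sketch and is not Splotch's syntax.

```python
def matches(conditions, utterance, ordered=False):
    """Check necessary conditions against an utterance.

    conditions is a list of sets. Each set lists alternative words, one of
    which must appear. Words not named in any condition are ignored.
    With ordered=True the conditions must be satisfied in the order given.
    """
    words = [w.strip(".,!?").lower() for w in utterance.split()]
    position = 0
    for alternatives in conditions:
        search_from = position if ordered else 0
        hits = [i for i, w in enumerate(words)
                if i >= search_from and w in alternatives]
        if not hits:
            return False
        position = hits[0] + 1  # later conditions must follow this word
    return True
```

With the conditions `[{"i"}, {"like", "love"}]`, the unordered check accepts "Like, I don't think you are making sense", which is the false match discussed in the text, whereas the ordered check rejects it and still accepts "I really like horses".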

[106]

Key word extraction need not be limited to template-matching techniques. Known natural language techniques may also be used to identify the key words and phrases in spoken and typed sentences.

[107]

It is often important to know whether the user is asking a question, because the response will be different if a question is being asked. Whether a question is being asked can often be determined from the first word of the sentence, for example when it begins with "why", "what", "where", or "how". A conversation simulator may be programmed to determine whether the user's input is a question. In Splotch, one way to implement this is to create a routine somewhat like the expansion routine, except that it recognizes questions instead of synonyms. This process should then modify the user's input with a question marker, such as the symbol "ppp", so that templates can be made to match on it. This makes it easy to write templates that match on, and respond to, questions only. In a speech-based system, natural language or template-matching techniques may be used to identify questions. The same technique (as with questions) may be used for emotion-laden words: a routine may determine whether the emotion and attitude words are directed at the conversation simulator or at some other subject. Visual cues and/or identifiers (e.g., names) may be used to indicate the object or entity (the conversation simulator or someone/something else) to which the user refers. This is important information for determining the type of response.
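A question-marking routine of this kind can be sketched as follows. The list of question words follows the examples in the text. Treating a trailing question mark in typed input as a further signal is an assumption added for illustration, and the function name is hypothetical.

```python
QUESTION_WORDS = {"why", "what", "where", "how"}

def mark_question(utterance, marker="ppp"):
    """Prefix the marker when the input looks like a question.

    A question is assumed when the first word is a question word or,
    for typed input, when the utterance ends with a question mark.
    """
    text = utterance.strip()
    first = text.split()[0].lower().strip(".,!?") if text else ""
    if first in QUESTION_WORDS or text.endswith("?"):
        return marker + " " + text
    return text
```

A template that lists the marker among its necessary conditions then matches questions only, without needing its own logic for recognizing them.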

[108]

Templates may be prioritized by criteria that go beyond how well they fit the words in the user's reply. For example, Splotch is programmed to choose, between two equally satisfactory alternatives, the one with more content. Such matches make Splotch seem more intelligent, but they can also be conversation killers. On the other hand, responses that try to encourage the user to continue, such as "Tell me more", can be rather boring and repetitive. One possibility is to divide the responses into two classes: responses that indicate understanding (comments) and responses that help to continue the conversation (elicitors). The output can then sometimes consist of a combination of the two types of response, for example a relevant aside plus an encouragement to continue talking. The majority of responses may consist of both types: a comment on what the user said, and a response that elicits the user to continue. For example, one can always say "That's interesting. Tell me more." Sometimes, however, the eliciting response is specific enough that no comment is necessary. For example, a "why" question will elicit a response, and it can contain enough content to show that the conversation simulator "understood" what the user was saying, as in "Why are you mad at your sister?"

[109]

One way of implementing this mechanism is to divide the templates into those whose replies are comments and those whose replies elicit further input from the user. The template matcher may select the highest-ranking matching template from each class and determine whether it is appropriate to reply with a double response or with a single response that elicits more information. Alternatively, each template could have both kinds of reply attached to it, and two replies could be chosen, one from each list. The latter approach makes it easier to combine replies, since presumably the replies on the two lists are compatible. However, the former approach may be less cumbersome and more flexible for writing templates, because it is not always possible to come up with both types of reply for every template.
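The first approach (separate comment and elicitor templates, with the matcher taking the best of each class) might look like the sketch below. The `Template` fields, the `self_sufficient` flag and the sample templates are hypothetical, introduced only for illustration; keyword-subset matching stands in for whatever matching the real system uses.

```python
import re
from dataclasses import dataclass


@dataclass
class Template:
    keywords: frozenset            # all must appear in the input; empty matches anything
    response: str
    kind: str                      # "comment" or "elicitor"
    priority: int
    self_sufficient: bool = False  # elicitor specific enough to need no comment


def best(templates, words, kind):
    """Highest-priority matching template of the given class, or None."""
    matches = [t for t in templates if t.kind == kind and t.keywords <= words]
    return max(matches, key=lambda t: t.priority, default=None)


def reply(templates, user_input):
    words = set(re.findall(r"[a-z']+", user_input.lower()))
    comment = best(templates, words, "comment")
    elicitor = best(templates, words, "elicitor")
    if elicitor and (elicitor.self_sufficient or comment is None):
        return elicitor.response                          # single response
    if comment and elicitor:
        return f"{comment.response} {elicitor.response}"  # double response
    return comment.response if comment else "I understand."


SAMPLE_TEMPLATES = [
    Template(frozenset(), "That's interesting.", "comment", 0),
    Template(frozenset(), "Tell me more.", "elicitor", 0),
    Template(frozenset({"mad", "sister"}), "Why are you mad at your sister?",
             "elicitor", 5, self_sufficient=True),
]
```

With these samples, a generic input yields the combined reply, while an input mentioning "mad" and "sister" yields the specific "why" question on its own.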

[110]

The information used for selecting priorities may include the personality class of the user, currency (such as the publication date of news data in a template), the circumstances of the user (for example, has the user fallen asleep?), the emotional state of the user, and so on. Of course, output generation need not employ two different processes, one for screening candidate output templates and one for choosing among them.

[111]

Most conversation simulators, including Splotch, have no sense of context. They respond only to whatever the user said most recently. If that reply consists of a single word, such as "Yes" or "Why", the conversation simulator does not know what the user is talking about. One way of adding context is to include, in the list of matched templates, those templates that matched the user's previous several replies. To prevent old, high-priority templates from completely dominating the conversation, the priority of previously matched templates may be temporarily adjusted downward, so that they eventually fade from the conversation unless something the user says refreshes them. Such a system can give the impression of having a short-term memory. As discussed above in connection with classifying the user's personality, the system could also be given a long-term memory by making more permanent adjustments to the template priorities associated with the personality class and other long-term features of a particular user.
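A minimal sketch of this short-term memory follows, assuming a multiplicative decay per user turn. The class name, the `decay` and `floor` parameters and their default values are illustrative assumptions; the text only says that priorities are adjusted downward until the templates fade.

```python
class ShortTermMemory:
    """Keeps recently matched templates alive with a fading priority boost."""

    def __init__(self, decay=0.5, floor=0.1):
        self.decay = decay    # assumed: boost is multiplied by this every turn
        self.floor = floor    # assumed: below this boost a template is forgotten
        self.active = {}      # template name -> current boost

    def update(self, matched_now):
        """Fade old templates, refresh the ones matched this turn."""
        faded = {}
        for name, boost in self.active.items():
            boost *= self.decay
            if boost >= self.floor:
                faded[name] = boost
        for name in matched_now:
            faded[name] = 1.0  # something the user said refreshed this template
        self.active = faded
        return dict(self.active)
```

Each call represents one user turn; the returned boosts would be multiplied into the template priorities before selection, so a one-word reply such as "Why" can still be answered in the context of the previous topic.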

[112]

The priority adjustment scheme discussed above, in which the suitability of a particular template is determined in part by the personality classification, may also be used to keep the conversation on the current subject of discussion. A variety of techniques can be used. Templates can be stored together with canonical keyword descriptors and indexed to permit searching. A search vector may be formed from keywords and other current classifications, such as mood, personality, audio and the like, in order to find the current templates. The keyword portion of the search vector need not be an ordered set of keywords. The keyword portion may include connectors such as proximity connectors, required words and alternative words.

[113]

If the user frequently brings up certain topics, the templates triggered by those topics may have their priorities gradually increased, so that they are more likely to be selected in future conversations. Furthermore, templates with several replies may have their reply priorities adjusted, increasing the priority of any replies associated with the favored topics. However, an additional mechanism is needed when the conversation lulls and a new topic has to be injected into it. Without a keyword match, such templates will not be in the list of potential templates. One remedy is to inject information from a user-profile database that has been built up for the particular user. This database may contain keywords from previous conversations, and it can be augmented with data from external data resources accessed, for example, through the Internet. Many replies contain random variables. For example, the reply "My favorite color is @color.w" tells Splotch to pick a color at random from a list of colors. Such choices could be prioritized according to the user's personality or preferences.

[114]

Any mechanism that discriminates among the replies of a selected template creates the risk that the same reply is used repeatedly. To avoid this, once a reply has been selected it may be marked so that it will not be selected again for a certain period of time. In the case of random variables, the probabilities could be adjusted so that they are not uniform. Each reply is therefore marked with an indicator of how recently it was selected. This information is then used to ensure that the reply is not reused within a certain period of time, so that replies do not repeat in quick succession even though they are selected somewhat at random.
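The marking scheme above can be sketched as a picker that remembers the turn on which each reply was last used. The class name `ResponsePicker`, the turn-based `cooldown` and the fallback to the full list when every reply is on cooldown are illustrative assumptions.

```python
import random


class ResponsePicker:
    """Random choice among replies, skipping those used within `cooldown` turns."""

    def __init__(self, responses, cooldown=2, rng=None):
        self.responses = list(responses)
        self.cooldown = cooldown
        self.rng = rng or random.Random()
        self.turn = 0
        self.last_used = {}   # reply -> turn on which it was last chosen

    def pick(self):
        self.turn += 1
        never = -self.cooldown - 1   # makes unused replies count as fresh
        fresh = [r for r in self.responses
                 if self.turn - self.last_used.get(r, never) > self.cooldown]
        # Assumed fallback: if every reply is still on cooldown, allow any of them.
        pool = fresh or self.responses
        choice = self.rng.choice(pool)
        self.last_used[choice] = self.turn
        return choice
```

With three replies and a cooldown of two turns, no reply can reappear within any window of three consecutive picks, although the order of the first few picks is still random.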

[115]

The priorities can be adjusted by supervised or unsupervised learning. Unsupervised methods for creating new templates, that is, methods that examine past conversations and generate new templates from them, tend to produce nonsensical replies. However, it may be possible to learn new replies to old templates, as opposed to new templates, in an unsupervised fashion. Instead of training the conversation simulator on sample conversations to learn new templates, such training could be used to learn new replies. Whenever a template with a fairly high degree of specificity matches some part of the conversation, the other person's reply may be added to that template's replies. Of course, criteria need to be specified for deciding how specific the template has to be and how close the match has to be.

[116]

At present, when there is no match, Splotch selects a default template whose reply is either a vacuous comment such as "I understand" or some unrelated witty remark. These reactions could be adapted by adding replies about topics that have been popular in the past. For example, if "movies" has been a favorite topic in the past, the reply "Would you like to talk about movies?" could be added to the default template.

[117]

Files containing random variables (such as @color.w) could also have new variables added on the basis of the user's replies to specific questions. In addition, information from databases may be useful for filling in random variables. Generally speaking, conversation simulators should not answer complex factual questions, in order to avoid exposing their limitations. Answering such queries depends too heavily on language understanding, and in any case a conversationalist that tends to respond by reciting facts is likely to be perceived as a bore. However, databases may be useful in helping the conversation simulator to express its opinions by using relational information, for example, knowing that because the user likes x, and x and y are both z, the user may also like y. Such information can be used to give the conversation simulator tastes similar to those of the user.
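The relational inference mentioned above (the user likes x; x and y are both z; so the user may like y) can be illustrated with a small lookup. The category table and the function name `shared_taste_remark` are hypothetical and stand in for whatever database the simulator actually consults.

```python
# Hypothetical relational data: item -> category (the "z" shared by x and y).
CATEGORIES = {
    "Hamlet": "tragedies",
    "Macbeth": "tragedies",
    "Twelfth Night": "comedies",
}


def shared_taste_remark(liked_item, categories):
    """Return an opinion based on a shared category, or None if nothing applies."""
    category = categories.get(liked_item)
    if category is None:
        return None
    others = sorted(item for item, cat in categories.items()
                    if cat == category and item != liked_item)
    if not others:
        return None
    return (f"Since you like {liked_item}, you might like {others[0]}: "
            f"both are {category}.")
```

The point of the sketch is that the simulator voices an opinion derived from a relation rather than reciting a fact, which is the use of databases the text recommends.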

[118]

Because a conversation simulator does not need deep understanding, one that is sensitive to the social context and can adapt to the user's replies may be more convincing than one that lacks these qualities. A conversation simulator can sustain a conversation by encouraging the user to talk and by occasionally responding in a way that creates the illusion that it understands what the user is saying. Moreover, the more successful it is at getting the user to talk, the more difficult it becomes to maintain the illusion of understanding. If the user replies with only a short phrase, there are only a few keywords to respond to. If the user replies with a long discourse, however, there are many possible keywords that might trigger a reply. Responding to the wrong keyword may give the impression that the conversation simulator is not paying attention or, worse, that it does not understand what is being said. In that case the user may become reluctant to do most of the talking and may begin to query the conversation simulator to check whether it is really listening.

[119]

True speech recognition depends on natural language understanding. Of course, conversation simulators manage to be convincing by using rules to generate sensible replies, fooling the user into believing that they understand what is being said. Natural language technology may nevertheless be used, to a limited extent, to help a conversation simulator recognize the particular meaning of a word from a larger context or to parse a sentence grammatically. Thus a rule-based template method can be used where a choice among replies must be made, while the more sophisticated, training-intensive network techniques can be used to determine the correct canonical terms to represent the variables in a sentence and to parse the sentence, for example to distinguish the direct object from the verb. In the final decision as to which of a large number of replies to generate, the conversation simulator is most convincing if it relies on predictable and fairly simple selection rules.

[120]

Referring to Figure 1, a conversation simulator operates as a process running on a controller 100. The controller 100 can receive input from various sources, for example, a connected notebook 195, an image processor 305 connected to cameras 135 and 136, and conventional user interface devices 160 such as a remote control 150 and a keyboard 155. Other input devices may include a microphone 112, various instruments 140 such as temperature sensors, position sensors, security switches, proximity sensors, electrical load sensors and ambient light sensors, alternative user interface devices such as a mouse (not shown separately), and the like. Data can be gathered by the controller 100 through local area, wide area or Internet networks 115 and 110. Devices connected to the local network 115 may include smart appliances 130, a household server 120, or output devices 130 including displays, audio outputs, wireless devices (not shown separately), etc. The household server 120 can store data such as inventory data for perishable goods and food, data on other supplies such as those used for arts and crafts projects, and data on materials used for hobbies. The smart appliances 130 may include a microwave oven with an interface such as a bar code reader and a display, a television, a stereo (not shown separately), and the like. The controller 100 can output directly through a monitor 175. The monitor may include a cabinet 190 with light or pattern output elements that allow the appearance of the cabinet 190 to be changed by the controller 100. The Internet network 110 can receive data from a satellite 103 or a server 140.

[121]

Figure 2 provides a functional diagram of an event-driven architecture that can be used to produce interaction with the user, including the simulation of a conversation. Information about the user is received by an input user interface 400 process, which takes in data such as audio, text derived from speech, video, and input from control devices such as a keyboard, a mouse or hand-held controllers. The input user interface sends text and raw signals to classifiers 405. The received data is classified by the classifiers 405, which identify events that request a response from a response generator 415. The information received by the input user interface 400 is also applied to an input parser 410, which gathers information such as sentences uttered by the user, parses and filters it, and applies it to the response generator 415. The gathered information is stored, among other information, in a database 435. Each time an event is signaled by the classifiers 405, the response generator 415 takes state information from the classifiers 405, such as the mood of the user, the user's attention level, personality, interests, and the like, and generates a response. Some of the state information may be determined in part by previous state information. The personality of the user, for example, is such a state. If the classifiers 405 indicate that a spoken response is required from the conversation simulator, the response generator 415 selects appropriate data from a response data store 440 and signals the output user interface 425 to output synthetic speech corresponding to the response. This data may command an animation driver 260, a flashing light, or any other type of final output device or driver, in synchrony with the speech.
A response data generator 445 receives data requests from the input parser 410, for example a request for information about a favorite actor of the user. The response data generator 445 generates an agent 205 to obtain the information from a data resource such as the World Wide Web, and creates a data module from which the response generator 415 can generate a response to be used at a later time or contemporaneously with the request. This response data is stored in, or conveyed to, the response data store 440. When a response is generated, the response generator 415 may optionally signal the input parser 410 to indicate what is expected in the reply (from the user) to the computer's response, in order to help the input parser 410 parse the reply. This may take the form of a template that helps the input parser 410 recognize the reply.

[122]

Referring now also to Figure 3, again a functional block diagram, the classifiers 405 and the input user interface 400 of Figure 2 are shown in greater detail. Again, the diagram of Figure 3 represents a functional architecture that can be used to implement the various features of the invention, and it is by no means the only way of implementing them within the scope of the inventive system. Audio input 245, video input 255 and other user interface devices (not shown) generate signals that are applied to respective classifiers. The audio input 245, which may be received by a microphone (not shown), by a directional audio detector (not shown) that indicates both the sound and its direction, or by any other suitable audio transducer, is applied to an audio classifier 210. The latter data form a real-time signal, which the audio classifier 210 classifies by suitable digital or analog means or a combination of the two. The audio classifier 210 then generates a current state information signal, which it applies both to a mood/personality classifier 290 and to an event/class processor 207. For example, the audio classifier 210 can be programmed to recognize the beginning of an utterance and, in response, to generate a signal that ultimately halts the generation of speech by the conversation simulator, thereby preventing the conversation simulator from interrupting the user. The audio classifier 210 may distinguish certain sounds such as the switching on of a light, snoring, the sound of a radio, the sound of many people speaking at the same time, and so on. It may also determine whether multiple sound sources are generating sound, whether the sound is speech, and whether the sound comes from a machine such as a vacuum cleaner or from a radio playing. Each of these events and/or states may be combined with a demarcating time stamp, and the combined signal is applied to the event/class processor 207.
The event/class processor 207 combines state information from multiple classifiers to generate an environment/user state signal indicating the current state of the system's environment, including the user, and also generates event signals (interrupt signals) to ensure an immediate response when certain events are recognized by the classifiers. Because the recognition of events may require state information from multiple classifiers, the event/class processor 207 combines state data from multiple classifiers to generate a combined state signal and a combined event signal. The environment/state signal may include an indication of all the possible event classes that the various classifiers are capable of identifying, or only those that exceed a threshold level of confidence.
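The combining role of the event/class processor 207 can be sketched as below. This is an illustrative model only: the `Classification` record, the `urgent` flag used to mark interrupt-worthy events and the threshold value are assumptions, not details given in the text.

```python
from dataclasses import dataclass


@dataclass
class Classification:
    source: str          # which classifier produced it, e.g. "audio" or "video"
    label: str           # recognized class, e.g. "speech" or "user_left_room"
    confidence: float
    timestamp: float     # time stamp attached by the classifier
    urgent: bool = False # assumed flag: this class should raise an event signal


def combine(classifications, threshold=0.6):
    """Return (environment/user state, event signals) from classifier outputs."""
    state = {}
    events = []
    for c in sorted(classifications, key=lambda c: c.timestamp):
        if c.confidence < threshold:
            continue                  # keep only classes above the confidence limit
        state[c.source] = c.label     # most recent confident label per classifier
        if c.urgent:
            events.append(c.label)    # would interrupt the response generator
    return state, events
```

The state dictionary corresponds to the environment/user state signal, and the event list to the interrupt signals that call for an immediate response.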

[123]

The video image classifier 240 receives the video input 255, classifies the image data and generates state information signals, which are applied to the mood/personality classifier 290 and the event/class processor 207. The video image classifier 240 may be programmed to provide, for example, the coordinates to which the user is pointing, indications corresponding to sign-language gestures, a count of the number of people in the field of view, the identity of the user, and so on. Video processing techniques from various fields, such as authentication and gesture control of machines, may be employed in the current system according to the particular aims of the system designer. The other user interface devices (not shown) apply their inputs to respective other UI classifiers 235, which apply their output signals to the event/class processor 207. The other UI classifiers 235 may include typical computer controls such as hand-held remote controls, mice, keyboards, joysticks, and the like. They may also include instrumentation monitoring the environment, such as the ambient light level, the time of day, the room temperature, the security status of a building, a galvanic skin response sensor, a heart rate sensor, pressure sensing of the keyboard or remote control keys, and so on. Any user interface device that generates direct text input 250 can apply text data to the input parser 410. Text data can also be obtained from a speech to text converter 215, which receives the audio input 245 and converts it to text. When obtained from audio, the text may be time-stamped by the speech to text converter 215.

[124]

The speech to text converter 215 parses the text using grammatical or structural rules such as those used in existing conversation simulators or in natural language search engines, or by other suitable means. The result of this parsing is the extraction of the following data: data indicating the type of the input text (the phrase, sentence or utterance from the user); particular variable data that can be extracted from the input text; and data requests corresponding to the input text. The input text may be parsed, optionally, by straightforward rule-based template matching, as in existing conversation simulators. Rather than simply linking this form to a particular response, as existing conversation simulators do (although that may be the ultimate result, depending on how the response generator 415 is programmed), the text input template is used to extract particular information from the input text. This is described here in terms of the rule-based template matching method set out in detail herein, but it can also be done with other natural language systems. For example, if the input text is found to correspond to a particular text input template, this may correspond to one or more output templates to be used by the response generator 415. The text input template may also indicate particular words or phrases to be used for obtaining information from, or adding information to, an external data store. For example, suppose that the programmer of the conversation simulator has defined a rule indicating that it fits a reply such as "I am a big fan of Shakespeare." The rule might be the word "I" in a certain proximity relationship to the word "fan", with certain exclusion rules to prevent false-positive matches.
One or more further rules may be used to identify the direct object of the sentence, which here is "Shakespeare". The latter rule or rules may be defined specifically for matching text input templates, or they may be general rules or other methods. The matching text input template may correspond to a data request, which the input parser 410 generates. In the Shakespeare example, the data request may be a request for additional information about Shakespeare. This request may be applied to the response data generator 445 (shown in Figure 2 and discussed in more detail below), which obtains data from an external source; the response data generator 445 uses this data to form new output templates. This process is discussed in more detail in connection with Figure 4.
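The "fan of Shakespeare" rule can be sketched as a small text input template. Everything specific here is an assumption made for illustration: the function name, the maximum gap of five words between "I" and "fan", the use of "not" as the exclusion rule, and the convention that the direct object is whatever follows "fan of".

```python
import re


def match_fan_template(text, max_gap=5):
    """Match 'I ... fan of X' and return the extracted object and a data request."""
    words = re.findall(r"[A-Za-z']+", text)
    lowered = [w.lower() for w in words]
    # Assumed exclusion rule: a negation prevents a false-positive match.
    if "not" in lowered or "i" not in lowered or "fan" not in lowered:
        return None
    i_pos, fan_pos = lowered.index("i"), lowered.index("fan")
    # Proximity rule: "fan" must follow "I" within max_gap words.
    if not 0 < fan_pos - i_pos <= max_gap:
        return None
    if fan_pos + 1 >= len(words) or lowered[fan_pos + 1] != "of":
        return None
    direct_object = " ".join(words[fan_pos + 2:])
    if not direct_object:
        return None
    return {
        "template": "fan_of",
        "object": direct_object,
        "data_request": f"more information about {direct_object}",
    }
```

The returned `data_request` plays the part of the request that the input parser 410 would pass on to the response data generator 445.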

[125]

The mood/personality classifier 290 receives signals from the various classifiers and processes them to generate a mood/personality state signal. The mood/personality classifier 290 can be a trained neural network, a Bayesian network, a simple rule-based system, or any other type of classifier capable of taking many different inputs and predicting the probability that the user is in a given emotional state and has a given personality. Preferably, the personality signal is the result of many observations and tends to persist over time. Various personality and mood typologies may be used, ranging from simple to complex. An example of a set of rules for classifying a user as bored is as follows:

[126]

· Low sentence/phrase word count (the user's sentences contain few words) (input parser 410 signal indicating reply word count)

[127]

· Low incidence of words suggesting enthusiasm, such as superlatives (input parser 410 signal indicating adjectives)

[128]

· A quiet, flat tone of voice (audio classifier 210 signal indicating modulation inflection intensity)

[129]

· Lack of physical movement (video image classifier 240 signal, etc.)

[130]

· Low pressure on the remote control keys

[131]

· Little movement of the head or body

[132]

· Sighing sounds, etc.

[133]

· Looking at a watch

[134]

· Lack of eye contact with the object identified with the conversation simulator (for example, a speech-synchronized animated character)

[135]

Each of these may be classified by the indicated classifier. The color of the user's clothes, the pitch of the user's voice, the number of times the user enters and leaves the room, the way the user gestures, and so on can all provide clues to the user's emotional state and/or personality. The "Big Five" personality typology, the much simpler valence/intensity emotional state typology suggested in U.S. Patent No. 5987415, or any other suitable typology may be used.
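The rule set for boredom listed above can be turned into a simple score, as in the following sketch of the "simple rule-based system" option. The signal names, the numeric thresholds and the equal weighting of rules are all hypothetical; the two movement rules from the list are merged into one signal here for brevity.

```python
def boredom_score(signals):
    """Fraction of boredom rules that fire; missing signals never fire a rule."""
    rules = [
        signals.get("words_per_reply", 99) < 4,            # input parser 410
        signals.get("superlatives_per_reply", 1.0) < 0.1,  # input parser 410
        signals.get("voice_inflection", 1.0) < 0.2,        # audio classifier 210
        signals.get("body_movement", 1.0) < 0.2,           # video image classifier 240
        signals.get("key_pressure", 1.0) < 0.3,            # remote control keys
        signals.get("sighs", 0) > 0,
        signals.get("watch_glances", 0) > 0,
        signals.get("eye_contact", 1.0) < 0.2,
    ]
    return sum(rules) / len(rules)
```

A score near 1 would mean that most indicators agree; a trained neural network or Bayesian network, which the text also allows, would replace this hand-set scoring with learned probabilities.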

[136]

For immediate mental states as well, any suitable framework may be used. The following tables summarize the "Big Five", which is an evolutionary outgrowth of the Myers-Briggs typology. There are many academic papers on the subject of modeling emotional states and personalities, and many of them address the issue of machine classification based on voice, facial expression, body posture and many other machine inputs. Even the weather, which can be obtained over the Internet by an agent or through instruments measuring basic weather data such as daily sunshine, can be used to infer mental and emotional state.

[137]

The six facets of negative emotionality, with anchors for the two extremes of the continuum (adapted from Costa & McCrae, 1992)

Facet	One extreme	Opposite extreme
Worry	Relaxed; calm	Worrying; uneasy
Anger	Composed; slow to anger	Quick to feel anger
Discouragement	Slowly discouraged	Easily discouraged
Self-consciousness	Hard to embarrass	More easily embarrassed
Impulsiveness	Resists urges easily	Easily tempted
Vulnerability	Handles stress easily	Has difficulty coping

[138]

The six facets of extraversion, with anchors for the two extremes of the continuum (adapted from Costa & McCrae, 1992)

Facet	One extreme	Opposite extreme
Warmth	Reserved; formal	Affectionate; friendly; intimate
Gregariousness	Seldom seeks company	Gregarious; prefers company
Assertiveness	Stays in the background	Assertive; speaks up; leads
Activity	Leisurely pace	Vigorous pace
Excitement-seeking	Low need for thrills	Craves excitement
Positive emotions	Less exuberant	Cheerful; optimistic

[139]

The six facets of openness, with anchors for the two extremes of the continuum (adapted from Costa & McCrae, 1992)

Facet	One extreme	Opposite extreme
Fantasy	Focuses on the here and now	Imaginative; daydreams
Aesthetics	Uninterested in art	Appreciates art and beauty
Feelings	Ignores and discounts feelings	Values all emotions
Actions	Prefers the familiar	Prefers variety; tries new things
Ideas	Narrower intellectual focus	Broad intellectual curiosity
Values	Dogmatic; conservative	Open to re-examining values

[140]

The six facets of agreeableness, with anchors for the two extremes of the continuum (adapted from Costa & McCrae, 1992)

Facet	One extreme	Opposite extreme
Trust	Cynical; skeptical	Sees others as honest and well-intentioned
Straightforwardness	Guarded; stretches the truth	Straightforward; frank
Altruism	Reluctant to get involved	Willing to help others
Compliance	Aggressive; competitive	Yields under conflict; defers
Modesty	Feels superior to others	Self-effacing; humble
Tender-mindedness	Hardheaded; rational	Tender-minded; easily moved

[141]

The six facets of conscientiousness, with anchors for the two extremes of the continuum (adapted from Costa & McCrae, 1992)

Facet	One extreme	Opposite extreme
Competence	Often feels unprepared	Feels capable and effective
Order	Unorganized; unmethodical	Well-organized; neat; tidy
Dutifulness	Casual about obligations	Governed by conscience; reliable
Achievement striving	Low need for achievement	Driven to achieve success
Self-discipline	Procrastinates; distracted	Focused on completing tasks
Deliberation	Spontaneous; hasty	Thinks carefully before acting

[142]

The mood/personality classifier 290 outputs a state vector with a number of degrees of freedom, corresponding to the models of personality and mental state chosen by the designer. The mood/personality classifier 290 may accumulate instantaneous data over a period of time in modeling personality, because personality is a persistent state. The mental state will have more volatile elements.
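One way to picture a state vector with a volatile mood part and a persistent personality part is the sketch below. Using two exponential moving averages with different rates is an assumption chosen for illustration; the text only says that personality data is accumulated over time while mental state changes more readily. The class name and rate values are hypothetical.

```python
class MoodPersonalityState:
    """Tracks each dimension twice: a fast-moving mood and a slow-moving personality."""

    def __init__(self, dimensions, mood_rate=0.5, personality_rate=0.05):
        self.dimensions = list(dimensions)
        self.mood_rate = mood_rate                 # assumed: mood follows readings quickly
        self.personality_rate = personality_rate   # assumed: personality drifts slowly
        self.mood = {d: 0.0 for d in self.dimensions}
        self.personality = {d: 0.0 for d in self.dimensions}

    def observe(self, reading):
        """Fold one instantaneous classifier reading into both estimates."""
        for d, value in reading.items():
            self.mood[d] += self.mood_rate * (value - self.mood[d])
            self.personality[d] += self.personality_rate * (value - self.personality[d])

    def vector(self):
        """State vector: mood components followed by personality components."""
        return ([self.mood[d] for d in self.dimensions]
                + [self.personality[d] for d in self.dimensions])
```

After a single anxious reading the mood component moves substantially while the personality component barely changes; only repeated observations shift the latter.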

[143]

Referring now also to Figure 4, the response generator 415 receives the mood/personality state vector and the parsed reply data from the mood/personality classifier 290 and the input parser 410 respectively. The response generator 415 also receives the environment/user state signal and the event signal from the event/class processor 207. The response generator 415 also receives a data request signal from a parser/requester 432 linked to a database 430. The response generator 415 selects a reply from the response data store 440 on the basis of the parsed reply from the user, the mood/personality state, the environment/user state and the event signal.

[144]

The parser/requester 432 does three things: it examines the parsed replies from the user for information that can be added to the database 430 to update it; it generates requests for further information about a topic, as indicated by its programming; and it determines what data the database 430 needs in order to become more useful. If a parsed reply provides data that the parser/requester 432 is programmed to recognize as useful for updating the database 430, that data is extracted by the parser/requester 432 and added to the database 430. For example, if the database 430 is a user profile that includes the user's preferences regarding television programs, and the user says "I like Pokeman" during a conversation with the conversation simulator, the parser/requester 432 may add the keyword "Pokeman" to the database 430. The parser/requester 432 may also generate a request for further information from a data source (represented by resource data 450) by instantiating the agent 205. For example, the agent 205 may obtain, from a site on the Internet, text giving the names of Pokeman characters. The parser/requester 432 can extract these character names and add them to the profile data in the database 430.

[145]

If the database 430 accumulates a large amount of preference data but an ambiguity is found that could be clarified by asking a question, the parser/requester 432 can generate a data request and apply it to the response generator 415, so that at some point a response is generated that asks the user to clarify. For example, the database 430 may indicate that sports programs have been watched frequently of late, while the data is unclear as to which sports are favored. The database 430 may also contain standard data requests that can be filled in progressively over time by having the conversation simulator ask questions intermittently. This is the analog of filling out a form, but the user need never know that this is what is happening. In an example in which the database 430 is a profile database for an EPG, there may be a standard set of setup information, which in another version might be handled by filling out a form of customization data. The conversation simulator can accomplish this by the following steps: simply generating templates that request the relevant data; occasionally inserting a question from among these templates into the conversation; and retrieving the relevant data from the user's replies.

[146]

Other examples of the database 430 are a smart card holding investment information and an external database (linked via the Internet) containing the user's monthly payees, where interaction with the user results in the monthly bills being paid on time. The smart card could be used, for example, by a hotel kiosk that recommends activities on the basis of activity preference data stored on the card (e.g., visiting old churches and travelling). Instead of a smart card, the same data could be stored on a radio-frequency device, a personal digital assistant or any other suitable device. The database 430 may be an external database that is conducting a survey, for example a product survey. The database 430 may be a shopping list from a household network, with the interaction with the user being used to add items to and/or delete items from the shopping list. Many other possibilities exist within the scope of the invention.

[147]

Referring now also to Figure 5, the response data store 440 may, for example, hold a set of templates, each of which may call for driving one or more animations. Thus, when such a template is triggered, the response is an animation that is keyed to the speech output (sound, etc.) or that is independent of any other form of output. The response generator 415 may select the output template and convey it to an animation driver 260, shown in Figure 5. The animation driver 260 in turn outputs a particular corresponding animation on a display device (not shown). The display device may be the monitor shown in Figure 1. The response generator 415 also selects templates that contain text. The response generator 415 may add text to the template text and transmit it to a text to speech converter 275 in order to generate speech output. Template selection and variable speech or variable text are handled in the traditional manner of a speech simulator such as Splotch. The response generator 415 can output text data directly to a direct text output 280 [...] such as a computer display or a monitor. The response generator 415 may also access templates for providing other output effects 270. Another example of an output effect is a cabinet effects driver 265, which provides variable illumination of the cabinet 190 of the monitor (see Figure 1), whose appearance changes in response to output commands. The animation provided by the animation driver 260 may be synchronized with the speech channel 80 by the text to speech converter 275, so that a character appearing in the animation is given the appearance of speaking. The same synchronization may occur with other effects; for example, the cabinet 190 of a television may be driven by the cabinet effects driver 265 so as to give the user the impression that the television has a personality. Alternatively, the television may be given the appearance of being a person.

[148]

Input obtained from either the speech-to-text converter 215 or the direct text input 250 is parsed by the input parser 410, and the parsed reply is applied to the response generator 415. Based on the parsed reply, the mood/personality state, the environment/user state, and the event signals, the response generator 415 selects the most suitable template in the response data store 440. The response generator 415 may calculate, from all of the relevant signals, an estimated figure of merit for applying each candidate template. As a result, the response generator 415 responds not only to the data contained in the text of the user's words but also to the many other factors discussed here. In particular, the classified emotional state and personality of the user may change the content of the conversation and the style (mood) of the conversation simulator's responses.
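The figure-of-merit selection described above can be pictured with a minimal sketch. The text does not give a scoring formula, so everything here is an assumption made for illustration: the `Template` fields, the function names, the additive score, and the weights are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """Hypothetical response template with the signals it is suited to."""
    text: str
    keywords: set = field(default_factory=set)
    moods: set = field(default_factory=set)
    user_states: set = field(default_factory=set)


def figure_of_merit(template, parsed_words, mood, user_state,
                    w_text=1.0, w_mood=0.5, w_state=0.5):
    """Assumed additive score: keyword overlap plus mood and state agreement."""
    score = w_text * len(template.keywords & parsed_words)
    if mood in template.moods:
        score += w_mood
    if user_state in template.user_states:
        score += w_state
    return score


def select_template(templates, parsed_words, mood, user_state):
    """Return the candidate template with the highest estimated score."""
    return max(
        templates,
        key=lambda t: figure_of_merit(t, parsed_words, mood, user_state),
    )
```

With such a score, two templates that match the same words can still be separated by the mood/personality state, which is the behavior the paragraph describes.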

[149]

Tracing the flow of class data beginning with the video input 255, the video input 255 signal is applied to the video image classifier 240. The video image classifier 240 is programmed to recognize a large number of different classes of images and video sequences in the video input 255 signal. For example, it may be programmed to distinguish a person lying down from a person sitting; a person standing still from a person moving about agitatedly or leaving the vicinity of the conversation simulator system; and so on. A probability may be generated for each of these classes and output as a signal. Alternatively, a single most probable class may be generated and output as a signal. The signal is applied to the event/class processor 207, which combines this data with other class data to generate an environment/user state signal. If the event/class processor 207 receives an indication from the video image classifier 240 that something sudden and important has occurred, for example that the user has got up and left the room, the event/class processor 207 generates a corresponding event signal, which can immediately interrupt the output being generated by the response generator 415. If the mood/personality classifier 290 receives a signal from the video image classifier 240 indicating that the user is moving in an increasingly agitated manner, the mood/personality classifier 290 can combine this information with other classifier signals to generate a mood/personality state vector indicating an emotional state of heightened anxiety. For example, the audio classifier 210 may indicate at the same time that the speaker's voice is higher pitched than usual, and the input parser 410 may indicate that the word count of recent replies is unusually low. The candidate response templates selected by the response generator 415 are affected by the mood/personality state, for example by changing the topic of conversation to one or more topics that the response generator 415 is programmed to select in such circumstances.

[150]

Note that, to allow the system to determine whether the current class or state represents a change from a previous time, the event/class processor 207 and the mood/personality classifier 290 may be provided with a data storage capability and with means for determining the current user of the device, so that corresponding histories can be stored for different users. The system may also be provided with a user identifier 460. The latter may identify the user in any suitable way, for example by face recognition using the video image classifier 240, a radio frequency identification tag, a smart card, a voice signature, or a simple user interface that lets the user identify himself or herself with a biometric indicator such as a thumbprint or with a simple PIN code. In this way, both the mood/personality classifier 290 and the event/class processor 207 can associate historical data with a particular user and use it to identify trends and signal them to the response generator 415.

[151]

Another example of the response generator 415 responding to information from the various inputs is as follows. The conversation simulator application is generating speech when the video image classifier 240 identifies the image received by the video input 255 as that of a person sleeping. The response generator 415 may then halt the conversation and generate white noise or music. As another example, if another person enters the room, the response generator 415 may insert a pause into the ongoing conversation to allow the user to talk with the person who has just entered. The conversation simulator may then insert a sentence, for example asking to be introduced to the person who has just entered the room or asking whether the user wishes to end the conversation. In yet another example, the audio input 245, as interpreted by the audio classifier 210, indicates that a person is laughing out loud. The audio classifier 210 may generate a signal in response to which the response generator 415 selects an alternative response template, according to a rule in its programming indicating that laughter should be followed by a reply that includes a joke.

[152]

The input parser 410 may parse the specific parts of a sentence that correspond to an interest or question expressed by the user. For example, the user may ask, "Is it difficult to repair a home air conditioner?" or may express an interest in Japanese cuisine. The input parser 410 may be programmed to extract the particular symbols or text data relevant to the question or interest and to generate a data request. The response data generator 445 may then generate an instance of an agent 205 to obtain further information from resource data 450, such as a local area network or the Internet (represented as "local network/Internet 200" [...]). The data retrieved by the agent 205 is parsed by the response data generator 445, and a new template is generated from it. To do this, the response generator 415 may be programmed to connect the response data with rules for its use. Several examples are discussed to show what is feasible. First, the user asks the above question about air conditioners. The response data generator 445 receives the data request, which specifies the subject matter and the specific need; in this case, the need is a direct request for information. The agent 205 obtains one or two answers, and the response data generator 445 formulates a response together with an indication that the reply should be given high priority in the conversation simulator's conversation. In this case, the response should preferably identify the question asked. For example, the response may be "From what I can gather, the answer to your question about whether it is easy to repair a home air conditioner is <x>." The symbol "<x>" represents the response data gathered from the resource data 450. Second, the user expresses an interest in Japanese cuisine, which leads to a data request for information on that topic. The response data generator 445 retrieves the relevant information and forms several templates, such as "Did you know that there is a very popular Japanese restaurant on 14th Street?", together with an indication that this is a conversation-starting or subject-changing type of reply that relates directly to an interest the user has expressed. The retrieved data may be introduced by the conversation simulator later in the "conversation", or it may be delivered immediately.

[153]

The input parser 410 may use recognition templates, computational linguistics techniques, or other models to extract particular types of information from sentences. As another example, if the user says a sentence such as "I very much enjoyed the Pokeman television program that I saw yesterday," the input parser 410 may extract the direct object "Pokeman" and transmit it as a data request, either because it corresponds to a particular recognition template or because the direct object is identified using natural language techniques. A recognition template may use a rule such as the proximity of "I" and "like". A natural language device is more flexible but can produce similar results. Simply by using template sentence structures, templates can be used to distinguish questions from statements, likes from dislikes, and so on. The simpler template scheme may not use all of the data in the user's utterance, but it offers an easy-to-program technique that can provide a fairly convincing conversation with relatively few rules.
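A recognition template based on the proximity of "I" and a verb such as "like" can be approximated with a regular expression. The sketch below is only an illustration: the pattern, the verb list, the three-word proximity window, and the function name `extract_interest` are assumptions, and treating the first capitalized word after the verb as the topic is a crude stand-in for real direct-object detection.

```python
import re

# Assumed template: "I", up to three intervening words, a liking verb,
# an optional "the", then a capitalized word taken as the topic.
LIKE_TEMPLATE = re.compile(
    r"\bI\b(?:\s+\w+){0,3}?\s+(?:like|liked|enjoy|enjoyed|love|loved)\s+"
    r"(?:the\s+)?(?P<topic>[A-Z]\w*)"
)


def extract_interest(sentence):
    """Return the topic named by a matching sentence, or None if no match."""
    match = LIKE_TEMPLATE.search(sentence)
    return match.group("topic") if match else None
```

A sentence that matches yields a topic that could be forwarded as a data request; a sentence that does not match is simply ignored by this template, which is the trade-off the simpler template scheme accepts.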

[154]

The agent 205 may go out to the local network/Internet 200 and collect data such as links to further information, in addition to text that appears, for example, on web sites matching a search based on the data request. The response data generator 445 filters and parses the text and other data obtained from the agent 205. In processing reply text from the user and raw text from the resource data 450, the input parser 410 and the response data generator 445 may select specific words or phrases according to a recognition template or another natural language process. To simplify comparison with other data, these processes may convert the selected text into canonical form. For example, the keywords that represent response templates may be limited to a predefined set of canonical terms. When the user produces an utterance, the user's own words are converted to their canonical form before being compared with the keyword vectors that characterize the various response templates. When searching retrieved data, the conversion to canonical form may be followed by the generation of a search query that uses a disjunctive list of variants of the canonical term in order to obtain the highest possible hit rate.
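The conversion to canonical form and the disjunctive search query can be sketched as follows. The mapping table, the function names, and the "OR" query syntax are hypothetical choices made for illustration; the text does not specify a vocabulary or a query language.

```python
# Assumed lookup table from surface variants to canonical terms.
CANONICAL = {
    "how is it going": "how are you",
    "how are you doing": "how are you",
    "tv": "television",
    "telly": "television",
}


def to_canonical(term):
    """Normalize case and trailing punctuation, then map to the canonical term."""
    key = term.strip().lower().rstrip("?!.")
    return CANONICAL.get(key, key)


def variants(canonical_term):
    """All known surface forms of a canonical term, including the term itself."""
    forms = {v for v, c in CANONICAL.items() if c == canonical_term}
    forms.add(canonical_term)
    return sorted(forms)


def disjunctive_query(term):
    """Build a query that matches any variant of the term's canonical form."""
    return " OR ".join(f'"{v}"' for v in variants(to_canonical(term)))
```

Comparing canonical forms keeps the keyword vectors small, while expanding back to all variants at search time is what raises the hit rate.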

[155]

Referring to Figure 6, the conversation simulator responds to the user and the environment so as to ensure that its own behavior is appropriate. For example, in the situation of Figure 6, the user is asleep. The video input 255 generates a signal that is applied to the video image classifier 240. The video image classifier 240 has a real-time state vector comprising a list of elements. Each element adds information about one aspect of the state of the conversation simulator's "visual" environment. For example, the user's activity may be classified into one of several different states. Here, the user's activity is identified as "still", meaning that the user is not moving around the room and is relatively quiet. Another element of the state vector is the number of people in the room, which in this case is one.

[156]

Another classification that the video image classifier 240 can readily identify using existing technology is the number of new objects in the room. On an earlier occasion, the video image classifier 240 may have stored its environment using a simple scheme. For example, it may photograph the environment on one day and, when the system is started again on another day, compare the number of objects in the stored image with the number of objects identified in the current image. The number of new objects is then output, and this information is used in generating a reply.
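The comparison of a stored scene with the current scene can be sketched in a few lines. This is an illustration only: it assumes that an upstream detector has already produced a list of object labels for each image, and the function name and labels are hypothetical.

```python
from collections import Counter


def count_new_objects(stored_labels, current_labels):
    """Count objects in the current scene that exceed what the stored scene held."""
    # Counter subtraction keeps only positive differences, so objects that
    # disappeared since the stored image do not produce negative counts.
    extra = Counter(current_labels) - Counter(stored_labels)
    return sum(extra.values())
```

For instance, a stored scene of a sofa, a lamp, and a table compared with a current scene that also contains two boxes gives a count of two new objects.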

[157]

Another classification used is body position, such as standing, prone, or sitting. In this case, the user is identified as being in a prone position. The audio signal, which is also received by the conversation simulator system, is applied to two processes: the speech-to-text converter 215 and the audio classifier 210. The input parser 410 attempts to identify the recognition template that best matches the text output by the speech-to-text converter 215. The template matches for the snoring of a soundly sleeping person, for example, may not correlate well with any particular recognition template. However, the additional information from the audio classifier 210 indicating snoring and from the video image classifier 240 indicating the user's activity causes the response generator 415 to recognize a situation in which speech output by the conversation simulator is inappropriate, and the template selector/store 225 instead generates white noise (or music, no sound at all, or a dimming of the lights).

[158]

The event/class processor 207 acts as a filter and data combiner. It combines class data from multiple classifiers and may output higher-level class information. In the example of Figure 6, the event/class processor 207 combines the inputs from the audio classifier 210 and the video image classifier 240 to generate a higher-level class (a "metaclass") corresponding to the user's activity: sleeping. The audio classifier 210 receives sound and attempts to identify it with a class that it has been trained to recognize. The event/class processor 207 receives class information from the audio classifier 210 and the other classifiers and attempts to identify it with a metaclass that it has been trained to recognize. Of course, the architecture described here is not the only way to implement the various features of the invention; the event/class processor 207 could be omitted entirely and its functions taken over by the response generator 415. One advantage of separating these functions, however, is that the event/class processor 207 can use a different type of classifier from the one used by the response generator 415. For example, the response generator 415 could use a rule-based template matcher, such as that used in Splotch, while the event/class processor 207 could use a trainable neural-network-type classifier. This allocation of functions may be more suitable because the number of outputs of the response generator 415 may be much larger than the number of classes that the event/class processor 207 (or the other classifiers) is trained to recognize. This follows from the fact that network-type classifiers (such as neural network and Bayesian network classifiers) are extremely difficult to train when they have a large number of possible output states.
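The combination of several classifier outputs into a metaclass can be illustrated with a small sketch. The text suggests a trained network-type classifier for this role; the hand-written rules below are only a stand-in used for readability, and the class labels, function names, and output actions are all assumptions.

```python
def metaclass(audio_class, video_activity, body_position):
    """Combine per-classifier labels into a higher-level class (rule-based stand-in)."""
    if (audio_class == "snoring"
            and video_activity == "still"
            and body_position == "prone"):
        return "sleeping"
    if audio_class == "laughter":
        return "laughing"
    return "unknown"


def choose_output(meta):
    """Map a metaclass to an output mode; default to ordinary speech."""
    return {"sleeping": "white_noise", "laughing": "joke"}.get(meta, "speech")
```

Keeping this step separate from template selection means the combiner only has to discriminate among a handful of metaclasses, while the much larger set of response templates is handled elsewhere.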

[159]

The structure of Figure 6 is quite different from that of a prior art conversation simulator. A prior art conversation simulator selects the best template based solely on the speech pattern, whereas the present system takes other information about the user's circumstances into account and steers the conversation in response to that information. The additional information from the audio classifier 210 and the video image classifier 240 is used to generate a better reply. The effect of this additional information, and its use in providing the best possible output, is that the system behaves more like a real person when acting as a conversation simulator.

[160]

Referring to Figure 7, in a similar environment, additional information [...] obtained via the video input 255 and the audio input 245 is used, in addition to the text of the speech, to generate the output of the conversation simulator. In this example, the audio classifier 210 outputs three indicators: one indicating that someone other than the user is speaking; one indicating that a second voice is present; and one indicating a subsequent brief period of silence. The speech-to-text converter 215 generates the text "Hi Bob! How is it going?" The input parser 410 classifies the text as a greeting directed to Bob and also gives the canonical form of the question. That is, "How is it going?" is represented in the output of the input parser 410 by the standard form "How are you?" The same data are applied to the mood/personality classifier 290. The video image classifier 240 indicates that someone is walking slowly, that there are two people in its field of view, that there are no new objects, and that the body positions of the two people show that they are standing. The mood/personality classifier 290 stores personality data about the user and has detected the user's mood from previous conversation. These are indicated in an output signal applied to the event/class processor 207.

[161]

The event/class processor 207 combines the audio classification indicating two voices with two further facts: first, that the user's name was used in the text; and second, that there are two people in the room, one of whom has just come in. The event/class processor 207 may recognize this combination of information as a situation that should not be interrupted. It generates an event signal that is applied to the response generator 415, which immediately stops outputting speech and inserts a pause. The response generator 415 then uses the other data from the classifiers to identify a template suggesting that it be introduced. The personality information from the mood/personality classifier 290 is used in making this choice, the programmer having assumed that a more interventionist simulator suits a quiet, reserved user in a social situation. The response generator 415 inserts a pause and then, slightly later, generates the sentence "Excuse me, I don't know Bob." The word "Bob" comes from the input parser 410 via the event/class processor 207.

[162]

Note that the video image classifier 240 may contain separate activity, body position, and other classifications for each person identified in the scene. These may be output separately, with one vector for each person identified by the video image classifier 240. Note also that the audio classifier 210 may have a directional capability, so that it can also tell which of several people a sound came from. For example, in the scene of Figure 7, the audio classifier 210 identifies a particular speaker, party A, as speaking and indicates this in its output. To associate the text parsed by the input parser 410 with the person who is speaking, all of the outputs may be time stamped. For example, the audio classifier 210 may time stamp each audio signal and indicate the direction from which the sound came. The audio classifier 210 may also be given the ability to identify the voice print of the sound. Using this information, the event/class processor 207 can time stamp the text and allow the response generator 415 to associate the coordinates of the sound direction and the voice print with the text. The text can then be associated with the person who was speaking.
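The use of time stamps to associate parsed text with a speaker can be sketched as a nearest-in-time match. The record types, field names, and the one-second tolerance below are assumptions made for illustration; the text states only that the outputs are time stamped and carry direction and voice-print information.

```python
from dataclasses import dataclass


@dataclass
class AudioEvent:
    """Hypothetical audio classifier output."""
    time: float       # seconds since start
    direction: float  # bearing of the sound source, in degrees
    speaker: str      # label derived from direction and/or voice print


@dataclass
class TextEvent:
    """Hypothetical time-stamped text from the speech-to-text path."""
    time: float
    text: str


def attribute_text(text_event, audio_events, tolerance=1.0):
    """Return the speaker of the audio event closest in time, or None."""
    if not audio_events:
        return None
    nearest = min(audio_events, key=lambda a: abs(a.time - text_event.time))
    if abs(nearest.time - text_event.time) > tolerance:
        return None
    return nearest.speaker
```

Returning None when no audio event falls within the tolerance avoids attributing text to a speaker on weak evidence.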

[163]

Referring to Figure 8, a seated child 315 in the field of view of the video input causes a video image classifier 240 vector to be generated. This vector indicates that the user is seated, that the user is a child, that the number of people is one, and that the user's activity level is idle. The audio input 245 is again applied to the speech-to-text converter 215 and the audio classifier 210. The speech-to-text converter 215 applies the text recognized from the child's laughter to the input parser 410. The input parser 410 generates a vector indicating that the user is laughing. The audio classifier 210 identifies the same class of sound. The video image classifier 240 classifies the user as idle and seated, and also indicates that there are no new objects and that only one person is present. The event/class processor 207 indicates to the response generator 415 that the user is laughing. The response generator 415 is programmed with the simple rule that laughter should be followed by a funny statement or a joke. It randomly selects a joke and applies it to the text-to-speech converter 275. It also generates a synchronized lighting effect through the television cabinet effects driver 265 and, through the animation driver 260, an animation synchronized with the joke followed by a laughing animation.

[164]

Referring to Figure 9, data obtained from a conversation with the user 325 is parsed and used to generate new conversation data. The speech converted from the user's utterance contains the words "Pokeman" and "like" in close proximity, from which the input parser 410 identifies an expressed interest in the topic "Pokeman". It generates a request for further data on the topic "Pokeman" and applies a signal indicating the request to the response data generator 445. The response data generator 445 creates an instance of the agent 205, which obtains data from the local network/Internet 200, which is in turn linked to resource data 450 such as World Wide Web sites. The further information is parsed and stored in the response data store 440 in the form of one or more response templates. The video image classifier 240 classifies the user as a child and indicates that the user is excited or agitated. The event/class processor 207 indicates the content of the user's utterance and a metaclass state of eagerness and excitement describing the user's interest in the topic "Pokeman". As soon as the response generator 415 finds a "Pokeman" template in the response data store 440, it generates a response containing the information obtained by the agent 205 and parsed and formed into responses by the response data generator 445. The response may be accompanied by a synchronized animation generated by the animation driver 260.

[165]

Referring to Figure 10, the conversation simulator detects that the user is in a sad mood and responds sympathetically. It also uses the user preference data in an attribute database to make a suggestion. The video input 255 includes the face of the user 345. The user's replies contain other words that indicate sadness. The video image classifier 240 classifies the user's facial expression as sad. The audio classifier 210 classifies the user's voice as weak and low. The mood/personality classifier 290 combines these classifications to generate a metaclass of the user's mood, namely depression, and its output state vector expresses this. The response generator 415 receives the substance of what the user said and, in response to it together with the mood classification, identifies a template corresponding to a sympathetic response and outputs it. The response data generator 445 has previously received attribute data, indicating such things as favorite television programs and hobbies, from the database 430, which stores attribute data about the user. In response, the response data generator 445 has obtained programming information from an electronic program guide in the resource data 450 and generated a response template, which is stored in the response data store 440. The response generator can therefore follow its sympathetic statement with a cheering sentence reminding the user that his or her favorite program is about to be broadcast.

[166]

Referring to Figure 11, the word "Pokeman" is extracted as discussed with reference to Figure 9. However, instead of simply obtaining information from an external data resource in order to create new response templates, as in Figure 9, the data is used to augment the database 430. In the present example, the database 430 is an attribute database that is used to filter and sort the contents of an EPG according to the user's tastes. The parser/requester 432 receives the parsed reply from the input parser 410 and optionally generates an instance of the agent 205 to obtain further information. The agent 205 returns new data about Pokeman; this data is parsed, and some of it may be added to the database. For example, the names of Pokeman characters may be obtained by the agent 205, and this data may be added to the database 430 together with the information that the user likes Pokeman.

[167]

Note that the attribute data (stored in the database 430 in this embodiment) may be stored locally or on a remote server. The attribute data may be used not only as a resource for generating new templates but also as an information resource for forming personality classifications or otherwise personalizing responses.

[168]

The response data store 440 may be a database full of templates. These templates are not necessarily permanent. Many of them may be added in the course of acquiring "live" data from the Internet and creating new templates. The information that the response data generator 445 extracts from the Internet, a local area network, or another data resource and incorporates into new templates may include text, links, or other types of data, such as graphics that can be displayed on the monitor 175.

[169]

One implementation of the conversation simulator allows other devices to be driven by it, so that it serves as a speech-activated interface for the system. For example, the conversation simulator could say "Would you like to download a Pokeman game?", the speech-to-text converter could convert the statement "Yes" into a command, and this command could be used to invoke a link obtained by the response data generator 445, thereby accessing that link.

[170]

As the above examples show, the template set used by the conversation simulator does not have to be a static set of information that only retrieves information from the user through conversation. Rather, it can actually build templates using information from external resources. The external resources may be accessed on the system's own initiative, or, as in the above example, access may be triggered by trigger terms or trigger templates identified by the input parser 410. For example, when the word "Pokeman" is used in a sentence together with "I like", this triggers the event of going out and instantiating the agent 205 to search for further information, links, and the like relating to Pokeman.

[171]

The video image classifier 240 process may include the ability to control the camera (represented by the video input 255) that receives the video information. The video image classifier 240 may include a process that regularly attempts to distinguish objects in the room, which may be people or things, and to zoom in on various features of those individuals. For example, each time the video image classifier identifies a new individual, the image classifier may attempt to locate the face within the field of view and regularly zoom in on the face of each person identified in the field of view, in order to obtain facial expression information that can be used to identify the individual or the individual's mood.

[172]

Although the invention has been described above in the context of the preferred embodiments, it should be understood that various changes may be made to these embodiments, and various equivalents may be substituted, without departing from the scope and spirit of the invention, as will be apparent to those skilled in the relevant art.

[173]

An interaction simulator, such as a chatterbot, is connected with an external database, such as an electronic program guide. The information gathered during interaction, particularly conversational, is parsed and used to augment the database data. The interaction simulator may be guided by the data residing in the database so as to help fill in recognizable gaps by, for example, intermittently asking questions relating to the subject data requirement. The interaction simulator may be provided with specific response templates based on the needs of the database and a corresponding set of templates to extract the information required by the database. Another example database may be for recording and indexing by key word stories or other free-form verbal data uttered by the user. The interaction simulator may be programmed to help the user develop the story using templates designed for this purpose.

1. A conversation simulator for simulating conversational interaction with a user, comprising:

a controller (100) programmed to receive natural language user input;

said controller (100) being programmed to generate replies to said natural language user input from data in a 1st database storing conversation data;

said controller (100) being programmed to identify a type of data stored in a 2nd database;

said controller (100) being further programmed to parse, from said natural language user input, data of the type stored in the 2nd database, and to store the data resulting from said parsing in the 2nd database; and

said controller (100) being further programmed to obtain from a network additional information associated with said parsed data and to incorporate said additional information into at least one of said 1st database and said 2nd database.

2. The simulator according to claim 1, wherein the data includes user preference data and the database is an attribute database.

3. The simulator according to claim 2, wherein said attribute database stores data indicating the user's preferences for video programming.

4. The simulator according to claim 1, wherein said database is stored on a portable medium, for use with apparatus that requires the user information.

5. The simulator according to claim 1, wherein said controller (100) is programmed to ask questions, in response to the content of an external database, that elicit data of the type stored in the external database.

6. A method for loading a database, comprising the following steps:

simulating a conversation with a user using a conversation simulator while maintaining a 1st database of conversation data, said simulating comprising the step of generating sentences from the data in the 1st database;

receiving a reply from the user;

determining when data in the reply includes data of a type to be incorporated into a 2nd database;

in response to said determining step, incorporating the data into the 2nd database; and

obtaining from a network additional information associated with said data in the reply and incorporating said additional information into at least one of said 1st database and said 2nd database.

7. The method according to claim 6, wherein said 2nd database is an attribute database of user preference data.

8. The method according to claim 6, wherein said 2nd database is a database of replies.

9. The method according to claim 6, wherein said 2nd database is a list of favorite Internet web sites.

10. The method according to claim 6, wherein said network comprises a local area network and/or the Internet.

11. A method for loading a database, comprising the following steps:

simulating a conversation with a user using a conversation simulator, a 1st database storing data, said simulating comprising the step of generating sentences from the data of the 1st database;

determining an information request in response to the content of said database;

generating, in the conversation simulator, a sentence responsive to said information request, and receiving a reply from said user in response thereto;

determining when data in said reply includes data responsive to said information request;

in response to said determining step, incorporating said data into said database; and

obtaining from a network additional information associated with said data in the reply and incorporating said additional information into said database.

CPC classification

G06F16/337, G06F16/90332, G06F17/30702, G06F17/30976, G10L15/1815, G10L15/1822, G10L15/22, H04N21/4394, H04N21/44008, H04N21/4665

IPC classification

G06F17/20, G06F17/30, G06F19/00, G10L15/18, G10L15/22, H04N21/439, H04N21/44, H04N21/466
