A chatbot is a piece of software that simulates a conversation through voice or text in applications, on websites, in mobile apps, or over the telephone. For businesses, chatbots open up a way to scale customer service and let customers leave feedback, schedule appointments, or order products more quickly.
The Web Speech API lets us incorporate voice functionality into web applications easily. First, we set up a basic index.html with only a few elements.
- We create one hyperlink tag with class="talk" (line 7). Clicking it will later activate the voice input. A button could be used instead.
- We create a paragraph tag with class="voice2text" (line 8). Whatever we speak into the microphone will later be printed here as text.
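A minimal index.html matching these two steps might look like the following sketch. The class names talk and voice2text come from the text; the surrounding boilerplate and the script.js filename are illustrative assumptions:

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Voice Chatbot</title>
  </head>
  <body>
    <!-- Clicking this link will activate the voice input -->
    <a href="#" class="talk">Talk</a>
    <!-- The recognized speech will be printed here -->
    <p class="voice2text">Your speech will appear here</p>
    <!-- Filename is an assumption; point this at your own script -->
    <script src="script.js"></script>
  </body>
</html>
```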
- First, we query our two HTML elements talk and voice2text and assign them to variables of the same name (lines 1+2).
- We assign the browser's speech recognition interface to SpeechRecognition regardless of vendor prefix (line 4), then instantiate it and assign the instance to the constant recorder (line 5).
- Adding an EventListener lets us start the SpeechRecognition when our <a> tag (with class talk) is clicked (lines 15–17).
- The onstart property represents an event handler that runs when the speech recognition service has started listening to incoming audio. In this case, we just print out that the voice is activated (lines 7–9).
- The onresult property represents an event handler that runs when the speech recognition service returns a result. To start, we print the result to the console (lines 11–13). Let's have a look at the result.
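The wiring described above can be sketched as follows. As an illustrative refactor, the constructor and the talk element are passed in as parameters (the article's code queries them at the top level) so the same logic can also be exercised outside a browser:

```javascript
// Sketch of the recognition setup; names mirror the article.
function setUpRecognition(SpeechRecognitionCtor, talkEl) {
  const recorder = new SpeechRecognitionCtor();

  // Runs when the service has started listening to incoming audio.
  recorder.onstart = () => console.log("Voice is activated, you can speak");

  // Runs when the service returns a result; for now, just log it.
  recorder.onresult = (event) => console.log(event);

  // Start recognition when the <a class="talk"> element is clicked.
  talkEl.addEventListener("click", () => recorder.start());
  return recorder;
}

// In the browser you would call it like this:
// const talk = document.querySelector(".talk");
// const SpeechRecognition =
//   window.SpeechRecognition || window.webkitSpeechRecognition;
// setUpRecognition(SpeechRecognition, talk);
```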
Whatever we speak into the microphone is logged in the event passed to the onresult handler. We can extract it from event.results[event.resultIndex][0].transcript and do something with it, e.g. print it on the webpage. We just need to adapt the onresult handler slightly.
- We access the actual transcript of our speech and store it in the transcript constant (line 3).
- Afterwards, we add the transcript as text to our paragraph to display it on the screen (line 4). With a little bit of CSS it might look like this:
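The adapted onresult logic from these two steps can be sketched as a small helper. The function name handleResult and the injected element parameter are illustrative; in the article the code lives directly inside the onresult handler:

```javascript
// Extract the transcript from a speech recognition result event and
// display it in the given element (e.g. the p.voice2text paragraph).
function handleResult(event, outputEl) {
  const transcript = event.results[event.resultIndex][0].transcript;
  outputEl.textContent = transcript;
  return transcript;
}

// In the browser:
// recorder.onresult = (event) =>
//   handleResult(event, document.querySelector(".voice2text"));
```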
Now that we can speak to our web app and it understands and displays what we said, the next step is a spoken response.
- We add a new line to our onresult handler that calls a new function, botVoice, passing our transcript as input (line 17).
- In the botVoice function, we create a new speech request with SpeechSynthesisUtterance() (line 2).
- Now we can access and adapt the different parameters of the speech request:
→ text is used to specify the text of the speech. We set a default text (line 3). Since we forward our transcript, we can let the bot react to it directly: if our transcript includes "how are you", we set the response text to "I am fine thanks. How are you?" (lines 4–6). Here you can bring in your own creativity, an API, or artificial-intelligence code.
→ volume is used to control the volume, where 1 is the maximum and 0 the minimum.
→ rate is used to control the speed of the voice, from 0.1 (slowest) to 10 (fastest).
→ pitch is used to control the pitch of the speech, from 0 (lowest) to 2 (highest).
- Finally, we let the bot talk with window.speechSynthesis.speak() (line 10).
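The botVoice step can be sketched like this. The response choice is factored into a pure helper so it can be tested; the default text "I don't know what you mean" is an assumption (the article sets a default without quoting it), and the speechSynthesis calls only work in a browser:

```javascript
// Pick the bot's reply for a given transcript. Swap in your own
// logic, an API call, or AI code here.
function getResponseText(transcript) {
  let text = "I don't know what you mean"; // assumed default text
  if (transcript.includes("how are you")) {
    text = "I am fine thanks. How are you?";
  }
  return text;
}

// Browser-only: build the speech request and speak it.
function botVoice(transcript) {
  const speech = new SpeechSynthesisUtterance();
  speech.text = getResponseText(transcript);
  speech.volume = 1; // 0 (minimum) to 1 (maximum)
  speech.rate = 1;   // 0.1 (slowest) to 10 (fastest)
  speech.pitch = 1;  // 0 (lowest) to 2 (highest)
  window.speechSynthesis.speak(speech);
}
```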