Interactive voice response (IVR)

Interactive voice response (opens in a new tab) is a Voice feature available via API or the web interface. Over the web interface, it's available through our customer engagement platform, Moments.

Use IVR to initiate outbound calls to one or more destination numbers (landline or mobile), or to receive inbound calls to your Voice number from your end users (Inbound IVR). When the end user answers the call, the IVR scenario you previously created is executed.

IVR automates the voice call processes for your business. Use this two-way communication option to set up different actions for your customers during the call.

There is no limit to how many different IVR (and non-IVR) actions you can set.

IVR is available for the following types of Voice calls:

  • Inbound calls - Create an IVR scenario and configure it for your selected Voice Number. Whenever customers initiate an inbound call towards this Voice Number, this call will be answered using the inbound IVR scenario.
  • Outbound calls - Create more complex scenarios when initiating calls towards your customers.

Response codes

Response codes are also known as Dual-Tone Multi-Frequency (DTMF (opens in a new tab)) codes. You can use them both with IVR in Flow (Moments) and with Broadcast when sending broadcast voice messages to interact with your customers and set up the next steps during a call.

Your end users enter predefined response codes on their phone, and our platform performs the actions as you defined them when creating the broadcast.

IVR in Moments

For more information on IVR, refer to the Moments product documentation, IVR in Flow. To check which IVR elements you can use in your communication, see the IVR Elements section.

To purchase Moments, please contact your dedicated Account Manager or contact our Sales team (opens in a new tab).

Dial to Conversations

Use the Dial to Conversations IVR element to redirect the current call toward Conversations.

IVR over API

Use IVR over API if you want to integrate your platform with the Infobip platform and initiate calls towards your end users triggered by a specific event, or if you have your own web interface.

You can use these features with IVR over API:

  • Text-to-Speech
  • Answering Machine Detection
  • Speech Recognition

    1. To use IVR over API, create an IVR scenario (opens in a new tab). For more details, refer to our Voice API documentation (opens in a new tab). Combine different elements to create a multilevel scenario that suits your business needs (a minimal sketch is shown at the end of this section).

    2. Start an outbound IVR call using the Launch API (opens in a new tab), or configure an IVR scenario for your Voice number to receive inbound IVR calls. To use inbound IVR over API, configure IVR on your Voice number; refer to Create Voice Setup on a Number (opens in a new tab). Use the Forward To IVR action and the scenario key you received in the response when creating the IVR scenario.

      Available IVR elements over API:

      • Call API – Connect to a URL on your platform and send and receive values as defined in the element parameters.
      • Capture – The customer gives a voice response, which is saved into a variable. It can be used afterward to branch the IVR scenario. For more information, refer to Speech Recognition.
      • Collect – The customer taps the keypad, producing a DTMF code (opens in a new tab). The entered value is stored in the variable you specified. It can be used in other elements such as If-Then-Else or While-Do.
      • Dial – Redirect the current call towards another phone number.
      • Dial to Conversations - Redirect the current call to Conversations.
      • Dial to Many – Redirect the current call towards multiple phone numbers in parallel or sequential order. Once the call is answered by one of the multiple phone numbers, dialing towards other numbers stops.
      • Dial to WebRTC - Redirect the current call to Web and In-app Calls.
      • For-Each – Defines the variable name that is used within a loop for every single value from a list of values.
      • Go To – Define where to continue the IVR scenario.
      • Hang-up – This is a pseudo-action. It marks the end of the IVR flow and ends the call.
      • If-Then-Else – Checks if the expression is true or false and then branches the IVR call.
      • Machine Detection – Detects whether there's an answering machine at the beginning of the outbound IVR call.
      • Play – Reproduce a pre-recorded audio file (available as URL). Paste the URL as a parameter.
      • Play from recording – Reproduce the audio file recorded using the Record element. You could use the file recorded during the current call or files recorded during different calls.
      • Repeat-While - Executes a block of code repeatedly, as long as the condition set up for that block is true. Make sure not to provide a condition that causes an infinite loop.
      • Say – Convert text to audio and play it back to the customer.
      • Send SMS – Send an SMS with a predefined text.
      • Set Variable – Sets the variable value.
      • Switch-Case – Checks variable value and branches the IVR call.
      • While-Do – Checks if the expression is true or false and branches the IVR call.
    3. Once your IVR scenario is created, you can update it using the PUT method (opens in a new tab) without the need to create a new scenario. You can also DELETE (opens in a new tab) a previously configured IVR scenario. After the call has finished, you can check the call details either through Analyze, under REPORTS (opens in a new tab) in the web interface, or via delivery reports or logs (opens in a new tab) over API.

You can GET (opens in a new tab) all delivery reports or configure (opens in a new tab) the Infobip platform to send these delivery reports to your URL.

Refer to Reports for detailed information.
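
To illustrate how elements can be combined, here is a minimal sketch of an IVR scenario that chains the Say, Capture, and If-Then-Else elements, using the action syntax shown in the Speech recognition section below. The name and script fields are illustrative assumptions; check the Voice API documentation (opens in a new tab) for the exact request schema when creating a scenario.

json
 
    {
      "name": "More info IVR",
      "script": [
        {
          "say": "Do you want more info? Say yes or no."
        },
        {
          "capture": "myVar",
          "timeout": 5,
          "speechOptions": {
              "language": "en-US",
              "keyPhrases": ["yes", "no"]
          }
        },
        {
          "if": "${myVar == 'yes'}",
          "then": [
            {
              "say": "Ok. I will send you more details"
            }
          ],
          "else": [
            {
              "say": "Goodbye"
            }
          ]
        }
      ]
    }
 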

Speech recognition

Infobip Speech Recognition captures the end user's speech when they communicate using Interactive Voice Response (opens in a new tab) (IVR). The Infobip platform can save the spoken input, match it with a predefined word or phrase, and then, depending on your configuration, execute further IVR actions. This gives your end users more flexibility because they are not limited to tapping digits on a keypad.

AVAILABILITY

Speech Recognition is available via API only. It's not available over the Infobip web interface.

When creating an IVR scenario to capture the end user's spoken input (using the Capture IVR action), make sure to follow these steps:

  1. One of the most important steps when setting up Speech Recognition using the Capture action is to choose the language you expect your end user to be speaking.
  2. After that, you can set up the key phrases that need to be matched, as well as additional available parameters such as timeout, silence timeout (maxSilence), and so on.
  3. You can also choose not to match a key phrase but rather capture everything your end user says and send that input as text to your platform via API using an additional IVR action.

To learn more about how to preconfigure an IVR scenario and use Speech Recognition, check out our API documentation (opens in a new tab).
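
For example, based on the Capture parameters described above, a minimal Capture action could look like the following sketch (the values are illustrative; see the Best practices tips below for how to tune timeout and maxSilence):

json
 
    {
      "capture": "myVar",
      "timeout": 10,
      "speechOptions": {
        "language": "en-US",
        "maxSilence": 3,
        "keyPhrases": ["info", "update"]
      }
    }
 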

Best practices

Here are some tips and tricks on how to better use Speech Recognition.

Tip 1 - Know your use case

To use Speech Recognition efficiently, you need to take a few factors into consideration:

  1. Always make an effort to think about your specific use case and tailor it to your end-users’ needs.
  2. Use the most specific variant of the language that is supported in IVR.
  3. Assume there might be differences in phone and call quality. Have a plan for how to reach end users in certain hard-to-reach areas.
  4. Expect background noise. Anticipate that end users might be on their phones while walking in the street, in a crowded area, during rush hour, and so on.
  5. Prepare short and concise questions. What kind of replies do you expect to receive? Don't use complex terms. Use simple language so your end users can understand you.
  6. Be mindful of the speech recognition duration period. Is it reasonable to expect a reply in 5, 10, or 20 seconds?

Tip 2 - Tweak timeout and maxSilence carefully

One of the most important things to consider is the time end users need to reply to your question.

Let's look at the following example:

json
 
    {
      "capture": "myVar",
      "timeout": 10,
      "speechOptions": {
        "language": "en-US",
        "keyPhrases": ["info", "update"]
      }
    }
 

In this example, the timeout option is set to 10 seconds. You expect to capture one of the phrases ("info" or "update") in those 10 seconds. Now you test and tweak. Is 10 seconds enough, or too long? This depends on multiple factors such as word length, the end user's speech pattern, and reaction time.

Obviously, you don't want end users to have to wait too long to move on to the next action, nor to get upset for not having enough time to say the keyword.

What if you use maxSilence (as shown below)?

json
 
    {
      "capture": "myVar",
      "timeout": 10,
      "speechOptions": {
        "language": "en-US",
        "maxSilence": 3,
        "keyPhrases": ["info", "update"]
      }
    }
 

Here, the end user has 10 seconds with a 3-second maxSilence option. This means that if IVR detects 3 seconds of silence, it stops capturing.

However, if your end user says the keyword within a second, they will have to wait 2 seconds before moving on to the next action in IVR.

What if the end user stops to think and makes a pause longer than 3 seconds? In this case, IVR will not capture the user's input.

Obviously, it is hard to offer general advice that works for everyone but our suggestions are:

  • When possible, ask simple questions and suggest short and simple answers to your users
  • Try different variations of timeout and maxSilence and test speech capturing behavior before you offer it to your end users

Tip 3 - Use simple key phrases

The keyPhrases option matches captured speech against the provided text and branches the IVR scenario based on that match.

Example:

json
 
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "maxSilence": 3,
          "keyPhrases": ["yes, I want more info", "no, I do not want more info", "I am not sure"]
      }
    },
    {
      "if": "${myVar == 'yes, I want more info'}",
      "then": [
        {
          "say": "Ok. I will send you more details"
        }
      ],
      "else": []
    }
 

Here's where you have to be careful. You can't expect your end users to always say the exact keyphrase.

Let's say your end user says 'Yes, give more info' instead of the keyphrase you defined - 'Yes, I want more info'. Since the keyphrase is not matched, the IF expression is not triggered.

The longer the keyphrase, the lower the odds the end user matches that specific keyphrase successfully.

Example:

json
 
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "maxSilence": 3,
          "keyPhrases": ["more info", "exit", "maybe"]
      }
    },
    {
      "if": "${myVar == 'more info'}",
      "then": [
        {
          "say": "Ok. I will send you more details"
        }
      ],
      "else": []
    }
 

We've made a few tweaks to the example above. Now, the probability of matching the keyphrase is higher.

The keyphrase does not limit the end user's speech. End users can say a lot more than just a keyphrase. However, the keyphrase is the key to correctly matching and branching the IVR scenario. For example, if the end user says: "Yes, I want more info" or "Yes, give me more info", the keyphrase will be matched in both cases because myVar is 'more info'.

If you are interested in the full captured speech, you can try using the myVar_Full variable (note that 'myVar' is just an example name; you can name these variables any way you like).
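
For instance, the sketch below branches on the matched keyphrase and then references the full utterance through myVar_Full. Using ${myVar_Full} inside the say action is an assumption made for illustration only; verify in the API documentation (opens in a new tab) how captured variables can be referenced in scenario actions.

json
 
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "maxSilence": 3,
          "keyPhrases": ["more info"]
      }
    },
    {
      "if": "${myVar == 'more info'}",
      "then": [
        {
          "say": "You said: ${myVar_Full}"
        }
      ],
      "else": []
    }
 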

Finally, the keyphrases in both of the earlier examples can be reduced to even shorter phrases:

json
 
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "maxSilence": 3,
          "keyPhrases": ["yes", "no", "maybe"]
      }
    },
    {
      "if": "${myVar == 'yes'}",
      "then": [
        {
          "say": "Ok. I will send you more details"
        }
      ],
      "else": []
    }
 

Keyphrases are always matched by comparing the fully captured speech text against the provided keyphrases.

Let's see the final example in this tip:

json
 
    {
      "say": "Do you want discount?"
    },
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "maxSilence": 3,
          "keyPhrases": ["discount"]
      }
    },
    {
      "if": "${myVar == 'discount'}",
      "then": [
        {
          "say": "Ok. I will send you more details"
        }
      ],
      "else": []
    }
 

Be mindful of how you select specific words for your keyphrases. Let's say you set your keyphrase to be "discount".

One end user says: "No, I don't want a discount" and another end user says: "Yes, I want a discount". See the problem?

In both cases, the keyphrase will match and both end users will receive more details about the discount. This is why you should always use short, clear, and simple keyphrases and think about all possible use cases to prevent confusion.
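
One possible way to reduce this ambiguity, sketched below using the same pattern as the earlier examples, is to ask a yes/no question and branch on short, unambiguous keyphrases (the values shown are illustrative):

json
 
    {
      "say": "Do you want a discount? Say yes or no."
    },
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "maxSilence": 3,
          "keyPhrases": ["yes", "no"]
      }
    },
    {
      "if": "${myVar == 'yes'}",
      "then": [
        {
          "say": "Ok. I will send you more details"
        }
      ],
      "else": [
        {
          "say": "Ok. No discount will be applied"
        }
      ]
    }
 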

Tip 4 - Capturing starts after the sound signal (beep)

Let your end users know they should start speaking after they hear a sound signal (beep).

Example:

json
 
    {
      "say": "If you want more info, say: info. If you want to reach our agent, say: agent"
    },
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "maxSilence": 3,
          "keyPhrases": ["info", "agent"]
      }
    },
    {
      "if": "${myVar == 'info'}",
      "then": [
        {
          "say": "Ok. I will send you more details"
        }
      ],
      "else": []
    }
 

If the end user says "info" before the first audio message has finished playing, nothing will be captured. If this happens in your use case, you can suggest that the end user start speaking after the sound signal:

json
 
    {
      "say": "If you want more info, after the beep signal say: info. If you want to reach our agent, after the beep signal say: agent"
    },
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "maxSilence": 3,
          "keyPhrases": ["info", "agent"]
      }
    },
    {
      "if": "${myVar == 'info'}",
      "then": [
        {
          "say": "Ok. I will send you more details"
        }
      ],
      "else": []
    }
 

Tip 5 - Provide DTMF failover

Sometimes it's difficult to capture voice input, for example, when end users speak with a heavy accent, have speech difficulties, poor signal strength, or echo. In these cases, use DTMF (opens in a new tab) as a failover option.

This means that end users can answer with speech input or by pressing predefined digits on their phone keypad.

json
 
    {
      "say": "Say discount or press 1 to get discount. Say exit or press 0 to exit."
    },
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "model": "DEFAULT",
          "keyPhrases": ["discount", "exit"]
      },
      "dtmfOptions": {
          "maxInputLength": 1
       }
    },
    {
      "if": "${myVar == 'discount' || myVar == '1'}",
      "then": [
        {
          "say": "You will get discount"
        }
      ],
      "else": [
        {
          "say": "Goodbye"
        }
      ]
    }
 

The example above shows that the end user can tap 1 or 0 instead of speaking.

If there is DTMF input, it has priority over Speech Recognition. For example, if the end user says something and also taps 1, then myVar will store the DTMF value instead of the spoken input.

However, if the end user says "discount" as a keyphrase and doesn't tap anything on their keypad, myVar and myVar_Full are captured as usual.

Tip 6 - Use maxInputLength for capturing DTMF

When using DTMF failover by pressing digits on the phone keypad, IVR has to decide when to stop capturing.

Let's revisit the previous example, slightly modified to demonstrate the point:

json
 
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "model": "DEFAULT",
          "keyPhrases": ["discount", "exit"]
      },
      "dtmfOptions": {}
    }
 

Without maxInputLength, IVR doesn't know how many digits will be pressed, so it waits for the timeout to expire.

In this example, IVR will wait 5 seconds to capture all digits pressed. By using maxInputLength, IVR can stop capturing and proceed with the scenario as soon as it captures the specified number of digits.

If maxInputLength is 1, IVR can continue the scenario as soon as the first digit is pressed. The timeout is still respected in this case, meaning that the end user has 5 seconds to say something or press a digit.

Example:

json
 
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "model": "DEFAULT",
          "keyPhrases": ["discount", "exit"]
      },
      "dtmfOptions": {
          "maxInputLength": 1
       }
    }
 

Speech recognition supported languages

During the Early Access phase, we will support the languages listed in the table below (and will expand the list in the near future).

Check out the table to see which abbreviation you need to use when selecting a specific language in the API request.

Language | Abbreviation
Afrikaans (South Africa) | af-ZA
Albanian (Albania) | sq-AL
Amharic (Ethiopia) | am-ET
Arabic (Algeria) | ar-DZ
Arabic (Bahrain) | ar-BH
Arabic (Egypt) | ar-EG
Arabic (Iraq) | ar-IQ
Arabic (Israel) | ar-IL
Arabic (Jordan) | ar-JO
Arabic (Kuwait) | ar-KW
Arabic (Lebanon) | ar-LB
Arabic (Libya) | ar-LY
Arabic (Morocco) | ar-MA
Arabic (Oman) | ar-OM
Arabic (Qatar) | ar-QA
Arabic (Saudi Arabia) | ar-SA
Arabic (State of Palestine) | ar-PS
Arabic (Syria) | ar-SY
Arabic (Tunisia) | ar-TN
Arabic (United Arab Emirates) | ar-AE
Arabic (Yemen) | ar-YE
Armenian (Armenia) | hy-AM
Azerbaijani (Azerbaijan) | az-AZ
Basque (Spain) | eu-ES
Bengali (Bangladesh) | bn-BD
Bengali (India) | bn-IN
Bosnian (Bosnia and Herzegovina) | bs-BA
Bulgarian (Bulgaria) | bg-BG
Burmese (Myanmar) | my-MM
Catalan (Spain) | ca-ES
Chinese (Cantonese, Traditional) | zh-HK
Chinese (Mandarin, Simplified) | zh-CN
Chinese (Mandarin, Taiwan) | zh-TW
Croatian (Croatia) | hr-HR
Czech (Czech Republic) | cs-CZ
Danish (Denmark) | da-DK
Dutch (Belgium) | nl-BE
Dutch (Netherlands) | nl-NL
English (Australia) | en-AU
English (Canada) | en-CA
English (Ghana) | en-GH
English (Great Britain) | en-GB
English (Hong Kong) | en-HK
English (India) | en-IN
English (Ireland) | en-IE
English (Kenya) | en-KE
English (New Zealand) | en-NZ
English (Nigeria) | en-NG
English (Pakistan) | en-PK
English (Philippines) | en-PH
English (Singapore) | en-SG
English (South Africa) | en-ZA
English (Tanzania) | en-TZ
English (US) | en-US
Estonian (Estonia) | et-EE
Filipino (Philippines) | fil-PH
Finnish (Finland) | fi-FI
French (Belgium) | fr-BE
French (Canada) | fr-CA
French (France) | fr-FR
French (Switzerland) | fr-CH
Galician (Spain) | gl-ES
Georgian (Georgia) | ka-GE
German (Austria) | de-AT
German (Germany) | de-DE
German (Switzerland) | de-CH
Greek (Greece) | el-GR
Gujarati (India) | gu-IN
Hebrew (Israel) | he-IL
Hindi (India) | hi-IN
Hungarian (Hungary) | hu-HU
Icelandic (Iceland) | is-IS
Indonesian (Indonesia) | id-ID
Irish (Ireland) | ga-IE
Italian (Italy) | it-IT
Italian (Switzerland) | it-CH
Japanese (Japan) | ja-JP
Javanese (Indonesia) | jv-ID
Kannada (India) | kn-IN
Kazakh (Kazakhstan) | kk-KZ
Khmer (Cambodia) | km-KH
Korean (South Korea) | ko-KR
Lao (Laos) | lo-LA
Latvian (Latvia) | lv-LV
Lithuanian (Lithuania) | lt-LT
Macedonian (North Macedonia) | mk-MK
Malay (Malaysia) | ms-MY
Malayalam (India) | ml-IN
Maltese (Malta) | mt-MT
Marathi (India) | mr-IN
Mongolian (Mongolia) | mn-MN
Nepali (Nepal) | ne-NP
Norwegian Bokmål (Norway) | no-NO
Persian (Iran) | fa-IR
Polish (Poland) | pl-PL
Portuguese (Brazil) | pt-BR
Portuguese (Portugal) | pt-PT
Punjabi (Gurmukhi India) | pa-Guru-IN
Romanian (Romania) | ro-RO
Russian (Russia) | ru-RU
Serbian (Serbia) | sr-RS
Sinhala (Sri Lanka) | si-LK
Slovak (Slovakia) | sk-SK
Slovenian (Slovenia) | sl-SI
Spanish (Argentina) | es-AR
Spanish (Bolivia) | es-BO
Spanish (Chile) | es-CL
Spanish (Colombia) | es-CO
Spanish (Costa Rica) | es-CR
Spanish (Cuba) | es-CU
Spanish (Dominican Republic) | es-DO
Spanish (Ecuador) | es-EC
Spanish (El Salvador) | es-SV
Spanish (Equatorial Guinea) | es-GQ
Spanish (Guatemala) | es-GT
Spanish (Honduras) | es-HN
Spanish (Mexico) | es-MX
Spanish (Nicaragua) | es-NI
Spanish (Panama) | es-PA
Spanish (Paraguay) | es-PY
Spanish (Peru) | es-PE
Spanish (Puerto Rico) | es-PR
Spanish (Spain) | es-ES
Spanish (USA) | es-US
Spanish (Uruguay) | es-UY
Spanish (Venezuela) | es-VE
Sundanese (Indonesia) | su-ID
Swahili (Kenya) | sw-KE
Swahili (Tanzania) | sw-TZ
Swedish (Sweden) | sv-SE
Tamil (India) | ta-IN
Tamil (Malaysia) | ta-MY
Tamil (Singapore) | ta-SG
Tamil (Sri Lanka) | ta-LK
Telugu (India) | te-IN
Thai (Thailand) | th-TH
Turkish (Turkey) | tr-TR
Ukrainian (Ukraine) | uk-UA
Urdu (India) | ur-IN
Urdu (Pakistan) | ur-PK
Uzbek (Uzbekistan) | uz-UZ
Vietnamese (Vietnam) | vi-VN
Zulu (South Africa) | zu-ZA
