Interactive voice response (IVR)

Interactive voice response (IVR) is a Voice feature available via API or web interface. Over the web interface, it's available through our customer engagement platform, Moments.

Use IVR to initiate outbound calls to one or more destination numbers (landline or mobile), or to receive inbound calls from your end users to your Voice number (Inbound IVR). When the end user answers the call, the IVR scenario you previously created is executed.

IVR automates the voice call processes for your business. Use this two-way communication option to set up different actions for your customers during the call.

There is no limit on the number of different IVR (and non-IVR) actions you can set.

IVR is available for the following types of Voice calls:

  • Inbound calls - Create an IVR scenario and configure it for your selected Voice Number. Whenever customers initiate an inbound call towards this Voice Number, this call will be answered using the inbound IVR scenario.
  • Outbound calls - Create more complex scenarios when initiating calls towards your customers.

Response codes

Response codes are also known as Dual-Tone Multi-Frequency (DTMF) codes. You can use them both with IVR in Flow (using Moments) and with Broadcast (when sending broadcast voice messages) to interact with your customers and set up the next steps during a call.

Your end user enters predefined response codes on their phone, and our platform performs the actions you defined for those codes when you created the broadcast.
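
As a rough illustration of the idea, response codes can be thought of as keys in a lookup table that selects the next step of the call. The sketch below is only a mental model: the action names and the `next_step` helper are hypothetical and are not part of the Infobip platform or API.

```python
# Hypothetical mapping of DTMF response codes to next steps.
# The action names are illustrative only, not platform identifiers.
RESPONSE_ACTIONS = {
    "1": "play_account_balance",
    "2": "forward_to_agent",
    "0": "hang_up",
}


def next_step(response_code: str, default: str = "repeat_menu") -> str:
    """Return the action configured for a response code, or a fallback."""
    return RESPONSE_ACTIONS.get(response_code, default)


print(next_step("2"))
print(next_step("9"))
```

An unrecognized code falls back to a default step here, which is one possible design choice for handling unexpected input.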

IVR in Moments

For more information about IVR and to check which IVR elements you can use in your communication, refer to the IVR in Flow documentation in Moments.

To purchase Moments, please contact your dedicated account manager or contact our Support.

Dial to Conversations

Use the Dial to Conversations IVR element to redirect the current call toward Conversations.

IVR over API

You can use IVR over API if you want to integrate your platform with the Infobip platform and initiate calls towards your end users triggered by a specific event or if you have your own web interface.

You can use these features with IVR over API:

  • Text-to-Speech

  • Answering Machine Detection

  • Speech Recognition

    1. To use IVR over API, create an IVR scenario. For more details, refer to our Voice API documentation. Combine different elements to create a multilevel scenario to suit your business needs.

    2. Start an outbound IVR call using the Launch API, or configure an IVR scenario for your Voice number. To use inbound IVR over API, configure IVR on your Voice number as described in Create Voice Setup on a Number. Use Forward To IVR and the scenario key you received in the response when you created the IVR scenario.

      Available IVR elements over API:

      • Call API – Connect with a URL of your platform, send and receive some value as defined in element parameters.
      • Capture – The customer sends a voice response, which is saved into a variable. It can be used afterward for branching the IVR scenario further. For more information, refer to Speech Recognition.
      • Collect – The customer taps the keypad to enter a DTMF code. The entered value is stored in a variable you specify. It can be used in other elements such as If-Then-Else or While-Do.
      • Dial – Redirect the current call towards another phone number.
      • Dial to Conversations – Redirect the current call to Conversations.
      • Dial to Many – Redirect the current call towards multiple phone numbers in parallel or sequential order. Once the call is answered by one of the multiple phone numbers, dialing towards other numbers stops.
      • For-Each – Defines the variable name that is used within a loop for every single value from a list of values.
      • Go To – Define where to continue the IVR scenario.
      • Hang-up – This is a pseudo-action. It marks the end of the IVR flow and ends the call.
      • If-Then-Else – Checks if the expression is true or false and then branches the IVR call.
      • Machine Detection – Detects whether there's an answering machine at the beginning of the outbound IVR call.
      • Play – Reproduce a pre-recorded audio file (available as URL). Paste the URL as a parameter.
      • Play from recording – Reproduce the audio file recorded using the Record element. You could use the file recorded during the current call or files recorded during different calls.
      • Repeat-While – Executes a block of code repeatedly, as long as the condition set up for that block is true. Make sure not to provide a condition that causes an infinite loop.
      • Say – Convert text to audio and play it back to the customer.
      • Send SMS – Send an SMS with a predefined text.
      • Set Variable – Sets the variable value.
      • Switch-Case – Checks variable value and branches the IVR call.
      • While-Do – Checks if the expression is true or false and branches the IVR call.
    3. Once your IVR scenario is created, you can update it using the PUT method, without the need to create a new scenario. You can also DELETE a previously configured IVR scenario. After the call has finished, you can check the call details either through Analyze, under REPORTS in the web interface, or by checking delivery reports or logs via API.

You can GET all delivery reports, or configure the Infobip platform to send these delivery reports to your URL.

Refer to Reports for detailed information.

Speech recognition

Infobip Speech Recognition captures the end user's speech when they communicate using Interactive Voice Response (IVR). The Infobip platform can save the spoken input and match it with a predefined word or phrase, and then, depending on your configuration, execute further IVR actions. This gives your end users more flexibility because they are not limited to tapping digits on a keypad.

AVAILABILITY

Speech Recognition is available via API only. It's not available over the Infobip web interface.

When creating an IVR scenario to capture the end user's spoken input (using the Capture IVR action), follow these steps:

  1. Choose the language you expect your end user to speak. This is one of the most important steps when setting up Speech Recognition using the Capture action.
  2. Set up the key phrases that need to be matched, as well as additional available parameters such as timeout, silence timeout, etc.
  3. Alternatively, you can choose not to match a key phrase, but rather capture everything your end user says and send that input as text to your platform via API using an additional IVR action.

To learn more about how to preconfigure an IVR scenario and use Speech Recognition, check out our API documentation.

Best practices

Here are some tips and tricks on how to better use Speech Recognition.

Tip 1 - Know your use case

To use Speech Recognition efficiently, you need to take a few factors into consideration:

  1. Always make an effort to think about your specific use case and tailor it to your end-users’ needs.
  2. Use the most specific variant of the language that is supported in IVR.
  3. Assume there might be differences in phone and call quality. Have a plan for how to reach end users in certain hard-to-reach areas.
  4. Expect background noise. Anticipate that end users might be on their phones while walking in the street, in a crowded area, during rush hour, etc.
  5. Prepare short and concise questions. What kind of replies do you expect to receive? Don't use complex terms. Use simple language so your end users can understand you.
  6. Be mindful of the speech recognition duration period. Is it reasonable to expect a reply in 5, 10, or 20 seconds?

Tip 2 - Tweak timeout and maxSilence carefully

One of the most important things to consider is the time end users need to reply to your question.

Let's consider the following use case for example:

json
 
    {
      "capture": "myVar",
      "timeout": 10,
      "speechOptions": {
        "language": "en-US",
        "keyPhrases": ["info", "update"]
      }
    }
 

In this example, the timeout option is set to 10 seconds. You expect to capture one of the phrases ("info" or "update") within those 10 seconds. Now you test and tweak. Is 10 seconds enough, or too long? This depends on multiple factors such as word length, the end user's speech pattern, and reaction time.

Obviously, you don't want end users to have to wait too long to move on to the next action, nor to get upset for not having enough time to say the keyword.

What if you use maxSilence (as shown below)?

json
 
    {
      "capture": "myVar",
      "timeout": 10,
      "speechOptions": {
        "language": "en-US",
        "maxSilence": 3,
        "keyPhrases": ["info", "update"]
      }
    }
 

Here, the end user has 10 seconds, with maxSilence set to 3 seconds. This means that if IVR detects 3 seconds of silence, it stops capturing.

However, if your end user said the keyword within a second, they will have to wait 2 seconds before moving on to the next action in IVR.

What if the end user stops to think and pauses for longer than 3 seconds? In this case, IVR will not capture the user's input.

Obviously, it is hard to offer general advice that works for everyone, but our suggestions are:

  • When possible, ask simple questions and suggest short and simple answers to your users.
  • Try different variations of timeout and maxSilence, and test speech capturing behavior before you offer it to your end users.
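
To try several timeout and maxSilence combinations systematically, you could generate the Capture actions with a small script. The following is a minimal sketch: the helper names `build_capture` and `capture_variants` are hypothetical, the action structure simply mirrors the JSON examples in this tip, and the rule that skips combinations where maxSilence is not shorter than timeout is our own assumption, not a documented platform constraint.

```python
import itertools
import json


def build_capture(variable, timeout, max_silence, key_phrases, language="en-US"):
    """Build a Capture action shaped like the JSON examples in this section."""
    return {
        "capture": variable,
        "timeout": timeout,
        "speechOptions": {
            "language": language,
            "maxSilence": max_silence,
            "keyPhrases": list(key_phrases),
        },
    }


def capture_variants(variable, key_phrases, timeouts, silences):
    """Yield one Capture action per (timeout, maxSilence) pair worth testing."""
    for timeout, max_silence in itertools.product(timeouts, silences):
        # Assumption: a silence window as long as the whole timeout adds nothing.
        if max_silence < timeout:
            yield build_capture(variable, timeout, max_silence, key_phrases)


variants = list(capture_variants("myVar", ["info", "update"], [5, 10], [2, 3, 5]))
for action in variants:
    print(json.dumps(action))
```

Each printed action can then be placed in a test scenario so that you can compare how the capture behaves with real callers.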

Tip 3 - Use simple key phrases

The keyPhrases option matches captured speech against the provided text and branches the IVR scenario based on that match.

Example:

json
 
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "maxSilence": 3,
          "keyPhrases": ["yes, I want more info", "no, I do not want more info", "I am not sure"]
      }
    },
    {
      "if": "${myVar == 'yes, I want more info'}",
      "then": [
        {
          "say": "Ok. I will send you more details"
        }
      ],
      "else": []
    }
 

Here's where you have to be careful. You can't expect your end users to always say the exact keyphrase.

Let's say your end user says 'Yes, give more info' instead of the keyphrase you defined - 'Yes, I want more info'. Since the keyphrase is not matched, the IF expression is not triggered.

The longer the keyphrase, the lower the odds the end user matches that specific keyphrase successfully.

Example:

json
 
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "maxSilence": 3,
          "keyPhrases": ["more info", "exit", "maybe"]
      }
    },
    {
      "if": "${myVar == 'more info'}",
      "then": [
        {
          "say": "Ok. I will send you more details"
        }
      ],
      "else": []
    }
 

We've made a few tweaks to the example above. Now, the probability of matching the keyphrase is higher.

The keyphrase does not limit the end user's speech. End users can say a lot more than just a keyphrase. However, the keyphrase is the key to correctly matching and branching the IVR scenario. For example, if the end user says "Yes, I want more info" or "Yes, give me more info", the keyphrase is matched in both cases because myVar is set to "more info".

If you are interested in the full captured speech, you can use the myVar_Full variable (note that myVar is just the name used in our examples; you can name the variable any way you like).

Finally, both examples can be reduced to even shorter phrases:

json
 
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "maxSilence": 3,
          "keyPhrases": ["yes", "no", "maybe"]
      }
    },
    {
      "if": "${myVar == 'yes'}",
      "then": [
        {
          "say": "Ok. I will send you more details"
        }
      ],
      "else": []
    }
 

Keyphrases are always matched by comparing the full captured speech text with the provided keyphrases.

Let's see the final example in this tip:

json
 
    {
      "say": "Do you want discount?"
    },
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "maxSilence": 3,
          "keyPhrases": ["discount"]
      }
    },
    {
      "if": "${myVar == 'discount'}",
      "then": [
        {
          "say": "Ok. I will send you more details"
        }
      ],
      "else": []
    }
 

Be mindful of how you select specific words for your keyphrases. Let's say you set your keyphrase to be "discount".

One end user says "No, I don't want a discount" and another says "Yes, I want a discount". See the problem?

In both cases, the keyphrase will match and both end users will receive more details about the discount. This is why you should always use short, clear, and simple keyphrases and think about all possible use cases to prevent confusion.
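
The matching behavior described in this tip can be approximated offline to check candidate keyphrases before you deploy them. The sketch below is a simplified model that assumes a keyphrase matches when its words appear, in order, inside the full captured speech. The function names `normalize` and `match_key_phrase` are hypothetical, and the platform's actual matcher may behave differently.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and drop punctuation so only words are compared."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def match_key_phrase(full_speech: str, key_phrases) -> str:
    """Return the first keyphrase contained in the full speech, or '' if none."""
    spoken = f" {normalize(full_speech)} "
    for phrase in key_phrases:
        if f" {normalize(phrase)} " in spoken:
            return phrase
    return ""


long_phrases = ["yes, I want more info", "no, I do not want more info", "I am not sure"]
short_phrases = ["more info", "exit", "maybe"]

print(repr(match_key_phrase("Yes, give more info", long_phrases)))
print(repr(match_key_phrase("Yes, give more info", short_phrases)))
print(repr(match_key_phrase("No, I don't want a discount", ["discount"])))
```

Under this model, the long keyphrases miss a slightly different wording, the short keyphrase catches it, and the single word "discount" matches even when the end user declines the offer.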

Tip 4 - Capturing starts after sound signal (beep)

Let your end users know they should start speaking after they hear a sound signal (beep).

Example:

json
 
    {
      "say": "If you want more info, say: info. If you want to reach our agent, say: agent"
    },
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "maxSilence": 3,
          "keyPhrases": ["info", "agent"]
      }
    },
    {
      "if": "${myVar == 'info'}",
      "then": [
        {
          "say": "Ok. I will send you more details"
        }
      ],
      "else": []
    }
 

If the end user says "info" before the first audio message has finished playing, nothing will be captured. If this happens in your use case, you can prompt the end user to start speaking after the sound signal:

json
 
    {
      "say": "If you want more info, after the beep signal say: info. If you want to reach our agent, after the beep signal say: agent"
    },
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "maxSilence": 3,
          "keyPhrases": ["info", "agent"]
      }
    },
    {
      "if": "${myVar == 'info'}",
      "then": [
        {
          "say": "Ok. I will send you more details"
        }
      ],
      "else": []
    }
 

Tip 5 - Provide DTMF failover

Sometimes it's difficult to capture voice input, for example, when end users speak with a heavy accent, have speech difficulties, or are affected by poor signal strength or echo. In these cases, use DTMF as a failover option.

This means that end users can answer either with speech input or by pressing predefined digits on their phone keypad.

json
 
    {
      "say": "Say discount or press 1 to get discount. Say exit or press 0 to exit."
    },
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "model": "DEFAULT",
          "keyPhrases": ["discount", "exit"]
      },
      "dtmfOptions": {
          "maxInputLength": 1
      }
    },
    {
      "if": "${myVar == 'discount' || myVar == '1'}",
      "then": [
        {
          "say": "You will get discount"
        }
      ],
      "else": [
        {
          "say": "Goodbye"
        }
      ]
    }
 

In this example, the end user can tap 1 or 0 instead of speaking.

If there's a DTMF failover, it has priority over Speech Recognition. For example, if the end user says something and also taps 1, the digit 1 is stored in myVar instead of the speech.

However, if the end user says "discount" as a keyphrase and doesn't tap anything on their keypad, myVar and myVar_Full are captured as usual.
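
The priority rule can be summarized in a few lines of code. This is a minimal sketch of the behavior described above, not the platform's implementation: `resolve_capture` is a hypothetical helper, and the keyphrase check is a simple case-insensitive containment test chosen for illustration.

```python
def resolve_capture(full_speech: str, pressed_digits: str, key_phrases) -> str:
    """Return the value stored in the capture variable under this simplified model."""
    if pressed_digits:
        # DTMF input has priority over anything that was spoken.
        return pressed_digits
    spoken = full_speech.lower()
    for phrase in key_phrases:
        if phrase.lower() in spoken:
            return phrase
    # Nothing pressed and nothing matched: the variable stays empty.
    return ""


phrases = ["discount", "exit"]
print(resolve_capture("I want a discount", "1", phrases))
print(resolve_capture("I want a discount", "", phrases))
```

The first call models an end user who both speaks and taps 1, and the second models an end user who only speaks.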

Tip 6 - Use maxInputLength for capturing DTMF

When using DTMF failover by pressing digits on the phone keypad, IVR has to decide when to stop capturing.

Let's revisit the previous example, slightly modified to demonstrate the point:

json
 
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "model": "DEFAULT",
          "keyPhrases": ["discount", "exit"]
      },
      "dtmfOptions": {}
    }
 

Without maxInputLength, the IVR doesn't know how many digits are going to be pressed, so it waits for the timeout to expire.

In this example, IVR waits 5 seconds to capture all digits pressed. By using maxInputLength, IVR can stop capturing and proceed with the scenario as soon as it captures the specified number of digits.

If maxInputLength is 1, the IVR can continue the scenario as soon as the first digit is pressed. The timeout is still respected in this case, meaning that the end user has 5 seconds to say something or press a digit.

Example:

json
 
    {
      "capture": "myVar",
      "timeout": 5,
      "speechOptions": {
          "language": "en-US",
          "model": "DEFAULT",
          "keyPhrases": ["discount", "exit"]
      },
      "dtmfOptions": {
          "maxInputLength": 1
      }
    }
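
To make the stopping rule concrete, the following sketch models when DTMF capturing ends with and without maxInputLength. It is an illustrative model only: `dtmf_capture` is a hypothetical function, and key presses are represented as (seconds since capture start, digit) pairs.

```python
def dtmf_capture(presses, timeout, max_input_length=None):
    """Model DTMF capturing.

    presses: list of (seconds_since_capture_start, digit), sorted by time.
    Returns (captured_digits, time_at_which_capturing_stops).
    """
    digits = ""
    for pressed_at, digit in presses:
        if pressed_at > timeout:
            # Presses after the timeout are never captured.
            break
        digits += digit
        if max_input_length is not None and len(digits) >= max_input_length:
            # Enough digits collected: stop right away.
            return digits, pressed_at
    # No limit set or limit not reached: wait for the timeout to expire.
    return digits, timeout


print(dtmf_capture([(1.2, "1")], timeout=5))
print(dtmf_capture([(1.2, "1")], timeout=5, max_input_length=1))
```

In the first call the scenario continues only when the 5-second timeout expires, while in the second it continues at the moment the digit is pressed.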
 

Speech recognition supported languages

During the Early Access phase, we will support the languages listed in the table below (and will expand the list in the near future).

Check out the table to see which abbreviation you need to use when selecting a specific language in the API request.

Language | Abbreviation
Afrikaans (South Africa) | af-ZA
Albanian (Albania) | sq-AL
Amharic (Ethiopia) | am-ET
Arabic (Algeria) | ar-DZ
Arabic (Bahrain) | ar-BH
Arabic (Egypt) | ar-EG
Arabic (Iraq) | ar-IQ
Arabic (Israel) | ar-IL
Arabic (Jordan) | ar-JO
Arabic (Kuwait) | ar-KW
Arabic (Lebanon) | ar-LB
Arabic (Libya) | ar-LY
Arabic (Morocco) | ar-MA
Arabic (Oman) | ar-OM
Arabic (Qatar) | ar-QA
Arabic (Saudi Arabia) | ar-SA
Arabic (State of Palestine) | ar-PS
Arabic (Syria) | ar-SY
Arabic (Tunisia) | ar-TN
Arabic (United Arab Emirates) | ar-AE
Arabic (Yemen) | ar-YE
Armenian (Armenia) | hy-AM
Azerbaijani (Azerbaijan) | az-AZ
Basque (Spain) | eu-ES
Bengali (Bangladesh) | bn-BD
Bengali (India) | bn-IN
Bosnian (Bosnia and Herzegovina) | bs-BA
Bulgarian (Bulgaria) | bg-BG
Burmese (Myanmar) | my-MM
Catalan (Spain) | ca-ES
Chinese (Cantonese, Traditional) | zh-HK
Chinese (Mandarin, Simplified) | zh-CN
Chinese (Mandarin, Taiwan) | zh-TW
Croatian (Croatia) | hr-HR
Czech (Czech Republic) | cs-CZ
Danish (Denmark) | da-DK
Dutch (Belgium) | nl-BE
Dutch (Netherlands) | nl-NL
English (Australia) | en-AU
English (Canada) | en-CA
English (Ghana) | en-GH
English (Great Britain) | en-GB
English (Hong Kong) | en-HK
English (India) | en-IN
English (Ireland) | en-IE
English (Kenya) | en-KE
English (New Zealand) | en-NZ
English (Nigeria) | en-NG
English (Pakistan) | en-PK
English (Philippines) | en-PH
English (Singapore) | en-SG
English (South Africa) | en-ZA
English (Tanzania) | en-TZ
English (US) | en-US
Estonian (Estonia) | et-EE
Filipino (Philippines) | fil-pH
Finnish (Finland) | fi-FI
French (Belgium) | fr-BE
French (Canada) | fr-CA
French (France) | fr-FR
French (Switzerland) | fr-CH
Galician (Spain) | gl-ES
Georgian (Georgia) | ka-GE
German (Austria) | de-AT
German (Germany) | de-DE
German (Switzerland) | de-CH
Greek (Greece) | el-GR
Gujarati (India) | gu-IN
Hebrew (Israel) | he-IL
Hindi (India) | hi-IN
Hungarian (Hungary) | hu-HU
Icelandic (Iceland) | is-IS
Indonesian (Indonesia) | id-ID
Irish (Ireland) | ga-IE
Italian (Italy) | it-IT
Italian (Switzerland) | it-CH
Japanese (Japan) | ja-JP
Javanese (Indonesia) | jv-ID
Kannada (India) | kn-IN
Kazakh (Kazakhstan) | kk-KZ
Khmer (Cambodia) | km-KH
Korean (South Korea) | ko-KR
Lao (Laos) | lo-LA
Latvian (Latvia) | lv-LV
Lithuanian (Lithuania) | lt-LT
Macedonian (North Macedonia) | mk-MK
Malay (Malaysia) | ms-MY
Malayalam (India) | ml-IN
Maltese (Malta) | mt-MT
Marathi (India) | mr-IN
Mongolian (Mongolia) | mn-MN
Nepali (Nepal) | ne-NP
Norwegian Bokmål (Norway) | no-NO
Persian (Iran) | fa-IR
Polish (Poland) | pl-PL
Portuguese (Brazil) | pt-BR
Portuguese (Portugal) | pt-PT
Punjabi (Gurmukhi India) | pa-Guru-IN
Romanian (Romania) | ro-RO
Russian (Russia) | ru-RU
Serbian (Serbia) | sr-RS
Sinhala (Sri Lanka) | si-LK
Slovak (Slovakia) | sk-SK
Slovenian (Slovenia) | sl-SI
Spanish (Argentina) | es-AR
Spanish (Bolivia) | es-BO
Spanish (Chile) | es-CL
Spanish (Colombia) | es-CO
Spanish (Costa Rica) | es-CR
Spanish (Cuba) | es-CU
Spanish (Dominican Republic) | es-DO
Spanish (Ecuador) | es-EC
Spanish (El Salvador) | es-SV
Spanish (Equatorial Guinea) | es-GQ
Spanish (Guatemala) | es-GT
Spanish (Honduras) | es-HN
Spanish (Mexico) | es-MX
Spanish (Nicaragua) | es-NI
Spanish (Panama) | es-PA
Spanish (Paraguay) | es-PY
Spanish (Peru) | es-PE
Spanish (Puerto Rico) | es-PR
Spanish (Spain) | es-ES
Spanish (USA) | es-US
Spanish (Uruguay) | es-UY
Spanish (Venezuela) | es-VE
Sundanese (Indonesia) | su-ID
Swahili (Kenya) | sw-KE
Swahili (Tanzania) | sw-TZ
Swedish (Sweden) | sv-SE
Tamil (India) | ta-IN
Tamil (Malaysia) | ta-MY
Tamil (Singapore) | ta-SG
Tamil (Sri Lanka) | ta-LK
Telugu (India) | te-IN
Thai (Thailand) | th-TH
Turkish (Turkey) | tr-TR
Ukrainian (Ukraine) | uk-UA
Urdu (India) | ur-IN
Urdu (Pakistan) | ur-PK
Uzbek (Uzbekistan) | uz-UZ
Vietnamese (Vietnam) | vi-VN
Zulu (South Africa) | zu-ZA

IVR variables

Variables make your IVR scenarios dynamic and flexible. You can use them to personalize greetings, branch into different paths depending on a customer’s response, or connect your IVR with your digital ecosystem.

Variables are containers that temporarily store information, such as a caller’s name, customer ID, or the result of an API call.

IVR actions can read from these variables and write new values to them.

Variables enable your IVR to:

  • Communicate with external systems.

  • Keep track of what happens during a call.

  • Personalize the caller experience.

Set a variable

To set variables in your IVR scenario, you can use these options:

Launch outbound IVR with variables

For outbound IVR, you can send variables in the HTTP request that starts the scenario. For example, you might include a customer’s name:

json
 
{
   "messages": [
       {
           "from": "41793026700",
           "destinations": [
               {
                   "to": "41793026727"
               }
           ],
           "scenarioId": "6298AA7707903A4ED680B436929681AD",
           "parameters": {
               "name": "John"
           }
       }
   ]
}
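
If you assemble this request body in code, a small helper keeps the variables separate from the rest of the payload. The sketch below only builds and prints the JSON body and does not send any request. The function name `build_launch_payload` and the extra `orderCount` variable are hypothetical, and converting every value to a string is our own choice, made because IVR variables are strings.

```python
import json


def build_launch_payload(sender, recipient, scenario_id, variables):
    """Build a request body shaped like the outbound IVR launch example."""
    return {
        "messages": [
            {
                "from": sender,
                "destinations": [{"to": recipient}],
                "scenarioId": scenario_id,
                # Assumption: values are stringified because IVR variables are strings.
                "parameters": {key: str(value) for key, value in variables.items()},
            }
        ]
    }


payload = build_launch_payload(
    "41793026700",
    "41793026727",
    "6298AA7707903A4ED680B436929681AD",
    {"name": "John", "orderCount": 3},
)
print(json.dumps(payload, indent=2))
```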
 

You can then reference the variable in your scenario, for example, in a Say action:

json
 
{
   "say": "Hello ${name}"
}

You can plug variables into your text using ${name}, as shown above. IVR supports variable expressions using the ${} format, based on Unified Expression Language (UEL).
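
To preview how a text with placeholders will read, you can approximate the substitution locally. The sketch below handles only plain variable names inside ${} and is not a UEL implementation. The `render` helper is hypothetical, and it assumes that unset variables resolve to an empty string.

```python
import re

# Matches ${variableName}; full UEL expressions are deliberately not supported.
PLACEHOLDER = re.compile(r"\$\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}")


def render(template: str, variables: dict) -> str:
    """Replace ${name} placeholders; unset variables become an empty string."""
    return PLACEHOLDER.sub(lambda m: str(variables.get(m.group(1), "")), template)


print(render("Hello ${name}", {"name": "John"}))
print(repr(render("Hello ${name}", {})))
```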

Set Variable action

You can use the Set Variable action to update variables as the scenario runs. This action is useful for creating or updating information during the call.

Parse API responses

When your scenario calls an external API, each field returned in the response becomes a variable you can use elsewhere in the scenario. For example, if you get an order status from your backend, you can reference it as a variable throughout the rest of the call.
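
As a mental model, you can picture the response fields being copied into the scenario's variables. The sketch below assumes a JSON response whose top-level fields become string variables. `response_to_variables` is a hypothetical helper, and the handling of nested or non-string values shown here is an assumption, since this section does not describe it.

```python
import json


def response_to_variables(response_body: str) -> dict:
    """Turn the top-level fields of a JSON response into string variables."""
    fields = json.loads(response_body)
    variables = {}
    for name, value in fields.items():
        # Assumption: non-string values are kept as their JSON text.
        variables[name] = value if isinstance(value, str) else json.dumps(value)
    return variables


variables = response_to_variables('{"orderStatus": "shipped", "items": 2}')
print(variables)
```

With the order status example from above, a later Say action could then reference ${orderStatus}.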

Collect DTMF

When you use the Collect action to capture digits or the Capture action to capture speech, the result is stored in a variable that you can use later in the scenario.

Implicit variables

Every scenario includes built-in variables that are always available:

  • from: The caller’s number (inbound) or the from number (outbound).
  • to: The destination number.
  • startTime: The time the call started.
  • answerTime: The time the call was actually answered.
  • __externalMessageId: The unique call ID, set through the launch parameter and included in delivery reports.

NOTE

You can use implicit variables anywhere in the scenario, but you can’t change them.

Action variables

Some actions create their own special variables. For example, after making an external API call, these variables are available:

  • __apiResponseStatusCode: The HTTP response code, so you can decide what happens next if your API call fails or succeeds.
  • __callApiCurrentTimestamp: The timestamp when the API call started.

Link IVR to the delivery report

To connect your IVR scenario with backend systems, set the clientCallbackData variable. This variable appears in the delivery report and can pass user IDs or other custom identifiers back to your monitoring system.

If you're launching outbound IVR, you can alternatively include this data in the callbackData launch parameter.

Variable behavior

  • Type: All variables are strings.
  • Nulls: There is no null. If you try to use a variable that hasn't been set, you'll get an empty string ("").
  • Expressions: You can use simple expressions, for example, ${digit == '2'} to resolve a pressed digit in an If action.
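
These rules can be captured in a small model, which can be handy when you unit test scenario logic outside the platform. The `VariableStore` class below is hypothetical and only mirrors the behavior described in this section: values are stored as strings, unset variables read as an empty string, and implicit variables cannot be changed.

```python
class VariableStore:
    """Simplified model of IVR variable behavior, for illustration only."""

    # Implicit variable names as listed in this section.
    IMPLICIT = {"from", "to", "startTime", "answerTime", "__externalMessageId"}

    def __init__(self, implicit):
        self._values = {name: str(value) for name, value in implicit.items()}

    def get(self, name: str) -> str:
        # There is no null: unset variables resolve to an empty string.
        return self._values.get(name, "")

    def set(self, name: str, value) -> None:
        if name in self.IMPLICIT:
            raise ValueError(f"{name} is an implicit variable and cannot be changed")
        # All variables are strings.
        self._values[name] = str(value)


store = VariableStore({"from": "41793026700", "to": "41793026727"})
store.set("digit", 2)
print(store.get("digit") == "2")  # Mirrors the check in ${digit == '2'}.
print(repr(store.get("notSet")))
```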

Best practices: IVR variables

  • Use ${variableName} to reference variables anywhere in your scenario.
  • Built-in variables give you info about the call, for free.
  • All variables are strings. Unset variables return an empty string.
  • You can pass variables when starting calls, set them mid-flow, or get them from API responses.

You can use variables to personalize and automate your call flows, making your IVR scenarios smarter and more connected.

Copyright @ 2006-2025 Infobip ltd.
