Voice and Video

Get started

Create an Infobip account

To use any Infobip product, you need to sign up for a free account. See Create an Account for more details.

If you already have an account and use our other solutions, contact Sales or Support to enable the Voice and Video capabilities that fit your specific needs.

Free trial

Once you create an account, you are automatically enrolled in a free trial until you run out of allocated free credits.

Here are some specific details for Voice and Video to keep in mind during the free trial:

  1. You are entitled to:
    • 15 inbound calls from phones
    • 15 outbound calls to phones
    • 30 calls over WebRTC
  2. Voice calls in the trial period can be directed only to the mobile phone number you verified during the signup process.
  3. Voice calls in the trial period are limited to a 5-minute duration.
  4. Among all possible voice add-ons, Calls API Conferencing and WebRTC Rooms are available during the trial. Contact Sales if you need other options activated.
  5. The Voice channel is available in more than 200 countries worldwide. Self-signup and trial users are limited to the countries specified in the list of supported countries. If you are interested in a country not specified in that list, contact our Sales organization.
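
The trial limits above can be expressed as a small local pre-flight check. This is only a sketch: `TrialUsage`, `problems_with`, `record_call`, and `capped_duration` are hypothetical names, not part of any Infobip SDK, and it assumes the verified-number rule applies to outbound phone calls. The platform enforces the real limits; the sketch only helps you reason about them in your own code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Trial allowances and the call-length cap, taken from the list above.
TRIAL_ALLOWANCE = {"inbound_phone": 15, "outbound_phone": 15, "webrtc": 30}
MAX_TRIAL_CALL_SECONDS = 5 * 60


def capped_duration(planned_seconds: int) -> int:
    """Return how long a trial call can last, given the 5-minute cap."""
    return min(planned_seconds, MAX_TRIAL_CALL_SECONDS)


@dataclass
class TrialUsage:
    """Hypothetical local bookkeeping; not an Infobip SDK class."""

    verified_number: str
    used: Dict[str, int] = field(
        default_factory=lambda: {kind: 0 for kind in TRIAL_ALLOWANCE}
    )

    def problems_with(self, kind: str, destination: Optional[str] = None) -> List[str]:
        """List the reasons a planned call would fall outside the trial limits."""
        problems = []
        if self.used[kind] >= TRIAL_ALLOWANCE[kind]:
            problems.append(f"no {kind} calls left in the trial")
        # Assumption: the verified-number rule concerns outbound phone calls.
        if kind == "outbound_phone" and destination != self.verified_number:
            problems.append("destination is not the verified number")
        return problems

    def record_call(self, kind: str) -> None:
        """Count one completed call of the given kind."""
        self.used[kind] += 1
```

Calling `problems_with("outbound_phone", number)` before placing a call returns an empty list when the call fits within the trial limits.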

Upgrade to a paying account

At any time, you can decide to finish the free trial and upgrade your account. Use the Upgrade options button on the Home page to view available billing options.

Numbers and senders

For each of our voice solutions, you can select a phone number to be used as the caller ID. This caller ID may or may not be displayed as-is on the end user's phone, depending on the voice connection used to reach the destination. If you need to ensure your caller ID is always displayed, reach out to our Support team.

To process inbound calls with our voice solutions, you need to use Infobip voice numbers, unless your inbound calls reach the Infobip platform over SIP or WebRTC. You can lease voice numbers from Infobip and set up the desired voice action on them at any time.

See Voice Numbers for more details.

Inbound and outbound calls for self-signup customers

Outbound voice calls for self-signup users are available in 215 countries. Inbound voice calls for self-signup users are available in 30 countries worldwide.

Check out the list of supported countries.

If you are interested in one of the unsupported countries, contact our Sales team.


Outbound calls to the United States are not enabled by default on self-signup accounts. To add the US as a destination, reach out to our Support or Sales team to validate that your use cases comply with the FCC's robocalling restrictions.

Add-ons and options

You can combine different add-ons with the Voice and Video channel, depending on your business needs. To enable these add-ons, contact Sales or Support.


Recording

With the Recording add-on, you can record all voice and video communications, whether you choose to record whole conversations or only parts of them.

You can activate Recording in different places depending on the voice and video solution you use:

  • During voice action setup on a voice number: when you configure your voice number, various voice actions (Forward to IVR, Forward to Phone, Forward to SIP, etc.) can automatically record the inbound call and any child call connected to it.
  • When creating a Broadcast over the voice channel.
  • After a new SIP trunk has been created in the portal: clicking that trunk lets you activate recording of all traffic going through it.
  • When creating a new WebRTC token for one of your WebRTC users.
  • When using Calls API: the recording of Calls, Dialogs, and Conferences can be started and stopped at any time with the recording API methods.
  • When using the Click to Call, IVR, or Advanced Voice Messages APIs: through optional parameters in your requests.

The complete Recording facility consists of three complementary add-ons:

  • Recording: determines whether your account is allowed to trigger voice and video call recordings.
  • Recording Storage: required if your account does not use our SFTP facility.
  • Video composition: required if you plan to have video conferences or rooms with multiple participants and want all participants' recordings merged into a single media file.

Certain types of Voice and Video calls and call recordings might be subject to specific country regulations. Before you set up and start using Voice and Video, make sure you have checked the telecom regulations of the relevant country.

Infobip Cloud or SFTP

You may choose to have all your voice and video recordings stored on Infobip's own cloud storage, or immediately pushed to your SFTP server once the recording is complete.

Your SFTP server address and credentials can only be set up in the web interface, in the Settings section under the Recording section of the Voice channel application.


If your SFTP server is unreachable, recording files will be discarded and not stored on the Infobip cloud storage.
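
Because recordings are discarded when the SFTP server cannot be reached, you may want to monitor the server from your own side. The following is a minimal sketch using only the Python standard library. The function name `sftp_port_reachable` and the default port 22 are assumptions, and a successful TCP connection does not prove that the SFTP credentials are valid or that Infobip can reach the server from its own network.

```python
import socket


def sftp_port_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Running such a probe on a schedule and alerting when it returns `False` gives you early warning before recordings are lost.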

Retrieve voice and video recordings from your account

You can find voice and video recordings in your account, under the Recordings section of the Voice channel application.

Recordings are split into 3 different categories:

Calls: Recordings of single-leg calls, typically used by:
  1. Calls API applications implementing any user-defined scenario where the end user interacts only with the application.
  2. Voice message recordings.
  3. IVR recordings.
Conferences: Recordings of multi-party calls (2 participants or more), typically used by:
  1. Calls API applications using the connect or conference API methods to connect 2 or more participants.
  2. Infobip Conversations, our Contact Center as a Service solution. Note that Conversations customers are strongly advised to access their recordings from Conversations, as the Voice recording page does not include any metadata (conversationId, agent name or ID, etc.) related to Conversations.
  3. WebRTC 1-on-1 and WebRTC Rooms.
  4. Recordings from voice messages and IVR, when call forwarding was performed and 2 participants were connected.
Dialogs: Recordings of 2-party calls, typically used by:
  1. Calls API applications using the Dialog API method to bridge 2 calls together.
  2. Recordings from Forward to Phone setups.
  3. Recordings from Forward to IP setups.
  4. Recordings from Number Masking sessions.
  5. Recordings from IVR, when call forwarding was performed, 2 participants were bridged together, and hangup propagation was activated.

For a limited period, you may find recordings performed over SIP trunks (Forward to IP action), Voice Messages, IVR, Number Masking, and Click to Call under the Analyze/Recordings section of your account.

We are gradually transitioning these recordings to a new Voice recording page.

Retrieve voice and video recordings via API

You can retrieve recordings in 2 ways, depending on the Voice API you are using:

  1. For recordings performed with Calls API, see our related product documentation.
  2. For recordings performed with IVR API, check our dedicated search and download methods.

Answering Machine Detection

Answering Machine Detection (AMD) is a feature that determines whether a machine (voicemail or answering machine) or a human answered the call. AMD can be applied to automated outbound calls (outbound IVR, Text-to-Speech, pre-recorded automated calls, or Click to Call).

AMD can be used in the web interface (in Broadcast, or in Moments using Flow) or through the API.

Here is a diagram showing how Answering Machine Detection works as a feature on the Voice platform.

[Figure: Voice and Video - Answering Machine Detection diagram]

For this feature to work, you have to configure what happens when an answering machine answers a call you initiated to your customer: either hang up the call, or continue it, in which case your message might end up in the end user's voicemail.

AMD uses a detection time of at least 4 seconds. If AMD detects silence after the call has been answered, it interprets this as an answering machine. If any noise is detected once the call has been answered, AMD interprets this as a human answering the call.

Our AMD mechanism is 95% accurate for Spanish and Portuguese in countries such as Spain, Colombia, Mexico, Peru, and Brazil. For other markets, accuracy is around 80%, and we are continuously working on improving the model.

With AMD, you save money by not leaving voice messages on people's answering machines. It also helps you avoid leaving private voice messages, such as one-time PINs, on the end user's voicemail.

On the other hand, if you want your end users to hear your message later in their voicemail, configure AMD to play the voice message anyway. In this case, when an answering machine is detected, we wait for the answering machine greeting to finish and then play your message, which is saved in the end user's voicemail.
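
The AMD behaviour described in this section can be modelled as a simple decision rule. The sketch below is a simplified illustration of the text, not Infobip's detection model. The names `classify`, `next_step`, `Answerer`, and `OnMachine` are hypothetical, as are the audio-level representation and the noise threshold.

```python
from enum import Enum
from typing import Sequence

MIN_DETECTION_SECONDS = 4.0  # minimum detection time stated in this section


class Answerer(Enum):
    HUMAN = "human"
    MACHINE = "machine"


class OnMachine(Enum):
    HANG_UP = "hang_up"
    CONTINUE = "continue"


def classify(levels: Sequence[float], noise_threshold: float) -> Answerer:
    """Classify the audio levels observed during the detection window.

    Follows the rule in the text: silence for the whole window is treated
    as a machine, any noise is treated as a human.
    """
    if any(level > noise_threshold for level in levels):
        return Answerer.HUMAN
    return Answerer.MACHINE


def next_step(answerer: Answerer, on_machine: OnMachine) -> str:
    """Pick the call action from the detection result and your configuration."""
    if answerer is Answerer.HUMAN:
        return "play message"
    if on_machine is OnMachine.HANG_UP:
        return "hang up"
    return "wait for greeting to end, then play message"
```

For a one-time PIN call you would typically choose `OnMachine.HANG_UP`, while a reminder that is useful later could use `OnMachine.CONTINUE`.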

SIP trunking

This add-on is required to use SIP trunks.


Conferencing

The Conferencing add-on is required to use Calls API Conferences and WebRTC Rooms.


Text-to-speech

Text-to-speech (TTS) converts a written message into an audio file. That file is then played to your customers through the Voice and Video product or API that triggers it. You can use it for both promotional and transactional traffic. With this feature, you do not need pre-recorded audio, which saves time and speeds up your go-to-market strategy.

For text-to-speech conversion, we offer more than 100 languages and accents. A full list of supported languages is shown in the Speech languages reference.

Speech Synthesis Markup Language (SSML) is supported with text-to-speech. For more information, see SSML support.
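
As an illustration, the sketch below builds a small SSML document in Python and checks that it is well-formed XML. The elements used (`speak`, `break`, `say-as`) come from the W3C SSML specification. Whether each of them, and the `digits` value in particular, is accepted by Infobip text-to-speech is an assumption you should verify against the SSML support reference. `build_pin_ssml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_pin_ssml(greeting: str, pin: str) -> str:
    """Return an SSML string that speaks a greeting, pauses, then reads digits."""
    return (
        "<speak>"
        f"{escape(greeting)}"              # escape &, <, > in free text
        '<break time="500ms"/>'            # short pause before the code
        f'<say-as interpret-as="digits">{escape(pin)}</say-as>'
        "</speak>"
    )


ssml = build_pin_ssml("Your verification code is", "4821")
ET.fromstring(ssml)  # raises xml.etree.ElementTree.ParseError if malformed
```

Validating the markup locally catches unescaped characters and unclosed tags before the text is sent in a request.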

Speech capture

The Infobip Speech Capture feature collects the end user's speech and returns text with the recognized content. It is currently available only via API in:

  • IVR API scenarios, with the capture action type
  • Calls API, with the Capture speech method

For more information on the supported languages for speech recognition, see the full list in Speech recognition languages. The reference lists the abbreviation you need to use when selecting a specific language in the API request.

Enhance recognition for specific words or expressions

Depending on the API you use for speech recognition, you may be able to define key phrases or hints. Key phrases are used to match captured speech.

If the full captured text contains one of the specified phrases, that phrase is highlighted in the outcome of your speech recognition action. Each key phrase can contain up to five words, and the number of key phrases is unlimited.
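
The key phrase rules above can be sketched locally as follows. This is an illustration only: `validate_key_phrases` and `matched_key_phrases` are hypothetical helpers, and the case-insensitive, whole-word matching is an assumption rather than documented platform behaviour.

```python
import re
from typing import Iterable, List

MAX_WORDS_PER_KEY_PHRASE = 5  # limit stated in the text above


def validate_key_phrases(phrases: Iterable[str]) -> List[str]:
    """Normalize whitespace and reject empty or over-long key phrases."""
    cleaned = []
    for phrase in phrases:
        words = phrase.split()
        if not words:
            raise ValueError("key phrase must not be empty")
        if len(words) > MAX_WORDS_PER_KEY_PHRASE:
            raise ValueError(f"key phrase exceeds five words: {phrase!r}")
        cleaned.append(" ".join(words))
    return cleaned


def matched_key_phrases(captured_text: str, phrases: Iterable[str]) -> List[str]:
    """Return the key phrases that occur in the full captured text."""
    text = " ".join(captured_text.split())
    matches = []
    for phrase in validate_key_phrases(phrases):
        # Assumption: match whole words, ignoring case.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            matches.append(phrase)
    return matches
```

Validating phrases before sending a request lets you catch the five-word limit in your own code instead of relying on an API error.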

Audio streaming

Audio streaming is a feature of our Calls API platform that duplicates (forks) the audio of a call to an external service of your choice, using WebSocket as the transport protocol.
