# Let AI and WhatsApp add some fun to your photos
---

AI has numerous applications, ranging from highly practical to just plain fun. In this tutorial, you'll learn how to receive a photo that an end user sends from WhatsApp, have an LLM humorously roast it in the style of Monty Python, and send the response back via WhatsApp.

## Prerequisites

1. [Infobip account](https://portal.infobip.com/login/?callback=https%3A%2F%2Fportal.infobip.com%2F). If you do not have one, you can easily [register](https://www.infobip.com/docs/essentials/getting-started/create-an-account).
2. A JVM development environment. This tutorial uses Kotlin as its programming language.
3. WhatsApp installed on your phone.
4. Publicly accessible web server so that Infobip can send webhook requests to your app. If you're developing locally, you can use a tool like [ngrok](https://ngrok.com/).
5. A model that can describe what is in a photo and comment on it. This tutorial uses OpenAI's GPT-4o for both tasks; [Step 5](https://www.infobip.com/docs/tutorials/let-ai-and-whatsapp-add-some-fun-to-your-photos#step-5-get-the-photo-explanation-implementation) provides more information.

## Java dependencies

We used **spring-boot-starter-web** to expose an endpoint, **OkHttp** to call the Infobip and OpenAI APIs, and **jackson-module-kotlin** for JSON serialization and deserialization. Add them to your Maven `pom.xml`:

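```xml
<!-- spring-boot-starter-web and jackson-module-kotlin have no explicit version here:
     Spring Boot's dependency management (for example, spring-boot-starter-parent) provides it.
     A Kotlin project also needs kotlin-stdlib and the kotlin-maven-plugin with the "spring"
     compiler plugin; Spring Initializr can generate that setup for you. -->
<dependencies>
    <dependency>
        <groupId>com.squareup.okhttp3</groupId>
        <artifactId>okhttp</artifactId>
        <version>4.12.0</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.module</groupId>
        <artifactId>jackson-module-kotlin</artifactId>
    </dependency>
</dependencies>
```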

The following code starts your application:

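```kotlin
import org.springframework.boot.SpringApplication
import org.springframework.boot.autoconfigure.SpringBootApplication

@SpringBootApplication
class Application

// Replace the placeholders with your own keys. BASE_URL should be the personal
// base URL shown in your Infobip account (see Step 4).
const val INFOBIP_API_KEY = "<insert infobip api key>"
const val BASE_URL = "https://api.infobip.com"
const val OPENAI_API_KEY = "<insert openai api key>"

fun main(args: Array<String>) {
    SpringApplication.run(Application::class.java, *args)
}
```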

## Implementation

### Step 1: Handle the request from Infobip [#step-1-handle-the-request-implementation]

To let Infobip forward incoming WhatsApp messages to your app, expose an endpoint that handles those requests. The payload format is described in the [Receive WhatsApp inbound messages](https://www.infobip.com/docs/api/channels/whatsapp/whatsapp-inbound-messages/receive-whatsapp-inbound-messages) API reference.

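```kotlin
import org.springframework.web.bind.annotation.PostMapping
import org.springframework.web.bind.annotation.RequestBody
import org.springframework.web.bind.annotation.RestController

// WhatsappInboundMessagePayload and its nested classes are defined in Step 7.
@RestController
class Controller {

    // Infobip POSTs each inbound WhatsApp message to this endpoint.
    @PostMapping("/webhook")
    fun handleInboundMessage(@RequestBody payload: WhatsappInboundMessagePayload) {
        val result = payload.results[0]
        val endUserPhoneNumber = result.from   // the end user who sent the photo
        val infobipPhoneNumber = result.to     // your Infobip sender number
        val mediaUrl = result.message.url      // link to the photo on Infobip's servers

        println("Got the message from $endUserPhoneNumber, sent to $infobipPhoneNumber with link to the photo $mediaUrl")
    }
}
```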
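The handler above reads only `payload.results[0]` and assumes it carries a photo. Since `results` is a list, and text messages carry no `url`, a slightly more defensive approach is to collect every result that actually contains media. The minimal sketch below redeclares the payload shape under hypothetical names (`InboundPayload`, `InboundResult`, `InboundMessage`) with a nullable `url`, so the block stands on its own; `extractPhotos` is likewise a hypothetical helper, not part of any Infobip SDK.

```kotlin
// Hypothetical minimal types mirroring the tutorial's payload classes,
// but with a nullable url because text messages don't include one.
data class InboundMessage(val url: String? = null)
data class InboundResult(val from: String, val to: String, val message: InboundMessage)
data class InboundPayload(val results: List<InboundResult> = emptyList())

// The three values the rest of the flow needs for one photo.
data class InboundPhoto(val endUserNumber: String, val infobipNumber: String, val mediaUrl: String)

// Returns one InboundPhoto per result that carries a media URL,
// skipping text-only results; an empty payload yields an empty list.
fun extractPhotos(payload: InboundPayload): List<InboundPhoto> =
    payload.results.mapNotNull { result ->
        result.message.url?.let { url -> InboundPhoto(result.from, result.to, url) }
    }
```

In the webhook handler you would then loop over `extractPhotos(payload)` instead of indexing `payload.results[0]`, so a text message or an empty payload is simply ignored rather than causing an error.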

### Step 2: Configure your sender and webhook URL [#step-2-configure-your-sender-and-webhook-url-implementation]

Once your endpoint is publicly reachable, for example through [ngrok](https://ngrok.com/) or a similar tool, Infobip can call it, but only after you tell Infobip the endpoint's URL.

After completing the signup process, you can manage your sender numbers in your [Infobip account](https://portal.infobip.com/channels-and-numbers/channels/whatsapp/senders).

We highly recommend registering your own sender. For simplicity, however, you can use the shared number together with your default keyword, which corresponds to your username. In the example account used for this tutorial, the keyword is *PDUCICUSECASESWORKSHOP*.

Next, configure your keyword with your webhook URL (your public URL followed by `/webhook`). Infobip then forwards messages received by your sender, or by the shared sender when they include your keyword, to that endpoint.

You can now test the setup by sending a photo, including your keyword, to the shared sender number. The keyword is only required in the first message; you can omit it in subsequent ones.

### Step 3: Download the photo [#step-3-download-the-photo-implementation]

Once you have the photo URL, it's time to download the photo. To do this, use the [Download inbound media](https://www.infobip.com/docs/api/channels/whatsapp/whatsapp-inbound-messages/download-whatsapp-inbound-media) endpoint.

For this request to succeed, you need an Infobip API key with the necessary permissions (API scopes); Step 4 explains how to get one.

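```kotlin
import okhttp3.OkHttpClient
import okhttp3.Request

// Downloads the photo from the Infobip media URL, authenticating with your API key.
private fun pullWhatsAppPhoto(mediaUrl: String): ByteArray {
    val client = OkHttpClient()
    val request: Request = Request.Builder()
        .url(mediaUrl)
        .get()
        .addHeader("Authorization", "App $INFOBIP_API_KEY")
        .build()

    // `use` closes the response once the body has been read.
    return client.newCall(request).execute().use { response ->
        check(response.isSuccessful) { "Media download failed with HTTP ${response.code}" }
        response.body!!.bytes()
    }
}
```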
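If you'd rather not add OkHttp, the same download can be sketched with the JDK's built-in `java.net.http.HttpClient` (Java 11+). This swaps in a different HTTP client than the one the tutorial uses and adds a status check; `buildMediaRequest` and `downloadMedia` are hypothetical helper names. The header uses the same `App <key>` scheme as the OkHttp version.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Builds the authenticated GET request for an Infobip media URL.
fun buildMediaRequest(mediaUrl: String, apiKey: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(mediaUrl))
        .header("Authorization", "App $apiKey")
        .GET()
        .build()

// Downloads the media bytes and fails with a clear message on a non-2xx status.
fun downloadMedia(client: HttpClient, mediaUrl: String, apiKey: String): ByteArray {
    val response = client.send(buildMediaRequest(mediaUrl, apiKey), HttpResponse.BodyHandlers.ofByteArray())
    check(response.statusCode() in 200..299) { "Media download failed with HTTP ${response.statusCode()}" }
    return response.body()
}
```

Unlike OkHttp, the JDK client does not follow redirects by default, so if you use this variant, a client built with `HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build()` is the closer equivalent.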

### Step 4: Use your API key and base URL [#step-4-use-api-key-and-base-url-implementation]

Head over to your Infobip account and copy the auto-generated API key and your base URL. For details, see [Base URL](https://www.infobip.com/docs/essentials/api-essentials/base-url).

If you can't copy the existing API key, create a new one with API scopes that cover all the calls this tutorial makes:

- inbound-message:read
- whatsapp:inbound-message:read
- whatsapp:manage
- message:send
- whatsapp:message:send

For more information, see [API scopes](https://www.infobip.com/docs/essentials/api-essentials/api-authorization#api-scopes).
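Once you have the key and base URL, you may prefer not to hard-code them in constants as the tutorial does. A minimal sketch, assuming you export them as environment variables named `INFOBIP_API_KEY`, `INFOBIP_BASE_URL` and `OPENAI_API_KEY` (these names, `ApiConfig` and `loadConfig` are all hypothetical). It also prefixes `https://` in case the base URL was copied without a scheme.

```kotlin
// Hypothetical holder for the three values the tutorial keeps in constants.
data class ApiConfig(val infobipApiKey: String, val infobipBaseUrl: String, val openAiApiKey: String)

// Reads the configuration from environment variables (the variable names are assumptions).
// `env` is injectable so the function can be tested without touching the real environment.
fun loadConfig(env: (String) -> String? = { System.getenv(it) }): ApiConfig {
    fun required(name: String): String =
        env(name)?.takeIf { it.isNotBlank() } ?: error("Missing environment variable $name")

    // Normalize the base URL: strip a trailing slash and add a scheme if it is missing.
    val baseUrl = required("INFOBIP_BASE_URL").trim().removeSuffix("/")
    val hasScheme = baseUrl.startsWith("https://") || baseUrl.startsWith("http://")
    return ApiConfig(
        infobipApiKey = required("INFOBIP_API_KEY"),
        infobipBaseUrl = if (hasScheme) baseUrl else "https://$baseUrl",
        openAiApiKey = required("OPENAI_API_KEY"),
    )
}
```

Failing fast at startup with a named missing variable is usually easier to debug than a `401 Unauthorized` from the first API call.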

### Step 5: Get the photo explanation [#step-5-get-the-photo-explanation-implementation]

To describe and comment on the photo, we used the [GPT-4o vision capability](https://platform.openai.com/docs/guides/vision). You first need an account and an API key on the OpenAI platform. Experiment with the prompt until the answers suit your taste.

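```kotlin
import com.fasterxml.jackson.annotation.JsonInclude
import com.fasterxml.jackson.databind.DeserializationFeature
import com.fasterxml.jackson.module.kotlin.jacksonObjectMapper
import com.fasterxml.jackson.module.kotlin.readValue
import okhttp3.Headers.Companion.toHeaders
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import java.util.Base64
import java.util.concurrent.TimeUnit

// OpenAIRoot, OpenAIMessage, OpenAIContent, ImageUrl and OpenAIResponse are defined in Step 7.
private fun getPhotoExplanation(photoByteArray: ByteArray): String {
    val objectMapper = jacksonObjectMapper().apply {
        setSerializationInclusion(JsonInclude.Include.NON_NULL)              // omit unset text/image_url
        configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)  // ignore fields we don't model
    }

    // Send the prompt and the photo (inlined as a base64 data URL) in one user message.
    val base64Image = Base64.getEncoder().encodeToString(photoByteArray)
    val imageContent = OpenAIContent(type = "image_url", image_url = ImageUrl("data:image/jpeg;base64,$base64Image"))
    val textContent = OpenAIContent(type = "text", text = "What’s in this image? Give a funny comment in Monty Python style")
    val openAIMessage = OpenAIMessage(role = "user", content = listOf(textContent, imageContent))
    val openAIRequest = OpenAIRoot(model = "gpt-4o", max_tokens = 300, messages = listOf(openAIMessage))
    val payload = objectMapper.writeValueAsString(openAIRequest)

    val request = Request.Builder()
        .url("https://api.openai.com/v1/chat/completions")
        .post(payload.toRequestBody("application/json".toMediaType()))
        .headers(mapOf("Authorization" to "Bearer $OPENAI_API_KEY").toHeaders())
        .build()

    // OkHttp's default 10-second read timeout can be too short for a vision request.
    val client = OkHttpClient.Builder().readTimeout(60, TimeUnit.SECONDS).build()
    val responseString = client.newCall(request).execute().use { response ->
        check(response.isSuccessful) { "OpenAI request failed with HTTP ${response.code}" }
        response.body!!.string()
    }
    val openAIResponse: OpenAIResponse = objectMapper.readValue(responseString)
    return openAIResponse.choices[0].message.content  // the model's comment
}
```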
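The `getPhotoExplanation` function hard-codes `data:image/jpeg` in the data URL. If a user sends a PNG or WebP image, you could instead derive the MIME type from the file's leading "magic" bytes. A minimal sketch; `detectImageMimeType` and `toDataUrl` are hypothetical helpers, and the check covers only JPEG, PNG, GIF and WebP, falling back to `image/jpeg` otherwise.

```kotlin
import java.util.Base64

// Guesses an image MIME type from the file's leading "magic" bytes.
// Falls back to image/jpeg, the type the tutorial hard-codes.
fun detectImageMimeType(bytes: ByteArray): String {
    // True if the data starts with the given byte values (written as 0..255 ints).
    fun startsWith(vararg signature: Int): Boolean =
        bytes.size >= signature.size &&
            signature.indices.all { (bytes[it].toInt() and 0xFF) == signature[it] }

    return when {
        startsWith(0xFF, 0xD8, 0xFF) -> "image/jpeg"
        startsWith(0x89, 0x50, 0x4E, 0x47) -> "image/png"   // "\x89PNG"
        startsWith(0x47, 0x49, 0x46, 0x38) -> "image/gif"   // "GIF8"
        bytes.size >= 12 &&
            String(bytes, 0, 4, Charsets.US_ASCII) == "RIFF" &&
            String(bytes, 8, 4, Charsets.US_ASCII) == "WEBP" -> "image/webp"
        else -> "image/jpeg"
    }
}

// Builds the data URL that would go into ImageUrl(...) in getPhotoExplanation.
fun toDataUrl(bytes: ByteArray): String =
    "data:${detectImageMimeType(bytes)};base64,${Base64.getEncoder().encodeToString(bytes)}"
```

With this helper, the image content would become `ImageUrl(toDataUrl(photoByteArray))` instead of the hard-coded JPEG prefix.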

At the time of writing, these models are alternatives to GPT-4o for describing the photo:

- [Pix2Struct](https://huggingface.co/docs/transformers/en/model_doc/pix2struct) ([PDF version](https://arxiv.org/pdf/2210.03347))
- [BLIP](https://huggingface.co/Salesforce/blip-image-captioning-base) ([PDF version](https://arxiv.org/pdf/2201.12086))

And these LLMs could then roast the photo based on that description:

- [Llama3](https://huggingface.co/meta-llama/Meta-Llama-3-8B)
- [Phi3](https://huggingface.co/docs/transformers/main/en/model_doc/phi3)

### Step 6: Send the WhatsApp message [#step-6-send-the-whatsapp-message-implementation]

Once you have the comment, the only remaining task is to send it back to the end user, using your Infobip sender number as `from` and the end user's number as `to`.

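```kotlin
import com.fasterxml.jackson.module.kotlin.jacksonObjectMapper
import okhttp3.MediaType.Companion.toMediaTypeOrNull
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody

// `from` is your Infobip sender number, `to` is the end user's number.
// WhatsAppMessage and Content are defined in Step 7.
private fun sendWhatsappMessage(from: String, to: String, message: String) {
    val body = jacksonObjectMapper().writeValueAsString(WhatsAppMessage(from, to, Content(message)))
        .toRequestBody("application/json".toMediaTypeOrNull())
    val request: Request = Request.Builder()
        .url("$BASE_URL/whatsapp/1/message/text")
        .post(body)
        .addHeader("Authorization", "App $INFOBIP_API_KEY")
        .build()
    // `use` closes the response so OkHttp can release the connection.
    OkHttpClient().newCall(request).execute().use { response ->
        check(response.isSuccessful) { "Sending the WhatsApp message failed with HTTP ${response.code}" }
    }
}
```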
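The model's reply goes straight into the message text. As a precaution, you could normalize it before sending: trim whitespace, replace an empty reply with a fallback, and shorten replies that exceed the text length limit. This sketch assumes a limit of 4096 characters (check the current value in the Infobip WhatsApp API reference); `prepareWhatsAppText`, `WHATSAPP_MAX_TEXT_LENGTH` and the fallback wording are hypothetical.

```kotlin
// Assumed limit for WhatsApp text message content; verify it in the Infobip API reference.
const val WHATSAPP_MAX_TEXT_LENGTH = 4096

// Hypothetical helper: trims the model's comment, replaces an empty reply with a
// fallback, and shortens overly long replies so the send request isn't rejected.
fun prepareWhatsAppText(comment: String, maxLength: Int = WHATSAPP_MAX_TEXT_LENGTH): String {
    require(maxLength > 1) { "maxLength must leave room for the ellipsis" }
    val text = comment.trim().ifEmpty { "I'm speechless. This photo has left me with nothing to say." }
    if (text.length <= maxLength) return text

    val ellipsis = "…"
    var cut = maxLength - ellipsis.length
    // Don't split an emoji or other character stored as a UTF-16 surrogate pair.
    if (text[cut - 1].isHighSurrogate()) cut--
    return text.substring(0, cut).trimEnd() + ellipsis
}
```

You would call it as `sendWhatsappMessage(to, from, prepareWhatsAppText(photoComment))`. With `max_tokens = 300` a reply is unlikely to hit the limit, so the empty-reply fallback is arguably the more useful part.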

### Step 7: Put it all together [#step-7-put-it-all-together-implementation]

Here is the complete Kotlin file, with all the necessary classes and methods:

```kotlin
import com.fasterxml.jackson.annotation.JsonInclude
import com.fasterxml.jackson.databind.DeserializationFeature
import com.fasterxml.jackson.module.kotlin.jacksonObjectMapper
import com.fasterxml.jackson.module.kotlin.readValue
import okhttp3.Headers.Companion.toHeaders
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.MediaType.Companion.toMediaTypeOrNull
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import org.springframework.boot.SpringApplication
import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.web.bind.annotation.PostMapping
import org.springframework.web.bind.annotation.RequestBody
import org.springframework.web.bind.annotation.RestController
import java.util.Base64
import java.util.concurrent.TimeUnit

@SpringBootApplication
class Application

// Replace the placeholders with your own API keys and your personal Infobip base URL.
const val INFOBIP_API_KEY = "<insert infobip api key>"
const val BASE_URL = "https://qwerty.api.infobip.com"
const val OPENAI_API_KEY = "<insert openai api key>"

fun main(args: Array<String>) {
    SpringApplication.run(Application::class.java, *args)
}

@RestController
class Controller {

    private val objectMapper = jacksonObjectMapper().apply {
        setSerializationInclusion(JsonInclude.Include.NON_NULL)
        configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
    }

    // One shared client reuses connections. The longer read timeout (OkHttp's default
    // is 10 seconds) leaves room for the OpenAI vision request.
    private val client = OkHttpClient.Builder().readTimeout(60, TimeUnit.SECONDS).build()

    @PostMapping("/webhook")
    fun handleInboundMessage(@RequestBody payload: WhatsappInboundMessagePayload) {
        payload.results[0].run {
            // Text messages (for example, one containing only your keyword) have no media URL.
            val mediaUrl = message.url ?: return
            val photoByteArray: ByteArray = pullWhatsAppPhoto(mediaUrl)
            val photoComment = getPhotoExplanation(photoByteArray)
            // Reply from your Infobip number (`to`) to the end user (`from`).
            sendWhatsappMessage(to, from, photoComment)
        }
    }

    private fun sendWhatsappMessage(from: String, to: String, message: String) {
        val body = objectMapper.writeValueAsString(WhatsAppMessage(from, to, Content(message)))
            .toRequestBody("application/json".toMediaTypeOrNull())
        val request: Request = Request.Builder()
            .url("$BASE_URL/whatsapp/1/message/text")
            .post(body)
            .addHeader("Authorization", "App $INFOBIP_API_KEY")
            .build()
        // `use` closes the response so the connection is released.
        client.newCall(request).execute().use { response ->
            check(response.isSuccessful) { "Sending the WhatsApp message failed with HTTP ${response.code}" }
        }
    }

    private fun pullWhatsAppPhoto(mediaUrl: String): ByteArray {
        val request: Request = Request.Builder()
            .url(mediaUrl)
            .get()
            .addHeader("Authorization", "App $INFOBIP_API_KEY")
            .build()

        return client.newCall(request).execute().use { response ->
            check(response.isSuccessful) { "Media download failed with HTTP ${response.code}" }
            response.body!!.bytes()
        }
    }

    private fun getPhotoExplanation(photoByteArray: ByteArray): String {
        val base64Image = Base64.getEncoder().encodeToString(photoByteArray)
        val imageContent = OpenAIContent(type = "image_url", image_url = ImageUrl("data:image/jpeg;base64,$base64Image"))
        val textContent = OpenAIContent(type = "text", text = "What’s in this image? Give a funny comment in Monty Python style")
        val openAIMessage = OpenAIMessage(role = "user", content = listOf(textContent, imageContent))
        val openAIRequest = OpenAIRoot(model = "gpt-4o", max_tokens = 300, messages = listOf(openAIMessage))
        val payload = objectMapper.writeValueAsString(openAIRequest)

        val request = Request.Builder()
            .url("https://api.openai.com/v1/chat/completions")
            .post(payload.toRequestBody("application/json".toMediaType()))
            .headers(mapOf("Authorization" to "Bearer $OPENAI_API_KEY").toHeaders())
            .build()
        val responseString = client.newCall(request).execute().use { response ->
            check(response.isSuccessful) { "OpenAI request failed with HTTP ${response.code}" }
            response.body!!.string()
        }
        val openAIResponse: OpenAIResponse = objectMapper.readValue(responseString)
        return openAIResponse.choices[0].message.content
    }
}

data class OpenAIRoot(
    val model: String,
    val messages: List<OpenAIMessage>,
    val max_tokens: Int
)

data class OpenAIMessage(
    val role: String,
    val content: List<OpenAIContent>
)

data class OpenAIContent(
    val type: String,
    val text: String? = null,
    val image_url: ImageUrl? = null
)

data class ImageUrl(
    val url: String
)

data class OpenAIResponse(
    val choices: List<Choice>,
)

data class Choice(
    val message: OpenAIResponseMessage,
)

data class OpenAIResponseMessage(
    val content: String
)

data class WhatsappInboundMessagePayload(
    val results: List<Result>
)

data class Result(
    val from: String,
    val to: String,
    val message: Message
)

data class Message(
    val url: String? = null // null for text messages, which carry no media
)

data class Price(
    val pricePerMessage: Int,
    val currency: String
)

data class WhatsAppMessage(
    val from: String,
    val to: String,
    val content: Content,
)

data class Content(
    val text: String
)
```

Be aware that, for the sake of simplicity, this tutorial does no proper logging or monitoring and does not handle error codes or exceptions.
We strongly recommend reading [Response status and error codes](https://www.infobip.com/docs/essentials/api-essentials/response-status-and-error-codes) to get familiar with both HTTP status codes and WhatsApp message status codes.
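As a starting point for that error handling, here is a minimal sketch of retrying a call that fails with a transient HTTP status. It assumes a hypothetical `HttpStatusException` that your own HTTP helpers would throw for non-2xx responses; treating 429 and 5xx as retryable is a common convention rather than an Infobip or OpenAI rule, so check the error-code documentation for the APIs you call.

```kotlin
// Hypothetical exception that your own HTTP helpers would throw for non-2xx responses.
class HttpStatusException(val statusCode: Int, message: String) : RuntimeException(message)

// Treats rate limiting (429) and server errors (5xx) as transient; other errors are not retried.
fun isRetryable(statusCode: Int): Boolean = statusCode == 429 || statusCode in 500..599

// Runs `block`, retrying transient HTTP failures with exponential backoff.
// `sleep` is injectable so tests don't actually wait.
fun <T> withRetry(
    maxAttempts: Int = 3,
    initialDelayMs: Long = 500,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    block: () -> T,
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var delayMs = initialDelayMs
    repeat(maxAttempts - 1) { attempt ->
        try {
            return block()
        } catch (e: HttpStatusException) {
            if (!isRetryable(e.statusCode)) throw e
            System.err.println("Attempt ${attempt + 1} failed with HTTP ${e.statusCode}; retrying in $delayMs ms")
            sleep(delayMs)
            delayMs *= 2
        }
    }
    // Final attempt: any exception propagates to the caller.
    return block()
}
```

For example, `withRetry { getPhotoExplanation(photoByteArray) }` would retry the OpenAI call, provided the function throws `HttpStatusException` on failure. Be more careful with retrying the send step: if a request times out after Infobip accepted it, a retry can deliver a duplicate message.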

Additionally, we removed all unused fields from the classes to keep the code as short as possible. You can find all the fields provided by the Infobip platform on the API documentation pages mentioned above.

This is just one way to combine AI and LLMs with WhatsApp; the same pattern of receiving a message, processing it with a model, and replying works for many other use cases.