Let AI and WhatsApp add some fun to your photos
AI has numerous applications, ranging from highly practical to just plain fun. In this tutorial, you'll learn how to receive a photo sent from an end user's WhatsApp app, have an LLM humorously roast it in the style of Monty Python, and send the response back via WhatsApp.
Prerequisites
- Infobip account. If you do not have one, you can easily register.
- JVM environment. This tutorial uses Kotlin as its programming language.
- WhatsApp installed on your phone.
- Publicly accessible web server so that Infobip can send webhook requests to your app. If you're developing locally, you can use a tool like ngrok.
- A solution capable of detecting photo content and giving a comment. This tutorial shows how to use OpenAI's GPT-4o for both tasks. Step 5 provides more information.
Java dependencies
We used spring-web for exposing an endpoint, okhttp3 for talking to the Infobip and OpenAI APIs, and jackson for serialization and deserialization:
Imports that will be needed for the entire tutorial...
... and for starting your application:
Implementation
Step 1: Handle the request from Infobip [#step-1-handle-the-request-implementation]
For Infobip to forward the WhatsApp messages, expose an endpoint that will handle those requests. The payload you will receive is explained here.
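As a minimal sketch of such an endpoint, assuming Spring's `@RestController` support and a Jackson `ObjectMapper` are on the classpath: the controller below accepts the webhook POST and pulls the sender, recipient, and photo URL out of each image message. The `/incoming-whatsapp` path and the `InboundPhoto` and `extractPhotos` names are hypothetical. The field names (`results`, `from`, `to`, `message.type`, `message.url`) are assumed from the inbound message payload, so verify them against the payload documentation.

```kotlin
import com.fasterxml.jackson.databind.ObjectMapper
import org.springframework.web.bind.annotation.PostMapping
import org.springframework.web.bind.annotation.RequestBody
import org.springframework.web.bind.annotation.RestController

// Minimal view of one inbound photo message; every other payload field is ignored.
data class InboundPhoto(val from: String, val to: String, val photoUrl: String)

private val mapper = ObjectMapper()

// Collects sender, recipient, and media URL for each IMAGE message in the payload.
// Field names are assumptions based on the inbound WhatsApp message format.
fun extractPhotos(payload: String): List<InboundPhoto> =
    mapper.readTree(payload).path("results")
        .filter { it.path("message").path("type").asText() == "IMAGE" }
        .map {
            InboundPhoto(
                from = it.path("from").asText(),
                to = it.path("to").asText(),
                photoUrl = it.path("message").path("url").asText()
            )
        }

@RestController
class WhatsAppWebhookController {
    // Hypothetical path; use whatever URL you later register with Infobip.
    @PostMapping("/incoming-whatsapp")
    fun receive(@RequestBody payload: String) {
        // Returning normally makes Spring answer with HTTP 200.
        extractPhotos(payload).forEach { println("Photo from ${it.from}: ${it.photoUrl}") }
    }
}
```

Reading the body as a raw string and navigating it with `readTree` keeps the sketch tolerant of fields it does not use; typed data classes would work equally well.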
Step 2: Configure your sender and webhook URL [#step-2-configure-your-sender-and-webhook-url-implementation]
Now that a working endpoint is exposed using ngrok or a similar tool, Infobip can access it. Be sure to inform Infobip of your endpoint's URL.
After completing the signup process, you can manage your sender numbers in your Infobip account.
We highly recommend registering your sender. However, for simplicity, you can use the shared number and your default keyword, which corresponds to your username. In the screenshot above, the keyword is PDUCICUSECASESWORKSHOP.
Next, configure your keyword to set the endpoint where Infobip will forward messages received by your sender (or the shared sender, along with your keyword).
Now, you can test sending a photo to your shared sender number, including your keyword. The keyword is only required for the first message and can be omitted in subsequent ones.
Step 3: Download the photo [#step-3-download-the-photo-implementation]
Once you have the photo URL, it's time to download the photo. To do this, use the Download inbound media endpoint.
To make this work, configure your Infobip API key and set the necessary permissions.
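A rough sketch of the download with okhttp3 follows. It assumes that the URL delivered in the inbound message already points at the Download inbound media endpoint, and that Infobip expects the API key in an `Authorization: App <key>` header. The function names are hypothetical.

```kotlin
import okhttp3.OkHttpClient
import okhttp3.Request

// Builds the authenticated GET request for the photo URL taken from the webhook payload.
fun buildDownloadRequest(photoUrl: String, apiKey: String): Request =
    Request.Builder()
        .url(photoUrl)
        .header("Authorization", "App $apiKey")
        .get()
        .build()

// Executes the request and returns the raw image bytes, failing loudly on a non-2xx status.
fun downloadPhoto(client: OkHttpClient, photoUrl: String, apiKey: String): ByteArray =
    client.newCall(buildDownloadRequest(photoUrl, apiKey)).execute().use { response ->
        check(response.isSuccessful) { "Media download failed with HTTP ${response.code}" }
        response.body?.bytes() ?: error("Media download returned an empty body")
    }
```

Splitting request construction from execution makes the header and URL easy to inspect without calling the API.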
Step 4: Use your API key and base URL [#step-4-use-api-key-and-base-url-implementation]
Head over to your Infobip account and copy the auto-generated API key and your base URL. Read more about the base URL here.
If there is no option to copy the API key, you can create a new API key with the appropriate API scopes to cover all the API calls needed for this tutorial:
- inbound-message:read
- whatsapp:inbound-message:read
- whatsapp:manage
- message:send
- whatsapp:message:send
Read more about API scopes here.
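One way to keep the API key and base URL out of the source code is to read them from environment variables. The sketch below assumes two hypothetical variable names, `INFOBIP_API_KEY` and `INFOBIP_BASE_URL`, and normalizes the base URL so that it always has a scheme and no trailing slash.

```kotlin
// Holds the two values copied from the Infobip account.
data class InfobipConfig(val apiKey: String, val baseUrl: String)

// Reads the configuration from the given environment (the process environment by default).
// The variable names are hypothetical; pick whatever fits your deployment.
fun loadInfobipConfig(env: Map<String, String> = System.getenv()): InfobipConfig {
    val apiKey = env["INFOBIP_API_KEY"]?.trim().orEmpty()
    val rawBaseUrl = env["INFOBIP_BASE_URL"]?.trim().orEmpty()
    require(apiKey.isNotEmpty()) { "INFOBIP_API_KEY is not set" }
    require(rawBaseUrl.isNotEmpty()) { "INFOBIP_BASE_URL is not set" }

    // Accept the base URL with or without a scheme, and drop any trailing slash.
    val hasScheme = rawBaseUrl.startsWith("https://") || rawBaseUrl.startsWith("http://")
    val withScheme = if (hasScheme) rawBaseUrl else "https://$rawBaseUrl"
    return InfobipConfig(apiKey, withScheme.trimEnd('/'))
}
```

Passing the environment as a parameter keeps the function easy to exercise with a plain map.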
Step 5: Get the photo explanation [#step-5-get-the-photo-explanation-implementation]
For the photo explanation, we used the GPT-4o vision capability. To do so, you first need to create an account on the OpenAI platform. Try playing with your prompt to get the most appropriate answer for you.
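A minimal sketch of that call is shown below, assuming OpenAI's Chat Completions endpoint with the photo passed inline as a base64 data URL. The prompt text, the function names, and the default `image/jpeg` MIME type are illustrative assumptions, not the tutorial's exact code.

```kotlin
import com.fasterxml.jackson.databind.ObjectMapper
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import java.util.Base64

// Hypothetical prompt; adjust it until the answers suit you.
const val ROAST_PROMPT =
    "Describe this photo, then roast it humorously in the style of Monty Python. Keep it short."

private val mapper = ObjectMapper()

// Builds the request body: one user message holding the prompt and the inline image.
fun buildRoastRequestJson(photo: ByteArray, mimeType: String = "image/jpeg"): String {
    val dataUrl = "data:$mimeType;base64," + Base64.getEncoder().encodeToString(photo)
    val body = mapOf(
        "model" to "gpt-4o",
        "messages" to listOf(
            mapOf(
                "role" to "user",
                "content" to listOf(
                    mapOf("type" to "text", "text" to ROAST_PROMPT),
                    mapOf("type" to "image_url", "image_url" to mapOf("url" to dataUrl))
                )
            )
        )
    )
    return mapper.writeValueAsString(body)
}

// Pulls the generated text out of the first choice; returns "" if it is missing.
fun extractRoast(responseJson: String): String =
    mapper.readTree(responseJson).path("choices").path(0).path("message").path("content").asText()

// Sends the photo to the model and returns its comment.
fun roastPhoto(client: OkHttpClient, openAiKey: String, photo: ByteArray): String {
    val request = Request.Builder()
        .url("https://api.openai.com/v1/chat/completions")
        .header("Authorization", "Bearer $openAiKey")
        .post(buildRoastRequestJson(photo).toRequestBody("application/json".toMediaType()))
        .build()
    return client.newCall(request).execute().use { response ->
        check(response.isSuccessful) { "OpenAI request failed with HTTP ${response.code}" }
        extractRoast(response.body?.string().orEmpty())
    }
}
```

Sending the image inline avoids exposing the Infobip media URL, which requires your API key, to a third party.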
Here are also some alternatives to GPT-4o that were available at the time of writing this tutorial:
And some LLMs that could roast your photos:
Step 6: Send the WhatsApp message [#step-6-send-the-whatsapp-message-implementation]
After receiving the comment, the only task remaining is to send the response to the end user.
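The reply could be sent along these lines, assuming Infobip's WhatsApp text message endpoint at `/whatsapp/1/message/text` and a JSON body with `from`, `to`, and `content.text`. Note that the roles flip relative to the inbound payload: your sender number goes into `from` and the end user's number into `to`. The function names are hypothetical.

```kotlin
import com.fasterxml.jackson.databind.ObjectMapper
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody

private val mapper = ObjectMapper()

// Serializes the text message; sender is your (or the shared) number, endUser the recipient.
fun buildTextMessageJson(sender: String, endUser: String, text: String): String =
    mapper.writeValueAsString(
        mapOf("from" to sender, "to" to endUser, "content" to mapOf("text" to text))
    )

// Builds the authenticated POST request against the account's base URL.
fun buildTextMessageRequest(
    baseUrl: String, apiKey: String, sender: String, endUser: String, text: String
): Request =
    Request.Builder()
        .url("$baseUrl/whatsapp/1/message/text")
        .header("Authorization", "App $apiKey")
        .post(buildTextMessageJson(sender, endUser, text).toRequestBody("application/json".toMediaType()))
        .build()

// Sends the message and fails loudly on a non-2xx status.
fun sendTextMessage(
    client: OkHttpClient, baseUrl: String, apiKey: String, sender: String, endUser: String, text: String
) {
    client.newCall(buildTextMessageRequest(baseUrl, apiKey, sender, endUser, text)).execute().use { response ->
        check(response.isSuccessful) { "Sending the message failed with HTTP ${response.code}" }
    }
}
```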
Step 7: Put it all together [#step-7-put-it-all-together-implementation]
Here is the complete Kotlin class, with all necessary classes and methods:
Be aware that in this tutorial, for the sake of simplicity, we did not do any proper logging or monitoring, nor did we handle error codes or exceptions. We strongly recommend checking this page to get familiar with both HTTP status codes and WhatsApp message status codes.
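As a rough illustration of the kind of error handling left out here, the sketch below chains the download, roast, and send steps and falls back to a generic reply when any of them throws. The three function parameters are hypothetical stand-ins for the steps described earlier, and the fallback text is made up.

```kotlin
// Hypothetical reply used when the photo cannot be processed.
const val FALLBACK_TEXT = "Sorry, I could not roast that photo. Please try another one."

class PhotoRoastPipeline(
    private val download: (photoUrl: String) -> ByteArray,
    private val roast: (photo: ByteArray) -> String,
    private val send: (recipient: String, text: String) -> Unit
) {
    // Returns true when the roast was delivered and false when a step failed.
    fun handle(endUser: String, photoUrl: String): Boolean =
        try {
            send(endUser, roast(download(photoUrl)))
            true
        } catch (e: Exception) {
            // A real application would use a logging framework instead of stderr.
            System.err.println("Roast failed for $endUser: ${e.message}")
            // Best effort: sending the fallback may fail as well, so swallow that error.
            runCatching { send(endUser, FALLBACK_TEXT) }
            false
        }
}
```

Injecting the steps as functions keeps the flow independent of any particular HTTP client and makes failures easy to simulate.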
Additionally, we removed all unused fields from the classes to keep the code as short as possible. You can find all the fields provided by the Infobip platform on the API documentation pages mentioned above.
This is just one way to utilize AI and LLMs, but the possibilities are endless.