ESP32-CAM Image Classification using Machine Learning

This tutorial shows how to build an ESP32-CAM image classification system using machine learning. The ESP32-CAM can acquire video and images, and we will use this capability to classify images. By combining the ESP32-CAM's vision capability with cloud machine learning, we will bring the power of computer vision to a tiny device.

Machine Learning image classification is the task of extracting information from an image using a trained model.

In order to classify an image, the ESP32-CAM will connect to a cloud machine learning platform named Clarifai.com (you can create an account for free).

How the ESP32-CAM Image classification works

These are the steps:

  1. Acquire images using ESP32-CAM
  2. Encode the image in base64
  3. Invoke an API exposed by the machine learning cloud platform, sending the image acquired by the ESP32
  4. Parse the response and extract the information

The advantage of this method is that we do not have to train a model ourselves: the ESP32-CAM uses a pre-trained model built by Clarifai. This machine learning model is able to identify and classify more than 10,000 concepts. By applying image recognition to the images captured by the ESP32-CAM, it is possible to extract information such as:

  • if there is a person or not
  • indoor or outdoor
  • objects
  • moods

and much more. Image recognition is an important branch of computer vision.

If this is the first time you are using the ESP32-CAM, you should read how to stream video using ESP32-CAM. In this article, the ESP32-CAM uses an external machine learning system to classify images. If you want to run the machine learning engine directly on your device, read how to use Tensorflow lite with ESP32.

Let’s start!

Initializing the ESP32-CAM

The first step is initializing the ESP32-CAM. This tutorial uses PlatformIO as IDE, but you can use other IDEs if you like.

Create a new file named ESP32-Vision.ino and add the following lines:

#include "Arduino.h"
#include "esp_camera.h"
#include <WiFi.h>

// Select camera model
//#define CAMERA_MODEL_WROVER_KIT // Has PSRAM
//#define CAMERA_MODEL_ESP_EYE // Has PSRAM
//#define CAMERA_MODEL_M5STACK_PSRAM // Has PSRAM
//#define CAMERA_MODEL_M5STACK_WIDE	// Has PSRAM
#define CAMERA_MODEL_AI_THINKER // Has PSRAM
//#define CAMERA_MODEL_TTGO_T_JOURNAL // No PSRAM

#include "camera_pins.h"

const char* ssid = "your_ssid";
const char* password = "wifi_password";

void setup() {
  Serial.begin(9600);
  Serial.setDebugOutput(true);
  Serial.println();

  camera_config_t config;
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer = LEDC_TIMER_0;
  config.pin_d0 = Y2_GPIO_NUM;
  config.pin_d1 = Y3_GPIO_NUM;
  config.pin_d2 = Y4_GPIO_NUM;
  config.pin_d3 = Y5_GPIO_NUM;
  config.pin_d4 = Y6_GPIO_NUM;
  config.pin_d5 = Y7_GPIO_NUM;
  config.pin_d6 = Y8_GPIO_NUM;
  config.pin_d7 = Y9_GPIO_NUM;
  config.pin_xclk = XCLK_GPIO_NUM;
  config.pin_pclk = PCLK_GPIO_NUM;
  config.pin_vsync = VSYNC_GPIO_NUM;
  config.pin_href = HREF_GPIO_NUM;
  config.pin_sscb_sda = SIOD_GPIO_NUM;
  config.pin_sscb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn = PWDN_GPIO_NUM;
  config.pin_reset = RESET_GPIO_NUM;
  config.xclk_freq_hz = 20000000;
  config.pixel_format = PIXFORMAT_JPEG;
  
  // QVGA keeps the JPEG small enough to base64-encode in memory.
  // If a PSRAM IC is present, use a higher JPEG quality and two frame buffers.
  if(psramFound()){
    config.frame_size = FRAMESIZE_QVGA;
    config.jpeg_quality = 10;
    config.fb_count = 2;
  } else {
    config.frame_size = FRAMESIZE_QVGA;
    config.jpeg_quality = 12;
    config.fb_count = 1;
  }

#if defined(CAMERA_MODEL_ESP_EYE)
  pinMode(13, INPUT_PULLUP);
  pinMode(14, INPUT_PULLUP);
#endif

  // camera init
  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) {
    Serial.printf("Camera init failed with error 0x%x", err);
    return;
  }


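#if defined(CAMERA_MODEL_M5STACK_WIDE)
  // Get the sensor handle so the image can be flipped and mirrored on this board
  sensor_t * s = esp_camera_sensor_get();
  s->set_vflip(s, 1);
  s->set_hmirror(s, 1);
#endif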

  WiFi.begin(ssid, password);

  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("");
  Serial.println("WiFi connected");
  
  classifyImage();
}

void loop() {
  // Nothing to do here: classifyImage() runs once from setup()
}

Even though this code seems complex, it is quite simple. First, select your camera model according to your ESP32-CAM board. Then, set the WiFi SSID and the WiFi password so that the board can connect to your network.

It is important to notice that we have reduced the camera resolution because the base64 encoding requires a lot of memory. In any case, we do not need a higher resolution to recognize the image.
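To get a feeling for the memory cost, the size of the base64 text can be computed from the size of the JPEG frame: every group of 3 input bytes becomes 4 output characters. The helper below is only a sketch of that arithmetic; the name base64Length is hypothetical and is not part of the sketch or of any library.

```cpp
#include <cstddef>

// Length of the base64 text produced from `rawBytes` input bytes:
// every group of 3 bytes (the last one padded if needed) becomes 4 characters.
// Hypothetical helper, used here only to illustrate the arithmetic.
constexpr std::size_t base64Length(std::size_t rawBytes) {
  return 4 * ((rawBytes + 2) / 3);
}
```

For example, assuming a 15,000-byte JPEG frame, base64Length(15000) gives 20,000 characters, about one third more than the original. Keep in mind that the frame buffer, the encoded string and the JSON payload that contains it can all be in memory at the same time for a moment.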


Acquiring a picture using ESP32-CAM

Next, we can capture the image we want to classify. Add the following lines:

void classifyImage() {
  
  // Capture picture
   camera_fb_t * fb = NULL;
   fb = esp_camera_fb_get();
   
   if(!fb) {
    Serial.println("Camera capture failed");
    return;
   }

  size_t size = fb->len;
  String buffer = base64::encode((uint8_t *) fb->buf, fb->len);
  ....
}

camera_fb_t holds the picture information and the data representing the image captured. Using the method esp_camera_fb_get(), the ESP32-CAM captures the image.

Finally, we encode the image in base64. fb->buf holds the data and fb->len is the buffer size. Moreover, add the following line at the beginning of the file:

#include <base64.h>
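If you are curious about what the encoding step does, the following is a minimal, portable C++ sketch of standard base64 encoding. It is not the implementation used by base64.h; the function name base64Encode is an assumption made for this illustration, and on the ESP32-CAM you should keep using base64::encode as shown above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Standard base64 alphabet
static const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Illustrative encoder (hypothetical name), not the base64.h implementation
std::string base64Encode(const std::uint8_t* data, std::size_t len) {
  std::string out;
  out.reserve(4 * ((len + 2) / 3));
  for (std::size_t i = 0; i < len; i += 3) {
    // Pack up to 3 bytes into a 24-bit group; missing bytes count as 0
    std::uint32_t group = static_cast<std::uint32_t>(data[i]) << 16;
    if (i + 1 < len) group |= static_cast<std::uint32_t>(data[i + 1]) << 8;
    if (i + 2 < len) group |= data[i + 2];

    // Emit four 6-bit symbols, using '=' where input bytes were missing
    out += kAlphabet[(group >> 18) & 0x3F];
    out += kAlphabet[(group >> 12) & 0x3F];
    out += (i + 1 < len) ? kAlphabet[(group >> 6) & 0x3F] : '=';
    out += (i + 2 < len) ? kAlphabet[group & 0x3F] : '=';
  }
  return out;
}
```

With this sketch, the three bytes of the text "Man" are encoded as "TWFu", while the single byte "M" becomes "TQ==" because two padding characters are needed.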

Applying image recognition using ESP32-CAM

Once the image is captured, the next step is to recognize the image and extract information from it. Even though it is possible to run a machine learning model on the ESP32, we want to use a cloud machine learning platform that uses pre-trained models. To do so, it is necessary to invoke an API and send the encoded image. In the classifyImage method, add the following lines:

  String payload = "{\"inputs\": [{ \"data\": {\"image\": {\"base64\": \"" + buffer + "\"}}}]}";

  buffer = "";
  // Uncomment this if you want to show the payload
  // Serial.println(payload);

  esp_camera_fb_return(fb);
  
  // Generic model
  String model_id = "aaa03c23b3724a16a56b629203edc62c";

  HTTPClient http;
  http.begin("https://api.clarifai.com/v2/models/" + model_id + "/outputs");
  http.addHeader("Content-Type", "application/json");     
  http.addHeader("Authorization", "Key your_key"); 
  int response_code = http.POST(payload);

The model_id variable holds the id of the model we want to use to recognize the image. For now, do not worry about the key in the Authorization header: it is an authorization key. We will see later how to get it.
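The JSON body sent to the API has a fixed structure around the encoded image. To make that structure easier to read, it can be isolated in a small helper. This is only a portable C++ sketch using std::string instead of the Arduino String class, and the name buildPayload is hypothetical:

```cpp
#include <string>

// Wraps a base64-encoded image in the JSON body used by the request.
// Hypothetical helper: the sketch builds the same string inline.
std::string buildPayload(const std::string& base64Image) {
  return "{\"inputs\": [{ \"data\": {\"image\": {\"base64\": \"" +
         base64Image + "\"}}}]}";
}
```

The encoded image can be inserted without escaping because the base64 alphabet contains only letters, digits, '+', '/' and '=', none of which need to be escaped inside a JSON string.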

Add this line at the beginning:

#include <HTTPClient.h>

Adding computer vision capability to the ESP32-CAM

After we have sent the base64 image to the machine learning cloud platform, we get the response with all the concepts extracted from the image.

Concepts are labels that are used to classify the image and recognize it. Using the labels, we get an image description. Each label has a probability.

Add to the previous method the following lines:

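// Read the response body returned by the API and release the connection
String response = http.getString();
http.end();

// Parse the json response: Arduino assistant
const int jsonSize = JSON_ARRAY_SIZE(1) + JSON_ARRAY_SIZE(20) + 3*JSON_OBJECT_SIZE(1) + 6*JSON_OBJECT_SIZE(2) + JSON_OBJECT_SIZE(3) + 20*JSON_OBJECT_SIZE(4) + 2*JSON_OBJECT_SIZE(6);
DynamicJsonDocument doc(jsonSize);
DeserializationError error = deserializeJson(doc, response);
if (error) {
  // For example, NoMemory means that jsonSize is too small for the response
  Serial.print("JSON parsing failed: ");
  Serial.println(error.c_str());
}

// Print the first 10 concepts with their probabilities
for (int i=0; i < 10; i++) {
  const char* name = doc["outputs"][0]["data"]["concepts"][i]["name"];
  const float p = doc["outputs"][0]["data"]["concepts"][i]["value"];
    
  Serial.println("=====================");
  Serial.print("Name:");
  Serial.println(name);
  Serial.print("Prob:");
  Serial.println(p);
  Serial.println();
}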

In this code, we use the ArduinoJson library to parse the output. Moreover, add the following line at the top of the file:

#include <ArduinoJson.h>
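Since each label comes with a probability, you may want to keep only the labels the model is reasonably sure about. The following portable C++ sketch shows one way to do it once the name and value pairs have been extracted. The Concept struct, the filterConcepts function and the threshold are assumptions made for this example; they are not part of the Clarifai API or of ArduinoJson.

```cpp
#include <string>
#include <vector>

// One label returned by the classifier: a name and its probability (0..1).
// Hypothetical type, used only in this illustration.
struct Concept {
  std::string name;
  float value;
};

// Keeps only the concepts whose probability is at least `threshold`
std::vector<Concept> filterConcepts(const std::vector<Concept>& concepts,
                                    float threshold) {
  std::vector<Concept> kept;
  for (const Concept& c : concepts) {
    if (c.value >= threshold) kept.push_back(c);
  }
  return kept;
}
```

For instance, with made-up values such as apple 0.96, fruit 0.80 and table 0.40, a threshold of 0.75 keeps only apple and fruit.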

Finally, put the ESP32-CAM into deep sleep until the reset button is pressed:

Serial.println("\nSleep....");
esp_deep_sleep_start();

Test the image recognition

We are ready to test how the image classification works with our ESP32-CAM. After uploading the sketch into the ESP32-CAM, you have to press the reset button to start the image recognition process.

To visualize the image captured, you can use a base64 to image decoder passing the encoded byte stream representing the image.

These are some examples:

[Images: ESP32-CAM image classification; ESP32-CAM computer vision]

The ESP32-CAM has correctly identified all the concepts: flower, no person, nature and leaves.

This is another example:

[Images: ESP32-CAM vision; ESP32-CAM image classification]

Notice all the information extracted from the image: no person, ball, recreation, soccer. As you can see, by applying computer vision to the ESP32-CAM we can extract interesting concepts from an image. The ESP32-CAM is able to identify the image correctly.

Testing machine learning model using food

In this last example, we will test the ESP32-CAM image recognition using food. Therefore, it is necessary to change the model. If you are wondering where the models are, you can use this link.

Change the model id in the previous code by commenting out the old model and adding the food model line:

  // Generic model
  //String model_id = "aaa03c23b3724a16a56b629203edc62c";
  // Food model
  String model_id = "bd367be194cf45149e75f01d59f77ba7";
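Since only the model id changes between the two examples, the endpoint URL can be derived from it. The snippet below is a small portable C++ sketch of that idea, using std::string instead of the Arduino String class. The constant names and the modelEndpoint function are hypothetical; the ids and the URL pattern are the ones used in this tutorial.

```cpp
#include <string>

// Model ids used in this tutorial
const std::string kGeneralModelId = "aaa03c23b3724a16a56b629203edc62c";
const std::string kFoodModelId = "bd367be194cf45149e75f01d59f77ba7";

// Builds the URL that the sketch passes to http.begin().
// Hypothetical helper: the sketch concatenates the same parts inline.
std::string modelEndpoint(const std::string& modelId) {
  return "https://api.clarifai.com/v2/models/" + modelId + "/outputs";
}
```

In this way, switching from the generic model to the food model means passing kFoodModelId instead of kGeneralModelId.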

Then upload the sketch again to the ESP32-CAM and verify how the ESP32-CAM recognizes objects using the new machine learning model:

[Images: Apply machine learning to ESP32-CAM; image classification using machine learning and ESP32-CAM]

As shown in the previous image, the probability that it is an apple is 96%. Therefore, the ESP32-CAM has identified the image correctly again.

Finally, the last example:

[Images: ESP32-CAM using computer vision; Machine learning object detection with ESP32-CAM]

Wrapping up

In this tutorial, we have discovered how to implement ESP32-CAM image classification using a cloud machine learning API provided by Clarifai. We have demonstrated how easy it is to implement a computer vision system based on the ESP32-CAM. Combining the ESP32-CAM's capabilities with the machine learning model enables this tiny device to detect and recognize objects. Moreover, using computer vision, we have extracted image information with pre-trained machine learning models.
