How to use ESP32-CAM with Tensorflow.js

This tutorial describes how to use ESP32-CAM with Tensorflow.js: the ESP32-CAM captures an image and Tensorflow.js processes it. Tensorflow.js is a machine learning library for JavaScript that can run models directly in a browser.

In the previous post, we covered how to compile Tensorflow for ESP32 and how to run machine learning models on the ESP32 using the Tensorflow Lite library. This project is different because the models don’t run on the ESP32-CAM but in the client browser, using the Tensorflow.js JavaScript library.

Project overview

The image below describes how to integrate ESP32-CAM with Tensorflow.js:

ESP32-CAM with Tensorflow.js to classify image

These are the main steps:

  1. The browser connects to the ESP32-CAM web server and requests the ts.html page
  2. The ESP32-CAM returns the ts.html page, which holds all the HTML and JavaScript code needed to run Tensorflow.js
  3. The user clicks the capture button; the browser sends a request to the ESP32-CAM, which sends back the captured image
  4. The Tensorflow.js model runs in the browser and classifies the captured image

Therefore, the ESP32-CAM has these tasks:

  • Stream video
  • Capture image
  • Provide the HTML page that will be shown in a browser that runs the Tensorflow.js machine learning models

In the end, Tensorflow.js applies the machine learning image classification model to the image captured by the ESP32-CAM.

HTML page and Tensorflow.js with ESP32-CAM

Let’s start with the HTML page and the JavaScript needed to run the Tensorflow.js machine learning model with ESP32-CAM. This page shows the video stream coming from the ESP32-CAM. As soon as the user clicks the capture button, a request is sent to the ESP32-CAM, which captures an image. This image is then passed to the Tensorflow.js model, which classifies it.

The HTML page has a simple structure. This is its source code:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>ESP32-CAM TensorflowJS</title>
    <style>
    body {
    font-family: 'PT Sans', sans-serif;
    background-color: #dde2e1;
    margin: 0;
    color: #636060;
    line-height: 1.6;
    }
    a {
    text-decoration: none;
    color: #ccc;
}
    h2 {
    display: block;
    font-size: 1.17em;
    margin-block-start: 1em;
    margin-block-end: 1em;
    margin-inline-start: 0px;
    margin-inline-end: 0px;
    font-weight: bold;
    }
    .container {
    max-width: 1180px;
    text-align: center;
    margin: 0 auto;
    padding: 0 3rem;
    }
    .btn {
    padding: 1rem;
    color: #fff;
    display: inline-block;
    background: red;
    margin-bottom: 1rem;
    }
    
    </style>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"> </script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/mobilenet"> </script>
    <script>
        function classifyImg() {
           const img = document.getElementById('img1');
           const r = document.getElementById('results');
           r.innerHTML = '';

           console.log("Classify...");
           // Request the image with CORS so Tensorflow.js can read its pixels
           img.crossOrigin = 'anonymous';
           img.onload = function() {
                // Run only once: the stream assigned below must not trigger it again
                img.onload = null;
                console.log('Wait to load..');
                mobilenet.load().then(model => {
                  // Classify the image.
                  return model.classify(img);
                }).then(predictions => {
                  for (const p of predictions) {
                    r.innerHTML += '<b>' + p.className + "</b> - " + p.probability + "<br/>";
                  }
                  // Classification done: switch back to the video stream
                  img.src = 'http://192.168.1.121:81';
                });
           }
           // The random parameter prevents the browser from reusing a cached image
           img.src = 'http://192.168.1.121/capture?t=' + Math.random();
        }
    </script>
    
   </head>
   <body>
     <div class="container">
     <h2>TensorflowJS with ESP32-CAM</h2>    
     <section>
       <img id="img1" width="320" height="200" src='http://192.168.1.121:81' crossorigin style="border:1px solid red"/>
       <div id="results"></div>
     </section>
     <section>
       <a href="#" class="btn" onclick="classifyImg()">Classify the image</a>
      </section>
      <section id="i"></section>
   </div>
   </body>
</html>

Including Tensorflow.js in the ESP32-CAM HTML page

The first step is including the Tensorflow.js library in the HTML page provided by the ESP32-CAM:

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"> </script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/mobilenet"> </script>

Notice that we are using the MobileNet model to classify the image. There are other models you can use.
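Even without switching to a different model, the MobileNet package can be tuned. The sketch below is only an illustration: it assumes the `version` and `alpha` options of `mobilenet.load()` and the optional `topk` argument of `classify()` as documented for `@tensorflow-models/mobilenet`. The helper names `loadLightModel` and `classifyTop5`, and the chosen values, are hypothetical.

```javascript
// Minimal sketch: load a smaller MobileNet variant to shorten download and
// inference time, at the cost of some accuracy.
// `lib` is the object exposed by the mobilenet script tag (the global `mobilenet`).
function loadLightModel(lib) {
  return lib.load({
    version: 2, // MobileNet architecture version (1 or 2)
    alpha: 0.5  // network width multiplier: smaller is faster but less accurate
  });
}

// Ask for the five most likely labels instead of the default three.
function classifyTop5(model, img) {
  return model.classify(img, 5);
}
```

In the page this could be used as `loadLightModel(mobilenet).then(model => classifyTop5(model, img))`.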

Run Machine Learning models using Tensorflow.js with ESP32-CAM

The next step is running the Tensorflow.js model on the image provided by the ESP32-CAM. To do this, we use the JavaScript shown below:

  function classifyImg() {
       const img = document.getElementById('img1');
       const r = document.getElementById('results');
       r.innerHTML = '';

       console.log("Classify...");
       // Request the image with CORS so Tensorflow.js can read its pixels
       img.crossOrigin = 'anonymous';
       img.onload = function() {
         // Run only once: the stream assigned below must not trigger it again
         img.onload = null;
         mobilenet.load().then(model => {
           // Classify the image.
           return model.classify(img);
         }).then(predictions => {
           for (const p of predictions) {
             r.innerHTML += '<b>' + p.className + "</b> - " + p.probability + "<br/>";
           }
           // Classification done: switch back to the video stream
           img.src = 'http://192.168.1.121:81';
         });
       }
       // The random parameter prevents the browser from reusing a cached image
       img.src = 'http://192.168.1.121/capture?t=' + Math.random();
  }

This is what the JavaScript code does:

  1. It gets a reference to the image tag that will hold the captured image
  2. Using the img.onload handler, it waits until the image is loaded
  3. Once the image is loaded, it loads the Tensorflow.js machine learning model
  4. Next, it applies the machine learning classification model to the image
  5. Then, it shows the labels extracted from the image, each with its probability
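One possible refinement of these steps is to load the model only once instead of on every click. The following is a minimal sketch, not part of the original project: `createClassifier` is a hypothetical helper, and `loadModel` stands for any function that returns a promise of a model with a `classify` method, such as `() => mobilenet.load()`.

```javascript
// Minimal sketch: cache the model promise so the model is downloaded and
// initialised a single time, then reused for every captured image.
function createClassifier(loadModel) {
  let modelPromise = null;
  return async function classify(img, topK = 3) {
    if (!modelPromise) {
      modelPromise = loadModel(); // first call only
    }
    const model = await modelPromise;
    return model.classify(img, topK);
  };
}

// In the page (assumption): const classify = createClassifier(() => mobilenet.load());
```

With this approach only the first classification pays the model loading cost; later clicks go straight to inference.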

All of this is triggered by requesting the capture resource on the ESP32-CAM, which is done by setting the image source:

img.src = 'http://192.168.1.121/capture?t=' + Math.random();

Notice the random number appended as a query parameter: it makes every URL unique, so the browser cannot serve a cached image.
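The same cache-busting idea can be written as a small helper. This is a sketch under assumptions: `captureUrl` is a hypothetical name, and a timestamp is used in place of `Math.random()` simply because it is easier to read in server logs.

```javascript
// Minimal sketch: build a unique /capture URL for each request.
// `now` is a parameter only so the result is predictable in tests.
function captureUrl(baseUrl, now = Date.now()) {
  const url = new URL('/capture', baseUrl);
  url.searchParams.set('t', String(now)); // unique value defeats the browser cache
  return url.toString();
}
```

For example, `img.src = captureUrl('http://192.168.1.121')` requests a fresh image on every call.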

With this simple JavaScript we can use ESP32-CAM with Tensorflow.js: the ESP32-CAM provides the image, and the machine learning model classifies it in the browser.

If you are wondering what happens when the Tensorflow.js classification process ends, the JavaScript code switches the image back to the video stream from the ESP32-CAM.
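If the classification fails, for instance because the model cannot be downloaded, the page would stay on the still image. A hedged sketch of one way to always return to the stream is shown here. `classifyThenResume` is a hypothetical helper, and `classify` is any async function that takes the image element.

```javascript
// Minimal sketch: run the classification and switch back to the video stream
// afterwards, whether the classification succeeded or threw an error.
async function classifyThenResume(img, streamUrl, classify) {
  try {
    return await classify(img);
  } finally {
    img.onload = null;   // stream frames must not trigger a new classification
    img.src = streamUrl; // e.g. 'http://192.168.1.121:81'
  }
}
```

The `finally` block runs in both cases, and any error is still passed on to the caller.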

Executing Tensorflow.js model with ESP32-CAM

The last step is executing all the JavaScript code shown before. To achieve this, we simply trigger the classification process when the user clicks the button:

<a href="#" class="btn" onclick="classifyImg()">Classify the image</a>
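Since every click starts a new capture, a user clicking repeatedly can queue several requests to the ESP32-CAM. The sketch below shows one possible guard. It is an assumption and not part of the original page: `makeClickHandler` is a hypothetical helper, and `task` is any async function, such as one that wraps the classification.

```javascript
// Minimal sketch: ignore clicks while a previous classification is still running.
function makeClickHandler(task) {
  let busy = false;
  return async function onClick(event) {
    if (event) event.preventDefault(); // keep the "#" link from scrolling the page
    if (busy) return;                  // a classification is already in progress
    busy = true;
    try {
      await task();
    } finally {
      busy = false;                    // accept clicks again, even after an error
    }
  };
}
```

The returned function could be attached with `addEventListener('click', ...)` on the button instead of the inline `onclick` attribute.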

ESP32-CAM image capture to use with Tensorflow.js

We won’t cover in detail the code that captures images to use with Tensorflow.js. The code is quite simple:

static esp_err_t capture_handler(httpd_req_t *req){
    Serial.println("Capture image");
    camera_fb_t * fb = NULL;
    esp_err_t res = ESP_OK;
    fb = esp_camera_fb_get();
    if (!fb) {
        Serial.println("Camera capture failed");
        httpd_resp_send_500(req);
        return ESP_FAIL;
    }
    httpd_resp_set_type(req, "image/jpeg");
    httpd_resp_set_hdr(req, "Content-Disposition", "inline; filename=capture.jpg");
    httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");
  
    res = httpd_resp_send(req, (const char *)fb->buf, fb->len);
    esp_camera_fb_return(fb);
    return res;
}
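From the browser side, this handler can be checked with a short script. The following is a minimal sketch under assumptions: `fetchCapture` is a hypothetical helper, and `fetchImpl` is a parameter only to make testing easier. The `Access-Control-Allow-Origin` header set by the handler is what allows such a request when the page is opened from a different origin than the camera.

```javascript
// Minimal sketch: request one frame from /capture and sanity-check the response.
async function fetchCapture(captureUrl, fetchImpl = fetch) {
  const response = await fetchImpl(captureUrl, { cache: 'no-store' });
  if (!response.ok) {
    // The handler answers with status 500 when esp_camera_fb_get() fails
    throw new Error('Capture failed with HTTP status ' + response.status);
  }
  const type = response.headers.get('Content-Type') || '';
  if (!type.startsWith('image/jpeg')) {
    throw new Error('Unexpected content type: ' + type);
  }
  return response.blob(); // JPEG data, usable with URL.createObjectURL(blob)
}
```

Calling `fetchCapture('http://192.168.1.121/capture')` from the browser console is a quick way to confirm that the camera answers before involving Tensorflow.js.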

Implementing ESP32-CAM tasks to run Tensorflow.js

Finally, it is necessary to expose three different resources that can be invoked by the HTML page to run Tensorflow.js image classification with ESP32-CAM:

void startCameraServer(){
    httpd_config_t config = HTTPD_DEFAULT_CONFIG();
    httpd_uri_t index_uri = {
        .uri       = "/",
        .method    = HTTP_GET,
        .handler   = stream_handler,
        .user_ctx  = NULL
    };
    httpd_uri_t page_uri = {
        .uri       = "/ts",
        .method    = HTTP_GET,
        .handler   = page_handler,
        .user_ctx  = NULL
    };
    httpd_uri_t capture_uri = {
        .uri       = "/capture",
        .method    = HTTP_GET,
        .handler   = capture_handler,
        .user_ctx  = NULL
    };

    Serial.printf("Starting web server on port: '%d'\n", config.server_port);
    if (httpd_start(&camera_httpd, &config) == ESP_OK) {
        httpd_register_uri_handler(camera_httpd, &capture_uri);
        httpd_register_uri_handler(camera_httpd, &page_uri);
    }
    // start stream using another webserver
    config.server_port += 1;
    config.ctrl_port += 1;
    Serial.printf("Starting stream server on port: '%d'\n", config.server_port);
    if (httpd_start(&stream_httpd, &config) == ESP_OK) {
        httpd_register_uri_handler(stream_httpd, &index_uri);
    }
}

Notice that there are three different resources:

  • /ts that will provide the HTML page that will implement ESP32-CAM with Tensorflow.js
  • /capture that is used to capture the image
  • / on port 81 to stream the video

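If you are wondering why we are using two different ports, one for the video stream and the other for the image capture, this is because the stream handler never returns while a client is watching: it keeps sending frames in an endless loop. A server instance busy with the stream could not answer capture requests at the same time, so the stream gets its own web server on port 81.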
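The HTML page shown earlier hardcodes the address 192.168.1.121 for these resources. As a hedged alternative, the URLs could be derived from the host that served the page. In this sketch `cameraEndpoints` is a hypothetical helper, and port 81 is assumed because the firmware starts the stream server on the default HTTP port plus one.

```javascript
// Minimal sketch: build the three resource URLs from the camera host name or IP.
function cameraEndpoints(hostname) {
  return {
    page: 'http://' + hostname + '/ts',          // HTML page with Tensorflow.js
    capture: 'http://' + hostname + '/capture',  // single JPEG image
    stream: 'http://' + hostname + ':81/'        // video stream on the second server
  };
}

// In the page served by the ESP32-CAM (assumption):
// const urls = cameraEndpoints(window.location.hostname);
```

This way the same page keeps working when the ESP32-CAM receives a different IP address from the router.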

If you are interested in using ESP32-CAM you can read how to use ESP32-CAM with Telegram to send images.

Full source code to run ESP32-CAM with Tensorflow.js

This is the full source code that you can use to run Tensorflow.js with ESP32-CAM in order to classify images:

#include <Arduino.h>
#include <WiFi.h>
#include "esp_http_server.h"
#include "esp_timer.h"
#include "esp_camera.h"
#include "img_converters.h"
#include "camera_pins.h"
#include "page.h"
#define PART_BOUNDARY "123456789000000000000987654321"

static const char* _STREAM_CONTENT_TYPE = "multipart/x-mixed-replace;boundary=" PART_BOUNDARY;
static const char* _STREAM_BOUNDARY = "\r\n--" PART_BOUNDARY "\r\n";
static const char* _STREAM_PART = "Content-Type: image/jpeg\r\nContent-Length: %u\r\n\r\n";
httpd_handle_t camera_httpd = NULL;
httpd_handle_t stream_httpd = NULL;
const char* ssid = "<your_ssid>";
const char* password = "your_wifi_password";
static esp_err_t capture_handler(httpd_req_t *req){
    Serial.println("Capture image");
    camera_fb_t * fb = NULL;
    esp_err_t res = ESP_OK;
    fb = esp_camera_fb_get();
    if (!fb) {
        Serial.println("Camera capture failed");
        httpd_resp_send_500(req);
        return ESP_FAIL;
    }
    httpd_resp_set_type(req, "image/jpeg");
    httpd_resp_set_hdr(req, "Content-Disposition", "inline; filename=capture.jpg");
    httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");
  
    res = httpd_resp_send(req, (const char *)fb->buf, fb->len);
    esp_camera_fb_return(fb);
    return res;
}
static esp_err_t page_handler(httpd_req_t *req) {
    httpd_resp_set_type(req, "text/html");
    httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");
    // Return the send result: the function is declared to return esp_err_t
    return httpd_resp_send(req, page, sizeof(page));
}
static esp_err_t stream_handler(httpd_req_t *req){
    camera_fb_t * fb = NULL;
    esp_err_t res = ESP_OK;
    size_t _jpg_buf_len = 0;
    uint8_t * _jpg_buf = NULL;
    char part_buf[64]; // buffer for the header of each multipart chunk

    res = httpd_resp_set_type(req, _STREAM_CONTENT_TYPE);
    if(res != ESP_OK){
        return res;
    }
    httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");
    while(true){
        fb = esp_camera_fb_get();
        if (!fb) {
            Serial.println("Camera capture failed");
            res = ESP_FAIL;
        } else {
            
                if(fb->format != PIXFORMAT_JPEG){
                    bool jpeg_converted = frame2jpg(fb, 80, &_jpg_buf, &_jpg_buf_len);
                    esp_camera_fb_return(fb);
                    fb = NULL;
                    if(!jpeg_converted){
                        Serial.println("JPEG compression failed");
                        res = ESP_FAIL;
                    }
                } else {
                    _jpg_buf_len = fb->len;
                    _jpg_buf = fb->buf;
                }
             }
        if(res == ESP_OK){
            res = httpd_resp_send_chunk(req, _STREAM_BOUNDARY, strlen(_STREAM_BOUNDARY));
        }
        if(res == ESP_OK){
            size_t hlen = snprintf((char *)part_buf, 64, _STREAM_PART, _jpg_buf_len);
            res = httpd_resp_send_chunk(req, (const char *)part_buf, hlen);
        }
        if(res == ESP_OK){
            res = httpd_resp_send_chunk(req, (const char *)_jpg_buf, _jpg_buf_len);
        }
        if(fb){
            esp_camera_fb_return(fb);
            fb = NULL;
            _jpg_buf = NULL;
        } else if(_jpg_buf){
            free(_jpg_buf);
            _jpg_buf = NULL;
        }
        if(res != ESP_OK){
            break;
        }
    }
    return res;
}
void startCameraServer(){
    httpd_config_t config = HTTPD_DEFAULT_CONFIG();
    httpd_uri_t index_uri = {
        .uri       = "/",
        .method    = HTTP_GET,
        .handler   = stream_handler,
        .user_ctx  = NULL
    };
    httpd_uri_t page_uri = {
        .uri       = "/ts",
        .method    = HTTP_GET,
        .handler   = page_handler,
        .user_ctx  = NULL
    };
    httpd_uri_t capture_uri = {
        .uri       = "/capture",
        .method    = HTTP_GET,
        .handler   = capture_handler,
        .user_ctx  = NULL
    };

    Serial.printf("Starting web server on port: '%d'\n", config.server_port);
    if (httpd_start(&camera_httpd, &config) == ESP_OK) {
        httpd_register_uri_handler(camera_httpd, &capture_uri);
        httpd_register_uri_handler(camera_httpd, &page_uri);
    }
    // start stream using another webserver
    config.server_port += 1;
    config.ctrl_port += 1;
    Serial.printf("Starting stream server on port: '%d'\n", config.server_port);
    if (httpd_start(&stream_httpd, &config) == ESP_OK) {
        httpd_register_uri_handler(stream_httpd, &index_uri);
    }
}
void setup() {
  Serial.begin(9600);
  Serial.setDebugOutput(true);
  Serial.println();
  camera_config_t config;
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer = LEDC_TIMER_0;
  config.pin_d0 = Y2_GPIO_NUM;
  config.pin_d1 = Y3_GPIO_NUM;
  config.pin_d2 = Y4_GPIO_NUM;
  config.pin_d3 = Y5_GPIO_NUM;
  config.pin_d4 = Y6_GPIO_NUM;
  config.pin_d5 = Y7_GPIO_NUM;
  config.pin_d6 = Y8_GPIO_NUM;
  config.pin_d7 = Y9_GPIO_NUM;
  config.pin_xclk = XCLK_GPIO_NUM;
  config.pin_pclk = PCLK_GPIO_NUM;
  config.pin_vsync = VSYNC_GPIO_NUM;
  config.pin_href = HREF_GPIO_NUM;
  config.pin_sscb_sda = SIOD_GPIO_NUM;
  config.pin_sscb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn = PWDN_GPIO_NUM;
  config.pin_reset = RESET_GPIO_NUM;
  config.xclk_freq_hz = 20000000;
  config.pixel_format = PIXFORMAT_JPEG;
  
  // if PSRAM IC present, use a higher JPEG quality and two frame buffers;
  // the frame size stays QVGA in both cases
  if(psramFound()){
    config.frame_size = FRAMESIZE_QVGA;
    config.jpeg_quality = 10;
    config.fb_count = 2;
  } else {
    config.frame_size = FRAMESIZE_QVGA;
    config.jpeg_quality = 12;
    config.fb_count = 1;
  }
#if defined(CAMERA_MODEL_ESP_EYE)
  pinMode(13, INPUT_PULLUP);
  pinMode(14, INPUT_PULLUP);
#endif
  // camera init
  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) {
    Serial.printf("Camera init failed with error 0x%x", err);
    return;
  }
  sensor_t * s = esp_camera_sensor_get();
  // initial sensors are flipped vertically and colors are a bit saturated
  if (s->id.PID == OV3660_PID) {
    s->set_vflip(s, 1); // flip it back
    s->set_brightness(s, 1); // up the brightness just a bit
    s->set_saturation(s, -2); // lower the saturation
  }
  // drop down frame size for higher initial frame rate
  s->set_framesize(s, FRAMESIZE_QVGA);
#if defined(CAMERA_MODEL_M5STACK_WIDE)
  s->set_vflip(s, 1);
  s->set_hmirror(s, 1);
#endif
  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("");
  Serial.println("WiFi connected");
  startCameraServer();
  Serial.print("Camera Ready! Use 'http://");
  Serial.print(WiFi.localIP());
  Serial.println("' to connect");
}
void loop() {
  // put your main code here, to run repeatedly:
  delay(10);
}

If you want to know more about this code you can read how to stream video using ESP32-CAM. If you would like to display the image on your TFT display you can read how to use ESP32-CAM with TFT display.

Testing the image classification using ESP32-CAM and Tensorflow.js

Now we can upload the code into the ESP32-CAM and test how Tensorflow.js works with ESP32-CAM to recognize images.

Run the code and connect to

http://<your-esp32-cam-ip>/ts

Below are some examples:

classify images using ESP32-CAM and Tensorflow.js

Another example:

How to implement image classification using ESP32-CAM and Tensorflow.js

As you can see, Tensorflow.js works well with ESP32-CAM.

If you have an Arduino and want to try edge machine learning you can read the tutorial on how to use Machine Learning with Arduino.

Wrapping up

In this tutorial, we have explored how to use Tensorflow.js with ESP32-CAM. The device captures the images, and Tensorflow.js, running in the browser, classifies them.

This is a simple example demonstrating how we can use machine learning models with ESP32-CAM.
