This tutorial describes how to use the ESP32-CAM with Tensorflow.js. The idea behind this tutorial is to capture an image with the ESP32-CAM and process it with Tensorflow.js, a library for machine learning in JavaScript that can run models directly in the browser.
In a previous post, we covered how to compile Tensorflow for ESP32 and how to run Machine Learning models on the ESP32 using the Tensorflow Lite library. This project is different because the models don't run on the ESP32-CAM but in the client browser, using the Tensorflow.js JavaScript library.
Project overview
The image below describes how to integrate the ESP32-CAM with Tensorflow.js:

These are the main steps:
- The browser connects to the ESP32-CAM web server, requesting the ts.html page
- The ESP32-CAM serves the ts.html page, which holds all the HTML and JavaScript code needed to run Tensorflow.js
- The user clicks on the capture image button, sending a request to the ESP32-CAM, which sends back the captured image
- The Tensorflow.js model runs in the user's browser and classifies the captured image
Therefore, the ESP32-CAM has these tasks:
- Stream video
- Capture image
- Provide the HTML page, shown in the browser, that runs the Tensorflow.js machine learning model
In the end, Tensorflow.js applies the machine learning image classification model to the image captured by the ESP32-CAM.
HTML page and Tensorflow.js with ESP32-CAM
Let's start from the HTML page with all the JavaScript needed to run the Tensorflow.js machine learning model with the ESP32-CAM. This page holds the video stream coming from the ESP32-CAM. As soon as the user clicks on the capture image button, the command is sent to the ESP32-CAM, which captures the image. This image is then passed to the Tensorflow.js model, which classifies it.
This is the HTML page structure:

The source code is very simple:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>ESP32-CAM TensorflowJS</title>
<style>
body {
font-family: 'PT Sans', sans-serif;
background-color: #dde2e1;
margin: 0;
color: #636060;
line-height: 1.6;
}
a {
text-decoration: none;
color: #ccc;
}
h2 {
display: block;
font-size: 1.17em;
margin-block-start: 1em;
margin-block-end: 1em;
margin-inline-start: 0px;
margin-inline-end: 0px;
font-weight: bold;
}
.container {
max-width: 1180px;
text-align: center;
margin: 0 auto;
padding: 0 3rem;
}
.btn {
padding: 1rem;
color: #fff;
display: inline-block;
background: red;
margin-bottom: 1rem;
}
</style>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"> </script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/mobilenet"> </script>
<script type="text/javascript">
function classifyImg() {
const img = document.getElementById('img1');
const r = document.getElementById('results');
r.innerHTML = '';
console.log("Classify...");
img.crossOrigin = 'anonymous';
img.onload = function() {
console.log('Wait to load..');
mobilenet.load().then(model => {
// Classify the image.
model.classify(img).then(predictions => {
for (const i in predictions) {
r.innerHTML = r.innerHTML + '<b>' + predictions[i].className + "</b> - " + predictions[i].probability + "<br/>";
img.onload = null;
img.src = 'http://192.168.1.121:81';
}
});
});
}
img.src = 'http://192.168.1.121/capture?t=' + Math.random();
}
</script>
</head>
<body>
<div class="container">
<h2>TensorflowJS with ESP32-CAM</h2>
<section>
<img id="img1" width="320" height="200" src='http://192.168.1.121:81' crossorigin style="border:1px solid red"/>
<div id="results"></div>
</section>
<section>
<a href="#" class="btn" onclick="classifyImg()">Classify the image</a>
</section>
<section id="i"></section>
</div>
</body>
</html>
Including Tensorflow.js in the ESP32-CAM HTML page
The first step is including the Tensorflow.js library in the HTML page provided by the ESP32-CAM:
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"> </script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/mobilenet"> </script>
Notice that we are using the Mobilenet model to classify the image; other pre-trained models are available as well.
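As an illustrative sketch (not part of this project), an object-detection model such as coco-ssd could replace Mobilenet, with classify() becoming detect(); the element and helper names below are assumptions based on this page:

```javascript
// Hypothetical sketch: swapping Mobilenet classification for coco-ssd object
// detection. It assumes the extra script tag is added to the page:
//   <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/coco-ssd"></script>

// Pure helper: turn coco-ssd detections ({ class, score } objects) into the
// HTML shown in the #results element.
function formatDetections(detections) {
  return detections
    .map(d => '<b>' + d.class + '</b> - ' + d.score.toFixed(2))
    .join('<br/>');
}

// Browser-side glue (not executed here): detect objects in the captured
// image, in the same place where model.classify() is called in this page.
function detectObjects(img, resultsEl) {
  cocoSsd.load().then(model => {
    model.detect(img).then(detections => {
      resultsEl.innerHTML = formatDetections(detections);
    });
  });
}
```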
Run Machine Learning models using Tensorflow.js with ESP32-CAM
The next step is running the Tensorflow.js model on the image provided by the ESP32-CAM. To achieve this, we use the JavaScript shown below:
function classifyImg() {
const img = document.getElementById('img1');
const r = document.getElementById('results');
r.innerHTML = '';
console.log("Classify...");
img.crossOrigin = 'anonymous';
img.onload = function() {
mobilenet.load().then(model => {
// Classify the image.
model.classify(img).then(predictions => {
for (const i in predictions) {
r.innerHTML = r.innerHTML + '<b>' + predictions[i].className + "</b> - " + predictions[i].probability + "<br/>";
img.onload = null;
img.src = 'http://192.168.1.121:81';
}
});
});
}
img.src = 'http://192.168.1.121/capture?t=' + Math.random();
}
This is what the JavaScript code does:
- It gets a reference to the image tag that will hold the captured image
- Using the image.onload handler, it waits until the image is loaded
- Once the image is loaded, it loads the Tensorflow.js machine learning model
- Next, it applies the machine learning classification model to the image
- Then, it shows the labels extracted from the image
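The rendering step in the list above (building the results HTML from the predictions array) can be isolated in a small pure function; the field names className and probability are the ones returned by the Mobilenet model:

```javascript
// Build the HTML string shown in the #results element from Mobilenet
// predictions, each of the form { className, probability }.
function renderPredictions(predictions) {
  let html = '';
  for (const p of predictions) {
    html += '<b>' + p.className + '</b> - ' + p.probability + '<br/>';
  }
  return html;
}
```

In classifyImg() this would replace the string concatenation inside the for loop.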
Before the classification code above runs, we invoke the capture resource on the ESP32-CAM by setting the image src:
img.src = 'http://192.168.1.121/capture?t=' + Math.random();
Notice the random query parameter: it prevents the browser from serving a cached copy of a previous capture.
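The cache-busting trick can be sketched as a tiny helper (the base URL is the example address used throughout this page):

```javascript
// Append a random query parameter so the browser fetches a fresh capture
// instead of reusing a cached copy of the previous image.
function buildCaptureUrl(base) {
  return base + '/capture?t=' + Math.random();
}

// e.g. img.src = buildCaptureUrl('http://192.168.1.121');
```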
With this simple JavaScript we can use the ESP32-CAM with Tensorflow.js: the ESP32-CAM provides the image, and the machine learning model classifies it.
If you are wondering what happens when the Tensorflow.js classification ends, the JavaScript code restores the video stream from the ESP32-CAM by resetting the image source.
Executing Tensorflow.js model with ESP32-CAM
The last step is executing all the JavaScript code shown before. To achieve this, we simply trigger the classification process when the user clicks on the button:
<a href="#" class="btn" onclick="classifyImg()">Classify the image</a>
ESP32-CAM image capture to use with Tensorflow.js
We won't cover in detail the code that captures the image used by Tensorflow.js. The code is quite simple:
static esp_err_t capture_handler(httpd_req_t *req){
Serial.println("Capture image");
camera_fb_t * fb = NULL;
esp_err_t res = ESP_OK;
fb = esp_camera_fb_get();
if (!fb) {
Serial.println("Camera capture failed");
httpd_resp_send_500(req);
return ESP_FAIL;
}
httpd_resp_set_type(req, "image/jpeg");
httpd_resp_set_hdr(req, "Content-Disposition", "inline; filename=capture.jpg");
httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");
res = httpd_resp_send(req, (const char *)fb->buf, fb->len);
esp_camera_fb_return(fb);
return res;
}
Implementing ESP32-CAM tasks to run Tensorflow.js
Finally, it is necessary to expose three resources that the HTML page invokes to run Tensorflow.js image classification with the ESP32-CAM:
void startCameraServer(){
httpd_config_t config = HTTPD_DEFAULT_CONFIG();
httpd_uri_t index_uri = {
.uri = "/",
.method = HTTP_GET,
.handler = stream_handler,
.user_ctx = NULL
};
httpd_uri_t page_uri = {
.uri = "/ts",
.method = HTTP_GET,
.handler = page_handler,
.user_ctx = NULL
};
httpd_uri_t capture_uri = {
.uri = "/capture",
.method = HTTP_GET,
.handler = capture_handler,
.user_ctx = NULL
};
Serial.printf("Starting web server on port: '%d'\n", config.server_port);
if (httpd_start(&camera_httpd, &config) == ESP_OK) {
httpd_register_uri_handler(camera_httpd, &capture_uri);
httpd_register_uri_handler(camera_httpd, &page_uri);
}
// start stream using another webserver
config.server_port += 1;
config.ctrl_port += 1;
Serial.printf("Starting stream server on port: '%d'\n", config.server_port);
if (httpd_start(&stream_httpd, &config) == ESP_OK) {
httpd_register_uri_handler(stream_httpd, &index_uri);
}
}
Notice that there are three different resources:
- /ts, which serves the HTML page that integrates the ESP32-CAM with Tensorflow.js
- /capture, which is used to capture a single image
- /, on port 81, which streams the video
If you are wondering why we use two different ports, one for the video stream and one for the image capture: the streaming handler never returns (it keeps sending multipart frames in a loop), so it would block any other request on the same web server. Running a second HTTP server instance on port 81 keeps the capture and page resources responsive.
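To make the port layout explicit, the three client-side URLs can be derived from the camera's IP address; this helper is just an illustration of the setup described above (port 80 assumed for the first server, as with HTTPD_DEFAULT_CONFIG):

```javascript
// Derive the three endpoints exposed by the sketch from the camera's IP.
// Port 80 serves the HTML page and the still capture; port 81 (the second
// HTTP server instance) serves the multipart video stream.
function cameraEndpoints(ip) {
  return {
    page: 'http://' + ip + '/ts',
    capture: 'http://' + ip + '/capture',
    stream: 'http://' + ip + ':81/',
  };
}
```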
If you are interested in using ESP32-CAM you can read how to use ESP32-CAM with Telegram to send images.
Full source code to run ESP32-CAM with Tensorflow.js
This is the full source code that you can use to run Tensorflow.js with ESP32-CAM in order to classify images:
#include <Arduino.h>
#include <WiFi.h>
#include "esp_http_server.h"
#include "esp_timer.h"
#include "esp_camera.h"
#include "img_converters.h"
#include "Arduino.h"
#include "camera_pins.h"
#include "page.h"
#define PART_BOUNDARY "123456789000000000000987654321"
static const char* _STREAM_CONTENT_TYPE = "multipart/x-mixed-replace;boundary=" PART_BOUNDARY;
static const char* _STREAM_BOUNDARY = "\r\n--" PART_BOUNDARY "\r\n";
static const char* _STREAM_PART = "Content-Type: image/jpeg\r\nContent-Length: %u\r\n\r\n";
httpd_handle_t camera_httpd = NULL;
httpd_handle_t stream_httpd = NULL;
const char* ssid = "<your_ssid>";
const char* password = "your_wifi_password";
static esp_err_t capture_handler(httpd_req_t *req){
Serial.println("Capture image");
camera_fb_t * fb = NULL;
esp_err_t res = ESP_OK;
fb = esp_camera_fb_get();
if (!fb) {
Serial.println("Camera capture failed");
httpd_resp_send_500(req);
return ESP_FAIL;
}
httpd_resp_set_type(req, "image/jpeg");
httpd_resp_set_hdr(req, "Content-Disposition", "inline; filename=capture.jpg");
httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");
res = httpd_resp_send(req, (const char *)fb->buf, fb->len);
esp_camera_fb_return(fb);
return res;
}
static esp_err_t page_handler(httpd_req_t *req) {
httpd_resp_set_type(req, "text/html");
httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");
// use strlen() so the string's trailing NUL is not sent, and return the result
return httpd_resp_send(req, page, strlen(page));
}
static esp_err_t stream_handler(httpd_req_t *req){
camera_fb_t * fb = NULL;
esp_err_t res = ESP_OK;
size_t _jpg_buf_len = 0;
uint8_t * _jpg_buf = NULL;
char part_buf[64];
res = httpd_resp_set_type(req, _STREAM_CONTENT_TYPE);
if(res != ESP_OK){
return res;
}
httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");
while(true){
fb = esp_camera_fb_get();
if (!fb) {
Serial.println("Camera capture failed");
res = ESP_FAIL;
} else {
if(fb->format != PIXFORMAT_JPEG){
bool jpeg_converted = frame2jpg(fb, 80, &_jpg_buf, &_jpg_buf_len);
esp_camera_fb_return(fb);
fb = NULL;
if(!jpeg_converted){
Serial.println("JPEG compression failed");
res = ESP_FAIL;
}
} else {
_jpg_buf_len = fb->len;
_jpg_buf = fb->buf;
}
}
if(res == ESP_OK){
res = httpd_resp_send_chunk(req, _STREAM_BOUNDARY, strlen(_STREAM_BOUNDARY));
}
if(res == ESP_OK){
size_t hlen = snprintf((char *)part_buf, 64, _STREAM_PART, _jpg_buf_len);
res = httpd_resp_send_chunk(req, (const char *)part_buf, hlen);
}
if(res == ESP_OK){
res = httpd_resp_send_chunk(req, (const char *)_jpg_buf, _jpg_buf_len);
}
if(fb){
esp_camera_fb_return(fb);
fb = NULL;
_jpg_buf = NULL;
} else if(_jpg_buf){
free(_jpg_buf);
_jpg_buf = NULL;
}
if(res != ESP_OK){
break;
}
}
return res;
}
void startCameraServer(){
httpd_config_t config = HTTPD_DEFAULT_CONFIG();
httpd_uri_t index_uri = {
.uri = "/",
.method = HTTP_GET,
.handler = stream_handler,
.user_ctx = NULL
};
httpd_uri_t page_uri = {
.uri = "/ts",
.method = HTTP_GET,
.handler = page_handler,
.user_ctx = NULL
};
httpd_uri_t capture_uri = {
.uri = "/capture",
.method = HTTP_GET,
.handler = capture_handler,
.user_ctx = NULL
};
Serial.printf("Starting web server on port: '%d'\n", config.server_port);
if (httpd_start(&camera_httpd, &config) == ESP_OK) {
httpd_register_uri_handler(camera_httpd, &capture_uri);
httpd_register_uri_handler(camera_httpd, &page_uri);
}
// start stream using another webserver
config.server_port += 1;
config.ctrl_port += 1;
Serial.printf("Starting stream server on port: '%d'\n", config.server_port);
if (httpd_start(&stream_httpd, &config) == ESP_OK) {
httpd_register_uri_handler(stream_httpd, &index_uri);
}
}
void setup() {
Serial.begin(9600);
Serial.setDebugOutput(true);
Serial.println();
camera_config_t config;
config.ledc_channel = LEDC_CHANNEL_0;
config.ledc_timer = LEDC_TIMER_0;
config.pin_d0 = Y2_GPIO_NUM;
config.pin_d1 = Y3_GPIO_NUM;
config.pin_d2 = Y4_GPIO_NUM;
config.pin_d3 = Y5_GPIO_NUM;
config.pin_d4 = Y6_GPIO_NUM;
config.pin_d5 = Y7_GPIO_NUM;
config.pin_d6 = Y8_GPIO_NUM;
config.pin_d7 = Y9_GPIO_NUM;
config.pin_xclk = XCLK_GPIO_NUM;
config.pin_pclk = PCLK_GPIO_NUM;
config.pin_vsync = VSYNC_GPIO_NUM;
config.pin_href = HREF_GPIO_NUM;
config.pin_sscb_sda = SIOD_GPIO_NUM;
config.pin_sscb_scl = SIOC_GPIO_NUM;
config.pin_pwdn = PWDN_GPIO_NUM;
config.pin_reset = RESET_GPIO_NUM;
config.xclk_freq_hz = 20000000;
config.pixel_format = PIXFORMAT_JPEG;
// if PSRAM IC present, init with UXGA resolution and higher JPEG quality
// for larger pre-allocated frame buffer.
if(psramFound()){
config.frame_size = FRAMESIZE_QVGA;
config.jpeg_quality = 10;
config.fb_count = 2;
} else {
config.frame_size = FRAMESIZE_QVGA;
config.jpeg_quality = 12;
config.fb_count = 1;
}
#if defined(CAMERA_MODEL_ESP_EYE)
pinMode(13, INPUT_PULLUP);
pinMode(14, INPUT_PULLUP);
#endif
// camera init
esp_err_t err = esp_camera_init(&config);
if (err != ESP_OK) {
Serial.printf("Camera init failed with error 0x%x", err);
return;
}
sensor_t * s = esp_camera_sensor_get();
// initial sensors are flipped vertically and colors are a bit saturated
if (s->id.PID == OV3660_PID) {
s->set_vflip(s, 1); // flip it back
s->set_brightness(s, 1); // up the brightness just a bit
s->set_saturation(s, -2); // lower the saturation
}
// drop down frame size for higher initial frame rate
s->set_framesize(s, FRAMESIZE_QVGA);
#if defined(CAMERA_MODEL_M5STACK_WIDE)
s->set_vflip(s, 1);
s->set_hmirror(s, 1);
#endif
WiFi.begin(ssid, password);
while (WiFi.status() != WL_CONNECTED) {
delay(500);
Serial.print(".");
}
Serial.println("");
Serial.println("WiFi connected");
startCameraServer();
Serial.print("Camera Ready! Use 'http://");
Serial.print(WiFi.localIP());
Serial.println("' to connect");
}
void loop() {
// put your main code here, to run repeatedly:
delay(10);
}
If you want to know more about this code you can read how to stream video using ESP32-CAM. If you like to display the image on your TFT display you can read how to use ESP32-CAM with TFT display.
Testing the image classification using ESP32-CAM and Tensorflow.js
Now we can upload the code into the ESP32-CAM and test how Tensorflow.js works with ESP32-CAM to recognize images.
Run the code and connect to
http://<your-esp32-cam-ip>/ts
Below are some examples:

Another example:

As you can see, Tensorflow.js works well with the ESP32-CAM.
If you have an Arduino and want to try edge Machine Learning you can read the tutorial how to use Machine Learning with Arduino.
Wrapping up
At the end of this tutorial, we have explored how to run Tensorflow.js with the ESP32-CAM: the device captures the images, and Tensorflow.js, running in the browser, classifies them.
This is a simple example demonstrating how we can use machine learning models with the ESP32-CAM.
Can I get the header file( “page.h”) ?
I pasted page.h here https://pastebin.com/embed_iframe/PAhx4PH4
Wow!! It works!
For beginners like me…. In the file "page.h" linked by P3T3 and Petr Blaha, modify the string "http://192.168.1.119" with your IP. With Arduino: "New sketch", "Save as…", go to the "libraries" folder (or it will conflict while compiling). Now copy the source code given above, adding "#define CAMERA_MODEL_AI_THINKER" before '#include "camera_pins.h"'. Modify "Serial.begin(9600)" to "Serial.begin(115200)" and "OV3660_PID" to "OV2640_PID" (my case).
Add two tabs, "page.h" and "camera_pins.h" (I used the one found in the CameraWebServer example of the Espressif Arduino libraries), to your sketch, compile, upload, open the serial monitor, choose a baud rate of 115200, press reset on the ESP32-CAM, and wait… then check that the IP address shown on the serial monitor is the same one used in the "page.h" tab. Now follow the end of the tutorial, open your browser and enjoy it!!
https://drive.google.com/file/d/1kCX4H3LjhBvIMoqUOMMGl68bp9HKAoE7/view?usp=sharing
Screw driver…
Thank you for your support!
I am getting the error below; how do I solve it?
fatal error: camera_pins.h: No such file or directory
#include "camera_pins.h"
^~~~~~~~~~~~~~~
compilation terminated.
exit status 1
Compilation error: camera_pins.h: No such file or directory