This tutorial covers how to use TinyML with the ESP32-CAM. It describes how to classify images with the ESP32-CAM using deep learning, with the machine learning model running directly on the device. To do this, it is necessary to create a machine learning model using TensorFlow Lite and shrink it so that it fits on the device. There are several ways to do this; this tutorial uses Edge Impulse, which simplifies all the steps. We will explore the power of TinyML with the ESP32-CAM to recognize and classify images.
How to use TinyML with ESP32-CAM
In order to use deep learning with the ESP32-CAM, so that it can classify images, there are several steps to follow:
- Find a dataset on which to train the model
- Manipulate the dataset if necessary
- Define the model architecture
- Train the model
- Develop the ESP32-CAM code to run the model
Edge Impulse helps us speed up the deep learning model definition and the training phase, producing a ready-to-use TinyML model that we can use with the ESP32-CAM. This model is based on TensorFlow Lite.
In this ESP32-CAM tutorial, we will use a dataset to recognize flowers. This project is still experimental and must be improved in several aspects; anyway, it provides a guide if you want to experiment with running a machine learning/deep learning model directly on your device.
Define the dataset to train the model to use with ESP32-CAM
There are several datasets we can use to train our TinyML model; you can pick one you like. As said, we want to classify flowers using the ESP32-CAM and deep learning, so we will look for a dataset that contains several flower species grouped into classes. Kaggle is a good starting point when you are looking for a dataset repository or for information about machine learning.
This is the dataset we will use to train the machine learning model for the ESP32-CAM. It contains 5 different flower classes:
- Daisy
- Dandelion
- Rose
- Sunflower
- Tulip
Before going on, it is necessary to create a Kaggle account.
Preparing the dataset to use with Edge Impulse
Before training the model, it is necessary to upload the data to Edge Impulse. To do this, you can install everything you need locally or you can use Google Colab; this tutorial uses the second option. This is the link where you can download the code.
First of all, we have to download the dataset from Kaggle:
!pip install -q kaggle
import os
os.environ['KAGGLE_USERNAME']='your_username'
os.environ['KAGGLE_KEY']='your_key'
# Flower dataset
!kaggle datasets download -d alxmamaev/flowers-recognition
!unzip flowers-recognition.zip
# Fruit dataset
!kaggle datasets download -d moltean/fruits
!unzip fruits.zip
Code language: Bash (bash)
Notice that we download two different datasets: one that contains flowers and another that contains fruits. When we train a model, it is useful to add a class that differs from the classes we want to recognize, so that the classifier is not forced to label everything as a flower. These datasets contain the classes we will use on the ESP32-CAM to classify images using TinyML.
I won’t cover all the details about creating and uploading the dataset to Edge Impulse because it is very simple; you can refer to the Colab code. Generally speaking, we define the number of samples we want to use to train our deep learning model, and the code uploads them to Edge Impulse.
In the end, depending on the number of samples you have configured, on the Edge Impulse side you will have:

Defining the model and training it to classify images
Once the data is ready, we can define the model we will use to recognize images with the ESP32-CAM. Below is the model in Edge Impulse:

There are some aspects to notice:
- As model input, we will use a 48×48 pixel image. This is an important aspect: keep in mind that the ESP32-CAM (and, in general, devices like the ESP32) has a limited amount of memory, so we have to reduce the image size. If we use an RGB image, the number of features the ESP32-CAM has to handle is 48x48x3. You can easily understand that, by increasing the image size, the model won’t fit into the ESP32-CAM (see the short sketch after this list).
- We use transfer learning to train the model. We will cover it later.
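To make the memory constraint concrete, here is a quick back-of-the-envelope sketch (my own illustration, not part of the Edge Impulse output): it prints how fast the number of input features grows with the image size, assuming each feature is stored as a 4-byte float.
#include <cstdio>

int main() {
    // each RGB pixel contributes 3 features (R, G, B)
    const int sizes[] = {48, 96, 160};
    for (int s : sizes) {
        long features = (long)s * s * 3;
        // Edge Impulse stores features as float32, i.e. 4 bytes each
        printf("%3dx%-3d -> %6ld features (~%3ld KB as float32)\n",
               s, s, features, features * 4 / 1024);
    }
    return 0;
}
Code language: C++ (cpp)
A 48×48 RGB input already needs about 27 KB just for the feature buffer; at 96×96 it is over 100 KB, which starts to be problematic on a device with only a few hundred KB of usable RAM.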
Training the machine learning model using Transfer learning
After the features are extracted, we can train the model. These are the parameters used to train the model:

Notice that we have used MobileNetV2 0.05 because the model must fit into the device memory (0.05 is the width multiplier, which shrinks the number of filters in every layer). The confusion matrix is shown below:

The model accuracy is 77%. Of course, we should improve it somehow, but for this project it is enough.
The last step is model quantization, after which we can finally download the library to use with the ESP32-CAM. Quantization converts the model weights from 32-bit floats to 8-bit integers, shrinking the model by roughly a factor of four at the cost of a little accuracy. The library contains everything we need to run image classification on the ESP32-CAM.
How to run image classification on the ESP32-CAM using deep learning
Now it is time to implement the code on the ESP32-CAM device to run the classification model using deep learning. To do this, we can start from the static buffer example shipped with the library (sketched after the list below). It is necessary to modify the sample code so that we can:
- acquire the image
- adapt the image size to the dataset
- run the classification process
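For reference, the core of the static buffer example generated by Edge Impulse looks roughly like this (a condensed sketch; the header name is hypothetical, since it is generated per project):
#include <your_project_inferencing.h> // hypothetical name: the header is generated per Edge Impulse project

// in the real example, the raw features of one sample are pasted here from Edge Impulse Studio
static const float features[] = { 0 };

// callback the classifier uses to pull slices of the feature buffer
int raw_feature_get_data(size_t offset, size_t length, float *out_ptr) {
    memcpy(out_ptr, features + offset, length * sizeof(float));
    return 0;
}

void setup() {
    Serial.begin(115200);
}

void loop() {
    // wrap the feature buffer in a signal_t and run the classifier on it
    signal_t signal;
    signal.total_length = sizeof(features) / sizeof(features[0]);
    signal.get_data = &raw_feature_get_data;
    ei_impulse_result_t result = { 0 };
    EI_IMPULSE_ERROR res = run_classifier(&signal, &result, false);
    ei_printf("run_classifier returned: %d\n", res);
    delay(1000);
}
Code language: C++ (cpp)
In our version, the features array disappears and the get_data callback reads pixels straight from the camera frame buffer instead.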
The code is shown below:
#include <Arduino.h>
#include <WiFi.h>
#include "esp_http_server.h"
#include "esp_timer.h"
#include "img_converters.h"
#include "esp_camera.h"
#define CAMERA_MODEL_AI_THINKER // Has PSRAM
#include "camera_pins.h"
// include the inferencing library generated by Edge Impulse (the exact name depends on your project)
#include <-image_inference.h>
#include "esp_camera.h"
#define CAMERA_MODEL_AI_THINKER // Has PSRAM
#include "camera_pins.h"
// raw frame buffer from the camera
#define FRAME_BUFFER_COLS 240
#define FRAME_BUFFER_ROWS 240
#define CUTOUT_COLS EI_CLASSIFIER_INPUT_WIDTH
#define CUTOUT_ROWS EI_CLASSIFIER_INPUT_HEIGHT
const int cutout_row_start = (FRAME_BUFFER_ROWS - CUTOUT_ROWS) / 2;
const int cutout_col_start = (FRAME_BUFFER_COLS - CUTOUT_COLS) / 2;
#define PART_BOUNDARY "123456789000000000000987654321"
static const char* _STREAM_CONTENT_TYPE = "multipart/x-mixed-replace;boundary=" PART_BOUNDARY;
static const char* _STREAM_BOUNDARY = "\r\n--" PART_BOUNDARY "\r\n";
static const char* _STREAM_PART = "Content-Type: image/jpeg\r\nContent-Length: %u\r\n\r\n";
httpd_handle_t camera_httpd = NULL;
httpd_handle_t stream_httpd = NULL;
const char* ssid = "your_ssid";
const char* password = "your_wifi_pwd";
camera_fb_t * fb = NULL;
uint8_t * _jpg_buf = NULL;
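// expand a 16-bit RGB565 pixel into 8-bit R, G and B components
// (each 5/6-bit channel is shifted up to the top of its byte)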
void r565_to_rgb(uint16_t color, uint8_t *r, uint8_t *g, uint8_t *b) {
*r = (color & 0xF800) >> 8;
*g = (color & 0x07E0) >> 3;
*b = (color & 0x1F) << 3;
}
int cutout_get_data(size_t offset, size_t length, float *out_ptr) {
// offset and length operate on the *cutout*, so we need to map them back into the full frame buffer
size_t bytes_left = length;
size_t out_ptr_ix = 0;
// read byte for byte
while (bytes_left != 0) {
// find location of the byte in the cutout
size_t cutout_row = floor(offset / CUTOUT_COLS);
size_t cutout_col = offset - (cutout_row * CUTOUT_COLS);
// then read the value from the real frame buffer
size_t frame_buffer_row = cutout_row + cutout_row_start;
size_t frame_buffer_col = cutout_col + cutout_col_start;
uint16_t pixelTemp = ((uint16_t *)fb->buf)[(frame_buffer_row * FRAME_BUFFER_COLS) + frame_buffer_col]; // assumes an RGB565 frame (2 bytes per pixel)
uint16_t pixel = (pixelTemp >> 8) | (pixelTemp << 8); // swap the bytes: the pixel arrives big-endian
uint8_t r, g, b;
r565_to_rgb(pixel, &r, &g, &b);
float pixel_f = (r << 16) + (g << 8) + b;
out_ptr[out_ptr_ix] = pixel_f;
out_ptr_ix++;
offset++;
bytes_left--;
}
// and done!
return 0;
}
void classify() {
ei_printf("Edge Impulse standalone inferencing (Arduino)\n");
ei_impulse_result_t result = { 0 };
// Convert to RGB888
//fmt2rgb888(fb->buf, fb->len, PIXFORMAT_RGB888, _jpg_buf);
//Serial.println("Signal...");
// Set up pointer to look after data, crop it and convert it to RGB888
signal_t signal;
signal.total_length = CUTOUT_COLS * CUTOUT_ROWS;
signal.get_data = &cutout_get_data;
// Feed signal to the classifier
EI_IMPULSE_ERROR res = run_classifier(&signal, &result, false /* debug */);
// the error code is returned in "res", while the predictions are stored in "result"
ei_printf("run_classifier returned: %d\n", res);
if (res != 0) return;
// print the predictions
ei_printf("Predictions ");
ei_printf("(DSP: %d ms., Classification: %d ms., Anomaly: %d ms.)",
result.timing.dsp, result.timing.classification, result.timing.anomaly);
ei_printf(": \n");
// Print short form result data
ei_printf("[");
for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
ei_printf("%.5f", result.classification[ix].value);
#if EI_CLASSIFIER_HAS_ANOMALY == 1
ei_printf(", ");
#else
if (ix != EI_CLASSIFIER_LABEL_COUNT - 1) {
ei_printf(", ");
}
#endif
}
#if EI_CLASSIFIER_HAS_ANOMALY == 1
ei_printf("%.3f", result.anomaly);
#endif
ei_printf("]\n");
// human-readable predictions
for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
ei_printf(" %s: %.5f\n", result.classification[ix].label, result.classification[ix].value);
}
#if EI_CLASSIFIER_HAS_ANOMALY == 1
ei_printf(" anomaly score: %.3f\n", result.anomaly);
#endif
}
static esp_err_t capture_handler(httpd_req_t *req){
Serial.println("Capture image");
esp_err_t res = ESP_OK;
fb = esp_camera_fb_get();
if (!fb) {
Serial.println("Camera capture failed");
httpd_resp_send_500(req);
return ESP_FAIL;
}
classify();
httpd_resp_set_type(req, "image/jpeg");
httpd_resp_set_hdr(req, "Content-Disposition", "inline; filename=capture.jpg");
httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");
res = httpd_resp_send(req, (const char *)fb->buf, fb->len);
esp_camera_fb_return(fb);
return res;
}
static esp_err_t page_handler(httpd_req_t *req) {
httpd_resp_set_type(req, "text/html");
httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");
// httpd_resp_send(req, page, sizeof(page));
return httpd_resp_send(req, "", 0); // the HTML page source is not included in this post
}
static esp_err_t stream_handler(httpd_req_t *req){
camera_fb_t * fb = NULL;
esp_err_t res = ESP_OK;
size_t _jpg_buf_len = 0;
uint8_t * _jpg_buf = NULL;
char part_buf[64];
res = httpd_resp_set_type(req, _STREAM_CONTENT_TYPE);
if(res != ESP_OK){
return res;
}
httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");
while(true){
fb = esp_camera_fb_get();
if (!fb) {
Serial.println("Camera capture failed");
res = ESP_FAIL;
} else {
if(fb->format != PIXFORMAT_JPEG){
bool jpeg_converted = frame2jpg(fb, 80, &_jpg_buf, &_jpg_buf_len);
esp_camera_fb_return(fb);
fb = NULL;
if(!jpeg_converted){
Serial.println("JPEG compression failed");
res = ESP_FAIL;
}
} else {
_jpg_buf_len = fb->len;
_jpg_buf = fb->buf;
}
}
if(res == ESP_OK){
res = httpd_resp_send_chunk(req, _STREAM_BOUNDARY, strlen(_STREAM_BOUNDARY));
}
if(res == ESP_OK){
size_t hlen = snprintf((char *)part_buf, 64, _STREAM_PART, _jpg_buf_len);
res = httpd_resp_send_chunk(req, (const char *)part_buf, hlen);
}
if(res == ESP_OK){
res = httpd_resp_send_chunk(req, (const char *)_jpg_buf, _jpg_buf_len);
}
if(fb){
esp_camera_fb_return(fb);
fb = NULL;
_jpg_buf = NULL;
} else if(_jpg_buf){
free(_jpg_buf);
_jpg_buf = NULL;
}
if(res != ESP_OK){
break;
}
}
return res;
}
void startCameraServer(){
httpd_config_t config = HTTPD_DEFAULT_CONFIG();
httpd_uri_t index_uri = {
.uri = "/",
.method = HTTP_GET,
.handler = stream_handler,
.user_ctx = NULL
};
httpd_uri_t capture_uri = {
.uri = "/capture",
.method = HTTP_GET,
.handler = capture_handler,
.user_ctx = NULL
};
Serial.printf("Starting web server on port: '%d'\n", config.server_port);
if (httpd_start(&camera_httpd, &config) == ESP_OK) {
httpd_register_uri_handler(camera_httpd, &capture_uri);
//httpd_register_uri_handler(camera_httpd, &page_uri);
}
// start stream using another webserver
config.server_port += 1;
config.ctrl_port += 1;
Serial.printf("Starting stream server on port: '%d'\n", config.server_port);
if (httpd_start(&stream_httpd, &config) == ESP_OK) {
httpd_register_uri_handler(stream_httpd, &index_uri);
}
}
void setup() {
Serial.begin(9600);
Serial.setDebugOutput(true);
Serial.println();
camera_config_t config;
config.ledc_channel = LEDC_CHANNEL_0;
config.ledc_timer = LEDC_TIMER_0;
config.pin_d0 = Y2_GPIO_NUM;
config.pin_d1 = Y3_GPIO_NUM;
config.pin_d2 = Y4_GPIO_NUM;
config.pin_d3 = Y5_GPIO_NUM;
config.pin_d4 = Y6_GPIO_NUM;
config.pin_d5 = Y7_GPIO_NUM;
config.pin_d6 = Y8_GPIO_NUM;
config.pin_d7 = Y9_GPIO_NUM;
config.pin_xclk = XCLK_GPIO_NUM;
config.pin_pclk = PCLK_GPIO_NUM;
config.pin_vsync = VSYNC_GPIO_NUM;
config.pin_href = HREF_GPIO_NUM;
config.pin_sscb_sda = SIOD_GPIO_NUM;
config.pin_sscb_scl = SIOC_GPIO_NUM;
config.pin_pwdn = PWDN_GPIO_NUM;
config.pin_reset = RESET_GPIO_NUM;
config.xclk_freq_hz = 20000000;
config.pixel_format = PIXFORMAT_JPEG; // note: cutout_get_data assumes RGB565 pixels; try PIXFORMAT_RGB565 if the predictions look wrong
// if PSRAM IC present, init with UXGA resolution and higher JPEG quality
// for larger pre-allocated frame buffer.
if(psramFound()){
config.frame_size = FRAMESIZE_240X240;
config.jpeg_quality = 10;
config.fb_count = 2;
} else {
config.frame_size = FRAMESIZE_240X240;
config.jpeg_quality = 12;
config.fb_count = 1;
}
#if defined(CAMERA_MODEL_ESP_EYE)
pinMode(13, INPUT_PULLUP);
pinMode(14, INPUT_PULLUP);
#endif
// camera init
esp_err_t err = esp_camera_init(&config);
if (err != ESP_OK) {
Serial.printf("Camera init failed with error 0x%x", err);
return;
}
sensor_t * s = esp_camera_sensor_get();
// initial sensors are flipped vertically and colors are a bit saturated
if (s->id.PID == OV3660_PID) {
s->set_vflip(s, 1); // flip it back
s->set_brightness(s, 1); // up the brightness just a bit
s->set_saturation(s, 0); // lower the saturation
}
// drop down frame size for higher initial frame rate
s->set_framesize(s, FRAMESIZE_240X240);
#if defined(CAMERA_MODEL_M5STACK_WIDE)
s->set_vflip(s, 1);
s->set_hmirror(s, 1);
#endif
WiFi.begin(ssid, password);
while (WiFi.status() != WL_CONNECTED) {
delay(500);
Serial.print(".");
}
Serial.println("");
Serial.println("WiFi connected");
startCameraServer();
Serial.print("Camera Ready! Use 'http://");
Serial.print(WiFi.localIP());
Serial.println("' to connect");
}
void loop() {
// put your main code here, to run repeatedly:
delay(10);
}
Code language: C++ (cpp)
The code includes a web server so that you can run the classification from a web interface (the HTML source code is not included). Moreover, it implements a video streaming server that sends the video to the web page: the capture endpoint is registered at /capture on the first server (port 80 by default), while the MJPEG stream is served at / by a second server on the next port (81).
More useful resources:
Run Tensorflow.js with ESP32-CAM
ESP32 Tensorflow Microspeech with Google Dataset
ESP32 KNN classifier
How to use Tensorflow lite micro with ESP32-CAM
How to feed the model with ESP32-CAM pictures
The first thing is to adapt the image captured by the ESP32-CAM so that we can pass it to the model we trained before. There are two aspects to consider:
- the image size
- the color encoding
The model expects 48×48 input images. Even if the ESP32-CAM can take pictures at 96×96, I had some problems streaming the video to the web interface at that resolution. The best resolution, after some trials, is 240×240. It is a waste of resources, but it works for now. In the function cutout_get_data, a 48×48 cutout is extracted from the center of the 240×240 picture and each pixel is converted to RGB888. This method is described in the Edge Impulse forum; I have adapted it to the ESP32-CAM.
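Note that cutout_get_data crops the central 48×48 region instead of scaling the whole frame, so most of the picture is discarded. A possible improvement (my own sketch, not part of the tutorial code; the helper name is hypothetical) is a nearest-neighbour downscale that samples the full frame:
#include <stdint.h>

// nearest-neighbour downscale of an RGB565 frame to the model input size
// (hypothetical helper: src is assumed to point at a full RGB565 frame)
void downscale_rgb565(const uint16_t *src, int src_w, int src_h,
                      uint16_t *dst, int dst_w, int dst_h) {
    for (int y = 0; y < dst_h; y++) {
        int sy = y * src_h / dst_h;     // nearest source row
        for (int x = 0; x < dst_w; x++) {
            int sx = x * src_w / dst_w; // nearest source column
            dst[y * dst_w + x] = src[sy * src_w + sx];
        }
    }
}
Code language: C++ (cpp)
The classifier would then read the 48×48 destination buffer instead of cropping the original frame, so the whole scene contributes to the prediction.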
Implement the classification model on the ESP32-CAM
The last step is running the classification process on the ESP32-CAM. In the example above, the classification is triggered from the web interface, but you can easily modify the code if you don’t want to use it. It is important to notice this piece of code:
signal_t signal;
signal.total_length = CUTOUT_COLS * CUTOUT_ROWS;
signal.get_data = &cutout_get_data;
Code language: C++ (cpp)
This is where we register the callback that adapts the image size and the color encoding: run_classifier pulls the pixels through cutout_get_data while it feeds the model.
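As a reference, the Edge Impulse image signal expects each pixel packed as 0xRRGGBB inside a float, which is exactly what cutout_get_data produces with (r << 16) + (g << 8) + b. A minimal illustration of the packing (my own example, runnable on any host):
#include <cstdint>
#include <cstdio>

// pack an RGB888 pixel into the float representation Edge Impulse expects;
// integers up to 0xFFFFFF fit exactly in a float32, so no precision is lost
static float pack_rgb(uint8_t r, uint8_t g, uint8_t b) {
    return (float)(((uint32_t)r << 16) | ((uint32_t)g << 8) | b);
}

int main() {
    float pixel = pack_rgb(0x12, 0x34, 0x56);
    printf("packed pixel: 0x%06X\n", (unsigned)pixel); // prints 0x123456
    return 0;
}
Code language: C++ (cpp)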
Below is the simple web interface:

and this is the result:

Wrapping up
At the end of this post, we have trained a simple machine learning model that recognizes flowers, and we are able to run it directly on the ESP32-CAM. The ESP32-CAM captures the image and then classifies it, running a TinyML model on the device. This project is experimental and there are several aspects to improve: the model accuracy and the way the image is resized and adapted. Anyway, it can be a starting point if you want to explore how to classify images using the ESP32-CAM, with the inference process running on the device.
Hi,
What if I just want to do it on the device only, without linking it to the internet? What should I modify? Also, is the model header name -image_inference.h?
Sorry, I am still new to TinyML, so your kindness and help are highly appreciated. Thank you.
You have to remove all the parts regarding the web server and the WiFi: the HTTP handlers, the stream handler, and so on.
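For example, a minimal offline loop could look like the sketch below (my own sketch: it reuses the global fb and the classify() function from the tutorial code, and assumes the camera initialization from setup() is kept):
void loop() {
    fb = esp_camera_fb_get();        // grab a frame from the camera
    if (fb) {
        classify();                  // run the Edge Impulse model on the frame
        esp_camera_fb_return(fb);    // hand the buffer back to the driver
        fb = NULL;
    }
    delay(2000);                     // classify roughly every two seconds
}
Code language: C++ (cpp)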
Hi,
Why did I encounter with this issue regarding the FRAMESIZE in this part:
if(psramFound()){
config.frame_size = FRAMESIZE_240X240;
config.jpeg_quality = 10;
config.fb_count = 2;
} else {
config.frame_size = FRAMESIZE_240X240;
config.jpeg_quality = 12;
config.fb_count = 1;
}
esp32-classifier-v1:307: error: ‘FRAMESIZE_240X240’ was not declared in this scope
config.frame_size = FRAMESIZE_240X240;
Please help me out. Thanks a lot.
It looks strange. Try to check if there are updates for your boards: FRAMESIZE_240X240 was only added in a recent version of the esp32-camera driver, so an older board package may not define it.
Hi,
How to modify the code to be available for the image classification on the device (ESP32-CAM) only? Thanks in advance
It already works on the device: the inference process takes place on the ESP32-CAM.
Hi,
Which Platform IDE that you used, as I face an error regarding: The filename or extension is too long. when compiling in Arduino IDE. Please describe more on how you deploy to the microcontroller.
Thank you.
I’ve used PlatformIO but I guess it should work with Arduino IDE too.
Hi,
Thanks for the answers that you have provided. There are several questions from me:
1. How do you install the .zip file from the Edge Impulse? I use the command:
platformio lib install .zip file
How about you?
2. How do you upload the file to ESP32-CAM? I am currently using Arduino Uno, therefore my platformio.ini has
[env:esp-wrover-kit]
platform = espressif32
board = esp-wrover-kit
framework = arduino
monitor_speed = 115200
build_flags = -DBOARD_HAS_PSRAM -mfix-esp32-psram-cache-issue
board_build.partitions = huge_app.csv
board_build.f_flash = 40000000L
board_build.flash_mode = qio
lib_deps =
espressif/esp32-camera@^1.0.0
D:/DegreeFinalYearProject/flood_classifier/survivingwithandroid/ei-esp32-cam-arduino-1.0.1.zip
Is this the right way to do it as I am using Arduino Uno to upload it?
3. Currently, I have built the main.cpp file successfully, but not sure it is working or not. But, when I try to upload it to ESP32-CAM, I have this issue :
A fatal error occurred: Failed to connect to ESP32: Timed out waiting for packet header
What should I do now? I pressed the RST button on the ESP32-CAM when I saw uploading…
These are my questions. Sorry for disturbing you for the whole day. By the way, I highly appreciate the help you have given me. I have currently moved from Arduino IDE to PlatformIO, as I successfully got rid of the "filename or extension is too long" error. I think that is a limitation of the Arduino IDE.
Hi,
Thank you for answering my questions. I highly appreciate it. However, I have few more questions to ask:
1. How do you load the folder from the Edge Impulse into PlatformIO? Personally, I used this line in CLI:
platformio lib install address_to_the_folder.zip
Also, I generated this file by choosing Arduino option when it comes to deploy on Edge Impulse.
2. Did you add the src folder from the .zip folder into c_cpp_properties.json (located in .vscode)? I am not sure as I face issue before having the src folder in this .json file. How about you?
3. In platform.ini file, I have this configuration:
[env:esp-wrover-kit]
platform = espressif32
board = esp-wrover-kit
framework = arduino
monitor_speed = 115200
build_flags = -DBOARD_HAS_PSRAM -mfix-esp32-psram-cache-issue
board_build.partitions = huge_app.csv
board_build.f_flash = 40000000L
board_build.flash_mode = qio
lib_deps =
espressif/esp32-camera@^1.0.0
D:/DegreeFinalYearProject/flood_classifier/survivingwithandroid/ei-esp32-cam-arduino-1.0.1.zip
I set it as esp-wrover-kit as I upload the code to ESP32-CAM using Arduino UNO via the connection of TX & RX. How about you? Did I correctly configure it?
4. Lastly, I have an issue when uploading the .cpp to the ESP32-CAM (the file is successfully built):
Failed to connect to ESP32: Timed out waiting for packet header
I have pressed RST button when I saw ‘connecting…’ . So, what should I do to solve this issue?
Sorry for disturbing you a lot. Thanks in advance. Hopefully, you can help me out.
Thank you.
That's a known Windows Arduino issue; you can install the patch from here:
https://docs.edgeimpulse.com/docs/running-your-impulse-arduino#code-compiling-fails-under-windows-os
Thank you. I manage to get it~
Hi, Sir.
I have an issue now, it is regarding that my model is facing this problem.
These are the info shown in Serial Monitor (I use PlatformIO and Visual Studio Code):
Capture image
Edge Impulse standalone inferencing (Arduino)
run_classifier returned: 0
Predictions (DSP: 2 ms., Classification: 449 ms., Anomaly: 0 ms.):
[0.00000, 0.99609]
Flood: 0.00000
Non-Flood: 0.99609
Capture image
Edge Impulse standalone inferencing (Arduino)
run_classifier returned: 0
Predictions (DSP: 3 ms., Classification: 449 ms., Anomaly: 0 ms.):
[0.00000, 0.99609]
Flood: 0.00000
Non-Flood: 0.99609
Capture image
Edge Impulse standalone inferencing (Arduino)
run_classifier returned: 0
Predictions (DSP: 3 ms., Classification: 448 ms., Anomaly: 0 ms.):
[0.00000, 0.99609]
Flood: 0.00000
Non-Flood: 0.99609
Capture image
Edge Impulse standalone inferencing (Arduino)
run_classifier returned: 0
Predictions (DSP: 2 ms., Classification: 445 ms., Anomaly: 0 ms.):
[0.00000, 0.99609]
Flood: 0.00000
Non-Flood: 0.99609
It seems like the result is stuck…. Any suggestion to improve its performance? Thank you.
“I won’t cover all the details about creating and uploading the dataset to Edge Impulse because it is very simple”.
Sir please can you give the details or the colab code for that, I couldn’t find it on the post.
Thank you
Take a look here: https://github.com/survivingwithandroid/EdeImpulse-fruit-recognition-ESP32-CAM
can u guys plz help
this is frustrating on arduino IDE
got a {this URL dosnt exist} error
the url provided by the serial monitor
and i am new to platformIO
if u can send the platformIO files ! thanx
msg please
ahmedali50710@gmail.com
how to add and include the zip.file edge impulse library to the platform io
i was using arduino IDE and didnt work and switched to platform io
I cannot see any web Interface here please provide modified app_httpd.cpp file