This tutorial describes how to build a machine learning system with Arduino, Edge Impulse and the Google keyword speech dataset. In more detail, we will use Arduino and machine learning to recognize a few keywords. We have already covered how to use Arduino with TensorFlow to recognize speech and how to use Arduino with Edge Impulse; in this tutorial, however, we take a different approach.
Arduino Keyword speech dataset: project overview
The goal of this Arduino machine learning tutorial is to train a machine learning model with Edge Impulse using the Google keyword speech dataset, and then deploy it on the Arduino Nano 33 BLE. In the previous post we created our own data by capturing voice samples with the Arduino Nano 33 BLE built-in microphone; now we want to train the model on a more complete dataset.
As you already know, the dataset used to train a machine learning model is crucial to the quality of the model.
Google keyword speech dataset
Before digging into the details of creating a machine learning model with Arduino Nano 33 BLE and Edge Impulse, it is useful to briefly describe this dataset. It contains single spoken words such as “Yes”, “No” and so on. It has 65000 wave files with different words pronounced by several people around the world. Every file is about 1 second long. You can find more information here.
How to create a speech dataset for Edge Impulse
Once you have downloaded the dataset, you could upload the wave files with their labels directly into Edge Impulse using the ingestion feature. Before doing so, however, the data has to be prepared so that it can be uploaded to Edge Impulse. For this purpose, I have created a Colab notebook that you can use.
Downloading the keyword speech dataset
The first step is downloading the file that holds the samples so that we can extract it and manipulate it:
```bash
!wget http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz
```
Next, we decompress it. The target directory must exist before tar can extract into it:
```bash
!mkdir -p ./data
!tar xvf speech_commands_v0.01.tar.gz -C ./data
```
When finished, the ./data directory should contain one folder per word, each holding the wave files for that word:
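The dataset description above says that every clip is about one second long. A quick way to check this for any extracted file is to read its header with Python's standard `wave` module. This is only a sketch: the helper name `wav_info` is hypothetical, and the path in the usage comment is a placeholder for any file in your ./data folder.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) of a PCM WAV file."""
    with wave.open(path, 'rb') as wf:
        rate = wf.getframerate()
        frames = wf.getnframes()
    return rate, frames / float(rate)


# Usage (placeholder path, replace with a real file from ./data):
# rate, seconds = wav_info('./data/one/some_file.wav')
# print(rate, "Hz,", seconds, "s")
```

Files that turn out shorter than one second are the ones that need padding later on.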
Prepare the data
Now you can select the words that Arduino will recognize using the machine learning model. You also have to define how many samples per word you will use to train the model:
```python
# Number of samples per word
num_samples = 1500
output_dir = './edgeimpulse_output_data'
words = ['one', 'two', 'three', 'four', 'noise', 'unknown']
```
Notice that in this example we want Arduino to recognize the numbers from one to four. There are also two other labels: noise and unknown. Noise is the background noise when we don’t pronounce any word, and unknown is a set of words outside the ones we want to recognize.
How to use noise and unknown word dataset from Edge Impulse
Even though the Google keyword dataset already has its own noise wave files, we will use the noise and unknown samples provided by Edge Impulse and add them to the data extracted from the Google dataset:
```bash
# Download the zip dataset file
!wget https://cdn.edgeimpulse.com/datasets/keywords2.zip
```
```bash
# Unzip the file
!unzip keywords2.zip -d ./edgeimpulse_dataset
```
Create the dataset for your words from Google keyword speech
It is time to create the dataset that contains the words we want to use with Arduino and on which we will train the machine learning model with Edge Impulse. One important aspect is that the dataset must be balanced. The Python code below selects the number of files specified previously, picking them at random from the Google keyword dataset. It also checks that each selected wave file has the right number of samples; otherwise it appends zeros to the end:
```python
import os
import random
import shutil

import librosa
import numpy as np
import soundfile as sf

TARGET_LEN = 16000  # sample rate (16 kHz) * clip length (1 s)

# Create the output directory with one sub-directory per word
for word in words:
    os.makedirs(os.path.join(output_dir, word), exist_ok=True)

# Initialize the random generator
random.seed()

for word in words:
    print("Selected word [" + word + "]")
    # NOTE: 'noise' and 'unknown' are not words of the Google dataset. This
    # assumes their folders from the Edge Impulse archive are available in ./data.
    src_dir = './data/' + word
    # Keep only the wave files
    file_list = [f for f in os.listdir(src_dir)
                 if os.path.splitext(f)[1].lower() == '.wav']
    random.shuffle(file_list)

    # Copy at most num_samples files from the origin directory to the output dir
    for filename in file_list[:num_samples]:
        src = os.path.join(src_dir, filename)
        dest = os.path.join(output_dir, word, word + '.' + filename)
        # Check that the file has the correct length
        s, sr = librosa.load(src, sr=16000, mono=True)
        if len(s) < TARGET_LEN:
            print("Padding the file...")
            s = np.append(s, np.zeros(TARGET_LEN - len(s)))
            sf.write(dest, s, 16000)
        else:
            print("Copy file from ", src, " to ", dest)
            shutil.copyfile(src, dest)

print("Finished!")
```
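The padding step is the part of this code that is easiest to get wrong, so it can help to isolate it. The following is a minimal sketch, not the notebook's own code: the helper name `fix_length` is hypothetical, and unlike the code above it also truncates clips that are longer than the target, which is an added assumption.

```python
import numpy as np

TARGET_LEN = 16000  # 16 kHz * 1 second, the same length used in the code above


def fix_length(samples, target_len=TARGET_LEN):
    """Zero-pad (or truncate) a 1-D signal to exactly target_len samples."""
    samples = np.asarray(samples, dtype=np.float32)
    if len(samples) < target_len:
        # np.pad fills with zeros by default (mode='constant')
        return np.pad(samples, (0, target_len - len(samples)))
    return samples[:target_len]


# Usage: a 0.5 s clip (8000 samples) becomes a full 1 s clip
short_clip = np.ones(8000, dtype=np.float32)
print(len(fix_length(short_clip)))
```

Padding with zeros adds silence at the end of the clip, so every sample fed to the model has the same length.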
Now your output_dir holds a balanced set of wave files for the words you have selected, including the noise and unknown labels. We will use this dataset to train a machine learning model in Edge Impulse. After training the model, we will deploy it on the Arduino Nano 33 BLE and use it to recognize keywords. At the end, you should have something like the picture below:
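Before uploading, it can be worth confirming that the output directory really is balanced. The sketch below counts the wave files per label and checks that all counts match. The helper names `samples_per_label` and `is_balanced` are hypothetical and not part of the original notebook.

```python
import os


def samples_per_label(output_dir):
    """Return {label: number_of_wav_files} for each sub-directory."""
    counts = {}
    for label in sorted(os.listdir(output_dir)):
        folder = os.path.join(output_dir, label)
        if os.path.isdir(folder):
            counts[label] = sum(
                1 for f in os.listdir(folder) if f.lower().endswith('.wav')
            )
    return counts


def is_balanced(counts):
    """True when every label has the same number of samples."""
    return len(set(counts.values())) <= 1


# Usage, assuming the output directory from the previous step exists
if os.path.isdir('./edgeimpulse_output_data'):
    counts = samples_per_label('./edgeimpulse_output_data')
    print(counts, "balanced:", is_balanced(counts))
```

If one label has fewer files than the others, lowering num_samples to the smallest count is one simple way to restore the balance.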
Uploading the samples to Edge Impulse
Now it is time to upload the samples to Edge Impulse using the ingestion feature. First install the Edge Impulse CLI, then upload the files of each word with its label:
```bash
!npm install -g --unsafe-perm edge-impulse-cli
```
```python
# API key of your Edge Impulse project
api_key = 'your_project_key'

for word in words:
    sample_dir = output_dir + '/' + word + '/*.wav'
    print("Uploading files from ", sample_dir)
    # The shell expands the *.wav pattern into the list of files to upload
    cmd = 'edge-impulse-uploader --api-key ' + api_key + ' --label ' + word + ' ' + sample_dir
    os.system(cmd)

print("Done!")
```
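As an alternative to passing a shell string to os.system, the same uploader call can be assembled as an argument list and run with `subprocess`, which avoids shell quoting issues and exposes the exit code. This is a sketch under assumptions: it uses only the `--api-key` and `--label` flags shown above, and the helper names `build_upload_command` and `upload_label` are hypothetical.

```python
import glob
import os
import subprocess


def build_upload_command(api_key, label, files):
    """Build the edge-impulse-uploader call as an argument list."""
    return ['edge-impulse-uploader', '--api-key', api_key,
            '--label', label, *files]


def upload_label(api_key, label, sample_dir):
    """Upload every .wav file in sample_dir; return True on success."""
    files = sorted(glob.glob(os.path.join(sample_dir, '*.wav')))
    if not files:
        print("No wave files found in", sample_dir)
        return False
    result = subprocess.run(build_upload_command(api_key, label, files))
    return result.returncode == 0


# Usage (requires the Edge Impulse CLI and a real project key):
# for word in words:
#     ok = upload_label(api_key, word, os.path.join(output_dir, word))
#     print(word, "uploaded" if ok else "failed")
```

Checking the return value makes a failed upload visible immediately, instead of discovering a missing label later in the Edge Impulse interface.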
That’s all. Now move to the Edge Impulse interface and check that your dataset was uploaded:
One important thing you have to do before going on is to rebalance your dataset.
Edge Impulse machine learning model
Let’s create the machine learning model. This is the model we will use to recognize keywords using the Arduino Nano 33 BLE. The picture shows the model:
Now you can extract features using MFCC from the wave files you have uploaded and train your model. This is the confusion matrix:
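The confusion matrix itself is produced by Edge Impulse after training. As a reminder of how such a matrix is built and read, here is a small sketch in plain Python. The labels and predictions in the usage lines are toy values made up for illustration, not results from this model, and the function name is hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, labels):
    """Rows are the true labels, columns are the predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, predicted in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[predicted]] += 1
    return matrix


# Toy data for illustration only
labels = ['one', 'two', 'noise']
truth = ['one', 'one', 'two', 'noise']
predicted = ['one', 'two', 'two', 'noise']
for label, row in zip(labels, confusion_matrix(truth, predicted, labels)):
    print(label, row)
```

Values on the diagonal are correct classifications. Off-diagonal values show which words the model confuses with each other, which is a hint about where more or cleaner samples may help.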
How to use the Edge Impulse machine learning model with Arduino
Once you have trained your model and are satisfied with the results, you can deploy it on your Arduino Nano 33 BLE. In this example, to combine machine learning with a physical object, we will connect the Arduino to a TFT LCD display (ST7735s). When you pronounce one of the trained words, the Arduino shows the result on the TFT display.
To connect the ST7735s to the Arduino, follow this schema:
|Arduino Nano 33 BLE||ST7735s|
How to display the inference result
Import the Arduino library exported from Edge Impulse into the Arduino IDE and modify it. Add the following lines:
```cpp
// TFT_CS, TFT_DC and TFT_RST must be defined to match your wiring
Adafruit_ST7735 tft = Adafruit_ST7735(TFT_CS, TFT_DC, TFT_RST);
```
Next, in the setup() method add this code:
```cpp
tft.initR(INITR_BLACKTAB);
```
Finally, in the
Note: If this is the first time you use the ST7735 TFT display, you have to install the Adafruit library first.
Below, the Arduino runs the machine learning model to recognize keywords:
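To show a word on the display, the sketch has to decide which label to report from the per-label scores returned by the classifier. That decision logic is illustrated here in Python, the same language as the notebook code, rather than Arduino C++. It is a sketch under assumptions: the function name, the 0.8 threshold and the example scores are hypothetical and would need tuning against the real model.

```python
def pick_keyword(scores, threshold=0.8, ignore=('noise', 'unknown')):
    """Return the best-scoring keyword, or None if nothing should be shown.

    scores maps each label to the classifier's confidence for that label.
    """
    label, score = max(scores.items(), key=lambda item: item[1])
    if label in ignore or score < threshold:
        return None
    return label


# Hypothetical scores for illustration
print(pick_keyword({'one': 0.91, 'two': 0.04, 'noise': 0.05}))
print(pick_keyword({'one': 0.50, 'two': 0.40, 'noise': 0.10}))
```

Ignoring the noise and unknown labels and requiring a minimum score keeps the display from flickering between words when nobody is speaking.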
In this tutorial, we have covered how to train a machine learning model using the Google keyword speech command dataset with Edge Impulse and Arduino. The trained model runs on the Arduino so that it can recognize spoken keyword commands. You can customize this model and use other words.