TinyML – Machine Learning on Microcontrollers (2025 Hands-On Guide)

1. What Is TinyML & Why Now?

TinyML brings machine learning inference to microcontrollers (MCUs) that run on milliwatts—or even microwatts—of power. Unlike SBCs (e.g., Raspberry Pi 4), MCUs have kilobytes to a few megabytes of RAM and no operating system, yet can perform keyword spotting, anomaly detection, and simple vision tasks offline with real-time latency.

Benefits: instant response, privacy (no cloud), low bandwidth, long battery life, low BOM cost, and improved reliability in poor connectivity environments.

2. Hardware: Boards, MCUs & Sensors

Popular Boards

  • Arduino Nano 33 BLE Sense (nRF52840, 1 MB Flash, 256 KB RAM; on-board mic/IMU/temp)
  • ESP32-S3 (Xtensa dual-core with vector accel; Wi-Fi/BLE; more RAM via PSRAM)
  • STM32 (Cortex-M4/M7; DSP extensions; wide ecosystem)
  • Raspberry Pi Pico (RP2040) (dual-core M0+; PIO; add external sensors)

Sensor Choices

  • Audio mic (keyword spotting, cough detection)
  • IMU (gesture, activity, machinery vibration)
  • Environmental (temp, humidity, gas—contextual features)
  • Low-res camera (basic motion/object presence)

Selection Criteria

  • RAM/Flash headroom vs. model size
  • Vector/DSP support (CMSIS-NN, ESP-NN)
  • Power modes (deep sleep, ULP co-processor)
  • I/O & radios (BLE/Wi-Fi) for event streaming

3. Workflow: Data → Features → Model → Firmware

  1. Collect: Record representative data on the target sensor at production sampling rates.
  2. Label & Split: Train/val/test; keep a hold-out from a different day/device.
  3. Feature Engineering: MFCCs for audio; spectral stats for vibration; simple image downsampling (e.g., 96×96 grayscale).
  4. Model: Choose MCU-friendly architectures; keep parameters small.
  5. Optimize: int8 quantization; optional pruning/distillation.
  6. Package: Convert to TFLite Micro or ONNX and generate a C array for Flash.
  7. Deploy: Integrate with RTOS/loop; add ring buffers and state smoothing.
  8. Measure: RAM/Flash usage, latency, accuracy, and current draw.
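Step 6's "C array" is simply the model binary embedded in the firmware image so the linker can place it in Flash. A sketch of what a converter such as `xxd -i model.tflite` emits — the byte values here are illustrative and the body is elided:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical converter output: the model flatbuffer as a byte array.
// A real TFLite file carries the "TFL3" identifier at offset 4.
const unsigned char g_model[] = {
    0x1c, 0x00, 0x00, 0x00, 'T', 'F', 'L', '3',
    // ... remaining model bytes elided ...
};
const size_t g_model_len = sizeof(g_model);
```

The firmware then hands `g_model`/`g_model_len` to the runtime at init, with no filesystem or dynamic loading involved.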

4. Model Architectures That Fit in KBs

Classical ML

  • Logistic regression / linear SVM on MFCC or spectral features
  • Random Forest (shallow) for vibrations/IMU
  • K-NN with prototype compression
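To see how cheap these classical models are at inference time: a linear classifier over int8 features reduces to one fixed-point dot product plus a threshold. A minimal sketch (function names like `dot_i8` are illustrative, not a library API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A linear model (logistic regression / linear SVM) over int8 features is
// a single dot product. Accumulate in int32 to avoid overflow, as the
// CMSIS-NN kernels do.
int32_t dot_i8(const int8_t* w, const int8_t* x, size_t n, int32_t bias) {
    int32_t acc = bias;
    for (size_t i = 0; i < n; ++i) acc += int32_t(w[i]) * int32_t(x[i]);
    return acc;
}

// Decision rule: positive class when the score clears a tuned threshold.
bool classify(const int8_t* w, const int8_t* x, size_t n,
              int32_t bias, int32_t threshold) {
    return dot_i8(w, x, n, bias) > threshold;
}

// Worked example: w = {10, -5, 2}, x = {4, 2, 1} -> 40 - 10 + 2 = 32.
int32_t demo_score() {
    const int8_t w[3] = {10, -5, 2};
    const int8_t x[3] = {4, 2, 1};
    return dot_i8(w, x, 3, /*bias=*/0);
}
```

On an MFCC vector of a few dozen coefficients this is tens of multiply-accumulates per decision — effectively free next to the feature extraction itself.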

Tiny DNNs

  • DS-CNN for keyword spotting (depthwise-separable convs)
  • 1D CNN for IMU sequences
  • Tiny MobileNet for 96×96 grayscale vision
  • TCN (very small) for time-series patterns
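The parameter savings that make DS-CNNs practical are easy to verify: a depthwise-separable layer replaces a standard convolution's c_in·c_out·k weights with c_in·k (depthwise) + c_in·c_out (pointwise). A small sketch of the arithmetic:

```cpp
#include <cassert>
#include <cstdint>

// Parameter counts (ignoring biases) for a conv layer with kernel size k,
// c_in input channels and c_out output channels.
constexpr int64_t standard_conv_params(int64_t c_in, int64_t c_out, int64_t k) {
    return c_in * c_out * k;        // every filter spans every input channel
}

// Depthwise-separable: one k-tap filter per input channel (depthwise),
// then a 1x1 pointwise conv to mix channels.
constexpr int64_t ds_conv_params(int64_t c_in, int64_t c_out, int64_t k) {
    return c_in * k + c_in * c_out;
}
```

For a 64-to-64-channel layer with k = 3, that is 12,288 vs 4,288 weights — roughly a 3× saving per layer, and larger with bigger kernels.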

Anomaly Detection

  • Autoencoder on spectral features
  • One-Class SVM / Isolation Forest
  • Statistical thresholds with adaptive baselines

Rule of thumb: models with under ~100K parameters and int8 weights can run comfortably on many Cortex-M4F/M7-class MCUs.
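A minimal sketch of the third anomaly-detection approach — a statistical threshold with an adaptive baseline, tracking an exponentially weighted mean/variance and flagging samples more than k standard deviations out (the class name, alpha, and k are illustrative choices):

```cpp
#include <cassert>
#include <cmath>

// Adaptive-baseline detector for a scalar feature (e.g. vibration RMS).
// alpha controls how fast the baseline adapts; k sets the sensitivity.
struct AdaptiveThreshold {
    double mean = 0.0, var = 1.0;
    double alpha, k;
    AdaptiveThreshold(double alpha_, double k_) : alpha(alpha_), k(k_) {}

    // Returns true if x is anomalous relative to the current baseline.
    bool update(double x) {
        double d = x - mean;
        bool anomaly = std::fabs(d) > k * std::sqrt(var);
        // Only adapt on normal samples so an ongoing fault does not get
        // absorbed into the baseline.
        if (!anomaly) {
            mean += alpha * d;
            var  += alpha * (d * d - var);
        }
        return anomaly;
    }
};

// Demo: a quiet signal establishes the baseline; a large spike is flagged.
bool demo_flags_spike() {
    AdaptiveThreshold t(0.1, 4.0);
    for (int i = 0; i < 20; ++i) t.update(0.1 * (i % 2));
    return t.update(50.0);
}
```

This costs two state variables per feature, which is why it scales to MCUs with no room for a learned model at all.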

5. Quantization, Pruning & Distillation

  • Integer Quantization (int8): Converts weights/activations to 8-bit. Calibrate with a representative dataset.
  • Pruning: Remove small-magnitude weights, then fine-tune and re-quantize.
  • Knowledge Distillation: Train a small “student” to match a large “teacher.”
  • Operator Fusion & CMSIS-NN: Use optimized kernels for ARM; ESP-NN for ESP32.

Size/Speed Tradeoff (illustrative)

FP32 DS-CNN: 420 KB weights → int8: ~110 KB; latency ↓ ~3–5×; accuracy −0.5 to −1.5 pp
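The int8 scheme above is affine quantization, real ≈ scale × (q − zero_point), with scale and zero-point calibrated from a tensor's observed range. A simplified sketch of the round trip (TFLite's actual converter adds per-axis scales and other refinements):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct QParams { float scale; int32_t zero_point; };

// Derive scale/zero-point from a tensor's observed min/max (the
// "representative dataset" calibration step). The range must include zero.
QParams calibrate(float min_v, float max_v) {
    if (min_v > 0.0f) min_v = 0.0f;
    if (max_v < 0.0f) max_v = 0.0f;
    float range = max_v - min_v;
    float scale = (range > 0.0f) ? range / 255.0f : 1.0f;  // guard degenerate tensors
    int32_t zp = int32_t(std::round(-128.0f - min_v / scale));
    return {scale, zp};
}

int8_t quantize(float x, QParams p) {
    int32_t q = int32_t(std::round(x / p.scale)) + p.zero_point;
    if (q < -128) q = -128;
    if (q > 127)  q = 127;
    return int8_t(q);
}

float dequantize(int8_t q, QParams p) {
    return p.scale * float(int32_t(q) - p.zero_point);
}

// Round-trip error for a value in a calibrated [-1, 1] range.
float demo_roundtrip_error(float x) {
    QParams p = calibrate(-1.0f, 1.0f);
    return std::fabs(dequantize(quantize(x, p), p) - x);
}
```

The round-trip error is bounded by about half the scale, which is why calibrating on a representative range (rather than an inflated one) matters for accuracy.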

6. Deployment with TFLM, Edge Impulse & ONNX

TensorFlow Lite Micro (TFLM)

  • Compile model as a C array; no dynamic allocation by default.
  • Use arena sizing to control RAM; enable CMSIS-NN or ESP-NN.
  • Integrate with Arduino, Zephyr, FreeRTOS, or bare-metal loops.

Edge Impulse Studio

  • Collect/label data, design DSP blocks + models in GUI.
  • Auto-quantization and deployment to many boards.
  • Generates firmware and profiling reports.

ONNX Runtime for MCUs

  • Convert PyTorch/scikit-learn models; only a subset of operators is supported on embedded targets.
  • Static memory planning; mixed int8/FP16 on some targets.

7. Power, Latency & Memory Budgets

Typical budgets: latency ≤ 50 ms (audio); RAM 64–256 KB; Flash 256 KB–1 MB; sleep current in the µA range; active current in the mA range.
  • Duty Cycling: Wake on interrupt (sound level, IMU motion), process a short window, go back to sleep.
  • Cascaded Pipelines: Cheap heuristic gate → expensive model only on triggers.
  • Buffering: Use ring buffers for streaming windows; avoid heap fragmentation.
  • Fixed-point DSP: Prefer int16/int8 path; pre-compute FFT twiddles.
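The duty-cycling and cascading ideas combine naturally: run a cheap integer energy gate on each window and wake the expensive model only when it fires. A host-testable sketch (thresholds and names are illustrative; on real hardware the MCU would sleep between windows):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mean-square energy of an int16 window; int64 accumulator avoids overflow
// and skipping the sqrt keeps the gate sqrt-free on-device.
int64_t window_energy(const int16_t* s, size_t n) {
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i) acc += int64_t(s[i]) * int64_t(s[i]);
    return acc / int64_t(n);
}

// Returns true when the expensive stage (MFCC + model) should run.
bool gate(const int16_t* s, size_t n, int64_t threshold) {
    return window_energy(s, n) > threshold;
}

bool demo_loud()  { const int16_t s[4] = {1000, -1000, 1000, -1000}; return gate(s, 4, 1000); }
bool demo_quiet() { const int16_t s[4] = {1, -1, 1, -1};             return gate(s, 4, 1000); }
```

Because the gate rejects most windows, average current is dominated by the cheap stage plus sleep, not by inference.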

8. Privacy, Security & Updates

  • On-device inference: no raw data leaves device by default.
  • Signed firmware: Secure boot + OTA with signature checks.
  • Data retention: Store only features, not raw audio/video, when possible.
  • Shadow mode: Log predictions for tuning before enabling actions.

9. Use Cases & Project Blueprints

Keyword Spotting (Wake Word)

Sensors: MEMS mic • Features: MFCC (20–40 coeffs) • Model: DS-CNN (~60–100K params)

Pipeline: VAD gate → MFCC → int8 DS-CNN → smoothing (debounce 200 ms) → trigger GPIO/BLE event.
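The smoothing/debounce stage can be sketched as an m-of-n vote over recent frames plus a refractory period. Counts stand in for milliseconds here — on-device you would derive them from the hop size (e.g. 200 ms / 10 ms hop = 20 frames). The class is a hypothetical sketch, not a library type:

```cpp
#include <cassert>

// Fire when the keyword wins m of the last n frames, then suppress
// re-triggers for `refractory` frames.
struct Debouncer {
    int hits = 0, window = 0, cooldown = 0;
    int n, m, refractory;
    Debouncer(int n_, int m_, int refractory_)
        : n(n_), m(m_), refractory(refractory_) {}

    // Call once per inference frame; returns true when a trigger fires.
    bool push(bool keyword_won) {
        if (cooldown > 0) { --cooldown; return false; }
        if (keyword_won) ++hits;
        if (++window < n) return false;
        bool fire = hits >= m;
        hits = 0; window = 0;
        if (fire) cooldown = refractory;
        return fire;
    }
};

// Demo: 2-of-3 vote, 5-frame refractory; the second burst is suppressed.
int demo_triggers() {
    Debouncer d(3, 2, 5);
    const bool seq[9] = {true, true, false, true, true, true, false, false, false};
    int fires = 0;
    for (bool b : seq) if (d.push(b)) ++fires;
    return fires;
}
```

Smoothing like this is usually worth more accuracy in the field than another thousand parameters in the model.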

Vibration Anomaly (Predictive Maintenance)

Sensors: IMU/accelerometer • Features: spectral bands, RMS, kurtosis • Model: 1D CNN or autoencoder

Actions: BLE alert when anomaly score > threshold for N windows; store top-K FFT bins.
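Two of the listed features can be sketched directly. Float math is used for clarity; an int16 fixed-point version is the usual on-device form:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// RMS: overall vibration energy in the window.
double rms(const double* x, size_t n) {
    double acc = 0;
    for (size_t i = 0; i < n; ++i) acc += x[i] * x[i];
    return std::sqrt(acc / double(n));
}

// Kurtosis (m4 / m2^2): peakedness of the distribution. It is 3 for a
// Gaussian and rises with impulsive signals such as bearing faults.
double kurtosis(const double* x, size_t n) {
    double mean = 0;
    for (size_t i = 0; i < n; ++i) mean += x[i];
    mean /= double(n);
    double m2 = 0, m4 = 0;
    for (size_t i = 0; i < n; ++i) {
        double d = x[i] - mean;
        m2 += d * d;
        m4 += d * d * d * d;
    }
    m2 /= double(n); m4 /= double(n);
    return m4 / (m2 * m2);
}

double demo_rms()      { const double x[2] = {3, -4};       return rms(x, 2); }       // sqrt(12.5)
double demo_kurtosis() { const double x[4] = {1, -1, 1, -1}; return kurtosis(x, 4); } // square wave -> 1
```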

Gesture Recognition

Sensors: 6-axis/9-axis IMU • Features: sliding windows (100–200 ms) • Model: TCN/1D CNN

Simple Vision Events

Sensors: tiny grayscale camera • Preproc: downsample + normalize • Model: tiny MobileNet; or motion heuristics + classifier.
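The preprocessing step can be sketched as a 2× average-pool followed by the (pixel − 128) shift into the signed int8 range that quantized vision models typically expect:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Average-pool a w x h grayscale frame down to (w/2) x (h/2) and convert
// each pixel from uint8 [0,255] to the int8 [-128,127] domain.
void downsample2x_to_int8(const uint8_t* src, int w, int h, int8_t* dst) {
    for (int y = 0; y < h / 2; ++y) {
        for (int x = 0; x < w / 2; ++x) {
            int sum = src[(2 * y) * w + 2 * x] + src[(2 * y) * w + 2 * x + 1] +
                      src[(2 * y + 1) * w + 2 * x] + src[(2 * y + 1) * w + 2 * x + 1];
            dst[y * (w / 2) + x] = int8_t(sum / 4 - 128);
        }
    }
}

// Demo: a 2x2 frame {200,200,100,100} pools to 150, shifted to 22.
int demo_pixel() {
    const uint8_t src[4] = {200, 200, 100, 100};
    int8_t dst[1];
    downsample2x_to_int8(src, 2, 2, dst);
    return dst[0];
}
```

Matching this shift to the model's input zero-point (check the tensor metadata) avoids a silent accuracy loss at deployment.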

10. Example Code: C++ Inference Loop (TFLM)

Minimal inference loop with ring buffer and debounce
// Pseudo-code for Arduino/ESP32 using TFLite Micro
#include "model_data.h"     // const unsigned char g_model[]; size_t g_model_len;
#include "tflite_micro_all.h" // hypothetical wrapper around the TFLM setup/invoke API

constexpr int kAudioHz = 16000;
constexpr int kWinMs = 30, kHopMs = 10;
RingBuffer<int16_t, kAudioHz * 1> audio_rb; // 1s buffer
int8_t input_tensor[INPUT_BYTES];           // from model metadata
int8_t output_tensor[OUTPUT_BYTES];

void setup() {
  init_mic(kAudioHz);
  init_mfcc(/*coeffs=*/32, kWinMs, kHopMs);
  tflm_init(g_model, g_model_len);          // sets up arena, CMSIS-NN
}

void loop() {
  if (mic_available()) {
    int16_t s = mic_read();
    audio_rb.push(s);
  }
  if (audio_rb.ready_frame(kWinMs, kHopMs)) {
    mfcc_extract(audio_rb.latest_frame(), input_tensor);
    tflm_invoke(input_tensor, output_tensor);
    int cls = argmax(output_tensor);
    if (cls == KEYWORD && debounce_ms(200)) {
      trigger_event(); // GPIO/BLE/Wi-Fi
    }
  }
}
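The `RingBuffer` above is assumed, not a standard library type. A minimal fixed-capacity version might look like this — statically sized (no heap, so no fragmentation), overwriting the oldest sample when full; the frame-windowing helpers (`ready_frame`, `latest_frame`) are omitted for brevity:

```cpp
#include <cassert>
#include <cstddef>

// Single-producer/single-consumer ring buffer with compile-time capacity.
template <typename T, size_t N>
class RingBuffer {
public:
    // Append a sample, dropping the oldest one when the buffer is full.
    void push(T v) {
        buf_[head_] = v;
        head_ = (head_ + 1) % N;
        if (count_ < N) ++count_;
        else tail_ = (tail_ + 1) % N;  // overwrite: advance past the dropped sample
    }
    // Remove the oldest sample; returns false when empty.
    bool pop(T* out) {
        if (count_ == 0) return false;
        *out = buf_[tail_];
        tail_ = (tail_ + 1) % N;
        --count_;
        return true;
    }
    size_t size() const { return count_; }
private:
    T buf_[N] = {};
    size_t head_ = 0, tail_ = 0, count_ = 0;
};

// Demo: capacity 3, pushing 1..4 drops the 1, so the oldest sample is 2.
int demo_oldest_after_overflow() {
    RingBuffer<int, 3> rb;
    for (int i = 1; i <= 4; ++i) rb.push(i);
    int v = 0;
    rb.pop(&v);
    return v;
}
```

The compile-time capacity keeps all storage in .bss, so RAM usage is visible in the map file rather than discovered at runtime.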

11. Debugging & Benchmarking

  • Sanity checks: Overfit a tiny subset; ensure pipeline correctness.
  • Feature drift: Log mean/var of features on device; compare with training.
  • Profiling: Toggle GPIO before/after inference; measure with logic analyzer.
  • Memory: Print arena usage; binary size breakdown (map file).
  • Latency: Aim <50 ms for audio; <100 ms for IMU gestures.
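For the feature-drift check, Welford's online algorithm tracks a running mean/variance per feature in O(1) memory, so the device can periodically report statistics for comparison against the training set. A sketch:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Welford's online mean/variance: numerically stable, one pass, no buffer.
struct RunningStats {
    int64_t n = 0;
    double mean = 0.0, m2 = 0.0;

    void push(double x) {
        ++n;
        double d = x - mean;
        mean += d / double(n);
        m2 += d * (x - mean);   // uses the updated mean
    }
    double variance() const { return n > 1 ? m2 / double(n - 1) : 0.0; }
};

// Demo on a known dataset: mean 5, sample variance 32/7.
RunningStats demo_stats() {
    RunningStats s;
    const double v[8] = {2, 4, 4, 4, 5, 5, 7, 9};
    for (double x : v) s.push(x);
    return s;
}
```

Logging just these two numbers per feature is usually enough to catch a mispositioned sensor or a changed gain before accuracy quietly degrades.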

Benchmark Table (Example)

Task      | Board       | Model       | Flash  | RAM   | Latency | Accuracy
Keyword   | Nano 33 BLE | DS-CNN int8 | 120 KB | 60 KB | 22 ms   | 94%
Vibration | ESP32-S3    | 1D CNN int8 | 150 KB | 80 KB | 18 ms   | 95% AUC
Gesture   | RP2040      | TCN int8    | 90 KB  | 48 KB | 28 ms   | 92%
Note: Numbers above are illustrative—measure on your exact firmware, compiler flags, and kernels.

12. FAQs

Can TinyML train on-device?

Mostly no—training is compute-heavy. Some boards can fine-tune tiny models or thresholds; for full training, use desktop/cloud and deploy weights.

How do I update models in the field?

Use OTA with signed artifacts. Keep model and features backward-compatible; version your DSP blocks.

What about non-audio, non-IMU tasks?

Environmental anomaly detection, smart agriculture (soil moisture patterns), and simple presence detection are viable.

How do I avoid false triggers?

Use multi-stage detection: cheap VAD/motion gate → classifier → temporal smoothing and confidence thresholds.

