NPU vs TPU: Edge AI Boards Compared
AI is rapidly moving beyond cloud data centers into small, embedded devices. This shift is called Edge AI, and it demands hardware that can deliver high performance while consuming very little power. Instead of deploying large GPUs like RTX cards everywhere, engineers now rely on NPUs and TPUs — purpose-built accelerators designed for local inference at the edge.
What is Edge AI?
Edge AI refers to running artificial intelligence (AI) algorithms directly on devices that sit “at the edge” of the network — closer to where the data is collected. This means inference happens on cameras, IoT boards, or robotics systems without sending all data to the cloud. It reduces latency, increases privacy, and enables real-time decisions in environments with limited connectivity.
For example, a smart surveillance camera with an NPU can detect motion and classify objects instantly, even if it’s not connected to the internet. Similarly, an agricultural drone with a TPU can analyze crops in real time without relying on a remote server. Edge AI makes AI more reliable, scalable, and cost-effective.
With the rise of autonomous systems, industrial automation, and AI-powered consumer devices, the demand for efficient edge processors has surged. This is where NPUs (Neural Processing Units) and TPUs (Tensor Processing Units) come into play.
What are NPUs and TPUs?
NPU (Neural Processing Unit): NPUs are dedicated processors built to accelerate deep learning inference tasks, such as image recognition, object detection, or speech processing. They excel at running multiple neural networks in parallel while consuming very low power. Modern NPUs (like Hailo-8 or Rockchip NPUs) often support ONNX and TFLite models, making them versatile for many AI frameworks.
TPU (Tensor Processing Unit): TPUs were introduced by Google and optimized for tensor operations, the backbone of deep learning. While Cloud TPUs are massive chips for model training in Google’s data centers, Edge TPUs are compact accelerators designed for inference on devices. Edge TPUs, like those found in Google Coral and Asus Tinker Edge T, are particularly good at running TensorFlow Lite INT8 models with extreme efficiency.
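To make the INT8 requirement concrete, the sketch below shows standard TensorFlow Lite post-training quantization in Python. It assumes you already have a trained Keras model and a small set of calibration samples (the file names and input shape here are placeholders); the resulting INT8 .tflite file would then be passed through Google's Edge TPU Compiler before deployment on an Edge TPU board.

import tensorflow as tf

# Placeholder: load your trained Keras model (path is illustrative)
model = tf.keras.models.load_model("my_model.h5")

def representative_data_gen():
    # Yield ~100 calibration samples shaped like the model input (assumed 224x224 RGB here)
    for _ in range(100):
        yield [tf.random.uniform((1, 224, 224, 3), dtype=tf.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Force full-integer quantization, which the Edge TPU requires
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())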
Boards & Accelerators
1. Asus Tinker Edge R (NPU)
The Tinker Edge R is powered by a hexa-core Rockchip RK3399Pro SoC with an integrated NPU rated at 3 TOPS. It supports TensorFlow, PyTorch (via conversion), and ONNX. With its higher RAM (4GB) and extensive I/O, it's designed as an AIoT gateway or robotics brain. The Edge R balances CPU and NPU performance, making it suitable for multi-model workloads such as running navigation and vision simultaneously. For developers, it offers strong SDK and Linux support, making deployment straightforward.

2. Asus Tinker Edge T (TPU)
The Tinker Edge T integrates Google’s Edge TPU with a compact board layout. Its TPU provides up to 4 TOPS performance for INT8 TensorFlow Lite models. The Edge T shines when running small-to-medium models like MobileNet, EfficientDet, or PoseNet with minimal power draw. Compared to NPUs, it is more restrictive in supported frameworks (TFLite only), but the ease of use and reliable performance make it ideal for vision IoT devices. This board is often deployed in smart cameras and environmental sensors.

3. Google Coral Dev Board (TPU)
The Coral Dev Board combines an NXP i.MX8M SoC with an onboard Edge TPU. This board is designed for rapid prototyping and experimentation. It supports TensorFlow Lite quantized models out-of-the-box, allowing quick deployment of object detection, classification, and NLP models. While limited to 4 TOPS, its simplicity, ready-to-use SDK, and strong community support make it one of the most developer-friendly edge AI boards. Coral is excellent for proof-of-concept IoT and educational research projects.
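To give a sense of how little glue code the Coral stack needs, here is a minimal image-classification sketch using the pycoral library; the model, label file, and image paths are placeholders you would substitute with your own Edge TPU-compiled model.

from PIL import Image
from pycoral.utils.edgetpu import make_interpreter
from pycoral.utils.dataset import read_label_file
from pycoral.adapters import common, classify

# Placeholders: an Edge TPU-compiled model, its label file, and a test image
interpreter = make_interpreter("mobilenet_v2_quant_edgetpu.tflite")
interpreter.allocate_tensors()
labels = read_label_file("labels.txt")

# Resize the image to the model's expected input size and run inference
image = Image.open("test.jpg").convert("RGB").resize(common.input_size(interpreter))
common.set_input(interpreter, image)
interpreter.invoke()

# Print the top-3 predictions with their scores
for c in classify.get_classes(interpreter, top_k=3):
    print(labels.get(c.id, c.id), round(c.score, 3))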

4. Hailo-8 M.2 (Standalone NPU)
The Hailo-8 is one of the most powerful edge NPUs available today, reaching 26 TOPS of performance at very low power consumption. Designed as an M.2 or mini-PCIe card, it can be embedded into SBCs, industrial PCs, or even laptops for edge inference. Unlike TPUs, it supports a broader range of models (via ONNX or TensorFlow Lite) and excels at multi-stream vision. It’s particularly well-suited for applications like autonomous driving, industrial inspection, and smart city infrastructure where large amounts of video must be processed in real time.

5. Raspberry Pi 5 + Hailo-8 (Our Setup)
The Raspberry Pi 5 is not an AI accelerator itself, but with its new PCIe M.2 HAT+, it becomes a versatile edge AI hub. By pairing it with a Hailo-8 NPU module, the Pi can achieve up to 26 TOPS performance. Our setup, housed in an acrylic case with active cooling, demonstrates how compact yet powerful an edge AI device can be. The case dimensions are 95mm (L) × 65mm (W) × 35mm (H), making it small enough for embedded projects yet powerful enough for multi-camera inference. This combination allows the Pi 5 to handle multi-camera pipelines, real-time analytics, and robotics AI tasks while retaining the flexibility of the Pi ecosystem. Developers benefit from Hailo’s compiler and runtime (HailoRT), which support ONNX and TFLite conversion workflows.


Installation & Deployment Guide
Setting up a Raspberry Pi 5 with a Hailo-8 accelerator is straightforward. Below is a step-by-step process from hardware installation to running your first AI model.
1. Hardware Setup
- Mount the Raspberry Pi 5 into the acrylic case.
- Attach the M.2 HAT+ to the PCIe connector.
- Insert the Hailo-8 M.2 module securely into the M.2 slot.
- Connect an active cooler/heatsink to avoid thermal throttling.
- Power the Pi with a reliable 5V / 5A USB-C power supply.
2. Operating System & Updates
sudo apt update && sudo apt -y full-upgrade
sudo reboot
Ensure you are running 64-bit Raspberry Pi OS for compatibility with the Hailo runtime.
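If in doubt, a quick check from Python confirms the architecture (64-bit Raspberry Pi OS reports aarch64):

import platform

# Prints "aarch64" on 64-bit Raspberry Pi OS; "armv7l" indicates a 32-bit image
print(platform.machine())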
3. Install Hailo Runtime & Tools
On 64-bit Raspberry Pi OS (Bookworm), prebuilt Hailo packages are available directly through apt; the hailo-all meta-package pulls in the firmware, the HailoRT runtime, and the command-line tools. A typical installation:
# Install the Hailo firmware, runtime, and tools
sudo apt install -y hailo-all
sudo reboot
# After the reboot, verify the accelerator is detected
hailortcli fw-control identify
4. Running a Demo Inference
# Run a compiled HEF model with randomly generated input data and report throughput
hailortcli run ./model.hef
Alternatively, use GStreamer pipelines to process live video streams directly from a USB or CSI camera.
5. Python API Example
The following is a minimal single-inference sketch using the HailoRT Python bindings (the hailo_platform package installed alongside the runtime); model.hef is assumed to be a network already compiled by the Hailo Dataflow Compiler.

import numpy as np
from hailo_platform import (HEF, VDevice, HailoStreamInterface, ConfigureParams,
                            InputVStreamParams, OutputVStreamParams, InferVStreams, FormatType)

# Load the compiled model and open a virtual device on the Hailo-8
hef = HEF("model.hef")
with VDevice() as target:
    # Configure the device over PCIe and grab the resulting network group
    configure_params = ConfigureParams.create_from_hef(hef=hef, interface=HailoStreamInterface.PCIe)
    network_group = target.configure(hef, configure_params)[0]
    # Build input/output virtual-stream parameters and query the model's I/O layout
    in_params = InputVStreamParams.make(network_group, format_type=FormatType.FLOAT32)
    out_params = OutputVStreamParams.make(network_group, format_type=FormatType.FLOAT32)
    in_info = hef.get_input_vstream_infos()[0]
    out_info = hef.get_output_vstream_infos()[0]
    # Run one inference on a dummy frame and print the output shape
    frame = np.zeros((1, *in_info.shape), dtype=np.float32)
    with InferVStreams(network_group, in_params, out_params) as pipeline:
        with network_group.activate(network_group.create_params()):
            result = pipeline.infer({in_info.name: frame})
    print("Inference output shape:", result[out_info.name].shape)
6. Deployment Notes
- Model Conversion: Use the Hailo Dataflow Compiler to convert ONNX/TFLite models into .hef files (see the sketch after this list).
- Thermals: Continuous inference requires proper cooling; the acrylic case with an active fan is recommended.
- I/O Bandwidth: The Pi 5 exposes a single PCIe Gen 2 x1 lane by default (Gen 3 can be enabled in config.txt), which is sufficient for most workloads; avoid excessive batching.
- Frameworks: Official support includes ONNX and TensorFlow Lite (quantized INT8 models).
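For reference, the conversion step is driven from Python via the hailo_sdk_client package that ships with Hailo's Dataflow Compiler (downloaded separately from the Hailo Developer Zone, not part of the on-device runtime). The outline below is a rough sketch under the assumption of a single-input ONNX model and a small NumPy calibration set; exact method arguments vary between compiler releases.

import numpy as np
from hailo_sdk_client import ClientRunner  # ships with the Hailo Dataflow Compiler

# Assumption: "model.onnx" has a single input; replace the random data with real calibration samples
calib_data = np.random.rand(64, 224, 224, 3).astype(np.float32)

runner = ClientRunner(hw_arch="hailo8")
runner.translate_onnx_model("model.onnx", "my_model")  # parse the ONNX graph into Hailo's internal format
runner.optimize(calib_data)                            # quantize using the calibration set
hef_binary = runner.compile()                          # compile to a deployable HEF

with open("model.hef", "wb") as f:
    f.write(hef_binary)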
Comparison
Board | Type | TOPS (INT8) | Frameworks | Use Case |
---|---|---|---|---|
Asus Tinker Edge R | NPU | 3 | TF, PyTorch (via conversion), ONNX | Robotics, gateway, multi-model workloads |
Asus Tinker Edge T | TPU | 4 | TensorFlow Lite (INT8) | IoT vision devices, compact inference |
Google Coral Dev Board | TPU | 4 | TensorFlow Lite (INT8) | IoT prototyping, education, PoC |
Raspberry Pi 5 + Hailo-8 | SBC + NPU | 26 | ONNX, TFLite | Flexible AI gateway, robotics, analytics |
Hailo-8 (Standalone) | NPU | 26 | ONNX, TFLite | Multi-camera vision, industrial automation |
Conclusion
NPU vs TPU is not a matter of one replacing the other; the two represent different philosophies in edge AI design.
NPUs (like Hailo-8 or Tinker Edge R) offer flexibility and higher throughput, often supporting multiple frameworks. They are better suited for robotics, autonomous systems, and scenarios where multiple AI tasks must run in parallel. TPUs (like Coral or Tinker Edge T) are narrower in focus, but they excel at highly efficient TensorFlow Lite inference, making them excellent for IoT and vision-specific tasks.
From an academic and research perspective, NPUs provide a more general-purpose platform for experimenting with ONNX and PyTorch-converted models, while TPUs demonstrate how extreme efficiency can be achieved by narrowing focus to one framework. Both are critical in pushing AI into the real world beyond cloud servers.
As edge AI adoption grows in smart cities, healthcare, agriculture, and industrial automation, expect NPUs and TPUs to coexist. Researchers, hobbyists, and developers now have the power to run state-of-the-art AI anywhere, from a Raspberry Pi with Hailo-8 to specialized TPU boards.