In recent years, edge AI has rapidly evolved from research prototypes to practical deployments. In common use cases such as smart cameras, industrial inspection, and autonomous robots, YOLOv8 has become a go-to model for real-time object detection thanks to its speed, accuracy, and lightweight design.
However, running YOLOv8 on resource-limited devices like RK3566 is not as straightforward as on a desktop GPU.
RK3566, equipped with a quad-core ARM Cortex-A55 CPU and a 1 TOPS NPU, offers a cost-effective and power-efficient platform for AI inference at the edge — but requires specific optimizations.
This guide walks through the full deployment process, from model export to NPU acceleration, and turns YOLOv8 into a fully functional edge detection system on RK3566.
Understanding the Hardware: RK3566 at a Glance
| Component | Specification |
|---|---|
| CPU | Quad-core ARM Cortex-A55 (1.8 GHz) |
| NPU | 0.8–1.0 TOPS (Rockchip 3rd-gen NPU) |
| GPU | Mali-G52 (optional for OpenCL acceleration) |
| Memory | Up to 4GB LPDDR4 |
| OS | Linux / Android (Debian, Ubuntu, or Buildroot variants) |
| SDK | Rockchip RKNN Toolkit 2.x |
The NPU inside RK3566 is specifically designed for int8 quantized models, meaning you’ll need to convert YOLOv8 (originally in FP32) to RKNN format with proper calibration and optimization.
YOLOv8 Overview
YOLOv8, developed by Ultralytics, is a widely adopted iteration of the popular “You Only Look Once” object detection series.
Compared with YOLOv5 and YOLOv7, YOLOv8 introduces:
- Improved architecture with CSPDarknet + C2f blocks
- Dynamic shape support for flexible resolutions
- ONNX export compatibility for cross-platform deployment
- Smaller model sizes (YOLOv8-n, YOLOv8-s) ideal for edge devices
On RK3566, the YOLOv8-n (Nano) model is recommended for achieving real-time inference while maintaining decent accuracy.
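For a quick sanity check before export, the pretrained Nano weights can be pulled with the Ultralytics Python API. A minimal sketch, assuming the ultralytics package from the Environment Preparation section below and a local test image:

```python
from ultralytics import YOLO

# Downloads the pretrained COCO weights on first use
model = YOLO("yolov8n.pt")

# Quick CPU inference on a sample image to confirm the weights load correctly
results = model("test.jpg")
print(len(results[0].boxes), "objects detected")
```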
Deployment Workflow Overview
Before diving into code, let’s review the high-level process:
```mermaid
---
title: "YOLOv8 Deployment Workflow on RK3566"
---
graph TD
    %% ====== Styles ======
    classDef step fill:#FFFFFF,stroke:#555,stroke-width:1.6,rx:6,ry:6,color:#222;
    classDef model fill:#E8F0FE,stroke:#1A5FFF,stroke-width:2,rx:6,ry:6,color:#0B2161,font-weight:bold;
    classDef convert fill:#E0F7FA,stroke:#00838F,stroke-width:2,rx:6,ry:6,color:#004D40,font-weight:bold;
    classDef deploy fill:#E8F5E9,stroke:#0A7E07,stroke-width:2,rx:6,ry:6,color:#064C00,font-weight:bold;
    classDef output fill:#FFF3D6,stroke:#E69A00,stroke-width:2,rx:6,ry:6,color:#663C00,font-weight:bold;

    %% ====== Workflow ======
    A["Train or Download YOLOv8 Model"]:::model
        --> B["Export to ONNX Format"]:::convert
        --> C["Convert ONNX → RKNN<br/>via RKNN Toolkit"]:::convert
        --> D["Quantize & Optimize Model<br/>(INT8/FP16)"]:::convert
        --> E["Deploy RKNN on RK3566 Board"]:::deploy
        --> F["Run Inference via<br/>Python or C++ API"]:::deploy
        --> G["Display or Stream Results<br/>in Real Time"]:::output

    %% ====== Link Style ======
    linkStyle default stroke:#555,stroke-width:1.6;
```
Each step corresponds to one technical stage:
- Model preparation — train or download YOLOv8 weights.
- ONNX export — use Ultralytics CLI or API.
- Conversion — convert ONNX → RKNN using Rockchip RKNN Toolkit 2.
- Deployment — run inference via RKNN runtime on the device.
- Visualization — render detection boxes on camera input.
Environment Preparation
1. Hardware Requirements
- RK3566 board (e.g., Radxa Zero 3W, Pine64 Quartz64, or custom industrial SBC)
- 5V/3A power supply
- USB serial cable or SSH access
- Camera (USB / MIPI)
2. Software Environment
| Layer | Tool / Version |
|---|---|
| Host PC | Ubuntu 20.04 / 22.04 |
| Python | 3.8+ |
| YOLOv8 | Ultralytics >= 8.0.50 |
| ONNX | 1.12+ |
| RKNN Toolkit | v2.3.0+ |
| RKNN Runtime | ARM64 / Debian build matching the Toolkit version |
💡 Tip: You need both the RKNN Toolkit (for model conversion on PC) and RKNN Runtime (for deployment on the device).
3. Install RKNN Toolkit on Host PC
```bash
# Create a virtual environment
python3 -m venv rknn_env
source rknn_env/bin/activate

# Install dependencies
pip install torch onnx onnxsim ultralytics
pip install rknn-toolkit2==2.3.0
```
After installation, verify that the toolkit imports cleanly:

```bash
python -c "from rknn.api import RKNN; print('RKNN Toolkit 2 OK')"
```

If the import succeeds without errors, the toolkit is correctly installed.
4. Export YOLOv8 Model to ONNX
If you’ve trained your custom YOLOv8 model (or downloaded pre-trained weights), export it with the following command:
```bash
yolo export model=yolov8n.pt format=onnx opset=12
```
This will produce a yolov8n.onnx file, which can now be optimized for RK3566.
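Before converting, it can help to validate (and optionally simplify) the exported graph with the onnx and onnxsim packages installed earlier. A minimal sketch:

```python
import onnx
from onnxsim import simplify

model = onnx.load("yolov8n.onnx")
onnx.checker.check_model(model)           # raises if the graph is structurally invalid

model_sim, ok = simplify(model)           # fold constants, remove redundant nodes
if ok:
    onnx.save(model_sim, "yolov8n.onnx")  # overwrite with the simplified graph

print("Inputs:", [i.name for i in model.graph.input])
```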
Deploy YOLOv8 on RK3566 (Step-by-Step)
1. Convert ONNX to RKNN Format
Once you have the YOLOv8 ONNX model (yolov8n.onnx), the next step is converting it into Rockchip’s RKNN format, which is optimized for the NPU.
Below is a sample Python script using RKNN Toolkit 2:
```python
from rknn.api import RKNN

rknn = RKNN()

# 1. Configure preprocessing and target platform (config must be called before load_onnx)
rknn.config(
    mean_values=[[0, 0, 0]],
    std_values=[[255, 255, 255]],
    target_platform='rk3566'
)

# 2. Load the ONNX model
rknn.load_onnx(model='yolov8n.onnx')

# 3. Build the RKNN model with INT8 quantization, calibrated on dataset.txt
rknn.build(do_quantization=True, dataset='./dataset.txt')

# 4. Export the RKNN model
rknn.export_rknn('yolov8n_rk3566.rknn')

rknn.release()
```
Explanation:
- target_platform='rk3566' ensures compatibility with the RK3566 NPU.
- dataset.txt should contain a list of sample image paths for quantization calibration.
- The quantization step converts FP32 → INT8, significantly improving inference speed.
⚠️ Tip: Choose representative images for quantization to minimize accuracy loss.
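If the quantized model loses noticeable accuracy, RKNN Toolkit 2 also documents an accuracy_analysis helper that compares layer-by-layer outputs between the float and quantized graphs. A sketch, run after rknn.build(...); the exact arguments may differ between toolkit versions and the image path is only an example:

```python
# Compare float vs. quantized layer outputs; results are written to output_dir
rknn.accuracy_analysis(inputs=['./images/sample_0001.jpg'], output_dir='./snapshot')
```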
2. Prepare the Dataset for Quantization
To generate dataset.txt:
```bash
find ./images/ -type f -name "*.jpg" > dataset.txt
```
Use around 100–300 images covering your main object categories and lighting variations.
The more representative your dataset, the better the quantization accuracy.
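Equivalently, a small Python helper can sample a fixed number of calibration images (the paths here are examples):

```python
import glob
import random

# Collect candidate images and sample up to 300 of them for calibration
paths = glob.glob("./images/**/*.jpg", recursive=True)
random.seed(0)                                   # reproducible sample
sample = random.sample(paths, min(300, len(paths)))

with open("dataset.txt", "w") as f:
    f.write("\n".join(sample) + "\n")
```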
3. Verify RKNN Model on Host PC
Before deploying to the board, test the converted RKNN model locally to ensure correctness:
```python
from rknn.api import RKNN
import cv2

rknn = RKNN()
rknn.load_rknn('yolov8n_rk3566.rknn')
rknn.init_runtime()   # pass target='rk3566' to run on a connected board instead

# Preprocess to match the model input (640x640 RGB for the default export)
img = cv2.cvtColor(cv2.imread('test.jpg'), cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (640, 640))

outputs = rknn.inference(inputs=[img])
print([o.shape for o in outputs])
```
If the model runs without errors and produces detection tensors, you’re ready to deploy it onto RK3566.
4. Deploying to RK3566 Board
Step 1. Transfer Files
Copy the following to your RK3566 device via SCP or USB:
- yolov8n_rk3566.rknn
- test.jpg
- inference_rk3566.py
Step 2. Install Runtime
On RK3566 (Debian/Ubuntu system):
```bash
sudo apt update
sudo apt install python3-opencv
# Device-side Python API (provides the rknnlite module used below)
pip3 install rknn-toolkit-lite2
```

The on-device Python package is RKNN Toolkit Lite 2 (rknn-toolkit-lite2); install the version that matches the RKNN Toolkit used for conversion.
Step 3. Run Inference
```bash
python3 inference_rk3566.py
```
Example minimal code:
```python
from rknnlite.api import RKNNLite
import cv2

rknn = RKNNLite()
rknn.load_rknn('yolov8n_rk3566.rknn')
rknn.init_runtime()

# Preprocess to the model's input size and channel order (640x640 RGB here)
img = cv2.cvtColor(cv2.imread('test.jpg'), cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (640, 640))

outputs = rknn.inference(inputs=[img])

# Visualize results
print("Inference output shapes:", [x.shape for x in outputs])
```
💡 Pro Tip: RKNNLite is optimized for on-device inference and uses less memory than RKNN Toolkit.
5. Real-Time Camera Inference
For applications like smart surveillance or factory inspection, connect a USB/MIPI camera to the RK3566 device.
```python
import cv2
from rknnlite.api import RKNNLite

rknn = RKNNLite()
rknn.load_rknn('yolov8n_rk3566.rknn')
rknn.init_runtime()

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Preprocess each frame to the model's input size and channel order
    img = cv2.cvtColor(cv2.resize(frame, (640, 640)), cv2.COLOR_BGR2RGB)
    outputs = rknn.inference(inputs=[img])
    # TODO: Add YOLOv8 postprocessing (NMS + bbox drawing)

    cv2.imshow('YOLOv8 RK3566', frame)
    if cv2.waitKey(1) == 27:  # ESC
        break

cap.release()
cv2.destroyAllWindows()
```
The postprocessing stage involves decoding model output tensors and applying Non-Max Suppression (NMS) to draw bounding boxes.
6. Measuring Performance
You can use Python’s time module to benchmark inference time:
```python
import time

t0 = time.time()
outputs = rknn.inference(inputs=[img])
print("Inference time:", (time.time() - t0) * 1000, "ms")
```
Typical results for YOLOv8n (INT8) on RK3566:
| Model | Resolution | FPS | CPU Usage | Power Consumption |
|---|---|---|---|---|
| YOLOv8n (INT8) | 320×320 | ~18–22 FPS | <50% | ~3.5 W |
| YOLOv8s (INT8) | 640×640 | ~8–10 FPS | <70% | ~4.2 W |
👉 You can achieve real-time detection for small to medium models on RK3566, ideal for IoT cameras, kiosks, and embedded vision systems.
Optimization Tips
- Use a fixed input resolution (e.g. 320×320): avoid dynamic resizing on-device to save CPU cycles (see the export example after this list).
- Quantization-Aware Training (QAT): if possible, retrain YOLOv8 with quantization awareness to preserve accuracy.
- Batch Normalization folding: enable folding during conversion for better NPU compatibility.
- Use Rockchip's precompiled postprocessing: the Rockchip SDK provides C++ utilities for YOLO postprocessing (NMS) with NEON optimization.
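For the first tip, the input resolution can be fixed at export time via the Ultralytics imgsz argument (the model then has to be re-converted to RKNN):

```bash
yolo export model=yolov8n.pt format=onnx opset=12 imgsz=320
```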
Building Full Inference Pipeline on RK3566
Now that YOLOv8 is successfully running on RK3566, we’ll complete the system by implementing post-processing, real-time visualization, and deployment automation.
1. Understanding YOLOv8 Output Structure
After running inference, the RKNN model outputs one or more tensors, typically shaped like:
(1, 84, 8400)
Where, for the default 640×640 input and the 80-class COCO model:
- 8400 = number of grid cells across the three detection scales (80×80 + 40×40 + 20×20)
- 84 = 4 bounding-box coordinates + 80 class scores
The next step is to decode these raw tensors into bounding boxes, class labels, and confidence scores.
2. Implementing YOLOv8 Post-Processing
Here’s a simplified Python example for YOLOv8 output decoding and Non-Max Suppression (NMS) on RK3566:
```python
import numpy as np

def xywh2xyxy(x):
    """Convert boxes from (cx, cy, w, h) to (x1, y1, x2, y2)."""
    y = np.zeros_like(x)
    y[:, 0] = x[:, 0] - x[:, 2] / 2
    y[:, 1] = x[:, 1] - x[:, 3] / 2
    y[:, 2] = x[:, 0] + x[:, 2] / 2
    y[:, 3] = x[:, 1] + x[:, 3] / 2
    return y

def bbox_iou(box1, boxes):
    """IoU between one box and an array of boxes, all in xyxy format."""
    inter_x1 = np.maximum(box1[0], boxes[:, 0])
    inter_y1 = np.maximum(box1[1], boxes[:, 1])
    inter_x2 = np.minimum(box1[2], boxes[:, 2])
    inter_y2 = np.minimum(box1[3], boxes[:, 3])
    inter_area = np.maximum(inter_x2 - inter_x1, 0) * np.maximum(inter_y2 - inter_y1, 0)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter_area / (area1 + area2 - inter_area)

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy Non-Max Suppression; returns the indices of the boxes to keep."""
    idxs = scores.argsort()[::-1]          # highest score first
    keep = []
    while len(idxs) > 0:
        i = idxs[0]
        keep.append(i)
        if len(idxs) == 1:
            break
        iou = bbox_iou(boxes[i], boxes[idxs[1:]])
        idxs = idxs[1:][iou < iou_thresh]  # drop candidates overlapping the kept box
    return keep
```
This lightweight pure-NumPy NMS runs efficiently on RK3566 for typical scenes with around 8–10 objects per frame.
⚙️ If you need higher performance, Rockchip provides a C++ YOLO post-processing SDK (rknn_yolov5_postprocess.cc) that can be easily adapted to YOLOv8.
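The helpers above cover box conversion and NMS but not the decode step itself. The postprocess_yolov8 function used in the next section is not part of the Rockchip SDK; a hedged sketch, building on the helpers above and assuming the (1, 84, 8400) layout described earlier with class scores already sigmoid-activated (the default for Ultralytics ONNX exports), could look like this:

```python
def postprocess_yolov8(outputs, conf_thresh=0.25, iou_thresh=0.45):
    """Decode raw YOLOv8 output into (boxes, scores, class_ids); class-agnostic NMS."""
    pred = np.squeeze(outputs[0], axis=0).T      # (1, 84, 8400) -> (8400, 84)
    boxes_xywh = pred[:, :4]                     # cx, cy, w, h in input-image pixels
    class_scores = pred[:, 4:]                   # one column per class

    cls_ids = class_scores.argmax(axis=1)
    scores = class_scores.max(axis=1)

    mask = scores > conf_thresh                  # drop low-confidence candidates
    boxes = xywh2xyxy(boxes_xywh[mask])
    scores, cls_ids = scores[mask], cls_ids[mask]

    keep = nms(boxes, scores, iou_thresh)
    return boxes[keep], scores[keep], cls_ids[keep]
```

Note that the returned boxes are in the model's input coordinates (e.g. 640×640), so they still need to be scaled back to the original frame size before drawing.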
3. Visualizing Real-Time Detection
Integrate NMS with OpenCV to visualize detections from your live camera feed:
```python
import cv2
from rknnlite.api import RKNNLite

def draw_boxes(img, boxes, scores, cls_ids, class_names):
    """Draw labelled bounding boxes on the frame."""
    for box, score, cls in zip(boxes, scores, cls_ids):
        x1, y1, x2, y2 = map(int, box)
        label = f"{class_names[cls]} {score:.2f}"
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(img, label, (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return img

rknn = RKNNLite()
rknn.load_rknn('yolov8n_rk3566.rknn')
rknn.init_runtime()

cap = cv2.VideoCapture(0)
class_names = open("coco.names").read().strip().split("\n")

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Preprocess to the model's input size and channel order
    img = cv2.cvtColor(cv2.resize(frame, (640, 640)), cv2.COLOR_BGR2RGB)
    outputs = rknn.inference(inputs=[img])

    # Decode + NMS (see the postprocess_yolov8 sketch above)
    # NOTE: boxes are in the 640x640 input space; scale them to the frame size if resolutions differ
    boxes, scores, cls_ids = postprocess_yolov8(outputs)
    frame = draw_boxes(frame, boxes, scores, cls_ids, class_names)

    cv2.imshow('YOLOv8 Edge AI - RK3566', frame)
    if cv2.waitKey(1) == 27:
        break

cap.release()
cv2.destroyAllWindows()
```
This gives you a real-time camera detection demo running directly on RK3566’s NPU — typically reaching 15–20 FPS with YOLOv8n.
4. Deploying as a Service (Production Mode)
Once the system runs smoothly, you can automate it using systemd so that it launches automatically after power-on — suitable for kiosks, industrial cameras, or unattended IoT devices.
Step 1. Create a Service File
```bash
sudo nano /etc/systemd/system/yolov8_inference.service
```

```ini
[Unit]
Description=YOLOv8 Edge Inference on RK3566
After=network.target

[Service]
ExecStart=/usr/bin/python3 /home/pi/yolo/inference_rk3566.py
Restart=always
User=pi
WorkingDirectory=/home/pi/yolo/

[Install]
WantedBy=multi-user.target
```
Step 2. Enable & Start Service
```bash
sudo systemctl enable yolov8_inference.service
sudo systemctl start yolov8_inference.service
```
The service now runs automatically on boot — ensuring headless operation for real-world edge deployments.
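To confirm the service is healthy and to inspect its output, the standard systemd tools apply:

```bash
# Check the service state and follow its log output
systemctl status yolov8_inference.service
journalctl -u yolov8_inference.service -f
```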
5. Optional: Dockerized Deployment
For developers building scalable solutions (e.g., batch camera inference), containerization can simplify setup and updates.
Dockerfile Example
```dockerfile
FROM arm64v8/python:3.9

# Headless OpenCV for the image's own Python interpreter
# (apt's python3-opencv would target the system interpreter instead)
RUN pip install opencv-python-headless

# requirements.txt should include rknn-toolkit-lite2 (the device-side RKNN API)
COPY requirements.txt /tmp/
RUN pip install -r /tmp/requirements.txt

COPY yolov8n_rk3566.rknn /app/
COPY inference_rk3566.py /app/
WORKDIR /app

CMD ["python3", "inference_rk3566.py"]
```
Build & Run
```bash
docker build -t yolov8-rk3566 .
docker run --privileged --device /dev/video0:/dev/video0 yolov8-rk3566
```
This encapsulates all dependencies and ensures consistent runtime behavior across devices. Note that the container still relies on the host's NPU driver and device nodes, which is why it is run with --privileged.
6. Integration Possibilities
Once YOLOv8 inference is stable on RK3566, you can expand functionality:
| Integration | Description |
|---|---|
| MQTT / WebSocket | Stream detection results to a cloud dashboard or local server. |
| RTSP Video Stream | Use GStreamer or FFmpeg to output processed video. |
| Edge–Cloud Hybrid | Combine RK3566 inference with cloud analytics via REST API. |
| Custom Models | Replace YOLOv8n with your own trained models (e.g., defect detection). |
This flexibility makes RK3566 suitable for smart retail, factory inspection, traffic monitoring, and AIoT gateways.
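As an example of the first integration, detection results can be pushed to a broker with the paho-mqtt package (not installed in the earlier steps; the broker address and topic below are placeholders):

```python
import json
import paho.mqtt.publish as publish

def publish_detections(boxes, scores, cls_ids, class_names, broker="192.168.1.10"):
    """Serialize one frame's detections and publish them as a JSON message."""
    payload = [
        {"label": class_names[int(c)], "score": float(s), "bbox": [float(v) for v in b]}
        for b, s, c in zip(boxes, scores, cls_ids)
    ]
    publish.single("rk3566/yolov8/detections", json.dumps(payload),
                   hostname=broker, port=1883)
```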
Final Verification Checklist
✅ Model converted successfully (.rknn file valid)
✅ Quantization accuracy acceptable (mAP loss <3%)
✅ Real-time performance achieved (>15 FPS)
✅ Auto-start service working correctly
✅ Integration tests (MQTT, camera, Docker) passed
When all these boxes are checked, your RK3566-powered device is ready for production-grade YOLOv8 edge inference.
Summary
By following this tutorial, you’ve built a complete end-to-end YOLOv8 Edge AI pipeline on RK3566, from model export and quantization to on-device deployment.
Key takeaways:
- RKNN Toolkit 2 simplifies ONNX → RKNN conversion and quantization.
- RKNN Runtime enables fast, low-power inference on embedded hardware.
- With proper post-processing and service automation, RK3566 becomes a reliable edge vision platform for commercial applications.