In recent years, edge AI has rapidly evolved from research prototypes to practical deployments. In common use cases such as smart cameras, industrial inspection, and autonomous robots, YOLOv8 has become a go-to model for real-time object detection thanks to its speed, accuracy, and lightweight design.
However, running YOLOv8 on resource-limited devices like RK3566 is not as straightforward as on a desktop GPU.
RK3566, equipped with a quad-core ARM Cortex-A55 CPU and a 1 TOPS NPU, offers a cost-effective and power-efficient platform for AI inference at the edge — but requires specific optimizations.
This guide walks through the full deployment process, from model export to NPU acceleration, and turns YOLOv8 into a fully functional edge detection system on RK3566.
Understanding the Hardware: RK3566 at a Glance
| Component | Specification |
|---|---|
| CPU | Quad-core ARM Cortex-A55 (1.8 GHz) |
| NPU | 0.8–1.0 TOPS (Rockchip 3rd-gen NPU) |
| GPU | Mali-G52 (optional for OpenCL acceleration) |
| Memory | Up to 4GB LPDDR4 |
| OS | Linux / Android (Debian, Ubuntu, or Buildroot variants) |
| SDK | Rockchip RKNN Toolkit 2.x |
The NPU inside RK3566 is specifically designed for int8 quantized models, meaning you’ll need to convert YOLOv8 (originally in FP32) to RKNN format with proper calibration and optimization.
YOLOv8 Overview
YOLOv8, developed by Ultralytics, is a widely adopted iteration of the popular “You Only Look Once” object detection series.
Compared with YOLOv5 and YOLOv7, YOLOv8 introduces:
- Improved architecture with CSPDarknet + C2f blocks
- Dynamic shape support for flexible resolutions
- ONNX export compatibility for cross-platform deployment
- Smaller model sizes (YOLOv8-n, YOLOv8-s) ideal for edge devices
On RK3566, the YOLOv8-n (Nano) model is recommended for achieving real-time inference while maintaining decent accuracy.
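For a quick sanity check before export, the pretrained Nano weights can be pulled with the Ultralytics Python API. A minimal sketch, assuming the ultralytics package from the Environment Preparation section below and a local test image:

```python
from ultralytics import YOLO

# Downloads the pretrained COCO weights on first use
model = YOLO("yolov8n.pt")

# Quick CPU inference on a sample image to confirm the weights load correctly
results = model("test.jpg")
print(len(results[0].boxes), "objects detected")
```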
Deployment Workflow Overview
Before diving into code, let’s review the high-level process:
```mermaid
---
title: "YOLOv8 Deployment Workflow on RK3566"
---
graph TD
    %% ====== Styles ======
    classDef step fill:#FFFFFF,stroke:#555,stroke-width:1.6,rx:6,ry:6,color:#222;
    classDef model fill:#E8F0FE,stroke:#1A5FFF,stroke-width:2,rx:6,ry:6,color:#0B2161,font-weight:bold;
    classDef convert fill:#E0F7FA,stroke:#00838F,stroke-width:2,rx:6,ry:6,color:#004D40,font-weight:bold;
    classDef deploy fill:#E8F5E9,stroke:#0A7E07,stroke-width:2,rx:6,ry:6,color:#064C00,font-weight:bold;
    classDef output fill:#FFF3D6,stroke:#E69A00,stroke-width:2,rx:6,ry:6,color:#663C00,font-weight:bold;

    %% ====== Workflow ======
    A["Train or Download YOLOv8 Model"]:::model
        --> B["Export to ONNX Format"]:::convert
        --> C["Convert ONNX → RKNN<br/>via RKNN Toolkit"]:::convert
        --> D["Quantize & Optimize Model<br/>(INT8/FP16)"]:::convert
        --> E["Deploy RKNN on RK3566 Board"]:::deploy
        --> F["Run Inference via<br/>Python or C++ API"]:::deploy
        --> G["Display or Stream Results<br/>in Real Time"]:::output

    %% ====== Link Style ======
    linkStyle default stroke:#555,stroke-width:1.6;
```
Each step corresponds to one technical stage:
- Model preparation — train or download YOLOv8 weights.
- ONNX export — use Ultralytics CLI or API.
- Conversion — convert ONNX → RKNN using Rockchip RKNN Toolkit 2.
- Deployment — run inference via RKNN runtime on the device.
- Visualization — render detection boxes on camera input.
Environment Preparation
1. Hardware Requirements
- RK3566 board (e.g., Radxa Zero 3W, Pine64 Quartz64, or custom industrial SBC)
- 5V/3A power supply
- USB serial cable or SSH access
- Camera (USB / MIPI)
2. Software Environment
| Layer | Tool / Version |
|---|---|
| Host PC | Ubuntu 20.04 / 22.04 |
| Python | 3.8+ |
| YOLOv8 | Ultralytics >= 8.0.50 |
| ONNX | 1.12+ |
| RKNN Toolkit | v2.3.0+ |
| RKNN Runtime | ARM64 / Debian build matching the Toolkit version |
💡 Tip: You need both the RKNN Toolkit (for model conversion on PC) and RKNN Runtime (for deployment on the device).
3. Install RKNN Toolkit on Host PC
```bash
# Create a virtual environment
python3 -m venv rknn_env
source rknn_env/bin/activate

# Install dependencies
pip install torch onnx onnxsim ultralytics
pip install rknn-toolkit2==2.3.0
```
After installation, verify that the toolkit imports cleanly:

```bash
python -c "from rknn.api import RKNN; print('RKNN Toolkit 2 OK')"
```

If the import succeeds without errors, the toolkit is correctly installed.
4. Export YOLOv8 Model to ONNX
If you’ve trained your custom YOLOv8 model (or downloaded pre-trained weights), export it with the following command:
```bash
yolo export model=yolov8n.pt format=onnx opset=12
```
This will produce a yolov8n.onnx file, which can now be optimized for RK3566.
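Before converting, it can help to validate (and optionally simplify) the exported graph with the onnx and onnxsim packages installed earlier. A minimal sketch:

```python
import onnx
from onnxsim import simplify

model = onnx.load("yolov8n.onnx")
onnx.checker.check_model(model)           # raises if the graph is structurally invalid

model_sim, ok = simplify(model)           # fold constants, remove redundant nodes
if ok:
    onnx.save(model_sim, "yolov8n.onnx")  # overwrite with the simplified graph

print("Inputs:", [i.name for i in model.graph.input])
```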
Deploy YOLOv8 on RK3566 (Step-by-Step)
1. Convert ONNX to RKNN Format
Once you have the YOLOv8 ONNX model (yolov8n.onnx), the next step is converting it into Rockchip’s RKNN format, which is optimized for the NPU.
Below is a sample Python script using RKNN Toolkit 2:
```python
from rknn.api import RKNN

rknn = RKNN()

# 1. Configure preprocessing and target platform (config must be called before load_onnx)
rknn.config(
    mean_values=[[0, 0, 0]],
    std_values=[[255, 255, 255]],
    target_platform='rk3566'
)

# 2. Load the ONNX model
rknn.load_onnx(model='yolov8n.onnx')

# 3. Build the RKNN model with INT8 quantization, calibrated on dataset.txt
rknn.build(do_quantization=True, dataset='./dataset.txt')

# 4. Export the RKNN model
rknn.export_rknn('yolov8n_rk3566.rknn')

rknn.release()
```
Explanation:
- target_platform='rk3566' ensures compatibility with the RK3566 NPU.
- dataset.txt should contain a list of sample image paths for quantization calibration.
- The quantization step converts FP32 → INT8, significantly improving inference speed.
⚠️ Tip: Choose representative images for quantization to minimize accuracy loss.
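If the quantized model loses noticeable accuracy, RKNN Toolkit 2 also documents an accuracy_analysis helper that compares layer-by-layer outputs between the float and quantized graphs. A sketch, run after rknn.build(...); the exact arguments may differ between toolkit versions and the image path is only an example:

```python
# Compare float vs. quantized layer outputs; results are written to output_dir
rknn.accuracy_analysis(inputs=['./images/sample_0001.jpg'], output_dir='./snapshot')
```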
2. Prepare the Dataset for Quantization
To generate dataset.txt:
```bash
find ./images/ -type f -name "*.jpg" > dataset.txt
```
Use around 100–300 images covering your main object categories and lighting variations.
The more representative your dataset, the better the quantization accuracy.
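Equivalently, a small Python helper can sample a fixed number of calibration images (the paths here are examples):

```python
import glob
import random

# Collect candidate images and sample up to 300 of them for calibration
paths = glob.glob("./images/**/*.jpg", recursive=True)
random.seed(0)                                   # reproducible sample
sample = random.sample(paths, min(300, len(paths)))

with open("dataset.txt", "w") as f:
    f.write("\n".join(sample) + "\n")
```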
3. Verify RKNN Model on Host PC
Before deploying to the board, test the converted RKNN model locally to ensure correctness:
```python
from rknn.api import RKNN
import cv2

rknn = RKNN()
rknn.load_rknn('yolov8n_rk3566.rknn')
rknn.init_runtime()   # pass target='rk3566' to run on a connected board instead

# Preprocess to match the model input (640x640 RGB for the default export)
img = cv2.cvtColor(cv2.imread('test.jpg'), cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (640, 640))

outputs = rknn.inference(inputs=[img])
print([o.shape for o in outputs])
```
If the model runs without errors and produces detection tensors, you’re ready to deploy it onto RK3566.
4. Deploying to RK3566 Board
Step 1. Transfer Files
Copy the following to your RK3566 device via SCP or USB:
- yolov8n_rk3566.rknn
- test.jpg
- inference_rk3566.py
Step 2. Install Runtime
On RK3566 (Debian/Ubuntu system):
```bash
sudo apt update
sudo apt install python3-opencv
# Device-side Python API (provides the rknnlite module used below)
pip3 install rknn-toolkit-lite2
```

The on-device Python package is RKNN Toolkit Lite 2 (rknn-toolkit-lite2); install the version that matches the RKNN Toolkit used for conversion.
Step 3. Run Inference
```bash
python3 inference_rk3566.py
```
Example minimal code:
```python
from rknnlite.api import RKNNLite
import cv2

rknn = RKNNLite()
rknn.load_rknn('yolov8n_rk3566.rknn')
rknn.init_runtime()

# Preprocess to the model's input size and channel order (640x640 RGB here)
img = cv2.cvtColor(cv2.imread('test.jpg'), cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (640, 640))

outputs = rknn.inference(inputs=[img])

# Visualize results
print("Inference output shapes:", [x.shape for x in outputs])
```
💡 Pro Tip: RKNNLite is optimized for on-device inference and uses less memory than RKNN Toolkit.
5. Real-Time Camera Inference
For applications like smart surveillance or factory inspection, connect a USB/MIPI camera to the RK3566 device.
```python
import cv2
from rknnlite.api import RKNNLite

rknn = RKNNLite()
rknn.load_rknn('yolov8n_rk3566.rknn')
rknn.init_runtime()

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Preprocess each frame to the model's input size and channel order
    img = cv2.cvtColor(cv2.resize(frame, (640, 640)), cv2.COLOR_BGR2RGB)
    outputs = rknn.inference(inputs=[img])
    # TODO: Add YOLOv8 postprocessing (NMS + bbox drawing)

    cv2.imshow('YOLOv8 RK3566', frame)
    if cv2.waitKey(1) == 27:  # ESC
        break

cap.release()
cv2.destroyAllWindows()
```
The postprocessing stage involves decoding model output tensors and applying Non-Max Suppression (NMS) to draw bounding boxes.
6. Measuring Performance
You can use Python’s time module to benchmark inference time:
```python
import time

t0 = time.time()
outputs = rknn.inference(inputs=[img])
print("Inference time:", (time.time() - t0) * 1000, "ms")
```
Typical results for YOLOv8n (INT8) on RK3566:
| Model | Resolution | FPS | CPU Usage | Power Consumption |
|---|---|---|---|---|
| YOLOv8n (INT8) | 320×320 | ~18–22 FPS | <50% | ~3.5 W |
| YOLOv8s (INT8) | 640×640 | ~8–10 FPS | <70% | ~4.2 W |
👉 You can achieve real-time detection for small to medium models on RK3566, ideal for IoT cameras, kiosks, and embedded vision systems.
Optimization Tips
- Use a fixed input resolution (e.g. 320×320): avoid dynamic resizing on-device to save CPU cycles (see the export example after this list).
- Quantization-Aware Training (QAT): if possible, retrain YOLOv8 with quantization awareness to preserve accuracy.
- Batch Normalization folding: enable folding during conversion for better NPU compatibility.
- Use Rockchip's precompiled postprocessing: the Rockchip SDK provides C++ utilities for YOLO postprocessing (NMS) with NEON optimization.
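For the first tip, the input resolution can be fixed at export time via the Ultralytics imgsz argument (the model then has to be re-converted to RKNN):

```bash
yolo export model=yolov8n.pt format=onnx opset=12 imgsz=320
```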
Building Full Inference Pipeline on RK3566
Now that YOLOv8 is successfully running on RK3566, we’ll complete the system by implementing post-processing, real-time visualization, and deployment automation.
1. Understanding YOLOv8 Output Structure
After running inference, the RKNN model outputs one or more tensors, typically shaped like:
(1, 84, 8400)
Where, for the default 640×640 input and the 80-class COCO model:
- 8400 = number of grid cells across the three detection scales (80×80 + 40×40 + 20×20)
- 84 = 4 bounding-box coordinates + 80 class scores
The next step is to decode these raw tensors into bounding boxes, class labels, and confidence scores.
2. Implementing YOLOv8 Post-Processing
Here’s a simplified Python example for YOLOv8 output decoding and Non-Max Suppression (NMS) on RK3566:
```python
import numpy as np

def xywh2xyxy(x):
    """Convert boxes from (cx, cy, w, h) to (x1, y1, x2, y2)."""
    y = np.zeros_like(x)
    y[:, 0] = x[:, 0] - x[:, 2] / 2
    y[:, 1] = x[:, 1] - x[:, 3] / 2
    y[:, 2] = x[:, 0] + x[:, 2] / 2
    y[:, 3] = x[:, 1] + x[:, 3] / 2
    return y

def bbox_iou(box1, boxes):
    """IoU between one box and an array of boxes, all in xyxy format."""
    inter_x1 = np.maximum(box1[0], boxes[:, 0])
    inter_y1 = np.maximum(box1[1], boxes[:, 1])
    inter_x2 = np.minimum(box1[2], boxes[:, 2])
    inter_y2 = np.minimum(box1[3], boxes[:, 3])
    inter_area = np.maximum(inter_x2 - inter_x1, 0) * np.maximum(inter_y2 - inter_y1, 0)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter_area / (area1 + area2 - inter_area)

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy Non-Max Suppression; returns the indices of the boxes to keep."""
    idxs = scores.argsort()[::-1]          # highest score first
    keep = []
    while len(idxs) > 0:
        i = idxs[0]
        keep.append(i)
        if len(idxs) == 1:
            break
        iou = bbox_iou(boxes[i], boxes[idxs[1:]])
        idxs = idxs[1:][iou < iou_thresh]  # drop candidates overlapping the kept box
    return keep
```
This lightweight pure-NumPy NMS runs efficiently on RK3566 for typical scenes with around 8–10 objects per frame.
⚙️ If you need higher performance, Rockchip provides a C++ YOLO post-processing SDK (rknn_yolov5_postprocess.cc) that can be easily adapted to YOLOv8.
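The helpers above cover box conversion and NMS but not the decode step itself. The postprocess_yolov8 function used in the next section is not part of the Rockchip SDK; a hedged sketch, building on the helpers above and assuming the (1, 84, 8400) layout described earlier with class scores already sigmoid-activated (the default for Ultralytics ONNX exports), could look like this:

```python
def postprocess_yolov8(outputs, conf_thresh=0.25, iou_thresh=0.45):
    """Decode raw YOLOv8 output into (boxes, scores, class_ids); class-agnostic NMS."""
    pred = np.squeeze(outputs[0], axis=0).T      # (1, 84, 8400) -> (8400, 84)
    boxes_xywh = pred[:, :4]                     # cx, cy, w, h in input-image pixels
    class_scores = pred[:, 4:]                   # one column per class

    cls_ids = class_scores.argmax(axis=1)
    scores = class_scores.max(axis=1)

    mask = scores > conf_thresh                  # drop low-confidence candidates
    boxes = xywh2xyxy(boxes_xywh[mask])
    scores, cls_ids = scores[mask], cls_ids[mask]

    keep = nms(boxes, scores, iou_thresh)
    return boxes[keep], scores[keep], cls_ids[keep]
```

Note that the returned boxes are in the model's input coordinates (e.g. 640×640), so they still need to be scaled back to the original frame size before drawing.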
3. Visualizing Real-Time Detection
Integrate NMS with OpenCV to visualize detections from your live camera feed:
```python
import cv2
from rknnlite.api import RKNNLite

def draw_boxes(img, boxes, scores, cls_ids, class_names):
    """Draw labelled bounding boxes on the frame."""
    for box, score, cls in zip(boxes, scores, cls_ids):
        x1, y1, x2, y2 = map(int, box)
        label = f"{class_names[cls]} {score:.2f}"
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(img, label, (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return img

rknn = RKNNLite()
rknn.load_rknn('yolov8n_rk3566.rknn')
rknn.init_runtime()

cap = cv2.VideoCapture(0)
class_names = open("coco.names").read().strip().split("\n")

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Preprocess to the model's input size and channel order
    img = cv2.cvtColor(cv2.resize(frame, (640, 640)), cv2.COLOR_BGR2RGB)
    outputs = rknn.inference(inputs=[img])

    # Decode + NMS (see the postprocess_yolov8 sketch above)
    # NOTE: boxes are in the 640x640 input space; scale them to the frame size if resolutions differ
    boxes, scores, cls_ids = postprocess_yolov8(outputs)
    frame = draw_boxes(frame, boxes, scores, cls_ids, class_names)

    cv2.imshow('YOLOv8 Edge AI - RK3566', frame)
    if cv2.waitKey(1) == 27:
        break

cap.release()
cv2.destroyAllWindows()
```
This gives you a real-time camera detection demo running directly on RK3566’s NPU — typically reaching 15–20 FPS with YOLOv8n.
4. Deploying as a Service (Production Mode)
Once the system runs smoothly, you can automate it using systemd so that it launches automatically after power-on — suitable for kiosks, industrial cameras, or unattended IoT devices.
Step 1. Create a Service File
```bash
sudo nano /etc/systemd/system/yolov8_inference.service
```

```ini
[Unit]
Description=YOLOv8 Edge Inference on RK3566
After=network.target

[Service]
ExecStart=/usr/bin/python3 /home/pi/yolo/inference_rk3566.py
Restart=always
User=pi
WorkingDirectory=/home/pi/yolo/

[Install]
WantedBy=multi-user.target
```
Step 2. Enable & Start Service
```bash
sudo systemctl enable yolov8_inference.service
sudo systemctl start yolov8_inference.service
```
The service now runs automatically on boot — ensuring headless operation for real-world edge deployments.
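To confirm the service is healthy and to inspect its output, the standard systemd tools apply:

```bash
# Check the service state and follow its log output
systemctl status yolov8_inference.service
journalctl -u yolov8_inference.service -f
```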
5. Optional: Dockerized Deployment
For developers building scalable solutions (e.g., batch camera inference), containerization can simplify setup and updates.
Dockerfile Example
```dockerfile
FROM arm64v8/python:3.9

# Headless OpenCV for the image's own Python interpreter
# (apt's python3-opencv would target the system interpreter instead)
RUN pip install opencv-python-headless

# requirements.txt should include rknn-toolkit-lite2 (the device-side RKNN API)
COPY requirements.txt /tmp/
RUN pip install -r /tmp/requirements.txt

COPY yolov8n_rk3566.rknn /app/
COPY inference_rk3566.py /app/
WORKDIR /app

CMD ["python3", "inference_rk3566.py"]
```
Build & Run
```bash
docker build -t yolov8-rk3566 .
docker run --privileged --device /dev/video0:/dev/video0 yolov8-rk3566
```
This encapsulates all dependencies and ensures consistent runtime behavior across devices. Note that the container still relies on the host's NPU driver and device nodes, which is why it is run with --privileged.
6. Integration Possibilities
Once YOLOv8 inference is stable on RK3566, you can expand functionality:
| Integration | Description |
|---|---|
| MQTT / WebSocket | Stream detection results to a cloud dashboard or local server. |
| RTSP Video Stream | Use GStreamer or FFmpeg to output processed video. |
| Edge–Cloud Hybrid | Combine RK3566 inference with cloud analytics via REST API. |
| Custom Models | Replace YOLOv8n with your own trained models (e.g., defect detection). |
This flexibility makes RK3566 suitable for smart retail, factory inspection, traffic monitoring, and AIoT gateways.
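As an example of the first integration, detection results can be pushed to a broker with the paho-mqtt package (not installed in the earlier steps; the broker address and topic below are placeholders):

```python
import json
import paho.mqtt.publish as publish

def publish_detections(boxes, scores, cls_ids, class_names, broker="192.168.1.10"):
    """Serialize one frame's detections and publish them as a JSON message."""
    payload = [
        {"label": class_names[int(c)], "score": float(s), "bbox": [float(v) for v in b]}
        for b, s, c in zip(boxes, scores, cls_ids)
    ]
    publish.single("rk3566/yolov8/detections", json.dumps(payload),
                   hostname=broker, port=1883)
```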
Final Verification Checklist
✅ Model converted successfully (.rknn file valid)
✅ Quantization accuracy acceptable (mAP loss <3%)
✅ Real-time performance achieved (>15 FPS)
✅ Auto-start service working correctly
✅ Integration tests (MQTT, camera, Docker) passed
When all these boxes are checked, your RK3566-powered device is ready for production-grade YOLOv8 edge inference.
Summary
By following this tutorial, you’ve built a complete end-to-end YOLOv8 Edge AI pipeline on RK3566, from model export and quantization to on-device deployment.
Key takeaways:
- RKNN Toolkit 2 simplifies ONNX → RKNN conversion and quantization.
- RKNN Runtime enables fast, low-power inference on embedded hardware.
- With proper post-processing and service automation, RK3566 becomes a reliable edge vision platform for commercial applications.