RK3566 Can Run YOLOv8 INT8 — But Only Within These Limits

quhuanhuan
January 23, 2026
5:51 pm
0 comments

Real-time YOLOv8 INT8 on RK3566 is possible—if you manage execution paths, quantization effects, and model limits.

Table of Contents

Running YOLOv8 INT8 on RK3566 isn’t just a model conversion task—it’s a system-level alignment challenge. This article explores how quantization, operator compatibility, and detection head design define whether real-time inference is possible on RK3566, and under what strict conditions.

Why YOLOv8 INT8 Can’t Just Run on RK3566

What's the Real Issue with Combining RK3566 and YOLOv8

In many edge AI projects, RK3566 is often seen as a “cost-effective platform that can also handle some AI tasks.”

Its positioning isn’t aggressive: controllable power consumption, full peripheral support, and limited but non-zero compute power. This means one thing—it’s not designed for complex models with compute headroom.

YOLOv8 Detection sits in a delicate spot.

It’s no longer a “lightweight, run-anywhere” model but is still far from server-level detection models. In theory, it belongs to the “just about doable” category for RK3566.

In practice, this assumption often fails during deployment.

Many projects run smoothly at the model stage:

ONNX exports fine, PC-side inference is normal, and the structure doesn’t look too complex. But once converted to RKNN and deployed on-device, performance drops—unstable FPS, high CPU usage, and rapid system resource exhaustion.

This isn’t due to a single parameter being off—it’s a deeper issue:

RK3566 is not a platform that brute-forces inference. Usability depends on having a clean execution path.

Why Floating-Point Inference Has Little Value on RK3566

Running YOLOv8 Detection in FP16 or FP32 on RK3566 usually leads to predictable results:

The model runs, but runs poorly.

This isn’t an “implementation issue” of RKNN or NPU, but a design logic issue of the platform.

On RK3566, the NPU isn’t a fully independent compute unit.

If there are unsupported operators in the model, execution falls back to the CPU. Detection models often include such unsupported operations.

With floating point, the issues worsen:

Low NPU coverage for FP ops
Constant data movement between CPU and NPU
Fragmented inference, high scheduling overhead

Result:

Single-digit FPS with near-maxed system load.

In this state, optimizing FP16 further is meaningless.

It’s not under-tuned; it’s the wrong execution path for this hardware.

INT8 Isn’t a “Bonus” — It’s the Entry Point

Switching to INT8 reveals RK3566’s true nature.

INT8 isn’t just about precision—it unlocks the most stable, fully supported execution path on RK3566.

In INT8 mode:

Operator mapping success rate rises
Ops can stay in the NPU for longer
CPU acts more like a scheduler than a compute unit

Now, YOLOv8 Detection starts “really running on the NPU,” not just “partially using the NPU.”

But INT8 isn’t zero-cost.

Detection models are sensitive to quantization, especially in the Head. Poor quantization leads to missed detections or box jitter.

So the real question isn’t “should we use INT8,” but:

How far can YOLOv8 Detection go on RK3566 with INT8—and what are the boundaries?

YOLOv8 Detection’s Structure Determines RK3566 Compatibility

YOLOv8 Detection isn’t structurally complex—but its complexity concentrates in subtle areas.

Backbone is usually fine.

As long as channel counts and input sizes aren’t extreme, RK3566’s NPU handles it stably.

The real issues appear in the later stages:

Irregular scale changes during feature fusion
Unpredictable Concat and Upsample combinations
Unnecessary tensor ops in the Detection Head

These are legal in ONNX but can prevent the RKNN compiler from statically fixing the compute graph.

Most failures aren’t due to model size but structural elements the compiler can’t resolve.

If the graph can’t be fully static, NPU advantages disappear quickly.

If your model fails during RKNN compilation, the issue may lie in unsupported operators. See this ONNX opset compatibility reference for details.

Structure First — It’s a Prerequisite for INT8 Success

On RK3566, if structure doesn’t serve the execution path, quantization only helps partially.

Repeatedly validated practices include:

Fixed input size is more important than flexibility
Dynamic shapes offer less benefit than cost here
Simpler Detection Heads yield more stable INT8 results

These aren’t flashy choices—but they aim for one thing:

Keep the entire inference inside the NPU without interruption.

If the path breaks, even aggressive quantization can’t fix the overall performance.

How YOLOv8 INT8 Actually Runs on RK3566

What Happens Between Model and Device

Placing a YOLOv8 Detection model onto RK3566 isn’t determined by export, but by the in-between steps often oversimplified.

From PyTorch to on-device execution, three major changes occur:

Graph gets compressed into statically analyzable form
Data precision maps from float to fixed point
Execution path splits between NPU and CPU

Any “gray area” here nearly guarantees performance issues.

INT8 adds clarity to this path.

RK3566’s INT8 support goes beyond compute—it affects compilation, scheduling, and caching.

For a full walkthrough of exporting, converting, and deploying YOLOv8 models to RK3566, see our deployment guide.

INT8 Quantization Isn’t a One-Click Step

Many first-time users of RKNN INT8 think it’s just a switch:

Enable INT8, feed a few images, done.

But it’s more like a filtering process.

INT8 Quantization process for YOLOv8 INT8 on RK3566

Calibration data doesn’t “train” the model; it constrains value ranges.

Detection models are highly sensitive to feature distribution shifts, especially with large object size variation.

If calibration data poorly match real scenes, issues like:

Small object confidence drops
Box jitter across frames
Higher false positives in complex backgrounds

These problems usually occur in the Detection Head, not the Backbone.

Detection Head Is the Make-or-Break Point for INT8

On RK3566, bottlenecks rarely lie in the Backbone.

The real gap appears downstream.

The Detection Head has:

Frequent resolution shifts
Wide numerical ranges
Extreme precision demands

This makes it most prone to quantization distortion.

Even if Backbone and Neck quantize well, aggressive Head quantization can degrade detection.

That’s why many RK3566 failures stem not from “large models,” but Head structures misaligned with quantization.

Execution Continuity Beats Theoretical Compute

Once on-device, another critical issue surfaces:

Can operators stay on the NPU continuously?

Ideally:

Input goes into NPU
Multiple layers execute without switching
Output returns to CPU

But if an op can’t run on NPU, control switches to CPU.

On RK3566, with limited compute/bandwidth, this cost is high.

INT8 greatly increases the odds of uninterrupted NPU execution.

That’s why INT8 often yields better-than-linear performance gains on the same model.

Quantization Fails Don’t Mean Model Is Bad

Often, models that work fine on PC fail post-quantization on RK3566.

This doesn’t mean model choice was wrong. More likely:

Inadequate calibration data coverage
Input size mismatch
Detection Head too complex

In such cases, simplifying structure—like reducing branches or compressing channels—is more effective than tweaking quantization parameters.

Real-World Limits of YOLOv8 INT8 on RK3566

Test Conditions & Constraints

To avoid misleading results, all tests follow the same setup:

Hardware: RK3566 (NPU enabled)
Model: YOLOv8 Detection (no pruning)
Input Size: 640×640, fixed
Inference Mode: Single-frame, Batch=1
Post-Processing: On CPU, not on NPU

We ignore extreme tuning and special trims.
Goal: Evaluate a “reusable in production” YOLOv8 Detection on RK3566.

FPS: INT8 vs FP16 Isn’t Linearly Different

The raw numbers tell the story:

Model Precision	Inference FPS (640×640)	Stability
FP16	3 ~ 5 FPS	Unstable, high CPU load
INT8	12 ~ 18 FPS	Stable, sustained

This isn’t just “INT8 is faster”—it’s a different execution mode.

FP16 triggers frequent CPU fallback.
INT8 allows sustained NPU-only execution.

Only in INT8 mode does YOLOv8 Detection become close to real-time on RK3566.

This holds true across projects; differences arise in scale, not in conclusion.

Accuracy Loss: Localized, Not Global

FPS boost is just the start. Detection accuracy defines usability.

INT8 quantization doesn’t degrade uniformly—it shows structural patterns:

Scene Type	Accuracy Change
Medium/Large Objects	Mostly stable
Simple Background	Almost unaffected
Small/Dense Objects	Much more sensitive
Complex Textures	More false positives

Thus, INT8 doesn’t “weaken everything”—it amplifies pre-existing weak points.

If the Detection Head and calibration data match your real use case, losses are usually acceptable.

No “Sweet Spot” Between FPS & Accuracy

A common RK3566 myth: find a “perfect FPS with minimal accuracy drop.”

Reality:

Choose INT8: accept structured accuracy changes
Or use FP16: sacrifice real-time capability

It’s not a quantization flaw—it’s a hardware limit.

As model complexity nears RK3566’s ceiling, you can’t have both high FPS and full accuracy.

Turning Test Data into Engineering Judgments

Compressing the findings:

Usable FPS lower bound is ~10 FPS on RK3566
Below that, load spikes and stability plummets
INT8 is a must—but not sufficient on its own
Model structure and Head design define quantized accuracy

These hold across projects—this isn’t anecdotal.

When It’s Worth Using, When to Switch Plans

Combined performance and accuracy limits give us clear lines:

Suitable Scenarios

Single-class or few-class detection
Medium+ object sizes
FPS target of 10–15
Prioritize response speed over peak accuracy

Not Suitable

Many small objects
High precision needed
Complex post-processing
Expecting PC-level accuracy on RK3566

In the unsuitable cases, pushing RK3566 further yields little.
You must change model size, hardware, or task design.

YOLOv8 Detection Feasibility on RK3566 (INT8)

Dimension	Acceptable Range	Typical Issue When Exceeded	Takeaway
Precision Type	INT8	<5 FPS in FP16/FP32	INT8 is essential
Input Size	≤640×640	Larger → nonlinear FPS drop	Fixed input preferred
Real FPS	12–18 FPS	<10 FPS → system overload	10 FPS = lower bound
NPU Utilization	High (continuous)	Frequent CPU fallback	Path continuity > GFLOPs
Backbone	Light ~ medium	Rarely a problem	Acceptable
Detection Head	Simpler = better	Box jitter / missed detects	Decides success/failure
Small Obj. Density	Low ~ medium	High → misdetects increase	Not ideal use case
Calibration Data	Scene-aligned	Misaligned → accuracy loss	Critical for INT8
Long Runtime	Stable in INT8	FP16 fluctuates	INT8 is sustainable

Final Judgment

If only one takeaway matters, it’s this:

RK3566 can run YOLOv8 Detection—if you accept INT8 and understand its limits.

It’s not a failure platform, nor “AI-ready by default.”
When model, structure, and expectations align, RK3566 delivers stable, predictable results.
Push beyond its limits, and both performance and accuracy collapse.

Quick Decision Guide (Matrix)

Your Need	RK3566 + YOLOv8 INT8 Recommended?
Few-class detection	✅ Yes
Medium object sizes	✅ Yes
Realtime (≥10 FPS)	✅ Yes
Many small objects	❌ No
High-precision localization	❌ No
Complex post-processing	❌ No

Key Takeaways

INT8 is the starting point for YOLOv8 on RK3566
FPS gain comes from execution path shift, not compute boost
Accuracy loss centers on Detection Head and select cases
Once platform boundaries are clear, decisions become simpler

RK3566 can run YOLOv8 INT8—but only when you design within hard boundaries. From quantization to execution path planning, success depends on matching model constraints with RKNN’s capabilities and the NPU’s limited flexibility. Push past those limits, and the system fails predictably.

Looking to deploy object detection in constrained edge environments, such as the RK3566?
ZedIoT builds custom AIoT pipelines designed for real-world constraints—see our Edge AI system capabilities to explore what we deliver on RK3566 and beyond.

FAQ

Q: What is the optimal inference mode for YOLOv8 on RK3566?

A: INT8 quantization is the only viable mode. It ensures maximum NPU utilization, minimizes CPU fallback, and enables 12–18 FPS, compared to 3–5 FPS in FP16.

Q: Why is RK3566 not suited for floating-point inference?

A: Floating-point ops are only partially supported by the RK3566 NPU. Unsupported ops get routed to the CPU, leading to fragmented execution and low performance.

Q: Where does INT8 quantization most affect YOLOv8 accuracy?

A: In the Detection Head, due to frequent resolution changes and high precision needs. It’s the most fragile area post-quantization.

Q: How can I make sure my YOLOv8 model survives INT8 quantization?

A: Use calibration data that mirrors deployment scenarios, fix input resolution, and simplify the Detection Head architecture.

Q: Is there a performance-accuracy sweet spot for YOLOv8 on RK3566?

A: No. You either accept INT8’s structured accuracy loss or fall back to FP16 with unacceptably low FPS. It’s a binary choice dictated by platform limits.

Edge AI, Embedded Vision, INT8 quantization, NPU Optimization, Object Detection, RK3566, RKNN Toolkit, YOLOv8

Seeking AI + IoT Development Guidance?

Contact us and we will help you analyze your requirements and tailor a suitable solution for you.

Contact us