This is a design note for anyone bringing up a Dynamixel 2.0 servo on the CH32V006. You don’t need prior DXL or RISC-V expertise; read it top to bottom. Protocol reference throughout is the DYNAMIXEL Protocol 2.0 spec.

If you’re new here, OpenServoCore is my effort to turn cheap SG90 / MG90 servos into networked smart actuators with sensor feedback, cascade control, and a DYNAMIXEL-style TTL bus. This is the first post in a series on how the CH32V006 firmware meets DXL 2.0’s timing budget on a $0.16 chip (qty. 1000+). Later posts will be linked here as they go live.

CH32V006 Limitations: DXL 2.0 does not require any idle period between the host’s last byte and the servo’s reply. The clean way to detect the host’s wire-end is to use a chip with proper UART hardware support, for example the STM32G0 / G4 class. Those chips expose USART primitives that deliver the wire-end with zero publish latency, but they are also pricey. The V006 has nothing equivalent, so we have to do this in software, built on the two USART flags it does give us.

TL;DR. Given software-only limitations, how does a DXL servo on the CH32V006 know precisely when the host’s request ended on the wire, across every baud rate from 9600 up to 3 Mbaud (the V006’s USART ceiling)? Call that moment the wire-end. It’s the reference point the servo schedules its reply against, since the protocol’s Return Delay Time (configurable from 2 µs) is measured from it. Miss it and the reply lands in another servo’s slot and causes bus contention.

The CH32V006’s USART offers two usable signals: a per-byte interrupt (RXNE) that fires exactly at each byte’s stop bit, and a per-packet interrupt (IDLE) that fires once after the line has been quiet for 9 bit-times. RXNE is precise but expensive (100k IRQs/second at 1 Mbaud). IDLE is cheap, but its publish latency, ~1 character time, exceeds short RDTs at low baud. The answer is to pick between them based on (baud, RDT). This post details what each signal looks like and how to build a rule that selects between them dynamically.


Glossary

A few protocol terms recur throughout. The V006-specific acronyms (BRR, NDTR, PFIC, HCLK, etc.) are introduced inline where they first show up.

| Term | Meaning |
| --- | --- |
| DXL | DYNAMIXEL, Robotis’s smart-servo bus and protocol. We speak DXL 2.0. |
| RDT | Return Delay Time. Servo-side reply offset from the host’s request-end, configurable 2–508 µs in 2 µs steps. |
| Fast Sync / Bulk Read | DXL 2.0 read variant where the host addresses many servos at once and all replies stitch into one coalesced Status frame. |
| Coalesce | The zero-idle-gap stitching above. What the last servo’s fire deadline has to preserve. |
| Wire-end | The moment a byte’s stop bit finishes clocking out. The actual “wire just went idle” event. |
| IDLE / RXNE / TC | USART status flags. Line went idle / a byte arrived / TX shift register fully drained. |

1. What’s on the wire

DXL is a half-duplex serial bus driven by a single host. Servos listen for host instruction packets and produce status packets. Since they all share a single physical DATA wire, all nodes (host and servos) have to ensure they don’t step on each other; otherwise the result is bus contention and garbled bytes.

The question the servo has to answer: after the host’s request finishes, how long should I wait before replying?

The answer is a per-servo setting in the control table called Return Delay Time (RDT). Range: 2 to 508 microseconds, in 2 µs steps. The host writes it once during setup; the servo honors it on every reply.

Why precision matters: in Sync Read and Bulk Read, the host addresses many servos at once and they reply back-to-back in pre-assigned time slots. A servo that’s 200 µs late steps on the next servo’s reply and corrupts the bus. RDT is a hard deadline.

In the Fast Sync/Bulk variants the slots stitch into a single coalesced Status frame with zero idle gap, tightening the budget further. That case has its own architecture and its own write-up; this post stays with the foundation it sits on top of.

The foundation is two things the servo has to do, accurately:

  1. Detect the wire-end, the moment the request’s last byte finishes (specifically, when its stop bit clocks out).
  2. Schedule its reply to start exactly RDT µs (plus any slot offset) after that moment.

(2) is mechanical once you have a precise wire-end timestamp: arm a hardware timer. The hard part is (1), getting that timestamp precisely across the full DXL baud range, and that’s what the rest of this post is about.


2. The CH32V006: what we have to work with

For a dirt-cheap MCU, what it offers is not ideal for hardware-based UART timing, but it is enough to make things work with some effort. Let’s lay out the relevant parts:

Core. QingKe V2A @ 48 MHz HCLK, single-cycle SRAM. Flash above 24 MHz takes 2 wait-states per fetch, which adds up inside a hot ISR. qingke-rt’s highcode feature adds a .highcode section that boot-copies tagged functions into SRAM; we use it for the USART and SysTick handlers.

USART1. This is the DXL bus, and we will be using the following IRQs:

  • IDLE fires when the line has been idle for 9 bit-times of mark.
  • RXNE fires when a byte’s stop bit clocks in.
  • TC fires when the last byte of a TX has fully shifted out of the register and onto the wire.

All three multiplex onto the USART1 vector, so we need to demux in the ISR code.

DMA1 CH5 is wired to USART1_RX as a circular buffer. RX bytes stream into the buffer in the background, so the USART vector only has to handle IDLE / RXNE / TC, not byte copying.

SysTick is a 32-bit free-running counter at 48 MHz. We use it to schedule TX. The scheduler has to detect whether the deadline is already in the past and, if so, decide whether to send TX immediately or drop the response. Compare-match (CMP) mode raises an IRQ when CNT == CMP, so writing a CMP value that’s already in the past schedules the IRQ for the next 32-bit wrap, ~89 seconds away at 48 MHz. The scheduler has to take that into account as well.
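To make the past-deadline check concrete, here is a minimal sketch of a wrap-safe comparison on a free-running 32-bit counter. The names `plan_compare`, `Plan`, and `min_lead_ticks` are hypothetical and not taken from the firmware; the sketch also assumes `now` and the deadline are never more than 2^31 ticks (~44.7 s at 48 MHz) apart, which is what makes the signed reinterpretation valid.

```rust
/// SysTick ticks per microsecond at 48 MHz HCLK.
const TICKS_PER_US: u32 = 48;

/// Outcome of planning a compare-match write (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Plan {
    /// Deadline is far enough ahead: safe to write this value to CMP.
    Compare(u32),
    /// Deadline is behind us, or too close to arm in time. Writing it to CMP
    /// would only match after the counter wraps, so the caller must fire
    /// immediately or drop the reply instead.
    Missed { late_ticks: u32 },
}

/// Decide whether `wire_end + RDT` can still be armed as a compare value.
/// `min_lead_ticks` is an assumed guard for the cycles spent writing CMP.
pub fn plan_compare(now: u32, wire_end: u32, rdt_us: u32, min_lead_ticks: u32) -> Plan {
    let deadline = wire_end.wrapping_add(rdt_us * TICKS_PER_US);
    // Signed view of the wrapped difference: positive means "still ahead".
    let lead = deadline.wrapping_sub(now) as i32;
    if lead > min_lead_ticks as i32 {
        Plan::Compare(deadline)
    } else {
        // lead <= 0 means we are late by -lead ticks; a small positive lead
        // is reported as zero ticks late.
        Plan::Missed { late_ticks: lead.min(0).unsigned_abs() }
    }
}
```

Whether a `Missed` result turns into an immediate fire or a dropped reply is a policy decision that belongs to the caller; the sketch only guarantees that a stale value never reaches CMP.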


3. The two USART flags

The USART hardware hands us two different “the request ended” signals. Each has its own personality.

3.1 RXNE: “a byte just arrived”

Fires once for each byte received. The interrupt happens as the byte’s stop bit clocks in, so the timestamp is exactly the wire-end of that byte. Very precise.

The catch is volume: at high baud you get a lot of these. At 3 Mbaud a stream of back-to-back bytes is 300,000 IRQs per second.

(V006 quirk: with RX DMA on, the data register is read before the IRQ handler is entered, so STATR.RXNE always reads 0 inside the ISR; DMA wins the clear race. PFIC’s pending bit latches per byte independently, so the IRQ still fires 1:1 per byte. Treat IRQ entry itself as “a byte arrived” and read the RX DMA cursor (NDTR, the channel’s remaining-count register) for position.)
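Turning the remaining-count register into a buffer position is a one-liner, sketched below. The buffer length of 64 and the function names are assumptions for illustration; the sketch also assumes the usual circular-mode behaviour where the count runs from the buffer length down to 1 and then reloads.

```rust
/// Length of the circular RX buffer DMA writes into (assumed value).
pub const RX_BUF_LEN: u16 = 64;

/// Convert the DMA remaining-count (NDTR) into the write cursor, i.e. the
/// index of the next byte DMA will store. Expects 1 <= ndtr <= RX_BUF_LEN.
pub fn rx_cursor(ndtr: u16) -> u16 {
    debug_assert!(ndtr >= 1 && ndtr <= RX_BUF_LEN);
    (RX_BUF_LEN - ndtr) % RX_BUF_LEN
}

/// Bytes received between two cursor readings, tolerating a single wrap.
pub fn bytes_between(prev_cursor: u16, cursor: u16) -> u16 {
    (cursor + RX_BUF_LEN - prev_cursor) % RX_BUF_LEN
}
```

In the per-byte ISR, the cursor read this way pairs with the SysTick count sampled at IRQ entry to give a (position, tick) pair for the byte that just landed.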

3.2 IDLE: “the wire went quiet”

Fires once after the line has been idle for 9 bit-times. One interrupt per packet, regardless of packet length. Cheap.

The catch is latency: the interrupt fires roughly one character time after the wire actually went quiet. So if you use the IRQ-entry timestamp as the wire-end, you’re systematically a character-time late.

We fix the value by backdating: subtract 9 × BRR HCLK ticks from the IRQ-entry timestamp. That’s straightforward. But there’s a deeper problem: even with a correct wire-end value, the servo doesn’t find out about wire-end until ~1 char-time after the wire went quiet. For low bauds, this is too late to send the response back. We already missed the schedule.
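The backdating step can be sketched as follows. It assumes BRR holds the number of HCLK ticks per bit (HCLK / baud, so 48 at 1 Mbaud) and that the tick counter is the 32-bit free-running SysTick count; `backdate_idle` is a hypothetical name.

```rust
/// Bit-times of mark the USART waits before raising IDLE.
pub const IDLE_BIT_TIMES: u32 = 9;

/// Recover the wire-end tick from the tick sampled at IDLE IRQ entry.
/// `brr` is assumed to be HCLK ticks per bit, so 9 * brr is the IDLE
/// detection delay expressed in the same units as the tick counter.
pub fn backdate_idle(irq_entry_tick: u32, brr: u32) -> u32 {
    // wrapping_sub keeps the result correct across a counter wrap.
    irq_entry_tick.wrapping_sub(IDLE_BIT_TIMES * brr)
}
```

At 1 Mbaud this subtracts 432 ticks (9 µs). Any fixed ISR entry latency is not modelled here and would need its own measured correction.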


4. Why one char-time of delay breaks low baud

Here’s how the reply scheduler is structured:

  1. Wait until the wire-end timestamp shows up from the end-wire timing layer.
  2. Arm SysTick CMP at wire_end + RDT.

The scheduler needs the wire-end timestamp to arrive before the deadline. With IDLE-based timing, the timestamp shows up roughly one character time after the wire-end happened. That’s the publish latency, and it’s baked into the IDLE flag; backdating fixes the value but not when you learn it.

What happens if publish latency exceeds RDT? The deadline is already in the past by the time the scheduler tries to use it. SysTick CMP can’t be scheduled in the past, so the only thing it can do is fire immediately, which technically violates the protocol.

Concrete example: 9600 baud, RDT = 250 µs

| quantity | value |
| --- | --- |
| char_time = 9 bits / 9600 bps | 937.5 µs |
| publish latency (IDLE) | 937.5 µs (one char time) |
| RDT (the deadline) | 250.0 µs |
| late by | 687.5 µs, every reply |

The servo can never make this deadline in IDLE mode. At high baud the same math is fine. At 1 Mbaud the character time is 9 µs; with RDT = 250 µs the servo has ~241 µs of slack. IDLE is plenty fast.
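The same arithmetic as a small helper, useful for checking other (baud, RDT) pairs. It works in integer nanoseconds so the 9600-baud case stays exact; `idle_slack_ns` is a hypothetical name, and the 9-bit publish latency is the figure used throughout this post.

```rust
/// Slack left after IDLE publishes the wire-end, in nanoseconds.
/// Negative means the deadline has already passed when the servo learns it.
pub fn idle_slack_ns(baud_hz: u64, rdt_us: u64) -> i64 {
    // IDLE publish latency: 9 bit-times, expressed in nanoseconds.
    let publish_latency_ns = 9 * 1_000_000_000 / baud_hz;
    rdt_us as i64 * 1_000 - publish_latency_ns as i64
}
```

For example, `idle_slack_ns(9_600, 250)` is -687,500 ns (the 687.5 µs miss above) and `idle_slack_ns(1_000_000, 250)` is 241,000 ns.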

So the answer is baud-dependent: low baud needs the per-byte (RXNE) approach, high baud is happy with the per-packet (IDLE) approach.


5. The decision rule

Use IDLE whenever it can meet the deadline; use RXNE when it can’t:

char_time_us       = 9_000_000 / baud_hz       # publish latency of IDLE
rdt_us             = return_delay_2us * 2      # the deadline budget
pipeline_margin_us = 20                        # headroom for ISR/dispatch work (measured)

use_rxne_framing = char_time_us + pipeline_margin_us > rdt_us

The pipeline_margin_us term accounts for the work between “IDLE ISR publishes timestamp” and “scheduler reads it.” Measure on hardware and add a few µs of headroom. ~20 µs is a reasonable starting point on V006.
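A minimal Rust sketch of the rule, assuming the control-table value is the raw RDT register in 2 µs units. `Framing` and `select_framing` are illustrative names, and the comparison is done in nanoseconds to avoid rounding the character time.

```rust
/// Which USART signal supplies the wire-end timestamp.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Framing {
    Idle,
    Rxne,
}

/// Pick the framing mode for a (baud, RDT) pair.
/// `return_delay_2us` is the raw register value (units of 2 µs).
pub fn select_framing(baud_hz: u32, return_delay_2us: u8, pipeline_margin_us: u32) -> Framing {
    let char_time_ns = 9_000_000_000u64 / baud_hz as u64; // IDLE publish latency
    let rdt_ns = return_delay_2us as u64 * 2 * 1_000; // the deadline budget
    let margin_ns = pipeline_margin_us as u64 * 1_000; // ISR/dispatch headroom
    if char_time_ns + margin_ns > rdt_ns {
        Framing::Rxne
    } else {
        Framing::Idle
    }
}
```

With a 250 µs RDT (register value 125), 57600 baud selects IDLE at zero margin but flips to RXNE once the margin reaches 100 µs, which is the nudge the margin term is there to provide.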

5.1 What does the rule pick?

With pipeline_margin_us = 0 to keep the numbers clean (a real margin nudges the boundary slightly toward more RXNE):

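| baud | char_time | RDT = 2 µs | RDT = 250 µs | RDT = 508 µs |
| --- | --- | --- | --- | --- |
| 9600 | 937.5 µs | RXNE | RXNE | RXNE |
| 57600 | 156.3 µs | RXNE | IDLE | IDLE |
| 115200 | 78.1 µs | RXNE | IDLE | IDLE |
| 1 M | 9.0 µs | RXNE | IDLE | IDLE |
| 3 M | 3.0 µs | RXNE | IDLE | IDLE |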

Quick read:

  • The minimum RDT (2 µs) always forces RXNE: no budget for publish latency.
  • At a typical RDT of 250 µs, IDLE works from 57600 baud upward.
  • Slow buses (below ~17.7 kbaud) need RXNE even at the max RDT.
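The boundaries in the list above fall out of inverting the rule: IDLE is viable once 9,000,000 / baud fits inside RDT minus the margin. A sketch of that inversion follows, with `min_idle_baud` as a hypothetical helper name.

```rust
/// Lowest baud rate at which IDLE framing still meets the deadline,
/// or None when the margin alone consumes the whole RDT budget.
pub fn min_idle_baud(rdt_us: u32, pipeline_margin_us: u32) -> Option<u32> {
    let budget_us = rdt_us.checked_sub(pipeline_margin_us)?;
    if budget_us == 0 {
        return None;
    }
    // Smallest baud with 9_000_000 / baud <= budget_us (ceiling division).
    Some((9_000_000 + budget_us - 1) / budget_us)
}
```

At the maximum RDT of 508 µs and zero margin this gives 17,717 baud, the ~17.7 kbaud boundary. At RDT = 2 µs it gives 4.5 Mbaud, above the V006’s 3 Mbaud ceiling, which is why the minimum RDT always lands on RXNE.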

The two strategies sit at opposite ends of the trade-off naturally: at low baud, bytes are slow and per-byte interrupts are cheap in total; at high baud, packets are short and per-packet interrupts are all you need.


6. End-to-end timelines

6.1 IDLE mode (typical high-baud operation)

Three bytes arrive close-packed. RXNE pulses per byte but the end-wire timing layer ignores them. After ~9 bit-times of quiet the IDLE IRQ fires; the handler backdates wire_end = now − 9·BRR and pushes (bytes_added, wire_end) onto the IDLE-stamp queue. Main loop parses; dispatcher calls request_complete(parsed_end); scheduler arms SysTick CMP at wire_end + RDT. SysTick fires → fire_now() flips TX_EN, enables DMA CH4.

One interrupt per request. Cheap and clean.

6.2 RXNE mode (low-baud or minimum-RDT operation)

At low baud the bytes are spread out, and each RXNE fires far enough apart that per-byte work is cheap. The handler overwrites a single-cell snapshot (rx_cursor, now) on every byte. Main loop parses; dispatcher reads the snapshot for parsed_end’s position to get the wire-end tick; SysTick CMP arms at wire_end + RDT.

One interrupt per byte. At 9600 baud a 14-byte request is 14 interrupts spread across 14.5 ms, barely a blip. At 1 Mbaud it would be 14 interrupts in 140 µs, far too much, which is why the rule picks RXNE at high baud only when the RDT leaves no room for IDLE’s publish latency (the RDT = 2 µs column in 5.1).
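The interrupt-load figures can be reproduced with a few integer helpers. This sketch assumes 8N1 framing (10 bits on the wire per byte, which is what the numbers above imply) and a 48 MHz HCLK; the function names are illustrative.

```rust
const HCLK_HZ: u64 = 48_000_000;
/// Start bit + 8 data bits + stop bit (8N1, assumed).
const BITS_PER_FRAME: u64 = 10;

/// RXNE interrupts per second for a back-to-back byte stream.
pub fn rxne_irqs_per_second(baud_hz: u64) -> u64 {
    baud_hz / BITS_PER_FRAME
}

/// Time a request of `bytes` bytes occupies the wire, in whole microseconds.
pub fn request_duration_us(bytes: u64, baud_hz: u64) -> u64 {
    bytes * BITS_PER_FRAME * 1_000_000 / baud_hz
}

/// HCLK cycles available per received byte: the whole per-byte budget for
/// ISR entry, handler body, and everything else the core has to do.
pub fn hclk_cycles_per_byte(baud_hz: u64) -> u64 {
    HCLK_HZ * BITS_PER_FRAME / baud_hz
}
```

At 9600 baud the core has 50,000 cycles between bytes; at 1 Mbaud it has 480, and at 3 Mbaud only 160.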

6.3 Fast last-servo

A third path, the Fast Sync/Bulk last-servo reply, has to snoop predecessor slots and CRC bytes it didn’t generate. It’s its own architecture and out of scope here.


7. Interrupt responsibilities

| Priority | IRQ | What it handles | Where it lives |
| --- | --- | --- | --- |
| High | USART1 | IDLE + RXNE + TC (all multiplexed) | .highcode (SRAM) |
| High | SysTick CMP | Reply deadline → fire TX | .highcode (SRAM) |
| Low | DMA1_CH1 | ADC kernel pump (20 kHz) | flash (cold path) |

The TC branch of the USART1 vector is also where deferred state changes apply: a host-issued baud change can’t retune the USART mid-reply, so it queues up and the TC handler re-tunes BRR and re-runs the IDLE-vs-RXNE framing decision once the reply has drained.
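One way to picture the deferred baud change is a pending slot that only the TC path drains. The sketch below is an illustration under stated assumptions, not the firmware’s actual structure: `Link`, `request_baud`, and `on_tx_complete` are hypothetical names, BRR is assumed to be HCLK / baud rounded to nearest, and the framing decision is run with zero pipeline margin.

```rust
const HCLK_HZ: u32 = 48_000_000;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Framing {
    Idle,
    Rxne,
}

/// Live link settings plus one queued baud change (hypothetical type).
pub struct Link {
    pub baud_hz: u32,
    pub brr: u32,
    pub framing: Framing,
    rdt_us: u32,
    pending_baud: Option<u32>,
}

impl Link {
    pub fn new(baud_hz: u32, rdt_us: u32) -> Self {
        let mut link = Link { baud_hz: 0, brr: 0, framing: Framing::Idle, rdt_us, pending_baud: None };
        link.apply(baud_hz);
        link
    }

    /// Host wrote a new baud: queue it, do not touch the USART mid-reply.
    pub fn request_baud(&mut self, baud_hz: u32) {
        self.pending_baud = Some(baud_hz);
    }

    /// Called from the TC branch once the reply has fully drained.
    pub fn on_tx_complete(&mut self) {
        if let Some(baud_hz) = self.pending_baud.take() {
            self.apply(baud_hz);
        }
    }

    fn apply(&mut self, baud_hz: u32) {
        self.baud_hz = baud_hz;
        self.brr = (HCLK_HZ + baud_hz / 2) / baud_hz; // ticks per bit, rounded
        // Same rule as section 5 with zero margin, cross-multiplied to stay
        // in integers: 9_000_000 / baud > rdt  <=>  9_000_000 > rdt * baud.
        let use_rxne = 9_000_000u64 > self.rdt_us as u64 * baud_hz as u64;
        self.framing = if use_rxne { Framing::Rxne } else { Framing::Idle };
    }
}
```

The point of the shape is that the reply carrying the acknowledgement goes out at the old baud, and BRR and the framing mode change together at a moment when nothing is on the wire.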


8. The consumer side

When the dispatcher finishes parsing a request, it asks the end-wire timing layer “when did this request’s last byte hit the wire?” via request_complete(parsed_end). The lookup branches on the current end-wire timing mode.

In IDLE mode the end-wire timing layer scans the queue for the entry whose cumulative byte count matches parsed_end, returns that tick, and pops it. If no entry matches, return None: either the IDLE flag hasn’t fired yet, or the queue overflowed and the matching entry was dropped.
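A minimal sketch of the IDLE-mode lookup follows. It stores the cumulative end position directly in each stamp, and on overflow it drops the oldest entry; both choices are assumptions for illustration, as are the type and field names. A real ISR/main-loop split would also need a critical section or a lock-free queue around these calls.

```rust
const DEPTH: usize = 4;

/// One IDLE event: where the packet ended in the RX stream, and when.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IdleStamp {
    pub end_pos: u32,  // cumulative byte count at the end of the packet
    pub wire_end: u32, // backdated wire-end tick
}

/// Fixed-depth FIFO of IDLE stamps, oldest first.
pub struct IdleQueue {
    buf: [IdleStamp; DEPTH],
    len: usize,
}

impl IdleQueue {
    pub const fn new() -> Self {
        IdleQueue { buf: [IdleStamp { end_pos: 0, wire_end: 0 }; DEPTH], len: 0 }
    }

    /// ISR side: append a stamp, dropping the oldest entry when full.
    pub fn push(&mut self, stamp: IdleStamp) {
        if self.len == DEPTH {
            self.buf.copy_within(1..DEPTH, 0);
            self.len -= 1;
        }
        self.buf[self.len] = stamp;
        self.len += 1;
    }

    /// Consumer side: find the stamp for `parsed_end`, pop it, return its tick.
    /// None means IDLE has not fired yet or the entry was dropped on overflow.
    pub fn request_complete(&mut self, parsed_end: u32) -> Option<u32> {
        let idx = self.buf[..self.len].iter().position(|s| s.end_pos == parsed_end)?;
        let tick = self.buf[idx].wire_end;
        self.buf.copy_within(idx + 1..self.len, idx); // close the gap
        self.len -= 1;
        Some(tick)
    }
}
```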

In RXNE mode the end-wire timing layer reads the single-slot snapshot of the most recent byte’s (position, tick). The consumer checks whether the position matches parsed_end:

  • If yes, return the tick.
  • If no (more bytes arrived between parse and lookup), return None.
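The RXNE-mode lookup is simpler still. A sketch with hypothetical names (`RxneCell`, `on_byte`), again leaving out the synchronisation a real ISR-shared cell would need:

```rust
/// Position and tick of the most recently received byte.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ByteSnapshot {
    pub pos: u32,
    pub tick: u32,
}

/// Single-cell snapshot, overwritten on every byte.
pub struct RxneCell {
    latest: Option<ByteSnapshot>,
}

impl RxneCell {
    pub const fn new() -> Self {
        RxneCell { latest: None }
    }

    /// ISR side: record the byte that just finished on the wire.
    pub fn on_byte(&mut self, pos: u32, tick: u32) {
        self.latest = Some(ByteSnapshot { pos, tick });
    }

    /// Consumer side: the tick is only valid if the newest byte is the one
    /// the parser stopped at; otherwise more bytes have arrived since.
    pub fn request_complete(&self, parsed_end: u32) -> Option<u32> {
        match self.latest {
            Some(s) if s.pos == parsed_end => Some(s.tick),
            _ => None,
        }
    }
}
```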

Returning None is graceful degradation. Slot-timed callers (Sync, Bulk, Fast Read) must skip when the answer is unknown; they can’t safely fire without a precise timestamp. Direct unicasts may proceed with an immediate fire, accepting the timing slip.
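That policy fits in a single match. The sketch below uses hypothetical names (`ReplyKind`, `Action`, `plan_reply`) and folds RDT plus any slot offset into one `delay_ticks` argument:

```rust
/// How the reply is timed on the bus.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReplyKind {
    /// Direct reply to a single-servo request.
    Unicast,
    /// Sync, Bulk, or Fast Read: the reply must land in an assigned slot.
    SlotTimed,
}

/// What the scheduler should do next.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    ArmAt(u32),
    FireNow,
    Skip,
}

/// Map the wire-end lookup result onto a scheduling action.
pub fn plan_reply(kind: ReplyKind, wire_end: Option<u32>, delay_ticks: u32) -> Action {
    match (wire_end, kind) {
        // Known wire-end: schedule normally, wrap-safe.
        (Some(tick), _) => Action::ArmAt(tick.wrapping_add(delay_ticks)),
        // Unknown wire-end, unicast: reply now and accept the timing slip.
        (None, ReplyKind::Unicast) => Action::FireNow,
        // Unknown wire-end, slot-timed: stay silent rather than risk a collision.
        (None, ReplyKind::SlotTimed) => Action::Skip,
    }
}
```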

8.1 Why a queue for IDLE but a single cell for RXNE

Different timing regimes, different simplest solution.

  • IDLE mode runs at high baud. Packets can arrive faster than the main loop polls: three Sync Reads in quick succession will pile up three IDLE timestamps before the dispatcher gets to any of them. A small FIFO absorbs the burst. Depth 4 is plenty.
  • RXNE mode runs at low baud. Inter-byte time is at least ~174 µs (one 10-bit frame at 57600) and grows to ~1 ms at 9600. The dispatcher’s parse-and-lookup is sub-microsecond. By the time the next byte’s IRQ fires, the previous packet’s lookup has long since completed. No concurrent stacking, so a single cell suffices.

9. Where the V006 sits on the DXL 2.0 spec

Plain unicast and non-coalesced Sync/Bulk replies pass the spec across the full DXL 2.0 baud range, using the timing mode the decision rule picks at each (baud, RDT) pair.

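| baud | char_time | timing mode | plain DXL reply (RDT 250 µs) |
| --- | --- | --- | --- |
| 9600 | 937.5 µs | RXNE | passes |
| 57600 | 156.3 µs | IDLE | passes |
| 115200 | 78.1 µs | IDLE | passes |
| 1 M | 9.0 µs | IDLE | passes |
| 2 M | 4.5 µs | IDLE | passes |
| 3 M | 3.0 µs | IDLE | passes |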

The harder case, Fast Sync/Bulk Read as the last servo (zero-idle coalesce + CRC over the whole frame), runs into a separate ceiling driven by SysTick CMP → TX_EN fire latency rather than by RX detection. That case has its own write-up.