DanceOPD: On-Policy Generative Field Distillation

Wei Zhou, Xiongwei Zhu, Zelin Xu, Bo Dong, Lixue Gong, Yongyuan Liang,
Meng Chu, Leigang Qu, Lingdong Kong, Wei Liu, Tat-Seng Chua
Core idea

Treat every source capability as a velocity field, then learn where and how to query those fields on the student's own rollout.

Figure: DanceOPD hard-routed on-policy field matching (animated method overview). DanceOPD uses hard routing to select one frozen capability field, queries it on a low-noise on-policy student state, and matches the selected velocity with a local velocity MSE loss.

Overview

Latest manuscript summary

Modern image generation demands a single model that unifies diverse capabilities, including text-to-image (T2I) generation, local editing, and global editing. These capabilities rarely align naturally: editing can degrade T2I performance, while global and local editing can interfere with each other.

DanceOPD is an on-policy generative field distillation framework for flow-matching models. Each sample is routed to one frozen capability field, one low-noise student-induced state is queried, and the student is trained with a simple velocity MSE objective. The same formulation also absorbs operator-defined fields such as classifier-free guidance.

Figure: Overview. DanceOPD overview chart, reproduced from the manuscript figure.
5.347 · GEditBench Avg in T2I + Edit Composition
0.849 · GenEval Overall while preserving T2I
5.498 · GEditBench Avg in Local + Global Edit Composition
5.833 · Best CFG absorption diagnostic Avg

Challenge: capability synthesis is a field-query problem

Three query-induced alignment failures

Once each frozen source is viewed as a velocity field over the shared flow state space, capability synthesis depends on three choices: which field supervises a sample, where the field is queried, and how many states from a rollout are used.

1 · Target-field ambiguity

Softly averaging several source fields can destroy the semantic identity of a capability query. The update may point to no real teacher behavior.

DanceOPD: hard-routed sample-wise field matching

2 · State-distribution mismatch

Data states or teacher trajectories are off-policy for the student. They miss the states the deployed model actually visits at inference time.

DanceOPD: query on stop-gradient student rollout states

3 · Trajectory-query correlation

Dense states from the same rollout share prompt, noise, dynamics, and history. More states can over-weight one correlated path.

DanceOPD: one low-noise semantic-side query
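
One way to build intuition for the third failure is a textbook simplification that is not taken from the manuscript: treat the K per-state training signals from one rollout as equicorrelated random variables with pairwise correlation rho. The function name and the value rho = 0.9 below are illustrative assumptions.

```python
def effective_sample_size(k: int, rho: float) -> float:
    """Effective number of independent samples among k equicorrelated ones.

    For k variables with common variance s2 and pairwise correlation rho,
    Var(mean) = s2 * (1 + (k - 1) * rho) / k, so the mean behaves like the
    mean of k / (1 + (k - 1) * rho) independent samples.
    """
    return k / (1.0 + (k - 1) * rho)


# Hypothetical correlation level for states that share prompt, noise, and history.
for k in (1, 4, 16):
    print(k, round(effective_sample_size(k, rho=0.9), 3))
```

Under this assumption, going from one query to sixteen adds very little independent information, while the rollout still contributes sixteen terms to the update.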
A. T2I + Edit Composition

Add editing ability while retaining text-to-image prompt following and visual quality.

B. Local + Global Edit Composition

Fuse preservation-heavy local editing with transformation-heavy global editing.

C. Realism / Style Absorption

Move the student toward a quality or style field while keeping base T2I behavior.

D. CFG Absorption

Internalize classifier-free guidance as an operator-defined velocity field.

Method: DanceOPD

Hard-routed semantic-side on-policy velocity matching

DanceOPD keeps each local target semantically well-defined, queries the target where the current student actually goes, and avoids dense correlated supervision. The full update is a local field-matching step on a stop-gradient rollout state.

1 · Route one sample

Keep one semantic target per sample instead of averaging teachers.

$m \sim \pi(m), \quad (x, c) \sim \mathcal{D}_m$
$u_m(z, t, c) = v_m(z, t, c)$
2 · Query on-policy

Ask the frozen field at a state from the current student rollout.

$z^{\theta}_{0:T} = \mathrm{Rollout}(v_\theta;\, z_T, c)$
$\bar{z}_t = \mathrm{sg}(z^{\theta}_t)$
3 · Match velocities

The selected field and student velocity meet in one local MSE.

Single low-noise query, $K = 1$:
$\mathcal{L} = \mathbb{E}\,\big\| v_\theta(\bar{z}_t, t, c) - v_m(\bar{z}_t, t, c) \big\|_2^2$
4 · Absorb operator fields

Classifier-free guidance is another velocity field to distill.

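Guided velocity field, with $v_{\varnothing}$ the unconditional velocity and $\alpha$ the guidance scale:
$v_\alpha = v_{\varnothing} + \alpha\,(v_{\mathrm{cond}} - v_{\varnothing})$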
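
The guided field can be written as a small function. This is a minimal sketch on plain Python lists; the names `guided_velocity`, `v_uncond`, and `v_cond` are illustrative, and reading the unsubscripted v in the formula as the unconditional velocity follows the standard classifier-free guidance convention rather than an explicit statement on this page.

```python
from typing import List, Sequence


def guided_velocity(
    v_uncond: Sequence[float], v_cond: Sequence[float], alpha: float
) -> List[float]:
    """Classifier-free-guided velocity: v_uncond + alpha * (v_cond - v_uncond)."""
    return [u + alpha * (c - u) for u, c in zip(v_uncond, v_cond)]


# alpha = 0 recovers the unconditional field, alpha = 1 the conditional one,
# and alpha > 1 extrapolates past the conditional velocity.
print(guided_velocity([0.0, 1.0], [1.0, 3.0], alpha=2.0))
```

Used as a distillation target, the output of such a function would play the role of the selected field velocity in the MSE objective.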
A. Route one capability query

$m \sim \pi(m), \quad (x, c) \sim \mathcal{D}_m$

Each sample chooses exactly one frozen capability field. Unless stated otherwise, active capability buckets use a uniform route ratio.
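
A minimal sketch of this routing step, assuming each capability bucket is an in-memory list of (x, c) pairs. The function `route_sample`, the bucket names, and the example prompts are hypothetical and only illustrate sampling m from π and then (x, c) from the matching dataset.

```python
import random
from typing import Dict, List, Optional, Tuple

Sample = Tuple[str, str]  # hypothetical (x, c) pair: input reference and condition


def route_sample(
    datasets: Dict[str, List[Sample]],
    rng: random.Random,
    route_weights: Optional[Dict[str, float]] = None,
) -> Tuple[str, Sample]:
    """Draw one capability m ~ pi(m), then one sample (x, c) ~ D_m."""
    names = sorted(datasets)
    if route_weights is None:
        weights = [1.0] * len(names)  # uniform route ratio over active buckets
    else:
        weights = [route_weights[name] for name in names]
    m = rng.choices(names, weights=weights, k=1)[0]
    return m, rng.choice(datasets[m])


buckets = {
    "t2i": [("none", "a photo of a red bicycle")],
    "edit": [("image_017", "replace the sky with a sunset")],
}
print(route_sample(buckets, random.Random(0)))
```

Each call returns exactly one capability name, so the training target for that sample comes from a single frozen field rather than a mixture.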

B. Query on the student trajectory

$z^{\theta}_{0:T} = \mathrm{Rollout}(v_\theta;\, z_T, c)$

The target field is queried at $\mathrm{sg}(z^{\theta}_t)$, where sg denotes stop-gradient, exposing the teacher to student-visited states without backpropagating through the solver.
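
The rollout and the stop-gradient query can be sketched with a scalar state and an explicit Euler solver. Everything here is an assumption made for illustration: the solver choice, the time grid t = k / T with k = T at the noise end, the sign convention dz/dt = v, and the helper names. In an autodiff framework the copy to `z_bar` would be a detach or stop-gradient call.

```python
from typing import Callable, List, Tuple

Velocity = Callable[[float, float], float]  # v(z, t); the condition c is bound inside


def rollout(v_student: Velocity, z_T: float, num_steps: int) -> List[float]:
    """Euler rollout from the noise state z_T down to z_0.

    Returns [z_0, ..., z_T], so index k matches time t = k / num_steps.
    """
    dt = 1.0 / num_steps
    states = [0.0] * (num_steps + 1)
    states[num_steps] = z_T
    for k in range(num_steps, 0, -1):
        states[k - 1] = states[k] - dt * v_student(states[k], k * dt)
    return states


def query_teacher(
    v_teacher: Velocity, states: List[float], k: int
) -> Tuple[float, float, float]:
    """Query the frozen field at a student-visited state held constant."""
    z_bar = float(states[k])  # plain number: no gradient path back into the solver
    t = k / (len(states) - 1)
    return z_bar, t, v_teacher(z_bar, t)


states = rollout(lambda z, t: z, z_T=1.0, num_steps=4)
print(query_teacher(lambda z, t: 0.5 * z, states, k=1))
```

Because the teacher only ever sees states produced by `v_student`, the supervision follows the student's own state distribution as the student changes.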

C. Use one semantic-side state

$K = 1$, low-noise query

Low-noise states concentrate edit, style, and visual-attribute signals; one query avoids within-rollout correlation.
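
Steps A to C can be put together in a toy training loop. This is a sketch under strong simplifying assumptions, not the authors' implementation: the state is a scalar, the student and the frozen teacher are linear fields (theta * z and teacher_slope * z), the solver is explicit Euler, and "low-noise" is taken to mean rollout index 1 next to the data end. The gradient is written by hand with the queried state treated as a constant.

```python
import random
from typing import List, Tuple


def euler_rollout(theta: float, z_T: float, num_steps: int) -> List[float]:
    """Student rollout for the toy field v_theta(z, t) = theta * z."""
    dt = 1.0 / num_steps
    states = [0.0] * (num_steps + 1)
    states[num_steps] = z_T
    for k in range(num_steps, 0, -1):
        states[k - 1] = states[k] - dt * theta * states[k]
    return states


def train_step(
    theta: float,
    teacher_slope: float,
    z_T: float,
    num_steps: int = 16,
    query_index: int = 1,  # assumed low-noise index, K = 1
    lr: float = 0.05,
) -> Tuple[float, float]:
    """One velocity-MSE update at a single stop-gradient rollout state."""
    z_bar = euler_rollout(theta, z_T, num_steps)[query_index]  # held constant
    residual = theta * z_bar - teacher_slope * z_bar  # v_theta - v_m at z_bar
    loss = residual ** 2
    grad = 2.0 * residual * z_bar  # d(loss)/d(theta) with z_bar fixed
    return theta - lr * grad, loss


rng = random.Random(0)
theta = 0.0
for _ in range(500):
    theta, loss = train_step(theta, teacher_slope=0.5, z_T=rng.gauss(0.0, 1.0))
print(round(theta, 3), loss)
```

In this toy setting the student slope moves toward the teacher slope even though every query state comes from the student's own, continually changing rollout.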

Main results

Multi-capability synthesis under shared sources

The desired behavior is not a midpoint between specialists. A single student should strengthen the target capability while preserving the anchor capability under the same deployment model.

T2I + Edit

5.347 GEditBench / 0.849 GenEval

+8.1% over the best reproduced OPD baseline and +8.5% over the edit source on GEditBench.

Local + Global Edit

5.498 GEditBench / 0.848 GenEval

+16.1% over the best competing composition baseline and +7.9% over the local edit source.

T2I + Edit Composition

| Method | GEditBench Avg ↑ | GenEval Overall ↑ | Takeaway |
| --- | --- | --- | --- |
| Joint training | 4.617 | 0.808 | Mixed supervision dilutes edit capability. |
| Weight merge | 0.344 | 0.836 | Preserves T2I but collapses editing. |
| Off-policy distill. | 4.528 | 0.818 | Teacher states leave a train–inference mismatch. |
| DiffusionOPD | 4.947 | 0.833 | Improves editing but below DanceOPD. |
| Flow-OPD | 4.854 | 0.814 | OPD baseline still suffers capability interference. |
| DanceOPD | 5.347 | 0.849 | Best edit score and best GenEval in this block. |

Local + Global Edit Composition

| Method | GEditBench Avg ↑ | GenEval Overall ↑ | Takeaway |
| --- | --- | --- | --- |
| Joint training | 4.546 | 0.821 | Conflict between preservation and transformation. |
| Weight merge | 4.715 | 0.811 | Static parameter interpolation remains a compromise. |
| Off-policy distill. | 4.736 | 0.798 | Target ability improves less and T2I drops. |
| DiffusionOPD | 4.661 | 0.822 | Below DanceOPD on both metrics. |
| Flow-OPD | 4.679 | 0.827 | Stable but not enough to fuse local/global behaviors. |
| DanceOPD | 5.498 | 0.848 | Best capability synthesis in the harder conflict setting. |
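
Source models

| Source | GEditBench Avg ↑ | GenEval Overall ↑ | Role |
| --- | --- | --- | --- |
| T2I | - | 0.832 | Anchor generation field. |
| Edit | 4.930 | 0.711 | General edit source. |
| Local Edit | 5.095 | 0.793 | Preservation-heavy source. |
| Global Edit | 3.750 | 0.808 | Transformation-heavy source. |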
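
The relative gains quoted for the two compositions can be reproduced from the GEditBench averages listed above. The helper name is illustrative; the scores are the reported values.

```python
def relative_gain_percent(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# T2I + Edit: DanceOPD (5.347) vs. DiffusionOPD (4.947) and the Edit source (4.930).
print(round(relative_gain_percent(5.347, 4.947), 1))  # 8.1
print(round(relative_gain_percent(5.347, 4.930), 1))  # 8.5

# Local + Global Edit: DanceOPD (5.498) vs. off-policy distillation (4.736)
# and the Local Edit source (5.095).
print(round(relative_gain_percent(5.498, 4.736), 1))  # 16.1
print(round(relative_gain_percent(5.498, 5.095), 1))  # 7.9
```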
Figure: Qualitative examples. The composed student supports diverse text-to-image and editing behaviors while retaining strong original generative capability.

Diagnostics and ablations

Why the design choices matter

The latest ablations show that failures are not simply about loss naming or training length. They trace back to query construction: ambiguous targets, off-policy states, and correlated dense trajectory samples.

Hard routing vs. soft fusion

5.751 hard-routed MSE vs. 4.994 soft-teacher MSE. Averaging all teachers erases capability identity.
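
A toy example of why an averaged target can lose capability identity. The two velocity vectors are made-up numbers chosen so that the fields disagree at the queried state; the helper names are illustrative.

```python
from typing import List, Sequence


def soft_target(
    velocities: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted average of several teacher velocities at the same state."""
    dim = len(velocities[0])
    return [sum(w * v[i] for w, v in zip(weights, velocities)) for i in range(dim)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


v_edit = [1.0, 0.0]    # hypothetical edit-field velocity at some state
v_style = [-1.0, 0.0]  # hypothetical style-field velocity at the same state

averaged = soft_target([v_edit, v_style], [0.5, 0.5])
print(averaged)                            # [0.0, 0.0]
print(squared_distance(averaged, v_edit))  # 1.0
print(squared_distance(averaged, v_style)) # 1.0
```

The averaged target matches neither teacher, whereas a hard-routed target is exactly `v_edit` or exactly `v_style` for any given sample.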

Low-noise semantic query

At 2k steps, low-t reaches 5.751, above median-t 4.649 and high-t 4.813.

Single query beats dense queries

K=1 reaches 5.751; weighted K=4 drops to 5.330, and weighted K=16 drops to 5.127.

Moderate rollout discretization is enough

At 2k steps, 8/16/20/28 rollout steps stay in a practical band; 16 steps gives 5.751 / 0.858.

Plain MSE is stable

Velocity MSE reaches 5.751, outperforming timestep weighting, KL weighting, DMD-style, SDS-style, and consistency variants.

CFG absorption composes

The training guidance scale α and the inference guidance scale β compose approximately multiplicatively. The best measured composition reaches 5.833; the over-guided setting αβ = 49 drops to 4.015.
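
A short derivation suggests where the product comes from. It rests on two idealizing assumptions that this page does not state: the student's conditional branch reproduces the guided field $v_\alpha$ exactly, and inference-time guidance with scale $\beta$ still uses the original unconditional velocity, written $v_{\varnothing}$ here (the guided-field formula in the method section writes it as $v$).

```latex
\begin{aligned}
v_\alpha &= v_{\varnothing} + \alpha\,(v_{\mathrm{cond}} - v_{\varnothing}) \\
v_{\mathrm{infer}} &= v_{\varnothing} + \beta\,(v_\alpha - v_{\varnothing}) \\
&= v_{\varnothing} + \beta\,\alpha\,(v_{\mathrm{cond}} - v_{\varnothing}) \\
&= v_{\alpha\beta}
\end{aligned}
```

In practice the student only approximates $v_\alpha$, which is consistent with the composition being described as approximate.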

Qualitative gallery

T2I, edit, local/global fusion, and training progression

The gallery follows the manuscript organization: global edits, local/global edits, additional material and style edits, pure T2I preservation, same-object transformations, and local/global training progression.

Citation

Paper available on arXiv

arXiv:2606.27377 is now available. Code will be added once it is released.

@misc{zhou2026danceopdonpolicygenerativefield,
      title={DanceOPD: On-Policy Generative Field Distillation}, 
      author={Wei Zhou and Xiongwei Zhu and Zelin Xu and Bo Dong and Lixue Gong and Yongyuan Liang and Meng Chu and Leigang Qu and Lingdong Kong and Wei Liu and Tat-Seng Chua},
      year={2026},
      eprint={2606.27377},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2606.27377}, 
}