Zero Order Data Selection
2025-11-03
This document explains how to use the Zeroth Selector in the DataFlex framework to select training data dynamically and thereby improve the performance of supervised fine-tuning (SFT). The method is an original selection approach: it perturbs the model along the same direction for every sample, estimates a finite difference of the loss, and uses the resulting zeroth-order gradient to compute an effectiveness score for each data sample.
1. Method Overview
The core idea of the Zeroth Selector is as follows. Based on the SGD version of the influence function, it measures the correlation between training samples and validation samples through the similarity of their zeroth-order gradient directions. Note that the perturbation noise uses the same random seed for both training and validation data. Therefore, in the current version, the selection algorithm can directly use the product of the differentials (projected gradients) to represent gradient inner-product similarity.
Advantages:
1. It avoids a limitation of the gradient-based influence function, which can only be computed per sample and cannot be parallelized over data.
2. No backpropagation is required, which saves time and GPU memory.
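The reason a product of projected gradients can stand in for the gradient inner product can be sketched in two lines. The symbols $g = \nabla_\theta f(\theta; z)$ and $g' = \nabla_\theta f(\theta; z')$ are notation introduced here for illustration; the argument assumes a smooth loss $f$, noise $\xi \sim \mathcal{N}(0, I)$, and a small perturbation scale $\epsilon$.

```latex
\begin{aligned}
\frac{f(\theta+\epsilon\xi; z) - f(\theta-\epsilon\xi; z)}{2\epsilon}
  &= \xi^\top g + O(\epsilon^2), \\
\mathbb{E}_{\xi}\!\left[(\xi^\top g)(\xi^\top g')\right]
  &= g^\top\, \mathbb{E}\!\left[\xi\xi^\top\right] g' = g^\top g'.
\end{aligned}
```

Under these assumptions, when both samples share the same $\xi$, the product of the two finite differences is a single-sample (and therefore noisy) estimate of $g^\top g'$, up to an $O(\epsilon^2)$ error.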
Mathematical Definition
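Randomly sample $\xi \sim \mathcal{N}(0, I)$ and perturb the model parameters $\theta$,
$$\mathrm{Inf}_{\mathrm{zeroth}}(z, z') := \frac{f(\theta+\epsilon\xi; z) - f(\theta-\epsilon\xi; z)}{2\epsilon} \cdot \frac{f(\theta+\epsilon\xi; z') - f(\theta-\epsilon\xi; z')}{2\epsilon}$$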
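The definition above can be illustrated with a minimal, standard-library-only sketch. This is not DataFlex's implementation: the toy squared-error `loss`, the helper names (`sample_noise`, `projected_grad`, `zeroth_influence`), and the default `eps` are all assumptions made for illustration. It only shows the two forward evaluations per sample and the shared seeded noise.

```python
import random

def loss(theta, sample):
    """Toy squared-error loss; a stand-in for the model's forward pass (assumption)."""
    x, y = sample
    pred = sum(t * xi for t, xi in zip(theta, x))
    return 0.5 * (pred - y) ** 2

def sample_noise(dim, seed):
    """Draw xi ~ N(0, I) from a seeded generator so the direction is reproducible."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def projected_grad(theta, sample, xi, eps=1e-3):
    """Central finite difference of the loss along direction xi (no backpropagation)."""
    plus = [t + eps * n for t, n in zip(theta, xi)]
    minus = [t - eps * n for t, n in zip(theta, xi)]
    return (loss(plus, sample) - loss(minus, sample)) / (2 * eps)

def zeroth_influence(theta, z_train, z_val, seed=42, eps=1e-3):
    """Product of the two projected gradients, using the same noise for both samples."""
    xi = sample_noise(len(theta), seed)
    return projected_grad(theta, z_train, xi, eps) * projected_grad(theta, z_val, xi, eps)

theta = [0.5, -1.0, 2.0]
z_train = ([1.0, 0.0, 1.0], 1.0)   # (features, target)
z_val = ([0.5, 1.0, 0.0], -0.5)
score = zeroth_influence(theta, z_train, z_val)
print(score)
```

Because each projected gradient is a scalar computed from forward passes only, the per-sample values can be computed once and then multiplied pairwise, which is what makes the score cheap to evaluate over many samples.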
2. Implementation Steps
Step 1: Environment Installation
git clone https://github.com/OpenDCAI/DataFlex.git
cd DataFlex
pip install -e .
pip install llamafactory
Step 2: Zeroth Selector Parameter Configuration
Configuration File Path:
DataFlex/src/dataflex/configs/components.yaml
Example Configuration:
less:
  name: zeroth
  params:
    cache_dir: ../dataflex_saves/zeroth_output
    seed: 42
Parameter Description:
cache_dir: The path where intermediate results (the intermediate difference values) are saved.
seed: Optional. Random seed used to seed the generator that samples the perturbation noise.
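The roles of the two parameters can be sketched as follows. This is a hypothetical illustration, not DataFlex's code: the helper names, the `diffs.json` filename, and the JSON format are assumptions. It shows that a fixed seed reproduces the same perturbation direction, so only the scalar difference values need to be written to the cache directory.

```python
import json
import random
import tempfile
from pathlib import Path

def sample_noise(dim, seed):
    """Seeded Gaussian noise: the same seed always gives the same direction."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

noise_a = sample_noise(4, seed=42)
noise_b = sample_noise(4, seed=42)   # identical to noise_a
noise_c = sample_noise(4, seed=43)   # a different direction

def save_diffs(cache_dir, diffs, filename="diffs.json"):
    """Write per-sample difference values to cache_dir (hypothetical file layout)."""
    path = Path(cache_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(diffs))
    return path

def load_diffs(path):
    """Read the cached difference values back."""
    return json.loads(Path(path).read_text())

# A temporary directory stands in for the configured cache_dir.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_diffs(Path(tmp) / "zeroth_output", [0.12, -0.5, 1.25])
    restored = load_diffs(saved)
```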
Step 3: Dynamic Training Configuration
Configuration File Path:
DataFlex/examples/train_lora/selectors/zeroth.yaml
Example Configuration:
### model
model_name_or_path: Qwen/Qwen2.5-0.5B-Instruct
trust_remote_code: true
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
lora_rank: 16
lora_alpha: 8
#deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
### dataset
dataset: alpaca_en_demo
#dataset: flan_v2,cot_data,dolly_data,oasst1_data
#eval_dataset: mmlu_eval
template: qwen
cutoff_len: 4096
# max_samples: 100000000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 0
# disable_shuffling: true
seed: 42
### output
output_dir: /data1/xlyang/Flex/saves/zeroth/
logging_steps: 10
save_steps: 100
plot_loss: true
save_only_model: false
overwrite_output_dir: true
### swanlab
report_to: none # choices: [none, wandb, tensorboard, swanlab,
### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 16
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: false
fp16: true
ddp_timeout: 180000000
### dynamic_train
train_type: dynamic_select
components_cfg_file: src/dataflex/configs/components.yaml
component_name: zeroth
warmup_step: 4
update_step: 3
update_times: 2
eval_dataset: alpaca_zh_demo
Parameter Description:
model_name_or_path: Model name or path for supervised fine-tuning.
dataset: Training dataset.
output_dir: Output directory of dynamic fine-tuning (LoRA adapter).
warmup_step: Number of warmup steps before the first sample selection.
update_step: Number of steps between each dynamic data selection.
update_times: Total number of dynamic data selection iterations.
eval_dataset: Validation dataset.
Both dataset and eval_dataset can be selected from DataFlex/data/dataset_info.json or local JSON files in ShareGPT/Alpaca format. Note: The training set size significantly affects computation cost. Total steps = warmup_step + update_step × update_times.
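The total-step formula can be checked with a short sketch using the values from the example configuration. The total comes from the formula above; the exact steps at which selection happens (first selection right after warmup, then every `update_step` steps) are an assumed reading of the parameter descriptions, and `selection_schedule` is a hypothetical helper.

```python
def selection_schedule(warmup_step, update_step, update_times):
    """Return (selection_steps, total_steps) under the assumed schedule."""
    # Assumption: the first selection happens once warmup finishes,
    # and each later selection follows after update_step more steps.
    selection_steps = [warmup_step + i * update_step for i in range(update_times)]
    total_steps = warmup_step + update_step * update_times
    return selection_steps, total_steps

steps, total = selection_schedule(warmup_step=4, update_step=3, update_times=2)
print(steps, total)  # [4, 7] 10
```

With `warmup_step: 4`, `update_step: 3`, and `update_times: 2`, training runs for 10 steps in total.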
Step 4: Run Training
FORCE_TORCHRUN=1 DISABLE_VERSION_CHECK=1 dataflex-cli train examples/train_lora/selectors/zeroth.yaml
The training process automatically performs dynamic data selection and model updates.
3. Model Evaluation
It is recommended to use the DataFlow Model QA Evaluation Pipeline for systematic evaluation of the generated model.

