NICE Data Selector
2025-12-17
This document introduces how to use the NICE Selector for dynamic data selection in the DataFlex framework. The method measures gradient similarity between the training set and the validation set: training samples use SFT loss gradients, while validation samples use policy gradients derived from a reward model. Both sets of gradients are randomly projected, and their similarities are used to select the training samples most aligned with the target samples. The method is based on
NICE Data Selection for Instruction Tuning in LLMs with Non-differentiable Evaluation Metric (ICML 2025).
1. Method Overview
The core workflow of NICE Selector:
- Data normalization: Automatically supports formats such as Alpaca and ShareGPT.
- Training-set gradients: Compute gradients for each training sample and project them using TRAK.
- Reward-set gradients: Perform Monte Carlo sampling on validation data, generate responses, score them using a reward model (local vLLM or remote API), compute policy gradients toward the reward direction, and project them.
- Similarity-based selection: Align and normalize projected gradients, rank training samples by their average similarity to validation samples, and select the top-k samples for the current training round (see the sketch after this list).
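The selection step can be summarized with a minimal numpy sketch. This is illustrative only; the array names and shapes below are assumptions, not DataFlex internals:

```python
# Minimal sketch of NICE-style similarity-based selection (not DataFlex code).
# `train_grads` / `val_grads` stand in for per-sample SFT-loss gradients and
# reward-based policy gradients that have already been flattened.
import numpy as np

rng = np.random.default_rng(123)
n_train, n_val, grad_dim, proj_dim, top_k = 1000, 50, 4096, 512, 200

train_grads = rng.normal(size=(n_train, grad_dim))
val_grads = rng.normal(size=(n_val, grad_dim))

# TRAK-style random projection to make similarity computation affordable.
proj = rng.normal(size=(grad_dim, proj_dim)) / np.sqrt(proj_dim)
train_p = train_grads @ proj
val_p = val_grads @ proj

# L2-normalize, then score each training sample by its mean cosine
# similarity to the validation (reward) gradients and keep the top-k.
train_p /= np.linalg.norm(train_p, axis=1, keepdims=True)
val_p /= np.linalg.norm(val_p, axis=1, keepdims=True)
scores = (train_p @ val_p.T).mean(axis=1)
selected = np.argsort(scores)[::-1][:top_k]
```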
2. Implementation Steps
Step 1: Environment Setup
git clone https://github.com/OpenDCAI/DataFlex.git
cd DataFlex
pip install -e .
pip install llamafactory
Step 2: NICE Selector Configuration
Configuration file path:
DataFlex/src/dataflex/configs/components.yaml
Example configuration:
nice:
  name: nice
  params:
    cache_dir: ../dataflex_saves/nice_output
    gradient_type: adam
    proj_dim: 4096
    seed: 123
    save_interval: 16
    reward_model_backend: local_vllm # choices: [local_vllm, api]
    reward_backend_params:
      local_vllm:
        hf_model_name_or_path: meta-llama/Llama-3.1-8B
        vllm_tensor_parallel_size: 1
        vllm_temperature: 0.0
        vllm_top_p: 0.9
        vllm_max_tokens: 512
        vllm_top_k: 40
        vllm_seed: 42
        vllm_max_model_len: null
        vllm_gpu_memory_utilization: 0.9
      api:
        api_url: https://api.openai.com/v1/chat/completions
        api_key: DF_API_KEY
        model_name: gpt-4o
        temperature: 0.0
    mc_samples: 4
    max_new_tokens: 512
    generation_temperature: 0.7
    max_prompt_length: 4096
Parameter description:
- `cache_dir`: Path to cache gradient projections and selection results; supports resuming from checkpoints.
- `gradient_type`: `adam` (with first- and second-moment normalization) or `sgd`.
- `proj_dim`: Random projection dimension, controlling the cost/accuracy trade-off of similarity computation.
- `reward_model_backend`: Reward model backend; `local_vllm` uses local vLLM inference, `api` uses an HTTP service.
- `reward_backend_params`: Backend-specific parameters.
- `mc_samples`: Number of Monte Carlo generations per reward sample, used to stabilize policy gradient estimation (see the sketch after this list).
- `max_new_tokens` / `generation_temperature` / `max_prompt_length`: Generation length and sampling strategy for the policy model.
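The role of `mc_samples` can be illustrated with a hedged, REINFORCE-style sketch. The callables below (`generate`, `reward_fn`, `logprob_grad`) are placeholders, not the actual vLLM or API backends: several responses are drawn per validation prompt, scored by the reward model, and the centered rewards weight the log-probability gradients, which reduces the variance of the policy-gradient estimate.

```python
# Hedged sketch of Monte Carlo policy-gradient estimation (not DataFlex code).
# `generate`, `reward_fn`, and `logprob_grad` are placeholders for the policy
# model, the reward backend (local vLLM or API), and a per-response gradient
# of the log-probability with respect to the model parameters.
import numpy as np

def reward_gradient(prompt, generate, reward_fn, logprob_grad, mc_samples=4):
    responses = [generate(prompt) for _ in range(mc_samples)]
    rewards = np.array([reward_fn(prompt, r) for r in responses])
    baseline = rewards.mean()  # simple variance-reduction baseline
    grads = np.stack([logprob_grad(prompt, r) for r in responses])
    # REINFORCE-style estimate: mean of (reward - baseline) * grad(log p(response))
    return ((rewards - baseline)[:, None] * grads).mean(axis=0)
```

With `mc_samples: 1` the estimate collapses to a single noisy sample; larger values average out generation randomness at the cost of more reward-model calls.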
Step 3: Dynamic Training Configuration
Configuration file path:
DataFlex/examples/train_lora/selectors/nice.yaml
Example configuration:
### model
model_name_or_path: meta-llama/Llama-3.1-8B
trust_remote_code: true
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
lora_rank: 16
lora_alpha: 8
### dataset
dataset: alpaca_en_demo
template: llama3
cutoff_len: 4096
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 0
seed: 42
### output
output_dir: ../dataflex_saves/nice_output
logging_steps: 10
save_steps: 100
plot_loss: true
save_only_model: false
overwrite_output_dir: true
### swanlab
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
# use_swanlab: true
# swanlab_project: dynamic_nice_sft
# swanlab_run_name: name
# swanlab_workspace: your_workspace
# swanlab_api_key: xxxxxxx
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
### dynamic_train
train_type: dynamic_select
components_cfg_file: src/dataflex/configs/components.yaml
component_name: nice
warmup_step: 10
update_step: 10
update_times: 2
eval_dataset: alpaca_zh_demo
per_device_eval_batch_size: 1
metric_for_best_model: eval_loss
greater_is_better: false
load_best_model_at_end: true
eval_strategy: steps # choices: [no, steps, epoch]
eval_steps: 10
early_stopping_steps: 3
early_stopping_min_delta: 0.01
Parameter description:
- `component_name`: Must match the `nice` component in `components.yaml`, determining the reward backend and projection dimensions.
- `warmup_step` / `update_step` / `update_times`: Control the dynamic selection schedule; total steps = `warmup_step + update_step × update_times` (see the worked example after this list).
- `eval_dataset`: Validation set (Alpaca/ShareGPT style); the reward model is used for scoring during generation.
- `output_dir`: Path to save LoRA adapters and caches.
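For the values in this example (`warmup_step: 10`, `update_step: 10`, `update_times: 2`) the schedule works out as follows; the assumption that a selection round fires at the end of warmup and then after each update window is illustrative:

```python
# Worked example of the dynamic selection schedule for this config.
warmup_step, update_step, update_times = 10, 10, 2
total_steps = warmup_step + update_step * update_times   # 10 + 10 * 2 = 30
# Assumed trigger points: end of warmup, then after every update window.
selection_steps = [warmup_step + i * update_step for i in range(update_times)]
print(total_steps, selection_steps)                       # 30 [10, 20]
```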
Step 4: Run Training
FORCE_TORCHRUN=1 DISABLE_VERSION_CHECK=1 dataflex-cli train examples/train_lora/selectors/nice.yaml
Step 5: Model Merge and Export
Configuration file path:
DataFlex/examples/merge_lora/llama3_lora_sft.yaml
Example configuration:
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: ../dataflex_saves/nice_output
template: llama3
trust_remote_code: true
export_dir: ../dataflex_saves/Llama-3.1-8B_nice_lora_sft
export_size: 5
export_device: cpu # choices: [cpu, auto]
export_legacy_format: false
Parameter description:
- `adapter_name_or_path`: Path to the LoRA adapters obtained from NICE dynamic selection training.
- `export_dir`: Output directory for the merged full model.
Run the merge and export command:
llamafactory-cli export llama3_lora_sft.yaml
The merged model will be saved to:
../dataflex_saves/Llama-3.1-8B_nice_lora_sft
3. Model Evaluation
It is recommended to use the DataFlow Model QA Evaluation Pipeline to systematically evaluate the resulting model, and to inspect the scoring logs in cache_dir to analyze the reward model's sensitivity to different samples.
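Before running the full evaluation pipeline, a quick smoke test of the merged export can be useful. A minimal sketch with Hugging Face transformers, assuming the export_dir used above:

```python
# Quick smoke test of the merged export (path taken from export_dir above).
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "../dataflex_saves/Llama-3.1-8B_nice_lora_sft"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, device_map="auto")

inputs = tokenizer("Give three tips for staying healthy.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```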

