A workshop on the data side of scale, exploring how better training data leads to more efficient learning, at the European Conference on Computer Vision in Malmö, Sweden, September 2026.
The ECCV 2026 Workshop on Curated Data for Efficient Learning (CDEL) seeks to advance the understanding and development of data-centric techniques that improve the efficiency of training large-scale machine learning models. As model sizes continue to grow and data requirements scale accordingly, this workshop brings attention to the increasingly critical role of data quality, selection, and synthesis in achieving high model performance with reduced computational cost.
Rather than chasing ever-larger datasets and models, CDEL focuses on curating and distilling high-value data, drawing on techniques such as dataset distillation, data pruning, synthetic data generation, and sampling optimization. These approaches aim to reduce redundancy, improve generalization, and make learning possible even when data is scarce.
We welcome submissions on any topic related to curating training data, whether in vision, language, or multimodal learning. The topics below are only suggestions; we will consider submissions on any interesting data-related topic. Submissions are handled through our OpenReview venue.
Suggested topics, not exhaustive
How can we eliminate redundant or low-quality samples from large datasets without losing what matters?
How can we use generative models to create or augment datasets, and when does it pay off?
How can we learn tiny datasets of highly efficient synthetic samples that match the training signal of much larger ones?
How can we train models in areas where existing data is extremely scarce, sensitive, or hard to label?
What problems in data-centric AI can we expect in the near future as model and data scales continue to grow?
The topics above are just suggestions. We welcome submissions on any interesting data-related topic, even if it doesn't fit neatly into the list.
We accept long papers and extended abstracts, with a single submission deadline for both. Archival submissions should follow the ECCV submission guidelines for formatting and length. Long papers may opt into the ECCV workshop proceedings if dual-submission rules permit. Cross-submissions of work currently under review or recently accepted elsewhere are also welcome and may be presented at the workshop.
Want to volunteer as a reviewer? Sign-up information will be posted here closer to the submission deadline.
Exact deadlines may shift to follow ECCV's workshop calendar. Subscribe to announcements for updates.
We're inviting researchers from across academia and industry who specialize in the data side of AI.
Details will be announced as confirmations come in.
Researchers from MIT, Princeton, NUS, and CMU working across dataset distillation, synthetic data, and the data-centric foundations of efficient learning.
Questions? Contact George at gcaz@mit.edu.