Scalable dataset acquisition for data-driven lensless imaging

Department of Electrical Engineering and Computer Sciences
University of California, Berkeley

Abstract

Data-driven developments in lensless imaging, such as machine learning-based reconstruction algorithms, require large datasets. In this work, we introduce a data acquisition pipeline that can capture from multiple lensless imaging systems in parallel, paired with computational ground truth registration. Our system consists of hardware for parallel capture and a software framework for hardware control and computational processing. We provide an open-access 25,000-image dataset with two lensless imagers, a reproducible hardware setup, and open-source camera synchronization code. Experimental datasets from our system can enable data-driven developments in lensless imaging, such as machine learning-based reconstruction algorithms and end-to-end system design.

Dataset

We provide an open-source, 25,000-image dataset with two lensless imagers—the DiffuserCam and the Random Multi-focal Lenslet (RML) imager—and paired ground truth images. We used MIRFLICKR-25000 as the ground truth dataset. Our dataset consists of:

  • 25,000 DiffuserCam images
  • 25,000 RML images
  • 25,000 ground truth images
  • Experimentally captured point spread functions (PSFs)

The dataset includes a calibration PSF for each imager at an exposure that produced the best results during our tests. We also include a folder of PSFs for both imagers at varying exposures for further experimentation.
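A triplet-structured dataset like this is convenient to load by index. The sketch below is illustrative only: the directory names, file naming scheme, and `.npy` storage format are our own assumptions, not the released dataset's actual layout, so adjust them to match the files you download.

```python
from pathlib import Path
import numpy as np

# Hypothetical layout -- one subdirectory per modality, files named by
# zero-padded index. Adjust to match the released dataset's structure.
SUBDIRS = ("diffusercam", "rml", "ground_truth")

def load_triplet(root, idx):
    """Load the DiffuserCam, RML, and ground-truth images for one index.

    Assumes each image is stored as a .npy array; swap in an image
    reader if the dataset ships PNG/TIFF files instead.
    """
    root = Path(root)
    return tuple(np.load(root / sub / f"{idx:05d}.npy") for sub in SUBDIRS)
```

Keeping the three modalities under a shared index makes it easy to build paired (lensless, ground truth) training examples for learned reconstruction.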

If you use our dataset in your research, we kindly ask that you cite our publication.

Access our dataset

Hardware setup

Our hardware system captures measurements from multiple lensless imaging systems and a ground truth lensed camera in parallel. We provide a step-by-step tutorial for building our system on our Hardware Setup page and encourage other researchers to build off of our setup.


Software package

We developed a software package to automate image display and camera capture. For near-real-time feedback during calibration and alignment, we use an image reconstruction script that reconstructs images using 200 iterations of FISTA. Additionally, we correct for lens distortion in the lensed ground truth camera and achieve pixel-wise alignment between the lensless and lensed images via a learned homography. All code is available on GitHub, with a detailed usage and calibration guide on our Software page.
Software pipeline
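The calibration-feedback reconstructions above use 200 iterations of FISTA. The snippet below is a minimal sketch of FISTA for lensless deconvolution under a shift-invariant (single-PSF circular convolution) forward model with a sparsity prior; it is not the authors' implementation, and the function and parameter names are our own.

```python
import numpy as np
from numpy.fft import fft2, ifft2

def fista_reconstruct(meas, psf, n_iters=200, lam=1e-3):
    """Sketch of FISTA for lensless deconvolution.

    Forward model (assumed): measurement = PSF (*) scene, with circular
    convolution. Solves min_x 0.5*||A x - b||^2 + lam*||x||_1, x >= 0.
    """
    H = fft2(np.fft.ifftshift(psf))      # transfer function of the PSF
    Hconj = np.conj(H)
    L = np.max(np.abs(H)) ** 2           # Lipschitz constant of the gradient

    def A(x):                            # forward operator (convolve with PSF)
        return np.real(ifft2(H * fft2(x)))

    def A_adj(y):                        # adjoint (correlate with PSF)
        return np.real(ifft2(Hconj * fft2(y)))

    x = np.zeros_like(meas)
    z = x.copy()
    t = 1.0
    for _ in range(n_iters):
        grad = A_adj(A(z) - meas)        # gradient of the data term
        x_new = z - grad / L
        # Soft-threshold (sparsity prior), then clamp to non-negative values.
        x_new = np.maximum(np.abs(x_new) - lam / L, 0) * np.sign(x_new)
        x_new = np.maximum(x_new, 0)
        # FISTA momentum update.
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        z = x_new + (t - 1) / t_new * (x_new - x)
        x, t = x_new, t_new
    return x
```

The momentum term is what distinguishes FISTA from plain proximal gradient descent, giving the faster convergence that makes a few hundred iterations practical for live calibration feedback.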

Collaborate with us!

We are excited to collaborate with researchers broadly in computational imaging and machine learning. If you are interested in using our dataset, have questions about our work, or would like to collaborate, please reach out to clarahung@berkeley.edu.

Related Work

Lensless Imagers

Other Lensless Datasets

We link other lensless datasets that are open-access. We welcome suggestions for additional datasets to include, as well as comparisons between our dataset and others.

BibTeX

@misc{hung2025scalabledatasetacquisitiondatadriven,
      title={Scalable dataset acquisition for data-driven lensless imaging}, 
      author={Clara S. Hung and Leyla A. Kabuli and Vasilisa Ponomarenko and Laura Waller},
      year={2025},
      eprint={2501.13334},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2501.13334}, 
}