Scalable dataset acquisition for data-driven lensless imaging

Department of Electrical Engineering and Computer Sciences
University of California, Berkeley

Abstract

Data-driven developments in lensless imaging, such as machine learning-based reconstruction algorithms, require large datasets. In this work, we introduce a data acquisition pipeline that can capture from multiple lensless imaging systems in parallel, paired with computational ground truth registration. Our system consists of hardware for parallel capture and a software framework for hardware control and computational processing. We provide an open-access 25,000-image dataset with two lensless imagers, a reproducible hardware setup, and open-source camera synchronization code. Experimental datasets from our system can enable data-driven developments in lensless imaging, such as machine learning-based reconstruction algorithms and end-to-end system design.

Dataset

We provide an open-source, 25,000-image dataset with two lensless imagers—the DiffuserCam and the Random Multi-focal Lenslet (RML) imager—and paired ground truth images. We used MIRFLICKR-25000 as the ground truth dataset. Our dataset consists of:

  • 25,000 DiffuserCam images
  • 25,000 RML images
  • 25,000 ground truth images
  • Experimentally captured point spread functions (PSFs)

The dataset includes a calibration PSF for each imager at an exposure that produced the best results during our tests. We also include a folder of PSFs for both imagers at varying exposures for further experimentation.
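A triplet-structured dataset like this is convenient to load by index. The sketch below is illustrative only: the directory names, file naming scheme, and `.npy` storage format are our own assumptions, not the released dataset's actual layout, so adjust them to match the files you download.

```python
from pathlib import Path
import numpy as np

# Hypothetical layout -- one subdirectory per modality, files named by
# zero-padded index. Adjust to match the released dataset's structure.
SUBDIRS = ("diffusercam", "rml", "ground_truth")

def load_triplet(root, idx):
    """Load the DiffuserCam, RML, and ground-truth images for one index.

    Assumes each image is stored as a .npy array; swap in an image
    reader if the dataset ships PNG/TIFF files instead.
    """
    root = Path(root)
    return tuple(np.load(root / sub / f"{idx:05d}.npy") for sub in SUBDIRS)
```

Keeping the three modalities under a shared index makes it easy to build paired (lensless, ground truth) training examples for learned reconstruction.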

If you use our dataset in your research, we kindly ask that you cite our publication.

Access our dataset

Hardware setup

Our hardware system captures measurements from multiple lensless imaging systems and a ground truth lensed camera in parallel. We provide a step-by-step tutorial for building our system on our Hardware Setup page and encourage other researchers to build off of our setup.


Software package

We developed a software package to automate image display and camera capture. For near-real-time feedback during calibration and alignment, we use an image reconstruction script that reconstructs images using 200 iterations of FISTA. Additionally, we correct for lens distortion in the lensed ground truth camera and achieve pixel-wise alignment between the lensless and lensed images via a learned homography. All code is available on GitHub, with a detailed usage and calibration guide on our Software page.
Software pipeline
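The calibration-feedback reconstructions above use 200 iterations of FISTA. The snippet below is a minimal sketch of FISTA for lensless deconvolution under a shift-invariant (single-PSF circular convolution) forward model with a sparsity prior; it is not the authors' implementation, and the function and parameter names are our own.

```python
import numpy as np
from numpy.fft import fft2, ifft2

def fista_reconstruct(meas, psf, n_iters=200, lam=1e-3):
    """Sketch of FISTA for lensless deconvolution.

    Forward model (assumed): measurement = PSF (*) scene, with circular
    convolution. Solves min_x 0.5*||A x - b||^2 + lam*||x||_1, x >= 0.
    """
    H = fft2(np.fft.ifftshift(psf))      # transfer function of the PSF
    Hconj = np.conj(H)
    L = np.max(np.abs(H)) ** 2           # Lipschitz constant of the gradient

    def A(x):                            # forward operator (convolve with PSF)
        return np.real(ifft2(H * fft2(x)))

    def A_adj(y):                        # adjoint (correlate with PSF)
        return np.real(ifft2(Hconj * fft2(y)))

    x = np.zeros_like(meas)
    z = x.copy()
    t = 1.0
    for _ in range(n_iters):
        grad = A_adj(A(z) - meas)        # gradient of the data term
        x_new = z - grad / L
        # Soft-threshold (sparsity prior), then clamp to non-negative values.
        x_new = np.maximum(np.abs(x_new) - lam / L, 0) * np.sign(x_new)
        x_new = np.maximum(x_new, 0)
        # FISTA momentum update.
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        z = x_new + (t - 1) / t_new * (x_new - x)
        x, t = x_new, t_new
    return x
```

The momentum term is what distinguishes FISTA from plain proximal gradient descent, giving the faster convergence that makes a few hundred iterations practical for live calibration feedback.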

Collaborate with us!

We are excited to collaborate with researchers broadly in computational imaging and machine learning. If you are interested in using our dataset, have questions about our work, or would like to collaborate, please reach out to clarahung@berkeley.edu.

Related Work

Lensless Imagers

Other Lensless Datasets

We link other lensless datasets that are open-access. We welcome suggestions for additional datasets to include, as well as comparisons between our dataset and others.

BibTeX

@misc{hung2025scalabledatasetacquisitiondatadriven,
      title={Scalable dataset acquisition for data-driven lensless imaging}, 
      author={Clara S. Hung and Leyla A. Kabuli and Vasilisa Ponomarenko and Laura Waller},
      year={2025},
      eprint={2501.13334},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2501.13334}, 
}