$\newcommand\measurementvec{\mathbf{b}} \newcommand\measurementmtx{\mathbf{A}} \newcommand\imagevec{\mathbf{v}} \newcommand\psf{\mathbf{h}} \newcommand{\crop}{\mathbf{C}} \newcommand\full{\mathbf{A}} \newcommand{\ftpsf}{\mathbf{H}}$ Gradient descent is an iterative algorithm that minimizes a convex function by repeatedly stepping "downhill" along the negative gradient. To solve the minimization problem \begin{equation*} \operatorname{minimize} g(\mathbf{x}), \end{equation*} we compute the gradient of $g$ with respect to $\mathbf{x}$, $\nabla_\mathbf{x} g$, and use the fact that the gradient always points in the direction of steepest ascent. To minimize $g$, we step in the opposite direction: $$\begin{align*} \mathbf{x}_0 &= \text{ initial guess} \\ \mathbf{x}_{k+1} &\leftarrow \mathbf{x}_k - \alpha_k \nabla g(\mathbf{x}_k), \end{align*}$$ where $\alpha_k$ is a step size that determines how far we move in the descent direction at each iteration.
Applied to our problem: $$\begin{align*} g(\imagevec) &= \frac{1}{2} \|\full\imagevec- \measurementvec \|_2^2 \\ \nabla_\imagevec g(\imagevec) &= \full^H (\full\imagevec-\measurementvec), \end{align*}$$ where $\full^H$ is the adjoint of $\full$, $\measurementvec$ is the sensor measurement and $\imagevec$ is the image of the scene.
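As a quick illustration (a toy sketch, not the imaging model; the matrix $\mathbf{A}$ and vector $\mathbf{b}$ below are random placeholders), here is this update applied to a small least-squares problem:

import numpy as np

# Toy sketch of gradient descent on g(v) = 0.5*||A v - b||^2.
# A_toy and b_toy are random placeholders, not the imaging data used later in this notebook.
rng = np.random.default_rng(0)
A_toy = rng.standard_normal((20, 5))
b_toy = rng.standard_normal(20)

v_toy = np.zeros(5)                                     # initial guess v_0
alpha = 1.8 / np.linalg.norm(A_toy.T @ A_toy, 2)        # step size below 2/||A^H A||_2
for _ in range(500):
    v_toy = v_toy - alpha * A_toy.T @ (A_toy @ v_toy - b_toy)   # v_{k+1} = v_k - alpha * grad g(v_k)

# Compare against the closed-form least-squares solution
v_star = np.linalg.lstsq(A_toy, b_toy, rcond=None)[0]
print(np.max(np.abs(v_toy - v_star)))                   # should be near zero

Below, the same update is used for the imaging problem, with $\full$ applied implicitly through FFT-based convolution instead of an explicit matrix.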
We use more efficient variants of this algorithm, like Nesterov Momentum and FISTA, both of which are shown below.
import numpy as np
import numpy.fft as fft
import matplotlib.pyplot as plt
from IPython import display
from PIL import Image
%matplotlib inline
The code takes in two grayscale images: a point spread function (PSF) $\texttt{psfname}$ and a sensor measurement $\texttt{imgname}$. The images can be downsampled by a factor $f$, which must be of the form $1/2^k$ for some non-negative integer $k$ (typically between 1/2 and 1/8).
psfname = "./psf_sample.tif"
imgname = "./rawdata_hand_sample.tif"
# Downsampling factor (used to shrink images)
f = 1/8
# Number of iterations
iters = 100
def loaddata(show_im=True):
    psf = Image.open(psfname)
    psf = np.array(psf, dtype='float32')
    data = Image.open(imgname)
    data = np.array(data, dtype='float32')

    """In the picamera, there is a non-trivial background
    (even in the dark) that must be subtracted"""
    bg = np.mean(psf[5:15, 5:15])
    psf -= bg
    data -= bg

    """Resize to a more manageable size to do reconstruction on.
    Because resizing is downsampling, it is subject to aliasing
    (artifacts produced by the periodic nature of sampling). Demosaicing is an attempt
    to account for/reduce the aliasing caused. In this application, we do the simplest
    possible demosaicing algorithm: smoothing/blurring the image with a box filter"""
    def resize(img, factor):
        num = int(-np.log2(factor))
        for i in range(num):
            img = 0.25*(img[::2, ::2, ...] + img[1::2, ::2, ...]
                        + img[::2, 1::2, ...] + img[1::2, 1::2, ...])
        return img

    psf = resize(psf, f)
    data = resize(data, f)

    """Now we normalize the images so they have the same total power. Technically not a
    necessary step, but the optimal hyperparameters are a function of the total power in
    the PSF (among other things), so it makes sense to standardize it"""
    psf /= np.linalg.norm(psf.ravel())
    data /= np.linalg.norm(data.ravel())

    if show_im:
        fig1 = plt.figure()
        plt.imshow(psf, cmap='gray')
        plt.title('PSF')
        display.display(fig1)
        fig2 = plt.figure()
        plt.imshow(data, cmap='gray')
        plt.title('Raw data')
        display.display(fig2)
    return psf, data
We want to calculate convolutions efficiently. To do this, we use the fast Fourier transform $\texttt{fft2}$, which computes the discrete Fourier transform (DFT). The convolution theorem for DFTs only holds for circular convolutions, but we can still recover a linear convolution by first zero-padding the input images and then cropping the output of the inverse DFT: \begin{equation} h*x = \texttt{crop}\left[\ \texttt{DFT}^{-1}\left\{\ \texttt{DFT} [\ \texttt{pad}[h]\ ]\cdot\texttt{DFT}[\ \texttt{pad}[x]\ ]\ \right\} \ \right] \end{equation}
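As a quick sanity check of this identity (a minimal sketch on small random arrays; it uses $\texttt{scipy.signal.convolve2d}$ as the reference linear convolution, which is not otherwise needed in this notebook):

import numpy as np
import numpy.fft as fft
from scipy.signal import convolve2d   # reference linear convolution (assumed available)

rng = np.random.default_rng(0)
h_small = rng.random((4, 4))
x_small = rng.random((4, 4))

# A linear convolution of two 4x4 arrays has size (4+4-1, 4+4-1) = (7, 7);
# fft2's shape argument zero-pads each input up to that size.
full_shape = (7, 7)
circ = np.real(fft.ifft2(fft.fft2(h_small, full_shape) * fft.fft2(x_small, full_shape)))
lin = convolve2d(h_small, x_small)    # direct linear convolution ('full' mode)

print(np.allclose(circ, lin))         # True: with enough padding, circular == linear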
Recovering the linear convolution correctly requires roughly doubling the dimensions of our images (at least $2n-1$ samples along each axis). To take full advantage of the speed of the $\texttt{fft2}$ algorithm, we actually pad up to $\texttt{full_size}$, the nearest power of two larger than that size.
We have chosen $\texttt{full_size}$ in such a way that it provides enough padding to make circular and linear convolutions look the same after being cropped back down to $\texttt{sensor_size}$. That way, the "sensor crop" due to the sensor's finite size and the "fft crop" above are the same, and we just need one crop function.
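For concreteness, here is a small sketch of how $\texttt{full_size}$ would be chosen; the $\texttt{sensor_size}$ used here is hypothetical, since the real size depends on the sensor and the downsampling factor $f$:

import numpy as np

def next_pow2(n):
    # same rule as the nextPow2 helper defined later in this notebook
    return int(2 ** np.ceil(np.log2(n)))

sensor_size = (270, 480)    # hypothetical downsampled sensor shape
full_size = [next_pow2(2 * n - 1) for n in sensor_size]
print(full_size)            # [1024, 1024]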
Along with initialization, we compute $\texttt{H} = \texttt{fft2}(\texttt{hpad})$ and $\texttt{Hadj} = \texttt{H}^*$, which are constant matrices that will be needed to calculate the action of $\measurementmtx$ and $\measurementmtx^H$ at every iteration.
Lastly, we must take into account one more practical difference. In imaging, we often treat the center of the image as the origin of the coordinate system. This is theoretically convenient, but FFT algorithms assume the origin of the image is the top-left pixel. The magnitude of the FFT doesn't change because of this distinction, but the phase does, since it is sensitive to shifts in real space. An example with the simplest function, a delta function, is shown below. To correct for this, we use $\texttt{ifftshift}$ to move the origin of an image to the top-left corner and $\texttt{fftshift}$ to move the origin from the top-left corner back to the center.
def no_shift():
    delta = np.zeros((5, 5))
    delta[2][2] = 1
    fft_mag = np.abs(fft.fft2(delta))
    fft_arg = np.angle(fft.fft2(delta))
    fig, ax = plt.subplots(nrows=1, ncols=3)
    fig.tight_layout()
    ax[0].imshow(delta, cmap='gray')
    ax[0].set_title('Delta function in \n real space')
    ax[1].imshow(fft_mag, vmin=-3, vmax=3, cmap='gray')
    ax[1].set_title('Magnitude of FT of \n a delta function')
    ax[2].imshow(fft_arg, vmin=-3, vmax=3, cmap='gray')
    ax[2].set_title('Phase of FT of \n delta function')
no_shift()
def shift():
    delta = np.zeros((5, 5))
    delta[2][2] = 1
    delta_shifted = fft.ifftshift(delta)
    fft_mag = np.abs(fft.fft2(delta_shifted))
    fft_arg = np.angle(fft.fft2(delta_shifted))
    fig2, ax2 = plt.subplots(nrows=1, ncols=3)
    fig2.tight_layout()
    ax2[0].imshow(delta_shifted, cmap='gray')
    ax2[0].set_title('Delta function shifted in \n real space')
    ax2[1].imshow(fft_mag, vmin=-3, vmax=3, cmap='gray')
    ax2[1].set_title('Magnitude of FT of a \n shifted delta function')
    ax2[2].imshow(fft_arg, vmin=-3, vmax=3, cmap='gray')
    ax2[2].set_title('Phase of FT of a \n shifted delta function')
shift()
For this notebook and the ADMM notebook, we adopt the following convention so we don't have to worry about this issue again: arrays in real space are stored with the origin at the center, so we apply $\texttt{ifftshift}$ before every $\texttt{fft2}$ and $\texttt{fftshift}$ after every $\texttt{ifft2}$:
def initMatrices(h):
    # Initial guess: constant image at the midpoint of the PSF's intensity range
    pixel_start = (np.max(h) + np.min(h))/2
    x = np.ones(h.shape)*pixel_start

    # Pad each dimension to the next power of two large enough to hold a linear convolution
    init_shape = h.shape
    padded_shape = [nextPow2(2*n - 1) for n in init_shape]
    starti = (padded_shape[0] - init_shape[0])//2
    endi = starti + init_shape[0]
    startj = (padded_shape[1]//2) - (init_shape[1]//2)
    endj = startj + init_shape[1]
    hpad = np.zeros(padded_shape)
    hpad[starti:endi, startj:endj] = h

    # Precompute the DFT of the padded PSF and its conjugate (used for A and A^H)
    H = fft.fft2(fft.ifftshift(hpad), norm="ortho")
    Hadj = np.conj(H)

    def crop(X):
        # Extract the centered, sensor-sized region from a padded array
        return X[starti:endi, startj:endj]

    def pad(v):
        # Zero-pad a sensor-sized array back up to the padded shape
        vpad = np.zeros(padded_shape).astype(np.complex64)
        vpad[starti:endi, startj:endj] = v
        return vpad

    utils = [crop, pad]
    v = np.real(pad(x))    # padded initial estimate of the scene

    return H, Hadj, v, utils

def nextPow2(n):
    return int(2**np.ceil(np.log2(n)))
The most important step in gradient descent is calculating the gradient $$ \nabla_\imagevec \ g(\imagevec) = \full^H (\full\imagevec-\measurementvec).$$ We do this in two steps: first apply the forward model $\full$ to the current estimate ($\texttt{calcA}$), then apply the adjoint $\full^H$ to the residual ($\texttt{calcAHerm}$).
Here, $\texttt{vk}$ is the current padded estimate of the scene and $\texttt{b}$ is the sensor measurement.
def grad(Hadj, H, vk, b, crop, pad):
    Av = calcA(H, vk, crop)                     # forward model applied to current estimate
    diff = Av - b                               # residual against the measurement
    return np.real(calcAHerm(Hadj, diff, pad))  # adjoint applied to the residual
We write $\full$ as: $$ \full\imagevec \iff \mathrm{crop} \left[ \mathcal{F}^{-1} \left\{\mathcal{F}(h) \cdot \mathcal{F}(v)\right\} \right]$$ In code, this becomes \begin{align*} \texttt{calcA}(\texttt{vk}) & = \texttt{crop}\ (\texttt{ifft}\ (\texttt{fft}(\texttt{hpad}) \cdot \texttt{fft}(\texttt{vk})\ )\ )\\ & = \texttt{crop}\ (\texttt{ifft}\ (\texttt{H} \cdot \texttt{Vk})) \end{align*} where $\cdot$ represents point-wise multiplication
def calcA(H, vk, crop):
    # Circular convolution with the PSF in the frequency domain, cropped to the sensor size
    Vk = fft.fft2(fft.ifftshift(vk))
    return crop(fft.fftshift(fft.ifft2(H*Vk)))
We first pad $\texttt{diff}$, giving us $\texttt{xpad}$, then take its 2D Fourier transform, $\texttt{X} = \mathcal{F}(\texttt{xpad})$. The action of the adjoint of $\full$ is
$$ \full^H \mathbf{x} \iff \mathcal{F}^{-1} \left\{ \mathcal{F}(\psf)^* \cdot \mathcal{F}( \operatorname{pad}\left[x\right]) \right\}$$ In code, this becomes \begin{align*} \texttt{calcAHerm}(\texttt{diff}) &= \texttt{ifft}\ (\ (\texttt{fft}(\texttt{hpad}))^* \cdot \texttt{fft}\ (\texttt{pad}(\texttt{diff}))\ ) \\ & = \texttt{ifft}\ (\texttt{Hadj} \cdot \texttt{X}) \end{align*}
def calcAHerm(Hadj, diff, pad):
    # Adjoint: zero-pad the residual, then filter with the conjugate of the PSF's DFT
    xpad = pad(diff)
    X = fft.fft2(fft.ifftshift(xpad))
    return fft.fftshift(fft.ifft2(Hadj*X))
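As an optional sanity check (a sketch that assumes the PSF has already been loaded, e.g. via $\texttt{loaddata()}$), a correctly implemented adjoint satisfies $\langle \full\imagevec, \mathbf{x}\rangle = \langle \imagevec, \full^H\mathbf{x}\rangle$ for arbitrary $\imagevec$ and $\mathbf{x}$:

def adjoint_test(h):
    # Check <A v, x> == <v, A^H x> for a random padded scene v and sensor-sized x
    H, Hadj, v0, (crop, pad) = initMatrices(h)
    rng = np.random.default_rng(0)
    v_test = rng.random(v0.shape)
    x_test = rng.random(h.shape)
    lhs = np.sum(np.real(calcA(H, v_test, crop)) * x_test)
    rhs = np.sum(v_test * np.real(calcAHerm(Hadj, x_test, pad)))
    print(lhs, rhs)   # the two inner products should agree to numerical precision

# adjoint_test(psf)   # uncomment after running psf, data = loaddata()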
This is the main function, which calculates the gradients and updates our estimate of the scene: \begin{align*} &\imagevec_0 = \text{ anything} \\ &\text{for } k = 0 \text{ to } \texttt{iters}:\\ &\quad \quad \imagevec_{k+1} \leftarrow \texttt{gradient_update}(\imagevec_k) \\ \end{align*}
There are different ways of doing the gradient update. The three we will show are regular GD, Nesterov momentum update, and FISTA.
To guarantee convergence, we set the step size to satisfy \begin{align*} \alpha_k &< \frac{2}{\|\measurementmtx^H \measurementmtx\|_2} \approx \frac{2}{\lambda_{max}(\mathbf{M}^H\mathbf{M})}, \end{align*} where $\mathbf{M}$ is the (uncropped) padded circular-convolution operator, so that $\measurementmtx = \crop\,\mathbf{M}$.

To calculate this, we use the property that $\mathbf{M}$ is diagonalized by the (unitary) Fourier transform $\mathbf{F}$: $$\begin{align*} \mathbf{M}^H\mathbf{M} &= \left(\mathbf{F}^{-1} \mathrm{diag}(\mathbf{Fh}) \ \mathbf{F}\right)^H \ \mathbf{F}^{-1} \mathrm{diag}(\mathbf{Fh}) \ \mathbf{F} \\ &= \mathbf{F}^{-1} \mathrm{diag}(\mathbf{Fh})^* \ \mathrm{diag}(\mathbf{Fh}) \ \mathbf{F} \\ \lambda_{max}(\mathbf{M}^H\mathbf{M}) &= \max \left(\mathrm{diag}(\mathbf{Fh})^* \ \mathrm{diag}(\mathbf{Fh}) \right) \end{align*}$$ In code, we have \begin{align*} \alpha = \frac{1.8}{\texttt{max} \left(\texttt{Hadj} \cdot \texttt{H}\right)}, \end{align*} where the factor $1.8 < 2$ keeps us safely inside the convergence bound.
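As a quick numerical check (a 1-D toy sketch with a random filter, not the actual PSF), the largest eigenvalue of $\mathbf{M}^H\mathbf{M}$ for a circular convolution indeed equals $\max\left(|\mathbf{Fh}|^2\right)$:

import numpy as np

rng = np.random.default_rng(0)
n = 8
h_toy = rng.random(n)

# Circulant matrix implementing 1-D circular convolution with h_toy
M = np.column_stack([np.roll(h_toy, k) for k in range(n)])

lam_max = np.linalg.eigvalsh(M.T @ M).max()        # largest eigenvalue of M^H M
fft_bound = np.max(np.abs(np.fft.fft(h_toy))**2)   # max |F h|^2

print(np.allclose(lam_max, fft_bound))             # True
alpha_toy = 1.8 / fft_bound                        # corresponding safe step size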
Since we are dealing with images, one constraint on the reconstructed image $\imagevec_k$ is that all of its entries must be non-negative. We enforce this with projected gradient descent: the projection function $\texttt{proj}$ we use projects $\texttt{vk}$ onto the non-negative orthant (element-wise $\max(\cdot\,, 0)$).
def grad_descent(h, b):
    H, Hadj, v, utils = initMatrices(h)
    crop = utils[0]
    pad = utils[1]

    alpha = np.real(1.8/(np.max(Hadj * H)))
    iterations = 0

    def non_neg(xi):
        xi = np.maximum(xi, 0)
        return xi

    #proj = lambda x: x  # Do no projection
    proj = non_neg  # Enforce nonnegativity at every gradient step. Comment out as needed.

    parent_var = [H, Hadj, b, crop, pad, alpha, proj]

    vk = v

    #### uncomment for Nesterov momentum update ####
    #p = 0
    #mu = 0.9
    ################################################

    #### uncomment for FISTA update ################
    tk = 1
    xk = v
    ################################################

    for iterations in range(iters):
        # uncomment for regular GD update
        #vk = gd_update(vk, parent_var)

        # uncomment for Nesterov momentum update
        #vk, p = nesterov_update(vk, p, mu, parent_var)

        # uncomment for FISTA update
        vk, tk, xk = fista_update(vk, tk, xk, parent_var)

        if iterations % 10 == 0:
            image = proj(crop(vk))
            fig = plt.figure(1)
            plt.imshow(image, cmap='gray')
            plt.title('Reconstruction after iteration {}'.format(iterations))
            display.display(fig)
            display.clear_output(wait=True)

    return proj(crop(vk))
Regular gradient descent is simply following the negative of the gradient until we reach the minimum:
\begin{align*} & \texttt{gradient_update}(\imagevec_k): \\ &\quad \quad\imagevec'_{k+1} \leftarrow \imagevec_k - \alpha_k \full^H(\full\imagevec_k - \measurementvec) \\ &\quad \quad \imagevec_{k+1} \leftarrow \operatorname{proj}_{\imagevec \geq 0} (\imagevec_{k+1}') \end{align*}
def gd_update(vk, parent_var):
    H, Hadj, b, crop, pad, alpha, proj = parent_var

    gradient = grad(Hadj, H, vk, b, crop, pad)
    vk -= alpha*gradient      # gradient step
    vk = proj(vk)             # apply the chosen projection (non-negativity by default)

    return vk
GD works, but it is slow. One way to speed it up is to add a velocity (momentum) term $\mathbf{p}$. Each update becomes:
\begin{align*} &\texttt{gradient_update}(\imagevec_k): \\ &\qquad \mathbf{p}_{k+1} \leftarrow \mu \mathbf{p}_k - \alpha_k \texttt{grad}(\imagevec_k) \\ &\qquad \imagevec_{k+1}' \leftarrow \imagevec_k - \mu \mathbf{p}_{k} + (1+\mu)\mathbf{p}_{k+1} \\ &\qquad \imagevec_{k+1} \leftarrow \operatorname{proj}_{\imagevec \geq 0} (\imagevec_{k+1}') \end{align*}
The parameter $\mu$ is called the momentum and is strictly between 0 and 1.
def nesterov_update(vk, p, mu, parent_var):
    H, Hadj, b, crop, pad, alpha, proj = parent_var

    p_prev = p
    gradient = grad(Hadj, H, vk, b, crop, pad)
    p = mu*p - alpha*gradient           # p_{k+1} = mu*p_k - alpha*grad(v_k)
    vk += -mu*p_prev + (1+mu)*p         # v_{k+1} = v_k - mu*p_k + (1+mu)*p_{k+1}
    vk = proj(vk)

    return vk, p
Instead of Nesterov momentum, we can use FISTA, which speeds up the iterative process. Each iteration of the algorithm is as follows:
\begin{align*} &\texttt{gradient_update}(\imagevec_k):\\ &\qquad \imagevec_k \leftarrow \imagevec_k - \alpha_k \texttt{grad}(\imagevec_k) \\ &\qquad x_{k} \leftarrow \texttt{proj}(\imagevec_k) \\ &\qquad t_{k+1} \leftarrow \frac{1+\sqrt{1+4t_k^2}}{2} \\ &\qquad \imagevec_{k+1} \leftarrow x_{k} + \frac{t_k-1}{t_{k+1}} (x_{k}-x_{k-1}) \\ \end{align*}
def fista_update(vk, tk, xk, parent_var):
    H, Hadj, b, crop, pad, alpha, proj = parent_var

    x_k1 = xk                                  # x_{k-1}
    gradient = grad(Hadj, H, vk, b, crop, pad)
    vk -= alpha*gradient                       # gradient step
    xk = proj(vk)                              # x_k = proj(v_k)
    t_k1 = (1+np.sqrt(1+4*tk**2))/2            # t_{k+1}
    vk = xk + (tk-1)/t_k1*(xk - x_k1)          # momentum combination
    tk = t_k1

    return vk, tk, xk
psf, data = loaddata()
final_im = grad_descent(psf, data)
plt.imshow(final_im, cmap='gray')
plt.title('Final reconstruction after {} iterations'.format(iters))
plt.show()