Overview
This project is a fun exploration of neural fields and neural radiance fields. In the first part, we fit a neural field to a 2D image; in the second part, we fit a neural radiance field to a 3D scene.
Rendered Images | Rendered Depths
---|---
![]() | ![]()
Part 1: Fit a Neural Field to a 2D Image
In this part, we implement and train a neural field to represent a 2D image. More formally, we fit a neural field $$ F: (x, y) \rightarrow (r, g, b) $$ where $(x, y)$ are pixel coordinates in the image and $(r, g, b)$ are the RGB values at that pixel.
Part 1.1: Implement Sinusoidal Positional Encoding for 2D images
We implement the sinusoidal positional encoding for 2D images. The positional encoding is defined as: $$ PE(x) = \{x, \sin(2^0\pi x), \cos(2^0\pi x), \sin(2^1\pi x), \cos(2^1\pi x), \ldots, \sin(2^{L-1}\pi x), \cos(2^{L-1}\pi x)\} $$ where $L$ is the number of frequency levels (the highest frequency is $2^{L-1}\pi$) and $x$ is the input coordinate. Each scalar coordinate expands to $2L + 1$ values, so for $L = 10$ the positional encoding of a $2$-dimensional input has size $2 \times (2L + 1) = 42$.
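Below is a minimal sketch of this encoding in PyTorch; the batch layout of the input is an assumption, while the frequencies follow the formula above.

```python
import torch

def positional_encoding(x, L=10):
    """Sinusoidal positional encoding.

    x: (N, D) tensor of coordinates (assumed normalized to [0, 1]).
    Returns a (N, D * (2 * L + 1)) tensor: the raw input followed by
    sin/cos pairs at frequencies 2^0 * pi, ..., 2^(L-1) * pi.
    """
    encodings = [x]
    for i in range(L):
        freq = (2.0 ** i) * torch.pi
        encodings.append(torch.sin(freq * x))
        encodings.append(torch.cos(freq * x))
    return torch.cat(encodings, dim=-1)

# Example: a batch of 2D coordinates with L = 10 maps to 42-dimensional features.
coords = torch.rand(4096, 2)
features = positional_encoding(coords, L=10)  # shape (4096, 42)
```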
Part 1.2: Implement a Neural Field Model for 2D images
We implement a neural field model for 2D images. The neural field model is defined as:

Neural Field Model Architecture
More specifically, the highest frequency $L$ is set to $10$, and the other parameters are set to the same values as in the architecture diagram above.
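A sketch of what such a model could look like in PyTorch, reusing the positional_encoding sketch from Part 1.1; the hidden width of 256 is an assumption, while $L = 10$ and the 4 linear layers follow the text and the tuning results in Part 1.4.

```python
import torch
import torch.nn as nn

class NeuralField2D(nn.Module):
    """MLP mapping a positionally encoded (x, y) coordinate to an (r, g, b) color.

    Sketch only: the hidden width of 256 is an assumption. The input size follows
    from the L = 10 encoding (2 * (2 * 10 + 1) = 42), and the final sigmoid keeps
    the predicted colors in [0, 1].
    """

    def __init__(self, L=10, hidden_dim=256, num_layers=4):
        super().__init__()
        self.L = L
        in_dim = 2 * (2 * L + 1)  # encoded (x, y)
        layers = [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
        for _ in range(num_layers - 2):
            layers += [nn.Linear(hidden_dim, hidden_dim), nn.ReLU()]
        layers += [nn.Linear(hidden_dim, 3), nn.Sigmoid()]
        self.mlp = nn.Sequential(*layers)

    def forward(self, xy):
        return self.mlp(positional_encoding(xy, L=self.L))
```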
Part 1.3: Training the Neural Field Model
We train the neural field model using the Adam optimizer with a learning rate of $0.01$. The loss function is the mean squared error. The model is trained for $1000$ iterations with a batch size of $10000$.
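A sketch of this training loop, reusing NeuralField2D from Part 1.2; it assumes the target image is an (H, W, 3) float tensor with values in [0, 1] and that pixel coordinates are normalized to [0, 1] before encoding.

```python
import torch
import torch.nn.functional as F

model = NeuralField2D(L=10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

H, W = image.shape[:2]  # `image` is the (H, W, 3) target tensor (assumed given)
for it in range(1000):
    # Sample a random batch of 10,000 pixels and normalize their coordinates.
    idx = torch.randint(0, H * W, (10_000,))
    ys, xs = idx // W, idx % W
    coords = torch.stack([xs / W, ys / H], dim=-1).float()
    target = image[ys, xs]

    pred = model(coords)
    loss = F.mse_loss(pred, target)
    psnr = -10.0 * torch.log10(loss)  # PSNR in dB, assuming pixel values in [0, 1]

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```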
 | Image 1 | Image 2
---|---|---
Original Image | ![]() | ![]() |
PSNR | ![]() | ![]() |
Iter 0 | ![]() | ![]() |
Iter 200 | ![]() | ![]() |
Iter 400 | ![]() | ![]() |
Iter 600 | ![]() | ![]() |
Iter 800 | ![]() | ![]() |
Iter 999 | ![]() | ![]() |
Part 1.4: Hyperparameter Tuning
We perform hyperparameter tuning on the neural field model. The hyperparameters we tune are the highest frequency $L$ and the number of layers in the model. We vary $L$ over $\{6, 8, 10, 12, 14\}$ and the number of layers over $\{2, 4, 6, 8, 10\}$. The best setting remains $L = 10$ with $4$ layers; the other configurations either perform about the same or worse.
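The sweep itself is a simple grid search; a sketch is below, where train_and_evaluate is a hypothetical helper standing in for the training loop above followed by a PSNR evaluation.

```python
# Hypothetical grid search over the two hyperparameters studied here.
results = {}
for L in [6, 8, 10, 12, 14]:
    for num_layers in [2, 4, 6, 8, 10]:
        model = NeuralField2D(L=L, num_layers=num_layers)
        # train_and_evaluate is a hypothetical helper: it runs the training
        # loop above and returns the final PSNR of the fitted image.
        results[(L, num_layers)] = train_and_evaluate(model)

best_L, best_layers = max(results, key=results.get)  # (10, 4) in our runs
```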
PSNR vs Highest Frequency L for Two Images
PSNR vs Number of Layers for Two Images
Part 2: Fit a Neural Radiance Field from Multi-view Images
In this part, we implement and train a neural radiance field to represent a 3D scene. More formally, we fit a neural radiance field $$ F: (\mathbf{x}, \mathbf{d}) \rightarrow (r, g, b, \sigma) $$ where $\mathbf{x}$ is a 3D coordinate in the scene, $\mathbf{d}$ is the ray (viewing) direction, $(r, g, b)$ is the emitted color, and $\sigma$ is the volume density.
Part 2.1: Create Rays from Cameras
We first need to transform points from camera space to world space. This is implemented in the function transform, which applies the camera-to-world transformation matrix to points expressed in camera coordinates.
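A sketch of this function in NumPy; the row-vector convention for the points is an assumption.

```python
import numpy as np

def transform(c2w, x_c):
    """Map points from camera coordinates to world coordinates.

    c2w: (4, 4) camera-to-world matrix; x_c: (N, 3) points in camera space.
    Lifts the points to homogeneous coordinates, applies c2w, and drops the
    homogeneous component.
    """
    ones = np.ones((x_c.shape[0], 1))
    x_h = np.concatenate([x_c, ones], axis=1)  # (N, 4) homogeneous points
    x_w = x_h @ c2w.T                          # apply c2w to each row vector
    return x_w[:, :3]
```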
We then implement another function pixel_to_camera that transforms points from the pixel coordinate system back to the camera coordinate system using the camera intrinsic matrix.
We then implement the function pixel_to_ray that computes ray origins and directions from pixel coordinates by first mapping the pixel coordinates to camera coordinates using the camera’s intrinsic matrix. These camera coordinates are then transformed to world coordinates using the camera-to-world transformation matrix. The ray origins are set as the camera’s position in the world space, while the ray directions are computed as vectors pointing from the camera position to the transformed points in the world space.
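A sketch of these two steps, reusing the transform sketch above; the exact argument conventions (pixel coordinates passed as (u, v) and a per-point depth s) are assumptions.

```python
import numpy as np

def pixel_to_camera(K, uv, s):
    """Map pixel coordinates (u, v) at depth s back to camera coordinates.

    K: (3, 3) intrinsic matrix; uv: (N, 2) pixel coordinates; s: depth(s).
    """
    ones = np.ones((uv.shape[0], 1))
    uv_h = np.concatenate([uv, ones], axis=1)  # (N, 3) homogeneous pixels
    x_c = (np.linalg.inv(K) @ uv_h.T).T        # back-project through K^{-1}
    return x_c * np.reshape(s, (-1, 1))        # scale by depth along the ray

def pixel_to_ray(K, c2w, uv):
    """Compute ray origins and unit directions in world space for pixels uv."""
    # Ray origin: the camera center, i.e. the translation part of c2w.
    origins = np.tile(c2w[:3, 3], (uv.shape[0], 1))
    # A point at depth 1 along each pixel's viewing ray, moved to world space.
    x_c = pixel_to_camera(K, uv, s=1.0)
    x_w = transform(c2w, x_c)
    # Direction: normalized vector from the camera center to that world point.
    directions = x_w - origins
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return origins, directions
```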
Part 2.2 & 2.3: Sampling and Data Loading
We first need to implement the data loading class RaysData that samples rays from images. It prepares a grid of UV coordinates, offset by 0.5 to account for the pixel center, and computes ray origins and directions using the camera intrinsics and extrinsics.
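A sketch of this class, reusing pixel_to_ray from Part 2.1 and assuming a single shared intrinsic matrix K with per-image extrinsics c2ws; the method and attribute names other than RaysData are assumptions.

```python
import numpy as np

class RaysData:
    """Holds training images plus cameras and samples random rays for training."""

    def __init__(self, images, K, c2ws):
        # images: (M, H, W, 3), K: (3, 3), c2ws: (M, 4, 4)
        self.images, self.K, self.c2ws = images, K, c2ws
        self.M, self.H, self.W = images.shape[:3]

    def sample_rays(self, n):
        # Pick a random source image and a random pixel for each of the n rays.
        img_idx = np.random.randint(0, self.M, size=n)
        u = np.random.randint(0, self.W, size=n)
        v = np.random.randint(0, self.H, size=n)
        # Offset by 0.5 so the UV coordinate points at the pixel center.
        uv = np.stack([u + 0.5, v + 0.5], axis=1)
        origins = np.empty((n, 3))
        directions = np.empty((n, 3))
        for i in range(n):  # loop kept for clarity; this can be vectorized
            o, d = pixel_to_ray(self.K, self.c2ws[img_idx[i]], uv[i : i + 1])
            origins[i], directions[i] = o[0], d[0]
        pixels = self.images[img_idx, v, u]  # ground-truth colors for supervision
        return origins, directions, pixels
```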
We then implement the function sample_along_rays that discretizes each ray into a fixed number of points between near and far bounds. To improve coverage and prevent overfitting, we need to introduce random perturbations to the sampling locations during training. This combination of uniformly spaced and perturbed sampling ensures diverse and robust coverage of the 3D scene for the subsequent NeRF training.
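A sketch of this sampling step; the near/far bounds and the sample count are assumptions.

```python
import numpy as np

def sample_along_rays(origins, directions, near=2.0, far=6.0, n_samples=64, perturb=True):
    """Discretize each ray into n_samples 3D points between near and far.

    When perturb is True, each sample is jittered uniformly within its bin so
    the network sees slightly different depths on every training iteration.
    """
    t = np.linspace(near, far, n_samples)                     # (n_samples,)
    t = np.broadcast_to(t, (origins.shape[0], n_samples)).copy()
    if perturb:
        bin_width = (far - near) / n_samples
        t += np.random.uniform(0.0, bin_width, size=t.shape)  # jitter within each bin
    # Points along each ray: o + t * d, with shape (n_rays, n_samples, 3).
    points = origins[:, None, :] + t[..., None] * directions[:, None, :]
    return points, t
```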

Sampled Rays from the 3D Scene
Part 2.4: Neural Radiance Field
We implement the neural radiance field model. The neural radiance field model is defined as:

Neural Radiance Field Model Architecture
More specifically, the highest frequency for the positional encoding of the 3D coordinates is set to $10$ and the highest frequency for the positional encoding of the ray directions is set to $4$. The other parameters are set to the same values as in the architecture diagram above.
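A simplified sketch of such a model in PyTorch, reusing the positional_encoding sketch from Part 1.1; the hidden widths and the head layout are assumptions, while the encoding frequencies ($L = 10$ for positions, $L = 4$ for directions) follow the text.

```python
import torch
import torch.nn as nn

class NeRF(nn.Module):
    """MLP mapping an encoded 3D point and view direction to (r, g, b, sigma)."""

    def __init__(self, L_x=10, L_d=4, hidden_dim=256):
        super().__init__()
        self.L_x, self.L_d = L_x, L_d
        in_x = 3 * (2 * L_x + 1)  # encoded position: 63 dims for L = 10
        in_d = 3 * (2 * L_d + 1)  # encoded direction: 27 dims for L = 4
        self.trunk = nn.Sequential(
            nn.Linear(in_x, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.sigma_head = nn.Sequential(nn.Linear(hidden_dim, 1), nn.ReLU())  # density >= 0
        self.rgb_head = nn.Sequential(
            nn.Linear(hidden_dim + in_d, hidden_dim // 2), nn.ReLU(),
            nn.Linear(hidden_dim // 2, 3), nn.Sigmoid(),  # colors in [0, 1]
        )

    def forward(self, x, d):
        h = self.trunk(positional_encoding(x, L=self.L_x))
        sigma = self.sigma_head(h)
        rgb = self.rgb_head(torch.cat([h, positional_encoding(d, L=self.L_d)], dim=-1))
        return rgb, sigma
```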
Part 2.5: Volume Rendering
To render the 3D scene, we implement the function volrend that computes the discrete approximation of the volume rendering equation: $$ \hat{C}(\mathbf{r})=\sum_{i=1}^N T_i\left(1-\exp \left(-\sigma_i \delta_i\right)\right) \mathbf{c}_i, \text { where } T_i=\exp \left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right) $$ where $\mathbf{c}_i$ is the color of the $i$th sample, $\delta_i$ is the distance between consecutive samples, and $T_i$ is the probability of the ray not terminating before sample location $i$.
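A sketch of this function in PyTorch, assuming a constant step size $\delta$ between samples along each ray.

```python
import torch

def volrend(sigmas, rgbs, step_size):
    """Discrete volume rendering of the equation above.

    sigmas: (n_rays, n_samples, 1) densities, rgbs: (n_rays, n_samples, 3)
    colors, step_size: the spacing delta between consecutive samples
    (assumed constant here).
    """
    alphas = 1.0 - torch.exp(-sigmas * step_size)  # (n_rays, n_samples, 1)
    # T_i = prod_{j < i} (1 - alpha_j): probability the ray survives to sample i.
    ones = torch.ones_like(alphas[:, :1])
    T = torch.cumprod(torch.cat([ones, 1.0 - alphas[:, :-1]], dim=1), dim=1)
    weights = T * alphas                           # per-sample contribution to the pixel
    return (weights * rgbs).sum(dim=1)             # (n_rays, 3) rendered colors
```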
With this function, we can then train the neural radiance field model using the Adam optimizer with a learning rate of $0.0005$. The loss function is the mean squared error between the ground-truth pixel values and the rendered pixel values. The model is trained for $1500$ iterations with a batch size of $10000$. The following are the rendered validation views during training.
 | View 1 | View 2
---|---|---
Iter 100 | ![]() | ![]() |
Iter 200 | ![]() | ![]() |
Iter 300 | ![]() | ![]() |
Iter 400 | ![]() | ![]() |
Iter 500 | ![]() | ![]() |
Iter 600 | ![]() | ![]() |
Iter 700 | ![]() | ![]() |
Iter 800 | ![]() | ![]() |
Iter 1500 | ![]() | ![]() |

The PSNR of the Validation Views during Training (all greater than 23 dB after 1500 iterations)
With the trained neural radiance field model, we can render the 3D scene on the test dataset.
A Spherical Rendering Using the Provided Camera Extrinsics in the Test Dataset
Bells & Whistles: Render a Depth Map Video
We can modify the volume rendering process to compute a depth map by replacing the RGB values in the volume rendering equation with the depths of the sample points along the ray.
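A sketch of this modification, mirroring the volrend sketch above but accumulating the per-sample depths $t_i$ instead of colors.

```python
import torch

def volrend_depth(sigmas, ts, step_size):
    """Render one depth value per ray by accumulating sample depths.

    sigmas: (n_rays, n_samples, 1) densities, ts: (n_rays, n_samples) sample
    depths along each ray, step_size: spacing between samples (assumed constant).
    """
    alphas = 1.0 - torch.exp(-sigmas * step_size)
    ones = torch.ones_like(alphas[:, :1])
    T = torch.cumprod(torch.cat([ones, 1.0 - alphas[:, :-1]], dim=1), dim=1)
    weights = T * alphas
    return (weights * ts[..., None]).sum(dim=1)  # (n_rays, 1) expected depth per ray
```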
Rendered Depth Maps of the 3D Scene with the Neural Radiance Field