# Dropout - Regularization

Source files in EpyNN/epynn/dropout/.

See Appendix - Notations for mathematical conventions.

## Layer architecture

A Dropout layer k is an object used to regularize a Neural Network model by reducing its overfitting to the training data. It takes a drop_prob and an axis argument upon instantiation. The former is the probability of dropping - setting to zero - input values forwarded to the next layer k+1, while the latter selects the axes along which the operation is applied.

class epynn.dropout.models.Dropout(drop_prob=0.5, axis=())[source]

Definition of a dropout layer prototype.

Parameters
• drop_prob (float, optional) – Probability to drop one data point from previous layer to next layer, defaults to 0.5.

• axis (int or tuple[int], optional) – Compute and apply dropout mask along defined axis, defaults to all axes.

### Shapes

Dropout.compute_shapes(A)[source]
Parameters

A (numpy.ndarray) – Output of forward propagation from previous layer.

```python
def dropout_compute_shapes(layer, A):
    """Compute forward shapes and dimensions from input for layer.
    """
    X = A    # Input of current layer

    layer.fs['X'] = X.shape    # (m, .. )

    layer.d['m'] = layer.fs['X'][0]          # Number of samples  (m)
    layer.d['n'] = X.size // layer.d['m']    # Number of features (n)

    # Shape for dropout mask
    layer.fs['D'] = [1 if ax in layer.d['a'] else layer.fs['X'][ax]
                     for ax in range(X.ndim)]

    return None
```


Within a Dropout layer, shapes of interest include:

• Input X of shape (m, …) with m equal to the number of samples. The number of input dimensions is unknown a priori.

• The number of features n per sample can still be determined formally: it is equal to the size of the input X divided by the number of samples m.

• The shape of the dropout mask D defined by the user-defined axes a.

Note that:

• The Dropout operation is applied on all axes by default. Given an input of shape (m, …), the shape of the dropout mask is therefore identical to the input shape with default settings.

• For input X of shape (m, n) and axis=(1,), the shape of D is (m, 1). Because the single draw per sample is broadcast along the feature axis, the Dropout operation sets entire rows - all n features of a given sample - to zero instead of single values, still with respect to drop_prob.
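The mask-shape rule can be illustrated with a small standalone helper. This is a hypothetical function mirroring the logic of dropout_compute_shapes, not part of EpyNN:

```python
def dropout_mask_shape(x_shape, axes=()):
    """Hypothetical helper mirroring the mask-shape rule: axes listed in
    `axes` collapse to size 1 (one draw shared along that axis), while
    all other axes keep the input size."""
    return tuple(1 if ax in axes else x_shape[ax] for ax in range(len(x_shape)))

# Default axis=(): mask shape identical to the input shape
print(dropout_mask_shape((4, 3)))             # (4, 3)

# axis=(1,): one draw per sample, broadcast over features
print(dropout_mask_shape((4, 3), axes=(1,)))  # (4, 1)
```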

### Forward

Dropout.forward(A)[source]
Parameters

A (numpy.ndarray) – Output of forward propagation from previous layer.

Returns

Output of forward propagation for current layer.

Return type

numpy.ndarray

```python
def dropout_forward(layer, A):
    """Forward propagate signal to next layer.
    """
    # (1) Initialize cache
    X = initialize_forward(layer, A)

    # (2) Generate dropout mask
    D = layer.np_rng.uniform(0, 1, layer.fs['D'])

    # (3) Apply a step function with respect to drop_prob (d)
    D = layer.fc['D'] = (D > layer.d['d'])

    # (4) Drop data points
    A = X * D

    # (5) Scale up signal
    A /= (1 - layer.d['d'])

    return A    # To next layer
```


The forward propagation function in a Dropout layer k includes:

• (1): Input X in current layer k is equal to the output A of previous layer k-1.

• (2): Generate the dropout mask D from a random uniform distribution.

• (3): Evaluate each value in D against the dropout probability d. Values in D lower than or equal to d are set to 0 while others are set to 1.

• (4): A is the product between input X and dropout mask D. In words, values in A are set to 0 with respect to dropout probability d.

• (5): Values in A are scaled up with respect to the fraction of data points that were set to zero by the dropout mask D (inverted dropout), so the expected magnitude of the signal is preserved.
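The five steps above can be sketched with plain NumPy, using standalone variables rather than the layer's cache. A drop_prob of d=0.5 and the default full-shape mask are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

d = 0.5                          # drop_prob, assumed value
X = np.ones((4, 3))              # (1) input from previous layer, toy values

D = rng.uniform(0, 1, X.shape)   # (2) uniform draws in [0, 1)
D = (D > d)                      # (3) step function: keep where draw > d
A = X * D                        # (4) drop data points
A = A / (1 - d)                  # (5) inverted-dropout scaling

# Every kept entry is scaled from 1.0 to 2.0, every dropped entry is 0.0,
# so the expectation of A matches X.
```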

\begin{split}\begin{alignat*}{2}
& x^{k}_{mn} &&= a^{\km}_{mn} \tag{1} \\
\\
& Let &&~D~be~a~matrix~of~the~same~dimensions~as~x^{k}_{mn}: \\
\\
& D &&= [d_{mn}] \in \mathbb{R}^{M \times N}, \\
& &&\forall_{m,n} \in \{1,...M\} \times \{1,...N\} \\
& && d_{mn} \sim \mathcal{U}[0,1] \tag{2} \\
\\
& D' &&= [d_{mn}'] \in \mathbb{R}^{M \times N}, d \in [0,1] \\
& &&\forall_{m,n} \in \{1,...M\} \times \{1,...N\} \\
& && d_{mn}' = \begin{cases} 1, & d_{mn} > d \\ 0, & d_{mn} \le d \end{cases} \tag{3} \\
\\
& a^{k}_{mn} &&= x^{k}_{mn} * D' \\
& &&~~~* \frac{1}{(1-d)} \tag{4}
\end{alignat*}\end{split}

### Backward

Dropout.backward(dX)[source]
Parameters

dX (numpy.ndarray) – Output of backward propagation from next layer.

Returns

Output of backward propagation for current layer.

Return type

numpy.ndarray

```python
def dropout_backward(layer, dX):
    """Backward propagate error gradients to previous layer.
    """
    # (1) Initialize cache
    dA = initialize_backward(layer, dX)

    # (2) Apply the dropout mask used in the forward pass
    dX = dA * layer.fc['D']

    # (3) Scale up gradients
    dX /= (1 - layer.d['d'])

    layer.bc['dX'] = dX

    return dX    # To previous layer
```


The backward propagation function in a Dropout layer k includes:

• (1): dA is the gradient of the loss with respect to the output of forward propagation A for current layer k. It is equal to the gradient of the loss with respect to the input of forward propagation for next layer k+1.

• (2): The gradient of the loss dX with respect to the input of forward propagation X for current layer k is equal to the product between dA and the dropout mask D.

• (3): Values in dX are scaled up with respect to the fraction of data points that were set to zero by the dropout mask D, matching the scaling applied in the forward pass.
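A minimal NumPy sketch of the forward/backward pair (standalone variables, not the EpyNN layer API; drop_prob d=0.5 is an assumption) shows that only kept units receive a gradient:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

d = 0.5                       # drop_prob, assumed value
X = rng.normal(size=(4, 3))   # toy input

# Forward: generate the mask once, drop, scale up
D = rng.uniform(0, 1, X.shape) > d
A = X * D / (1 - d)

# Backward: reuse the same mask on the upstream gradient
dA = np.ones_like(A)          # (1) gradient arriving from next layer
dX = dA * D / (1 - d)         # (2)-(3) mask, then scale up

# Dropped positions carry zero gradient; kept positions are scaled
# by 1/(1-d), exactly as the signal was in the forward pass.
```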

\begin{split}\begin{alignat*}{2}
& \delta^{\kp}_{mn} &&= \frac{\partial \mathcal{L}}{\partial a^{k}_{mn}} = \frac{\partial \mathcal{L}}{\partial x^{\kp}_{mn}} \tag{1} \\
& \delta^{k}_{mn} &&= \frac{\partial \mathcal{L}}{\partial x^{k}_{mn}} = \frac{\partial \mathcal{L}}{\partial a^{\km}_{mn}} = \frac{\partial \mathcal{L}}{\partial a^{k}_{mn}} * D'^{k}_{mn} * \frac{1}{(1-d)} \tag{2} \\
\end{alignat*}\end{split}

Dropout.compute_gradients()[source]
Wrapper for epynn.dropout.parameters.dropout_compute_gradients(). Dummy method: there are no gradients to compute in this layer.