Dropout - Regularization

Source files in EpyNN/epynn/dropout/.

See Appendix - Notations for mathematical conventions.

Layer architecture

Dropout

A Dropout layer k is an object used to regularize a Neural Network model by reducing overfitting to the training data. It takes a drop_prob and an axis argument upon instantiation. The former sets the probability of dropping - setting to zero - input values forwarded to the next layer k+1, while the latter selects the axes along which the operation is applied.

class epynn.dropout.models.Dropout(drop_prob=0.5, axis=())[source]

Bases: epynn.commons.models.Layer

Definition of a dropout layer prototype.

Parameters
  • drop_prob (float, optional) – Probability to drop one data point from previous layer to next layer, defaults to 0.5.

  • axis (int or tuple[int], optional) – Compute and apply the dropout mask along the given axes, defaults to all axes.

Shapes

Dropout.compute_shapes(A)[source]

Wrapper for epynn.dropout.parameters.dropout_compute_shapes().

Parameters

A (numpy.ndarray) – Output of forward propagation from previous layer.

def dropout_compute_shapes(layer, A):
    """Compute forward shapes and dimensions from input for layer.
    """
    X = A    # Input of current layer

    layer.fs['X'] = X.shape    # (m, .. )

    layer.d['m'] = layer.fs['X'][0]          # Number of samples  (m)
    layer.d['n'] = X.size // layer.d['m']    # Number of features (n)

    # Shape for dropout mask
    layer.fs['D'] = [1 if ax in layer.d['a'] else layer.fs['X'][ax]
                     for ax in range(X.ndim)]

    return None

Within a Dropout layer, shapes of interest include:

  • Input X of shape (m, …) with m equal to the number of samples. The number of input dimensions is unknown a priori.

  • The number of features n per sample can still be determined formally: it is equal to the size of the input X divided by the number of samples m.

  • The shape of the dropout mask D, which is determined by the user-defined axes a.

Note that:

  • The Dropout operation is applied along all axes by default. Given an input of shape (m, …), the dropout mask therefore has the same shape as the input under default settings.

  • For input X of shape (m, n) and axis=1, the shape of D is equal to (m, 1): a single draw per sample is broadcast over the feature axis. The Dropout operation then sets all n features of a dropped sample to zero together, instead of single values, but still with respect to drop_prob.
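The mask-shape rule described above can be sketched in plain Python. Note that mask_shape is a hypothetical standalone helper, not part of EpyNN; it mirrors the list comprehension in dropout_compute_shapes():

```python
def mask_shape(x_shape, axis=()):
    """Hypothetical helper mirroring dropout_compute_shapes():
    axes listed in `axis` collapse to 1 in the mask shape, so a
    single random draw is broadcast along them; all other axes
    keep their input dimension."""
    axis = (axis,) if isinstance(axis, int) else tuple(axis)
    return tuple(1 if ax in axis else x_shape[ax]
                 for ax in range(len(x_shape)))

# Default axis=(): the mask has the same shape as the input.
print(mask_shape((4, 3)))          # (4, 3)
# axis=1: one draw per sample, broadcast over the feature axis.
print(mask_shape((4, 3), axis=1))  # (4, 1)
```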

Forward

Dropout.forward(A)[source]

Wrapper for epynn.dropout.forward.dropout_forward().

Parameters

A (numpy.ndarray) – Output of forward propagation from previous layer.

Returns

Output of forward propagation for current layer.

Return type

numpy.ndarray

def dropout_forward(layer, A):
    """Forward propagate signal to next layer.
    """
    # (1) Initialize cache
    X = initialize_forward(layer, A)

    # (2) Generate dropout mask
    D = layer.np_rng.uniform(0, 1, layer.fs['D'])

    # (3) Apply a step function with respect to drop_prob (d)
    D = layer.fc['D'] = (D > layer.d['d'])

    # (4) Drop data points
    A = X * D

    # (5) Scale up signal
    A /= (1 - layer.d['d'])

    return A    # To next layer

The forward propagation function in a Dropout layer k includes:

  • (1): Input X in current layer k is equal to the output A of previous layer k-1.

  • (2): Generate the dropout mask D from random uniform distribution.

  • (3): Evaluate each value in D against the dropout probability d. Values in D less than or equal to d are set to 0 (False) while others are set to 1 (True).

  • (4): A is the product between input X and dropout mask D. In words, values in A are set to 0 with respect to dropout probability d.

  • (5): Values in A are scaled up by 1 / (1 - d) to compensate for the fraction of data points set to zero by the dropout mask D.

\[\begin{split}\begin{alignat*}{2}
& x^{k}_{mn} &&= a^{\km}_{mn} \tag{1} \\
\\
& \text{Let } && D \text{ be a matrix of the same dimensions as } x^{k}_{mn}: \\
\\
& D &&= [d_{mn}] \in \mathbb{R}^{M \times N}, \\
& && \forall m,n \in \{1,...,M\} \times \{1,...,N\}, \\
& && d_{mn} \sim \mathcal{U}[0,1] \tag{2} \\
\\
& D' &&= [d'_{mn}] \in \mathbb{R}^{M \times N}, \quad d \in [0,1], \\
& && \forall m,n \in \{1,...,M\} \times \{1,...,N\}, \\
& && d'_{mn} =
\begin{cases}
1, & d_{mn} > d \\
0, & d_{mn} \le d
\end{cases} \tag{3} \\
\\
& a^{k}_{mn} &&= x^{k}_{mn} * d'_{mn} * \frac{1}{1-d} \tag{4}
\end{alignat*}\end{split}\]
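Steps (2)-(5) can be reproduced with a minimal standalone NumPy sketch (outside EpyNN; function and variable names are illustrative). It also shows why the 1 / (1 - d) scaling is applied: the expected activation is preserved under dropout.

```python
import numpy as np

def dropout_forward(X, drop_prob=0.5, rng=np.random.default_rng(0)):
    D = rng.uniform(0, 1, X.shape)  # (2) random uniform mask
    D = (D > drop_prob)             # (3) step function: keep iff value > drop_prob
    A = X * D                       # (4) drop data points
    A = A / (1 - drop_prob)         # (5) scale up kept values
    return A, D

X = np.ones((1000, 10))
A, D = dropout_forward(X)
# With drop_prob=0.5, kept values are scaled to 2.0, so the mean
# of A stays close to the mean of X (here, close to 1.0).
```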

Backward

Dropout.backward(dX)[source]

Wrapper for epynn.dropout.backward.dropout_backward().

Parameters

dX (numpy.ndarray) – Output of backward propagation from next layer.

Returns

Output of backward propagation for current layer.

Return type

numpy.ndarray

def dropout_backward(layer, dX):
    """Backward propagate error gradients to previous layer.
    """
    # (1) Initialize cache
    dA = initialize_backward(layer, dX)

    # (2) Apply the dropout mask used in the forward pass
    dX = dA * layer.fc['D']

    # (3) Scale up gradients
    dX /= (1 - layer.d['d'])

    layer.bc['dX'] = dX

    return dX    # To previous layer

The backward propagation function in a Dropout layer k includes:

  • (1): dA is the gradient of the loss with respect to the output of forward propagation A for current layer k. It is equal to the gradient of the loss with respect to the input of forward propagation for next layer k+1.

  • (2): The gradient of the loss dX with respect to the input of forward propagation X for current layer k is equal to the product of dA and the dropout mask D.

  • (3): Values in dX are scaled up by 1 / (1 - d) to compensate for the fraction of data points set to zero by the dropout mask D.

\[\begin{split}\begin{alignat*}{2}
& \delta^{\kp}_{mn} &&= \frac{\partial \mathcal{L}}{\partial a^{k}_{mn}} = \frac{\partial \mathcal{L}}{\partial x^{\kp}_{mn}} \tag{1} \\
\\
& \delta^{k}_{mn} &&= \frac{\partial \mathcal{L}}{\partial x^{k}_{mn}} = \frac{\partial \mathcal{L}}{\partial a^{\km}_{mn}} = \frac{\partial \mathcal{L}}{\partial a^{k}_{mn}} * d'^{k}_{mn} * \frac{1}{1-d} \tag{2}
\end{alignat*}\end{split}\]
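The backward pass can likewise be sketched standalone in NumPy, assuming the mask and drop probability cached during the forward pass are available (as layer.fc['D'] and layer.d['d'] in the code above):

```python
import numpy as np

def dropout_backward(dA, D, drop_prob=0.5):
    # (2) Re-apply the mask from the forward pass: gradients of
    # dropped units are zeroed.
    dX = dA * D
    # (3) Scale up surviving gradients, matching the forward scaling.
    dX = dX / (1 - drop_prob)
    return dX

rng = np.random.default_rng(1)
dA = rng.normal(size=(4, 3))         # incoming gradient from layer k+1
D = rng.uniform(0, 1, (4, 3)) > 0.5  # mask cached during the forward pass
dX = dropout_backward(dA, D)         # gradient passed to layer k-1
```

Dropped units receive exactly zero gradient, so they neither contribute to nor learn from this training step.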

Gradients

Dropout.compute_gradients()[source]

Wrapper for epynn.dropout.parameters.dropout_compute_gradients(). Dummy method; there are no gradients to compute in this layer.

The Dropout layer is not a trainable layer. It has no trainable parameters such as weight W or bias b. Therefore, there are no parameter gradients to compute.