Dropout - Regularization

Source files in EpyNN/epynn/dropout/.

See Appendix - Notations for mathematical conventions.

Layer architecture

Dropout

A Dropout layer k is an object used to regularize a Neural Network model by reducing overfitting to the training data. It takes a drop_prob and an axis argument upon instantiation. The former sets the probability of dropping - setting to zero - input values forwarded to the next layer k+1, while the latter selects the axes along which the operation is applied.

class epynn.dropout.models.Dropout(drop_prob=0.5, axis=())[source]

Bases: epynn.commons.models.Layer

Definition of a dropout layer prototype.

Parameters
  • drop_prob (float, optional) – Probability to drop one data point from previous layer to next layer, defaults to 0.5.

  • axis (int or tuple[int], optional) – Compute and apply the dropout mask along the given axes, defaults to all axes.

Shapes

Dropout.compute_shapes(A)[source]

Wrapper for epynn.dropout.parameters.dropout_compute_shapes().

Parameters

A (numpy.ndarray) – Output of forward propagation from previous layer.

def dropout_compute_shapes(layer, A):
    """Compute forward shapes and dimensions from input for layer.
    """
    X = A    # Input of current layer

    layer.fs['X'] = X.shape    # (m, .. )

    layer.d['m'] = layer.fs['X'][0]          # Number of samples  (m)
    layer.d['n'] = X.size // layer.d['m']    # Number of features (n)

    # Shape for dropout mask
    layer.fs['D'] = [1 if ax in layer.d['a'] else layer.fs['X'][ax]
                     for ax in range(X.ndim)]

    return None

Within a Dropout layer, shapes of interest include:

  • Input X of shape (m, …) with m equal to the number of samples. The number of input dimensions is unknown a priori.

  • The number of features n per sample can still be determined formally: it is equal to the size of the input X divided by the number of samples m.

  • The shape of the dropout mask D, which is determined by the user-defined axes a.

Note that:

  • The Dropout operation is applied along all axes by default. Given an input of shape (m, …), the dropout mask therefore has the same shape as the input under default settings.

  • For input X of shape (m, n) and axis=1, the shape of D is equal to (m, 1): a single draw per sample is broadcast over the feature axis. The Dropout operation then sets all n features of a dropped sample to zero together, instead of single values, but still with respect to drop_prob.
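The mask-shape rule described above can be sketched in plain Python. Note that mask_shape is a hypothetical standalone helper, not part of EpyNN; it mirrors the list comprehension in dropout_compute_shapes():

```python
def mask_shape(x_shape, axis=()):
    """Hypothetical helper mirroring dropout_compute_shapes():
    axes listed in `axis` collapse to 1 in the mask shape, so a
    single random draw is broadcast along them; all other axes
    keep their input dimension."""
    axis = (axis,) if isinstance(axis, int) else tuple(axis)
    return tuple(1 if ax in axis else x_shape[ax]
                 for ax in range(len(x_shape)))

# Default axis=(): the mask has the same shape as the input.
print(mask_shape((4, 3)))          # (4, 3)
# axis=1: one draw per sample, broadcast over the feature axis.
print(mask_shape((4, 3), axis=1))  # (4, 1)
```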

Forward

Dropout.forward(A)[source]

Wrapper for epynn.dropout.forward.dropout_forward().

Parameters

A (numpy.ndarray) – Output of forward propagation from previous layer.

Returns

Output of forward propagation for current layer.

Return type

numpy.ndarray

def dropout_forward(layer, A):
    """Forward propagate signal to next layer.
    """
    # (1) Initialize cache
    X = initialize_forward(layer, A)

    # (2) Generate dropout mask
    D = layer.np_rng.uniform(0, 1, layer.fs['D'])

    # (3) Apply a step function with respect to drop_prob (d)
    D = layer.fc['D'] = (D > layer.d['d'])

    # (4) Drop data points
    A = X * D

    # (5) Scale up signal
    A /= (1 - layer.d['d'])

    return A    # To next layer

The forward propagation function in a Dropout layer k includes:

  • (1): Input X in current layer k is equal to the output A of previous layer k-1.

  • (2): Generate the dropout mask D from random uniform distribution.

  • (3): Evaluate each value in D against the dropout probability d. Values in D less than or equal to d are set to 0 (False) while others are set to 1 (True).

  • (4): A is the product between input X and dropout mask D. In words, values in A are set to 0 with respect to dropout probability d.

  • (5): Values in A are scaled up by 1 / (1 - d) to compensate for the fraction of data points set to zero by the dropout mask D.

\[\begin{split}\begin{alignat*}{2}
& x^{k}_{mn} &&= a^{\km}_{mn} \tag{1} \\
\\
& \text{Let } && D \text{ be a matrix of the same dimensions as } x^{k}_{mn}: \\
\\
& D &&= [d_{mn}] \in \mathbb{R}^{M \times N}, \\
& && \forall m,n \in \{1,...,M\} \times \{1,...,N\}, \\
& && d_{mn} \sim \mathcal{U}[0,1] \tag{2} \\
\\
& D' &&= [d'_{mn}] \in \mathbb{R}^{M \times N}, \quad d \in [0,1], \\
& && \forall m,n \in \{1,...,M\} \times \{1,...,N\}, \\
& && d'_{mn} =
\begin{cases}
1, & d_{mn} > d \\
0, & d_{mn} \le d
\end{cases} \tag{3} \\
\\
& a^{k}_{mn} &&= x^{k}_{mn} * d'_{mn} * \frac{1}{1-d} \tag{4}
\end{alignat*}\end{split}\]
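Steps (2)-(5) can be reproduced with a minimal standalone NumPy sketch (outside EpyNN; function and variable names are illustrative). It also shows why the 1 / (1 - d) scaling is applied: the expected activation is preserved under dropout.

```python
import numpy as np

def dropout_forward(X, drop_prob=0.5, rng=np.random.default_rng(0)):
    D = rng.uniform(0, 1, X.shape)  # (2) random uniform mask
    D = (D > drop_prob)             # (3) step function: keep iff value > drop_prob
    A = X * D                       # (4) drop data points
    A = A / (1 - drop_prob)         # (5) scale up kept values
    return A, D

X = np.ones((1000, 10))
A, D = dropout_forward(X)
# With drop_prob=0.5, kept values are scaled to 2.0, so the mean
# of A stays close to the mean of X (here, close to 1.0).
```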

Backward

Dropout.backward(dX)[source]

Wrapper for epynn.dropout.backward.dropout_backward().

Parameters

dX (numpy.ndarray) – Output of backward propagation from next layer.

Returns

Output of backward propagation for current layer.

Return type

numpy.ndarray

def dropout_backward(layer, dX):
    """Backward propagate error gradients to previous layer.
    """
    # (1) Initialize cache
    dA = initialize_backward(layer, dX)

    # (2) Apply the dropout mask used in the forward pass
    dX = dA * layer.fc['D']

    # (3) Scale up gradients
    dX /= (1 - layer.d['d'])

    layer.bc['dX'] = dX

    return dX    # To previous layer

The backward propagation function in a Dropout layer k includes:

  • (1): dA is the gradient of the loss with respect to the output of forward propagation A for current layer k. It is equal to the gradient of the loss with respect to the input of forward propagation for next layer k+1.

  • (2): The gradient of the loss dX with respect to the input of forward propagation X for current layer k is equal to the product of dA and the dropout mask D.

  • (3): Values in dX are scaled up by 1 / (1 - d) to compensate for the fraction of data points set to zero by the dropout mask D.

\[\begin{split}\begin{alignat*}{2}
& \delta^{\kp}_{mn} &&= \frac{\partial \mathcal{L}}{\partial a^{k}_{mn}} = \frac{\partial \mathcal{L}}{\partial x^{\kp}_{mn}} \tag{1} \\
\\
& \delta^{k}_{mn} &&= \frac{\partial \mathcal{L}}{\partial x^{k}_{mn}} = \frac{\partial \mathcal{L}}{\partial a^{\km}_{mn}} = \frac{\partial \mathcal{L}}{\partial a^{k}_{mn}} * d'^{k}_{mn} * \frac{1}{1-d} \tag{2}
\end{alignat*}\end{split}\]
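The backward pass can likewise be sketched standalone in NumPy, assuming the mask and drop probability cached during the forward pass are available (as layer.fc['D'] and layer.d['d'] in the code above):

```python
import numpy as np

def dropout_backward(dA, D, drop_prob=0.5):
    # (2) Re-apply the mask from the forward pass: gradients of
    # dropped units are zeroed.
    dX = dA * D
    # (3) Scale up surviving gradients, matching the forward scaling.
    dX = dX / (1 - drop_prob)
    return dX

rng = np.random.default_rng(1)
dA = rng.normal(size=(4, 3))         # incoming gradient from layer k+1
D = rng.uniform(0, 1, (4, 3)) > 0.5  # mask cached during the forward pass
dX = dropout_backward(dA, D)         # gradient passed to layer k-1
```

Dropped units receive exactly zero gradient, so they neither contribute to nor learn from this training step.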

Gradients

Dropout.compute_gradients()[source]

Wrapper for epynn.dropout.parameters.dropout_compute_gradients(). Dummy method; there are no gradients to compute in this layer.

The Dropout layer is not a trainable layer. It has no trainable parameters such as weight W or bias b. Therefore, there are no parameter gradients to compute.