Dropout - Regularization
Source files in EpyNN/epynn/dropout/
See Appendix - Notations for mathematical conventions.
Layer architecture
A Dropout layer k is an object used for regularization of a Neural Network model, aiming to reduce overfitting of the model to the training data. It takes drop_prob and axis arguments upon instantiation. The first sets the probability of dropping - setting to zero - input values forwarded to the next layer k+1, while the latter selects the axes along which the operation is applied.
- class epynn.dropout.models.Dropout(drop_prob=0.5, axis=())[source]
Bases:
epynn.commons.models.Layer
Definition of a dropout layer prototype.
Shapes
- Dropout.compute_shapes(A)[source]
Wrapper for
epynn.dropout.parameters.dropout_compute_shapes()
.
- Parameters
A (
numpy.ndarray
) – Output of forward propagation from previous layer.

def dropout_compute_shapes(layer, A):
    """Compute forward shapes and dimensions from input for layer.
    """
    X = A    # Input of current layer

    layer.fs['X'] = X.shape    # (m, ...)

    layer.d['m'] = layer.fs['X'][0]           # Number of samples (m)
    layer.d['n'] = X.size // layer.d['m']     # Number of features (n)

    # Shape for dropout mask
    layer.fs['D'] = [1 if ax in layer.d['a'] else layer.fs['X'][ax]
                     for ax in range(X.ndim)]

    return None

Within a Dropout layer, shapes of interest include:
Input X of shape (m, …) with m equal to the number of samples. The number of input dimensions is unknown a priori.
The number of features n per sample can still be determined formally: it is equal to the size of the input X divided by the number of samples m.
The shape of the dropout mask D defined by the user-defined axes a.
Note that:
The Dropout operation is applied on all axes by default. Given an input of shape (m, …), the shape of the dropout mask is therefore identical to the input shape with default settings.
For input X of shape (m, n) and axis=(1,), the shape of D is equal to (m, 1). The Dropout operation will then set entire rows - all features of a given sample - to zero instead of single values, still with respect to drop_prob.
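The shape rules above can be sketched in plain NumPy. The `mask_shape` helper below is hypothetical - it is not part of EpyNN - and simply mirrors the list comprehension found in dropout_compute_shapes():

```python
import numpy as np

def mask_shape(X, axes):
    # Axes listed in `axes` are broadcast (size 1); all others keep
    # their size from the input X, as in dropout_compute_shapes().
    return tuple(1 if ax in axes else X.shape[ax] for ax in range(X.ndim))

X = np.zeros((4, 3))    # m=4 samples, n=3 features

# Number of features per sample: size of X divided by number of samples
n = X.size // X.shape[0]
print(n)                           # 3

# Default axis=(): mask shape equals the input shape (element-wise dropout)
print(mask_shape(X, axes=()))      # (4, 3)

# axis=(1,): one mask value per sample, broadcast over all features
print(mask_shape(X, axes=(1,)))    # (4, 1)
```

A mask of shape (m, 1) broadcasts against X of shape (m, n), so a single dropped mask value zeroes all n features of that sample.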
Forward
- Dropout.forward(A)[source]
Wrapper for
epynn.dropout.forward.dropout_forward()
.
- Parameters
A (
numpy.ndarray
) – Output of forward propagation from previous layer.

- Returns
Output of forward propagation for current layer.
- Return type
numpy.ndarray
def dropout_forward(layer, A):
    """Forward propagate signal to next layer.
    """
    # (1) Initialize cache
    X = initialize_forward(layer, A)

    # (2) Generate dropout mask
    D = layer.np_rng.uniform(0, 1, layer.fs['D'])

    # (3) Apply a step function with respect to drop_prob (d)
    D = layer.fc['D'] = (D > layer.d['d'])

    # (4) Drop data points
    A = X * D

    # (5) Scale up signal
    A /= (1 - layer.d['d'])

    return A    # To next layer

The forward propagation function in a Dropout layer k includes:
(1): Input X in current layer k is equal to the output A of previous layer k-1.
(2): Generate the dropout mask D from random uniform distribution.
(3): Evaluate each value in D against the dropout probability d. Values in D lower than or equal to d are set to 0 while others are set to 1.
(4): A is the product between input X and dropout mask D. In words, values in A are set to 0 with probability d.
(5): Values in A are scaled up with respect to the fraction of data points that were set to zero by the dropout mask D.
\[\begin{split}\begin{alignat*}{2}
& x^{k}_{mn} &&= a^{\km}_{mn} \tag{1} \\
\\
& Let&&~D~be~a~matrix~of~the~same~dimensions~as~x^{k}_{mn}: \\
\\
& D &&= [d_{mn}] \in \mathbb{R}^{M \times N}, \\
& &&\forall_{m,n} \in \{1,...M\} \times \{1,...N\} \\
& && d_{mn} \sim \mathcal{U}[0,1] \tag{2} \\
\\
& D' &&= [d_{mn}'] \in \mathbb{R}^{M \times N}, d \in [0,1] \\
& &&\forall_{m,n} \in \{1,...M\} \times \{1,...N\} \\
& && d_{mn}' = \begin{cases} 1, & d_{mn} > d \\ 0, & d_{mn} \le d \end{cases} \tag{3} \\
\\
& a^{k}_{mn} &&= x^{k}_{mn} * D' \\
& &&~~~* \frac{1}{(1-d)} \tag{4}
\end{alignat*}\end{split}\]
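As a standalone illustration of steps (2)-(5), the following pure-NumPy sketch - independent of EpyNN's layer object, with illustrative names such as `drop_prob` - shows that the 1/(1-d) scaling keeps the mean activation close to its pre-dropout value:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(X, drop_prob, mask_shape):
    D = rng.uniform(0, 1, mask_shape)    # (2) mask from uniform distribution
    D = (D > drop_prob)                  # (3) step function: keep if above d
    A = X * D                            # (4) drop data points
    A /= (1 - drop_prob)                 # (5) inverted-dropout scaling
    return A, D

X = np.ones((1000, 10))
A, D = dropout_forward(X, drop_prob=0.5, mask_shape=X.shape)

# About half the values are zeroed, but scaling by 1/(1-d) keeps
# the mean activation close to the input mean of 1.0.
print(round(float(A.mean()), 1))
```

Without step (5), the expected activation would shrink by a factor of (1-d) at training time, forcing a compensating rescale at inference.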
Backward
- Dropout.backward(dX)[source]
Wrapper for
epynn.dropout.backward.dropout_backward()
.
- Parameters
dX (
numpy.ndarray
) – Output of backward propagation from next layer.

- Returns
Output of backward propagation for current layer.
- Return type
numpy.ndarray
def dropout_backward(layer, dX):
    """Backward propagate error gradients to previous layer.
    """
    # (1) Initialize cache
    dA = initialize_backward(layer, dX)

    # (2) Apply the dropout mask used in the forward pass
    dX = dA * layer.fc['D']

    # (3) Scale up gradients
    dX /= (1 - layer.d['d'])

    layer.bc['dX'] = dX

    return dX    # To previous layer

The backward propagation function in a Dropout layer k includes:
(1): dA is the gradient of the loss with respect to the output of forward propagation A for current layer k. It is equal to the gradient of the loss with respect to the input of forward propagation for next layer k+1.
(2): The gradient of the loss dX with respect to the input of forward propagation X for current layer k is equal to the product between dA and the dropout mask D.
(3): Values in dX are scaled up with respect to the fraction of data points that were set to zero by the dropout mask D.
\[\begin{split}\begin{alignat*}{2}
& \delta^{\kp}_{mn} &&= \frac{\partial \mathcal{L}}{\partial a^{k}_{mn}} = \frac{\partial \mathcal{L}}{\partial x^{\kp}_{mn}} \tag{1} \\
& \delta^{k}_{mn} &&= \frac{\partial \mathcal{L}}{\partial x^{k}_{mn}} = \frac{\partial \mathcal{L}}{\partial a^{\km}_{mn}} = \frac{\partial \mathcal{L}}{\partial a^{k}_{mn}} * D'^{k}_{mn} * \frac{1}{(1-d)} \tag{2} \\
\end{alignat*}\end{split}\]
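A pure-NumPy sketch of the backward rule - again independent of EpyNN's layer object, with illustrative names - shows that gradients vanish exactly where activations were dropped in the forward pass:

```python
import numpy as np

def dropout_backward(dA, D, drop_prob):
    dX = dA * D              # (2) re-apply the mask saved at forward time
    dX /= (1 - drop_prob)    # (3) same inverted-dropout scaling
    return dX

rng = np.random.default_rng(1)
D = rng.uniform(0, 1, (4, 3)) > 0.5    # mask cached from the forward pass
dA = np.ones((4, 3))                   # incoming gradients from layer k+1

dX = dropout_backward(dA, D, drop_prob=0.5)

# Gradients are zero exactly where activations were dropped,
# and scaled by 1/(1-d) = 2 everywhere else.
print(np.array_equal(dX == 0, ~D))
```

Reusing the cached mask layer.fc['D'] is what makes dropout differentiable: the forward pass is a fixed element-wise multiplication once D is drawn, so the backward pass multiplies by the same matrix.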
Gradients
- Dropout.compute_gradients()[source]
Wrapper for
epynn.dropout.parameters.dropout_compute_gradients()
. Dummy method: there are no gradients to compute in this layer.

The Dropout layer is not a trainable layer. It has no trainable parameters such as weight W or bias b. Therefore, there are no parameter gradients to compute.
Live examples
Protein Modification - LSTM(sequences=True)-(Dense)n with Dropout
Author music - RNN(sequences=True)-Flatten-(Dense)n with Dropout
Author music - GRU(sequences=True)-Flatten-(Dense)n with Dropout
You may also like to browse all Network training examples provided with EpyNN.