Fully Connected (Dense)
Source files in EpyNN/epynn/dense/.
See Appendix - Notations for mathematical conventions.
Layer architecture
A fully connected or Dense layer is an object containing a number of units, provided with functions for parameter initialization and non-linear activation of its inputs.
- class epynn.dense.models.Dense(units=1, activate=<function sigmoid>, initialization=<function xavier>, se_hPars=None)[source]
Bases:
epynn.commons.models.Layer
Definition of a dense layer prototype.
- Parameters
units (int, optional) – Number of units in dense layer, defaults to 1.
activate (function, optional) – Non-linear activation of units, defaults to sigmoid.
initialization (function, optional) – Weight initialization function for dense layer, defaults to xavier.
se_hPars (dict[str, str or float] or NoneType, optional) – Layer hyper-parameters, defaults to None and inherits from model.
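A minimal usage sketch of this constructor is shown below. Only the signature documented above is taken from the source; the import path for the relu activation (epynn.commons.maths) is an assumption here.

from epynn.dense.models import Dense
from epynn.commons.maths import relu   # assumed import path for the relu activation

hidden = Dense(units=8, activate=relu)   # 8 units with ReLU activation
output = Dense(units=1)                  # defaults: 1 unit, sigmoid, xavier initialization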
Shapes
- Dense.compute_shapes(A)[source]
Wrapper for epynn.dense.parameters.dense_compute_shapes().
- Parameters
A (numpy.ndarray) – Output of forward propagation from previous layer.

def dense_compute_shapes(layer, A):
    """Compute forward shapes and dimensions from input for layer.
    """
    X = A    # Input of current layer

    layer.fs['X'] = X.shape    # (m, n)

    layer.d['m'] = layer.fs['X'][0]    # Number of samples (m)
    layer.d['n'] = layer.fs['X'][1]    # Number of features (n)

    # Shapes for trainable parameters              Units (u)
    layer.fs['W'] = (layer.d['n'], layer.d['u'])    # (n, u)
    layer.fs['b'] = (1, layer.d['u'])               # (1, u)

    return None

Within a Dense layer, shapes of interest include:
Input X of shape (m, n) with m equal to the number of samples and n the number of features per sample.
Weight W of shape (n, u) with n the number of features per sample and u the number of units in the current layer k.
Bias b of shape (1, u) with u the number of units in the layer.
Note that:
The shapes of parameters W and b are independent of the number of samples m.
The number of features n per sample may also be seen, in this context, as the number of units in the previous layer k-1, although this definition is less general.
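A standalone NumPy sketch of these shapes, using arbitrary example dimensions (not part of the EpyNN source):

import numpy as np

# Sketch: shapes of interest with m=4 samples, n=3 features, u=2 units.
m, n, u = 4, 3, 2

X = np.random.randn(m, n)    # Input X, shape (m, n)
W = np.random.randn(n, u)    # Weight W, shape (n, u): independent of m
b = np.zeros((1, u))         # Bias b, shape (1, u): independent of m

print(X.shape, W.shape, b.shape)    # (4, 3) (3, 2) (1, 2)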
Forward
- Dense.forward(A)[source]
Wrapper for epynn.dense.forward.dense_forward().
- Parameters
A (numpy.ndarray) – Output of forward propagation from previous layer.
- Returns
Output of forward propagation for current layer.
- Return type
numpy.ndarray

def dense_forward(layer, A):
    """Forward propagate signal to next layer.
    """
    # (1) Initialize cache
    X = initialize_forward(layer, A)

    # (2) Linear activation X -> Z
    Z = layer.fc['Z'] = (
        np.dot(X, layer.p['W'])
        + layer.p['b']
    )   # This is the weighted sum

    # (3) Non-linear activation Z -> A
    A = layer.fc['A'] = layer.activate(Z)

    return A    # To next layer

The forward propagation function in a Dense layer k includes:
(1): Input X in current layer k is equal to the output A of previous layer k-1.
(2): Z is computed as the dot product of X and W, to which the bias b is added.
(3): Output A is computed by applying a non-linear activation function on Z.
Note that:
Z may be referred to as the (biased) weighted sum of the inputs by the parameters, or as the linear activation product.
A may be referred to as the non-linear activation product or simply the output of Dense layer k.
\[\begin{split}\begin{alignat*}{2}
& x^{k}_{mn} &&= a^{\km}_{mn} \tag{1} \\
\\
& z^{k}_{mu} &&= x^{k}_{mn} \cdot W^{k}_{nu} \\
& &&+ b^{k}_{u} \tag{2} \\
& a^{k}_{mu} &&= a_{act}(z^{k}_{mu}) \tag{3}
\end{alignat*}\end{split}\]
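A standalone NumPy sketch of steps (1)-(3), assuming a sigmoid activation (the default for this layer) and arbitrary example dimensions; it is independent of the EpyNN layer machinery above:

import numpy as np

def sigmoid(z):
    """Example non-linear activation."""
    return 1.0 / (1.0 + np.exp(-z))

# Sketch of the forward pass with m=4 samples, n=3 features, u=2 units.
m, n, u = 4, 3, 2
X = np.random.randn(m, n)    # (1) Input of layer k = output of layer k-1
W = np.random.randn(n, u)
b = np.zeros((1, u))

Z = np.dot(X, W) + b         # (2) Linear activation, shape (m, u)
A = sigmoid(Z)               # (3) Non-linear activation, shape (m, u)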
Backward
- Dense.backward(dX)[source]
Wrapper for epynn.dense.backward.dense_backward().
- Parameters
dX (numpy.ndarray) – Output of backward propagation from next layer.
- Returns
Output of backward propagation for current layer.
- Return type
numpy.ndarray

def dense_backward(layer, dX):
    """Backward propagate error gradients to previous layer.
    """
    # (1) Initialize cache
    dA = initialize_backward(layer, dX)

    # (2) Gradient of the loss with respect to Z
    dZ = layer.bc['dZ'] = hadamard(
        dA,
        layer.activate(layer.fc['Z'], deriv=True)
    )   # dL/dZ

    # (3) Gradient of the loss with respect to X
    dX = layer.bc['dX'] = np.dot(dZ, layer.p['W'].T)   # dL/dX

    return dX    # To previous layer

The backward propagation function in a Dense layer k includes:
(1): dA is the gradient of the loss with respect to the output A of forward propagation for the current layer k. It is equal to the gradient of the loss with respect to the input of forward propagation for the next layer k+1.
(2): dZ is the gradient of the loss with respect to Z. It is computed by element-wise multiplication of dA with the derivative of the non-linear activation function evaluated at Z.
(3): dX, the gradient of the loss with respect to the input X of forward propagation for the current layer k, is computed as the dot product of dZ and the transpose of W.
Note that:
The expression gradient of the loss with respect to is equivalent to partial derivative of the loss with respect to.
The variable dA is often referred to as the error term for layer k+1, and dX as the error term for layer k.
In contrast to the forward pass, the parameters here weight dZ, which has shape (m, u). Therefore, the transpose of W, with shape (u, n), is used to compute the dot product.
\[\begin{split}\begin{alignat*}{2}
& \delta^{\kp}_{mu} &&= \frac{\partial \mathcal{L}}{\partial a^{k}_{mu}} = \frac{\partial \mathcal{L}}{\partial x^{\kp}_{mu}} \tag{1} \\
\\
& \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} &&= \delta^{\kp}_{mu} \\
& &&* a_{act}'(z^{k}_{mu}) \tag{2} \\
& \delta^{k}_{mn} &&= \frac{\partial \mathcal{L}}{\partial x^{k}_{mn}} = \frac{\partial \mathcal{L}}{\partial a^{\km}_{mn}} = \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} \cdot W^{k~{\intercal}}_{nu} \tag{3} \\
\end{alignat*}\end{split}\]
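A standalone NumPy sketch of steps (1)-(3) of the backward pass, again assuming a sigmoid activation and using an arbitrary placeholder for the incoming error term:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Setup with the same conventions as the forward sketch: m samples, n features, u units.
m, n, u = 4, 3, 2
X = np.random.randn(m, n)
W = np.random.randn(n, u)
b = np.zeros((1, u))
Z = np.dot(X, W) + b                           # From the forward pass

# (1) dA: error term received from layer k+1 (arbitrary placeholder here).
dA = np.random.randn(m, u)

# (2) Element-wise (Hadamard) product with the activation derivative at Z.
dZ = dA * (sigmoid(Z) * (1.0 - sigmoid(Z)))    # Shape (m, u)

# (3) Error term for layer k, propagated to layer k-1.
dX = np.dot(dZ, W.T)                           # Shape (m, n)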
Gradients
- Dense.compute_gradients()[source]
Wrapper for epynn.dense.parameters.dense_compute_gradients().

def dense_compute_gradients(layer):
    """Compute gradients with respect to weight and bias for layer.
    """
    X = layer.fc['X']      # Input of forward propagation
    dZ = layer.bc['dZ']    # Gradient of the loss with respect to Z

    # (1) Gradient of the loss with respect to W, b
    dW = layer.g['dW'] = np.dot(X.T, dZ)       # (1.1) dL/dW
    db = layer.g['db'] = np.sum(dZ, axis=0)    # (1.2) dL/db

    return None

The function to compute parameter gradients in a Dense layer k includes:
(1.1): dW is the gradient of the loss with respect to W. It is computed as the dot product of the transpose of X and dZ.
(1.2): db is the gradient of the loss with respect to b. It is computed by summing dZ along the axis corresponding to the number of samples m.
Note that:
The transpose of X, with shape (n, m), is used for the dot product with dZ of shape (m, u), which yields dW with shape (n, u).
\[\begin{split}\begin{alignat*}{2}
& \frac{\partial \mathcal{L}}{\partial W^{k}_{nu}} &&= x^{k~{\intercal}}_{mn} \cdot \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} \tag{1.1} \\
& \frac{\partial \mathcal{L}}{\partial b^{k}_{u}} &&= \sum_{m = 1}^M \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} \tag{1.2}
\end{alignat*}\end{split}\]
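A standalone NumPy sketch of gradients (1.1) and (1.2), using arbitrary placeholders for the cached input X and for dZ:

import numpy as np

# Setup: X is the cached forward input (m, n), dZ the gradient w.r.t. Z (m, u).
m, n, u = 4, 3, 2
X = np.random.randn(m, n)
dZ = np.random.randn(m, u)

dW = np.dot(X.T, dZ)           # (1.1) Shape (n, u), same as W
db = np.sum(dZ, axis=0)        # (1.2) Shape (u,), one value per unit

assert dW.shape == (n, u)
assert db.shape == (u,)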
Live examples
The Dense layer is used as an output layer in every network training example provided with EpyNN.
Pure Feed-Forward Neural Networks within these examples can be directly accessed from: