Fully Connected (Dense)

Source files in EpyNN/epynn/dense/.

See Appendix - Notations for mathematical conventions.

Layer architecture

Dense

A fully-connected or Dense layer is an object containing a number of units, provided with functions for parameter initialization and non-linear activation of inputs.

class epynn.dense.models.Dense(units=1, activate=<function sigmoid>, initialization=<function xavier>, se_hPars=None)[source]

Bases: epynn.commons.models.Layer

Definition of a dense layer prototype.

Parameters
  • units (int, optional) – Number of units in dense layer, defaults to 1.

  • activate (function, optional) – Non-linear activation of units, defaults to sigmoid.

  • initialization (function, optional) – Weight initialization function for dense layer, defaults to xavier.

  • se_hPars (dict[str, str or float] or NoneType, optional) – Layer hyper-parameters, defaults to None and inherits from model.
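
A minimal instantiation sketch, using only the documented defaults; the import path follows from the class signature above and the number of units is purely illustrative:

from epynn.dense.models import Dense

# Dense layer with 8 units, default sigmoid activation
# and default xavier weight initialization.
dense = Dense(units=8)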

Shapes

Dense.compute_shapes(A)[source]

Wrapper for epynn.dense.parameters.dense_compute_shapes().

Parameters

A (numpy.ndarray) – Output of forward propagation from previous layer.

def dense_compute_shapes(layer, A):
    """Compute forward shapes and dimensions from input for layer.
    """
    X = A    # Input of current layer

    layer.fs['X'] = X.shape    # (m, n)

    layer.d['m'] = layer.fs['X'][0]    # Number of samples  (m)
    layer.d['n'] = layer.fs['X'][1]    # Number of features (n)

    # Shapes for trainable parameters              Units (u)
    layer.fs['W'] = (layer.d['n'], layer.d['u'])    # (n, u)
    layer.fs['b'] = (1, layer.d['u'])               # (1, u)

    return None

Within a Dense layer, shapes of interest include:

  • Input X of shape (m, n) with m equal to the number of samples and n the number of features per sample.

  • Weight W of shape (n, u) with n the number of features per sample and u the number of units in the current layer k.

  • Bias b of shape (1, u) with u the number of units in the layer.

Note that:

  • The shapes of W and b are independent of the number of samples m.

  • The number of features n per sample may, in this context, also be read as the number of units in the previous layer k-1, although this definition is less general.

[Figure: Dense1-01.svg]
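
As a quick standalone check of this bookkeeping, a NumPy-only sketch with illustrative dimensions (m = 4 samples, n = 3 features, u = 2 units) reproduces the shapes above without depending on EpyNN:

import numpy as np

m, n, u = 4, 3, 2             # Samples, features, units (illustrative)

X = np.random.rand(m, n)      # Input of current layer        (m, n)
W = np.random.rand(n, u)      # Weight, one column per unit   (n, u)
b = np.zeros((1, u))          # Bias, one value per unit      (1, u)

print(X.shape, W.shape, b.shape)    # (4, 3) (3, 2) (1, 2)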

Forward

Dense.forward(A)[source]

Wrapper for epynn.dense.forward.dense_forward().

Parameters

A (numpy.ndarray) – Output of forward propagation from previous layer.

Returns

Output of forward propagation for current layer.

Return type

numpy.ndarray

def dense_forward(layer, A):
    """Forward propagate signal to next layer.
    """
    # (1) Initialize cache
    X = initialize_forward(layer, A)

    # (2) Linear activation X -> Z
    Z = layer.fc['Z'] = (
        np.dot(X, layer.p['W'])
        + layer.p['b']
    )   # This is the weighted sum

    # (3) Non-linear activation Z -> A
    A = layer.fc['A'] = layer.activate(Z)

    return A    # To next layer

The forward propagation function in a Dense layer k includes:

  • (1): Input X in current layer k is equal to the output A of previous layer k-1.

  • (2): Z is computed by applying a dot product operation between X and W, to which the bias b is added.

  • (3): Output A is computed by applying a non-linear activation function on Z.

Note that:

  • Z may be referred to as the (biased) weighted sum of inputs by parameters or as the linear activation product.

  • A may be referred to as the non-linear activation product or simply the output of Dense layer k.

[Figure: Dense2-01.svg]
\[\begin{split}\begin{alignat*}{2} & x^{k}_{mn} &&= a^{\km}_{mn} \tag{1} \\ \\ & z^{k}_{mu} &&= x^{k}_{mn} \cdot W^{k}_{nu} \\ & &&+ b^{k}_{u} \tag{2} \\ & a^{k}_{mu} &&= a_{act}(z^{k}_{mu}) \tag{3} \end{alignat*}\end{split}\]
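
The same kind of standalone NumPy sketch makes steps (2) and (3) concrete; sigmoid is used here only as an example activation:

import numpy as np

def sigmoid(x):
    """Example non-linear activation."""
    return 1. / (1. + np.exp(-x))

m, n, u = 4, 3, 2             # Illustrative dimensions

X = np.random.rand(m, n)      # Input from previous layer     (m, n)
W = np.random.rand(n, u)      # Weight                        (n, u)
b = np.zeros((1, u))          # Bias                          (1, u)

Z = np.dot(X, W) + b          # (2) Linear activation         (m, u)
A = sigmoid(Z)                # (3) Non-linear activation     (m, u)

print(Z.shape, A.shape)       # (4, 2) (4, 2)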

Backward

Dense.backward(dX)[source]

Wrapper for epynn.dense.backward.dense_backward().

Parameters

dX (numpy.ndarray) – Output of backward propagation from next layer.

Returns

Output of backward propagation for current layer.

Return type

numpy.ndarray

def dense_backward(layer, dX):
    """Backward propagate error gradients to previous layer.
    """
    # (1) Initialize cache
    dA = initialize_backward(layer, dX)

    # (2) Gradient of the loss with respect to Z
    dZ = layer.bc['dZ'] = (
        dA
        * layer.activate(layer.fc['Z'], deriv=True)
    )    # dL/dZ

    # (3) Gradient of the loss with respect to X
    dX = layer.bc['dX'] = np.dot(dZ, layer.p['W'].T)   # dL/dX

    return dX    # To previous layer

The backward propagation function in a Dense layer k includes:

  • (1): dA is the gradient of the loss with respect to the output of forward propagation A for current layer k. It is equal to the gradient of the loss with respect to the input of forward propagation for next layer k+1.

  • (2): dZ is the gradient of the loss with respect to Z. It is computed by applying element-wise multiplication between dA and the derivative of the non-linear activation function applied on Z.

  • (3): The gradient of the loss dX with respect to the input of forward propagation X for current layer k is computed by applying a dot product operation between dZ and the transpose of W.

Note that:

  • The expression gradient of the loss with respect to is equivalent to partial derivative of the loss with respect to.

  • The variable dA is often referred to as the error term for layer k+1 and dX the error term for layer k.

  • In contrast to the forward pass, the parameters here weight dZ, which has shape (m, u). Therefore, the transpose of W, with shape (u, n), is used to compute the dot product.

[Figure: Dense3-01.svg]
\[\begin{split}\begin{alignat*}{2} & \delta^{\kp}_{mu} &&= \frac{\partial \mathcal{L}}{\partial a^{k}_{mu}} = \frac{\partial \mathcal{L}}{\partial x^{\kp}_{mu}} \tag{1} \\ \\ & \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} &&= \delta^{\kp}_{mu} \\ & &&* a_{act}'(z^{k}_{mu}) \tag{2} \\ & \delta^{k}_{mn} &&= \frac{\partial \mathcal{L}}{\partial x^{k}_{mn}} = \frac{\partial \mathcal{L}}{\partial a^{\km}_{mn}} = \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} \cdot W^{k~{\intercal}}_{nu} \tag{3} \\ \end{alignat*}\end{split}\]
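
A standalone NumPy sketch of steps (2) and (3), again with sigmoid as the example activation (its derivative being s * (1 - s)), shows how the shapes work out:

import numpy as np

def sigmoid(x, deriv=False):
    """Example activation and its derivative."""
    s = 1. / (1. + np.exp(-x))
    return s * (1. - s) if deriv else s

m, n, u = 4, 3, 2              # Illustrative dimensions

W = np.random.rand(n, u)       # Weight                           (n, u)
Z = np.random.rand(m, u)       # Cached linear activation         (m, u)
dA = np.random.rand(m, u)      # Error term from next layer k+1   (m, u)

dZ = dA * sigmoid(Z, deriv=True)    # (2) dL/dZ                   (m, u)
dX = np.dot(dZ, W.T)                # (3) dL/dX to layer k-1      (m, n)

print(dZ.shape, dX.shape)      # (4, 2) (4, 3)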

Gradients

Dense.compute_gradients()[source]

Wrapper for epynn.dense.parameters.dense_compute_gradients().

def dense_compute_gradients(layer):
    """Compute gradients with respect to weight and bias for layer.
    """
    X = layer.fc['X']      # Input of forward propagation
    dZ = layer.bc['dZ']    # Gradient of the loss with respect to Z

    # (1) Gradient of the loss with respect to W, b
    dW = layer.g['dW'] = np.dot(X.T, dZ)       # (1.1) dL/dW
    db = layer.g['db'] = np.sum(dZ, axis=0)    # (1.2) dL/db

    return None

The function to compute parameter gradients in a Dense layer k includes:

  • (1.1): dW is the gradient of the loss with respect to W. It is computed by applying a dot product operation between the transpose of X and dZ.

  • (1.2): db is the gradient of the loss with respect to b. It is computed by summing dZ along the axis corresponding to the number of samples m.

Note that:

  • We use the transpose of X with shape (n, m) for the dot product operation with dZ of shape (m, u).

\[\begin{split}\begin{alignat*}{2} & \frac{\partial \mathcal{L}}{\partial W^{k}_{nu}} &&= x^{k~{\intercal}}_{mn} \cdot \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} \tag{1.1} \\ & \frac{\partial \mathcal{L}}{\partial b^{k}_{u}} &&= \sum_{m = 1}^M \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} \tag{1.2} \end{alignat*}\end{split}\]
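
The standalone sketch extends directly to the parameter gradients; dW takes the shape of W, while db, summed over the sample axis, has shape (u,) and broadcasts against b of shape (1, u):

import numpy as np

m, n, u = 4, 3, 2              # Illustrative dimensions

X = np.random.rand(m, n)       # Cached input of forward pass     (m, n)
dZ = np.random.rand(m, u)      # Gradient of the loss w.r.t. Z    (m, u)

dW = np.dot(X.T, dZ)           # (1.1) dL/dW                      (n, u)
db = np.sum(dZ, axis=0)        # (1.2) dL/db                      (u,)

print(dW.shape, db.shape)      # (3, 2) (2,)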

Live examples

The Dense layer is used as the output layer in every network training example provided with EpyNN.

Examples that use pure Feed-Forward Neural Networks can be accessed directly from: