# Fully Connected (Dense)

Source files in EpyNN/epynn/dense/.

See Appendix - Notations for mathematical conventions.

## Layer architecture

A fully-connected or Dense layer is an object containing a number of units and provided with functions for parameter initialization and non-linear activation of inputs.

class epynn.dense.models.Dense(units=1, activate=<function sigmoid>, initialization=<function xavier>, se_hPars=None)

Definition of a dense layer prototype.

Parameters
• units (int, optional) – Number of units in dense layer, defaults to 1.

• activate (function, optional) – Non-linear activation of units, defaults to sigmoid.

• initialization (function, optional) – Weight initialization function for dense layer, defaults to xavier.

• se_hPars (dict[str, str or float] or NoneType, optional) – Layer hyper-parameters, defaults to None and inherits from model.

### Shapes

Dense.compute_shapes(A)

Parameters

A (numpy.ndarray) – Output of forward propagation from previous layer.

def dense_compute_shapes(layer, A):
    """Compute forward shapes and dimensions from input for layer.
    """
    X = A    # Input of current layer

    layer.fs['X'] = X.shape    # (m, n)

    layer.d['m'] = layer.fs['X'][0]    # Number of samples  (m)
    layer.d['n'] = layer.fs['X'][1]    # Number of features (n)

    # Shapes for trainable parameters              Units (u)
    layer.fs['W'] = (layer.d['n'], layer.d['u'])    # (n, u)
    layer.fs['b'] = (1, layer.d['u'])               # (1, u)

    return None


Within a Dense layer, shapes of interest include:

• Input X of shape (m, n) with m equal to the number of samples and n the number of features per sample.

• Weight W of shape (n, u) with n the number of features per sample and u the number of units in the current layer k.

• Bias b of shape (1, u) with u the number of units in the layer.

Note that:

• The shapes of the parameters W and b are independent of the number of samples m.

• The number of features n per sample may be expressed in this context as the number of units in the previous layer k-1, although this definition is less general.
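As a quick check of these shapes, the following is a minimal NumPy sketch. The helper name `dense_shapes` and the sizes used (5 or 500 samples, 3 features, 2 units) are hypothetical, chosen for illustration only, and are not part of EpyNN.

```python
import numpy as np


def dense_shapes(X, u):
    """Return shapes of X, W and b for input X and u units (hypothetical helper)."""
    m, n = X.shape    # Number of samples (m), number of features (n)
    return {'X': (m, n), 'W': (n, u), 'b': (1, u)}


X_small = np.zeros((5, 3))      # m = 5 samples, n = 3 features
X_large = np.zeros((500, 3))    # m = 500 samples, same n

shapes_small = dense_shapes(X_small, u=2)
shapes_large = dense_shapes(X_large, u=2)

print(shapes_small)    # {'X': (5, 3), 'W': (3, 2), 'b': (1, 2)}
print(shapes_large['W'] == shapes_small['W'])    # True: W does not depend on m
```

Only the shape of X changes with the number of samples; W and b keep the same shape for both inputs.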

### Forward

Dense.forward(A)

Wrapper for epynn.dense.forward.dense_forward().

Parameters

A (numpy.ndarray) – Output of forward propagation from previous layer.

Returns

Output of forward propagation for current layer.

Return type

numpy.ndarray

def dense_forward(layer, A):
    """Forward propagate signal to next layer.
    """
    # (1) Initialize cache
    X = initialize_forward(layer, A)

    # (2) Linear activation X -> Z
    Z = layer.fc['Z'] = (
        np.dot(X, layer.p['W'])
        + layer.p['b']
    )   # This is the weighted sum

    # (3) Non-linear activation Z -> A
    A = layer.fc['A'] = layer.activate(Z)

    return A    # To next layer


The forward propagation function in a Dense layer k includes:

• (1): Input X in current layer k is equal to the output A of previous layer k-1.

• (2): Z is computed by applying a dot product operation between X and W, to which the bias b is added.

• (3): Output A is computed by applying a non-linear activation function on Z.

Note that:

• Z may be referred to as the (biased) weighted sum of inputs by parameters or as the linear activation product.

• A may be referred to as the non-linear activation product or simply the output of Dense layer k.

\begin{split}\begin{alignat*}{2} & x^{k}_{mn} &&= a^{k-1}_{mn} \tag{1} \\ \\ & z^{k}_{mu} &&= x^{k}_{mn} \cdot W^{k}_{nu} \\ & &&+ b^{k}_{u} \tag{2} \\ & a^{k}_{mu} &&= a_{act}(z^{k}_{mu}) \tag{3} \end{alignat*}\end{split}
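The three forward steps can be reproduced outside of EpyNN with plain NumPy. This is a minimal sketch, not the library's implementation: the sizes, the random values and the local `sigmoid` function are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)

m, n, u = 4, 3, 2                   # Hypothetical sizes
X = rng.standard_normal((m, n))     # (1) Input, shape (m, n)
W = rng.standard_normal((n, u))     # Weights, shape (n, u)
b = np.zeros((1, u))                # Bias, shape (1, u)

Z = np.dot(X, W) + b                # (2) Weighted sum, b is broadcast over the m rows
A = sigmoid(Z)                      # (3) Non-linear activation, element-wise

print(Z.shape, A.shape)             # (4, 2) (4, 2)
```

Both Z and A have shape (m, u): one row per sample and one column per unit.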

### Backward

Dense.backward(dX)

Wrapper for epynn.dense.backward.dense_backward().

Parameters

dX (numpy.ndarray) – Output of backward propagation from next layer.

Returns

Output of backward propagation for current layer.

Return type

numpy.ndarray

def dense_backward(layer, dX):
    """Backward propagate error gradients to previous layer.
    """
    # (1) Initialize cache
    dA = initialize_backward(layer, dX)

    # (2) Gradient of the loss with respect to Z
    dZ = layer.bc['dZ'] = np.multiply(
        dA,
        layer.activate(layer.fc['Z'], deriv=True)
    )    # dL/dZ

    # (3) Gradient of the loss with respect to X
    dX = layer.bc['dX'] = np.dot(dZ, layer.p['W'].T)   # dL/dX

    return dX    # To previous layer


The backward propagation function in a Dense layer k includes:

• (1): dA is the gradient of the loss with respect to the output of forward propagation A for current layer k. It is equal to the gradient of the loss with respect to the input of forward propagation for next layer k+1.

• (2): dZ is the gradient of the loss with respect to Z. It is computed by applying element-wise multiplication between dA and the derivative of the non-linear activation function applied on Z.

• (3): The gradient of the loss dX with respect to the input of forward propagation X for current layer k is computed by applying a dot product operation between dZ and the transpose of W.

Note that:

• The expression "gradient of the loss with respect to" is equivalent to "partial derivative of the loss with respect to".

• The variable dA is often referred to as the error term for layer k+1 and dX the error term for layer k.

• In contrast to the forward pass, parameters are used to weight dZ with shape (m, u). Therefore, we use the transpose of W with shape (u, n) in order to compute the dot product, which yields dX with shape (m, n), the same shape as X.

\begin{split}\begin{alignat*}{2} & \delta^{k+1}_{mu} &&= \frac{\partial \mathcal{L}}{\partial a^{k}_{mu}} = \frac{\partial \mathcal{L}}{\partial x^{k+1}_{mu}} \tag{1} \\ \\ & \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} &&= \delta^{k+1}_{mu} \\ & &&* a_{act}'(z^{k}_{mu}) \tag{2} \\ & \delta^{k}_{mn} &&= \frac{\partial \mathcal{L}}{\partial x^{k}_{mn}} = \frac{\partial \mathcal{L}}{\partial a^{k-1}_{mn}} = \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} \cdot W^{k~{\intercal}}_{nu} \tag{3} \end{alignat*}\end{split}
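Steps (2) and (3) can be checked numerically with plain NumPy. This is a minimal sketch under stated assumptions, not EpyNN code: the local `sigmoid`, the toy `loss` function, the sizes and the random stand-in for dA are all hypothetical. The toy loss is built so that its gradient with respect to A is exactly dA, which allows the analytical dX to be compared against central finite differences.

```python
import numpy as np


def sigmoid(z, deriv=False):
    """Logistic sigmoid, or its derivative with respect to z if deriv is True."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s) if deriv else s


rng = np.random.default_rng(0)
m, n, u = 4, 3, 2                   # Hypothetical sizes
X = rng.standard_normal((m, n))
W = rng.standard_normal((n, u))
b = rng.standard_normal((1, u))
dA = rng.standard_normal((m, u))    # (1) Stand-in for the gradient from layer k+1


def loss(X_):
    """Toy scalar loss whose gradient with respect to A is exactly dA."""
    return np.sum(dA * sigmoid(np.dot(X_, W) + b))


Z = np.dot(X, W) + b
dZ = dA * sigmoid(Z, deriv=True)    # (2) Element-wise product, shape (m, u)
dX = np.dot(dZ, W.T)                # (3) (m, u) . (u, n) -> (m, n)

# Central finite differences as an independent check of dX
eps = 1e-6
dX_num = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = eps
        dX_num[i, j] = (loss(X + E) - loss(X - E)) / (2 * eps)

print(dX.shape)                              # (4, 3)
print(np.allclose(dX, dX_num, atol=1e-6))    # True
```

The analytical dX has the same shape as X and agrees with the numerical estimate, consistent with the use of the transpose of W in step (3).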

def dense_compute_gradients(layer):
    """Compute gradients with respect to weight and bias for layer.
    """
    X = layer.fc['X']      # Input of forward propagation
    dZ = layer.bc['dZ']    # Gradient of the loss with respect to Z

    # (1) Gradient of the loss with respect to W, b
    dW = layer.g['dW'] = np.dot(X.T, dZ)       # (1.1) dL/dW
    db = layer.g['db'] = np.sum(dZ, axis=0)    # (1.2) dL/db

    return None


The function to compute parameter gradients in a Dense layer k includes:

• (1.1): dW is the gradient of the loss with respect to W. It is computed by applying a dot product operation between the transpose of X and dZ.

• (1.2): db is the gradient of the loss with respect to b. It is computed by summing dZ along the axis corresponding to the number of samples m.

Note that:

• We use the transpose of X with shape (n, m) for the dot product operation with dZ of shape (m, u).

\begin{split}\begin{alignat*}{2} & \frac{\partial \mathcal{L}}{\partial W^{k}_{nu}} &&= x^{k~{\intercal}}_{mn} \cdot \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} \tag{1.1} \\ & \frac{\partial \mathcal{L}}{\partial b^{k}_{u}} &&= \sum_{m = 1}^M \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} \tag{1.2} \end{alignat*}\end{split}
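The parameter gradients (1.1) and (1.2) can be verified in the same way. This is a minimal NumPy sketch, not EpyNN code: the local `sigmoid`, the toy `loss`, the sizes and the random stand-in for dA are hypothetical and serve only to compare the analytical gradients with central finite differences.

```python
import numpy as np


def sigmoid(z, deriv=False):
    """Logistic sigmoid, or its derivative with respect to z if deriv is True."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s) if deriv else s


rng = np.random.default_rng(0)
m, n, u = 4, 3, 2                   # Hypothetical sizes
X = rng.standard_normal((m, n))
W = rng.standard_normal((n, u))
b = rng.standard_normal((1, u))
dA = rng.standard_normal((m, u))    # Stand-in for the gradient from layer k+1


def loss(W_, b_):
    """Toy scalar loss whose gradient with respect to A is exactly dA."""
    return np.sum(dA * sigmoid(np.dot(X, W_) + b_))


dZ = dA * sigmoid(np.dot(X, W) + b, deriv=True)

dW = np.dot(X.T, dZ)       # (1.1) (n, m) . (m, u) -> (n, u)
db = np.sum(dZ, axis=0)    # (1.2) Sum over the m samples -> shape (u,)

# Central finite differences as an independent check of dW and db
eps = 1e-6
dW_num = np.zeros_like(W)
for i in range(n):
    for j in range(u):
        E = np.zeros_like(W)
        E[i, j] = eps
        dW_num[i, j] = (loss(W + E, b) - loss(W - E, b)) / (2 * eps)

db_num = np.zeros(u)
for j in range(u):
    E = np.zeros_like(b)
    E[0, j] = eps
    db_num[j] = (loss(W, b + E) - loss(W, b - E)) / (2 * eps)

print(dW.shape, db.shape)    # (3, 2) (2,)
```

Note that summing with `axis=0` returns db with shape (u,) rather than (1, u); NumPy broadcasting makes the two compatible when db is combined with b.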

## Live examples

The Dense layer is used as an output layer in every network training example provided with EpyNN.

Examples of pure Feed-Forward Neural Networks within these examples can be directly accessed from: