Appendix

Notations

Conventions relate to the mathematical expressions used on EpyNN's website. Divergences from the Python code are highlighted where applicable.

Arithmetic operators

\(+\) and \(-\)

Element-wise addition/subtraction between matrices, addition/subtraction of a scalar to/from each element of a matrix, or addition/subtraction between scalars.

\(*\) and \(/\)

Element-wise multiplication/division between matrices (See Hadamard product (matrices) on Wikipedia), multiplication/division of a matrix by a scalar, or multiplication/division between scalars.

\(\cdot\)

Dot product between matrices (See Dot product on Wikipedia).
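In NumPy, which EpyNN's Python code relies on, these operators map directly onto array operations. A minimal illustration:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])

C = A + B          # Element-wise addition
D = A * B          # Element-wise (Hadamard) product
E = A * 2.         # Multiplication of each element by a scalar
F = np.dot(A, B)   # Dot product (matrix multiplication)
```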

Names of matrices

Layer input and output:

\(X\)

Input of forward propagation.

\(A\)

Output of forward propagation.

\(\frac{\partial \mathcal{L}}{\partial A}\)

Input of backward propagation. Referred to as dA in Python code.

\(\frac{\partial \mathcal{L}}{\partial X}\)

Output of backward propagation. Referred to as dX in Python code.
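As a sketch of how these names appear around a layer's methods, consider the following hypothetical layer; the class and its placeholder transform are illustrative, not taken from EpyNN's code:

```python
class ToyLayer:
    """Hypothetical layer illustrating the X/A and dA/dX naming."""

    def forward(self, X):
        A = X * 2.     # Placeholder for the layer's actual processing
        return A       # A: output of forward propagation

    def backward(self, dA):
        dX = dA * 2.   # Gradient of the loss with respect to the input X
        return dX      # dX: output of backward propagation
```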

Layers parameters:

\(W\)

Weight applied to inputs for Dense and Convolution layers.

\(U\)

Weight applied to inputs for RNN, LSTM and GRU layers.

\(V\)

Weight applied to hidden cell state for RNN, LSTM and GRU layers.

\(b\)

Bias added to weighted sums.

Linear and non-linear activation products:

\(Z\) and \(A\)

For Dense and Convolution layers, \(Z\) is the weighted sum of inputs, also known as the linear activation product, while \(A\) is the product of the non-linear activation.

\(Z\) and \(A\)

For Embedding, Pooling, Dropout and Flatten layers, \(Z\) is the result of layer processing and is equal to the output \(A\) of the same layer. It has no relationship to linear or non-linear activation, because there is none, but the names are kept for homogeneity.

\(h\_\) and \(h\)

For recurrent RNN, LSTM and GRU layers, the underscore appended to the variable name denotes the linear activation product while the underscore-free variable denotes the non-linear activation product. Note that the underscore notation also applies to partial derivatives.
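For illustration, one step of a hypothetical RNN cell written under this convention, assuming a tanh activation and the \(U\), \(V\), \(b\) parameters defined above:

```python
import numpy as np

def rnn_step(X, h, U, V, b):
    """One hypothetical RNN cell step."""
    h_ = np.dot(X, U) + np.dot(h, V) + b  # Linear activation product (h_)
    h = np.tanh(h_)                       # Non-linear activation product (h)
    return h_, h
```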

Dimensions and indexing

Uppercase and lowercase letters represent a dimension and the corresponding index, respectively.

In the Python code, note that a dimension D is stored in the layer's .d dictionary attribute as layer.d['d'], while the corresponding index d is a plain namespace variable d.
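A minimal sketch of this convention, using a hypothetical layer object:

```python
import numpy as np

class ToyLayer:
    def __init__(self):
        self.d = {}    # Dictionary of dimensions

X = np.zeros((8, 4))                   # 8 training examples, 4 features
layer = ToyLayer()
layer.d['m'], layer.d['n'] = X.shape   # Dimensions M and N

for m in range(layer.d['m']):
    x = X[m]                           # Index m runs over training examples
```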

Frequently used:

\(K, k\)

Number of layers in the network.

\(U, u\)

Number of units in layer \(k\).

\(M, m\)

Number of training examples.

\(N, n\)

Number of features per training example.

Note that when layer \(k-1\) is a Dense layer, or a recurrent layer (RNN, GRU, LSTM) with sequences=False, \(N\) is equal to the number of units in layer \(k-1\).

Related to recurrent architectures:

\(S, s\)

Number of steps in sequence.

\(E, e\)

Number of elements per step in sequence.

Note that in this context, \(S * E = N\).
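In other words, an input of \(N\) features can equivalently be viewed as \(S\) steps of \(E\) elements each. A minimal NumPy illustration:

```python
import numpy as np

m, s, e = 8, 10, 3           # M examples, S steps, E elements per step
X_seq = np.zeros((m, s, e))  # Input shaped as sequences
X_flat = X_seq.reshape(m, -1)

assert X_flat.shape == (m, s * e)  # N = S * E features per example
```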

Related to CNN:

\(H, h\)

Height of features.

\(W, w\)

Width of features.

\(D, d\)

Depth of features.

\(S_h, s_h\)

Stride height.

\(S_w, s_w\)

Stride width.

\(O_h, o_h\)

Output height.

\(O_w, o_w\)

Output width.

\(F_h, f_h\)

Filter height (Convolution).

\(F_w, f_w\)

Filter width (Convolution).

\(P_h, p_h\)

Pool height (Pooling).

\(P_w, p_w\)

Pool width (Pooling).

Note that in this context, \(H * W * D = N\).
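Under these conventions, the output height and width of a Convolution layer follow the usual stride arithmetic; the sketch below assumes no padding:

```python
h, w = 28, 28                # Input height H and width W
f_h, f_w = 3, 3              # Filter height F_h and width F_w
s_h, s_w = 1, 1              # Stride height S_h and width S_w

o_h = (h - f_h) // s_h + 1   # Output height O_h
o_w = (w - f_w) // s_w + 1   # Output width O_w

assert (o_h, o_w) == (26, 26)
```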

Glossary

To avoid reinventing the wheel, note that some definitions below may be sourced from external resources.

Activation

Function that defines how the weighted sum of the input is transformed into an output.

Bias

Additional set of parameters in a layer, added to the products of weight-input operations with respect to units.

Cell

In the context of recurrent networks, one cell may be equivalent to one unit.

Class (Python)

Prototype of an object.

CNN

Convolutional Neural Network. Type of neural network commonly used in image recognition and processing.

Convolution

Layer used in CNNs to merge input data with a filter, or kernel, producing a feature map.

Cost

Scalar value computed as an average of the loss over training examples.

Dense

Fully-connected layer made of one or more nodes. Each node receives input from all nodes in the previous layer.

Dictionary (Python)

Collection of data organized as key: value pairs (insertion-ordered since Python 3.7).

Dropout

Technique of randomly dropping out units in a layer for neural network regularization.

Embedding

Input layer in EpyNN; more generally, any process or object that prepares or contains the data fed to the layer that follows the input layer.

Feed-Forward

Type of layer architecture wherein units do not contain loops.

Flatten

May refer to a reshaping layer that reduces data with more than two dimensions to 2D data on the forward pass and reverses the operation on the backward pass.

Float (Python)

Numeric type representing real numbers in floating point.

Gate

Acts as a threshold helping the network distinguish when to use normal stacked layers and when to use an identity connection.

GRU

Recurrent layer made of one or more unit cells. Two gates and one activation (hidden cell state).

Hyperparameters

May refer to settings whose values are used to control the learning process.

Immutable (Python)

Object whose internal state cannot be changed.

Instance (Python)

An individual object of a certain class.

Instantiate (Python)

To create an instance of an object.

Instantiation (Python)

The action of creating an object instance.

Integer (Python)

Zero, or any positive or negative number without a fractional part.

Layer

Collection of nodes or units operating together at a specific depth within a neural network.

List (Python)

Mutable data type containing an ordered and indexed sequence.

Loss

Error with respect to a loss function, computed for each training example and output probability.

LSTM

Recurrent layer made of one or more unit cells. Three gates and two activations (hidden and memory cell states).

Metrics

Function used to judge the performance of a model.

Model

A specific design of a neural network which incorporates layers of given architecture.

Mutable (Python)

Object whose internal state can be changed.

Neural Network

Series of algorithms that endeavors to recognize underlying relationships in a set of data.

Neuron

May be equivalent to unit.

Node

May be equivalent to unit.

Parameters

May refer to trainable parameters within a neural network, namely weights and bias.

Pooling

Compression layer used in CNNs whose function is to reduce the spatial size of a given representation, thereby reducing the number of parameters and the amount of computation in the network.

Recurrent

Type of layer architecture wherein units contain loops, allowing information to be stored within one unit with respect to sequential data.

RNN

Recurrent layer made of one or more unit cells. Single activation (hidden cell state).

Set (Python)

Unordered, unindexed collection of unique elements.

String (Python)

Immutable sequence data type made of characters.

Trainable

May refer to layers incorporating unfrozen, trainable parameters (weights, biases).

Tuple (Python)

Immutable sequence data type made of any type of values.
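A minimal illustration of the Python built-in types defined in this glossary:

```python
d = {'key': 'value'}   # Dictionary: key: value pairs
l = [1, 2, 3]          # List: mutable, ordered and indexed
s = {1, 2, 3}          # Set: unordered, unindexed, unique elements
t = (1, 'a', 2.0)      # Tuple: immutable sequence of any type of values
c = 'characters'       # String: immutable sequence of characters

l[0] = 0               # Mutable: internal state can be changed
try:
    t[0] = 0           # Immutable: raises TypeError
except TypeError:
    pass
```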

Unit

The functional entity within a layer; a layer is composed of a certain number of units.

Weight

Trainable parameter within a layer that transforms input data into output data.