# Appendix

## Notations

These conventions apply to the mathematical expressions shown on EpyNN’s website. Divergences from the Python code are highlighted where applicable.

### Arithmetic operators

- \(+\) and \(-\)
Element-wise addition/subtraction between matrices, scalar addition/subtraction to/from each element of one matrix, scalar addition/subtraction to/from another scalar.

- \(*\) and \(/\)
Element-wise multiplication/division between matrices (See Hadamard product (matrices) on Wikipedia), matrix multiplication/division by a scalar, scalar multiplication/division by another scalar.

- \(\cdot\)
Dot product between matrices (See Dot product on Wikipedia).
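The three operator families above can be illustrated with a minimal NumPy sketch. The array names and values are hypothetical and chosen only for illustration; the mapping of \(\cdot\) to `np.dot` is an assumption about how the notation translates to code.

```python
import numpy as np

# Hypothetical 2x2 matrices used only to illustrate the notation.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

# + and -: element-wise between matrices, or a scalar applied to each element.
elementwise_sum = A + B
scalar_shift = A - 1.0

# * and /: element-wise (Hadamard) product, or scaling by a scalar.
hadamard = A * B
scaled = A / 2.0

# Dot product between matrices, assumed here to correspond to np.dot.
dot = np.dot(A, B)
```

Note that `A * B` and `np.dot(A, B)` give different results for the same operands, which is why the two operators are kept distinct in the notation.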

### Names of matrices

**Layers input and output:**

- \(X\)
Input of forward propagation.

- \(A\)
Output of forward propagation.

- \(\frac{\partial \mathcal{L}}{\partial A}\)
Input of backward propagation. Referred to as `dA` in Python code.

- \(\frac{\partial \mathcal{L}}{\partial X}\)
Output of backward propagation. Referred to as `dX` in Python code.

**Layers parameters:**

- \(W\)
Weight applied to inputs for *Dense* and *Convolution* layers.

- \(U\)
Weight applied to inputs for *RNN*, *LSTM* and *GRU* layers.

- \(V\)
Weight applied to hidden cell state for *RNN*, *LSTM* and *GRU* layers.

- \(b\)
Bias added to weighted sums.

**Linear and non-linear activation products:**

- \(Z\) and \(A\)
For *Dense* and *Convolution* layers, \(Z\) is the weighted sum of inputs, also known as the linear activation product, while \(A\) is the product of non-linear activation.

- \(Z\) and \(A\)
For *Embedding*, *Pooling*, *Dropout* and *Flatten* layers, \(Z\) is the result of layer processing and is equal to the output \(A\) of the same layer. These layers have no linear or non-linear activation, but the names are kept for homogeneity.

- \(h\_\) and \(h\)
For recurrent *RNN*, *LSTM* and *GRU* layers, the underscore appended to the variable name denotes the linear activation product while the underscore-free variable denotes the non-linear activation product. Note that the underscore notation also applies to partial derivatives.
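As a minimal sketch of how \(Z\) and \(A\) relate in a *Dense* layer, the snippet below computes a weighted sum and then applies a non-linear activation. The array shapes (`X` as \(M \times N\), `W` as \(N \times U\), `b` as \(1 \times U\)), the random initialization, and the choice of sigmoid are assumptions made for illustration, not a description of EpyNN's implementation.

```python
import numpy as np

def sigmoid(z):
    """Example non-linear activation; any other activation could be used."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical dimensions: M examples, N features, U units.
M, N, U = 4, 3, 2

rng = np.random.default_rng(0)
X = rng.standard_normal((M, N))   # input of forward propagation
W = rng.standard_normal((N, U))   # weight applied to inputs
b = np.zeros((1, U))              # bias added to weighted sums

Z = np.dot(X, W) + b              # linear activation product
A = sigmoid(Z)                    # non-linear activation product
```

Under the same naming scheme, a recurrent layer would store the linear product in a variable such as `h_` and the activated result in `h`.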

### Dimensions and indexing

Uppercase and lowercase letters represent dimensions and their corresponding indices, respectively.

In the Python code, note that dimension *D* is stored in the layer’s `.d` dictionary attribute as `layer.d['d']`, while the corresponding index *d* is a namespace variable such as `d`.
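The distinction between a dimension and its index can be sketched as follows. `SketchLayer` is a hypothetical stand-in written for this example and is not EpyNN's layer class; only the `layer.d['d']` access pattern is taken from the text.

```python
class SketchLayer:
    """Hypothetical stand-in for a layer; not EpyNN's actual class."""

    def __init__(self, depth):
        # Dimensions are stored in the .d dictionary under lowercase keys.
        self.d = {'d': depth}


layer = SketchLayer(depth=3)

# Uppercase D in the equations is the dimension, read from layer.d['d'].
# Lowercase d is the index: a plain variable running from 0 to D - 1.
indices = []
for d in range(layer.d['d']):
    indices.append(d)
```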

**Frequently used:**

- \(K, k\)
Number of layers in network.

- \(U, u\)
Number of units in layer \(k\).

- \(M, m\)
Number of training examples.

- \(N, n\)
Number of features per training example.

Note that in the case where layer \(k-1\) is a *Dense* layer or a recurrent layer *RNN, GRU, LSTM* with *sequences=False*, then \(N\) is equal to the number of units in layer \(k-1\).

**Related to recurrent architectures:**

- \(S, s\)
Number of steps in sequence.

- \(E, e\)
Number of elements for steps in sequence.

Note that in this context, \(S * E = N\).
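The relation between \(S\), \(E\) and \(N\) can be checked with a short NumPy sketch. The sizes and the `(M, S, E)` axis order are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical sizes: M examples, S steps per sequence, E elements per step.
M, S, E = 2, 3, 4

# Sequential input with one axis for steps and one for elements.
X_seq = np.arange(M * S * E).reshape(M, S, E)

# Flattening the step and element axes yields N features per example.
N = S * E
X_flat = X_seq.reshape(M, N)
```

Reshaping `X_flat` back to `(M, S, E)` recovers the original array, since the two layouts hold the same values.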

**Related to CNN:**

- \(H, h\)
Height of features.

- \(W, w\)
Width of features.

- \(D, d\)
Depth of features.

- \(Sh, s_h\)
Stride height.

- \(Sw, s_w\)
Stride width.

- \(Oh, o_h\)
Output height.

- \(Ow, o_w\)
Output width.

- \(Fh, f_h\)
Filter height (Convolution).

- \(Fw, f_w\)
Filter width (Convolution).

- \(Ph, p_h\)
Pool height (Pooling).

- \(Pw, p_w\)
Pool width (Pooling).

Note that in this context, \(H * W * D = N\).
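The CNN dimensions above can be related to one another in a small sketch. The numeric values are hypothetical, and the output-size formula assumes a convolution without padding, which the text does not state; it is the standard relation for that case rather than a confirmed description of EpyNN's code.

```python
# Hypothetical image dimensions: height, width and depth of features.
H, W, D = 28, 28, 1

# Number of features per training example once flattened.
N = H * W * D

# Hypothetical filter and stride sizes.
Fh, Fw = 3, 3
Sh, Sw = 2, 2

# Output size for a convolution without padding (assumption):
# count the filter positions that fit when moving by one stride at a time.
Oh = (H - Fh) // Sh + 1
Ow = (W - Fw) // Sw + 1
```

The same relation applies to pooling by substituting the pool size `Ph`, `Pw` for the filter size.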

## Glossary

To avoid reinventing the wheel, some of the definitions below are sourced from external resources.

- Activation
Function that defines how the weighted sum of the input is transformed into an output.

- Bias
Additional set of parameters in a layer, added to the weighted sum of inputs for each unit.

- Cell
In the context of recurrent networks, one cell may be equivalent to one unit.

- Class (Python)
Prototype of an object.

- CNN
Type of neural network used in image recognition and processing.

- Convolution
Layer used in CNNs that combines input data with a filter (or kernel) to produce a feature map.

- Cost
Scalar value that summarizes the loss, typically as an average.

- Dense
Fully-connected layer made of one or more nodes. Each node receives input from all nodes in the previous layer.

- Dictionary (Python)
Mutable collection of data organized as key: value pairs. Since Python 3.7, insertion order is preserved.

- Dropout
Dropping out units in one layer for neural network regularization.

- Embedding
Input layer in EpyNN. More generally, any process or object that prepares or contains the data fed to the layer that follows the input layer.

- Feed-Forward
Type of layer architecture wherein units do not contain loops.

- Flatten
Reshaping layer that reduces data with more than two dimensions to 2D data in the forward pass and reverses the operation in the backward pass.

- Float (Python)
Real number stored in floating-point representation, written with a decimal point (for example `1.0` or `0.25`).

- Gate
Acts as a threshold that helps the network decide when to use normal stacked layers or an identity connection.

- GRU
Recurrent layer made of one or more unit cells. Two gates and one activation (hidden cell state).

- Hyperparameters
Settings whose values control the learning process.

- Immutable (Python)
Object whose internal state cannot be changed.

- Instance (Python)
An individual object of a certain class.

- Instantiate (Python)
To create an object instance.

- Instantiation (Python)
The action of creating an object instance.

- Integer (Python)
Zero, or a positive or negative number without a fractional part.

- Layer
Collection of nodes or units operating together at a specific depth within a neural network.

- List (Python)
Mutable data type containing an ordered and indexed sequence.

- Loss
Error, with respect to a given loss function, computed for each training example and output probability.

- LSTM
Recurrent layer made of one or more unit cells. Three gates and two activations (hidden and memory cell states).

- Metrics
Functions used to judge the performance of a model.

- Model
A specific design of a neural network which incorporates layers of given architecture.

- Mutable (Python)
Object whose internal state can be changed.

- Neural Network
Series of algorithms that endeavors to recognize underlying relationships in a set of data.

- Neuron
May be equivalent to unit.

- Node
May be equivalent to unit.

- Parameters
Trainable parameters within a neural network, namely weights and biases.

- Pooling
Compression layer used in CNNs that reduces the spatial size of a given representation, which lowers the number of parameters and the amount of computation in the network.

- Recurrent
Type of layer architecture wherein units contain loops, allowing information to be stored within one unit with respect to sequential data.

- RNN
Recurrent layer made of one or more unit cells. Single activation (hidden cell state).

- Set (Python)
Collection which is unordered and unindexed.

- String (Python)
Immutable sequence data type made of characters.

- Trainable
May refer to architecture layers incorporating unfrozen trainable parameters (weight, bias).

- Tuple (Python)
Immutable sequence data type made of any type of values.

- Unit
The functional entity within a layer. A layer is composed of a certain number of units.

- Weight
Parameter within layers that transforms input data to output data.
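To make the *Loss* and *Cost* entries above concrete, the sketch below computes a per-element loss and averages it into a scalar cost. The squared-error loss, the plain mean, and the sample arrays `Y` and `A` are assumptions chosen for illustration; the glossary does not prescribe a specific loss function or averaging scheme.

```python
import numpy as np

# Hypothetical targets and network outputs: 3 examples, 2 output units.
Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
A = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])

# Loss: one error value per training example and per output.
loss = (Y - A) ** 2

# Cost: a single scalar obtained here by averaging the loss.
cost = loss.mean()
```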