Appendix

Notations

Conventions relate to the mathematical expressions used on EpyNN's website. Divergences from the Python code are highlighted where applicable.

Arithmetic operators

\(+\) and \(-\)

Element-wise addition/subtraction between matrices, addition/subtraction of a scalar to/from each element of a matrix, or addition/subtraction between scalars.

\(*\) and \(/\)

Element-wise multiplication/division between matrices (See Hadamard product (matrices) on Wikipedia), multiplication/division of a matrix by a scalar, or multiplication/division between scalars.

\(\cdot\)

Dot product between matrices (See Dot product on Wikipedia).
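In NumPy, which EpyNN's Python code relies on, these operators map directly onto array operations. A minimal illustration:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])

C = A + B          # Element-wise addition
D = A * B          # Element-wise (Hadamard) product
E = A * 2.         # Multiplication of each element by a scalar
F = np.dot(A, B)   # Dot product (matrix multiplication)
```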

Names of matrices

Layer input and output:

\(X\)

Input of forward propagation.

\(A\)

Output of forward propagation.

\(\frac{\partial \mathcal{L}}{\partial A}\)

Input of backward propagation. Referred to as dA in Python code.

\(\frac{\partial \mathcal{L}}{\partial X}\)

Output of backward propagation. Referred to as dX in Python code.
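As a sketch of how these names appear around a layer's methods, consider the following hypothetical layer; the class and its placeholder transform are illustrative, not taken from EpyNN's code:

```python
class ToyLayer:
    """Hypothetical layer illustrating the X/A and dA/dX naming."""

    def forward(self, X):
        A = X * 2.     # Placeholder for the layer's actual processing
        return A       # A: output of forward propagation

    def backward(self, dA):
        dX = dA * 2.   # Gradient of the loss with respect to the input X
        return dX      # dX: output of backward propagation
```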

Layers parameters:

\(W\)

Weight applied to inputs for Dense and Convolution layers.

\(U\)

Weight applied to inputs for RNN, LSTM and GRU layers.

\(V\)

Weight applied to hidden cell state for RNN, LSTM and GRU layers.

\(b\)

Bias added to weighted sums.

Linear and non-linear activation products:

\(Z\) and \(A\)

For Dense and Convolution layers, \(Z\) is the weighted sum of inputs, also known as the linear activation product, while \(A\) is the product of the non-linear activation.

\(Z\) and \(A\)

For Embedding, Pooling, Dropout and Flatten layers, \(Z\) is the result of layer processing and is equal to the output \(A\) of the same layer. It has no relationship to linear or non-linear activation, because there is none, but the names are kept for homogeneity.

\(h\_\) and \(h\)

For recurrent RNN, LSTM and GRU layers, the underscore appended to the variable name denotes the linear activation product while the underscore-free variable denotes the non-linear activation product. Note that the underscore notation also applies to partial derivatives.
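For illustration, one step of a hypothetical RNN cell written under this convention, assuming a tanh activation and the \(U\), \(V\), \(b\) parameters defined above:

```python
import numpy as np

def rnn_step(X, h, U, V, b):
    """One hypothetical RNN cell step."""
    h_ = np.dot(X, U) + np.dot(h, V) + b  # Linear activation product (h_)
    h = np.tanh(h_)                       # Non-linear activation product (h)
    return h_, h
```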

Dimensions and indexing

Uppercase and lowercase letters represent a dimension and the corresponding index, respectively.

In the Python code, note that a dimension D is stored in the layer's .d dictionary attribute as layer.d['d'], while the corresponding index d is a plain namespace variable d.
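A minimal sketch of this convention, using a hypothetical layer object:

```python
import numpy as np

class ToyLayer:
    def __init__(self):
        self.d = {}    # Dictionary of dimensions

X = np.zeros((8, 4))                   # 8 training examples, 4 features
layer = ToyLayer()
layer.d['m'], layer.d['n'] = X.shape   # Dimensions M and N

for m in range(layer.d['m']):
    x = X[m]                           # Index m runs over training examples
```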

Frequently used:

\(K, k\)

Number of layers in the network.

\(U, u\)

Number of units in layer \(k\).

\(M, m\)

Number of training examples.

\(N, n\)

Number of features per training example.

Note that when layer \(k-1\) is a Dense layer, or a recurrent layer (RNN, GRU, LSTM) with sequences=False, \(N\) is equal to the number of units in layer \(k-1\).

Related to recurrent architectures:

\(S, s\)

Number of steps in sequence.

\(E, e\)

Number of elements per step in sequence.

Note that in this context, \(S * E = N\).
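In other words, an input of \(N\) features can equivalently be viewed as \(S\) steps of \(E\) elements each. A minimal NumPy illustration:

```python
import numpy as np

m, s, e = 8, 10, 3           # M examples, S steps, E elements per step
X_seq = np.zeros((m, s, e))  # Input shaped as sequences
X_flat = X_seq.reshape(m, -1)

assert X_flat.shape == (m, s * e)  # N = S * E features per example
```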

Related to CNN:

\(H, h\)

Height of features.

\(W, w\)

Width of features.

\(D, d\)

Depth of features.

\(S_h, s_h\)

Stride height.

\(S_w, s_w\)

Stride width.

\(O_h, o_h\)

Output height.

\(O_w, o_w\)

Output width.

\(F_h, f_h\)

Filter height (Convolution).

\(F_w, f_w\)

Filter width (Convolution).

\(P_h, p_h\)

Pool height (Pooling).

\(P_w, p_w\)

Pool width (Pooling).

Note that in this context, \(H * W * D = N\).
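Under these conventions, the output height and width of a Convolution layer follow the usual stride arithmetic; the sketch below assumes no padding:

```python
h, w = 28, 28                # Input height H and width W
f_h, f_w = 3, 3              # Filter height F_h and width F_w
s_h, s_w = 1, 1              # Stride height S_h and width S_w

o_h = (h - f_h) // s_h + 1   # Output height O_h
o_w = (w - f_w) // s_w + 1   # Output width O_w

assert (o_h, o_w) == (26, 26)
```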

Glossary

To avoid reinventing the wheel, note that some definitions below may be sourced from external resources.

Activation

Function that defines how the weighted sum of the input is transformed into an output.

Bias

Additional set of parameters in a layer, added to the products of weight-input operations with respect to units.

Cell

In the context of recurrent networks, one cell may be equivalent to one unit.

Class (Python)

Prototype of an object.

CNN

Convolutional Neural Network. Type of neural network commonly used in image recognition and processing.

Convolution

Layer used in CNNs to merge input data with a filter, or kernel, producing a feature map.

Cost

Scalar value computed as an average of the loss over training examples.

Dense

Fully-connected layer made of one or more nodes. Each node receives input from all nodes in the previous layer.

Dictionary (Python)

Collection of data organized as key: value pairs (insertion-ordered since Python 3.7).

Dropout

Technique of randomly dropping out units in a layer for neural network regularization.

Embedding

Input layer in EpyNN; more generally, any process or object that prepares or contains the data fed to the layer that follows the input layer.

Feed-Forward

Type of layer architecture wherein units do not contain loops.

Flatten

May refer to a reshaping layer that reduces data with more than two dimensions to 2D data on the forward pass and reverses the operation on the backward pass.

Float (Python)

Numeric type representing real numbers in floating point.

Gate

Acts as a threshold helping the network distinguish when to use normal stacked layers and when to use an identity connection.

GRU

Recurrent layer made of one or more unit cells. Two gates and one activation (hidden cell state).

Hyperparameters

May refer to settings whose values are used to control the learning process.

Immutable (Python)

Object whose internal state cannot be changed.

Instance (Python)

An individual object of a certain class.

Instantiate (Python)

To create an instance of an object.

Instantiation (Python)

The action of creating an object instance.

Integer (Python)

Zero, or any positive or negative number without a fractional part.

Layer

Collection of nodes or units operating together at a specific depth within a neural network.

List (Python)

Mutable data type containing an ordered and indexed sequence.

Loss

Error with respect to a loss function, computed for each training example and output probability.

LSTM

Recurrent layer made of one or more unit cells. Three gates and two activations (hidden and memory cell states).

Metrics

Function used to judge the performance of a model.

Model

A specific design of a neural network which incorporates layers of given architecture.

Mutable (Python)

Object whose internal state can be changed.

Neural Network

Series of algorithms that endeavors to recognize underlying relationships in a set of data.

Neuron

May be equivalent to unit.

Node

May be equivalent to unit.

Parameters

May refer to trainable parameters within a neural network, namely weights and bias.

Pooling

Compression layer used in CNNs whose function is to reduce the spatial size of a given representation, thereby reducing the number of parameters and the amount of computation in the network.

Recurrent

Type of layer architecture wherein units contain loops, allowing information to be stored within one unit with respect to sequential data.

RNN

Recurrent layer made of one or more unit cells. Single activation (hidden cell state).

Set (Python)

Unordered, unindexed collection of unique elements.

String (Python)

Immutable sequence data type made of characters.

Trainable

May refer to layers incorporating unfrozen, trainable parameters (weights, biases).

Tuple (Python)

Immutable sequence data type made of any type of values.
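A minimal illustration of the Python built-in types defined in this glossary:

```python
d = {'key': 'value'}   # Dictionary: key: value pairs
l = [1, 2, 3]          # List: mutable, ordered and indexed
s = {1, 2, 3}          # Set: unordered, unindexed, unique elements
t = (1, 'a', 2.0)      # Tuple: immutable sequence of any type of values
c = 'characters'       # String: immutable sequence of characters

l[0] = 0               # Mutable: internal state can be changed
try:
    t[0] = 0           # Immutable: raises TypeError
except TypeError:
    pass
```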

Unit

The functional entity within a layer; a layer is composed of a certain number of units.

Weight

Trainable parameter within a layer that transforms input data into output data.