Dummy dataset
Find this notebook at EpyNN/epynnlive/dummy_image/prepare_dataset.ipynb.
Regular python code at EpyNN/epynnlive/dummy_image/prepare_dataset.py.
Run the notebook online with Google Colab.
Level: Intermediate
This notebook is part of the series on preparing data for Neural Network regression with EpyNN.
In addition to the topic-specific content, it contains several explanations about basics or general concepts in programming that are important in the context.
Note that elements developed in the following notebooks may not be reviewed herein:
What is an image?
Intuitively, an image resembles a 2D plane composed of WIDTH * HEIGHT colored units arranged in a particular manner.
In computing, a 2D image is generally a 3D object composed of WIDTH * HEIGHT units within each plane along the third dimension, which is the DEPTH of the image, therefore giving WIDTH * HEIGHT * DEPTH = N_FEATURES.
Image depth is simply the number of channels which compose the image. You are certainly aware of RGB colors, for instance. In the RGB scheme, one color is written as rgb(int, int, int), for example rgb(255, 0, 0), rgb(0, 255, 0) and rgb(0, 0, 255) for pure red, green and blue, respectively. An RGB image therefore has a DEPTH equal to 3, because of its three channels.
Note that following this scheme, an image is made of numerical data, namely integers or int.
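As a minimal sketch of the description above, assuming NumPy and a hypothetical image of 2 by 3 pixels (the variable names are illustrative and not part of EpyNN), an RGB image can be stored as a 3D array of integers:

```python
import numpy as np

# Hypothetical tiny image: 2 pixels high, 3 pixels wide, 3 channels (RGB)
HEIGHT, WIDTH, DEPTH = 2, 3, 3

# Start from an all-black image of unsigned 8-bit integers
image = np.zeros((HEIGHT, WIDTH, DEPTH), dtype=np.uint8)

# Set the top-left pixel to pure red, i.e. rgb(255, 0, 0)
image[0, 0] = (255, 0, 0)

# Total number of values describing the image
N_FEATURES = WIDTH * HEIGHT * DEPTH

print(image.shape)   # (2, 3, 3)
print(N_FEATURES)    # 18
print(image.dtype)   # uint8
```

The array shape follows the (HEIGHT, WIDTH, DEPTH) ordering, which is the same ordering used when reshaping features later in this notebook.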
Why prepare a dummy dataset of images?
In addition to general considerations reviewed here, this may be a good way to understand in practice what an image is, how to build one, and how to handle this kind of data overall.
Prepare a set of image sample features and related label
There is no specific import for this notebook since we will create images from scratch.
Imports
[1]:
# EpyNN/epynnlive/dummy_image/prepare_dataset.ipynb
# Install dependencies
!pip3 install --upgrade-strategy only-if-needed epynn
# Standard library imports
import random
# Related third party imports
import matplotlib.pyplot as plt
import numpy as np
Seeding
[2]:
random.seed(0)
np.random.seed(0)
For reproducibility, as detailed here.
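A short sketch of what seeding buys us, assuming only the standard library and NumPy: re-seeding both generators with the same value makes them produce the same draws again. The helper name draw is hypothetical and used for illustration only.

```python
import random

import numpy as np


def draw():
    """Re-seed both generators, then draw one value from each."""
    random.seed(0)
    np.random.seed(0)
    # random.randint includes the upper bound, np.random.randint excludes it
    return random.randint(0, 15), int(np.random.randint(0, 16))


first = draw()
second = draw()

print(first == second)  # True
```

Both random and np.random are seeded in this notebook because the image tones are drawn with random.choice while the masked row and column are drawn with np.random.randint.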
Generate features
To generate an image, we need:
Dimensions: HEIGHT and WIDTH.
Number of channels: DEPTH, which is equal to 1 herein because we will prepare grayscale images. This would be 3 for RGB images, 4 for CMYK images, etc.
Number of tones per channel: N_TONES.
The actual palette of shades: GSCALE.
The function that generates such an image:
[3]:
def features_image(WIDTH=28, HEIGHT=28):
    """Generate dummy image features.

    :param WIDTH: Image width, defaults to 28.
    :type WIDTH: int

    :param HEIGHT: Image height, defaults to 28.
    :type HEIGHT: int

    :return: Random image features of size N_FEATURES.
    :rtype: :class:`numpy.ndarray`

    :return: Non-random image features of size N_FEATURES.
    :rtype: :class:`numpy.ndarray`
    """
    # Number of channels is one for greyscale images
    DEPTH = 1

    # Number of features describing a sample
    N_FEATURES = WIDTH * HEIGHT * DEPTH

    # Number of distinct tones in features
    N_TONES = 16

    # Shades of grey
    GSCALE = [i for i in range(N_TONES)]

    # Random choice of shades for N_FEATURES iterations
    features = [random.choice(GSCALE) for j in range(N_FEATURES)]

    # Vectorization of features
    features = np.array(features).reshape(HEIGHT, WIDTH, DEPTH)

    # Masked features
    mask_on_features = features.copy()
    mask_on_features[np.random.randint(0, HEIGHT)] = np.zeros_like(features[0])
    mask_on_features[:, np.random.randint(0, WIDTH)] = np.zeros_like(features[:, 0])

    # Random choice between random image or masked image
    features = random.choice([features, mask_on_features])

    return features, mask_on_features
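The masking step in the function above can be illustrated on a small array. This is a minimal sketch under stated assumptions: the image is a hypothetical 4 by 4 single-channel array with a constant tone, and the row and column indices are fixed for readability, whereas the function draws them with np.random.randint.

```python
import numpy as np

# Hypothetical 4 x 4 single-channel image filled with a constant tone
HEIGHT, WIDTH, DEPTH = 4, 4, 1
features = np.full((HEIGHT, WIDTH, DEPTH), 7)

# Copy so that the original image is left untouched
mask_on_features = features.copy()

# Indices fixed here for illustration; the notebook draws them at random
row, col = 1, 2
mask_on_features[row] = 0      # zero a whole row
mask_on_features[:, col] = 0   # zero a whole column

# Drop the channel axis to print a readable 2D grid
print(mask_on_features[:, :, 0])

# One row plus one column, minus the cell they share
print(np.count_nonzero(mask_on_features == 0))  # 7
```

Assigning the scalar 0 relies on NumPy broadcasting and has the same effect as assigning an array of zeros of the matching shape.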
In addition to constructing one image with randomly selected tones, this function makes a random choice between the image features and a modified version named mask_on_features.
The latter is a copy of the former in which one randomly selected row and one randomly selected column have all their values set to zero, which the following cell displays.
[4]:
fig, ax = plt.subplots(2, 2)

for i in range(2):
    features, mask_on_features = features_image()
    title = 'Random' if np.sum(features) != np.sum(mask_on_features) else 'Non-random'

    ax[0, i].imshow(features, cmap='gray')
    ax[0, i].set_title(title)

    ax[1, i].imshow(features - mask_on_features, cmap='gray')
    ax[1, i].set_title('Diff. with Non-random')

plt.tight_layout()
plt.show()
These are the image features we retrieve for two samples (first row).
Below each one (second row) is the difference between features and mask_on_features.
When the difference renders an all-black image, the difference is equal to zero, so the function returned mask_on_features from the random choice within [features, mask_on_features].
Said another way, when the difference between features and mask_on_features is zero, then features = mask_on_features (non-random image).
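The notebook tells the two cases apart by comparing sums. That is sufficient here because mask_on_features only ever replaces non-negative tones with zero, so equal sums imply equal arrays. In general, however, two different arrays can share the same sum. As a hedged alternative sketch (not the method used in this notebook), np.array_equal compares arrays element by element; the arrays a and b below are hypothetical.

```python
import numpy as np

# Two different hypothetical images with the same sum of tones
a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 3], [2, 1]])

print(np.sum(a) == np.sum(b))        # True: both sums are 10
print(np.array_equal(a, b))          # False: the arrays differ element-wise
print(np.array_equal(a, a.copy()))   # True: identical content
```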
Generate label
A label is associated with features depending on whether the corresponding image is random or not.
[5]:
def label_features(features, mask_on_features):
    """Prepare label associated with features.

    The dummy law is:

    Image is NOT random = positive
    Image is random = negative

    :param features: Random image features of size N_FEATURES
    :type features: :class:`numpy.ndarray`

    :param mask_on_features: Non-random image features of size N_FEATURES
    :type mask_on_features: :class:`numpy.ndarray`

    :return: Single-digit label with respect to features
    :rtype: int
    """
    # Single-digit positive and negative labels
    p_label = 0
    n_label = 1

    # Test if image is not random (0)
    if np.sum(features) == np.sum(mask_on_features):
        label = p_label

    # Test if image is random (1)
    elif np.sum(features) != np.sum(mask_on_features):
        label = n_label

    return label
The code above is commented and self-explanatory.
Let’s check the function we made for a few iterations.
[6]:
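for i in range(5):
    features, mask_on_features = features_image()
    label = label_features(features, mask_on_features)
    # Both sums below are taken on mask_on_features, so the two printed values always match;
    # print np.sum(features) as the first sum to compare the two images instead.
    print(label, np.sum(mask_on_features), np.sum(mask_on_features))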
0 5436 5436
1 5500 5500
0 5375 5375
1 5412 5412
1 5531 5531
Prepare dataset
The following generic function prepares a full labeled dataset.
[7]:
def prepare_dataset(N_SAMPLES=100):
    """Prepare a set of dummy image sample features and label.

    :param N_SAMPLES: Number of samples to generate, defaults to 100.
    :type N_SAMPLES: int

    :return: Set of sample features.
    :rtype: tuple[:class:`numpy.ndarray`]

    :return: Set of single-digit sample label.
    :rtype: tuple[int]
    """
    # Initialize X and Y datasets
    X_features = []
    Y_label = []

    # Iterate over N_SAMPLES
    for i in range(N_SAMPLES):

        # Compute random image features
        features, mask_on_features = features_image()

        # Retrieve label associated with features
        label = label_features(features, mask_on_features)

        # Append sample features to X_features
        X_features.append(features)

        # Append sample label to Y_label
        Y_label.append(label)

    # Prepare X-Y pairwise dataset
    dataset = list(zip(X_features, Y_label))

    # Shuffle dataset
    random.shuffle(dataset)

    # Separate X-Y pairs
    X_features, Y_label = zip(*dataset)

    return X_features, Y_label
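The last three steps of the function above (pair, shuffle, separate) keep each sample attached to its label while changing the order. A minimal sketch of that pattern, using hypothetical string stand-ins instead of image arrays:

```python
import random

random.seed(0)

# Hypothetical stand-ins for sample features and labels
X_features = ['img_a', 'img_b', 'img_c', 'img_d']
Y_label = [0, 1, 0, 1]

# Pair each sample with its label, shuffle the pairs, then split them again
dataset = list(zip(X_features, Y_label))
random.shuffle(dataset)
X_shuffled, Y_shuffled = zip(*dataset)

# Each sample is still attached to its original label
original = dict(zip(X_features, Y_label))
print(all(original[x] == y for x, y in zip(X_shuffled, Y_shuffled)))  # True

# zip(*dataset) yields tuples, not lists
print(type(X_shuffled).__name__)  # tuple
```

Shuffling X_features and Y_label separately would break the pairing, which is why the pairs are shuffled as a single list.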
We can test this function.
[8]:
X_features, Y_label = prepare_dataset(N_SAMPLES=5)

for features, label in zip(X_features, Y_label):
    plt.imshow(features, cmap='gray')
    plt.title('Label = %s' % label)
    plt.show()
Note that this run deviates from the expected mean: 4 images were associated with label 0 and only one with label 1. According to the code, we expect a balanced dataset, although with only 5 samples such a deviation is not surprising.
To make sure, we can count labels over a larger sample.
[9]:
X_features, Y_label = prepare_dataset(N_SAMPLES=100)
print(Y_label.count(0))
print(Y_label.count(1))
54
46
With 54 and 46 samples per label, the output of the code fits our expectations.
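To put numbers on these two runs, here is a small sketch that assumes each label is an independent fair draw, which matches the random.choice between two options in features_image(). The helper name prob_at_least_k_same is hypothetical.

```python
from math import comb, sqrt


def prob_at_least_k_same(n, k):
    """Probability that at least k of n fair draws share one label (k > n / 2)."""
    one_side = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Either label can be the majority one
    return 2 * one_side


# With 5 samples, a split of 4-1 or 5-0 is common
print(prob_at_least_k_same(5, 4))  # 0.375

# With 100 samples, standard deviation of the count for one label
print(sqrt(100 * 0.5 * 0.5))  # 5.0
```

Under this assumption, a 4 to 1 split or worse occurs in more than one third of runs with 5 samples, and a count of 54 out of 100 lies within one standard deviation of the expected 50.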
Live examples
The function prepare_dataset() presented herein is used in the following live examples:
Notebook at EpyNN/epynnlive/dummy_image/train.ipynb or following this link.
Regular python code at EpyNN/epynnlive/dummy_image/train.py.