79 changes: 54 additions & 25 deletions 31_image_classification.ipynb
Expand Up @@ -155,12 +155,12 @@
"return len(self.images)\n",
"```\n",
"\n",
"Complete function ```__getitem__``` - this method is needed to lety the generator know what to do to samples when calling them:\n",
"Complete function ```__getitem__``` - this method is needed to let the generator know what to do to samples when calling them:\n",
"```python\n",
"image = self.images[idx]\n",
"label = self.labels[idx]\n",
"\n",
"# Ensure the image is in the shape (H, W, C) for Albumentations library (library used for image augmentation)\n",
"# PyTorch expects images with the shape (Channels, Height, Width), while the Albumentations library (used for image augmentation) expects (Height, Width, Channels). We therefore transpose the image here; later in the notebook, A.ToTensorV2() converts it back to the PyTorch layout.\n",
"image = np.transpose(image, (1, 2, 0))\n",
"\n",
"# Apply transformations on the images\n",
Expand Down Expand Up @@ -549,6 +549,7 @@
"outputs": [],
"source": [
"import numpy as np\n",
"import torch\n",
"\n",
"class Trainer():\n",
" def __init__(self, model):\n",
Expand Down Expand Up @@ -618,8 +619,7 @@
"source": [
"### Load data\n",
"\n",
"Training and test sets are loaded using Pickle library. If you do not have the dataset already, open this [link](https://www.dropbox.com/scl/fo/p7gfb0kpgkbrrjup340pi/AAkX2u1g-W7290-Aq7gHHvo?rlkey=vdxaj6npfy09ywh17nl8f9v6e&st=8hfq9z20&dl=0) and download it.\n",
"Place it inside the data folder."
"Training and test sets are loaded using the pickle library."
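The loading cell below calls a helper named `load_pickle_file`, which is defined elsewhere in the notebook. A minimal sketch of what it is assumed to do (a thin wrapper around `pickle.load`):

```python
import pickle

def load_pickle_file(file_path):
    """Deserialize a pickled object from disk."""
    # Open in binary mode, since pickle files are binary
    with open(file_path, "rb") as f:
        return pickle.load(f)
```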
]
},
{
Expand All @@ -629,12 +629,18 @@
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"# Get current directory\n",
"current_dir = Path.cwd()\n",
"\n",
"# Go up one level to /home/jovyan, then into datasets\n",
"datasets_dir = current_dir.parent / \"datasets\"\n",
"\n",
"# Sets filepaths\n",
"dataset_folder = os.path.join(\"data/CIFAR10\")\n",
"train_set_file = os.path.join(dataset_folder, \"train_set.pkl\")\n",
"test_set_file = os.path.join(dataset_folder, \"test_set.pkl\")\n",
"dataset_folder = datasets_dir / \"CIFAR10\"\n",
"train_set_file = dataset_folder / \"train_set.pkl\"\n",
"test_set_file = dataset_folder / \"test_set.pkl\"\n",
"\n",
"# Load sets\n",
"train_set = load_pickle_file(train_set_file)\n",
Expand Down Expand Up @@ -708,10 +714,10 @@
"They help the model become invariant to different orientations and scales:\n",
"\n",
"- **Scaling**: Resizes the image to a specific size, often required to match input dimensions for image classifiers.\n",
" It uses interpolation to obtain the new pixel-values.\n",
"- **Cropping**: Extracts a subregion of the image; useful for focusing on important parts or adding variability.\n",
"- **Horizontal and vertical flip**: Flips the image along the x-axis or y-axis; helps the model learn symmetry.\n",
"- **Rotation**: Rotates the image by a small angle to simulate different orientations of the objects."
" It uses interpolation to obtain the new pixel values. Use ```cv2.resize()```.\n",
"- **Cropping**: Extracts a subregion of the image; useful for focusing on important parts or adding variability. No OpenCV function is needed; NumPy slicing is enough.\n",
"- **Horizontal and vertical flip**: Flips the image along the x-axis or y-axis; helps the model learn symmetry. Use ```cv2.flip()```.\n",
"- **Rotation**: Rotates the image by a small angle to simulate different orientations of the objects. Use ```cv2.getRotationMatrix2D()``` and ```cv2.warpAffine()```."
]
},
{
Expand Down Expand Up @@ -969,9 +975,9 @@
"Filtering helps reduce noise and enhance specific image features.\n",
"These are often used as a form of preprocessing before feeding images into a model:\n",
"\n",
"- **Average filter**: Applies a smoothing effect by replacing each pixel with the average of its neighborhood.\n",
"- **Median filter**: Reduces salt-and-pepper noise by replacing each pixel with the median of neighboring pixels.\n",
"- **Gaussian filter**: Applies a Gaussian blur to smooth the image, often used to reduce high-frequency noise."
"- **Average filter**: Applies a smoothing effect by replacing each pixel with the average of its neighborhood. Use ```cv2.blur()```.\n",
"- **Median filter**: Reduces salt-and-pepper noise by replacing each pixel with the median of neighboring pixels. Use ```cv2.medianBlur()```.\n",
"- **Gaussian filter**: Applies a Gaussian blur to smooth the image, often used to reduce high-frequency noise. Use ```cv2.GaussianBlur()```."
]
},
{
Expand Down Expand Up @@ -1130,9 +1136,9 @@
"\n",
"Photometric transformations modify the color properties of an image to simulate different lighting conditions and improve model robustness to brightness and contrast changes:\n",
"\n",
"- **Brightness**: Randomly increases or decreases the brightness of the image.\n",
"- **Contrast**: Alters the difference between light and dark regions in the image.\n",
"- **Saturation**: Modifies the intensity of the colors in the image."
"- **Brightness**: Randomly increases or decreases the brightness of the image. Use ```cv2.convertScaleAbs()```.\n",
"- **Contrast**: Alters the difference between light and dark regions in the image. Use ```cv2.convertScaleAbs()```.\n",
"- **Saturation**: Modifies the intensity of the colors in the image. Use ```cv2.cvtColor()```, ```cv2.split()```, and ```cv2.merge()```."
]
},
{
Expand Down Expand Up @@ -1370,7 +1376,8 @@
"- ```A.ColorJitter``` for color jittering.\n",
"\n",
"Albumentations can also be used for image normalization (```A.Normalize```), resizing (```A.Resize```), and converting images to PyTorch tensors with the (Channel, Height, Width) format using ```A.ToTensorV2```, which is required for model training.\n",
"Apply the following transformations only to the training set, as the validation set should remain as close as possible to the test set. Therefore, no transformations should be applied to it.\n",
"\n",
"**NOTE: Apply the following transformations only to the training set, as the validation set should remain as close as possible to the test set. Therefore, no transformations should be applied to it.**\n",
"\n",
"```python\n",
"A.Affine(scale = (0.2, 1.5), p = 0.1),\n",
Expand Down Expand Up @@ -1493,7 +1500,7 @@
"id": "82",
"metadata": {},
"source": [
"### Model Training Overview\n",
"#### Model Training Overview\n",
"\n",
"Model training involves a sequence of key steps.\n",
"The first step is to check which computational devices are available.\n",
Expand Down Expand Up @@ -1576,6 +1583,8 @@
"source": [
"#### Loss function\n",
"\n",
"In this notebook, we use cross entropy loss, which is the standard loss function for classification tasks. To build intuition, imagine taking a multiple-choice exam where you do not just pick a single answer, but instead assign a confidence score (probability) to every available option. Cross entropy acts as a very strict grader. If the correct answer is 'Dog', and your model is 99% confident it is a 'Dog', the grader gives a penalty (loss) close to zero. However, if the model is only 10% confident it is a 'Dog', the penalty increases because the model was unsure. Crucially, if the model is 99% confident it is a 'Cat' when it is actually a 'Dog', the penalty skyrockets. Cross entropy heavily penalizes a model for being confidently wrong.\n",
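The strict-grader analogy can be checked numerically with PyTorch's built-in cross entropy (the logit values below are arbitrary, chosen only to make one prediction confident, one unsure, and one confidently wrong):

```python
import torch
import torch.nn.functional as F

# One sample, three classes; the true class is index 0 ("Dog")
target = torch.tensor([0])

# Confidently correct: large logit on the true class -> tiny loss
loss_right = F.cross_entropy(torch.tensor([[10.0, 0.0, 0.0]]), target)

# Unsure: near-uniform logits -> moderate loss (about ln(3))
loss_unsure = F.cross_entropy(torch.tensor([[0.1, 0.0, 0.0]]), target)

# Confidently wrong: large logit on a wrong class -> huge loss
loss_wrong = F.cross_entropy(torch.tensor([[0.0, 10.0, 0.0]]), target)
```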
"\n",
"The cross entropy loss function is defined by:\n",
"\n",
"$$\n",
Expand Down Expand Up @@ -1610,7 +1619,21 @@
"id": "89",
"metadata": {},
"source": [
"#### Initialise model architecture"
"#### Initialise model architecture\n",
"\n",
"In this notebook, we are using a Convolutional Neural Network (CNN) as our model architecture. CNNs are deep learning models specifically designed to process visual data, like images. Their core engine is the convolution operation. Imagine sliding a small magnifying glass (called a filter) across an image, step-by-step. Instead of trying to look at the whole picture at once, these filters analyze small, localized patches to detect specific patterns.\n",
"\n",
"As the image passes through the network, a step-by-step recognition process happens: early layers detect simple lines and edges, middle layers combine those into textures or object parts (like a car tire or a dog's ear), and deep layers assemble them to recognize complex shapes.\n",
"\n",
"The overall pipeline looks like this:\n",
"\n",
"1. Start with the raw image input.\n",
"\n",
"2. Pass it through the convolutional blocks, which often shrink the data to keep only the most important details and save memory.\n",
"\n",
"3. Flatten the resulting 2D maps into a 1D vector.\n",
"\n",
"4. Feed that list into a standard classifier to make the final prediction."
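The four-step pipeline above can be sketched as a tiny PyTorch CNN (this is not the notebook's actual architecture; the layer sizes are illustrative for 32x32 CIFAR10-style inputs):

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            # Early layer: detects simple lines and edges
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # shrink 32x32 -> 16x16
            # Deeper layer: combines edges into textures and parts
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # shrink 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                        # 2D feature maps -> 1D vector
            nn.Linear(32 * 8 * 8, num_classes),  # final prediction
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Raw image input (batch of 1, CHW layout) -> one logit per class
logits = SmallCNN()(torch.randn(1, 3, 32, 32))
```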
]
},
{
Expand All @@ -1630,7 +1653,13 @@
"source": [
"#### Optimiser function\n",
"\n",
"In this notebook, we are using Adam optimiser (```optimizer = optim.Adam(model.parameters(), lr = LR)```) which is one of the most used optimisers in deep neural network optimisation (see [Gentle Introduction to the Adam Optimisation Algorithm for Deep Learning](https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/)).\n",
"In this notebook, we are using the Adam (Adaptive Moment Estimation) optimiser (```optimizer = optim.Adam(model.parameters(), lr = LR)```), which is one of the most used optimisers in deep neural network optimisation (see [Gentle Introduction to the Adam Optimisation Algorithm for Deep Learning](https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/)).\n",
"\n",
"To understand Adam, it helps to visualize a heavy ball rolling down a bumpy hill toward a valley (the minimum loss). In standard gradient descent, the ball relies only on the gradient ($g_t$) at its exact current location. This can be inefficient, causing it to zig-zag wildly across steep ravines or slow to a crawl on flat plateaus. Adam solves this by keeping track of two historical records, named \"moments\", to guide the ball more intelligently.\n",
"\n",
"First, Adam uses momentum ($m_t$). Just like a heavy ball builds up physical momentum and barrels through tiny bumps without getting thrown off course, Adam remembers the direction of past gradients to maintain a smooth, forward-moving trajectory. Second, it uses an adaptive step size based on the terrain ($v_t$). If the gradient in a specific direction is consistently huge, Adam scales down the learning rate ($\\alpha$) for that parameter so it takes smaller, careful steps and avoids overshooting the valley. Conversely, for parameters with very small, flat gradients, it increases the step size to speed up the journey. The bias corrections ($\\hat{m}_t$ and $\\hat{v}_t$) are simply included to ensure the ball doesn't start its descent too sluggishly from a dead stop.\n",
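A minimal NumPy sketch of a single Adam update, tracking the two moments described here (for training, ```optim.Adam``` does all of this internally; this is only to make the moving parts visible):

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: exponential moving average of the gradient (momentum)
    m = beta1 * m + (1 - beta1) * g
    # Second moment: exponential moving average of the squared gradient
    v = beta2 * v + (1 - beta2) * g**2
    # Bias correction so the first steps are not too sluggish
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # Adaptive, per-parameter step size
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Running repeated steps on a simple loss like f(theta) = theta^2 (gradient 2*theta) drives theta toward the minimum at 0.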
"\n",
"The parameter update at each step is given by:\n",
"\n",
Expand Down Expand Up @@ -1725,7 +1754,7 @@
"import os\n",
"import torch\n",
"\n",
"# Model filename\n",
"# Model filename (to load the already trained model instead, set model_path = dataset_folder / \"cnn_weights.pt\")\n",
"model_path = \"cnn_weights.pt\"\n",
"\n",
"if os.path.exists(model_path):\n",
Expand Down Expand Up @@ -1761,7 +1790,7 @@
"import pandas as pd\n",
"from matplotlib import pyplot as plt\n",
"\n",
"# Load the training log file\n",
"# Load the training log file (to use the log of the already trained model, use the path dataset_folder / \"training_log.txt\")\n",
"training_log = None\n",
"\n",
"plt.figure()\n",
Expand Down Expand Up @@ -2020,7 +2049,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
"version": "3.13.5"
}
},
"nbformat": 4,
Expand Down