Image Classification in ML

Prerequisites: Machine Learning Terminology

Semantic Gap

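In order to train a Machine Learning Model that can classify images, we first need to overcome a few challenges that stem from what is commonly known as the Semantic Gap: the difference between how a human perceives an image and the grid of numbers a computer sees.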

When a computer is given an image, it reads the data as a 3D array of numbers defined by the image's dimensions and color channels. A typical array shape would be [800, 600, 3], where 800 and 600 are the spatial dimensions in pixels and 3 is the number of RGB (red, green, blue) channels.
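As a minimal sketch of this representation, the snippet below builds a blank image with NumPy. The choice of NumPy is an assumption, since the text does not name a library, and the shape simply mirrors the [800, 600, 3] example above. Many libraries order the axes as (height, width, channels), so check the convention of whatever tool loads your images.

```python
import numpy as np

# A blank image with the [800, 600, 3] layout described above:
# two spatial dimensions plus 3 color channels (red, green, blue).
image = np.zeros((800, 600, 3), dtype=np.uint8)

# Set one pixel to pure red; each channel holds an integer from 0 to 255.
image[0, 0] = [255, 0, 0]

print(image.shape)            # (800, 600, 3)
print(image.size)             # 1440000 individual numbers
print(image[0, 0].tolist())   # [255, 0, 0]
```

To the computer, the whole picture is just these 1,440,000 numbers, and every challenge below is a way those numbers can change while the object stays the same.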

Viewpoint Variation

If the camera is rotated or moved when taking a picture, nearly every pixel value changes, even though the object in the picture is the same.
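A rough way to see this effect, assuming NumPy and using random values as a hypothetical stand-in for a real photo, is to shift an image sideways by a few pixels and count how many values differ. A real viewpoint change is more complex than a shift, so treat this only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for a photo: random pixel values.
photo = rng.integers(0, 256, size=(60, 80, 3), dtype=np.uint8)

# Crude stand-in for moving the camera: shift every pixel 5 columns right
# (columns that fall off the right edge wrap around to the left).
shifted = np.roll(photo, shift=5, axis=1)

# Fraction of stored values that differ between the two "views".
changed = np.mean(photo != shifted)
print(f"{changed:.1%} of values changed")
```

Even this tiny shift leaves almost no value in its original position, which is why comparing raw pixels between two photos is unreliable.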

Illumination

Similarly, changes in lighting alter the pixel values of a photo even when the scene itself is unchanged.
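The sketch below imitates dimmer lighting by scaling every value to 40% of its original brightness. NumPy, the random stand-in image, and the 0.4 factor are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for a photo: random pixel values.
photo = rng.integers(0, 256, size=(60, 80, 3), dtype=np.uint8)

# Simulate dimmer lighting: scale every channel to 40% brightness.
dim = (photo.astype(np.float32) * 0.4).astype(np.uint8)

# Average absolute change per stored value (cast up to avoid uint8 wraparound).
diff = np.abs(photo.astype(np.int16) - dim.astype(np.int16)).mean()

print(f"mean brightness: {photo.mean():.1f} -> {dim.mean():.1f}")
print(f"average change per value: {diff:.1f}")
```

The scene content is identical in both arrays, yet the numbers the classifier receives are very different.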

Deformation

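Objects can also be deformed: the same object can take on many poses and shapes, so a photo may show it with an unusual appearance. A cat curled into a ball looks very different from a cat stretched out on the floor.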

Occlusion

Objects can also be occluded, or partially blocked, by other objects. For example, a cat under a couch cushion would be hard for a computer to distinguish, although humans can instantly recognize that there is a cat.
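Occlusion can be imitated by painting over part of an image. In this sketch, which assumes NumPy and a random stand-in image, a flat gray block plays the role of the hypothetical cushion covering the bottom half of the picture.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for a photo: random pixel values.
photo = rng.integers(0, 256, size=(60, 80, 3), dtype=np.uint8)

occluded = photo.copy()
# Cover the bottom half with a flat gray block (the "cushion").
occluded[30:, :, :] = 128

# Fraction of pixels where at least one channel was overwritten.
hidden = np.mean(np.any(photo != occluded, axis=-1))
print(f"{hidden:.0%} of pixels hidden")
```

The classifier must now make its decision from the remaining visible pixels alone.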

Background Clutter

If the object blends into the background, for example because the two share similar colors or textures, it may be hard for a computer to tell them apart.

Intraclass Variation

Lastly, objects of the same class can look very different from one another. Cats, for example, come in many breeds with different colors, sizes, and shapes.

Convolutional Neural Network

To counter these challenges, we use something called a Convolutional Neural Network.
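The operation that gives these networks their name is the convolution: a small grid of weights (a filter, or kernel) slides across the image and produces a strong response wherever the local pattern matches. The sketch below is a simplified NumPy illustration under stated assumptions, not a full network. The `conv2d` helper and the hand-written edge filter are hypothetical; in a real network the filter weights are learned from data.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over a 2D `image` (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the patch currently under the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Grayscale toy image: dark left half, bright right half.
image = np.zeros((5, 6), dtype=np.float32)
image[:, 3:] = 1.0

# Hand-written vertical-edge filter; a CNN would learn such weights.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32)

response = conv2d(image, kernel)
print(response)  # each row is [0. 3. 3. 0.]: strong response at the edge
```

Because the same filter is applied at every position, it responds to the edge wherever the edge appears in the image, which helps with the shifts described under viewpoint variation.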

Stanford’s cs231n 2018 Course: “Lecture 2 | Image Classification”

cs231n slides

cs231n video

Created By: WHS Comp Sci Club Officers