Computer Vision?

How exactly do computers make sense of visual inputs?

Giving Computers Sight

Computer vision is a field of Artificial Intelligence concerned with giving computers the ability to derive meaningful information from visual inputs such as images and videos.

Essentially, computers are given the ability to 'see'. But there is a big difference between the way humans and computers perceive visual inputs.

Images Are Numerical

Let's consider an image. When we humans look at an image, we perceive its colors and the shapes of the objects in it. When a computer 'sees' an image, on the other hand, it simply sees pixels: more specifically, a matrix of pixels in which each pixel is a number and, in a grayscale image, the value of that number determines the pixel's brightness.

Take the NumPy array below, for instance: its values increase from 0 to 24, with a corresponding increase in pixel brightness.

prog_pixels.png progressive_px.png
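
An array like the one pictured can be sketched in a few lines of NumPy. The 5x5 shape and the helper name `show_grayscale` are assumptions made for illustration, since the original figure is not reproduced here, and the rendering step needs matplotlib installed.

```python
import numpy as np

# Values 0..24 arranged as a grid (the 5x5 shape is an assumption).
pixels = np.arange(25).reshape(5, 5)
print(pixels)


def show_grayscale(image):
    """Render a 2-D array as a grayscale image (hypothetical helper, needs matplotlib)."""
    import matplotlib.pyplot as plt

    # cmap="gray" draws the smallest value as black and the largest as white.
    plt.imshow(image, cmap="gray")
    plt.axis("off")
    plt.show()
```

Calling `show_grayscale(pixels)` should draw a grid that fades from black in the top-left corner to white in the bottom-right, which is the brightness progression described above.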

When numbers are put together in a meaningful way, a correspondingly meaningful image is formed. A case in point is the array below, where the elements with a value of 1 outline the shape of a letter 'T'.

letter_t-1.png
letter_t.png
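
As a rough sketch of the same idea, the array below spells out a 'T' with ones on a background of zeros. The exact size and layout are assumptions and may differ from the original figure.

```python
import numpy as np

# A 5x5 binary image: 1 marks the strokes of the letter, 0 the background.
letter_t = np.array([
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
])

# Text rendering: '#' for bright pixels, '.' for dark ones.
for row in letter_t:
    print("".join("#" if value else "." for value in row))
```

The printed pattern makes the shape visible to us, but to the computer it is still just 25 numbers.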

A computer doesn't know this is a letter T; all it sees is a bunch of zeros and ones, and it renders them accordingly. We humans, on the other hand, know what the image is because we have perceived its shape.

Biological Learning and Machine Learning

The human ability to recognise images comes down to past experience. For instance, a two-year-old would not know what a letter T looks like until after seeing it a few times and being told exactly what it is. The same can be done with computers via machine learning: if we show a computer different images of a T and tell it exactly what they are, it can learn patterns from these images and gain the ability to recognise a T in future. That's essentially what Computer Vision is all about.
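
To make "learning from labelled examples" concrete, here is a deliberately tiny sketch using a nearest-centroid classifier. This technique was chosen for brevity and is not necessarily what real computer vision systems use. The templates, the noise level, and the names `noisy_copies`, `prototypes` and `predict` are all assumptions made for illustration.

```python
import numpy as np

# Two hand-made 5x5 letter templates (illustrative assumptions).
T = np.array([[1, 1, 1, 1, 1],
              [0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0]])
L = np.array([[1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 1, 1, 1, 1]])

rng = np.random.default_rng(0)


def noisy_copies(template, count, flip_prob=0.05):
    """Return `count` copies of the template with a few pixels flipped at random."""
    flips = rng.random((count, *template.shape)) < flip_prob
    return np.where(flips, 1 - template, template)


# "Show" the computer several examples of each letter, together with its label.
training_images = {"T": noisy_copies(T, 20), "L": noisy_copies(L, 20)}

# "Learning" here is just averaging the examples of each label into a prototype.
prototypes = {label: images.mean(axis=0) for label, images in training_images.items()}


def predict(image):
    """Return the label whose prototype is closest to the image (squared distance)."""
    return min(prototypes, key=lambda label: np.sum((image - prototypes[label]) ** 2))


print(predict(T), predict(L))
```

The point of the sketch is only that the program is never told what a 'T' looks like; it works that out from labelled examples, much like the two-year-old in the analogy.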