Image Representation and Fundamentals
Understanding how digital images are represented, stored, and processed
What is an Image?
In the broadest sense, an image is a two-dimensional function f(x, y) of spatial coordinates x and y,
where the value of f(x, y) (often referred to as intensity or gray level) represents the brightness or color at each point.
When we talk about digital images, we specifically mean that this continuous function has been discretized into
a finite grid of pixels (picture elements). Each pixel corresponds to a specific spatial location, and its numerical value(s)
encode intensity or color information.
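The discretization described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available and that the continuous function returns values in [0, 1]; the names `sample_and_quantize` and `horizontal_ramp` are hypothetical and exist only for this example.

```python
import numpy as np

def sample_and_quantize(f, width, height, levels=256):
    """Sample f(x, y) on a width x height grid and quantize to `levels` steps."""
    # Pixel centres, normalised to the unit square
    xs = (np.arange(width) + 0.5) / width
    ys = (np.arange(height) + 0.5) / height
    x, y = np.meshgrid(xs, ys)              # both arrays have shape (height, width)
    samples = np.clip(f(x, y), 0.0, 1.0)    # spatial sampling
    # Intensity quantization to integers in [0, levels - 1]
    dtype = np.uint8 if levels <= 256 else np.uint16
    return np.round(samples * (levels - 1)).astype(dtype)

def horizontal_ramp(x, y):
    """Hypothetical continuous image: brightness grows from left to right."""
    return x + 0.0 * y

digital = sample_and_quantize(horizontal_ramp, width=4, height=2)
print(digital.shape)   # (2, 4): rows (height) first, then columns (width)
print(digital[0])      # first row: 32, 96, 159, 223
```

Sampling fixes where the image is measured, and quantization fixes how finely each measurement is recorded; both steps discard information from the continuous function.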
From a practical standpoint, digital images underlie numerous applications, ranging from general photography to highly specialized
fields like medical imaging and satellite remote sensing. Understanding how these pixel values are stored, manipulated, and interpreted
is essential for designing and implementing effective image processing techniques.
Types of Digital Images
Digital images vary not only in resolution or size but also in how they store color and intensity data.
Below are the most common types you will encounter:
-
Grayscale Images:
Each pixel has a single intensity value. For an 8-bit grayscale image, this value typically ranges from 0 (black) to 255 (white).
Grayscale images are common in medical imaging (e.g., X-rays, CT scans) and are frequently used in image processing algorithms
because they simplify computations by focusing on intensity rather than color channels.
-
Binary Images (Black & White):
Pixels take on only two values, commonly 0 and 1 (or 0 and 255 in an 8-bit representation).
Binary images are instrumental in tasks like segmentation, thresholding, and pattern recognition where
a scene is divided into foreground and background.
-
RGB (Color) Images:
Each pixel consists of three intensity values representing the Red, Green, and Blue channels.
Combining these channels in different proportions reproduces a wide range of colors (additive color mixing).
Color images provide richer information about scenes but also increase the complexity of processing and storage.
-
Indexed Images (Paletted Images):
Instead of storing full color information for each pixel, indexed images store integer indices that map to an entry in a predefined colormap or palette.
This can reduce the storage requirements when the number of distinct colors in an image is limited.
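The four image types listed above can be illustrated with small NumPy arrays. The pixel values, the threshold of 128, and the three-entry palette are arbitrary choices made for this sketch.

```python
import numpy as np

# Grayscale: one 8-bit intensity per pixel, shape (height, width)
gray = np.array([[0, 64, 128],
                 [192, 255, 32]], dtype=np.uint8)

# Binary: obtained here by thresholding the grayscale image
binary = (gray >= 128).astype(np.uint8)     # every value is 0 or 1

# RGB: three intensities per pixel, shape (height, width, 3)
rgb = np.zeros((2, 3, 3), dtype=np.uint8)
rgb[0, 0] = (255, 0, 0)      # pure red
rgb[1, 2] = (255, 255, 0)    # red + green gives yellow (additive mixing)

# Indexed: per-pixel palette indices plus a colormap of RGB entries
palette = np.array([[0, 0, 0],          # index 0: black
                    [255, 0, 0],        # index 1: red
                    [255, 255, 255]],   # index 2: white
                   dtype=np.uint8)
indices = np.array([[0, 1, 2],
                    [2, 1, 0]], dtype=np.uint8)
expanded = palette[indices]   # look up every index, giving shape (2, 3, 3)

print(gray.shape, binary.shape, rgb.shape, expanded.shape)
```

The indexed image stores one byte per pixel plus the palette, whereas the expanded RGB version stores three bytes per pixel.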
Image Resolution
The term resolution can refer to multiple concepts in digital imaging, all of which influence
how well details or changes in intensity can be represented:
-
Spatial Resolution:
Defines the total number of pixels along the width and height of an image, often stated as width × height
(e.g., 1920 × 1080). Higher spatial resolution allows for more detailed imagery, but also increases file size and computational overhead.
-
Bit Depth:
Refers to the number of bits used per pixel to encode intensity or color information.
An 8-bit grayscale image can represent 256 (2^8) possible intensity levels,
while a 16-bit grayscale image can represent up to 65,536 (2^16) levels,
providing finer gradations that are especially useful in medical or scientific applications where subtle intensity differences matter.
-
Temporal Resolution (Video or Sequential Images):
In the context of videos or time-lapse imaging, temporal resolution refers to the number of frames captured per second (fps).
Higher frame rates (e.g., 60 fps) provide smoother motion but require more data storage and higher processing power.
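The effect of each kind of resolution on data volume can be estimated with simple arithmetic. The helper names below are hypothetical, and the figures are for raw, uncompressed data only; real files and video streams are usually compressed.

```python
def intensity_levels(bit_depth):
    """Number of distinct values representable with `bit_depth` bits."""
    return 2 ** bit_depth

def uncompressed_bytes(width, height, channels=1, bit_depth=8):
    """Raw size of one frame, rounded up to whole bytes."""
    bits = width * height * channels * bit_depth
    return (bits + 7) // 8

def video_bytes_per_second(width, height, fps, channels=3, bit_depth=8):
    """Raw data rate of an uncompressed video stream."""
    return uncompressed_bytes(width, height, channels, bit_depth) * fps

print(intensity_levels(8))                           # 256
print(intensity_levels(16))                          # 65536
print(uncompressed_bytes(1920, 1080, channels=3))    # 6220800 (about 6.2 MB)
print(video_bytes_per_second(1920, 1080, fps=60))    # 373248000 (about 373 MB/s)
```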
Color Spaces
A color space defines how color information is structured and interpreted. Various color spaces
offer different advantages for processing, visualization, and hardware implementation:
-
RGB (Red, Green, Blue):
The most common color space in digital displays and computer graphics.
Each color is formed by an additive mixture of the Red, Green, and Blue primary components.
-
HSV (Hue, Saturation, Value):
Also referred to as HSB (Hue, Saturation, Brightness).
This space separates the color into “Hue” (the color family, expressed as an angle on the color wheel),
“Saturation” (the purity of the color, from gray to fully vivid), and “Value” (overall brightness).
HSV is often more intuitive for interactive color adjustments.
-
CMYK (Cyan, Magenta, Yellow, Key/Black):
A subtractive color model typically used in printing.
Instead of adding light, inks absorb (subtract) portions of the light spectrum.
-
Grayscale:
A single-channel representation that encodes only intensity, with no hue or saturation information.
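Conversions between these color spaces can be sketched with the standard library. The HSV conversion uses Python's `colorsys` module. The CMYK formula is the naive, device-independent textbook version (real print workflows rely on color profiles), and the grayscale weights are the ITU-R BT.601 luma coefficients, which is one common convention rather than the only one. The function names are hypothetical.

```python
import colorsys

def rgb8_to_hsv(r, g, b):
    """8-bit RGB -> (hue in degrees, saturation, value), with s and v in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v

def rgb8_to_cmyk(r, g, b):
    """Naive 8-bit RGB -> CMYK, each component in [0, 1]."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    k = 1.0 - max(r, g, b)
    if k == 1.0:                      # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return c, m, y, k

def rgb8_to_gray(r, g, b):
    """Weighted sum of the channels using BT.601 luma weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

print(rgb8_to_hsv(255, 0, 0))     # (0.0, 1.0, 1.0): red sits at hue 0
print(rgb8_to_cmyk(255, 0, 0))    # (0.0, 1.0, 1.0, 0.0): magenta + yellow ink
print(rgb8_to_gray(255, 255, 255))  # 255
```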
Image File Formats
Images are stored in a variety of file formats, each optimized for particular use-cases regarding compression, quality, and metadata:
-
JPEG (Joint Photographic Experts Group):
Uses lossy compression to significantly reduce file size while maintaining reasonable image quality.
Commonly used for photographs and web images but not ideal where high fidelity is critical (e.g., diagnostic medical images).
-
PNG (Portable Network Graphics):
Provides lossless compression, preserving exact pixel values.
Useful for graphics, logos, and images requiring transparency or exact reproduction.
-
BMP (Bitmap):
A simple format that usually stores the color of each pixel directly, without compression.
While simple to parse, BMP files can be quite large and are less common in modern workflows.
-
DICOM (Digital Imaging and Communications in Medicine):
The standard format in medical imaging, containing not only pixel data but also extensive metadata (patient ID, modality, acquisition parameters, etc.).
Its standardized structure facilitates interoperability between medical devices and systems.
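Each of the formats above can be recognized from the first bytes of a file. The sketch below checks the well-known signatures: the 8-byte PNG signature, the JPEG start-of-image marker, the "BM" tag of BMP, and the "DICM" marker that follows the 128-byte preamble in DICOM files. The function name `sniff_image_format` is hypothetical, and a signature match does not guarantee that the rest of the file is valid.

```python
def sniff_image_format(data: bytes) -> str:
    """Guess the container format from the leading bytes of a file."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if data.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if data.startswith(b"BM"):
        return "BMP"
    # DICOM files begin with a 128-byte preamble followed by the ASCII tag "DICM"
    if len(data) >= 132 and data[128:132] == b"DICM":
        return "DICOM"
    return "unknown"

# Synthetic headers for demonstration; these are not complete image files
print(sniff_image_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))   # PNG
print(sniff_image_format(b"\x00" * 128 + b"DICM"))               # DICOM
```

In practice the header of a real file could be read with `open(path, "rb").read(132)` and passed to the function.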
Image Representation in Programming
In most programming environments, images are represented as multi-dimensional arrays.
For grayscale images, a 2D array (height × width) of intensity values is sufficient.
For color images, the array often has an additional dimension for channels (height × width × channels).
Below are simple code examples illustrating how images might be loaded, displayed, and inspected in Python and MATLAB.
import cv2
import numpy as np

# Load an image in grayscale
image = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# Check if image is loaded correctly
if image is None:
    print("Error: Could not load image.")
else:
    # Get image dimensions
    height, width = image.shape
    print(f"Height: {height}, Width: {width}")

    # Display the image
    cv2.imshow('Grayscale Image', image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
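% Load an image and convert it to grayscale if it has color channels
image = imread('image.jpg');
if size(image, 3) == 3
    gray_image = rgb2gray(image);
else
    gray_image = image;  % already single-channel
end
% Get image dimensions (rows = height, columns = width)
[height, width] = size(gray_image);
disp(['Height: ', num2str(height), ', Width: ', num2str(width)]);
% Display the image
imshow(gray_image);
title('Grayscale Image');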
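Once an image is in memory, its array layout determines how pixels, channels, and regions are addressed. The following sketch uses a small synthetic NumPy array in place of a loaded file, so the values are assumptions chosen for illustration. Note that indexing is row first (y), then column (x), and that OpenCV's `imread` returns color channels in BGR order rather than RGB.

```python
import numpy as np

# Synthetic stand-in for a loaded color image: height 4, width 6, 3 channels
color = np.zeros((4, 6, 3), dtype=np.uint8)
color[1, 2] = (10, 20, 30)          # row 1 (y), column 2 (x)

height, width, channels = color.shape
print(height, width, channels)       # 4 6 3
print(color.dtype)                   # uint8

pixel = color[1, 2]                  # all channel values of one pixel
first_channel = color[:, :, 0]       # 2D array holding a single channel
roi = color[0:2, 1:4]                # region of interest: rows 0-1, columns 1-3

# Reversing the last axis swaps the channel order (e.g., BGR <-> RGB)
reversed_channels = color[:, :, ::-1]
print(reversed_channels[1, 2])       # channel values in reverse order: 30, 20, 10
```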
Image Processing Fundamentals
Once images are digitized, a variety of fundamental operations can be applied to analyze, enhance, or transform them.
These building blocks serve as the basis for more complex algorithms in computer vision, deep learning, and beyond:
-
Filtering:
Includes operations such as smoothing (e.g., Gaussian blur to reduce noise), sharpening (e.g., unsharp masking),
and edge detection (e.g., Sobel or Canny filters).
Filtering manipulates pixel values to highlight or suppress specific frequencies or features in the image.
-
Geometric Transformations:
Operations like rotation, scaling (resizing), and translation (shifting) that re-map pixel locations in an image.
These transformations are common in image registration, alignment, or to correct for perspective distortions.
-
Segmentation:
The process of partitioning an image into meaningful regions (e.g., distinguishing foreground from background).
Segmentation is critical in medical imaging, object detection, and other advanced applications where precise localization of structures or objects is required.
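The three operations above can each be sketched in plain NumPy for a single-channel image. These are deliberately simple stand-ins: a 3 × 3 mean filter for smoothing, an integer translation for geometric transformation, and a fixed global threshold for segmentation. The function names are hypothetical, and libraries such as OpenCV provide faster and more general versions.

```python
import numpy as np

def box_blur3(image):
    """3x3 mean filter with replicated edges; returns a float64 array."""
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Sum the nine shifted copies of the image, then divide by the window size
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0

def translate(image, dx, dy, fill=0):
    """Shift content dx pixels right and dy pixels down; vacated pixels get `fill`."""
    h, w = image.shape
    out = np.full_like(image, fill)
    y0, y1 = max(0, -dy), min(h, h - dy)   # source rows that stay inside the frame
    x0, x1 = max(0, -dx), min(w, w - dx)   # source columns that stay inside the frame
    if y0 < y1 and x0 < x1:
        out[y0 + dy:y1 + dy, x0 + dx:x1 + dx] = image[y0:y1, x0:x1]
    return out

def threshold_segment(image, t):
    """Boolean foreground mask: True where the intensity exceeds t."""
    return image > t

img = np.zeros((3, 3), dtype=np.uint8)
img[1, 1] = 9                          # a single bright pixel in the centre
print(box_blur3(img))                  # the bright pixel is spread over its neighbours
print(translate(img, dx=1, dy=0))      # bright pixel moves one column to the right
print(threshold_segment(img, 4))       # True only at the centre
```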
Beyond these fundamentals, image processing also encompasses morphological operations, frequency domain analysis (Fourier transforms),
feature extraction, pattern recognition, and machine learning-based methods.
Mastery of these core topics provides a strong foundation for tackling advanced research or industrial projects in imaging.
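As one illustration of frequency domain analysis, the sketch below applies an ideal low-pass filter using NumPy's FFT routines. It assumes a single-channel image and measures the cutoff in cycles per image; the function name `ideal_low_pass` is hypothetical. An ideal (hard-edged) filter is the simplest choice and tends to cause ringing, so smoother filters are generally preferred in practice.

```python
import numpy as np

def ideal_low_pass(image, cutoff):
    """Remove spatial frequencies farther than `cutoff` cycles per image from DC."""
    h, w = image.shape
    spectrum = np.fft.fft2(image)
    # Signed integer frequency index of every FFT bin along each axis
    fy = np.fft.fftfreq(h) * h
    fx = np.fft.fftfreq(w) * w
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    spectrum[radius > cutoff] = 0          # discard the high frequencies
    return np.real(np.fft.ifft2(spectrum))

# Vertical stripes alternating between 0 and 2: only DC and the highest frequency
stripes = np.tile(np.array([0.0, 2.0]), (8, 4))    # shape (8, 8)
smoothed = ideal_low_pass(stripes, cutoff=1)
print(np.round(smoothed, 6))                        # stripes vanish, leaving the mean value 1.0
```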
Further Learning Resources
Below are some useful references and tools to deepen your understanding of image representation and processing: