MNIST-classification

Analyze the classic MNIST dataset using a DNN, t-SNE, and UMAP (a VAE may be added later).

MNIST Dataset Overview

MNIST is a dataset of 70,000 grayscale images of handwritten digits (0–9). Each image is 28×28 pixels, which gives 784 dimensions when flattened into a vector. It is often used for benchmarking machine learning algorithms, especially in visualization tasks.
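To make the 28×28 → 784 step concrete, here is a minimal sketch. The `images` array is random stand-in data with an assumed batch size of 5, not the real dataset; loading the actual images is left to the notebook.

```python
import numpy as np

# Stand-in for a batch of MNIST images: 5 grayscale images of 28x28 pixels
# with values 0-255. Random data, assumed for illustration only.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each image into one 784-dimensional row and scale pixels to [0, 1].
X = images.reshape(len(images), -1).astype(np.float32) / 255.0

print(X.shape)  # (5, 784)
```

Each row of `X` is then one point in the 784-dimensional space that t-SNE and UMAP work on.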

Sample images:

Goal of Dimensionality Reduction

We want to reduce the high-dimensional data (784D) to 2D or 3D so we can visualize it, while preserving important structure like:

- Clusters of similar digits
- Separation between different digit classes

This is where t-SNE and UMAP come in.

t-SNE: t-Distributed Stochastic Neighbor Embedding

t-SNE focuses on preserving local structure: it tries to keep points that are close together in high-dimensional space close together in the low-dimensional embedding.

🔍 How it works (simplified):

- Converts distances between points in high-dimensional space into probabilities that act as similarities.
- Does the same in low-dimensional space.
- Minimizes the difference between these two probability distributions (the Kullback-Leibler divergence) using gradient descent.
- Uses a t-distribution in the low-dimensional space to avoid the "crowding problem".

📊 Applied to MNIST:

After applying t-SNE to MNIST, you typically get a 2D plot where:

- Each point represents a digit image.
- Points are colored by their true label (0–9).
- Similar digits cluster together.
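The "how it works" steps can be sketched in plain NumPy. This is a toy version under stated assumptions, not the notebook's code or a production t-SNE: it uses a single fixed `sigma` instead of a perplexity-tuned bandwidth per point, plain gradient descent without momentum or early exaggeration, and a small synthetic three-cluster dataset rather than MNIST. All function names are made up for illustration.

```python
import numpy as np

def sq_dists(X):
    """Squared Euclidean distance between every pair of rows."""
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

def high_dim_probs(X, sigma=1.0):
    """Gaussian similarities in the original space, turned into probabilities.
    Simplification: one fixed sigma rather than a perplexity-tuned one per point."""
    P = np.exp(-sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)      # conditional p_{j|i}
    return (P + P.T) / (2.0 * len(X))      # symmetric p_ij, sums to 1

def low_dim_weights(Y):
    """Heavy-tailed Student-t kernel (one degree of freedom) in the embedding."""
    W = 1.0 / (1.0 + sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W

def kl_divergence(P, Y):
    """KL(P || Q), where Q is the normalized low-dimensional kernel."""
    W = low_dim_weights(Y)
    Q = W / W.sum()
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

def kl_gradient(P, Y):
    """Gradient of the KL objective: 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j)."""
    W = low_dim_weights(Y)
    M = (P - W / W.sum()) * W
    return 4.0 * (M.sum(axis=1, keepdims=True) * Y - M @ Y)

rng = np.random.default_rng(0)
# Toy data (assumed): three clusters of 10 points each in 10-D, not MNIST.
X = np.repeat(4.0 * np.eye(3, 10), 10, axis=0) + 0.5 * rng.normal(size=(30, 10))
P = high_dim_probs(X)

Y = 1e-2 * rng.normal(size=(30, 2))        # small random initial 2D layout
kl_before = kl_divergence(P, Y)
for _ in range(500):
    Y -= 1.0 * kl_gradient(P, Y)           # plain gradient descent step
kl_after = kl_divergence(P, Y)

print(kl_before, kl_after)                 # the divergence should go down
```

The heavy tail of the `1 / (1 + d²)` kernel is what lets dissimilar points sit far apart in 2D without a large penalty, which is how the crowding problem is eased.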

⚠️ Pros & Cons:

PROS	CONS
Excellent at preserving local neighborhoods	Computationally expensive
Good for visualizing clusters	Not good at preserving global structure
	Not deterministic: random initialization can affect results

Note: t-SNE tends to create well-separated, tight clusters but may distort the relative distances between clusters.
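For a quick hands-on run, the sketch below uses scikit-learn's `TSNE`. To avoid a download it runs on scikit-learn's bundled 8×8 digits (`load_digits`, 64 dimensions) as a stand-in for the 28×28 MNIST images, and the subset size of 500 is an arbitrary choice to keep it fast. Fixing `random_state` is the usual way to make runs repeatable, given the sensitivity to initialization noted above.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Stand-in data: scikit-learn's small 8x8 digits set (64-D), NOT full MNIST.
digits = load_digits()
X, y = digits.data[:500], digits.target[:500]

# random_state pins the random parts of the algorithm;
# init="pca" starts from a PCA layout instead of random positions.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(X)

print(embedding.shape)  # (500, 2)
# To visualize: scatter-plot embedding[:, 0] against embedding[:, 1], colored by y.
```

Changing `random_state` or `perplexity` and re-plotting is a simple way to see how much of the layout is stable and how much is an artifact of one run.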

UMAP: Uniform Manifold Approximation and Projection

Like t-SNE, UMAP preserves local structure, but it also tries to preserve some global structure, which makes it better suited to understanding overall relationships in the data.

🔍 How it works (intuitively):

- Assumes data lies on a manifold (a curved surface embedded in high-dimensional space).
- Constructs a topological representation (graph) of the data.
- Finds a similar graph in low-dimensional space that minimizes differences.

UMAP is faster than t-SNE and scales better to larger datasets.
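The graph-construction step can be illustrated with a small NumPy sketch. It builds a weighted nearest-neighbor graph in the spirit of UMAP's first stage, with one loud simplification: `sigma` is a single fixed value here, whereas UMAP tunes a separate bandwidth for each point. The function name and the toy data are assumptions for illustration, and the second stage (optimizing a low-dimensional layout to match the graph) is not shown.

```python
import numpy as np

def fuzzy_neighbor_graph(X, n_neighbors=5, sigma=1.0):
    """Symmetric weighted k-nearest-neighbor graph with UMAP-style edge weights.

    Simplification: sigma is fixed; UMAP itself solves for one sigma per point.
    """
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt(np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(D, np.inf)                    # a point is not its own neighbor

    knn = np.argsort(D, axis=1)[:, :n_neighbors]   # indices of the k nearest neighbors
    knn_d = np.take_along_axis(D, knn, axis=1)     # distances to those neighbors
    rho = knn_d[:, :1]                             # distance to the closest neighbor

    # Directed weights: 1.0 for the closest neighbor, decaying with extra distance.
    weights = np.exp(-(knn_d - rho) / sigma)
    A = np.zeros((n, n))
    np.put_along_axis(A, knn, weights, axis=1)

    # Fuzzy union: combine the two directions of each edge as a + b - a*b.
    return A + A.T - A * A.T

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                       # toy data, not MNIST
G = fuzzy_neighbor_graph(X)
print(G.shape)  # (20, 20)
```

Subtracting the nearest-neighbor distance `rho` means every point is connected to at least one neighbor with full weight, regardless of how dense or sparse its region is.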

📊 Applied to MNIST:

- Like t-SNE, UMAP reduces MNIST to 2D/3D for visualization.
- Digits form clusters with clearer separation and more meaningful spacing between digit classes.
- Global relationships (e.g., digit 0 far from digit 1, closer to 6 or 9) are often better preserved.

⚠️ Pros & Cons:

PROS	CONS
Faster than t-SNE	Slightly newer and less mature
Preserves both local and some global structure	More hyperparameters to tune
Scalable to large datasets	Can be harder to interpret in edge cases

📈 Visual Comparison on MNIST

Here's what you'd typically see when plotting both methods:

METHOD	CLUSTER SHAPE	GLOBAL STRUCTURE	SPEED
t-SNE	Compact, isolated clusters	Poorly preserved	Slow
UMAP	More spread out, connected clusters	Better preserved	Fast

For example, UMAP might show a smooth transition between 5 and 3 if they appear similar in some samples, while t-SNE might treat them as fully separate.
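Visual impressions like these can be backed by a rough number. The sketch below measures how many of each point's nearest neighbors in the original space are still its nearest neighbors in the embedding. The function names are hypothetical, the data is synthetic, and the "embeddings" are simple stand-ins (the first two coordinates versus a random layout) rather than actual t-SNE or UMAP output. It only checks local neighborhoods, not global structure.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of every row (excluding itself)."""
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D, np.inf)
    return np.argsort(D, axis=1)[:, :k]

def knn_overlap(X_high, X_low, k=10):
    """Mean fraction of each point's k nearest neighbors kept by the embedding."""
    high = knn_indices(X_high, k)
    low = knn_indices(X_low, k)
    shared = [len(set(h) & set(l)) for h, l in zip(high, low)]
    return float(np.mean(shared)) / k

rng = np.random.default_rng(0)
# Toy data (assumed): three clusters in 20-D, separated along the first two axes.
centers = 6.0 * np.eye(3, 20)
X = np.repeat(centers, 40, axis=0) + rng.normal(size=(120, 20))

projection = X[:, :2]                      # stand-in embedding that keeps the clusters
random_layout = rng.normal(size=(120, 2))  # stand-in embedding with no structure

score_projection = knn_overlap(X, projection)
score_random = knn_overlap(X, random_layout)
print(score_projection, score_random)
```

Passing real t-SNE and UMAP embeddings of the same data as `X_low` would give one comparable score per method.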

Code examples

See details in notebook

Both t-SNE and UMAP are great tools, but UMAP is often preferred nowadays due to its speed and better preservation of global structure.