Handwritten-Digit-Classification

Handwritten Digit Classification with a Neural Network

CI/CD Pipeline: GitHub Actions | Python 3.8+ | License: MIT

A modular, extensible neural network implementation for classifying handwritten digits from the MNIST dataset. The project is organized to illustrate machine learning engineering practices: a packaged code layout, automated tests, logging, and a CI/CD pipeline.

Features

Core Functionality

Model Capabilities

Evaluation & Visualization

Engineering Best Practices

Project Structure

Handwritten-Digit-Classification/
├── .github/
│   └── workflows/
│       └── ci.yml                 # GitHub Actions CI/CD pipeline
├── src/
│   └── mnist_classifier/
│       ├── __init__.py            # Package initialization
│       ├── config.py              # Configuration management
│       ├── data_loader.py         # Data loading and preprocessing
│       ├── model.py               # Neural network model definition
│       ├── trainer.py             # Training logic and callbacks
│       ├── evaluator.py           # Model evaluation and metrics
│       ├── visualization.py       # Plotting and visualization utilities
│       └── cli.py                 # Command-line interface
├── tests/
│   ├── __init__.py
│   ├── test_config.py             # Configuration tests
│   ├── test_data_loader.py        # Data loading tests
│   ├── test_model.py              # Model tests
│   └── test_trainer.py            # Training tests
├── models/                        # Saved models (gitignored)
├── results/
│   ├── plots/                     # Generated visualizations
│   └── logs/                      # Training logs and history
├── data/                          # Dataset cache (gitignored)
├── train.py                       # Main training script
├── predict.py                     # Prediction script
├── requirements.txt               # Production dependencies
├── requirements-dev.txt           # Development dependencies
├── setup.py                       # Package installation script
├── pytest.ini                     # Pytest configuration
├── .gitignore                     # Git ignore rules
├── LICENSE                        # MIT License
└── README.md                      # This file

Installation

Prerequisites

Basic Installation

  1. Clone the repository:
    git clone https://github.com/pyenthusiasts/Handwritten-Digit-Classification.git
    cd Handwritten-Digit-Classification
    
  2. Install dependencies:
    pip install -r requirements.txt
    

Development Installation

For development with testing and code quality tools:

pip install -r requirements-dev.txt

Package Installation

Install as a Python package:

pip install -e .

The editable install makes the mnist_classifier package importable from any directory and installs its command-line entry points.

Quick Start

Train a Model with Default Settings

python train.py

This will:

Make Predictions

python predict.py --model-path models/best_model.h5 --num-samples 10

Usage

Training a Model

Basic Training

python train.py

Custom Hyperparameters

python train.py \
    --epochs 20 \
    --batch-size 256 \
    --learning-rate 0.001 \
    --hidden-units 256 128 64 \
    --dropout-rate 0.3

Enable Early Stopping

python train.py --early-stopping
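
Early stopping generally means halting training once the validation loss stops improving. This README does not say which metric or patience train.py uses, so the following is only a minimal sketch of the general technique. The function name `early_stopping_epoch` and the `patience` and `min_delta` parameters are assumptions for illustration, not part of this project's API.

```python
from typing import Optional, Sequence


def early_stopping_epoch(
    val_losses: Sequence[float],
    patience: int = 3,
    min_delta: float = 0.0,
) -> Optional[int]:
    """Return the 0-based epoch at which training would stop, or None.

    Training stops once `patience` consecutive epochs fail to improve on
    the best validation loss seen so far by more than `min_delta`.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # New best loss: remember it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical validation losses, one per epoch.
stop_at = early_stopping_epoch([0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5], patience=3)
```

In this made-up run the loss last improves at epoch 2, so with a patience of 3 the loop would stop at epoch 5 and never reach the later improvement. That trade-off is why the patience value matters.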

Verbose Output

python train.py --verbose

Making Predictions

Predict Random Samples

python predict.py --model-path models/best_model.h5 --num-samples 20

Show Only Misclassified Samples

python predict.py \
    --model-path models/best_model.h5 \
    --show-misclassified \
    --num-samples 10

Save Prediction Visualizations

python predict.py \
    --model-path models/best_model.h5 \
    --save-predictions \
    --output-dir results/predictions

Advanced Options

Training Options

Option              Description                           Default
--epochs            Number of training epochs             10
--batch-size        Batch size for training               128
--learning-rate     Learning rate for optimizer           0.001
--validation-split  Fraction of data for validation       0.2
--hidden-units      Hidden layer sizes (space-separated)  128 64
--dropout-rate      Dropout rate for regularization       0.2
--early-stopping    Enable early stopping                 False
--no-callbacks      Disable training callbacks            False
--no-plots          Disable plot generation               False
--model-dir         Directory to save models              models
--results-dir       Directory to save results             results
--seed              Random seed for reproducibility       42
--verbose           Enable verbose output                 False
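
To show how options like these are commonly parsed, here is a minimal sketch using the standard library's argparse. It covers only a subset of the flags and copies the defaults from the table above. The function name `build_parser` is an assumption, and the actual parser in train.py may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a subset of the documented training options."""
    parser = argparse.ArgumentParser(description="Train an MNIST classifier")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=128)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--validation-split", type=float, default=0.2)
    # nargs="+" collects space-separated values: --hidden-units 256 128 64
    parser.add_argument("--hidden-units", type=int, nargs="+", default=[128, 64])
    parser.add_argument("--dropout-rate", type=float, default=0.2)
    # store_true flags default to False and become True when present.
    parser.add_argument("--early-stopping", action="store_true")
    parser.add_argument("--seed", type=int, default=42)
    return parser


# Parse an explicit argument list instead of sys.argv so the example
# behaves the same no matter how the script is launched.
args = build_parser().parse_args(
    ["--epochs", "20", "--hidden-units", "256", "128", "64", "--early-stopping"]
)
```

argparse converts dashes to underscores, so `--hidden-units` becomes `args.hidden_units` (a list of integers here), and flags that are not passed keep their table defaults.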

Prediction Options

Option                Description                       Default
--model-path          Path to trained model (required)  -
--num-samples         Number of samples to predict      10
--show-misclassified  Show only misclassified samples   False
--save-predictions    Save prediction visualizations    False
--output-dir          Directory to save outputs         results/predictions
--verbose             Enable verbose output             False

API Reference

Using the Package in Python

from mnist_classifier import (
    Config,
    MNISTDataLoader,
    MNISTModel,
    ModelTrainer,
    ModelEvaluator,
    Visualizer,
)

# Create configuration
config = Config(
    epochs=15,
    batch_size=256,
    hidden_units=(256, 128, 64),
    dropout_rate=0.3
)

# Load data
data_loader = MNISTDataLoader(normalize=True, categorical=True)
(X_train, y_train), (X_test, y_test) = data_loader.load_data()

# Build and compile model
model = MNISTModel(config)
model.build()
model.compile()

# Train model
trainer = ModelTrainer(model, config)
history = trainer.train(X_train, y_train)

# Evaluate model
evaluator = ModelEvaluator(model)
metrics = evaluator.evaluate(X_test, y_test)

# Visualize results
visualizer = Visualizer(config)
visualizer.plot_training_history(history)

# Save model
model.save("my_model.h5")

Key Classes

Config

Configuration dataclass for all hyperparameters and settings.
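
As an illustration of what a configuration dataclass of this kind can look like, here is a hedged sketch. The class name `ExampleConfig`, its field list, and the validation rules are assumptions modeled on the CLI defaults documented above. The real `Config` in config.py may define different fields or checks.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExampleConfig:
    """Hypothetical stand-in for mnist_classifier.Config."""

    epochs: int = 10
    batch_size: int = 128
    learning_rate: float = 0.001
    validation_split: float = 0.2
    hidden_units: Tuple[int, ...] = (128, 64)
    dropout_rate: float = 0.2
    seed: int = 42

    def __post_init__(self) -> None:
        # Fail early on values that would make training meaningless.
        if self.epochs < 1 or self.batch_size < 1:
            raise ValueError("epochs and batch_size must be positive")
        if not 0.0 <= self.dropout_rate < 1.0:
            raise ValueError("dropout_rate must be in [0, 1)")
        if not 0.0 <= self.validation_split < 1.0:
            raise ValueError("validation_split must be in [0, 1)")
        if not self.hidden_units or any(u < 1 for u in self.hidden_units):
            raise ValueError("hidden_units must be non-empty positive integers")


# Override a few fields; the rest keep their defaults.
config = ExampleConfig(
    epochs=15, batch_size=256, hidden_units=(256, 128, 64), dropout_rate=0.3
)
```

Freezing the dataclass keeps a configuration from being mutated halfway through a run, and validating in `__post_init__` surfaces bad hyperparameters before any data is loaded.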

MNISTDataLoader

Handles loading and preprocessing the MNIST dataset.
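
The `normalize=True` and `categorical=True` arguments shown earlier plausibly correspond to pixel scaling and one-hot label encoding, which are the usual MNIST preprocessing steps. The sketch below illustrates those two steps with NumPy on a tiny synthetic batch. The `preprocess` function is hypothetical and is not taken from data_loader.py.

```python
import numpy as np


def preprocess(images: np.ndarray, labels: np.ndarray, num_classes: int = 10):
    """Scale uint8 pixels to [0, 1] and one-hot encode integer labels."""
    x = images.astype("float32") / 255.0
    # Indexing the identity matrix by label yields one one-hot row per sample.
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y


# Tiny synthetic batch standing in for MNIST: two 28x28 grayscale images,
# one all black (0) and one all white (255).
images = np.zeros((2, 28, 28), dtype=np.uint8)
images[1] = 255
labels = np.array([3, 7])

x, y = preprocess(images, labels)
```

Scaling keeps the inputs in a small, consistent range for gradient-based training, and one-hot labels match a 10-way softmax output trained with categorical cross-entropy.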

Methods:

MNISTModel

Neural network model wrapper.

Methods:

ModelTrainer

Handles model training with callbacks and history tracking.

Methods:

ModelEvaluator

Model evaluation and metrics computation.
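
The README does not list which metrics the evaluator reports. Accuracy and a confusion matrix are the usual starting points for digit classification, so the sketch below shows how they can be computed from integer labels in plain Python. The function names `accuracy` and `confusion_matrix` are illustrative assumptions, not the signatures used in evaluator.py.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int = 10
) -> List[List[int]]:
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Made-up labels: one digit 2 is misread as a 1.
y_true = [0, 1, 2, 2, 9]
y_pred = [0, 1, 2, 1, 9]
acc = accuracy(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)
```

Off-diagonal entries such as `cm[2][1]` show which digits get confused with which, a detail that a single accuracy figure hides.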

Methods:

Visualizer

Visualization utilities for plots and reports.

Methods:

Development

Setting Up Development Environment

# Clone the repository
git clone https://github.com/pyenthusiasts/Handwritten-Digit-Classification.git
cd Handwritten-Digit-Classification

# Install development dependencies
pip install -r requirements-dev.txt

# Install package in editable mode
pip install -e .

Code Quality

Format Code with Black

black src/ tests/ train.py predict.py

Sort Imports with isort

isort src/ tests/ train.py predict.py

Lint with flake8

flake8 src/ tests/ train.py predict.py --max-line-length=120

Type Checking with mypy

mypy src/

Testing

Run All Tests

pytest

Run with Coverage

pytest --cov=src/mnist_classifier --cov-report=html

Run Specific Test File

pytest tests/test_model.py -v

Run Tests in Parallel

pytest -n auto

Results

Expected Performance

With default settings, the model typically achieves:

Output Files

After training, you’ll find:

Models:

Visualizations:

Logs:

Contributing

Contributions are welcome! Here’s how you can help:

Steps to Contribute

  1. Fork the repository
    # Click the 'Fork' button on GitHub
    
  2. Clone your fork
    git clone https://github.com/your-username/Handwritten-Digit-Classification.git
    cd Handwritten-Digit-Classification
    
  3. Create a feature branch
    git checkout -b feature/your-feature-name
    
  4. Make your changes
    • Write clean, documented code
    • Add tests for new functionality
    • Ensure all tests pass: pytest
    • Format code: black . and isort .
    • Lint code: flake8 .
  5. Commit your changes
    git add .
    git commit -m "Add: description of your changes"
    
  6. Push to your fork
    git push origin feature/your-feature-name
    
  7. Create a Pull Request
    • Go to the original repository on GitHub
    • Click “New Pull Request”
    • Select your feature branch
    • Describe your changes

Development Guidelines

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Citation

If you use this project in your research or work, please cite:

@software{mnist_digit_classification,
  title = {MNIST Digit Classification with Neural Network},
  author = {MNIST Classifier Team},
  year = {2024},
  url = {https://github.com/pyenthusiasts/Handwritten-Digit-Classification}
}

Support
