CNN Implementation Comparison

TensorFlow vs PyTorch for MNIST Classification


This page compares two implementations of a Convolutional Neural Network (CNN) for MNIST digit classification: one written in pure TensorFlow and one in PyTorch. Both implementations reach high accuracy while showcasing the distinctive features of each framework.

CNN Implementations

TensorFlow vs PyTorch Comparison

A detailed comparison of CNN implementations using TensorFlow and PyTorch for MNIST digit classification.

TensorFlow Implementation

Key Features

Pure TensorFlow

Implementation using low-level TensorFlow operations without Keras, providing full control over the architecture
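
As a rough illustration of this low-level style (a hypothetical sketch, not the contents of cnn_pure_tensorflow.py), a convolution block can be built from plain tf.Variable weights and tf.nn operations rather than Keras layers:

```python
import tensorflow as tf

# Hypothetical low-level convolution block: weights are explicit tf.Variables
# and the forward pass uses tf.nn ops instead of tf.keras.layers.
W1 = tf.Variable(tf.random.truncated_normal([3, 3, 1, 32], stddev=0.1))  # 3x3 kernel, 1 -> 32 channels
b1 = tf.Variable(tf.zeros([32]))

def conv_block(x):
    x = tf.nn.conv2d(x, W1, strides=1, padding='VALID') + b1
    x = tf.nn.relu(x)
    return tf.nn.max_pool2d(x, ksize=2, strides=2, padding='VALID')
```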

Architecture

2 Convolutional layers (32 and 64 filters) with max pooling, followed by 2 fully connected layers

Training

Custom training loop with tf.GradientTape for automatic differentiation

Full source file: cnn_pure_tensorflow.py (pure TensorFlow implementation).

Implementation Analysis

Data Preprocessing

  • Normalizes pixel values to [0, 1] range
  • Reshapes data to include channel dimension
  • Converts labels to one-hot encoding
  • Handles batching for efficient training (see the sketch below)
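
A minimal sketch of these four steps (the exact code lives in cnn_pure_tensorflow.py; this is only an illustration):

```python
import tensorflow as tf

# Load MNIST; keep only the training split for this sketch
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()

# Normalize pixel values to [0, 1] and add a channel dimension: (60000, 28, 28, 1)
x_train = x_train.astype('float32') / 255.0
x_train = x_train[..., tf.newaxis]

# One-hot encode the labels (digit 3 -> [0,0,0,1,0,0,0,0,0,0])
y_train = tf.one_hot(y_train, depth=10)

# Batch for efficient training (batch size 128, matching the training setup below)
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(10000).batch(128)
```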

Model Architecture

  • 2 Convolutional layers (32 and 64 filters)
  • Max pooling for spatial dimension reduction
  • 2 Fully connected layers (128 neurons, 10 output)
  • ReLU activation functions

Training Process

  • Custom gradient computation with GradientTape
  • Adam optimizer with 0.001 learning rate
  • Batch size of 128 for stable training
  • 10 epochs of training (see the training-loop sketch below)
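
A sketch of how those pieces fit together (assuming a `model` like the CNNModel shown in the code comparison below, and the one-hot `train_ds` pipeline from the preprocessing sketch above):

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
loss_object = tf.keras.losses.CategoricalCrossentropy()   # labels are one-hot encoded

@tf.function
def train_step(images, labels):
    # Record the forward pass so gradients can be computed automatically
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

for epoch in range(10):                     # 10 epochs
    for images, labels in train_ds:         # batches of 128
        loss = train_step(images, labels)
    print(f'Epoch {epoch + 1}: loss = {float(loss):.4f}')
```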

PyTorch Implementation

Key Features

Object-Oriented Design

Clean, modular implementation using PyTorch's nn.Module for better code organization

Dynamic Computation

Dynamic computational graphs for flexible model definition and easier debugging

Built-in Tools

Leverages PyTorch's DataLoader, optimizers, and loss functions
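
For instance, the loss function, optimizer, and data loader all come straight from the standard toolkit (a sketch; `CNNModel` and `train_dataset` refer to the definitions shown later on this page):

```python
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader

model = CNNModel()                                     # nn.Module subclass defined below
criterion = nn.CrossEntropyLoss()                      # built-in loss (expects raw logits)
optimizer = optim.Adam(model.parameters(), lr=0.001)   # built-in optimizer
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)
```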

Full source file: pytorch_cnn.py (PyTorch implementation).

Implementation Analysis

Data Management

  • Uses torchvision.datasets for MNIST loading
  • Custom normalization (μ=0.1307, σ=0.3081)
  • Parallel data loading with num_workers=2
  • Efficient batch processing with DataLoader

Network Structure

  • Identical architecture to TensorFlow version
  • Added dropout (0.25) for regularization
  • Dynamic input size calculation (illustrated in the sketch below)
  • Cleaner forward pass definition
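
One way to do the dynamic input size calculation (a sketch of the idea, not the exact pytorch_cnn.py code) is to push a dummy 28x28 image through the convolutional blocks once and read off the flattened size, instead of hard-coding 64 * 5 * 5:

```python
import torch
import torch.nn as nn

class CNNModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2), nn.Dropout(0.25),
            nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2), nn.Dropout(0.25),
        )
        # Dummy forward pass determines the flattened feature size (1600 for a 28x28 input)
        with torch.no_grad():
            n_flat = self.features(torch.zeros(1, 1, 28, 28)).numel()
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_flat, 128), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```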

Training Configuration

  • Smaller batch size (64) for better generalization
  • Same learning rate (0.001)
  • Explicit train/eval mode switching
  • Progress tracking with tqdm (see the epoch-loop sketch below)
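
A sketch of the epoch loop with explicit mode switching and tqdm progress bars (assuming the model, criterion, optimizer, and loaders from the snippets on this page):

```python
import torch
from tqdm import tqdm

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

for epoch in range(10):
    model.train()                                           # enable dropout for training
    for data, target in tqdm(train_loader, desc=f'Epoch {epoch + 1}'):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()

    model.eval()                                            # disable dropout for evaluation
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            correct += (model(data).argmax(dim=1) == target).sum().item()
    print(f'Test accuracy: {correct / len(test_loader.dataset):.4f}')
```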

Training Results

MNIST Dataset

MNIST Dataset Visualization - Sample training images showing digit variety

Model Architecture

Model Architecture Summary - Layer-by-layer network structure

Training Process

Training Progress - Epoch-wise training metrics

Training Metrics

Loss and Accuracy Curves - Training convergence visualization

Model Predictions

Model Predictions - Test set prediction examples

Framework Comparison


TensorFlow Advantages

  • Fine-grained control over operations
  • Explicit gradient computation
  • Efficient static graphs
  • Production-ready serving
  • Extensive visualization with TensorBoard

PyTorch Advantages

  • Intuitive object-oriented design
  • Dynamic computational graphs
  • Native Python integration
  • Easier debugging (example below)
  • More pythonic coding style
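
Because the graph is built eagerly, ordinary Python tooling works in the middle of a forward pass; a small hypothetical example:

```python
import torch
import torch.nn as nn

class DebuggableBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 32, kernel_size=3)

    def forward(self, x):
        x = self.conv(x)
        print(x.shape)      # plain Python runs mid-forward: torch.Size([1, 32, 26, 26])
        # breakpoint()      # a standard debugger stop would work here too
        return x

DebuggableBlock()(torch.zeros(1, 1, 28, 28))
```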

Code Comparison

TensorFlow Model

```python
# TensorFlow CNN Model Definition
import tensorflow as tf

class CNNModel(tf.keras.Model):
    def __init__(self):
        super(CNNModel, self).__init__()
        # First Convolutional Block
        self.conv1 = tf.keras.layers.Conv2D(32, 3, activation='relu')
        self.pool1 = tf.keras.layers.MaxPool2D((2, 2))
        self.dropout1 = tf.keras.layers.Dropout(0.25)
        
        # Second Convolutional Block
        self.conv2 = tf.keras.layers.Conv2D(64, 3, activation='relu')
        self.pool2 = tf.keras.layers.MaxPool2D((2, 2))
        self.dropout2 = tf.keras.layers.Dropout(0.25)
        
        # Dense Layers
        self.flatten = tf.keras.layers.Flatten()
        self.dense1 = tf.keras.layers.Dense(128, activation='relu')
        self.dropout3 = tf.keras.layers.Dropout(0.5)
        self.dense2 = tf.keras.layers.Dense(10, activation='softmax')
    
    def call(self, x, training=False):
        # First Block
        x = self.conv1(x)
        x = self.pool1(x)
        if training:
            x = self.dropout1(x)
        
        # Second Block
        x = self.conv2(x)
        x = self.pool2(x)
        if training:
            x = self.dropout2(x)
        
        # Dense Layers
        x = self.flatten(x)
        x = self.dense1(x)
        if training:
            x = self.dropout3(x)
        return self.dense2(x)
```

PyTorch Model

```python
# PyTorch CNN Model Definition
import torch.nn as nn

class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        # First Convolutional Block
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Dropout(p=0.25)
        )
        
        # Second Convolutional Block
        self.conv2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Dropout(p=0.25)
        )
        
        # Dense Layers
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 5 * 5, 128),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(128, 10)
            # Output raw logits: CrossEntropyLoss applies log-softmax internally,
            # so adding a Softmax layer here would apply softmax twice
        )
    
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        return self.fc(x)
```

TensorFlow Data Loading

```python
# TensorFlow Data Loading and Preprocessing
import tensorflow as tf

def load_and_preprocess_data():
    # Load MNIST dataset
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    
    # Normalize and reshape data
    x_train = x_train.astype('float32') / 255.0
    x_test = x_test.astype('float32') / 255.0
    
    # Add channel dimension
    x_train = x_train[..., tf.newaxis]
    x_test = x_test[..., tf.newaxis]
    
    # Create data pipeline
    train_ds = tf.data.Dataset.from_tensor_slices(
        (x_train, y_train)
    ).shuffle(10000).batch(32).prefetch(tf.data.AUTOTUNE)
    
    test_ds = tf.data.Dataset.from_tensor_slices(
        (x_test, y_test)
    ).batch(32).prefetch(tf.data.AUTOTUNE)
    
    return train_ds, test_ds

# Data Augmentation
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

# Usage Example
train_ds, test_ds = load_and_preprocess_data()
for images, labels in train_ds:
    # Apply augmentation during training
    augmented_images = data_augmentation(images, training=True)
    # Train step...
```

PyTorch Data Loading

```python
# PyTorch Data Loading and Preprocessing
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def load_and_preprocess_data():
    # Define transformations
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    
    # Data augmentation for training
    train_transform = transforms.Compose([
        transforms.RandomRotation(10),
        transforms.RandomAffine(0, translate=(0.1, 0.1)),
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    
    # Load MNIST dataset
    train_dataset = datasets.MNIST(
        './data', 
        train=True,
        download=True,
        transform=train_transform
    )
    
    test_dataset = datasets.MNIST(
        './data', 
        train=False,
        transform=transform
    )
    
    # Create data loaders
    train_loader = DataLoader(
        train_dataset,
        batch_size=32,
        shuffle=True,
        num_workers=2,
        pin_memory=True
    )
    
    test_loader = DataLoader(
        test_dataset,
        batch_size=32,
        shuffle=False,
        num_workers=2,
        pin_memory=True
    )
    
    return train_loader, test_loader

# Usage Example
train_loader, test_loader = load_and_preprocess_data()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
for batch_idx, (data, target) in enumerate(train_loader):
    # Move to GPU if available
    data, target = data.to(device), target.to(device)
    # Train step...
```

TensorFlow Training

```python
# TensorFlow Training Loop
with tf.GradientTape() as tape:
    predictions = model(images, training=True)
    loss = loss_object(labels, predictions)
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, 
                            model.trainable_variables))
```
                                
PyTorch Training

```python
# PyTorch Training Loop
optimizer.zero_grad()
outputs = model(images)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
```
                                

Performance Analysis

Training Speed

Comparison of training time per epoch

Memory Usage

GPU memory consumption during training
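
One way to collect these numbers for the PyTorch run (a sketch assuming a CUDA device and the training loop shown earlier) is to time each epoch and query peak GPU memory:

```python
import time
import torch

torch.cuda.reset_peak_memory_stats()     # start tracking peak memory (requires a CUDA device)
start = time.time()

# ... run one training epoch here ...

epoch_time = time.time() - start
peak_mem_mb = torch.cuda.max_memory_allocated() / 1024**2
print(f'Epoch time: {epoch_time:.1f} s, peak GPU memory: {peak_mem_mb:.0f} MB')
```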

Learning Resources

Hamdi Abdeljawed
© 2024

"Bridging the gap between TensorFlow and PyTorch, one implementation at a time."