Updated On : Apr-26,2022 Time Investment : ~30 mins

PyTorch: Conv1D For Text Classification Tasks

When working with text data for machine learning tasks, recurrent neural networks (RNNs) have long been a common choice. The usual reason given is that text data is sequential (words appear in a particular order according to grammar) and RNNs such as vanilla RNN, LSTM and GRU are designed to capture that order. Studies have also shown, however, that 1D convolution is good at capturing sequence patterns. Apart from that, CNNs with 1D convolution typically require fewer parameters and train faster than RNNs. Hence, we'll be concentrating on 1D convolution for text classification tasks in this tutorial.

As a part of this tutorial, we have explained how to use 1D convolution layers in neural networks designed using PyTorch for text classification tasks. We have used the word embeddings approach to encode text data before giving it to the convolution layer (see example image explaining word embeddings below). We have tried different ways of using 1D convolution layers for comparison purposes. For each approach, we have also evaluated performance by calculating various ML metrics and explained predictions by generating SHAP values.

Below, we have listed the important sections of the tutorial to give an overview of the material covered.

Important Sections Of Tutorial

  1. Build Vocabulary
  2. Approach 1: Single Conv1D Layer Network (Max Tokens=50, Embedding Length=128, Conv Output Channels=32)
    • Load Dataset And Create Data Loaders
    • Define Network
    • Train Network
    • Evaluate Network Performance
    • Explain Predictions using SHAP Values
  3. Approach 2: Single Conv1D Layer Network (Max Tokens=50, Embedding Length=256, Conv Output Channels=64)
  4. Approach 3: Multiple Conv1D Layers Network (Max Tokens=50, Embedding Length=256, Conv Output Channels=[32,32])
  5. Results Summary And Further Recommendations

Below, we have imported the necessary python libraries that we have used in our tutorial and printed the versions as well.

import torch

print("PyTorch Version : {}".format(torch.__version__))
PyTorch Version : 1.9.1+cpu
import torchtext

print("TorchText Version : {}".format(torchtext.__version__))
TorchText Version : 0.10.1
import shap

print("SHAP Version : {}".format(shap.__version__))
SHAP Version : 0.40.0
shap.initjs()

1. Build Vocabulary

In this section, we have prepared a vocabulary that will be used to map each token of the data to a unique index. This index will then be used by the embedding layer to look up the embedding of that token.

A vocabulary is a simple dictionary that maps tokens (words, punctuation marks, etc.) to unique integer indexes. Each token appearing in the data is assigned a unique integer index starting from 0.
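The idea can be sketched in plain Python. This is only an illustration of the mapping, not torchtext's implementation; the helper names `build_vocab` and `lookup` are hypothetical and the ordering rule (most frequent first, ties alphabetical) is an assumption made for this sketch.

```python
from collections import Counter

def build_vocab(token_lists, min_freq=1, specials=("<UNK>",)):
    """Map each token to a unique integer index; special tokens come first."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    vocab = {tok: idx for idx, tok in enumerate(specials)}
    # Assumption for this sketch: frequent tokens first, ties broken alphabetically.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def lookup(vocab, tokens, unk="<UNK>"):
    """Return the index of each token, falling back to the <UNK> index."""
    return [vocab.get(tok, vocab[unk]) for tok in tokens]

toy_vocab = build_vocab([["the", "cat", "sat"], ["the", "dog"]])
toy_indexes = lookup(toy_vocab, ["the", "bird"])
print(toy_vocab)    # {'<UNK>': 0, 'the': 1, 'cat': 2, 'dog': 3, 'sat': 4}
print(toy_indexes)  # [1, 0]
```

The unseen token "bird" falls back to index 0, which mirrors how an unknown word is handled with a default index.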

We'll be using the AG NEWS dataset available from the torchtext library for our tutorial. The dataset has text documents for 4 different news categories (["World", "Sports", "Business", "Sci/Tech"]). Below, we have first loaded the dataset. The dataset is already divided into train and test sets.

train_dataset, test_dataset = torchtext.datasets.AG_NEWS()
train.csv: 29.5MB [00:00, 78.3MB/s]
test.csv: 1.86MB [00:00, 51.4MB/s]

Below, we have included code to populate the vocabulary. First, we have defined a tokenizer. We have loaded a simple tokenizer using the get_tokenizer() function available from the data sub-module of torchtext. The tokenizer is a simple function that takes a text document as input and returns a list of tokens in that document.

After defining the tokenizer, we have populated a vocabulary by calling the build_vocab_from_iterator() function available from the vocab sub-module of torchtext. The function takes as input an iterator that yields a list of tokens on each step. We have defined a simple generator named build_vocabulary() which takes a list of datasets as input. It loops through each dataset and each text example in it, yielding the list of tokens of one text example at a time.

After populating the vocabulary, we have also printed its size. We have also shown with a simple example how to tokenize a text document and retrieve the indexes of its tokens using the vocabulary.

from torchtext.data import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

tokenizer = get_tokenizer("basic_english")

def build_vocabulary(datasets):
    for dataset in datasets:
        for _, text in dataset:
            yield tokenizer(text)

vocab = build_vocab_from_iterator(build_vocabulary([train_dataset, test_dataset]), min_freq=1, specials=["<UNK>",])
vocab.set_default_index(vocab["<UNK>"])
print("Vocabulary Size : {}".format(len(vocab)))
Vocabulary Size : 98635
tokens = tokenizer("Hello, how are you?, Welcome to CoderzColumn!!")
indexes = vocab(tokens)

tokens, indexes
(['hello',
  ',',
  'how',
  'are',
  'you',
  '?',
  ',',
  'welcome',
  'to',
  'coderzcolumn',
  '!',
  '!'],
 [12388, 3, 355, 42, 164, 80, 3, 3298, 4, 0, 747, 747])
vocab["<UNK>"] ## Coderzcolumn word is mapped to unknown as it's new and not present in vocabulary
0

Approach 1: Single Conv1D Layer Network (Max Tokens=50, Embedding Length=128, Conv Output Channels=32)

Our first approach designs a simple neural network that consists of a single 1D convolution layer. The network consists of one embedding layer, one 1D convolution layer, and one dense layer. We'll be treating the embedding dimension as the channels dimension when applying convolution operation.

Load Dataset And Create Data Loaders

In this section, we have loaded the datasets and created data loaders (train and test) from them. We have set a batch size of 1024 when creating the data loaders, so each iteration returns a batch of 1024 text examples. We have also provided a vectorization function that is responsible for encoding the text data. The vectorization function takes a batch of data, consisting of text examples and their respective target labels, as input. It tokenizes each text example and retrieves the indexes of its tokens using the vocabulary. After retrieving the indexes, we make sure that each text example consists of exactly 50 tokens: examples with fewer than 50 tokens are padded with 0s, and examples with more than 50 tokens are truncated.
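The padding and truncation rule described above can be isolated in a small helper. This is a minimal sketch; the function name `pad_or_truncate` and the toy `max_tokens=5` are chosen here for illustration only.

```python
def pad_or_truncate(indexes, max_tokens=50, pad_index=0):
    """Return a list of exactly max_tokens indexes."""
    if len(indexes) < max_tokens:
        # Too short: append pad_index until the target length is reached.
        return indexes + [pad_index] * (max_tokens - len(indexes))
    # Long enough: keep only the first max_tokens indexes.
    return indexes[:max_tokens]

short = pad_or_truncate([5, 8, 13], max_tokens=5)
long = pad_or_truncate(list(range(1, 9)), max_tokens=5)
print(short)  # [5, 8, 13, 0, 0]
print(long)   # [1, 2, 3, 4, 5]
```

Note that index 0 is also the index of "<UNK>" in the vocabulary built earlier, so padding positions and unknown words share the same embedding in this setup.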

These data loaders will be used during the training process to retrieve data in batches. Each iteration returns a tensor of token indexes of shape (batch_size, max_tokens) and a tensor with the respective target labels.

from torch.utils.data import DataLoader
from torchtext.data.functional import to_map_style_dataset

train_dataset, test_dataset = torchtext.datasets.AG_NEWS()
train_dataset, test_dataset = to_map_style_dataset(train_dataset), to_map_style_dataset(test_dataset)

target_classes = ["World", "Sports", "Business", "Sci/Tech"]

max_tokens = 50

def vectorize_text(batch):
    Y, X = list(zip(*batch))
    X = [vocab(tokenizer(text)) for text in X]
    X = [tokens+([0]* (max_tokens-len(tokens))) if len(tokens)<max_tokens else tokens[:max_tokens] for tokens in X] ## Bringing all samples to max_tokens length.

    return torch.tensor(X, dtype=torch.int32), torch.tensor(Y) - 1 ## We have deducted 1 from target names to get them in range [0,1,2,3] from [1,2,3,4]

train_loader = DataLoader(train_dataset, batch_size=1024, collate_fn=vectorize_text, shuffle=True)
test_loader  = DataLoader(test_dataset,  batch_size=1024, collate_fn=vectorize_text)
for X, Y in train_loader:
    print(X.shape, Y.shape)
    break
torch.Size([1024, 50]) torch.Size([1024])

Define Network

In this section, we have defined a simple network that we'll be using for our text classification task. The network consists of 3 layers.

  • Embedding Layer
  • Conv1D Layer
  • Linear Layer

The first layer is the embedding layer, which takes a batch of token indexes as input and returns their respective embeddings. We have created the embedding layer using the Embedding() constructor. The number of embeddings is the same as the length of the vocabulary and the length of individual embeddings is 128. Creating the layer allocates a weight matrix of shape (vocab_len, embed_len) which holds one embedding per vocabulary token; the layer simply maps each token index to its row in that matrix. The input to the embedding layer is of shape (batch_size, max_tokens) and the returned output has shape (batch_size, max_tokens, embed_len).
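The lookup behaviour can be checked on a toy layer. The sizes below (10 tokens, embedding length 4) are arbitrary values picked for this sketch, not the tutorial's settings.

```python
import torch
from torch import nn

# Toy embedding layer: 10 vocabulary entries, 4 values per embedding.
toy_embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)

batch = torch.tensor([[1, 2, 3], [4, 0, 0]])  # (batch_size=2, max_tokens=3)
out = toy_embedding(batch)

print(toy_embedding.weight.shape)  # torch.Size([10, 4])
print(out.shape)                   # torch.Size([2, 3, 4])

# Looking up index 2 returns row 2 of the weight matrix.
same_row = torch.equal(out[0, 1], toy_embedding.weight[2])
print(same_row)                    # True
```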

The output of the embedding layer is rearranged in the forward() method from shape (batch_size, max_tokens, embed_len) = (batch_size, 50, 128) to (batch_size, embed_len, max_tokens) = (batch_size, 128, 50). We do this because we want to treat the embedding dimension as the channels dimension on which the Conv1D layer works, and PyTorch's Conv1d expects input of shape (batch_size, channels, length), i.e. the channels dimension must come right after the batch dimension.
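When moving the embedding axis in front of the token axis, it helps to know that swapping axes and reshaping are different operations even though both can produce the same shape. A toy check, using made-up values and only core PyTorch calls:

```python
import torch

# Toy "embeddings": 1 example, 3 tokens, 2 embedding dimensions.
x = torch.tensor([[[1, 2], [3, 4], [5, 6]]])  # shape (1, 3, 2)

permuted = x.permute(0, 2, 1)  # swaps the token axis and the embedding axis
reshaped = x.reshape(1, 2, 3)  # same target shape, but keeps the flat element order

print(permuted.shape, reshaped.shape)
print(permuted.tolist())  # [[[1, 3, 5], [2, 4, 6]]]
print(reshaped.tolist())  # [[[1, 2, 3], [4, 5, 6]]]
```

After `permute`, each row is one embedding dimension followed across the three tokens, which is the layout a channels-first convolution expects. After `reshape`, each row mixes values from different tokens and embedding dimensions.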

The rearranged output is given to the Conv1D layer. The Conv1D layer has been initialized with 32 output channels and a kernel size of 7. Because it uses padding="same", the sequence length is preserved, so the layer translates data from shape (batch_size, embed_len, max_tokens) = (batch_size, 128, 50) to (batch_size, 32, max_tokens) = (batch_size, 32, 50). We have then applied the relu activation function to the output of the Conv1D layer.

Then, we have taken the maximum of the relu output over the last (token) dimension, keeping the strongest activation of each channel across all token positions. Hence, we get an output of shape (batch_size, 32) which is given to the linear layer as input. The linear layer has 4 output units, one per target class.
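The max step can be seen on a tiny tensor. The activation values below are invented for the sketch; `Tensor.max(dim=...)` returns both the maxima and the positions where they occur.

```python
import torch

# Toy activations: batch of 1, 2 channels, 4 token positions.
acts = torch.tensor([[[0.1, 0.9, 0.3, 0.0],
                      [0.5, 0.2, 0.7, 0.4]]])

values, positions = acts.max(dim=-1)  # max over the token axis

print(values.shape)        # torch.Size([1, 2])
print(values)              # 0.9 for channel 0, 0.7 for channel 1
print(positions.tolist())  # [[1, 2]]
```

Only `values` is passed on to the linear layer; the positions are discarded, which is why the network code unpacks the result as `x, _`.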

After defining the network, we initialized it, printed the shape of weights/biases of layers, and performed a forward pass to make predictions for verification purposes.
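As a cross-check on the printed parameter shapes, the number of trainable parameters per layer can be worked out by hand. The sizes are taken from the vocabulary size printed earlier and the layer definitions in this section; the variable names are only for this sketch.

```python
vocab_len, embed_len = 98635, 128
conv_out, kernel_size, n_classes = 32, 7, 4

embedding_params = vocab_len * embed_len                      # weight (98635, 128)
conv_params = conv_out * embed_len * kernel_size + conv_out   # weight (32, 128, 7) + bias (32,)
linear_params = n_classes * conv_out + n_classes              # weight (4, 32) + bias (4,)
total_params = embedding_params + conv_params + linear_params

print(embedding_params)  # 12625280
print(conv_params)       # 28704
print(linear_params)     # 132
print(total_params)      # 12654116
```

Almost all parameters sit in the embedding matrix; the convolution and linear layers together account for under 30 thousand.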

from torch import nn
from torch.nn import functional as F

embed_len = 128

class Conv1DClassifier(nn.Module):
    def __init__(self):
        super(Conv1DClassifier, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=len(vocab), embedding_dim=embed_len)
        self.conv1 = nn.Conv1d(embed_len, 32, kernel_size=7, padding="same")
        self.linear = nn.Linear(32, len(target_classes))

    def forward(self, X_batch):
        x = self.embedding_layer(X_batch)
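        x = x.permute(0, 2, 1) ## Swap axes so embedding length is the channel dimension: (batch, max_tokens, embed_len) -> (batch, embed_len, max_tokens). reshape() would give the same shape but mix values across tokens.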

        x = F.relu(self.conv1(x))

        x, _ = x.max(dim=-1)

        x = self.linear(x)

        return x
conv_classifier = Conv1DClassifier()

conv_classifier
Conv1DClassifier(
  (embedding_layer): Embedding(98635, 128)
  (conv1): Conv1d(128, 32, kernel_size=(7,), stride=(1,), padding=same)
  (linear): Linear(in_features=32, out_features=4, bias=True)
)
for layer in conv_classifier.children():
    print("Layer : {}".format(layer))
    print("Parameters : ")
    for param in layer.parameters():
        print(param.shape)
    print()
Layer : Embedding(98635, 128)
Parameters :
torch.Size([98635, 128])

Layer : Conv1d(128, 32, kernel_size=(7,), stride=(1,), padding=same)
Parameters :
torch.Size([32, 128, 7])
torch.Size([32])

Layer : Linear(in_features=32, out_features=4, bias=True)
Parameters :
torch.Size([4, 32])
torch.Size([4])

out = conv_classifier(torch.randint(0, len(vocab), (1024, max_tokens)))

out.shape
torch.Size([1024, 4])

Train Network

In this section, we train the neural network defined in the previous section. We have defined a function that we'll use for training. The function takes the model, loss function, optimizer, train data loader, validation data loader, and number of epochs as input. It then runs the training loop for the given number of epochs. In each epoch, it loops through the whole training data in batches. For each batch, it performs a forward pass to make predictions, calculates the loss, calculates gradients, and updates the network parameters. At the end of each epoch, it prints the average loss over all batches, and then calculates validation loss and accuracy using another helper function.

from tqdm import tqdm
from sklearn.metrics import accuracy_score
import gc

def CalcValLossAndAccuracy(model, loss_fn, val_loader):
    with torch.no_grad():
        Y_shuffled, Y_preds, losses = [],[],[]
        for X, Y in val_loader:
            preds = model(X)
            loss = loss_fn(preds, Y)
            losses.append(loss.item())

            Y_shuffled.append(Y)
            Y_preds.append(preds.argmax(dim=-1))

        Y_shuffled = torch.cat(Y_shuffled)
        Y_preds = torch.cat(Y_preds)

        print("Valid Loss : {:.3f}".format(torch.tensor(losses).mean()))
        print("Valid Acc  : {:.3f}".format(accuracy_score(Y_shuffled.detach().numpy(), Y_preds.detach().numpy())))


def TrainModel(model, loss_fn, optimizer, train_loader, val_loader, epochs=10):
    for i in range(1, epochs+1):
        losses = []
        for X, Y in tqdm(train_loader):
            Y_preds = model(X)

            loss = loss_fn(Y_preds, Y)
            losses.append(loss.item())

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        print("Train Loss : {:.3f}".format(torch.tensor(losses).mean()))
        CalcValLossAndAccuracy(model, loss_fn, val_loader)

Below, we train our network by calling the training function. First, we have set the number of epochs to 15 and the learning rate to 0.001. Then, we have initialized the cross entropy loss, our text classification network, and the Adam optimizer. At last, we have called our training routine with the necessary parameters (the test loader serves as the validation loader here). We can notice from the loss and accuracy values printed after each epoch that validation accuracy climbs to about 87%, so the model is doing a good job at the text classification task. Validation loss, however, reaches its minimum around epoch 8 and then rises while training loss keeps falling, which suggests the network starts to overfit.

from torch.optim import Adam

epochs = 15
learning_rate = 1e-3

loss_fn = nn.CrossEntropyLoss()
conv_classifier = Conv1DClassifier()
optimizer = Adam(conv_classifier.parameters(), lr=learning_rate)

TrainModel(conv_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
100%|██████████| 118/118 [00:40<00:00,  2.93it/s]
Train Loss : 1.338
Valid Loss : 1.231
Valid Acc  : 0.461
100%|██████████| 118/118 [00:38<00:00,  3.08it/s]
Train Loss : 1.016
Valid Loss : 0.835
Valid Acc  : 0.674
100%|██████████| 118/118 [00:37<00:00,  3.11it/s]
Train Loss : 0.622
Valid Loss : 0.577
Valid Acc  : 0.783
100%|██████████| 118/118 [00:37<00:00,  3.12it/s]
Train Loss : 0.428
Valid Loss : 0.485
Valid Acc  : 0.821
100%|██████████| 118/118 [00:37<00:00,  3.14it/s]
Train Loss : 0.327
Valid Loss : 0.438
Valid Acc  : 0.844
100%|██████████| 118/118 [00:38<00:00,  3.10it/s]
Train Loss : 0.261
Valid Loss : 0.416
Valid Acc  : 0.853
100%|██████████| 118/118 [00:38<00:00,  3.06it/s]
Train Loss : 0.212
Valid Loss : 0.407
Valid Acc  : 0.859
100%|██████████| 118/118 [00:38<00:00,  3.07it/s]
Train Loss : 0.173
Valid Loss : 0.402
Valid Acc  : 0.862
100%|██████████| 118/118 [00:38<00:00,  3.10it/s]
Train Loss : 0.141
Valid Loss : 0.405
Valid Acc  : 0.865
100%|██████████| 118/118 [00:38<00:00,  3.07it/s]
Train Loss : 0.113
Valid Loss : 0.411
Valid Acc  : 0.866
100%|██████████| 118/118 [00:38<00:00,  3.04it/s]
Train Loss : 0.091
Valid Loss : 0.421
Valid Acc  : 0.863
100%|██████████| 118/118 [00:38<00:00,  3.08it/s]
Train Loss : 0.072
Valid Loss : 0.429
Valid Acc  : 0.867
100%|██████████| 118/118 [00:38<00:00,  3.09it/s]
Train Loss : 0.056
Valid Loss : 0.445
Valid Acc  : 0.867
100%|██████████| 118/118 [00:38<00:00,  3.05it/s]
Train Loss : 0.043
Valid Loss : 0.455
Valid Acc  : 0.868
100%|██████████| 118/118 [00:39<00:00,  3.02it/s]
Train Loss : 0.033
Valid Loss : 0.467
Valid Acc  : 0.868
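The log above can be read programmatically to find where validation loss bottoms out. The list below is copied from the printed "Valid Loss" values; picking the epoch with the lowest validation loss is a common heuristic and is offered here only as a sketch, not as part of the original training routine.

```python
# Validation losses copied from the training log above (epochs 1 to 15).
val_losses = [1.231, 0.835, 0.577, 0.485, 0.438, 0.416, 0.407, 0.402,
              0.405, 0.411, 0.421, 0.429, 0.445, 0.455, 0.467]

# Index of the smallest loss, converted to a 1-based epoch number.
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__) + 1

print(best_epoch)                  # 8
print(val_losses[best_epoch - 1])  # 0.402
```

Validation accuracy still creeps up slightly after epoch 8 (0.862 to 0.868), so the two signals do not agree exactly on the best stopping point.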

Evaluate Network Performance

In this section, we have evaluated the performance of our trained network by calculating the accuracy score, classification report (precision, recall, and f1-score per class), and confusion matrix on test predictions. Below, we have first defined a simple function for making predictions on a given data loader. Then, we have calculated the ML metrics using functions available from scikit-learn.

After calculating the metrics, we have also plotted the confusion matrix using the Python scikit-plot library. We can notice from the confusion matrix that our model does a better job for the categories Sports and World than for Sci/Tech and Business, which are most often confused with each other.

Scikit-plot has implementations of many ML metric plots.

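def MakePredictions(model, loader):
    Y_shuffled, Y_preds = [], []
    with torch.no_grad(): ## Gradients are not needed when only making predictions
        for X, Y in loader:
            preds = model(X)
            Y_preds.append(preds)
            Y_shuffled.append(Y)
    gc.collect()
    Y_preds, Y_shuffled = torch.cat(Y_preds), torch.cat(Y_shuffled)

    ## Softmax turns logits into probabilities; argmax picks the most probable class
    return Y_shuffled.detach().numpy(), F.softmax(Y_preds, dim=-1).argmax(dim=-1).detach().numpy()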

Y_actual, Y_preds = MakePredictions(conv_classifier, test_loader)
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
Test Accuracy : 0.8676315789473684

Classification Report :
              precision    recall  f1-score   support

       World       0.88      0.88      0.88      1900
      Sports       0.95      0.94      0.94      1900
    Business       0.83      0.83      0.83      1900
    Sci/Tech       0.82      0.82      0.82      1900

    accuracy                           0.87      7600
   macro avg       0.87      0.87      0.87      7600
weighted avg       0.87      0.87      0.87      7600


Confusion Matrix :
[[1673   55   92   80]
 [  56 1787   21   36]
 [  86   20 1570  224]
 [  93   25  218 1564]]
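The accuracy, precision, and recall values in the report can be re-derived from the confusion matrix printed above. This sketch copies the matrix by hand and assumes the scikit-learn convention that rows are actual classes and columns are predicted classes.

```python
# Confusion matrix copied from the output above (rows = actual, columns = predicted).
cm = [[1673,   55,   92,   80],
      [  56, 1787,   21,   36],
      [  86,   20, 1570,  224],
      [  93,   25,  218, 1564]]
classes = ["World", "Sports", "Business", "Sci/Tech"]
n = len(classes)

total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(n)) / total

# Recall divides the diagonal by the row sum, precision by the column sum.
recall = [cm[i][i] / sum(cm[i]) for i in range(n)]
precision = [cm[i][i] / sum(cm[r][i] for r in range(n)) for i in range(n)]

print("accuracy : {:.4f}".format(accuracy))  # accuracy : 0.8676
for name, p, r in zip(classes, precision, recall):
    print("{:<9} precision={:.2f} recall={:.2f}".format(name, p, r))
```

The off-diagonal entries 224 and 218 show that most remaining errors are Business and Sci/Tech examples being mistaken for each other.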
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np

skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
                                    normalize=True,
                                    title="Confusion Matrix",
                                    cmap="Reds",
                                    hide_zeros=True,
                                    figsize=(5,5)
                                    );
plt.xticks(rotation=90);

Explain Predictions using SHAP Values

In this section, we have tried to explain the predictions made by our network using SHAP values. We have then created a visualization using these SHAP values which highlights important words in the text that contributed to the prediction of a particular target label.

Below, we have simply retrieved the text examples and labels from the test dataset. To generate a visualization using SHAP values, we need to follow a few steps, which we explain next.

X_test_text, Y_test = [], []
for Y, X in test_dataset:
    X_test_text.append(X)
    Y_test.append(Y-1)

len(X_test_text)
7600

In order to generate SHAP values, we first need to create an instance of Explainer. The explainer requires a prediction function and a masker.

First, we have declared a prediction function that takes a list of text examples and returns the prediction probabilities generated by the model. Then, we have declared a masker using a regular expression. The masker defines how the text is split into tokens (here, on runs of non-word characters such as spaces and punctuation) and is used to hide tokens when SHAP perturbs the input to measure each token's contribution. After defining the prediction function and masker, we have created the Explainer using them as parameters. We'll use this explainer later to generate SHAP values.
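To get a feel for the pattern `r"\W+"`, the sketch below applies it with Python's standard `re` module. It only illustrates what the regular expression matches; how SHAP uses the pattern internally is not reproduced here.

```python
import re

text = "Hello, how are you? Welcome to CoderzColumn!!"

# Split on runs of non-word characters and drop empty strings.
pieces = [p for p in re.split(r"\W+", text) if p]

print(pieces)
# ['Hello', 'how', 'are', 'you', 'Welcome', 'to', 'CoderzColumn']
```

Unlike the "basic_english" tokenizer used for the network input, this split does not lowercase the text and does not keep punctuation as separate tokens.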

def make_predictions(X_batch_text):
    with torch.no_grad():
        X_batch = [vocab(tokenizer(text)) for text in X_batch_text]
        X_batch = [tokens+([0]* (max_tokens-len(tokens))) if len(tokens)<max_tokens else tokens[:max_tokens] for tokens in X_batch] ## Bringing all samples to max_tokens length.
        logits_preds = conv_classifier(torch.tensor(X_batch, dtype=torch.int32))
        return F.softmax(logits_preds, dim=-1)

masker = shap.maskers.Text(tokenizer=r"\W+")
explainer = shap.Explainer(make_predictions, masker=masker, output_names=target_classes)

explainer
<shap.explainers._partition.Partition at 0x7f1cb49b1090>

Below, we have first retrieved two text examples from the test set and made predictions on them using our trained network. We can notice that our network correctly predicts the target label for one text example out of two.

After printing the predicted labels, we have generated SHAP values for the selected text examples by calling our Explainer object with those two examples. Next, we'll visualize those SHAP values to see which words contributed to predicting a particular target label.

X_batch_text = X_test_text[3:5]
X_batch = [vocab(tokenizer(text)) for text in X_batch_text]
X_batch = [tokens+([0]* (max_tokens-len(tokens))) if len(tokens)<max_tokens else tokens[:max_tokens] for tokens in X_batch] ## Bringing all samples to max_tokens length.
logits_preds = conv_classifier(torch.tensor(X_batch, dtype=torch.int32))
preds_proba = F.softmax(logits_preds, dim=-1)
preds = preds_proba.argmax(axis=1)

print("Actual    Target Values : {}".format([target_classes[target] for target in Y_test[3:5]]))
print("Predicted Target Values : {}".format([target_classes[target] for target in preds]))
print("Predicted Probabilities : {}".format(preds_proba.max(axis=1).values))

shap_values = explainer(X_batch_text)
Actual    Target Values : ['Sci/Tech', 'Sci/Tech']
Predicted Target Values : ['World', 'Sci/Tech']
Predicted Probabilities : tensor([0.7268, 0.9564], grad_fn=<MaxBackward0>)

Below, we have generated a text plot by passing the SHAP values to the shap.text_plot() function. For the first example, words like 'colorful', 'maps', 'lightning', 'strike', 'forecast', 'wildfires', etc. contributed to predicting the target label as World, and for the second example, words like 'calif', 'aims', 'bovine', 'dairy', etc. contributed to predicting the target label as Sci/Tech.

shap.text_plot(shap_values)

Approach 2: Single Conv1D Layer Network (Max Tokens=50, Embedding Length=256, Conv Output Channels=64)

Our approach in this section is almost exactly the same as our approach in the previous section with minor changes in a few parameters. We are again using a network with only one convolution layer. We have increased embedding length from 128 to 256 and convolution layer output channels from 32 to 64. The majority of the code in this section is the same as in the previous section with only a change in parameter values mentioned.

Define Network

Below, we have defined the network that we'll use for our task in this section. It has exactly the same code as the previous section with only a change in embedding length and convolution layer output channels.

from torch import nn
from torch.nn import functional as F

embed_len = 256

class Conv1DClassifier(nn.Module):
    def __init__(self):
        super(Conv1DClassifier, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=len(vocab), embedding_dim=embed_len)
        self.conv1 = nn.Conv1d(embed_len, 64, kernel_size=7, padding="same")
        self.linear = nn.Linear(64, len(target_classes))

    def forward(self, X_batch):
        x = self.embedding_layer(X_batch)
        x = x.permute(0, 2, 1) ## Swap axes so embedding length is the channel dimension; reshape() would mix values across tokens
        x = F.relu(self.conv1(x))
        x, _ = x.max(dim=-1)
        x = self.linear(x)
        return x

Train Network

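Below, we have trained our network using exactly the same settings as in the previous section. We can notice from the loss and accuracy values that this model is also doing a good job at the text classification task, reaching a validation accuracy of about 87.5%. As before, validation loss is lowest early on (around epoch 6) and rises afterwards while training loss keeps falling.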

from torch.optim import Adam

epochs = 15
learning_rate = 1e-3

loss_fn = nn.CrossEntropyLoss()
conv_classifier = Conv1DClassifier()
optimizer = Adam(conv_classifier.parameters(), lr=learning_rate)

TrainModel(conv_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
100%|██████████| 118/118 [01:37<00:00,  1.21it/s]
Train Loss : 1.236
Valid Loss : 0.961
Valid Acc  : 0.633
100%|██████████| 118/118 [01:33<00:00,  1.26it/s]
Train Loss : 0.678
Valid Loss : 0.548
Valid Acc  : 0.800
100%|██████████| 118/118 [01:33<00:00,  1.27it/s]
Train Loss : 0.377
Valid Loss : 0.440
Valid Acc  : 0.842
100%|██████████| 118/118 [01:33<00:00,  1.26it/s]
Train Loss : 0.249
Valid Loss : 0.395
Valid Acc  : 0.862
100%|██████████| 118/118 [01:33<00:00,  1.26it/s]
Train Loss : 0.174
Valid Loss : 0.389
Valid Acc  : 0.868
100%|██████████| 118/118 [01:33<00:00,  1.26it/s]
Train Loss : 0.121
Valid Loss : 0.383
Valid Acc  : 0.868
100%|██████████| 118/118 [01:33<00:00,  1.27it/s]
Train Loss : 0.085
Valid Loss : 0.401
Valid Acc  : 0.866
100%|██████████| 118/118 [01:34<00:00,  1.25it/s]
Train Loss : 0.055
Valid Loss : 0.397
Valid Acc  : 0.872
100%|██████████| 118/118 [01:33<00:00,  1.26it/s]
Train Loss : 0.036
Valid Loss : 0.407
Valid Acc  : 0.873
100%|██████████| 118/118 [01:33<00:00,  1.26it/s]
Train Loss : 0.024
Valid Loss : 0.420
Valid Acc  : 0.875
100%|██████████| 118/118 [01:33<00:00,  1.26it/s]
Train Loss : 0.016
Valid Loss : 0.431
Valid Acc  : 0.874
100%|██████████| 118/118 [01:34<00:00,  1.25it/s]
Train Loss : 0.012
Valid Loss : 0.439
Valid Acc  : 0.875
100%|██████████| 118/118 [01:33<00:00,  1.26it/s]
Train Loss : 0.010
Valid Loss : 0.451
Valid Acc  : 0.876
100%|██████████| 118/118 [01:33<00:00,  1.27it/s]
Train Loss : 0.008
Valid Loss : 0.456
Valid Acc  : 0.875
100%|██████████| 118/118 [01:34<00:00,  1.25it/s]
Train Loss : 0.007
Valid Loss : 0.468
Valid Acc  : 0.875

Evaluate Network Performance

In this section, we have evaluated the performance of our trained network by calculating the accuracy score, classification report, and confusion matrix on test predictions. We can notice that the accuracy score is a little better than in our previous approach (about 87.5% versus 86.8%). We have also plotted the confusion matrix, which shows good accuracy for all target categories, with Business and Sci/Tech still the most frequently confused pair.

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

Y_actual, Y_preds = MakePredictions(conv_classifier, test_loader)

print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
Test Accuracy : 0.8751315789473684

Classification Report :
              precision    recall  f1-score   support

       World       0.89      0.88      0.89      1900
      Sports       0.94      0.95      0.94      1900
    Business       0.82      0.84      0.83      1900
    Sci/Tech       0.85      0.83      0.84      1900

    accuracy                           0.88      7600
   macro avg       0.88      0.88      0.88      7600
weighted avg       0.88      0.88      0.88      7600


Confusion Matrix :
[[1677   54  103   66]
 [  38 1796   34   32]
 [  98   24 1600  178]
 [  71   35  216 1578]]
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np

skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
                                    normalize=True,
                                    title="Confusion Matrix",
                                    cmap="Reds",
                                    hide_zeros=True,
                                    figsize=(5,5)
                                    );
plt.xticks(rotation=90);

Explain Predictions using SHAP Values

In this section, we have explained the predictions made by our network using SHAP values. This time our network makes a correct prediction for both selected text examples: it predicts both as the Sci/Tech category. From the visualization, we can notice that for the first example words like 'prediction', 'unit', 'forecast', 'wildfire', 'blur', 'charts', etc. are contributing to predicting the target label as Sci/Tech, whereas for the second example words like 'emissions', 'california', 'smog', 'pollution', 'southern', etc. are contributing to predicting the target label as Sci/Tech.

X_batch_text = X_test_text[3:5]
X_batch = [vocab(tokenizer(text)) for text in X_batch_text]
X_batch = [tokens+([0]* (max_tokens-len(tokens))) if len(tokens)<max_tokens else tokens[:max_tokens] for tokens in X_batch] ## Bringing all samples to max_tokens length.
logits_preds = conv_classifier(torch.tensor(X_batch, dtype=torch.int32))
preds_proba = F.softmax(logits_preds, dim=-1)
preds = preds_proba.argmax(axis=1)

print("Actual    Target Values : {}".format([target_classes[target] for target in Y_test[3:5]]))
print("Predicted Target Values : {}".format([target_classes[target] for target in preds]))
print("Predicted Probabilities : {}".format(preds_proba.max(axis=1).values))
Actual    Target Values : ['Sci/Tech', 'Sci/Tech']
Predicted Target Values : ['Sci/Tech', 'Sci/Tech']
Predicted Probabilities : tensor([0.8907, 0.9993], grad_fn=<MaxBackward0>)
masker = shap.maskers.Text(tokenizer=r"\W+")

explainer = shap.Explainer(make_predictions, masker=masker, output_names=target_classes)

shap_values = explainer(X_batch_text)

shap.text_plot(shap_values)

Approach 3: Multiple Conv1D Layers Network (Max Tokens=50, Embedding Length=256, Conv Output Channels=[32,32])

Our approach in this section uses a network with multiple convolution layers for the text classification task. Most other settings are the same as in the previous approaches; the only difference is that we use two 1D convolution layers in the network instead of one. The maximum number of tokens is set at 50 and the embedding length is set at 256.

Define Network

Below, we have defined the network that we have used in this section. The network consists of one embedding layer, two convolution layers, and one linear layer. As usual, we have first applied the embedding layer to the input data. Then, we have applied the first convolution layer followed by max-pooling, and then the second convolution layer. As in the earlier approaches, we take the maximum over the token dimension before applying the linear layer at the end. Both convolution layers have 32 output channels and a kernel size of 7.
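To see how the sequence length changes through this stack, the standard output-length formulas for 1D convolution and pooling can be applied by hand. The helper names are hypothetical, and the sketch assumes that padding="same" with a kernel size of 7 amounts to 3 positions of padding on each side.

```python
def conv1d_out_len(length, kernel_size, padding, stride=1):
    """Output length of a 1D convolution with dilation 1."""
    return (length + 2 * padding - kernel_size) // stride + 1

def maxpool1d_out_len(length, kernel_size):
    """Output length of max pooling whose stride equals its kernel size."""
    return (length - kernel_size) // kernel_size + 1

max_tokens = 50
after_conv1 = conv1d_out_len(max_tokens, kernel_size=7, padding=3)  # first convolution
after_pool = maxpool1d_out_len(after_conv1, kernel_size=2)          # MaxPool1d(2)
after_conv2 = conv1d_out_len(after_pool, kernel_size=7, padding=3)  # second convolution

print(after_conv1, after_pool, after_conv2)  # 50 25 25
```

The final max over the token dimension then collapses the 25 remaining positions into one value per channel, giving the (batch_size, 32) input expected by the linear layer.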

from torch import nn
from torch.nn import functional as F

embed_len = 256

class Conv1DClassifier(nn.Module):
    def __init__(self):
        super(Conv1DClassifier, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=len(vocab), embedding_dim=embed_len)
        self.conv1 = nn.Conv1d(embed_len, 32, kernel_size=7, padding="same")
        self.conv2 = nn.Conv1d(32, 32, kernel_size=7, padding="same")
        self.pooling = nn.MaxPool1d(2)
        self.linear = nn.Linear(32, len(target_classes))

    def forward(self, X_batch):
        x = self.embedding_layer(X_batch)
        x = x.permute(0, 2, 1) ## Swap axes so embedding length is the channel dimension; reshape() would mix values across tokens
        x = F.relu(self.conv1(x))
        x = self.pooling(x)
        x = F.relu(self.conv2(x))
        x, _ = x.max(dim=-1)
        x = self.linear(x)
        return x
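
To make the tensor shapes in the network above easier to follow, here is a minimal sketch that pushes a random batch through the same sequence of layers and records what comes out of each step. The sizes `vocab_size=1000` and `n_classes=4` are stand-ins for `len(vocab)` and `len(target_classes)`, and the batch of 8 is arbitrary. It also checks that swapping axes with `transpose` keeps each token's embedding vector intact as one channel column, whereas a `reshape` to the same target shape does not.

```python
import torch
from torch import nn
from torch.nn import functional as F

# Illustrative sizes only: vocab_size and n_classes stand in for len(vocab) and len(target_classes).
vocab_size, embed_len, max_tokens, n_classes = 1000, 256, 50, 4

embedding = nn.Embedding(vocab_size, embed_len)
conv1 = nn.Conv1d(embed_len, 32, kernel_size=7, padding="same")
conv2 = nn.Conv1d(32, 32, kernel_size=7, padding="same")
pooling = nn.MaxPool1d(2)
linear = nn.Linear(32, n_classes)

tokens = torch.randint(0, vocab_size, (8, max_tokens))  # batch of 8 padded examples

emb = embedding(tokens)        # (8, 50, 256): batch, tokens, embedding
x = emb.transpose(1, 2)        # (8, 256, 50): embedding axis becomes the channel axis
h1 = F.relu(conv1(x))          # (8, 32, 50): "same" padding keeps the length
h2 = pooling(h1)               # (8, 32, 25): pooling window of 2 halves the length
h3 = F.relu(conv2(h2))         # (8, 32, 25)
feats, _ = h3.max(dim=-1)      # (8, 32): global max over positions
logits = linear(feats)         # (8, 4): one score per class

# After transpose, channel column t of an example is exactly token t's embedding.
same_after_transpose = torch.equal(x[0, :, 3], emb[0, 3, :])
# reshape only reinterprets the memory layout, so the column mixes several tokens.
same_after_reshape = torch.equal(
    emb.reshape(8, embed_len, max_tokens)[0, :, 3], emb[0, 3, :]
)

# Trainable parameters in the two convolution layers (weights + biases).
conv_params = sum(p.numel() for layer in (conv1, conv2) for p in layer.parameters())

print(logits.shape, same_after_transpose, same_after_reshape, conv_params)
```

The two convolution layers together hold 256*32*7 + 32 + 32*32*7 + 32 = 64,576 parameters; the embedding table, whose size depends on the vocabulary, is usually far larger.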

Train Network

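Here, we have trained our network using the same settings as in our previous approaches (here, 15 epochs with the Adam optimizer, a learning rate of 1e-3, and cross-entropy loss). The printed values show that validation accuracy reaches about 0.88 by the fifth epoch and stays there. Validation loss, however, bottoms out at 0.360 in that same epoch and then climbs to 0.575 while training loss keeps falling to 0.004, which indicates that the network overfits during the later epochs.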

from torch.optim import Adam

epochs = 15
learning_rate = 1e-3

loss_fn = nn.CrossEntropyLoss()
conv_classifier = Conv1DClassifier()
optimizer = Adam(conv_classifier.parameters(), lr=learning_rate)

TrainModel(conv_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
100%|██████████| 118/118 [01:18<00:00,  1.50it/s]
Train Loss : 1.254
Valid Loss : 0.900
Valid Acc  : 0.628
100%|██████████| 118/118 [01:14<00:00,  1.59it/s]
Train Loss : 0.604
Valid Loss : 0.489
Valid Acc  : 0.817
100%|██████████| 118/118 [01:16<00:00,  1.55it/s]
Train Loss : 0.343
Valid Loss : 0.400
Valid Acc  : 0.858
100%|██████████| 118/118 [01:15<00:00,  1.57it/s]
Train Loss : 0.239
Valid Loss : 0.363
Valid Acc  : 0.873
100%|██████████| 118/118 [01:14<00:00,  1.58it/s]
Train Loss : 0.173
Valid Loss : 0.360
Valid Acc  : 0.876
100%|██████████| 118/118 [01:15<00:00,  1.57it/s]
Train Loss : 0.123
Valid Loss : 0.373
Valid Acc  : 0.877
100%|██████████| 118/118 [01:15<00:00,  1.55it/s]
Train Loss : 0.086
Valid Loss : 0.385
Valid Acc  : 0.881
100%|██████████| 118/118 [01:16<00:00,  1.55it/s]
Train Loss : 0.054
Valid Loss : 0.411
Valid Acc  : 0.879
100%|██████████| 118/118 [01:15<00:00,  1.56it/s]
Train Loss : 0.032
Valid Loss : 0.445
Valid Acc  : 0.878
100%|██████████| 118/118 [01:16<00:00,  1.55it/s]
Train Loss : 0.018
Valid Loss : 0.472
Valid Acc  : 0.880
100%|██████████| 118/118 [01:16<00:00,  1.54it/s]
Train Loss : 0.011
Valid Loss : 0.499
Valid Acc  : 0.878
100%|██████████| 118/118 [01:17<00:00,  1.53it/s]
Train Loss : 0.008
Valid Loss : 0.524
Valid Acc  : 0.879
100%|██████████| 118/118 [01:16<00:00,  1.55it/s]
Train Loss : 0.006
Valid Loss : 0.550
Valid Acc  : 0.878
100%|██████████| 118/118 [01:16<00:00,  1.55it/s]
Train Loss : 0.005
Valid Loss : 0.560
Valid Acc  : 0.880
100%|██████████| 118/118 [01:15<00:00,  1.56it/s]
Train Loss : 0.004
Valid Loss : 0.575
Valid Acc  : 0.879
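
The log above can be used to check where training could have stopped. The sketch below is not part of the `TrainModel` routine; it copies the printed validation losses into a list and applies a simple early-stopping rule. The function name `early_stop_epoch` and the patience of 3 epochs are assumptions made for illustration.

```python
# Validation losses copied from the 15-epoch training log above.
valid_losses = [0.900, 0.489, 0.400, 0.363, 0.360, 0.373, 0.385, 0.411,
                0.445, 0.472, 0.499, 0.524, 0.550, 0.560, 0.575]

def early_stop_epoch(losses, patience=3):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to
    improve on the best validation loss seen so far.
    """
    best_loss, best_epoch, bad_epochs = float("inf"), 0, 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    return best_epoch, len(losses)

best_epoch, stop_epoch = early_stop_epoch(valid_losses, patience=3)
print(best_epoch, stop_epoch)  # 5 8
```

With this rule, the weights from epoch 5 would be kept and training would end after epoch 8, roughly halving the training time for this run.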

Evaluate Network Performance

In this section, we have evaluated the performance of our trained network by calculating the accuracy score, classification report, and confusion matrix on test predictions. The test accuracy of about 87.9% is slightly higher than that of our previous approaches (86.76% and 87.51%). We have also plotted the confusion matrix for reference purposes.

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

Y_actual, Y_preds = MakePredictions(conv_classifier, test_loader)

print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
Test Accuracy : 0.8794736842105263

Classification Report :
              precision    recall  f1-score   support

       World       0.88      0.88      0.88      1900
      Sports       0.95      0.94      0.94      1900
    Business       0.85      0.83      0.84      1900
    Sci/Tech       0.84      0.86      0.85      1900

    accuracy                           0.88      7600
   macro avg       0.88      0.88      0.88      7600
weighted avg       0.88      0.88      0.88      7600


Confusion Matrix :
[[1679   52   90   79]
 [  58 1783   19   40]
 [ 105   15 1579  201]
 [  57   30  170 1643]]

## Plot the normalized confusion matrix with class names on both axes.
import scikitplot as skplt
import matplotlib.pyplot as plt

skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
                                    normalize=True,
                                    title="Confusion Matrix",
                                    cmap="Reds",
                                    hide_zeros=True,
                                    figsize=(5,5)
                                    );
plt.xticks(rotation=90);
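
The numbers in the classification report can be re-derived from the printed confusion matrix alone, which is a useful sanity check when reading such reports. The sketch below copies the matrix from the output above (rows are actual classes, columns are predicted classes) and computes accuracy, precision, and recall in plain Python; it is an illustration of the arithmetic, not how scikit-learn implements it.

```python
# Confusion matrix copied from the output above (rows = actual, columns = predicted).
labels = ["World", "Sports", "Business", "Sci/Tech"]
cm = [[1679,   52,   90,   79],
      [  58, 1783,   19,   40],
      [ 105,   15, 1579,  201],
      [  57,   30,  170, 1643]]

total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(len(cm))) / total  # correct predictions / all predictions

report = {}
for i, label in enumerate(labels):
    predicted_as_i = sum(row[i] for row in cm)  # column sum: everything predicted as class i
    actually_i = sum(cm[i])                     # row sum: everything that truly is class i
    precision = cm[i][i] / predicted_as_i
    recall = cm[i][i] / actually_i
    report[label] = (round(precision, 2), round(recall, 2))

print(round(accuracy, 4))
for label, (p, r) in report.items():
    print(f"{label:<9} precision={p:.2f} recall={r:.2f}")
```

The off-diagonal entries also show where the errors concentrate: 201 Business examples are predicted as Sci/Tech and 170 Sci/Tech examples are predicted as Business, which is why those two classes have the lowest scores.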


Explain Predictions using SHAP Values

In this section, we have explained the predictions made by the network by generating SHAP values. Our network correctly predicts the target label Sci/Tech for both selected text examples. The visualization shows that words like 'prediction', 'wildfires', 'dawn', 'charts', 'blur', and 'colorful' contribute to predicting Sci/Tech for the first example, and words like 'bovine', 'limit', 'pollution', and 'emission' contribute to predicting Sci/Tech for the second example.

X_batch_text = X_test_text[3:5]
X_batch = [vocab(tokenizer(text)) for text in X_batch_text]
X_batch = [tokens+([0]* (max_tokens-len(tokens))) if len(tokens)<max_tokens else tokens[:max_tokens] for tokens in X_batch] ## Bringing all samples to max_tokens length.
logits_preds = conv_classifier(torch.tensor(X_batch, dtype=torch.int32))
preds_proba = F.softmax(logits_preds, dim=-1)
preds = preds_proba.argmax(axis=1)

print("Actual    Target Values : {}".format([target_classes[target] for target in Y_test[3:5]]))
print("Predicted Target Values : {}".format([target_classes[target] for target in preds]))
print("Predicted Probabilities : {}".format(preds_proba.max(axis=1).values))
Actual    Target Values : ['Sci/Tech', 'Sci/Tech']
Predicted Target Values : ['Sci/Tech', 'Sci/Tech']
Predicted Probabilities : tensor([0.9888, 0.9998], grad_fn=<MaxBackward0>)
masker = shap.maskers.Text(tokenizer=r"\W+")

explainer = shap.Explainer(make_predictions, masker=masker, output_names=target_classes)

shap_values = explainer(X_batch_text)

shap.text_plot(shap_values)
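
The one-line list comprehension that brings every tokenized example to exactly `max_tokens` ids is easy to misread, so here is the same logic as a small helper. The name `pad_or_truncate` is hypothetical; the padding id of 0 follows the snippet above.

```python
def pad_or_truncate(token_ids, max_tokens=50, pad_id=0):
    """Return a list of exactly max_tokens ids: cut long inputs, right-pad short ones."""
    clipped = token_ids[:max_tokens]
    return clipped + [pad_id] * (max_tokens - len(clipped))

short = pad_or_truncate([5, 9, 2], max_tokens=6)
long = pad_or_truncate(list(range(1, 11)), max_tokens=6)
print(short)  # [5, 9, 2, 0, 0, 0]
print(long)   # [1, 2, 3, 4, 5, 6]
```

Any function passed to the SHAP explainer has to apply the same padding to the masked texts it receives, since the network only accepts fixed-length input.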


5. Results Summary And Further Recommendations

Below, we have listed a summary of approaches tried and their respective accuracy on the test set.

Approach                         Max Tokens   Embedding Length   Conv1D Output Channels   Test Accuracy (%)
Single Conv1D Layer Network      50           128                32                       86.76
Single Conv1D Layer Network      50           256                64                       87.51
Multiple Conv1D Layers Network   50           256                32,32                    87.94

Further Suggestions

Below we have given some suggestions which can be tried to further improve network accuracy.

  • Train network for more epochs.
  • Initialize the network with different weight initialization methods.
  • Try different max tokens per text example.
  • Try different embedding lengths.
  • Try a different number of convolution layers, output channels, or kernel sizes.
  • Try average pooling instead of max pooling.
  • Add more linear layers after convolution layers.
  • Try learning rate schedulers.
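
As one illustration of the last suggestion, the sketch below attaches PyTorch's `StepLR` scheduler to an Adam optimizer. The schedule (halve the learning rate every 5 epochs) is an assumption chosen for the example, not a tuned setting, and a small linear layer stands in for the `Conv1DClassifier` so that the snippet runs on its own.

```python
import torch
from torch import nn
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(32, 4)                               # stand-in for Conv1DClassifier()
optimizer = Adam(model.parameters(), lr=1e-3)          # same starting rate as in the tutorial
scheduler = StepLR(optimizer, step_size=5, gamma=0.5)  # assumed schedule: halve every 5 epochs

lrs = []
for epoch in range(15):
    lrs.append(optimizer.param_groups[0]["lr"])  # learning rate used during this epoch
    # ... one full pass over train_loader would go here ...
    optimizer.step()   # placeholder for the real per-batch optimizer steps
    scheduler.step()   # advance the schedule once per epoch

print(lrs[0], lrs[5], lrs[10])
```

In a real training loop, `scheduler.step()` would be called once per epoch after all batches have been processed.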

This ends our small tutorial explaining how we can use 1D convolution layers in creating neural networks (for capturing text sequences) using PyTorch that are used for NLP tasks like text classification. Please feel free to let us know your views in the comments section.


Sunny Solanki
