Updated On : Mar-29,2022 Tags word-embeddings, pytorch…
Word Embeddings for PyTorch Text Classification Networks

Traditional text vectorization approaches like word frequency or TF-IDF (Term Frequency - Inverse Document Frequency) use one float value to represent one word/token. These approaches work well for many NLP tasks like text classification. But because they use only one value per word, they cannot capture much information about the word, such as its context or meaning. Word embeddings were introduced to address this problem. A word embedding uses a list of float values, generally referred to as a vector, to represent a single word/token. Because words are now vectors, we can compare them by calculating the distance between them. Words with similar meanings generally tend to be near each other, and words commonly used in the same context also tend to appear near one another. For example, words that commonly appear in computer science articles will be near each other, i.e. the distance between the vectors representing them will be smaller than their distance to words used in other fields like medical science, space, etc.
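
To make the idea of "distance between word vectors" concrete, here is a minimal sketch that compares vectors using cosine similarity. The three 4-dimensional vectors are made-up values chosen for illustration, not embeddings learned by any model, and the helper name `cosine_similarity` is ours.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (values invented for illustration only).
toy_embeddings = {
    "compiler": [0.9, 0.8, 0.1, 0.0],
    "kernel":   [0.8, 0.9, 0.2, 0.1],
    "surgery":  [0.1, 0.0, 0.9, 0.8],
}

sim_cs = cosine_similarity(toy_embeddings["compiler"], toy_embeddings["kernel"])
sim_med = cosine_similarity(toy_embeddings["compiler"], toy_embeddings["surgery"])

print("compiler vs kernel  : {:.3f}".format(sim_cs))
print("compiler vs surgery : {:.3f}".format(sim_med))
```

With these toy values, the two computer science words score much higher with each other than with the medical word, which is the behaviour we hope trained embeddings will show.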

As a part of this tutorial, we'll train PyTorch neural networks on the AG NEWS text dataset to classify text documents into one of 4 categories (["World", "Sports", "Business", "Sci/Tech"]). To do that, we'll use word embeddings to vectorize the words of text documents. We have explained a few different approaches to using word embeddings. The network initializes word embeddings with random numbers and updates them with meaningful values as we train it on data.

Below, we have listed the important sections of the tutorial to give an overview of the material covered.

Important Sections Of Tutorial

  1. Prepare Dataset
    • 1.1 Load Dataset
    • 1.2 Tokenize Text Data And Build Vocabulary
    • 1.3 Create Data Loaders (Vectorize Text Data)
  2. Approach 1: Word Embeddings
    • 2.1 Define Model
    • 2.2 Train Model
    • 2.3 Evaluate Model Performance
    • 2.4 Explain Predictions Using SHAP Values
  3. Approach 2: Word Embeddings With More Embeddings
  4. Approach 3: Average Word Embeddings
  5. Approach 4: PyTorch EmbeddingBag Layer (Averaged Embeddings)
  6. Approach 5: PyTorch EmbeddingBag Layer (Summed Embeddings)

Below, we have imported important libraries and printed the versions that we have used in our tutorial. We have used SHAP python library to explain predictions made by our network. We have called initjs() method on it below to initialize it as well.

In [1]:
import torch

print("PyTorch Version : {}".format(torch.__version__))
PyTorch Version : 1.9.1+cpu
In [2]:
import torchtext

print("Torch Text Version : {}".format(torchtext.__version__))
Torch Text Version : 0.10.1
In [3]:
import shap

print("SHAP Version : {}".format(shap.__version__))
SHAP Version : 0.40.0
In [ ]:
shap.initjs()

1. Prepare Dataset

In this section, we have prepared our dataset to be fed into a neural network for training. In order to prepare the dataset, we have loaded the dataset, populated vocabulary with the words of text documents, and then created data loaders that will map words to their indexes according to vocabulary. Later on, when we give this list of word indexes (according to vocabulary) as input to the network, the embedding layer will map indexes to their respective embeddings.

1.1 Load Dataset

In this section, we have simply loaded AG NEWS dataset available from torchtext library. It has text documents for 4 different categories (["World", "Sports", "Business", "Sci/Tech"]). The dataset is already divided into train and test sets.

In [5]:
from torch.utils.data import DataLoader

train_dataset, test_dataset  = torchtext.datasets.AG_NEWS()
#train_dataset, test_dataset  = torchtext.datasets.YelpReviewFull()
train.csv: 29.5MB [00:00, 91.7MB/s]
test.csv: 1.86MB [00:00, 48.2MB/s]

1.2 Tokenize Text Data And Build Vocabulary

In this section, we have first created a tokenizer using get_tokenizer() function. We have asked it to create a simple tokenizer that separates words from the sentence. It takes text as input and returns a list of tokens/words as output.

Then, we have built a vocabulary using build_vocab_from_iterator() function. This function returns an instance of Vocab that maps words/tokens to their respective indexes. In order to create a vocabulary using build_vocab_from_iterator(), we need to provide it with an iterator that yields lists of tokens/words. We have created a small generator function that takes a list of datasets as input and loops through all datasets and their text documents. For each text document, it yields a list of tokens/words produced by the tokenizer we created. We have also added a special "<UNK>" token and set it as the default index, so words that are not present in the vocabulary are mapped to it.

After populating the vocabulary, we have printed its size and shown one example of how it maps words/tokens to their indexes. We'll be giving these indexes as input to our neural networks.

In [6]:
from torchtext.data import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator
import re

def tokenizer(inp_str): ## This method is one way of creating tokenizer that looks for word tokens
    return re.findall(r"\w+", inp_str)

tokenizer = get_tokenizer("basic_english") ## We'll use tokenizer available from PyTorch

def build_vocab(datasets):
    for dataset in datasets:
        for _, text in dataset:
            yield tokenizer(text)

vocab = build_vocab_from_iterator(build_vocab([train_dataset, test_dataset]), specials=["<UNK>"])
vocab.set_default_index(vocab["<UNK>"])
In [7]:
len(vocab)
Out[7]:
98635
In [8]:
tokens = tokenizer("Hello how are you?")
indexes = vocab(tokens)

tokens, indexes
Out[8]:
(['hello', 'how', 'are', 'you', '?'], [12388, 355, 42, 164, 80])
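
To show what a vocabulary does conceptually, here is a simplified pure-Python stand-in. It is only a sketch and not torchtext's actual implementation: the helpers `build_toy_vocab` and `lookup` are hypothetical names, and the two tiny documents are invented for the example.

```python
from collections import Counter

def build_toy_vocab(token_lists, specials=("<UNK>",)):
    """Map each token to an integer index: specials first, then tokens by descending frequency."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    index_to_token = list(specials) + [tok for tok, _ in counts.most_common()]
    return {tok: idx for idx, tok in enumerate(index_to_token)}

def lookup(token_to_index, tokens, default_token="<UNK>"):
    """Convert tokens to indexes, falling back to the default token for unseen words."""
    default_index = token_to_index[default_token]
    return [token_to_index.get(tok, default_index) for tok in tokens]

# Two invented, already tokenized documents.
docs = [["hello", "how", "are", "you"], ["how", "are", "things"]]
toy_vocab = build_toy_vocab(docs)

indexes = lookup(toy_vocab, ["how", "are", "zebra"])
print(toy_vocab)
print(indexes)  # "zebra" was never seen, so it gets the index of "<UNK>"
```

The unseen word falls back to the index of "<UNK>", which mirrors what set_default_index() does for the real vocabulary above.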

1.3 Create Data Loaders (Vectorize Text Data)

In this section, we have created train and test data loaders that we'll use during the training process to go through data. To create data loaders, we have loaded train and test datasets again and given them to DataLoader() constructor as input. In order to tokenize and vectorize text documents, we have created a helper function that is given to collate_fn argument of the DataLoader() constructor. This function is applied to each batch when we loop through data using data loaders. It loops through the text documents of a batch, tokenizes them, and then vectorizes tokens/words using the vocabulary. We have decided to keep a maximum of 50 tokens per document. To do that, we have truncated documents that have more than 50 tokens and padded (with zeros) documents that have fewer than 50 tokens. The number of tokens to keep per document is a hyperparameter. We have kept it at 50 but different values can be tried to check whether any of them improves the accuracy of the model.

After the batch passes through this function, it returns a list of indexes (of length 50) per text document and their respective target labels. The target labels of four categories (["World", "Sports", "Business", "Sci/Tech"]) are in the range [1,4] which we have mapped to [0,3] for simplicity. We have kept batch size as 1024 hence for each batch, we'll get data of shape [1024,50] (1024=batch size, 50=tokens per text document) and 1024 target labels.

In [9]:
from torch.utils.data import DataLoader
from torchtext.data.functional import to_map_style_dataset

def vectorize_batch(batch):
    Y, X = list(zip(*batch))
    X = [vocab(tokenizer(sample)) for sample in X]
    X = [sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X] ## Bringing all samples to 50 length.
    return torch.tensor(X, dtype=torch.int32), torch.tensor(Y) - 1 ## We have deducted 1 from target labels to get them in range [0,1,2,3] from [1,2,3,4]

train_dataset, test_dataset  = torchtext.datasets.AG_NEWS()
train_dataset, test_dataset = to_map_style_dataset(train_dataset), to_map_style_dataset(test_dataset)

target_classes = ["World", "Sports", "Business", "Sci/Tech"]

train_loader = DataLoader(train_dataset, batch_size=1024, collate_fn=vectorize_batch)
test_loader  = DataLoader(test_dataset, batch_size=1024, collate_fn=vectorize_batch)
In [10]:
for X, Y in train_loader:
    print(X.shape, Y.shape)
    break
torch.Size([1024, 50]) torch.Size([1024])
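
The truncate-or-pad step inside vectorize_batch() can be isolated into a small helper to see what it does to a single document. This is a minimal sketch: the function name `pad_or_truncate` and the 60-token document are our own illustrative choices.

```python
def pad_or_truncate(indexes, max_len=50, pad_index=0):
    """Return exactly max_len indexes: cut long inputs, right-pad short ones."""
    if len(indexes) >= max_len:
        return indexes[:max_len]
    return indexes + [pad_index] * (max_len - len(indexes))

short_doc = [12388, 355, 42, 164, 80]   # indexes of "Hello how are you?" from the earlier cell
long_doc = list(range(1, 61))           # hypothetical 60-token document

padded = pad_or_truncate(short_doc)
truncated = pad_or_truncate(long_doc)

print(len(padded), padded[:7])
print(len(truncated), truncated[-1])
```

Note that index 0 is also the index of "<UNK>" in the vocabulary built earlier (special tokens are placed first by default), so padded positions and unknown words share the same embedding.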

2. Approach 1: Word Embeddings

In this example, we have explained our first word embeddings approach that uses an embedding vector of length 25 per word and flattens the embeddings of all words of a text example before giving them to linear layers.

2.1 Define Model

Below, we have defined the neural network that we'll use to classify text documents. The network has one embedding layer and 3 linear layers.

  • Embedding Layer - The embedding layer has shape [vocab_len, 25]. This will create a weight matrix of the same shape, hence each word will be mapped to an embedding (a float vector of length 25). The embedding layer takes indexes in the range [0, vocab_len-1] as input and maps each one to its respective embedding (float vector). The output of the embedding layer is flattened before giving it to the linear layer.
  • The first linear layer has 1250 input units and 128 output units. The number of input units is 25 (embedding length) multiplied by 50 (words per text example). We have applied relu activation to the output of the linear layer.
  • The second linear layer has 128 input units and 64 output units. The relu activation is applied to the output of the second linear layer as well.
  • The third linear layer has 64 input units and 4 output units (number of target classes/labels).

We have created the network using the Sequential API of PyTorch. Please feel free to check the below tutorial if you want to learn about how to design a neural network using PyTorch.

In [11]:
from torch import nn
from torch.nn import functional as F

class EmbeddingClassifier(nn.Module):
    def __init__(self):
        super(EmbeddingClassifier, self).__init__()
        self.seq = nn.Sequential(
            nn.Embedding(num_embeddings=len(vocab), embedding_dim=25),

            nn.Flatten(),

            nn.Linear(25*50, 128), ## 25 = embedding length, 50 = words we kept per text example
            nn.ReLU(),

            nn.Linear(128,64),
            nn.ReLU(),

            nn.Linear(64, len(target_classes)),
        )

    def forward(self, X_batch):
        return self.seq(X_batch)
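
The layer sizes described above can be checked with a quick shape trace. This is a minimal sketch and not part of the original notebook: `toy_vocab_size` and the batch of random indexes are assumptions made so that the snippet runs without loading the dataset.

```python
import torch
from torch import nn

toy_vocab_size = 1000                     # assumption: small stand-in for len(vocab)
batch_size, seq_len, embed_dim = 4, 50, 25

embedding = nn.Embedding(num_embeddings=toy_vocab_size, embedding_dim=embed_dim)
flatten = nn.Flatten()
linear = nn.Linear(embed_dim * seq_len, 128)

# Random token indexes standing in for one vectorized batch.
X = torch.randint(0, toy_vocab_size, (batch_size, seq_len))

embedded = embedding(X)   # one 25-length vector per token
flat = flatten(embedded)  # the 50 vectors of a document laid side by side
hidden = linear(flat)     # output of the first linear layer

print(embedded.shape, flat.shape, hidden.shape)
```

The flattened tensor has 50 x 25 = 1250 values per document, which is why the first linear layer has 1250 input units.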

2.2 Train Model

In this section, we have trained the network we defined in the previous section. In order to train the network, we have defined a simple helper function. The function takes the model, loss function, optimizer, train loader, validation loader, and number of epochs as input. It then executes the training loop for the given number of epochs. For each epoch, it loops through training data in batches using the training data loader. For each batch, it performs a forward pass to make predictions, calculates loss, calculates gradients using backpropagation, and updates network weights using the gradients. It keeps track of the loss for each batch and prints the average loss after the completion of each epoch. We have also created another helper function that loops through the validation loader to calculate validation loss and accuracy and prints them.

In [12]:
from tqdm import tqdm
from sklearn.metrics import accuracy_score
import gc

def CalcValLossAndAccuracy(model, loss_fn, val_loader):
    with torch.no_grad():
        Y_shuffled, Y_preds, losses = [],[],[]
        for X, Y in val_loader:
            preds = model(X)
            loss = loss_fn(preds, Y)
            losses.append(loss.item())

            Y_shuffled.append(Y)
            Y_preds.append(preds.argmax(dim=-1))

        Y_shuffled = torch.cat(Y_shuffled)
        Y_preds = torch.cat(Y_preds)

        print("Valid Loss : {:.3f}".format(torch.tensor(losses).mean()))
        print("Valid Acc  : {:.3f}".format(accuracy_score(Y_shuffled.detach().numpy(), Y_preds.detach().numpy())))


def TrainModel(model, loss_fn, optimizer, train_loader, val_loader, epochs=10):
    for i in range(1, epochs+1):
        losses = []
        for X, Y in tqdm(train_loader):
            Y_preds = model(X)

            loss = loss_fn(Y_preds, Y)
            losses.append(loss.item())

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        print("Train Loss : {:.3f}".format(torch.tensor(losses).mean()))
        CalcValLossAndAccuracy(model, loss_fn, val_loader)

Below, we have trained our network by calling the training routine we designed in the previous cell. We have set the number of epochs to 15 and the learning rate to 0.001. Then, we have initialized the cross entropy loss function, our classifier, and the Adam optimizer. At last, we have called our training routine with the necessary parameters to perform the training process. Note that we have used the test data loader as the validation loader.

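We can notice from the loss and accuracy printed after each epoch that our model reaches a validation accuracy of around 82%. The results are not ideal, though: the validation loss is lowest at epoch 5 (0.575) and rises afterwards while the training loss keeps falling, which suggests that the network starts overfitting the training data.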

In [13]:
from torch.optim import Adam

epochs = 15
learning_rate = 1e-3

loss_fn = nn.CrossEntropyLoss()
embed_classifier = EmbeddingClassifier()
optimizer = Adam(embed_classifier.parameters(), lr=learning_rate)

TrainModel(embed_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
100%|██████████| 118/118 [00:15<00:00,  7.56it/s]
Train Loss : 1.214
Valid Loss : 1.033
Valid Acc  : 0.562
100%|██████████| 118/118 [00:15<00:00,  7.64it/s]
Train Loss : 0.849
Valid Loss : 0.785
Valid Acc  : 0.687
100%|██████████| 118/118 [00:15<00:00,  7.76it/s]
Train Loss : 0.604
Valid Loss : 0.660
Valid Acc  : 0.752
100%|██████████| 118/118 [00:15<00:00,  7.68it/s]
Train Loss : 0.448
Valid Loss : 0.593
Valid Acc  : 0.786
100%|██████████| 118/118 [00:15<00:00,  7.81it/s]
Train Loss : 0.335
Valid Loss : 0.575
Valid Acc  : 0.801
100%|██████████| 118/118 [00:15<00:00,  7.69it/s]
Train Loss : 0.246
Valid Loss : 0.604
Valid Acc  : 0.805
100%|██████████| 118/118 [00:15<00:00,  7.67it/s]
Train Loss : 0.177
Valid Loss : 0.646
Valid Acc  : 0.809
100%|██████████| 118/118 [00:15<00:00,  7.65it/s]
Train Loss : 0.132
Valid Loss : 0.701
Valid Acc  : 0.812
100%|██████████| 118/118 [00:15<00:00,  7.74it/s]
Train Loss : 0.111
Valid Loss : 1.015
Valid Acc  : 0.774
100%|██████████| 118/118 [00:15<00:00,  7.60it/s]
Train Loss : 0.113
Valid Loss : 0.769
Valid Acc  : 0.819
100%|██████████| 118/118 [00:15<00:00,  7.73it/s]
Train Loss : 0.081
Valid Loss : 0.847
Valid Acc  : 0.818
100%|██████████| 118/118 [00:15<00:00,  7.75it/s]
Train Loss : 0.052
Valid Loss : 0.973
Valid Acc  : 0.808
100%|██████████| 118/118 [00:15<00:00,  7.65it/s]
Train Loss : 0.040
Valid Loss : 1.136
Valid Acc  : 0.799
100%|██████████| 118/118 [00:15<00:00,  7.72it/s]
Train Loss : 0.035
Valid Loss : 1.046
Valid Acc  : 0.822
100%|██████████| 118/118 [00:15<00:00,  7.76it/s]
Train Loss : 0.029
Valid Loss : 1.106
Valid Acc  : 0.824
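
One simple way to read the log above is to look for the epoch with the lowest validation loss. The sketch below copies the validation losses printed above into a list (the variable names are ours) and finds that epoch.

```python
# Validation losses copied from the training log above (epochs 1 to 15).
valid_losses = [1.033, 0.785, 0.660, 0.593, 0.575, 0.604, 0.646, 0.701,
                1.015, 0.769, 0.847, 0.973, 1.136, 1.046, 1.106]

# Index of the smallest loss, converted to a 1-based epoch number.
best_epoch = min(range(len(valid_losses)), key=lambda i: valid_losses[i]) + 1
best_loss = valid_losses[best_epoch - 1]

print("Lowest validation loss {:.3f} at epoch {}".format(best_loss, best_epoch))
```

The lowest validation loss occurs at epoch 5. Later epochs keep lowering the training loss but not the validation loss, so training for fewer epochs is worth trying.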

2.3 Evaluate Model Performance

In this section, we have evaluated the performance of our network by calculating accuracy, classification report and confusion matrix metrics on test predictions. We have created a small helper function that takes the model and a data loader as input and returns actual labels and predictions. We have calculated the ML metrics using functions available from scikit-learn.

If you want to learn about various ML metrics available through scikit-learn then please check the below link that covers the majority of them in detail.

After calculating metrics, we have also plotted the confusion matrix using scikit-plot. We can notice from the plot that our model performs better for the categories [Sports, World] than for [Business, Sci/Tech]. Please feel free to check the below link if you are interested in learning about scikit-plot. It provides visualizations for many commonly used ML metrics.

In [14]:
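def MakePredictions(model, loader):
    Y_shuffled, Y_preds = [], []
    with torch.no_grad(): ## Gradients are not needed when only making predictions
        for X, Y in loader:
            preds = model(X)
            Y_preds.append(preds)
            Y_shuffled.append(Y)
    gc.collect()
    Y_preds, Y_shuffled = torch.cat(Y_preds), torch.cat(Y_shuffled)

    ## Returns actual labels and predicted labels (class with highest probability)
    return Y_shuffled.detach().numpy(), F.softmax(Y_preds, dim=-1).argmax(dim=-1).detach().numpy()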

Y_actual, Y_preds = MakePredictions(embed_classifier, test_loader)
In [15]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
Test Accuracy : 0.8238157894736842

Classification Report :
              precision    recall  f1-score   support

       World       0.87      0.79      0.83      1900
      Sports       0.91      0.92      0.91      1900
    Business       0.74      0.82      0.78      1900
    Sci/Tech       0.79      0.76      0.78      1900

    accuracy                           0.82      7600
   macro avg       0.83      0.82      0.82      7600
weighted avg       0.83      0.82      0.82      7600


Confusion Matrix :
[[1508   93  172  127]
 [  60 1745   50   45]
 [  88   35 1557  220]
 [  72   46  331 1451]]
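
The numbers in the classification report can be recomputed directly from the confusion matrix. The sketch below copies the matrix printed above (rows are actual classes, columns are predicted classes) and derives accuracy, precision and recall with plain Python; the variable names are ours.

```python
# Confusion matrix copied from the output above (rows = actual, columns = predicted).
labels = ["World", "Sports", "Business", "Sci/Tech"]
cm = [[1508,   93,  172,  127],
      [  60, 1745,   50,   45],
      [  88,   35, 1557,  220],
      [  72,   46,  331, 1451]]

total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(len(cm)))   # diagonal = correct predictions
accuracy = correct / total
print("Accuracy : {:.4f}".format(accuracy))

recalls, precisions = [], []
for i, name in enumerate(labels):
    recall = cm[i][i] / sum(cm[i])                    # correct / all actual samples of class i
    precision = cm[i][i] / sum(row[i] for row in cm)  # correct / all samples predicted as class i
    recalls.append(recall)
    precisions.append(precision)
    print("{:<9} precision={:.2f} recall={:.2f}".format(name, precision, recall))
```

The largest off-diagonal entries are 331 (Sci/Tech predicted as Business) and 220 (Business predicted as Sci/Tech), so these two categories are the ones the model confuses most often.
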
In [ ]:
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np

skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
                                    normalize=True,
                                    title="Confusion Matrix",
                                    cmap="Purples",
                                    hide_zeros=True,
                                    figsize=(5,5)
                                    );
plt.xticks(rotation=90);

2.4 Explain Predictions Using SHAP Values

In this section, we have tried to explain predictions made by our network by generating SHAP values using shap python library. In order to use shap library, we need to import it and initialize it by calling initjs() function on it which we did at the beginning of the tutorial.

In order to explain a prediction using SHAP values, we need to create Explainer first. Then, we need to give text examples that we want to explain to the explainer instance to create Explanation object (SHAP values). At last, we can call text_plot() method by giving SHAP values to it to visualize explanations created for text examples.

Below, we have first created an explainer object. The explainer object requires us to provide a function that takes a batch of text examples as input and returns probabilities for each target class for the whole batch. We have created a simple function that takes a batch of text samples as input. It tokenizes them and creates indexes for tokens using the vocabulary. It then ensures that each text sample has a length of 50 as required by our network. Then, it gives the vectorized batch to the network to make predictions. As our network returns logits, we have converted them to probabilities using the softmax activation function. We have also given target class names when creating the explainer instance.
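
The conversion from logits to probabilities mentioned above can be illustrated with a small pure-Python softmax. This is only a sketch of the formula: the tutorial itself uses F.softmax() from PyTorch, and the logits below are hypothetical values, not outputs of our network.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]   # subtracting the max avoids overflow
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [1.0, -0.5, 3.0, 0.2]   # hypothetical logits for the 4 target classes
probs = softmax(toy_logits)

print([round(p, 3) for p in probs])
print("Sum of probabilities :", round(sum(probs), 6))
```

Softmax preserves the ordering of the logits, so the class with the largest logit also has the largest probability.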

If you do not have a background on SHAP library then we suggest the below-mentioned tutorials that can be very helpful to learn it.

In [17]:
def make_predictions(X_batch_text):
    X_batch = [vocab(tokenizer(sample)) for sample in X_batch_text]
    X_batch = [sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch] ## Bringing all samples to 50 length.
    X_batch = torch.tensor(X_batch, dtype=torch.int32)
    logits_preds = embed_classifier(X_batch)
    return F.softmax(logits_preds, dim=-1).detach().numpy()

masker = shap.maskers.Text(tokenizer=r"\W+")
explainer = shap.Explainer(make_predictions, masker=masker, output_names=target_classes)

explainer
Out[17]:
<shap.explainers._partition.Partition at 0x7f53245674d0>

Below, we have first retrieved test samples from the test dataset. Then, we have selected the first two test samples and made predictions about them. We have printed actual labels and predicted labels for both samples. We can notice that our model correctly predicts labels as Business and Sci/Tech.

In [18]:
X_test, Y_test = [], []
for Y, X in test_dataset: ## Collecting text documents and target labels of all test samples
    X_test.append(X)
    Y_test.append(Y-1) ## Please make a Note that we have subtracted 1 from target values to start index from 0 instead of 1.

X_batch = [vocab(tokenizer(sample)) for sample in X_test[:2]]
X_batch = torch.tensor([sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch], dtype=torch.int32)
logits = embed_classifier(X_batch)
preds_proba = F.softmax(logits, dim=-1)
preds = preds_proba.argmax(dim=-1)


print("Actual    Target Values : {}".format([target_classes[target] for target in Y_test[:2]]))
print("Predicted Target Values : {}".format([target_classes[target] for target in preds]))
print("Predicted Probabilities : {}".format(preds_proba.max(dim=-1)))
Actual    Target Values : ['Business', 'Sci/Tech']
Predicted Target Values : ['Business', 'Sci/Tech']
Predicted Probabilities : torch.return_types.max(
values=tensor([0.9999, 0.9985], grad_fn=<MaxBackward0>),
indices=tensor([2, 3]))

Below, we have first generated Explanation object (shap values) by giving the first two text examples from test data as input to the explainer object. Then, we have called text_plot() function with SHAP values to visualize explanation results.

We can notice from the visualization that words like 'talks', 'federal', 'mogul', etc. contributed to predicting the label 'Business' for the first text example. For the second text example, words like 'launch', 'space', 'spaceflight', 'rocket', 'manned', 'suborbital', etc. contributed to predicting the category 'Sci/Tech'.

In [ ]:
shap_values = explainer(X_test[:2])

shap.text_plot(shap_values)

3. Approach 2: Word Embeddings With More Embeddings

Our approach in this example is almost exactly the same as our approach in the previous example with the only difference being the size of the embedding. In our previous example, we had an embedding length of 25 whereas, in this section, we have kept the embedding length at 40. The code is exactly the same as in our previous section.

3.1 Define Model

Below, we have defined our network again but this time with an embedding length of 40. The rest of the network structure is the same as in the previous section.

In [20]:
from torch import nn
from torch.nn import functional as F

class EmbeddingClassifier(nn.Module):
    def __init__(self):
        super(EmbeddingClassifier, self).__init__()
        self.seq = nn.Sequential(
            nn.Embedding(num_embeddings=len(vocab), embedding_dim=40),

            nn.Flatten(),

            nn.Linear(40*50, 128), ## 40 = embedding length, 50 = words we kept per sample
            nn.ReLU(),

            nn.Linear(128,64),
            nn.ReLU(),

            nn.Linear(64, len(target_classes)),
        )

    def forward(self, X_batch):
        return self.seq(X_batch)
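
Increasing the embedding length also increases the number of trainable parameters. The sketch below estimates the parameter count of this architecture for both embedding lengths. The helper `count_parameters` is a hypothetical name of ours, and the vocabulary size is the value of len(vocab) printed earlier in the tutorial.

```python
def count_parameters(vocab_size, embed_dim, seq_len=50, hidden1=128, hidden2=64, n_classes=4):
    """Parameter count for: Embedding -> Flatten -> Linear -> Linear -> Linear."""
    embedding = vocab_size * embed_dim
    linear1 = (embed_dim * seq_len) * hidden1 + hidden1   # weights + biases
    linear2 = hidden1 * hidden2 + hidden2
    linear3 = hidden2 * n_classes + n_classes
    return embedding + linear1 + linear2 + linear3

vocab_size = 98635   # len(vocab) printed earlier in the tutorial

params_25 = count_parameters(vocab_size, 25)
params_40 = count_parameters(vocab_size, 40)

print("embed_dim=25 : {:,} parameters".format(params_25))
print("embed_dim=40 : {:,} parameters".format(params_40))
```

In both cases the embedding layer accounts for most of the parameters, since it stores one vector for each of the 98,635 vocabulary entries.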

3.2 Train Model

Below, we have trained our network for 15 epochs using the same settings as in the previous section. We can notice from the loss and accuracy printed after each epoch that the validation accuracy has improved a bit compared to our previous example (0.842 vs 0.824 after the last epoch). As before, the validation loss rises after the first few epochs while the training loss keeps falling.

In [21]:
from torch.optim import Adam

epochs = 15
learning_rate = 1e-3

loss_fn = nn.CrossEntropyLoss()
embed_classifier = EmbeddingClassifier()
optimizer = Adam(embed_classifier.parameters(), lr=learning_rate)

TrainModel(embed_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
100%|██████████| 118/118 [00:16<00:00,  6.94it/s]
Train Loss : 1.153
Valid Loss : 0.926
Valid Acc  : 0.620
100%|██████████| 118/118 [00:16<00:00,  7.03it/s]
Train Loss : 0.752
Valid Loss : 0.686
Valid Acc  : 0.728
100%|██████████| 118/118 [00:16<00:00,  7.28it/s]
Train Loss : 0.513
Valid Loss : 0.596
Valid Acc  : 0.777
100%|██████████| 118/118 [00:16<00:00,  7.00it/s]
Train Loss : 0.361
Valid Loss : 0.583
Valid Acc  : 0.797
100%|██████████| 118/118 [00:16<00:00,  7.19it/s]
Train Loss : 0.250
Valid Loss : 0.632
Valid Acc  : 0.805
100%|██████████| 118/118 [00:16<00:00,  7.10it/s]
Train Loss : 0.171
Valid Loss : 0.719
Valid Acc  : 0.805
100%|██████████| 118/118 [00:16<00:00,  7.35it/s]
Train Loss : 0.130
Valid Loss : 0.742
Valid Acc  : 0.815
100%|██████████| 118/118 [00:16<00:00,  7.21it/s]
Train Loss : 0.120
Valid Loss : 0.802
Valid Acc  : 0.819
100%|██████████| 118/118 [00:16<00:00,  7.25it/s]
Train Loss : 0.104
Valid Loss : 0.796
Valid Acc  : 0.830
100%|██████████| 118/118 [00:16<00:00,  7.05it/s]
Train Loss : 0.080
Valid Loss : 0.914
Valid Acc  : 0.813
100%|██████████| 118/118 [00:16<00:00,  7.22it/s]
Train Loss : 0.064
Valid Loss : 0.916
Valid Acc  : 0.828
100%|██████████| 118/118 [00:16<00:00,  7.22it/s]
Train Loss : 0.050
Valid Loss : 0.941
Valid Acc  : 0.837
100%|██████████| 118/118 [00:16<00:00,  6.98it/s]
Train Loss : 0.040
Valid Loss : 0.970
Valid Acc  : 0.844
100%|██████████| 118/118 [00:16<00:00,  7.16it/s]
Train Loss : 0.034
Valid Loss : 1.052
Valid Acc  : 0.842
100%|██████████| 118/118 [00:16<00:00,  6.97it/s]
Train Loss : 0.026
Valid Loss : 1.092
Valid Acc  : 0.842

3.3 Evaluate Model Performance

Below, we have evaluated the performance of our network as usual by calculating accuracy, classification report and confusion matrix metrics on test predictions. The test accuracy (84.2%) is a little better than that of our previous approach (82.4%).

In [22]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

Y_actual, Y_preds = MakePredictions(embed_classifier, test_loader)

print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
Test Accuracy : 0.8421052631578947

Classification Report :
              precision    recall  f1-score   support

       World       0.84      0.86      0.85      1900
      Sports       0.90      0.94      0.92      1900
    Business       0.82      0.77      0.80      1900
    Sci/Tech       0.80      0.79      0.80      1900

    accuracy                           0.84      7600
   macro avg       0.84      0.84      0.84      7600
weighted avg       0.84      0.84      0.84      7600


Confusion Matrix :
[[1633   99   87   81]
 [  64 1790   17   29]
 [ 125   48 1470  257]
 [ 124   61  208 1507]]
In [ ]:
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np

skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
                                    normalize=True,
                                    title="Confusion Matrix",
                                    cmap="Purples",
                                    hide_zeros=True,
                                    figsize=(5,5)
                                    );
plt.xticks(rotation=90);

3.4 Explain Predictions Using SHAP Values

Below, we have explained the predictions made by our model for the first two test samples. Our model correctly predicts labels for both samples as 'Business' and 'Sci/Tech' respectively.

We can notice from the visualization that words like 'pension', 'unions', 'workers', 'federal', 'mogul', etc contributed to predicting category as 'Business' for first text example. For second text example, words like 'spaceflight', 'space', 'launch', 'human', etc contributed to predicting category as 'Sci/Tech'.

In [24]:
X_batch = [vocab(tokenizer(sample)) for sample in X_test[:2]]
X_batch = torch.tensor([sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch], dtype=torch.int32)
logits = embed_classifier(X_batch)
preds_proba = F.softmax(logits, dim=-1)
preds = preds_proba.argmax(dim=-1)

print("Actual    Target Values : {}".format([target_classes[target] for target in Y_test[:2]]))
print("Predicted Target Values : {}".format([target_classes[target] for target in preds]))
print("Predicted Probabilities : {}".format(preds_proba.max(dim=-1)))
Actual    Target Values : ['Business', 'Sci/Tech']
Predicted Target Values : ['Business', 'Sci/Tech']
Predicted Probabilities : torch.return_types.max(
values=tensor([0.9578, 1.0000], grad_fn=<MaxBackward0>),
indices=tensor([2, 3]))
In [ ]:
masker = shap.maskers.Text(tokenizer=r"\W+")
explainer = shap.Explainer(make_predictions, masker=masker, output_names=target_classes)
shap_values = explainer(X_test[:2])
shap.text_plot(shap_values)

4. Approach 3: Average Word Embeddings

In this section, we have used a slightly different approach compared to our previous two approaches. Till now, we were keeping the embeddings of all words/tokens of a text example by laying them next to each other. In this section, we have instead taken the average of the embeddings of all tokens/words of a text example. The majority of the code is exactly the same as in our previous sections, with the only change being how embeddings are handled.

4.1 Define Model

Below, we have defined the network that we'll use in this example. The network has almost the same structure as our network from the first example, with a change in the forward pass. We have used an embedding length of 25 again. The main change in the forward pass is that the output of the embedding layer is averaged across the tokens/words of each text example, so each text example is represented by a single vector of length 25. As a result, the first linear layer has 25 input units instead of 1250.

In [26]:
from torch import nn
from torch.nn import functional as F

class EmbeddingClassifier(nn.Module):
    def __init__(self):
        super(EmbeddingClassifier, self).__init__()
        self.word_embeddings = nn.Embedding(num_embeddings=len(vocab), embedding_dim=25)
        self.linear1 = nn.Linear(25, 128) ## 25 = embedding length (embeddings are averaged across the 50 words of a sample)
        self.linear2 = nn.Linear(128,64)
        self.linear3 = nn.Linear(64, len(target_classes))

    def forward(self, X_batch):
        x = self.word_embeddings(X_batch)
        x = x.mean(dim=1) ## Averaging embeddings

        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
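        logits = F.relu(self.linear3(x)) ## Note: relu here clips negative logits to 0; the earlier networks apply no activation after the last layer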

        return logits
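
The effect of the averaging step on tensor shapes can be seen in isolation. This is a minimal sketch and not part of the original notebook: `toy_vocab_size` and the random batch of indexes are assumptions made so that the snippet runs without the dataset.

```python
import torch
from torch import nn

toy_vocab_size = 1000   # assumption: small stand-in for len(vocab)
embedding = nn.Embedding(num_embeddings=toy_vocab_size, embedding_dim=25)

X = torch.randint(0, toy_vocab_size, (4, 50))   # 4 documents, 50 token indexes each
embedded = embedding(X)                          # one 25-length vector per token
averaged = embedded.mean(dim=1)                  # average over the 50 tokens of each document

print(embedded.shape, averaged.shape)
```

Each document is reduced from 50 vectors to a single vector of length 25, regardless of word order. Padded positions are included in this average, because the mean is taken over all 50 positions.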

4.2 Train Model

Below, we have trained our network with exactly the same settings that we have used in our previous approaches. We can notice from the loss and accuracy printed after each epoch that our model is doing a good job. The validation accuracy reaches 0.895, which is noticeably higher than in our previous approaches, and the validation loss keeps decreasing through all 15 epochs.

In [27]:
from torch.optim import Adam

epochs = 15
learning_rate = 1e-3

loss_fn = nn.CrossEntropyLoss()
embed_classifier = EmbeddingClassifier()
optimizer = Adam(embed_classifier.parameters(), lr=learning_rate)

TrainModel(embed_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
100%|██████████| 118/118 [00:13<00:00,  8.86it/s]
Train Loss : 1.307
Valid Loss : 1.072
Valid Acc  : 0.555
100%|██████████| 118/118 [00:13<00:00,  8.91it/s]
Train Loss : 0.854
Valid Loss : 0.713
Valid Acc  : 0.714
100%|██████████| 118/118 [00:13<00:00,  8.73it/s]
Train Loss : 0.614
Valid Loss : 0.569
Valid Acc  : 0.784
100%|██████████| 118/118 [00:13<00:00,  9.03it/s]
Train Loss : 0.493
Valid Loss : 0.490
Valid Acc  : 0.818
100%|██████████| 118/118 [00:13<00:00,  8.95it/s]
Train Loss : 0.421
Valid Loss : 0.440
Valid Acc  : 0.843
100%|██████████| 118/118 [00:13<00:00,  8.91it/s]
Train Loss : 0.372
Valid Loss : 0.408
Valid Acc  : 0.858
100%|██████████| 118/118 [00:13<00:00,  8.87it/s]
Train Loss : 0.336
Valid Loss : 0.385
Valid Acc  : 0.867
100%|██████████| 118/118 [00:13<00:00,  9.02it/s]
Train Loss : 0.308
Valid Loss : 0.368
Valid Acc  : 0.875
100%|██████████| 118/118 [00:13<00:00,  8.88it/s]
Train Loss : 0.284
Valid Loss : 0.355
Valid Acc  : 0.880
100%|██████████| 118/118 [00:13<00:00,  8.95it/s]
Train Loss : 0.265
Valid Loss : 0.345
Valid Acc  : 0.883
100%|██████████| 118/118 [00:13<00:00,  9.04it/s]
Train Loss : 0.248
Valid Loss : 0.338
Valid Acc  : 0.887
100%|██████████| 118/118 [00:13<00:00,  9.00it/s]
Train Loss : 0.233
Valid Loss : 0.332
Valid Acc  : 0.891
100%|██████████| 118/118 [00:13<00:00,  9.02it/s]
Train Loss : 0.219
Valid Loss : 0.327
Valid Acc  : 0.893
100%|██████████| 118/118 [00:12<00:00,  9.27it/s]
Train Loss : 0.207
Valid Loss : 0.324
Valid Acc  : 0.894
100%|██████████| 118/118 [00:12<00:00,  9.27it/s]
Train Loss : 0.195
Valid Loss : 0.322
Valid Acc  : 0.895
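
The training cell above passes the network output to nn.CrossEntropyLoss, which expects raw, unnormalized scores (logits) and applies log-softmax internally. The sketch below illustrates this on made-up logits and targets (both are assumptions for illustration, not values from the tutorial) by comparing the loss with an explicit log-softmax followed by negative log-likelihood.

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)

logits = torch.randn(8, 4)            # 8 samples, 4 classes (raw scores)
targets = torch.randint(0, 4, (8,))   # integer class ids

# CrossEntropyLoss on raw logits ...
loss_ce = nn.CrossEntropyLoss()(logits, targets)
# ... matches log-softmax followed by negative log-likelihood.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=-1), targets)

print(torch.allclose(loss_ce, loss_manual))
```

This is why the networks return logits rather than probabilities; softmax is applied separately only when probabilities are needed for reporting.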

4.3 Evaluate Model Performance

Below, we have evaluated network performance by calculating accuracy, the classification report, and the confusion matrix on test predictions as usual. The test accuracy is noticeably better than that of our previous approaches, and the model does a good job of classifying documents of each category.

In [28]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

Y_actual, Y_preds = MakePredictions(embed_classifier, test_loader)

print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
Test Accuracy : 0.8951315789473684

Classification Report :
              precision    recall  f1-score   support

       World       0.89      0.91      0.90      1900
      Sports       0.95      0.96      0.95      1900
    Business       0.86      0.86      0.86      1900
    Sci/Tech       0.89      0.85      0.87      1900

    accuracy                           0.90      7600
   macro avg       0.89      0.90      0.89      7600
weighted avg       0.89      0.90      0.89      7600


Confusion Matrix :
[[1720   58   78   44]
 [  42 1830   20    8]
 [  91   20 1639  150]
 [  89   26  171 1614]]
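
To show how the numbers in the classification report relate to the confusion matrix, below is a small sketch that recomputes accuracy, precision, and recall from the matrix printed above. It assumes the scikit-learn convention that rows are actual classes and columns are predicted classes.

```python
import numpy as np

classes = ["World", "Sports", "Business", "Sci/Tech"]

# Confusion matrix copied from the output above
# (rows = actual class, columns = predicted class).
cm = np.array([
    [1720,   58,   78,   44],
    [  42, 1830,   20,    8],
    [  91,   20, 1639,  150],
    [  89,   26,  171, 1614],
])

accuracy = np.trace(cm) / cm.sum()          # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)       # correct / actual count per class
precision = np.diag(cm) / cm.sum(axis=0)    # correct / predicted count per class

print("Accuracy : {:.4f}".format(accuracy))
for name, p, r in zip(classes, precision, recall):
    print("{:>8} : precision={:.2f} recall={:.2f}".format(name, p, r))
```

The recomputed values agree with the report: for example, 'Sci/Tech' recall is 1614 / 1900, which rounds to 0.85.
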
In [ ]:
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np

skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
                                    normalize=True,
                                    title="Confusion Matrix",
                                    cmap="Purples",
                                    hide_zeros=True,
                                    figsize=(5,5)
                                    );
plt.xticks(rotation=90);

4.4 Explain Predictions Using SHAP Values

Below, we have explained predictions made by our model from this section by generating SHAP values. We have explained the predictions made for the first two test examples. The network correctly predicts labels as 'Business' and 'Sci/Tech' for them respectively.

For the first text example, words like 'pension', 'unions', 'workers', 'firm', 'federal', 'mogul', etc. contribute to predicting the category 'Business'. For the second text example, words like 'space', 'launch', 'manned', 'rocket', 'flight', etc. contribute to predicting the category 'Sci/Tech'.

In [30]:
X_batch = [vocab(tokenizer(sample)) for sample in X_test[:2]]
X_batch = torch.tensor([sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch], dtype=torch.int32)
logits = embed_classifier(X_batch)
preds_proba = F.softmax(logits, dim=-1)
preds = preds_proba.argmax(dim=-1)

print("Actual    Target Values : {}".format([target_classes[target] for target in Y_test[:2]]))
print("Predicted Target Values : {}".format([target_classes[target] for target in preds]))
print("Predicted Probabilities : {}".format(preds_proba.max(dim=-1)))
Actual    Target Values : ['Business', 'Sci/Tech']
Predicted Target Values : ['Business', 'Sci/Tech']
Predicted Probabilities : torch.return_types.max(
values=tensor([0.6612, 0.9984], grad_fn=<MaxBackward0>),
indices=tensor([2, 3]))
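
The printed result above is a named tuple because calling max with a dim argument returns both the maximum values and their indices. Below is a minimal sketch, using made-up logits for two samples (an assumption for illustration), that unpacks the two parts for cleaner printing.

```python
import torch
from torch.nn import functional as F

# Made-up logits for two samples and four classes.
logits = torch.tensor([[0.2, 0.1, 1.5, 0.3],
                       [0.0, 0.4, 0.1, 6.0]])
probs = F.softmax(logits, dim=-1)

# max over a dimension returns (values, indices); unpack both.
top_probs, top_classes = probs.max(dim=-1)

print(top_probs.tolist())    # highest probability per sample
print(top_classes.tolist())  # [2, 3]
```

The indices returned here are the same as those from argmax, so a single call can provide both the predicted class and its probability.
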
In [ ]:
masker = shap.maskers.Text(tokenizer=r"\W+")
explainer = shap.Explainer(make_predictions, masker=masker, output_names=target_classes)
shap_values = explainer(X_test[:2])
shap.text_plot(shap_values)
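
The explainer above is given a prediction function, make_predictions, that is defined earlier in the tutorial and is not shown in this section. For readers who want a picture of what such a function has to do, below is a hypothetical sketch: it takes a batch of raw strings, converts them to padded token ids, and returns class probabilities as an array. The toy vocabulary, toy tokenizer, untrained toy model, and all names ending in _sketch or starting with toy_ are assumptions for illustration and are not the tutorial's actual objects.

```python
import re
import torch
from torch import nn
from torch.nn import functional as F

MAX_TOKENS = 50  # tokens kept per sample, as in the tutorial

# Hypothetical stand-ins for the tutorial's vocab and tokenizer.
toy_vocab = {"<unk>": 0, "rocket": 1, "launch": 2, "pension": 3, "workers": 4}

def toy_tokenizer(text):
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]

def encode(text):
    """Map text to exactly MAX_TOKENS ids: truncate long texts, pad short ones with 0."""
    ids = [toy_vocab.get(tok, 0) for tok in toy_tokenizer(text)][:MAX_TOKENS]
    return ids + [0] * (MAX_TOKENS - len(ids))

class ToyClassifier(nn.Module):
    """Untrained toy network: averaged embeddings followed by one linear layer."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(len(toy_vocab), 25)
        self.out = nn.Linear(25, 4)

    def forward(self, x):
        return self.out(self.emb(x).mean(dim=1))

toy_model = ToyClassifier()

def make_predictions_sketch(texts):
    """Batch of strings in, array of shape (len(texts), 4) with class probabilities out."""
    batch = torch.tensor([encode(t) for t in texts], dtype=torch.long)
    with torch.no_grad():
        logits = toy_model(batch)
    return F.softmax(logits, dim=-1).numpy()

probs = make_predictions_sketch(["Rocket launch delayed", "Workers pension fund"])
print(probs.shape)  # (2, 4)
```

Keeping tokenization and padding inside the function matters because the explainer works with raw text and masks words in it before asking for new predictions.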

5. Approach 4: PyTorch EmbeddingBag Layer (Averaged Embeddings)

Our approach in this section is almost the same as the one from the previous section. The only difference is that we have implemented it using the EmbeddingBag layer. We have again averaged the embeddings of each text example.

5.1 Define Model

Below, we have defined a model that uses the EmbeddingBag layer as its first layer. We have given it the same embedding length of 25 that we used in our previous approach, plus one extra parameter named 'mode' with the value 'mean'.

The EmbeddingBag layer produces the same result as an Embedding layer followed by the function specified through the mode parameter. With mode set to 'mean', it looks up the embeddings of all tokens/words of a single text example and returns their average. When the input is a 2D tensor of shape (batch size, tokens per sample), as it is here, each row is treated as one bag, so the output has one 25-value vector per text example. According to the PyTorch documentation, the layer computes this without instantiating the intermediate per-token embeddings.

The EmbeddingBag layer accepts one of three values for the mode parameter.

  • mean
  • max
  • sum
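
To illustrate the three modes, below is a minimal sketch that compares EmbeddingBag with a plain Embedding layer followed by the matching reduction over the token axis. It assumes toy sizes and random token ids, and it copies the weights from one layer to the other so that both use the same lookup table; none of these objects come from the tutorial.

```python
import torch
from torch import nn

torch.manual_seed(0)

vocab_size, embed_len = 100, 25                    # toy sizes for illustration
token_ids = torch.randint(0, vocab_size, (4, 50))  # 4 samples, 50 tokens each

embedding = nn.Embedding(vocab_size, embed_len)

results = {}
with torch.no_grad():
    reference = embedding(token_ids)               # shape (4, 50, 25)
    for mode in ["mean", "sum", "max"]:
        bag = nn.EmbeddingBag(vocab_size, embed_len, mode=mode)
        bag.weight.copy_(embedding.weight)         # share the same lookup table
        pooled = bag(token_ids)                    # shape (4, 25)
        if mode == "mean":
            expected = reference.mean(dim=1)
        elif mode == "sum":
            expected = reference.sum(dim=1)
        else:
            expected = reference.max(dim=1).values
        results[mode] = torch.allclose(pooled, expected, atol=1e-5)

print(results)
```

Each mode should match its manual counterpart up to floating point error, which is why this section's network can be expected to behave like the one from the previous section.
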
In [32]:
from torch import nn
from torch.nn import functional as F

class EmbeddingClassifier(nn.Module):
    def __init__(self):
        super(EmbeddingClassifier, self).__init__()
        self.seq = nn.Sequential(
            nn.EmbeddingBag(num_embeddings=len(vocab), embedding_dim=25, mode="mean"),

            nn.Linear(25, 128), ## 25 = embedding length (EmbeddingBag returns one averaged vector per sample)
            nn.ReLU(),

            nn.Linear(128, 64),
            nn.ReLU(),

            nn.Linear(64, len(target_classes)),
        )

    def forward(self, X_batch):
        return self.seq(X_batch)

5.2 Train Model

Below, we have trained our network using exactly the same settings that we have been using for all our previous approaches. We can notice from the loss and accuracy getting printed after each epoch that our model has performed quite well.

In [33]:
from torch.optim import Adam

epochs = 15
learning_rate = 1e-3

loss_fn = nn.CrossEntropyLoss()
embed_classifier = EmbeddingClassifier()
optimizer = Adam(embed_classifier.parameters(), lr=learning_rate)

TrainModel(embed_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
100%|██████████| 118/118 [00:07<00:00, 15.18it/s]
Train Loss : 1.306
Valid Loss : 1.074
Valid Acc  : 0.558
100%|██████████| 118/118 [00:07<00:00, 15.97it/s]
Train Loss : 0.848
Valid Loss : 0.698
Valid Acc  : 0.725
100%|██████████| 118/118 [00:07<00:00, 15.63it/s]
Train Loss : 0.596
Valid Loss : 0.558
Valid Acc  : 0.789
100%|██████████| 118/118 [00:07<00:00, 15.33it/s]
Train Loss : 0.483
Valid Loss : 0.486
Valid Acc  : 0.819
100%|██████████| 118/118 [00:07<00:00, 15.40it/s]
Train Loss : 0.415
Valid Loss : 0.441
Valid Acc  : 0.836
100%|██████████| 118/118 [00:07<00:00, 15.59it/s]
Train Loss : 0.368
Valid Loss : 0.410
Valid Acc  : 0.849
100%|██████████| 118/118 [00:07<00:00, 15.60it/s]
Train Loss : 0.333
Valid Loss : 0.386
Valid Acc  : 0.859
100%|██████████| 118/118 [00:07<00:00, 15.46it/s]
Train Loss : 0.305
Valid Loss : 0.368
Valid Acc  : 0.868
100%|██████████| 118/118 [00:07<00:00, 15.28it/s]
Train Loss : 0.282
Valid Loss : 0.354
Valid Acc  : 0.873
100%|██████████| 118/118 [00:07<00:00, 15.42it/s]
Train Loss : 0.263
Valid Loss : 0.343
Valid Acc  : 0.878
100%|██████████| 118/118 [00:07<00:00, 15.62it/s]
Train Loss : 0.246
Valid Loss : 0.335
Valid Acc  : 0.883
100%|██████████| 118/118 [00:07<00:00, 15.55it/s]
Train Loss : 0.231
Valid Loss : 0.328
Valid Acc  : 0.887
100%|██████████| 118/118 [00:07<00:00, 15.06it/s]
Train Loss : 0.218
Valid Loss : 0.323
Valid Acc  : 0.889
100%|██████████| 118/118 [00:07<00:00, 16.18it/s]
Train Loss : 0.205
Valid Loss : 0.319
Valid Acc  : 0.891
100%|██████████| 118/118 [00:07<00:00, 16.04it/s]
Train Loss : 0.194
Valid Loss : 0.317
Valid Acc  : 0.892

5.3 Evaluate Model Performance

Here, we have evaluated network performance as usual by calculating accuracy, the classification report, and the confusion matrix on test predictions. The test accuracy (0.892) is almost the same as that of the previous approach (0.895).

In [34]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

Y_actual, Y_preds = MakePredictions(embed_classifier, test_loader)

print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
Test Accuracy : 0.8921052631578947

Classification Report :
              precision    recall  f1-score   support

       World       0.90      0.89      0.89      1900
      Sports       0.94      0.96      0.95      1900
    Business       0.85      0.87      0.86      1900
    Sci/Tech       0.88      0.85      0.87      1900

    accuracy                           0.89      7600
   macro avg       0.89      0.89      0.89      7600
weighted avg       0.89      0.89      0.89      7600


Confusion Matrix :
[[1696   59  103   42]
 [  29 1827   25   19]
 [  82   21 1645  152]
 [  86   35  167 1612]]
In [ ]:
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np

skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
                                    normalize=True,
                                    title="Confusion Matrix",
                                    cmap="Purples",
                                    hide_zeros=True,
                                    figsize=(5,5)
                                    );
plt.xticks(rotation=90);

5.4 Explain Predictions Using SHAP Values

Below, we have explained predictions made by our model for the first two test examples using SHAP values. Our model correctly predicts labels as 'Business' and 'Sci/Tech' for them respectively.

We can notice from the generated visualization that words like 'pension', 'unions', 'workers', 'firm', 'federal', 'mogul', etc. contribute to predicting the category 'Business' for the first example, and words like 'space', 'spaceflight', 'manned', 'rocket', etc. contribute to predicting the category 'Sci/Tech' for the second example.

In [36]:
X_batch = [vocab(tokenizer(sample)) for sample in X_test[:2]]
X_batch = torch.tensor([sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch], dtype=torch.int32)
logits = embed_classifier(X_batch)
preds_proba = F.softmax(logits, dim=-1)
preds = preds_proba.argmax(dim=-1)

print("Actual    Target Values : {}".format([target_classes[target] for target in Y_test[:2]]))
print("Predicted Target Values : {}".format([target_classes[target] for target in preds]))
print("Predicted Probabilities : {}".format(preds_proba.max(dim=-1)))
Actual    Target Values : ['Business', 'Sci/Tech']
Predicted Target Values : ['Business', 'Sci/Tech']
Predicted Probabilities : torch.return_types.max(
values=tensor([0.6977, 0.9995], grad_fn=<MaxBackward0>),
indices=tensor([2, 3]))
In [ ]:
masker = shap.maskers.Text(tokenizer=r"\W+")
explainer = shap.Explainer(make_predictions, masker=masker, output_names=target_classes)
shap_values = explainer(X_test[:2])
shap.text_plot(shap_values)

6. Approach 5: PyTorch EmbeddingBag Layer (Summed Embeddings)

Our approach in this section again uses the EmbeddingBag layer, but this time it sums the embeddings of each text example. The only difference from the previous two approaches is that we are summing embeddings instead of averaging them.

6.1 Define Model

Below, we have defined the network that we'll use in this section. It has the same structure as our network from the previous section, with the only change that the mode parameter of EmbeddingBag is set to 'sum'.

In [38]:
from torch import nn
from torch.nn import functional as F

class EmbeddingClassifier(nn.Module):
    def __init__(self):
        super(EmbeddingClassifier, self).__init__()
        self.seq = nn.Sequential(
            nn.EmbeddingBag(num_embeddings=len(vocab), embedding_dim=25, mode="sum"),

            nn.Linear(25, 128), ## 25 = embedding length (EmbeddingBag returns one summed vector per sample)
            nn.ReLU(),

            nn.Linear(128, 64),
            nn.ReLU(),

            nn.Linear(64, len(target_classes)),
        )

    def forward(self, X_batch):
        return self.seq(X_batch)
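
Because every sample here is padded or truncated to the same 50 tokens, summing and averaging differ only by a constant factor. Below is a minimal sketch of that relationship, assuming toy sizes, random token ids, and two EmbeddingBag layers that share the same weights (all assumptions for illustration).

```python
import torch
from torch import nn

torch.manual_seed(0)

vocab_size, embed_len, max_tokens = 100, 25, 50   # toy sizes for illustration
token_ids = torch.randint(0, vocab_size, (4, max_tokens))

bag_mean = nn.EmbeddingBag(vocab_size, embed_len, mode="mean")
bag_sum = nn.EmbeddingBag(vocab_size, embed_len, mode="sum")

with torch.no_grad():
    bag_sum.weight.copy_(bag_mean.weight)         # same lookup table in both layers
    averaged = bag_mean(token_ids)
    summed = bag_sum(token_ids)

# With fixed-length samples, sum = number of tokens * mean.
print(torch.allclose(summed, averaged * max_tokens, atol=1e-4))
```

So the first linear layer of this network sees inputs that are 50 times larger in scale than in the averaged version, which may explain small differences in how training progresses.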

6.2 Train Network

Below, we have trained our network using exactly the same settings that we used for all our previous approaches. The loss and accuracy printed after each epoch show that our model is doing a good job at the text classification task.

In [39]:
from torch.optim import Adam

epochs = 15
learning_rate = 1e-3

loss_fn = nn.CrossEntropyLoss()
embed_classifier = EmbeddingClassifier()
optimizer = Adam(embed_classifier.parameters(), lr=learning_rate)

TrainModel(embed_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
100%|██████████| 118/118 [00:07<00:00, 15.45it/s]
Train Loss : 1.249
Valid Loss : 1.086
Valid Acc  : 0.546
100%|██████████| 118/118 [00:07<00:00, 16.11it/s]
Train Loss : 0.935
Valid Loss : 0.804
Valid Acc  : 0.681
100%|██████████| 118/118 [00:07<00:00, 15.73it/s]
Train Loss : 0.698
Valid Loss : 0.639
Valid Acc  : 0.758
100%|██████████| 118/118 [00:07<00:00, 16.06it/s]
Train Loss : 0.560
Valid Loss : 0.544
Valid Acc  : 0.798
100%|██████████| 118/118 [00:07<00:00, 15.80it/s]
Train Loss : 0.473
Valid Loss : 0.486
Valid Acc  : 0.823
100%|██████████| 118/118 [00:07<00:00, 16.09it/s]
Train Loss : 0.412
Valid Loss : 0.447
Valid Acc  : 0.839
100%|██████████| 118/118 [00:07<00:00, 16.11it/s]
Train Loss : 0.367
Valid Loss : 0.419
Valid Acc  : 0.851
100%|██████████| 118/118 [00:07<00:00, 15.63it/s]
Train Loss : 0.332
Valid Loss : 0.397
Valid Acc  : 0.860
100%|██████████| 118/118 [00:07<00:00, 15.66it/s]
Train Loss : 0.303
Valid Loss : 0.383
Valid Acc  : 0.867
100%|██████████| 118/118 [00:07<00:00, 15.87it/s]
Train Loss : 0.279
Valid Loss : 0.372
Valid Acc  : 0.873
100%|██████████| 118/118 [00:07<00:00, 15.99it/s]
Train Loss : 0.258
Valid Loss : 0.364
Valid Acc  : 0.878
100%|██████████| 118/118 [00:07<00:00, 15.64it/s]
Train Loss : 0.240
Valid Loss : 0.359
Valid Acc  : 0.881
100%|██████████| 118/118 [00:07<00:00, 15.16it/s]
Train Loss : 0.224
Valid Loss : 0.354
Valid Acc  : 0.883
100%|██████████| 118/118 [00:07<00:00, 15.96it/s]
Train Loss : 0.209
Valid Loss : 0.351
Valid Acc  : 0.886
100%|██████████| 118/118 [00:07<00:00, 16.14it/s]
Train Loss : 0.195
Valid Loss : 0.349
Valid Acc  : 0.889

6.3 Evaluate Network Performance

Here, we have evaluated network performance as usual by calculating accuracy, the classification report, and the confusion matrix on test predictions. The test accuracy (0.889) is slightly lower than that of our previous approach (0.892) but better than that of our first two approaches.

In [40]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

Y_actual, Y_preds = MakePredictions(embed_classifier, test_loader)

print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
Test Accuracy : 0.8890789473684211

Classification Report :
              precision    recall  f1-score   support

       World       0.92      0.87      0.89      1900
      Sports       0.92      0.97      0.94      1900
    Business       0.84      0.87      0.86      1900
    Sci/Tech       0.87      0.85      0.86      1900

    accuracy                           0.89      7600
   macro avg       0.89      0.89      0.89      7600
weighted avg       0.89      0.89      0.89      7600


Confusion Matrix :
[[1660   90   96   54]
 [  23 1836   25   16]
 [  59   23 1651  167]
 [  68   38  184 1610]]
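
To compare the last three approaches side by side, below is a small sketch that recomputes accuracy from the confusion matrices printed in sections 4.3, 5.3, and 6.3 and reports the most frequent mistake of each network. The dictionary keys are labels chosen for this illustration; rows are assumed to be actual classes and columns predicted classes.

```python
import numpy as np

classes = ["World", "Sports", "Business", "Sci/Tech"]

# Confusion matrices copied from the outputs of sections 4.3, 5.3 and 6.3.
confusion_matrices = {
    "Section 4 (Embedding + mean)": [[1720, 58, 78, 44], [42, 1830, 20, 8],
                                     [91, 20, 1639, 150], [89, 26, 171, 1614]],
    "Section 5 (EmbeddingBag mean)": [[1696, 59, 103, 42], [29, 1827, 25, 19],
                                      [82, 21, 1645, 152], [86, 35, 167, 1612]],
    "Section 6 (EmbeddingBag sum)": [[1660, 90, 96, 54], [23, 1836, 25, 16],
                                     [59, 23, 1651, 167], [68, 38, 184, 1610]],
}

accuracies, worst_pairs = {}, {}
for name, rows in confusion_matrices.items():
    cm = np.array(rows)
    accuracies[name] = np.trace(cm) / cm.sum()

    # Zero the diagonal so only mistakes remain, then find the largest one.
    mistakes = cm.copy()
    np.fill_diagonal(mistakes, 0)
    actual, predicted = np.unravel_index(mistakes.argmax(), mistakes.shape)
    worst_pairs[name] = (classes[actual], classes[predicted], int(mistakes[actual, predicted]))

    print("{:<30} accuracy={:.4f} most common mistake: {} predicted as {} ({} times)".format(
        name, accuracies[name], *worst_pairs[name]))
```

In all three networks, the most common mistake is a 'Sci/Tech' document predicted as 'Business'.
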
In [ ]:
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np

skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
                                    normalize=True,
                                    title="Confusion Matrix",
                                    cmap="Purples",
                                    hide_zeros=True,
                                    figsize=(5,5)
                                    );
plt.xticks(rotation=90);

6.4 Explain Network Predictions Using SHAP Values

Below, we have explained predictions made by our network for the first two test examples using SHAP values. Our model correctly predicts categories as Business and Sci/Tech for text examples respectively.

In [42]:
X_batch = [vocab(tokenizer(sample)) for sample in X_test[:2]]
X_batch = torch.tensor([sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch], dtype=torch.int32)
logits = embed_classifier(X_batch)
preds_proba = F.softmax(logits, dim=-1)
preds = preds_proba.argmax(dim=-1)

print("Actual    Target Values : {}".format([target_classes[target] for target in Y_test[:2]]))
print("Predicted Target Values : {}".format([target_classes[target] for target in preds]))
print("Predicted Probabilities : {}".format(preds_proba.max(dim=-1)))
Actual    Target Values : ['Business', 'Sci/Tech']
Predicted Target Values : ['Business', 'Sci/Tech']
Predicted Probabilities : torch.return_types.max(
values=tensor([0.8228, 0.9989], grad_fn=<MaxBackward0>),
indices=tensor([2, 3]))
In [ ]:
masker = shap.maskers.Text(tokenizer=r"\W+")
explainer = shap.Explainer(make_predictions, masker=masker, output_names=target_classes)
shap_values = explainer(X_test[:2])
shap.text_plot(shap_values)

This ends our small tutorial explaining how to use word embeddings with PyTorch networks for text classification tasks. Please feel free to let us know your views in the comments section.

Sunny Solanki
