The traditional text vectorization approaches like word frequency or Tf-IDF (Term Frequency - Inverse Document Frequency) use one float value to represent one word/token. These approaches work well for many NLP tasks like text classification, etc. But as they use only one value to represent a word, it can not capture much information about the word. It can not capture context or some information about the meaning of the word. To solve this problem, word embeddings were invented. Word embeddings uses the list of float values generally referred to as a vector to represent a single word/token. With word embeddings, we can compare words whether they are the same or not by calculating distances between them as they are vectors now. The words with similar meanings generally tend to be near to each other. Also, words commonly used in any context will also appear near to one another. To give an example, let's say words appearing commonly in articles related to computer science will near to each other i.e. distance between vectors used to represent those words will be less compared to words used in other fields like medical science, space, etc.
As a part of this tutorial, we'll be training a PyTorch neural networks on AG NEWS text dataset to classify text documents into one of the 4 categories they belong (["World", "Sports", "Business", "Sci/Tech"]). To do that, we'll be using word embedding approaches to vectorize words from text documents. We have explained a few different approaches to using word embeddings. The network will initialize word embeddings with random numbers initially and it'll update them with meaningful values as we train the network on data.
Below, we have listed important sections of tutorial to give an overview of the material covered.
Below, we have imported important libraries and printed the versions that we have used in our tutorial. We have used SHAP python library to explain predictions made by our network. We have called initjs() method on it below to initialize it as well.
import torch
print("PyTorch Version : {}".format(torch.__version__))
import torchtext
print("Torch Text Version : {}".format(torchtext.__version__))
import shap
print("SHAP Version : {}".format(shap.__version__))
shap.initjs()
In this section, we have prepared our dataset to be fed into a neural network for training. In order to prepare the dataset, we have loaded the dataset, populated vocabulary with the words of text documents, and then created data loaders that will map words to their indexes according to vocabulary. Later on, when we give this list of word indexes (according to vocabulary) as input to the network, the embedding layer will map indexes to their respective embeddings.
In this section, we have simply loaded AG NEWS dataset available from torchtext library. It has text documents for 4 different categories (["World", "Sports", "Business", "Sci/Tech"]). The dataset is already divided into train and test sets.
from torch.utils.data import DataLoader
train_dataset, test_dataset = torchtext.datasets.AG_NEWS()
#train_dataset, test_dataset = torchtext.datasets.YelpReviewFull()
In this section, we have first created a tokenizer using get_tokenizer() function. We have asked it to create a simple tokenizer that separates words from the sentence. It takes text as input and returns a list of tokens/words as output.
Then, we have built vocabulary using build_vocab_from_iterator() function. This function returns an instance of Vocab that has a mapping from word to their indexes according. The vocabulary simply maps words/tokens to their respective indexes. In order to create vocabulary using build_vocab_from_iterator() function, we need to provide it with a function that yields a list of tokens/words. We have created a small function that takes an input list of datasets and then loops through all datasets and their respective text documents. For each text document, it yields a list of tokens/words using the tokenizer function we created.
After populating a vocabulary, we have also printed the size of the vocabulary as well as we have explained one example of how vocabulary will map words/tokens to their indexes. We'll be giving these indexes as input to our neural networks.
from torchtext.data import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator
import re
def tokenizer(inp_str): ## This method is one way of creating tokenizer that looks for word tokens
return re.findall(r"\w+", inp_str)
tokenizer = get_tokenizer("basic_english") ## We'll use tokenizer available from PyTorch
def build_vocab(datasets):
for dataset in datasets:
for _, text in dataset:
yield tokenizer(text)
vocab = build_vocab_from_iterator(build_vocab([train_dataset, test_dataset]), specials=["<UNK>"])
vocab.set_default_index(vocab["<UNK>"])
len(vocab)
tokens = tokenizer("Hello how are you?")
indexes = vocab(tokens)
tokens, indexes
In this section, we have created train and test data loaders that we'll use during the training process to go through data. To create data loaders, we have loaded train and test datasets again and given them to DataLoader() constructor as input. In order to tokenize and vectorize text documents, we have created a helper function that is given to collate_fn argument of the DataLoader() constructor. This function will be applied to each batch when we loop through data using data loaders. The function loops through each text document, tokenize them, and then vectorizes tokens/words using vocabulary. We have decided to keep a maximum of 50 words per document. To handle that condition, we have truncated words from documents that have more than 50 words and padded documents (with zeros) to documents that have less than 50 words. The number of words to keep per document is one of the hyperparameters to train. We have kept it at 50 but different values can be tried to check whether any helps improve the accuracy of the model.
After the batch passes through this function, it returns a list of indexes (of length 50) per text document and their respective target labels. The target labels of four categories (["World", "Sports", "Business", "Sci/Tech"]) are in the range [1,4] which we have mapped to [0,3] for simplicity. We have kept batch size as 1024 hence for each batch, we'll get data of shape [1024,50] (1024=batch size, 50=tokens per text document) and 1024 target labels.
from torch.utils.data import DataLoader
from torchtext.data.functional import to_map_style_dataset
def vectorize_batch(batch):
Y, X = list(zip(*batch))
X = [vocab(tokenizer(sample)) for sample in X]
X = [sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X] ## Bringing all samples to 50 length.
return torch.tensor(X, dtype=torch.int32), torch.tensor(Y) - 1 ## We have deducted 1 from target names to get them in range [0,1,2,3,5] from [1,2,3,4,5]
train_dataset, test_dataset = torchtext.datasets.AG_NEWS()
train_dataset, test_dataset = to_map_style_dataset(train_dataset), to_map_style_dataset(test_dataset)
target_classes = ["World", "Sports", "Business", "Sci/Tech"]
train_loader = DataLoader(train_dataset, batch_size=1024, collate_fn=vectorize_batch)
test_loader = DataLoader(test_dataset, batch_size=1024, collate_fn=vectorize_batch)
for X, Y in train_loader:
print(X.shape, Y.shape)
break
In this example, we have explained our first word embeddings approach that uses 25 embeddings per word and flattens the embeddings of words of text example before giving it to linear layers.
Below, we have defined the neural network that we'll use to classify text documents. The network has one embedding layer and 3 linear layers.
We have created a network using Sequential API of PyTorch. Please feel free to check the below tutorial if you want to learn about how to design a neural network using **PyTorch.
from torch import nn
from torch.nn import functional as F
class EmbeddingClassifier(nn.Module):
def __init__(self):
super(EmbeddingClassifier, self).__init__()
self.seq = nn.Sequential(
nn.Embedding(num_embeddings=len(vocab), embedding_dim=25),
nn.Flatten(),
nn.Linear(25*50, 128), ## 25 = embeding length, 50 = words we kept per text example
nn.ReLU(),
nn.Linear(128,64),
nn.ReLU(),
nn.Linear(64, len(target_classes)),
)
def forward(self, X_batch):
return self.seq(X_batch)
In this section, we have trained the network we defined in the previous section. In order to train the network, we have defined a simple helper function. The function takes the model, loss function, train loader, validation loader, and a number of epochs as input. It then executes the training loop number of epochs times. For each epoch, it loops through training data in batches using the training data loader. During each batch, it performs a forward pass to make predictions, calculates loss, calculates gradients using backpropagation, and updates network weights using gradients. It keeps track of loss for each batch and prints the average loss value after the completion of each epoch. We have also created another helper function that calculates validation loss and accuracy and prints it. It loops through the validation loader to calculate validation loss and accuracy.
from tqdm import tqdm
from sklearn.metrics import accuracy_score
import gc
def CalcValLossAndAccuracy(model, loss_fn, val_loader):
with torch.no_grad():
Y_shuffled, Y_preds, losses = [],[],[]
for X, Y in val_loader:
preds = model(X)
loss = loss_fn(preds, Y)
losses.append(loss.item())
Y_shuffled.append(Y)
Y_preds.append(preds.argmax(dim=-1))
Y_shuffled = torch.cat(Y_shuffled)
Y_preds = torch.cat(Y_preds)
print("Valid Loss : {:.3f}".format(torch.tensor(losses).mean()))
print("Valid Acc : {:.3f}".format(accuracy_score(Y_shuffled.detach().numpy(), Y_preds.detach().numpy())))
def TrainModel(model, loss_fn, optimizer, train_loader, val_loader, epochs=10):
for i in range(1, epochs+1):
losses = []
for X, Y in tqdm(train_loader):
Y_preds = model(X)
loss = loss_fn(Y_preds, Y)
losses.append(loss.item())
optimizer.zero_grad()
loss.backward()
optimizer.step()
print("Train Loss : {:.3f}".format(torch.tensor(losses).mean()))
CalcValLossAndAccuracy(model, loss_fn, val_loader)
Below, we have actually trained our network by calling the training routine we designed in the previous cell. We have initialized a number of epochs to 15 and the learning rate to 0.001. Then, we have initialized cross entropy loss function, our classifier, and Adam optimizer. At last, we have called our training routine with the necessary parameters to perform the training process.
We can notice from the loss and accuracy getting printed after each epoch that our model is doing a good job. It's not that appealing as our model seems to be making many mistakes but still has good accuracy.
from torch.optim import Adam
epochs = 15
learning_rate = 1e-3
loss_fn = nn.CrossEntropyLoss()
embed_classifier = EmbeddingClassifier()
optimizer = Adam(embed_classifier.parameters(), lr=learning_rate)
TrainModel(embed_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
In this section, we have evaluated the performance of our network by calculating accuracy, classification report and confusion matrix metrics on train predictions. We have created a small helper function to make predictions that take model and data loader as input and returns predictions. We have calculated the ML metrics using functions available from scikit-learn.
If you want to learn about various ML metrics available through scikit-learn then please check the below link that covers the majority of them in detail.
After calculating metrics, we have also plotted the confusion matrix using scikit-plot. We can notice from the plot that our model is doing good for categories [Sports, World] compared to [Business, Sci/Tech]. Please feel free to check the below link if you are interested in learning about scikit-plot. It provides visualization for many commonly used ML metrics.
def MakePredictions(model, loader):
Y_shuffled, Y_preds = [], []
for X, Y in loader:
preds = model(X)
Y_preds.append(preds)
Y_shuffled.append(Y)
gc.collect()
Y_preds, Y_shuffled = torch.cat(Y_preds), torch.cat(Y_shuffled)
return Y_shuffled.detach().numpy(), F.softmax(Y_preds, dim=-1).argmax(dim=-1).detach().numpy()
Y_actual, Y_preds = MakePredictions(embed_classifier, test_loader)
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np
skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
normalize=True,
title="Confusion Matrix",
cmap="Purples",
hide_zeros=True,
figsize=(5,5)
);
plt.xticks(rotation=90);
In this section, we have tried to explain predictions made by our network by generating SHAP values using shap python library. In order to use shap library, we need to import it and initialize it by calling initjs() function on it which we did at the beginning of the tutorial.
In order to explain a prediction using SHAP values, we need to create Explainer first. Then, we need to give text examples that we want to explain to the explainer instance to create Explanation object (SHAP values). At last, we can call text_plot() method by giving SHAP values to it to visualize explanations created for text examples.
Below, we have first created an explainer object. The explainer object requires us to provide a function that takes as an input batch of text examples and returns probabilities for each target class for the whole batch. We have created a simple function that takes as an input batch of text samples. It then tokenizes them and creates indexes for tokens using vocabulary. It then assures that each text sample has a length of 50 as required by our network. Then, it gives vectorized batch data to the network to make predictions. As our network returns logits, we have converted them to probabilities using softmax activation function. We have also given target class names when creating explainer instances.
If you do not have a background on SHAP library then we suggest the below-mentioned tutorials that can be very helpful to learn it.
def make_predictions(X_batch_text):
X_batch = [vocab(tokenizer(sample)) for sample in X_batch_text]
X_batch = [sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch] ## Bringing all samples to 50 length.
X_batch = torch.tensor(X_batch, dtype=torch.int32)
logits_preds = embed_classifier(X_batch)
return F.softmax(logits_preds, dim=-1).detach().numpy()
masker = shap.maskers.Text(tokenizer=r"\W+")
explainer = shap.Explainer(make_predictions, masker=masker, output_names=target_classes)
explainer
Below, we have first retrieved test samples from the test dataset. Then, we have selected the first two test samples and made predictions about them. We have printed actual labels and predicted labels for both samples. We can notice that our model correctly predicts labels as Business and Sci/Tech.
X_test, Y_test = [], []
for Y, X in test_dataset: ## Selecting first 1024 samples from test data
X_test.append(X)
Y_test.append(Y-1) ## Please make a Note that we have subtracted 1 from target values to start index from 0 instead of 1.
X_batch = [vocab(tokenizer(sample)) for sample in X_test[:2]]
X_batch = torch.tensor([sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch], dtype=torch.int32)
logits = embed_classifier(X_batch)
preds_proba = F.softmax(logits, dim=-1)
preds = preds_proba.argmax(dim=-1)
print("Actual Target Values : {}".format([target_classes[target] for target in Y_test[:2]]))
print("Predicted Target Values : {}".format([target_classes[target] for target in preds]))
print("Predicted Probabilities : {}".format(preds_proba.max(dim=-1)))
Below, we have first generated Explanation object (shap values) by giving the first two text examples from test data as input to the explainer object. Then, we have called text_plot() function with SHAP values to visualize explanation results.
We can notice from the visualization that words like 'talks', 'federal', 'mogul', etc contributed to predicted label as 'Business' for first text example. For second text example, words like 'launch', 'space', 'spaceflight', 'rocket', 'manned', 'suborbital', etc contributed to predicting category 'Sci/Tech'.
shap_values = explainer(X_test[:2])
shap.text_plot(shap_values)
Our approach in this example is almost exactly the same as our approach in the previous example with the only difference being the size of the embedding. In our previous example, we had an embedding length of 25 whereas, in this section, we have kept the embedding length at 40. The code is exactly the same as in our previous section.
Below, we have defined our network again but this time with an embedding length of 40. The rest of the network structure is the same as in the previous section.
from torch import nn
from torch.nn import functional as F
class EmbeddingClassifier(nn.Module):
def __init__(self):
super(EmbeddingClassifier, self).__init__()
self.seq = nn.Sequential(
nn.Embedding(num_embeddings=len(vocab), embedding_dim=40),
nn.Flatten(),
nn.Linear(40*50, 128), ## 40 = embeding length, 50 = words we kept per sample
nn.ReLU(),
nn.Linear(128,64),
nn.ReLU(),
nn.Linear(64, len(target_classes)),
)
def forward(self, X_batch):
return self.seq(X_batch)
Below, we have trained our network for 15 epochs using almost all settings same as in the previous section. We can notice from the loss and accuracy getting printed after each epoch that our model has done a good job. The validation accuracy has improved a bit compared to our previous example.
from torch.optim import Adam
epochs = 15
learning_rate = 1e-3
loss_fn = nn.CrossEntropyLoss()
embed_classifier = EmbeddingClassifier()
optimizer = Adam(embed_classifier.parameters(), lr=learning_rate)
TrainModel(embed_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
Below, we have evaluated the performance of our network as usual by calculating accuracy, classification report and confusion matrix metrics on test predictions. The test accuracy is a little better compared to our previous approach.
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
Y_actual, Y_preds = MakePredictions(embed_classifier, test_loader)
print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np
skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
normalize=True,
title="Confusion Matrix",
cmap="Purples",
hide_zeros=True,
figsize=(5,5)
);
plt.xticks(rotation=90);
Below, we have explained the predictions made by our model for the first two test samples. Our model correctly predicts labels for both samples as 'Business' and 'Sci/Tech' respectively.
We can notice from the visualization that words like 'pension', 'unions', 'workers', 'federal', 'mogul', etc contributed to predicting category as 'Business' for first text example. For second text example, words like 'spaceflight', 'space', 'launch', 'human', etc contributed to predicting category as 'Sci/Tech'.
X_batch = [vocab(tokenizer(sample)) for sample in X_test[:2]]
X_batch = torch.tensor([sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch], dtype=torch.int32)
logits = embed_classifier(X_batch)
preds_proba = F.softmax(logits, dim=-1)
preds = preds_proba.argmax(dim=-1)
print("Actual Target Values : {}".format([target_classes[target] for target in Y_test[:2]]))
print("Predicted Target Values : {}".format([target_classes[target] for target in preds]))
print("Predicted Probabilities : {}".format(preds_proba.max(dim=-1)))
masker = shap.maskers.Text(tokenizer=r"\W+")
explainer = shap.Explainer(make_predictions, masker=masker, output_names=target_classes)
shap_values = explainer(X_test[:2])
shap.text_plot(shap_values)
In this section, we have used a little different approach compared to our previous two approaches. Till now, we were keeping embeddings for all words/tokens of the text example by laying them next to each other, but in this section, we have taken the average of embeddings of all tokens/words per text example. The majority of the code is exactly the same as our previous sections with only a change in handling embeddings.
Below, we have defined a network that we'll use in this example. The network has almost the same structure as our network from the first example with a minor change in the forward pass method. We have used word embeddings length of 25 again in this network. The main change in the forward pass is that the output of the word embeddings layer is averaged for each token/word of text example.
from torch import nn
from torch.nn import functional as F
class EmbeddingClassifier(nn.Module):
def __init__(self):
super(EmbeddingClassifier, self).__init__()
self.word_embeddings = nn.Embedding(num_embeddings=len(vocab), embedding_dim=25)
self.linear1 = nn.Linear(25, 128) ## 25 = embeding length, 50 = words we kept per sample
self.linear2 = nn.Linear(128,64)
self.linear3 = nn.Linear(64, len(target_classes))
def forward(self, X_batch):
x = self.word_embeddings(X_batch)
x = x.mean(dim=1) ## Averaging embeddings
x = F.relu(self.linear1(x))
x = F.relu(self.linear2(x))
logits = F.relu(self.linear3(x))
return logits
Below, we have trained our network with exactly the same settings that we have used in our previous approaches. We can notice from the loss and accuracy getting printed after each epoch that our model seems to be doing a good job. The accuracy is quite high compared to our previous approaches.
from torch.optim import Adam
epochs = 15
learning_rate = 1e-3
loss_fn = nn.CrossEntropyLoss()
embed_classifier = EmbeddingClassifier()
optimizer = Adam(embed_classifier.parameters(), lr=learning_rate)
TrainModel(embed_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
Below, we have evaluated network performance by calculating accuracy, classification report and confusion matrix metrics on test predictions as usual. We can notice from the accuracy that it's quite better compared to our previous approaches. Our model is doing a pretty good job at classifying documents of each category.
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
Y_actual, Y_preds = MakePredictions(embed_classifier, test_loader)
print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np
skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
normalize=True,
title="Confusion Matrix",
cmap="Purples",
hide_zeros=True,
figsize=(5,5)
);
plt.xticks(rotation=90);
Below, we have explained predictions made by our model from this section by generating SHAP values. We have explained the predictions made for the first two test examples. The network correctly predicts labels as 'Business' and 'Sci/Tech' for them respectively.
The words like 'pension', 'unions', 'workers', 'firm', 'federal', 'mogul', etc are contributing to predicting category as 'Business' for first text example. For second text example, words like 'space', 'launch', 'manned', 'rocket', 'flight', etc are contributing to predicting category 'Sci/Tech'.
X_batch = [vocab(tokenizer(sample)) for sample in X_test[:2]]
X_batch = torch.tensor([sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch], dtype=torch.int32)
logits = embed_classifier(X_batch)
preds_proba = F.softmax(logits, dim=-1)
preds = preds_proba.argmax(dim=-1)
print("Actual Target Values : {}".format([target_classes[target] for target in Y_test[:2]]))
print("Predicted Target Values : {}".format([target_classes[target] for target in preds]))
print("Predicted Probabilities : {}".format(preds_proba.max(dim=-1)))
masker = shap.maskers.Text(tokenizer=r"\W+")
explainer = shap.Explainer(make_predictions, masker=masker, output_names=target_classes)
shap_values = explainer(X_test[:2])
shap.text_plot(shap_values)
Our approach in this section is almost the same as our approach from the previous section with the only difference that we have implemented the approach using EmbeddingBag layer. We have again averaged embeddings of text examples.
Below, we have defined a model that uses the EmbeddingBag layer as the first layer. We have provided it with the same embedding length of 25 that we had used in our previous approach. We have provided one more extra parameter named 'mode' with value 'mean' this time.
The EmbeddingBag layer will work exactly like Embedding layer with the only difference that it'll apply the function specified through mode parameter to the output of Embedding layer. It'll first generate embeddings and then take an average for all tokens/words of a single text example.
The EmbeddingBag layer accepts three values as mode parameter.
from torch import nn
from torch.nn import functional as F
class EmbeddingClassifier(nn.Module):
def __init__(self):
super(EmbeddingClassifier, self).__init__()
self.seq = nn.Sequential(
nn.EmbeddingBag(num_embeddings=len(vocab), embedding_dim=25, mode="mean"),
nn.Linear(25, 128), ## 25 = embeding length, 50 = words we kept per sample
nn.ReLU(),
nn.Linear(128, 64),
nn.ReLU(),
nn.Linear(64, len(target_classes)),
)
def forward(self, X_batch):
return self.seq(X_batch)
Below, we have trained our network using exactly the same settings that we have been using for all our previous approaches. We can notice from the loss and accuracy getting printed after each epoch that our model has performed quite well.
from torch.optim import Adam
epochs = 15
learning_rate = 1e-3
loss_fn = nn.CrossEntropyLoss()
embed_classifier = EmbeddingClassifier()
optimizer = Adam(embed_classifier.parameters(), lr=learning_rate)
TrainModel(embed_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
Here, we have evaluated network performance as usual by calculating accuracy, classification report and confusion matrix metrics on test predictions. We can notice from the accuracy that it is almost the same as our accuracy from the previous approach.
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
Y_actual, Y_preds = MakePredictions(embed_classifier, test_loader)
print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np
skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
normalize=True,
title="Confusion Matrix",
cmap="Purples",
hide_zeros=True,
figsize=(5,5)
);
plt.xticks(rotation=90);
Below, we have explained predictions made by our model for the first two test examples using SHAP values. Our model correctly predicts labels as 'Business' and 'Sci/Tech' for them respectively.
We can notice from generated visualization that words like 'pension', 'unions', 'workers', 'firm', 'federal', 'mogul', etc are contributing to predicting category as 'Business' for first example and words like 'space', 'spaceflight', 'manned', 'rocket', etc are contributing to predicting category as 'Sci/Tech' for second text example.
X_batch = [vocab(tokenizer(sample)) for sample in X_test[:2]]
X_batch = torch.tensor([sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch], dtype=torch.int32)
logits = embed_classifier(X_batch)
preds_proba = F.softmax(logits, dim=-1)
preds = preds_proba.argmax(dim=-1)
print("Actual Target Values : {}".format([target_classes[target] for target in Y_test[:2]]))
print("Predicted Target Values : {}".format([target_classes[target] for target in preds]))
print("Predicted Probabilities : {}".format(preds_proba.max(dim=-1)))
masker = shap.maskers.Text(tokenizer=r"\W+")
explainer = shap.Explainer(make_predictions, masker=masker, output_names=target_classes)
shap_values = explainer(X_test[:2])
shap.text_plot(shap_values)
our approach in this section again uses EmbeddingBag layer but this time it sums embeddings of text examples. The only difference in our approach from this section compared to the previous two approaches is that we are summing up embeddings instead of averaging this time.
Below, we have defined a network that we'll use in this section. It has the same structure as our network from the previous section with the only change that mode parameter of EmbeddingBag is set to 'sum' value.
from torch import nn
from torch.nn import functional as F
class EmbeddingClassifier(nn.Module):
def __init__(self):
super(EmbeddingClassifier, self).__init__()
self.seq = nn.Sequential(
nn.EmbeddingBag(num_embeddings=len(vocab), embedding_dim=25, mode="sum"),
nn.Linear(25, 128), ## 25 = embeding length, 50 = words we kept per sample
nn.ReLU(),
nn.Linear(128, 64),
nn.ReLU(),
nn.Linear(64, len(target_classes)),
)
def forward(self, X_batch):
return self.seq(X_batch)
Below we have trained our network using exactly the same settings that we have used for all our previous approaches. We can notice from the loss and accuracy getting printed after each epoch that our model is doing a good job at the text classification task.
from torch.optim import Adam
epochs = 15
learning_rate = 1e-3
loss_fn = nn.CrossEntropyLoss()
embed_classifier = EmbeddingClassifier()
optimizer = Adam(embed_classifier.parameters(), lr=learning_rate)
TrainModel(embed_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
Here, we have evaluated network performance as usual by calculating accuracy, classification report and confusion matrix metrics on test predictions. We can notice from the accuracy that it's a little less compared to our previous approach but better than our first two approaches.
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
Y_actual, Y_preds = MakePredictions(embed_classifier, test_loader)
print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np
skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
normalize=True,
title="Confusion Matrix",
cmap="Purples",
hide_zeros=True,
figsize=(5,5)
);
plt.xticks(rotation=90);
Below, we have explained predictions made by our network for the first two test examples using SHAP values. Our model correctly predicts categories as Business and Sci/Tech for text examples respectively.
X_batch = [vocab(tokenizer(sample)) for sample in X_test[:2]]
X_batch = torch.tensor([sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch], dtype=torch.int32)
logits = embed_classifier(X_batch)
preds_proba = F.softmax(logits, dim=-1)
preds = preds_proba.argmax(dim=-1)
print("Actual Target Values : {}".format([target_classes[target] for target in Y_test[:2]]))
print("Predicted Target Values : {}".format([target_classes[target] for target in preds]))
print("Predicted Probabilities : {}".format(preds_proba.max(dim=-1)))
masker = shap.maskers.Text(tokenizer=r"\W+")
explainer = shap.Explainer(make_predictions, masker=masker, output_names=target_classes)
shap_values = explainer(X_test[:2])
shap.text_plot(shap_values)
This ends our small tutorial explaining how we can use word embeddings with PyTorch network for text classification tasks. Please feel free to let us know your views in the comments section.
If you are more comfortable learning through video tutorials then we would recommend that you subscribe to our YouTube channel.
When going through coding examples, it's quite common to have doubts and errors.
If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.
You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.
If you want to