Updated On : Jun-01,2022 Tags Pytorch, LSTM, text-gene…

PyTorch: Text Generation using LSTM Networks (Character-based RNN)

Text generation, also referred to as natural language generation, is a kind of language modeling problem where we build a model that tries to understand the structure of a text and produce another text. Tasks like machine translation, conversational systems (chatbots), speech-to-text, and text summarization try to build language models at their core. Nowadays, deep learning models are commonly developed for language modeling tasks. In the case of text generation, the language model tries to predict the next token (character/word/n-gram) in a text based on previously seen tokens. In order to predict the next token, the language model needs to understand the order in which tokens are laid out. Recurrent Neural Networks (RNNs) and their variants (LSTM, GRU, etc.) are quite good at modeling sequential data and hence can be used for language modeling tasks.

As a part of this tutorial, we explain how to create Recurrent Neural Networks consisting of LSTM layers using the Python deep learning library PyTorch for a text generation task. We have used a character-based approach, where the model takes a specified number of characters as input and predicts the next character in the sequence. In the same way, we can also create networks that take a sequence of words as input and predict the next word. We have encoded text data by mapping each character to its integer index in a vocabulary. We have used the WikiText2 Wikipedia text corpus available from the torchtext library (PyTorch's helper library for NLP tasks). We have another tutorial on text generation using PyTorch that uses character embeddings for encoding text data.

Please make a NOTE that language models are generally big and take time to train before they can produce meaningful text. Training them on a CPU is slow, and a GPU speeds up training considerably, hence we recommend training language models on a GPU.

Below, we have listed the important sections of the tutorial to give an overview of the material covered.

Important Sections Of Tutorial

  1. Prepare Data
    • 1.1 Load Data
    • 1.2 Populate Vocabulary
    • 1.3 Reshape Examples to Create Sequence Of Data
    • 1.4 Create Data Loaders
  2. Define LSTM Network
  3. Train Network
  4. Generate Text
  5. Train Network More
  6. Generate Text
  7. Train Even More
  8. Generate Text
  9. Further Suggestions

Below, we have imported the necessary Python libraries and printed the versions that we have used in our tutorial.

In [1]:
import torch

print("PyTorch Version : {}".format(torch.__version__))
PyTorch Version : 1.9.1
In [2]:
import torchtext

print("TorchText Version : {}".format(torchtext.__version__))
TorchText Version : 0.10.1
In [3]:
device = "cuda" if torch.cuda.is_available() else "cpu"

device
Out[3]:
'cuda'
In [4]:
import gc

1. Prepare Data

In this section, we are preparing our data for training our network. As we said earlier, we are going to use a character-based approach for text generation, hence we'll feed a few characters to the network and make it predict the next character in the sequence. We have decided to feed a sequence of 100 characters to the network and make it predict the character that follows them.

We'll encode the data by replacing each character with its integer index from a vocabulary. We'll follow the below steps to encode and prepare data.

  1. Create a vocabulary of all unique characters of the data. A vocabulary is a simple mapping from characters to their integer index. Each unique character is assigned a unique index starting from 0.
  2. Loop through data sequentially one character at a time. Take the first 100 characters as data features and the next character after them as the target value. E.g., characters 1-100 are data features and character 101 is the target value, characters 2-101 are data features and character 102 is the target value, characters 3-102 are data features and character 103 is the target value, and so on.
  3. Replace each character with its unique index as per the vocabulary.

The data generated after following the above steps will be given to the LSTM network for processing. The network will process a sequence of 100 characters at a time and try to predict the next character. We have explained the steps in more detail below to make them easier to grasp.
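
The three steps above can be sketched in plain Python on a tiny string. This is only an illustration: the helper names build_vocab() and make_windows(), the toy text, and the window length of 4 are assumptions made for the sketch and are not part of the tutorial code, which uses torchtext and a window of 100 characters.

```python
def build_vocab(text):
    """Map each unique lowercase character to an index, starting from 0."""
    return {ch: idx for idx, ch in enumerate(sorted(set(text.lower())))}

def make_windows(text, vocab, seq_length):
    """Slide a window of seq_length characters over text, one step at a time."""
    text = text.lower()
    X, Y = [], []
    for i in range(len(text) - seq_length):
        X.append([vocab[ch] for ch in text[i:i + seq_length]])  # data features
        Y.append(vocab[text[i + seq_length]])                   # target value
    return X, Y

toy_text = "hello world"          # hypothetical toy input
toy_vocab = build_vocab(toy_text)
X_toy, Y_toy = make_windows(toy_text, toy_vocab, seq_length=4)

print(toy_vocab)
print(len(X_toy), X_toy[0], Y_toy[0])
```

Each row of X_toy holds the indices of one window and the matching entry of Y_toy holds the index of the character that follows it.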

1.1 Load Data

In this section, we have simply loaded our Wikipedia dataset (WikiText2). The dataset is already divided into train, validation, and test sets. We'll use only the train set for our task. The train set has ~36k text examples. Each example is one line of text (such as a heading or a paragraph) taken from a Wikipedia article.

In [5]:
train_dataset, valid_dataset, test_dataset = torchtext.datasets.WikiText2()
wikitext-2-v1.zip: 100%|██████████| 4.48M/4.48M [00:02<00:00, 1.60MB/s]

1.2 Populate Vocabulary

In this section, we are building a vocabulary of all unique characters present in our dataset. In order to create the vocabulary, we have used the build_vocab_from_iterator() function available from the 'vocab' sub-module of the torchtext library. The function accepts an iterator that yields a list of tokens (characters in our case) on each iteration. We have created a small generator function named build_vocabulary() for this purpose. It takes a list of datasets as input and loops through all datasets and their examples one at a time, yielding a list of lowercase characters for each example. Our text examples contain a special token named <unk> which represents an unknown token, and we handle it specially so that it is counted as one token instead of being broken into characters.

After building vocabulary, we have printed vocabulary and the number of characters present in it.
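
To make the special handling of <unk> concrete, here is a small standalone sketch of the same splitting idea applied to a single string. The helper name to_char_tokens() and the sample sentence are assumptions for illustration only; the tutorial's own logic lives inside build_vocabulary().

```python
def to_char_tokens(text, unk_token="<unk>"):
    """Split text into lowercase characters, keeping unk_token as a single token."""
    parts = text.split(unk_token)
    tokens = list(parts[0].lower())
    for part in parts[1:]:
        tokens.append(unk_token)      # re-insert the special token as one unit
        tokens.extend(part.lower())   # remaining text is split into characters
    return tokens

print(to_char_tokens("The <unk> cat"))
```

Without this handling, the five characters of <unk> would be counted separately and the vocabulary would never contain the special token as a whole.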

In [6]:
from torchtext.vocab import build_vocab_from_iterator

def build_vocabulary(datasets):
    ## Yield one list of lowercase characters per text example
    for dataset in datasets:
        for text in dataset:
            if "<unk>" in text:
                ## Keep "<unk>" as a single token instead of splitting it into characters
                texts = text.split("<unk>")
                total = list(texts[0].lower())
                for t in texts[1:]:
                    total.extend(["<unk>", ] + list(t.lower()))
                yield total
            else:
                yield list(text.lower())

vocab = build_vocab_from_iterator(build_vocabulary([train_dataset, ]), min_freq=1, specials=["<unk>"])
vocab.set_default_index(vocab["<unk>"]) ## Characters missing from the vocabulary map to "<unk>"
In [7]:
len(vocab)
Out[7]:
244
In [8]:
print(vocab.get_itos())
['<unk>', ' ', 'e', 't', 'a', 'n', 'i', 'o', 'r', 's', 'h', 'd', 'l', 'c', 'm', 'u', 'f', 'g', 'p', 'w', 'b', 'y', ',', '.', 'v', 'k', '@', '\n', '1', '0', '=', '"', '2', "'", '9', '-', 'j', 'x', ')', '(', '3', '5', '8', '4', '6', '7', 'z', 'q', ';', '–', ':', '/', '—', '%', 'é', '$', '[', ']', '&', '!', 'í', '’', 'á', 'ā', '£', '°', '?', 'ó', '+', '#', 'š', '−', 'ō', 'ö', 'è', '×', 'ü', 'ä', 'ʻ', 'ś', 'ć', 'ø', '“', 'ł', 'ç', '”', '₹', 'ã', 'µ', 'ì', 'ư', '\ufeff', 'æ', '…', '→', 'ơ', 'ñ', 'å', '☉', '‘', '*', '~', '⁄', 'î', '²', 'ë', 'ệ', 'ī', 'ú', 'ễ', 'à', 'ô', 'ă', 'ū', '<', '^', 'ê', '♯', 'ỳ', '‑', 'đ', 'μ', '≤', '>', 'ل', 'ṃ', '~', '्', '†', '€', '±', 'ė', 'ž', '〈', '〉', '・', 'û', 'č', 'α', 'β', '½', 'γ', 'с', 'ṭ', 'ị', '„', '♭', 'â', '̃', 'ا', 'ه', '჻', 'ṅ', 'ầ', 'ớ', '′', '⅓', '大', '空', '¡', '¥', '³', '·', 'ş', 'ح', 'ص', 'ن', 'ვ', 'ი', 'კ', 'ო', 'ხ', 'ჯ', 'ḥ', 'ṯ', 'ả', 'ấ', '″', '火', '礮', '\\', '`', '|', '§', 'ò', 'þ', 'ń', 'ų', 'ż', 'ʿ', 'κ', 'а', 'в', 'е', 'к', 'о', 'т', 'я', 'ก', 'ง', 'ณ', 'ต', 'ม', 'ย', 'ร', 'ล', 'ั', 'า', 'ิ', '่', '์', 'გ', 'დ', 'ზ', 'რ', 'ს', 'უ', 'ც', 'ძ', 'წ', 'ṣ', 'ắ', 'ử', '₤', '⅔', 'の', 'ァ', 'ア', 'キ', 'ス', 'ッ', 'ト', 'プ', 'ュ', 'リ', 'ル', 'ヴ', '動', '場', '戦', '攻', '機', '殻', '隊']

1.3 Reshape Examples to Create Sequence Of Data

In this section, we are reorganizing our dataset examples so that they can be used to train our LSTM network. We simply loop through each text example of our train dataset and slide a window of 100 characters over it. We take the 100 characters inside the window as data features and the next character in the sequence as the target value, then move the window by 1 character and continue the process until we reach the end of the text. We have also replaced each character with its integer index using our vocabulary. Please make a NOTE that we have used only the first 7,500 examples of the dataset for training, as using all of them would take quite long.

After organizing the dataset, we have converted the features and targets to torch tensors. We have also added one extra dimension at the end of the features tensor in order to feed data to the LSTM layer.

Below, we have tried to explain the process with a simple example.

vocab = {
'h':1,
'e':2,
'l':3,
'o':4,
' ':5,
',':6,
'w':7,
'a':8,
'r':9,
'y':10,
'u':11,
'?':12,
'c':13,
'm':14,
't':15,
'd':16,
'z':17,
'n':18
}

text_example = "Hello, How are you? Welcome to coderzcolumn?"
seq_length = 10

X_train = [
            ['h','e','l','l','o',',',' ','h','o','w'],
            ['e','l','l','o',',',' ','h','o','w',' '],
            ['l','l','o',',',' ','h','o','w',' ','a'],
            ['l','o',',',' ','h','o','w',' ','a','r'],
            ...
            ['d','e','r','z','c','o','l','u','m','n']
            ]
Y_train = [' ','a','r','e',..., '?']

X_train_vectorized = [
                        [1,2,3,3,4,6,5,1,4,7],
                        [2,3,3,4,6,5,1,4,7,5],
                        [3,3,4,6,5,1,4,7,5,8],
                        [3,4,6,5,1,4,7,5,8,9],
                        ...
                        [16,2,9,17,13,4,3,11,14,18]
                     ]
Y_train_vectorized = [5,8,9,2,..., 12]
In [9]:
%%time

train_dataset, valid_dataset, test_dataset = torchtext.datasets.WikiText2()

seq_length = 100 ## Network Hyperparameter to tune
X_train, Y_train = [], []

for text in list(train_dataset)[:7500]:
    for i in range(0, len(text)-seq_length):
        inp_seq = list(text[i:i+seq_length].lower())
        out_seq = text[i+seq_length].lower()
        X_train.append(vocab(inp_seq))
        Y_train.append(vocab[out_seq])

X_train, Y_train = torch.tensor(X_train, dtype=torch.float32), torch.tensor(Y_train)

X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1) ## Extra dimension is added for LSTM layer

X_train.shape, Y_train.shape
CPU times: user 31.7 s, sys: 1.08 s, total: 32.8 s
Wall time: 32.9 s
Out[9]:
(torch.Size([1781323, 100, 1]), torch.Size([1781323]))

1.4 Create Data Loaders

In this section, we have simply wrapped our torch tensors in a TensorDataset and created a data loader from it. The data loader will let us process data in batches during the training process. We have set the batch size to 1024.
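
As a quick sanity check, the number of batches per epoch follows directly from the dataset size reported in Out[9] and the batch size. The short calculation below is only a worked example with hypothetical variable names; it should agree with the total of 1740 shown by the progress bars in the training output.

```python
import math

n_examples = 1781323   # number of training sequences reported in Out[9]
batch_size = 1024

n_batches = math.ceil(n_examples / batch_size)            # batches per epoch
last_batch = n_examples - (n_batches - 1) * batch_size    # size of the final, smaller batch

print(n_batches, last_batch)
```

The last batch is smaller than 1024 because the data loader keeps the incomplete final batch by default.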

In [10]:
from torch.utils.data import DataLoader, TensorDataset

vectorized_train_dataset = TensorDataset(X_train, Y_train)

train_loader = DataLoader(vectorized_train_dataset, batch_size=1024, shuffle=False)
In [11]:
for X, Y in train_loader:
    print(X.shape, Y.shape)
    break
torch.Size([1024, 100, 1]) torch.Size([1024])
In [12]:
gc.collect()
Out[12]:
21

2. Define LSTM Network

In this section, we have defined a neural network that we'll use for our task. Our task will be considered a classification task as our network predicts one of the characters from the vocabulary.

The network that we have defined consists of 2 LSTM layers and one linear layer. The output size (hidden size) of each LSTM layer is set to 256. Stacking two LSTM layers should help the network better capture the sequence of characters found in the data. We have defined the LSTM layers using the LSTM() constructor, where we have set the num_layers parameter to 2, instructing it to stack two LSTM layers. The output of the second LSTM layer at the last time step is given to a Linear layer whose number of output units is the same as the size of the vocabulary.

After defining the network, we initialized it, printed the shape of weights/biases of layers, and performed a forward pass for verification purposes.

If you are new to PyTorch or don't have a background in LSTM networks, we recommend going through introductory material on both first, as we have not covered the inner workings of LSTM in depth here.

In [13]:
from torch import nn

hidden_dim = 256
n_layers=2

class LSTMTextGenerator(nn.Module):
    def __init__(self):
        super(LSTMTextGenerator, self).__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_dim, num_layers=n_layers, batch_first=True)
        self.linear = nn.Linear(hidden_dim, len(vocab))

    def forward(self, X_batch):
        ## Random initial hidden and cell states of shape (n_layers, batch_size, hidden_dim)
        hidden = torch.randn(n_layers, len(X_batch), hidden_dim).to(device)
        carry = torch.randn(n_layers, len(X_batch), hidden_dim).to(device)
        output, (hidden, carry) = self.lstm(X_batch, (hidden, carry))
        return self.linear(output[:,-1]) ## Use only the output of the last time step
In [14]:
text_generator = LSTMTextGenerator().to(device)

text_generator
Out[14]:
LSTMTextGenerator(
  (lstm): LSTM(1, 256, num_layers=2, batch_first=True)
  (linear): Linear(in_features=256, out_features=244, bias=True)
)
In [15]:
for layer in text_generator.children():
    print("Layer : {}".format(layer))
    print("Parameters : ")
    for param in layer.parameters():
        print(param.shape)
    print()
Layer : LSTM(1, 256, num_layers=2, batch_first=True)
Parameters :
torch.Size([1024, 1])
torch.Size([1024, 256])
torch.Size([1024])
torch.Size([1024])
torch.Size([1024, 256])
torch.Size([1024, 256])
torch.Size([1024])
torch.Size([1024])

Layer : Linear(in_features=256, out_features=244, bias=True)
Parameters :
torch.Size([244, 256])
torch.Size([244])
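
The value 1024 in the LSTM shapes above is 4 x 256: an LSTM layer keeps four sets of weights (input, forget, cell, and output gates) stacked together, each with hidden_size rows. The sketch below reproduces the printed shapes with plain arithmetic; the helper name lstm_param_shapes() is hypothetical, and the calculation assumes a unidirectional LSTM without projections, as used in this tutorial.

```python
import math

def lstm_param_shapes(input_size, hidden_size, num_layers):
    """Shapes of weight_ih, weight_hh, bias_ih, bias_hh for each stacked LSTM layer."""
    shapes = []
    for layer in range(num_layers):
        # The first layer sees the raw input; later layers see the previous layer's output
        in_features = input_size if layer == 0 else hidden_size
        shapes += [
            (4 * hidden_size, in_features),   # input-to-hidden weights of the 4 gates
            (4 * hidden_size, hidden_size),   # hidden-to-hidden weights of the 4 gates
            (4 * hidden_size,),               # input-to-hidden bias
            (4 * hidden_size,),               # hidden-to-hidden bias
        ]
    return shapes

shapes = lstm_param_shapes(input_size=1, hidden_size=256, num_layers=2)
linear_shapes = [(244, 256), (244,)]          # Linear layer: weight and bias

total_params = sum(math.prod(s) for s in shapes + linear_shapes)

for s in shapes:
    print(s)
print("Total parameters :", total_params)
```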

In [16]:
out = text_generator(torch.randn(1024, seq_length, 1).to(device))

out.shape
Out[16]:
torch.Size([1024, 244])

3. Train Network

Here, we are training our network. To simplify the training process, we have created a helper training function. The function takes the model, loss function, optimizer, train data loader, and number of epochs as input. It then executes the training loop for the given number of epochs, looping through the whole training data in batches each time. For each batch of data, it performs a forward pass to make predictions, calculates the loss, calculates gradients, and updates the network parameters using the gradients. It records the loss value for each batch and prints the average loss across all batches at the end of each epoch.

In [17]:
from tqdm import tqdm

def TrainModel(model, loss_fn, optimizer, train_loader, epochs=10):
    for i in range(1, epochs+1):
        losses = []
        for X, Y in tqdm(train_loader):
            Y_preds = model(X.to(device)) ## Forward pass

            loss = loss_fn(Y_preds, Y.to(device)) ## Calculate loss
            losses.append(loss.item())

            optimizer.zero_grad() ## Clear gradients from the previous batch
            loss.backward() ## Calculate gradients
            optimizer.step() ## Update network parameters

        print("Train Loss : {:.3f}".format(torch.tensor(losses).mean()))

Below, we are actually training our network using the training routine from the previous cell. We have set the number of epochs to 25 and the learning rate to 0.001. Then, we have initialized the cross entropy loss, our LSTM model, and the Adam optimizer. At last, we have called our training routine with the necessary parameters to perform training. We can notice from the loss value printed after each epoch that it decreases steadily from 2.532 to 1.446, which suggests that the network is learning the sequence of characters.

In [18]:
%%time

from torch.optim import Adam

epochs = 25
learning_rate = 1e-3

loss_fn = nn.CrossEntropyLoss().to(device)
text_generator = LSTMTextGenerator().to(device)
optimizer = Adam(text_generator.parameters(), lr=learning_rate)

TrainModel(text_generator, loss_fn, optimizer, train_loader, epochs)
100%|██████████| 1740/1740 [02:47<00:00, 10.36it/s]
Train Loss : 2.532
100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]
Train Loss : 2.166
100%|██████████| 1740/1740 [02:49<00:00, 10.24it/s]
Train Loss : 2.040
100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]
Train Loss : 1.952
100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]
Train Loss : 1.885
100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]
Train Loss : 1.828
100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]
Train Loss : 1.781
100%|██████████| 1740/1740 [02:49<00:00, 10.27it/s]
Train Loss : 1.741
100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]
Train Loss : 1.706
100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]
Train Loss : 1.678
100%|██████████| 1740/1740 [02:49<00:00, 10.27it/s]
Train Loss : 1.652
100%|██████████| 1740/1740 [02:50<00:00, 10.21it/s]
Train Loss : 1.626
100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]
Train Loss : 1.604
100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]
Train Loss : 1.584
100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]
Train Loss : 1.565
100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]
Train Loss : 1.550
100%|██████████| 1740/1740 [02:48<00:00, 10.30it/s]
Train Loss : 1.535
100%|██████████| 1740/1740 [02:48<00:00, 10.34it/s]
Train Loss : 1.521
100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]
Train Loss : 1.508
100%|██████████| 1740/1740 [02:48<00:00, 10.33it/s]
Train Loss : 1.496
100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]
Train Loss : 1.484
100%|██████████| 1740/1740 [02:49<00:00, 10.24it/s]
Train Loss : 1.474
100%|██████████| 1740/1740 [02:48<00:00, 10.31it/s]
Train Loss : 1.464
100%|██████████| 1740/1740 [02:49<00:00, 10.27it/s]
Train Loss : 1.455
100%|██████████| 1740/1740 [02:49<00:00, 10.29it/s]
Train Loss : 1.446
CPU times: user 1h 9min 6s, sys: 17.3 s, total: 1h 9min 23s
Wall time: 1h 10min 36s
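
To put the loss values above in context, it can help to compare them with the loss of a model that guesses every character with equal probability, and to convert them to perplexity. The snippet below is only a small worked calculation with hypothetical names; it assumes the loss is an average cross entropy measured with the natural logarithm, which is what nn.CrossEntropyLoss computes.

```python
import math

vocab_size = 244                       # len(vocab) from the vocabulary section
uniform_loss = math.log(vocab_size)    # cross entropy of a uniform guess over all characters

def perplexity(cross_entropy):
    """Convert an average cross-entropy loss (in nats) into perplexity."""
    return math.exp(cross_entropy)

print("Uniform-guess loss : {:.3f}".format(uniform_loss))
print("Perplexity after 25 epochs : {:.2f}".format(perplexity(1.446)))
```

A uniform guess gives a loss of roughly 5.5, so a loss of 1.446 means the network is, on average, about as uncertain as if it were choosing among a little over 4 equally likely characters instead of 244.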

4. Generate Text

In this section, we are trying to generate text using our trained network. We have first retrieved a random example from our prepared training data and printed its characters. Then, we have a loop that generates 100 new characters. The logic starts with the randomly selected sequence and predicts the next character. It then removes the first character from the sequence and appends the newly predicted character at the end. Then, it makes another prediction, and the process repeats for 100 characters.

We can notice from the results that our model is not making any spelling errors even though it is predicting one character at a time. The generated sequence of characters does not make much sense but looks like an English sentence. It is also predicting punctuation marks. Because we always pick the most likely character (argmax), the model quickly falls into a loop and repeats the same sequence of characters. This can be avoided by introducing some kind of randomness when choosing the next character from the output of the network.

The results look good overall considering that we have trained the network for just 25 epochs. Next, we'll train the network for more epochs, which should hopefully improve the results further.

In [19]:
import random

random.seed(123)
idx = random.randint(0, len(X_train) - 1) ## randint() includes both end points
pattern = X_train[idx].numpy().astype(int).flatten().tolist()

print("Initial Pattern : {}".format("".join(vocab.lookup_tokens(pattern))))

generated_text = []
for i in range(100):
    X_batch = torch.tensor(pattern, dtype=torch.float32).reshape(1, seq_length, 1) ## Design Batch
    preds = text_generator(X_batch.to(device)) ## Make Prediction
    predicted_index = preds.argmax(dim=-1).cpu().numpy()[0] ## Retrieve token index
    generated_text.append(predicted_index) ## Add token index to result
    pattern.append(predicted_index) ## Add token index to original pattern
    pattern = pattern[1:] ## Resize pattern to bring again to seq_length length.

print("Generated Text : {}".format("".join(vocab.lookup_tokens(generated_text))))
Initial Pattern : 1987 – 88 season where he was named the ihl 's co @-@ rookie of the year and most valuable player af
Generated Text : ter the country , and the country , and the country , and the country , and the country , and the co
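
One common way to break the repetition seen above is to sample the next character from the predicted distribution instead of always taking the argmax. Below is a minimal sketch of temperature sampling in plain Python. It is not part of the tutorial code: the function sample_index(), the temperature values, and the toy scores are all assumptions made for illustration.

```python
import math
import random

def sample_index(logits, temperature=1.0, rng=random):
    """Sample an index from unnormalized scores (logits) after temperature scaling."""
    scaled = [score / temperature for score in logits]
    max_scaled = max(scaled)                                # subtract max for numerical stability
    weights = [math.exp(s - max_scaled) for s in scaled]    # unnormalized softmax probabilities
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

toy_logits = [2.0, 1.0, 0.1]      # made-up scores for a 3-character vocabulary
rng = random.Random(0)            # seeded generator so the sketch is repeatable
samples = [sample_index(toy_logits, temperature=0.8, rng=rng) for _ in range(1000)]

print([samples.count(i) for i in range(3)])
```

A temperature below 1 makes the choice closer to argmax, while a temperature above 1 makes it more random. In the generation loop, the scores of a single prediction could be passed as preds[0].tolist(); the same idea can also be expressed with torch.softmax() and torch.multinomial().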

5. Train Network More

In this section, we have trained our network for another 50 epochs. We have also reduced the learning rate from 0.001 to 0.0003. We can notice from the printed loss values that the loss decreases at every epoch (from 1.429 to 1.316), which suggests that our network is getting better at the text generation task.

In [20]:
epochs = 50
learning_rate = 3e-4
optimizer = Adam(text_generator.parameters(), lr=learning_rate)

TrainModel(text_generator, loss_fn, optimizer, train_loader, epochs)
100%|██████████| 1740/1740 [02:48<00:00, 10.32it/s]
Train Loss : 1.429
100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]
Train Loss : 1.422
100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]
Train Loss : 1.418
100%|██████████| 1740/1740 [02:48<00:00, 10.33it/s]
Train Loss : 1.414
100%|██████████| 1740/1740 [02:48<00:00, 10.32it/s]
Train Loss : 1.410
100%|██████████| 1740/1740 [02:48<00:00, 10.31it/s]
Train Loss : 1.407
100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]
Train Loss : 1.404
100%|██████████| 1740/1740 [02:48<00:00, 10.30it/s]
Train Loss : 1.401
100%|██████████| 1740/1740 [02:48<00:00, 10.31it/s]
Train Loss : 1.398
100%|██████████| 1740/1740 [02:49<00:00, 10.29it/s]
Train Loss : 1.395
100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]
Train Loss : 1.393
100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]
Train Loss : 1.390
100%|██████████| 1740/1740 [02:49<00:00, 10.29it/s]
Train Loss : 1.387
100%|██████████| 1740/1740 [02:49<00:00, 10.29it/s]
Train Loss : 1.385
100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]
Train Loss : 1.382
100%|██████████| 1740/1740 [02:50<00:00, 10.18it/s]
Train Loss : 1.380
100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]
Train Loss : 1.377
100%|██████████| 1740/1740 [02:48<00:00, 10.31it/s]
Train Loss : 1.375
100%|██████████| 1740/1740 [02:48<00:00, 10.31it/s]
Train Loss : 1.373
100%|██████████| 1740/1740 [02:48<00:00, 10.32it/s]
Train Loss : 1.370
100%|██████████| 1740/1740 [02:50<00:00, 10.21it/s]
Train Loss : 1.368
100%|██████████| 1740/1740 [02:49<00:00, 10.29it/s]
Train Loss : 1.366
100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]
Train Loss : 1.364
100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]
Train Loss : 1.362
100%|██████████| 1740/1740 [02:48<00:00, 10.30it/s]
Train Loss : 1.360
100%|██████████| 1740/1740 [02:49<00:00, 10.27it/s]
Train Loss : 1.358
100%|██████████| 1740/1740 [02:51<00:00, 10.17it/s]
Train Loss : 1.356
100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]
Train Loss : 1.354
100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]
Train Loss : 1.352
100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]
Train Loss : 1.350
100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]
Train Loss : 1.348
100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]
Train Loss : 1.346
100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]
Train Loss : 1.344
100%|██████████| 1740/1740 [02:51<00:00, 10.13it/s]
Train Loss : 1.342
100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]
Train Loss : 1.341
100%|██████████| 1740/1740 [02:47<00:00, 10.41it/s]
Train Loss : 1.339
100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]
Train Loss : 1.337
100%|██████████| 1740/1740 [02:48<00:00, 10.32it/s]
Train Loss : 1.335
100%|██████████| 1740/1740 [02:49<00:00, 10.29it/s]
Train Loss : 1.334
100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]
Train Loss : 1.332
100%|██████████| 1740/1740 [02:47<00:00, 10.39it/s]
Train Loss : 1.330
100%|██████████| 1740/1740 [02:50<00:00, 10.21it/s]
Train Loss : 1.329
100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]
Train Loss : 1.327
100%|██████████| 1740/1740 [02:47<00:00, 10.38it/s]
Train Loss : 1.326
100%|██████████| 1740/1740 [02:50<00:00, 10.21it/s]
Train Loss : 1.324
100%|██████████| 1740/1740 [02:47<00:00, 10.37it/s]
Train Loss : 1.322
100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]
Train Loss : 1.321
100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]
Train Loss : 1.319
100%|██████████| 1740/1740 [02:46<00:00, 10.43it/s]
Train Loss : 1.318
100%|██████████| 1740/1740 [02:50<00:00, 10.24it/s]
Train Loss : 1.316

6. Generate Text

Here, we have again generated new characters using our further-trained network. We have used the same example that we used earlier. We can notice that the results seem to have improved a little bit. The model is not making spelling mistakes, and new words are generated for the same example. The network still falls into a loop and produces the same characters again and again. We can train the network further to see whether it helps or not.

In [21]:
import random

random.seed(123)
idx = random.randint(0, len(X_train) - 1) ## randint() includes both end points
pattern = X_train[idx].numpy().astype(int).flatten().tolist()

print("Initial Pattern : {}".format("".join(vocab.lookup_tokens(pattern))))

generated_text = []
for i in range(100):
    X_batch = torch.tensor(pattern, dtype=torch.float32).reshape(1, seq_length, 1) ## Design Batch
    preds = text_generator(X_batch.to(device)) ## Make Prediction
    predicted_index = preds.argmax(dim=-1).cpu().numpy()[0] ## Retrieve token index
    generated_text.append(predicted_index) ## Add token index to result
    pattern.append(predicted_index) ## Add token index to original pattern
    pattern = pattern[1:] ## Resize pattern to bring again to seq_length length.

print("Generated Text : {}".format("".join(vocab.lookup_tokens(generated_text))))
Initial Pattern : 1987 – 88 season where he was named the ihl 's co @-@ rookie of the year and most valuable player af
Generated Text : ter the south of the south of the south of the south of the south of the south of the south of the s

7. Train Even More

In this section, we have trained our network for another 50 epochs. We have reduced the learning rate from 0.0003 to 0.0001. We can notice from the loss values at the end of each epoch that the network is still improving, although slowly (from 1.314 to 1.285).

In [22]:
epochs = 50
learning_rate = 1e-4
optimizer = Adam(text_generator.parameters(), lr=learning_rate)

TrainModel(text_generator, loss_fn, optimizer, train_loader, epochs)
100%|██████████| 1740/1740 [02:46<00:00, 10.44it/s]
Train Loss : 1.314
100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]
Train Loss : 1.312
100%|██████████| 1740/1740 [02:46<00:00, 10.45it/s]
Train Loss : 1.311
100%|██████████| 1740/1740 [02:49<00:00, 10.27it/s]
Train Loss : 1.310
100%|██████████| 1740/1740 [02:49<00:00, 10.29it/s]
Train Loss : 1.309
100%|██████████| 1740/1740 [02:46<00:00, 10.45it/s]
Train Loss : 1.309
100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]
Train Loss : 1.308
100%|██████████| 1740/1740 [02:46<00:00, 10.43it/s]
Train Loss : 1.307
100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]
Train Loss : 1.307
100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]
Train Loss : 1.306
100%|██████████| 1740/1740 [02:46<00:00, 10.46it/s]
Train Loss : 1.305
100%|██████████| 1740/1740 [02:49<00:00, 10.29it/s]
Train Loss : 1.305
100%|██████████| 1740/1740 [02:46<00:00, 10.46it/s]
Train Loss : 1.304
100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]
Train Loss : 1.304
100%|██████████| 1740/1740 [02:49<00:00, 10.24it/s]
Train Loss : 1.303
100%|██████████| 1740/1740 [02:46<00:00, 10.43it/s]
Train Loss : 1.302
100%|██████████| 1740/1740 [02:49<00:00, 10.27it/s]
Train Loss : 1.302
100%|██████████| 1740/1740 [02:46<00:00, 10.48it/s]
Train Loss : 1.301
100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]
Train Loss : 1.301
100%|██████████| 1740/1740 [02:47<00:00, 10.37it/s]
Train Loss : 1.300
100%|██████████| 1740/1740 [02:48<00:00, 10.36it/s]
Train Loss : 1.300
100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]
Train Loss : 1.299
100%|██████████| 1740/1740 [02:46<00:00, 10.45it/s]
Train Loss : 1.299
100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]
Train Loss : 1.298
100%|██████████| 1740/1740 [02:46<00:00, 10.47it/s]
Train Loss : 1.297
100%|██████████| 1740/1740 [02:49<00:00, 10.24it/s]
Train Loss : 1.297
100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]
Train Loss : 1.296
100%|██████████| 1740/1740 [02:47<00:00, 10.41it/s]
Train Loss : 1.296
100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]
Train Loss : 1.295
100%|██████████| 1740/1740 [02:46<00:00, 10.46it/s]
Train Loss : 1.295
100%|██████████| 1740/1740 [02:50<00:00, 10.19it/s]
Train Loss : 1.294
100%|██████████| 1740/1740 [02:47<00:00, 10.40it/s]
Train Loss : 1.294
100%|██████████| 1740/1740 [02:49<00:00, 10.27it/s]
Train Loss : 1.293
100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]
Train Loss : 1.293
100%|██████████| 1740/1740 [02:46<00:00, 10.46it/s]
Train Loss : 1.292
100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]
Train Loss : 1.292
100%|██████████| 1740/1740 [02:46<00:00, 10.46it/s]
Train Loss : 1.291
100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]
Train Loss : 1.291
100%|██████████| 1740/1740 [02:48<00:00, 10.36it/s]
Train Loss : 1.290
100%|██████████| 1740/1740 [02:49<00:00, 10.24it/s]
Train Loss : 1.290
100%|██████████| 1740/1740 [02:50<00:00, 10.19it/s]
Train Loss : 1.289
100%|██████████| 1740/1740 [02:46<00:00, 10.47it/s]
Train Loss : 1.289
100%|██████████| 1740/1740 [02:50<00:00, 10.20it/s]
Train Loss : 1.288
100%|██████████| 1740/1740 [02:46<00:00, 10.43it/s]
Train Loss : 1.288
100%|██████████| 1740/1740 [02:51<00:00, 10.17it/s]
Train Loss : 1.287
100%|██████████| 1740/1740 [02:47<00:00, 10.42it/s]
Train Loss : 1.287
100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]
Train Loss : 1.286
100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]
Train Loss : 1.286
100%|██████████| 1740/1740 [02:47<00:00, 10.39it/s]
Train Loss : 1.285
100%|██████████| 1740/1740 [02:46<00:00, 10.45it/s]
Train Loss : 1.285

8. Generate Text

Here, we are again generating text from the same text example using our trained network. We can notice that the results this time are a little better compared to earlier, though the output still repeats itself.

In [23]:
import random

random.seed(123)
idx = random.randint(0, len(X_train) - 1) ## randint() includes both end points
pattern = X_train[idx].numpy().astype(int).flatten().tolist()

print("Initial Pattern : {}".format("".join(vocab.lookup_tokens(pattern))))

generated_text = []
for i in range(100):
    X_batch = torch.tensor(pattern, dtype=torch.float32).reshape(1, seq_length, 1) ## Design Batch
    preds = text_generator(X_batch.to(device)) ## Make Prediction
    predicted_index = preds.argmax(dim=-1).cpu().numpy()[0] ## Retrieve token index
    generated_text.append(predicted_index) ## Add token index to result
    pattern.append(predicted_index) ## Add token index to original pattern
    pattern = pattern[1:] ## Resize pattern to bring again to seq_length length.

print("Generated Text : {}".format("".join(vocab.lookup_tokens(generated_text))))
Initial Pattern : 1987 – 88 season where he was named the ihl 's co @-@ rookie of the year and most valuable player af
Generated Text : ter the second construction of the second construction of the second construction of the second cons

9. Further Suggestions

Below we have suggested a few more things that can be tried to improve network performance further.

  1. Train the network for more epochs.
  2. Try different combinations of LSTM layers. Maybe add more LSTM layers.
  3. Try different hidden sizes for LSTM layers.
  4. Try different sequence lengths. In our case, we tried a sequence length of 100 characters.
  5. Try using an n-gram/word-based model instead of a character-based one.
  6. Try adding more linear layers to the network after the LSTM layers.
  7. Try learning rate schedulers.
  8. Try different character encodings like character embeddings, etc.
  9. Try to add randomness to the prediction of the next character to make the predicted text look more natural.
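
As an illustration of the learning rate scheduler suggestion, the sketch below uses MultiStepLR from torch.optim.lr_scheduler to lower the learning rate automatically instead of re-creating the optimizer by hand. The tiny linear model, the milestones, and the decay factor of 0.3 are assumptions chosen only to roughly mirror the manual 0.001, 0.0003, 0.0001 sequence used in this tutorial; they are not values we have tested.

```python
import torch
from torch import nn
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

model = nn.Linear(4, 2)    # tiny stand-in for the LSTM network (assumption)
optimizer = Adam(model.parameters(), lr=1e-3)

# Multiply the learning rate by 0.3 after epochs 25 and 75 (hypothetical schedule)
scheduler = MultiStepLR(optimizer, milestones=[25, 75], gamma=0.3)

lrs = []
for epoch in range(125):
    lrs.append(optimizer.param_groups[0]["lr"])   # learning rate used during this epoch
    # ... one full epoch of training batches would run here ...
    optimizer.step()       # placeholder for the per-batch parameter updates
    scheduler.step()       # advance the schedule once per epoch

print(lrs[0], lrs[25], lrs[75])
```

In a real run, scheduler.step() would be called once at the end of each epoch inside the training function, after all batches have been processed.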

This ends our small tutorial explaining how to design LSTM networks using PyTorch for text generation tasks. Please feel free to contact us if you have any questions.

Sunny Solanki
