Updated On : Jun-05,2022 Tags keras, LSTM, text-genera…

Keras: Text Generation using LSTM Networks (Character-based RNN)

Text Generation is an active area of research in Natural Language Processing (NLP) where we build models that generate text the way humans do. The models used to generate text are referred to as Language Models, and the process is referred to as Language Modeling. Nowadays, various deep learning models are being developed for language modeling tasks like conversational systems (chatbots), text translation, text summarization, etc. Recurrent Neural Networks (RNNs) and their variants (such as LSTM and GRU) are commonly used to create language models for text generation. RNNs are designed to work with sequences: at each step they update a hidden state that summarizes the elements seen so far, and they use that state to make a prediction for the current step. This is exactly what text generation needs, because we have to know the previous words/characters in order to generate the next word/character.
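To make the idea of a hidden state concrete, below is a minimal NumPy sketch of a vanilla (Elman) RNN forward pass. It is only an illustration, assuming small random weights and the update h_t = tanh(x_t·W_x + h_{t-1}·W_h + b); the Keras LSTM layers used later in this tutorial apply a more elaborate, gated update. The function name `simple_rnn_forward` and all sizes are hypothetical.

```python
import numpy as np

def simple_rnn_forward(inputs, W_x, W_h, b):
    """Run a vanilla (Elman) RNN over one sequence and return every hidden state.

    inputs: array of shape (timesteps, input_dim)
    W_x:    input-to-hidden weights, shape (input_dim, hidden_dim)
    W_h:    hidden-to-hidden weights, shape (hidden_dim, hidden_dim)
    b:      bias vector, shape (hidden_dim,)
    """
    h = np.zeros(W_h.shape[0])  # initial hidden state h_0 = 0
    states = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state,
        # so information from earlier timesteps can influence later ones.
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_dim, hidden_dim, timesteps = 3, 4, 5
W_x = rng.normal(scale=0.5, size=(input_dim, hidden_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

seq = rng.normal(size=(timesteps, input_dim))
states = simple_rnn_forward(seq, W_x, W_h, b)

# Change only the FIRST timestep of the input sequence.
seq_changed = seq.copy()
seq_changed[0] += 1.0
states_changed = simple_rnn_forward(seq_changed, W_x, W_h, b)

print(states.shape)  # (5, 4)
# The LAST hidden state changes too: the state carries information forward.
print(np.allclose(states[-1], states_changed[-1]))
```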

As a part of this tutorial, we have explained how to create Recurrent Neural Networks consisting of LSTM layers for text generation using the Python deep learning library Keras. We have used a character-based model, which takes a specified number of characters as input and predicts the next character of the sequence. We encode the characters of the text by assigning a unique integer index to each character, and this encoded data is given to the network for training. For training purposes, we have used the WikiText-2 dataset, which is built from good-quality Wikipedia articles. We also have another tutorial on text generation using Keras which uses character embeddings to encode text data. Please feel free to check it.

Please make a NOTE that language models take a long time to train and generally need a GPU; training them on a CPU can be very slow. We trained the model in this tutorial on a GPU, and we recommend using one.

Below, we have listed the important sections of the tutorial to give an overview of the material covered.

Important Sections Of Tutorial

  1. Prepare Data
    • 1.1 Load Dataset
    • 1.2 Populate Vocabulary
    • 1.3 Organize Data For Training
  2. Define Network
  3. Compile And Train Network
  4. Generate Text Using Trained Model
  5. Train Model For More Epochs
  6. Generate Text Using Trained Model
  7. Train Model Even More
  8. Generate Text
  9. Further Recommendations

Below, we have imported the necessary libraries and printed the versions that we have used in our tutorial.

In [1]:
import tensorflow

from tensorflow import keras

print("Keras Version : {}".format(keras.__version__))
Keras Version : 2.6.0
In [2]:
import torchtext

print("TorchText Version : {}".format(torchtext.__version__))
TorchText Version : 0.10.1
In [3]:
import gc

1. Prepare Data

In this section, we prepare our data for the neural network. As we said earlier, we use a character-based approach to text generation: we give a specified number of characters to the network and train it to predict the character that comes right after them. Neural networks work with numbers, not text, so we encode every character as an integer index. We have followed the steps below to prepare the data for the network.

  1. Load data.
  2. Loop through each text example and populate a vocabulary of unique characters. A vocabulary is a simple mapping from each character to an integer index. Each character is assigned a unique integer index starting from 1 (the Keras Tokenizer never assigns index 0).
  3. Slide a window of 100 characters over each text example, taking the 100 characters as data features (X) and the character right after them as the target value (Y). For example, characters 1-100 are data features and character 101 is the target value, characters 2-101 are data features and character 102 is the target value, characters 3-102 are data features and character 103 is the target value, and so on.
  4. Look up the index of every character in the data features (X) and target values (Y) in the populated vocabulary.

After completing these four steps, we'll have arrays of integers (X, Y) which we can give to the neural network for training. The steps will become clearer as we go through them below.

1.1 Load Dataset

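In this section, we have simply loaded the WikiText-2 dataset using torchtext. The dataset is already divided into train, validation, and test sets; we use only the train set. The train set is built from about 600 curated Wikipedia articles (Good and Featured articles), and torchtext returns it line by line, so X_train_text below holds 36,718 lines of text (paragraphs, section headings, and blank lines) rather than 36k articles.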

In [4]:
train_dataset, valid_dataset, test_dataset = torchtext.datasets.WikiText2()
wikitext-2-v1.zip: 100%|██████████| 4.48M/4.48M [00:00<00:00, 6.65MB/s]
In [5]:
X_train_text = [text for text in train_dataset]

len(X_train_text)
Out[5]:
36718

1.2 Populate Vocabulary

In this section, we have populated the vocabulary of unique characters. To do so, we have created an instance of Tokenizer() from the preprocessing.text sub-module of Keras. We have set char_level to True so that it splits text into characters; by default, it splits text into words. The tokenizer also lowercases text by default (lower=True), which is why the vocabulary below contains only lowercase letters. We then called the fit_on_texts() method of the tokenizer with our train text examples. Once populated, the vocabulary is available through the word_index attribute of the tokenizer (the reverse mapping is available through index_word). Characters are numbered in order of decreasing frequency, so the space character, the most frequent one, gets index 1. We have printed the vocabulary for reference.

Please make a NOTE that the vocabulary starts from index 1 because the Keras Tokenizer reserves index 0 and never assigns it to any character (it is typically used for padding). Characters that are not in the vocabulary are not mapped to index 0: texts_to_sequences() simply drops them, unless the tokenizer was created with an oov_token, in which case they are mapped to that token's index. This matters for tasks like text classification, where a new text document may contain characters/words that the model has never seen.
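The following pure-Python sketch imitates how we understand `Tokenizer(char_level=True)` to build and apply its vocabulary: lowercase the text, rank characters by frequency (ties keep their first-seen order), number them from 1, and silently drop unknown characters when no oov_token is set. `CharTokenizerSketch` is a hypothetical class for illustration only, not part of Keras.

```python
from collections import Counter

class CharTokenizerSketch:
    """Rough imitation of Keras Tokenizer(char_level=True) without an oov_token.

    Assumed behaviour (our reading of Keras, not an official spec):
    - text is lowercased (Keras default lower=True);
    - characters are ranked by frequency, ties keep first-seen order;
    - indices start at 1, so index 0 is never assigned;
    - unknown characters are dropped by texts_to_sequences().
    """

    def fit_on_texts(self, texts):
        counts = Counter()
        for text in texts:
            counts.update(text.lower())  # counts individual characters
        # Counter keeps insertion order and sorted() is stable,
        # so characters with equal counts stay in first-seen order.
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        self.word_index = {ch: i for i, (ch, _) in enumerate(ranked, start=1)}
        self.index_word = {i: ch for ch, i in self.word_index.items()}

    def texts_to_sequences(self, texts):
        return [[self.word_index[ch] for ch in text.lower() if ch in self.word_index]
                for text in texts]

tok = CharTokenizerSketch()
tok.fit_on_texts(["Hello"])
print(tok.word_index)                     # {'l': 1, 'h': 2, 'e': 3, 'o': 4}
print(tok.texts_to_sequences(["hole!"]))  # [[2, 4, 1, 3]]  ('!' is unknown and dropped)
```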

In [6]:
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(char_level=True)

tokenizer.fit_on_texts(X_train_text)
In [7]:
print(tokenizer.word_index)
{' ': 1, 'e': 2, 't': 3, 'a': 4, 'n': 5, 'i': 6, 'o': 7, 'r': 8, 's': 9, 'h': 10, 'd': 11, 'l': 12, 'u': 13, 'c': 14, 'm': 15, 'f': 16, 'g': 17, 'p': 18, 'w': 19, 'b': 20, 'y': 21, 'k': 22, ',': 23, '.': 24, 'v': 25, '<': 26, '>': 27, '@': 28, '\n': 29, '1': 30, '0': 31, '=': 32, '"': 33, '2': 34, "'": 35, '9': 36, '-': 37, 'j': 38, 'x': 39, ')': 40, '(': 41, '3': 42, '5': 43, '8': 44, '4': 45, '6': 46, '7': 47, 'z': 48, 'q': 49, ';': 50, '–': 51, ':': 52, '/': 53, '—': 54, '%': 55, 'é': 56, '$': 57, '[': 58, ']': 59, '&': 60, '!': 61, 'í': 62, '’': 63, 'á': 64, 'ā': 65, '£': 66, '°': 67, '?': 68, 'ó': 69, '+': 70, '#': 71, 'š': 72, '−': 73, 'ō': 74, 'ö': 75, 'è': 76, '×': 77, 'ü': 78, 'ä': 79, 'ʻ': 80, 'ś': 81, 'ć': 82, '“': 83, 'ø': 84, 'ł': 85, 'ç': 86, '”': 87, '₹': 88, 'ã': 89, 'µ': 90, 'ì': 91, 'ư': 92, '\ufeff': 93, 'æ': 94, '…': 95, '→': 96, 'ơ': 97, 'ñ': 98, 'å': 99, '☉': 100, '‘': 101, '~': 102, '*': 103, '⁄': 104, 'î': 105, '²': 106, 'ë': 107, 'ệ': 108, 'ī': 109, 'ú': 110, 'ễ': 111, 'ô': 112, 'à': 113, 'ū': 114, 'ă': 115, '^': 116, '♯': 117, 'ê': 118, '‑': 119, 'ỳ': 120, 'đ': 121, 'μ': 122, '≤': 123, 'ل': 124, '~': 125, 'ṃ': 126, '†': 127, '€': 128, '्': 129, '・': 130, '±': 131, 'ž': 132, 'ė': 133, '〈': 134, '〉': 135, 'β': 136, 'č': 137, 'α': 138, 'û': 139, '♭': 140, '½': 141, '„': 142, 'ị': 143, 'с': 144, 'ṭ': 145, 'γ': 146, 'â': 147, '′': 148, '大': 149, '空': 150, '̃': 151, 'ớ': 152, 'ầ': 153, '⅓': 154, 'ا': 155, 'ه': 156, 'ṅ': 157, '჻': 158, 'ṯ': 159, 'ş': 160, 'ح': 161, 'ص': 162, 'ن': 163, '″': 164, '³': 165, '¥': 166, '¡': 167, 'ấ': 168, 'ả': 169, '火': 170, '礮': 171, '·': 172, 'ḥ': 173, 'ჯ': 174, 'ი': 175, 'ო': 176, 'ხ': 177, 'ვ': 178, 'კ': 179, '戦': 180, '場': 181, 'の': 182, 'ヴ': 183, 'ァ': 184, 'ル': 185, 'キ': 186, 'ュ': 187, 'リ': 188, 'ア': 189, '₤': 190, 'ż': 191, 'ń': 192, '่': 193, 'ง': 194, 'ก': 195, 'ั': 196, 'ล': 197, 'ย': 198, 'า': 199, 'ณ': 200, 'ม': 201, 'ิ': 202, 'ต': 203, 'ร': 204, '์': 205, '§': 206, 'ス': 207, 'ト': 208, 'ッ': 209, 'プ': 210, 
'ʿ': 211, 'þ': 212, '\\': 213, '`': 214, '⅔': 215, 'ắ': 216, 'ử': 217, '|': 218, '攻': 219, '殻': 220, '機': 221, '動': 222, '隊': 223, 'ų': 224, 'κ': 225, 'ò': 226, 'о': 227, 'в': 228, 'е': 229, 'т': 230, 'к': 231, 'а': 232, 'я': 233, 'ṣ': 234, 'დ': 235, 'უ': 236, 'ზ': 237, 'რ': 238, 'ს': 239, 'ძ': 240, 'წ': 241, 'გ': 242, 'ც': 243}

1.3 Organize Data For Training

In this section, we organize our dataset for training. We have set the seq_length variable to 100 because we use sequences of 100 characters. We then loop through each text example and slide a window of 100 characters over it, putting the 100 characters into the data features (X_train) and the next character after them into the target values (Y_train). The target characters are converted to indexes directly with the vocabulary, and the input sequences with the texts_to_sequences() method of the tokenizer. Finally, we add one extra dimension at the end of the data features (X_train), because Keras LSTM layers expect 3D input of shape (samples, timesteps, features); here each timestep has a single feature, the character index. To keep data size and training time manageable, we use only the first 6,000 text examples.

Below, we have explained the process with a simple example.

## Toy vocabulary (indices assigned by first appearance for readability)
vocab = {
    'h': 1, 'e': 2, 'l': 3, 'o': 4, ' ': 5, ',': 6,
    'w': 7, 'a': 8, 'r': 9, 'y': 10, 'u': 11, '?': 12,
    'c': 13, 'm': 14, 't': 15, 'd': 16, 'z': 17, 'n': 18
}

text_example = "Hello, How are you? Welcome to coderzcolumn?".lower() ## lowercased, as in the code below
seq_length = 10

X_train = [
            ['h','e','l','l','o',',',' ','h','o','w'],
            ['e','l','l','o',',',' ','h','o','w',' '],
            ['l','l','o',',',' ','h','o','w',' ','a'],
            ['l','o',',',' ','h','o','w',' ','a','r'],
            ...
            ['d','e','r','z','c','o','l','u','m','n']
          ]
Y_train = [' ', 'a', 'r', 'e', ..., '?']

X_train_vectorized = [
                        [1,2,3,3,4,6,5,1,4,7],
                        [2,3,3,4,6,5,1,4,7,5],
                        [3,3,4,6,5,1,4,7,5,8],
                        [3,4,6,5,1,4,7,5,8,9],
                        ...
                        [16,2,9,17,13,4,3,11,14,18]
                     ]
Y_train_vectorized = [5, 8, 9, 2, ..., 12]
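A runnable NumPy version of the toy example above, assuming the same hand-assigned vocabulary, shows the windowing and the extra trailing dimension that the LSTM layer needs. It follows the same loop structure as the training-data code, just on a short string.

```python
import numpy as np

# Toy vocabulary from the example above (hand-assigned indices).
vocab = {'h': 1, 'e': 2, 'l': 3, 'o': 4, ' ': 5, ',': 6, 'w': 7, 'a': 8, 'r': 9,
         'y': 10, 'u': 11, '?': 12, 'c': 13, 'm': 14, 't': 15, 'd': 16, 'z': 17, 'n': 18}

text_example = "Hello, How are you? Welcome to coderzcolumn?".lower()
seq_length = 10

X, Y = [], []
for i in range(len(text_example) - seq_length):
    # Window of seq_length characters -> list of indices; target is the next character.
    X.append([vocab[ch] for ch in text_example[i:i + seq_length]])
    Y.append(vocab[text_example[i + seq_length]])

# Add a trailing feature axis: Keras LSTM layers expect (samples, timesteps, features).
X = np.array(X, dtype=np.int32).reshape(-1, seq_length, 1)
Y = np.array(Y)

print(X.shape, Y.shape)  # (34, 10, 1) (34,)
print(X[0, :, 0])        # [1 2 3 3 4 6 5 1 4 7]
print(Y[:4])             # [5 8 9 2]
```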
In [8]:
%%time

import numpy as np

seq_length = 100 ## Network hyperparameter to tune
X_train, Y_train = [], []

for text in X_train_text[:6000]: ## Use only the first 6,000 text examples to limit memory usage and training time
    for i in range(0, len(text)-seq_length):
        inp_seq = text[i:i+seq_length].lower()
        out_seq = text[i+seq_length].lower()
        X_train.append(inp_seq)
        Y_train.append(tokenizer.word_index[out_seq]) ## Retrieve index for characters from vocabulary

X_train = tokenizer.texts_to_sequences(X_train) ## Retrieve index for characters from vocabulary

X_train, Y_train = np.array(X_train, dtype=np.int32).reshape(-1, seq_length,1), np.array(Y_train)

X_train.shape, Y_train.shape
CPU times: user 47.8 s, sys: 958 ms, total: 48.8 s
Wall time: 49.3 s
Out[8]:
((1377719, 100, 1), (1377719,))
In [9]:
gc.collect()
Out[9]:
21

2. Define Network

In this section, we have defined the network that we'll use for our text generation task. The task is treated as a classification problem: for each input sequence, the network predicts one of the characters in the vocabulary. The first two layers of the network are LSTM layers with 256 units each. The first LSTM layer takes input of shape (batch_size, seq_length, 1) = (batch_size, 100, 1) and, because return_sequences=True, outputs its hidden state for every timestep, giving shape (batch_size, seq_length, lstm_out) = (batch_size, 100, 256). This output is given to the second LSTM layer, which returns only its final hidden state, of shape (batch_size, 256). The output of the second LSTM layer is given to a dense layer with one output unit per possible index (len(tokenizer.word_index) + 1 = 244, counting the unused index 0), which transforms the shape to (batch_size, 244). The softmax activation on the dense layer turns its output into a probability distribution over the characters.

After defining the network, we have also printed a summary of the model, which shows the output shape and parameter count of each layer.

We have not covered LSTM layers in detail here. We have a separate tutorial that explains LSTM networks in more detail using a text classification task; please feel free to check it if you want to learn more about them.

In [10]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

lstm_out = 256

model = Sequential([
                    LSTM(lstm_out, input_shape=(seq_length, 1), return_sequences=True),
                    LSTM(lstm_out),
                    Dense(len(tokenizer.word_index)+1, activation="softmax")
                ])


model.summary()
2022-05-28 02:02:13.130954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14969 MB memory:  -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm (LSTM)                  (None, 100, 256)          264192
_________________________________________________________________
lstm_1 (LSTM)                (None, 256)               525312
_________________________________________________________________
dense (Dense)                (None, 244)               62708
=================================================================
Total params: 852,212
Trainable params: 852,212
Non-trainable params: 0
_________________________________________________________________
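As a sanity check, the parameter counts in the summary above can be reproduced by hand. This sketch assumes the standard Keras LSTM layout (four gates, each with an input kernel, a recurrent kernel and one bias vector); `lstm_params` and `dense_params` are hypothetical helper names.

```python
def lstm_params(input_dim, units):
    # 4 gates (input, forget, cell candidate, output); each has an input kernel
    # (input_dim x units), a recurrent kernel (units x units) and a bias (units).
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units

vocab_size = 243 + 1             # 243 characters in word_index, plus the unused index 0
p1 = lstm_params(1, 256)         # first LSTM: one feature per timestep
p2 = lstm_params(256, 256)       # second LSTM: consumes the 256-dim outputs of the first
p3 = dense_params(256, vocab_size)
print(p1, p2, p3, p1 + p2 + p3)  # 264192 525312 62708 852212
```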

3. Compile And Train Network

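Below, we have first compiled our network to use the Adam optimizer for updating parameters and the sparse categorical cross-entropy loss for measuring network performance. The sparse variant is used because our targets (Y_train) are integer character indices rather than one-hot vectors. We have set the learning rate to 0.001.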

After compiling the network, we have trained it for 50 epochs with a batch size of 1024, i.e. 1346 batches per epoch. The loss printed after each epoch falls steadily, from 2.4957 after the first epoch to 1.2732 after the fiftieth (with only a brief bump at epoch 16), which indicates that the model is learning.
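For intuition about the number being printed, here is a small NumPy sketch of what sparse categorical cross-entropy computes: the average negative log of the probability the model assigns to the correct character. It is a simplified illustration, not Keras's implementation; the clipping constant `eps` is our own choice to avoid log(0). The last line checks the number of batches per epoch shown in the training log.

```python
import math
import numpy as np

def sparse_categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean negative log-probability assigned to the correct class.

    y_true: integer class indices, shape (batch,)
    y_pred: predicted probabilities (e.g. softmax output), shape (batch, num_classes)
    """
    probs = np.clip(y_pred[np.arange(len(y_true)), y_true], eps, 1.0)
    return float(-np.mean(np.log(probs)))

# Two samples over a 3-character vocabulary; targets are plain indices, not one-hot.
y_true = np.array([2, 0])
y_pred = np.array([[0.1, 0.2, 0.7],
                   [0.5, 0.25, 0.25]])
loss = sparse_categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))  # 0.5249

# 1,377,719 training windows with batch_size=1024: the last batch is partial.
print(math.ceil(1377719 / 1024))  # 1346
```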

In [11]:
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import backend as K

model.compile(optimizer=Adam(learning_rate=0.001), loss="sparse_categorical_crossentropy")
In [12]:
model.fit(X_train, Y_train, batch_size=1024, epochs=50, verbose=2)
2022-05-28 02:02:14.461037: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/50
2022-05-28 02:02:17.423647: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8005
1346/1346 - 146s - loss: 2.4957
Epoch 2/50
1346/1346 - 142s - loss: 2.1194
Epoch 3/50
1346/1346 - 142s - loss: 1.9804
Epoch 4/50
1346/1346 - 142s - loss: 1.8876
Epoch 5/50
1346/1346 - 142s - loss: 1.8169
Epoch 6/50
1346/1346 - 142s - loss: 1.7605
Epoch 7/50
1346/1346 - 142s - loss: 1.7156
Epoch 8/50
1346/1346 - 142s - loss: 1.6760
Epoch 9/50
1346/1346 - 142s - loss: 1.6418
Epoch 10/50
1346/1346 - 142s - loss: 1.6128
Epoch 11/50
1346/1346 - 142s - loss: 1.5862
Epoch 12/50
1346/1346 - 142s - loss: 1.5635
Epoch 13/50
1346/1346 - 142s - loss: 1.5437
Epoch 14/50
1346/1346 - 142s - loss: 1.5247
Epoch 15/50
1346/1346 - 142s - loss: 1.5074
Epoch 16/50
1346/1346 - 142s - loss: 1.5251
Epoch 17/50
1346/1346 - 142s - loss: 1.4931
Epoch 18/50
1346/1346 - 142s - loss: 1.4771
Epoch 19/50
1346/1346 - 142s - loss: 1.4628
Epoch 20/50
1346/1346 - 142s - loss: 1.4506
Epoch 21/50
1346/1346 - 142s - loss: 1.4387
Epoch 22/50
1346/1346 - 142s - loss: 1.4277
Epoch 23/50
1346/1346 - 142s - loss: 1.4174
Epoch 24/50
1346/1346 - 142s - loss: 1.4098
Epoch 25/50
1346/1346 - 142s - loss: 1.4002
Epoch 26/50
1346/1346 - 142s - loss: 1.3914
Epoch 27/50
1346/1346 - 142s - loss: 1.3837
Epoch 28/50
1346/1346 - 142s - loss: 1.3759
Epoch 29/50
1346/1346 - 142s - loss: 1.3686
Epoch 30/50
1346/1346 - 142s - loss: 1.3620
Epoch 31/50
1346/1346 - 142s - loss: 1.3555
Epoch 32/50
1346/1346 - 142s - loss: 1.3497
Epoch 33/50
1346/1346 - 142s - loss: 1.3440
Epoch 34/50
1346/1346 - 142s - loss: 1.3386
Epoch 35/50
1346/1346 - 142s - loss: 1.3326
Epoch 36/50
1346/1346 - 142s - loss: 1.3277
Epoch 37/50
1346/1346 - 142s - loss: 1.3226
Epoch 38/50
1346/1346 - 142s - loss: 1.3191
Epoch 39/50
1346/1346 - 142s - loss: 1.3153
Epoch 40/50
1346/1346 - 142s - loss: 1.3100
Epoch 41/50
1346/1346 - 142s - loss: 1.3059
Epoch 42/50
1346/1346 - 142s - loss: 1.3017
Epoch 43/50
1346/1346 - 142s - loss: 1.2975
Epoch 44/50
1346/1346 - 142s - loss: 1.2936
Epoch 45/50
1346/1346 - 142s - loss: 1.2896
Epoch 46/50
1346/1346 - 142s - loss: 1.2860
Epoch 47/50
1346/1346 - 142s - loss: 1.2823
Epoch 48/50
1346/1346 - 142s - loss: 1.2787
Epoch 49/50
1346/1346 - 142s - loss: 1.2759
Epoch 50/50
1346/1346 - 142s - loss: 1.2732
Out[12]:
<keras.callbacks.History at 0x7f9ef0287610>

4. Generate Text Using Trained Model

In this section, we have generated text using our trained network. We first selected a text example at random from our train data and printed its characters; this example serves as the starting point (seed). We then run a loop that generates 100 new characters. In the first iteration, the model takes the selected 100-character sequence and predicts the next character, choosing the character with the highest predicted probability (argmax). This character is appended to the end of the sequence and the first character is removed, so the sequence stays 100 characters long. The modified sequence is used in the next iteration to generate another character, and the process is repeated 100 times. After generating the 100 new characters, we have printed them as well.

We can notice from the results that the text generated by the model looks like English, though it does not make much sense. The network has learned to spell words properly. However, because we always pick the most probable character, generation is fully deterministic and the model quickly falls into a loop, repeating "in the song". Adding some randomness when choosing the next character can help avoid this. In the next sections, we'll train the network for more epochs to see whether that improves the results.
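One common way to add randomness is temperature sampling: instead of taking the argmax, draw the next character from the softmax output after rescaling it with a temperature. This technique is not used in this tutorial; the sketch below is a minimal NumPy version, and the function name `sample_next_index` and the temperature value 0.8 are assumptions.

```python
import numpy as np

def sample_next_index(probs, temperature=1.0, rng=None):
    """Sample a character index from a softmax output instead of taking argmax.

    temperature < 1 sharpens the distribution (closer to greedy argmax);
    temperature > 1 flattens it (more varied text, more mistakes).
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=np.float64)
    logits = np.log(np.clip(probs, 1e-12, 1.0)) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    scaled = np.exp(logits)
    scaled /= scaled.sum()  # renormalize to a valid probability distribution
    return int(rng.choice(len(scaled), p=scaled))

rng = np.random.default_rng(42)
probs = np.array([0.05, 0.6, 0.3, 0.05])  # toy softmax output over 4 characters
draws = [sample_next_index(probs, temperature=0.8, rng=rng) for _ in range(1000)]
# Index 1 is drawn most often, but other indices appear as well.
print(np.bincount(draws, minlength=4))
```

In the generation loop of this section, `predicted_index = preds.argmax(axis=-1)[0]` could be replaced by something like `predicted_index = sample_next_index(preds[0], temperature=0.8)`; lower temperatures stay closer to the current greedy behaviour.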

In [13]:
import random

random.seed(123)
idx = random.randint(0, len(X_train) - 1) ## randint() includes both end points, so subtract 1
pattern = X_train[idx].flatten().tolist()

print("Initial Pattern : {}".format("".join([tokenizer.index_word[idx] for idx in pattern])))

generated_text = []
for i in range(100):
    X_batch = np.array(pattern, dtype=np.int32).reshape(1, seq_length, 1) ## Design Batch
    preds = model.predict(X_batch) ## Make Prediction
    predicted_index = preds.argmax(axis=-1)[0] ## Retrieve token index
    generated_text.append(predicted_index) ## Add token index to result
    pattern.append(predicted_index) ## Add token index to original pattern
    pattern = pattern[1:] ## Resize pattern to bring again to seq_length length.

print("Generated Text : {}".format("".join([tokenizer.index_word[idx] for idx in generated_text])))
Initial Pattern : 1987 – 88 season where he was named the ihl 's co @-@ rookie of the year and most valuable player af
Generated Text : ter the song was a single to the song in the song in the song in the song in the song in the song in

5. Train Model For More Epochs

Here, we have trained the network for another 50 epochs with the learning rate lowered to 0.0003, which we set with K.set_value() on the optimizer's learning_rate. The training loss keeps decreasing, from 1.2390 to 1.1840, which suggests that the network has improved further. Next, we'll try to generate text using this further-trained network.
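The manual learning-rate changes in this tutorial (0.001, then 0.0003, then 0.0001, each for 50 epochs) amount to a step schedule. The function below is a hypothetical sketch of that schedule. It uses the `(epoch, lr)` signature accepted by `tf.keras.callbacks.LearningRateScheduler`, so, assuming a single `model.fit(..., epochs=150)` call, it could be passed as `callbacks=[keras.callbacks.LearningRateScheduler(step_learning_rate)]` instead of calling `K.set_value()` between runs.

```python
def step_learning_rate(epoch, lr=None):
    """Piecewise-constant schedule matching the three manual training rounds.

    Epochs are 0-based: 0-49 use 1e-3, 50-99 use 3e-4, 100 onward use 1e-4.
    The current learning rate `lr` is accepted but ignored.
    """
    if epoch < 50:
        return 1e-3
    if epoch < 100:
        return 3e-4
    return 1e-4

print([step_learning_rate(e) for e in (0, 49, 50, 99, 100, 149)])
# [0.001, 0.001, 0.0003, 0.0003, 0.0001, 0.0001]
```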

In [14]:
K.set_value(model.optimizer.learning_rate, 0.0003)

model.fit(X_train, Y_train, batch_size=1024, epochs=50, verbose=2)
Epoch 1/50
1346/1346 - 143s - loss: 1.2390
Epoch 2/50
1346/1346 - 142s - loss: 1.2361
Epoch 3/50
1346/1346 - 142s - loss: 1.2348
Epoch 4/50
1346/1346 - 142s - loss: 1.2329
Epoch 5/50
1346/1346 - 142s - loss: 1.2316
Epoch 6/50
1346/1346 - 142s - loss: 1.2304
Epoch 7/50
1346/1346 - 142s - loss: 1.2289
Epoch 8/50
1346/1346 - 142s - loss: 1.2274
Epoch 9/50
1346/1346 - 142s - loss: 1.2263
Epoch 10/50
1346/1346 - 142s - loss: 1.2252
Epoch 11/50
1346/1346 - 142s - loss: 1.2239
Epoch 12/50
1346/1346 - 142s - loss: 1.2227
Epoch 13/50
1346/1346 - 142s - loss: 1.2215
Epoch 14/50
1346/1346 - 142s - loss: 1.2202
Epoch 15/50
1346/1346 - 142s - loss: 1.2191
Epoch 16/50
1346/1346 - 142s - loss: 1.2180
Epoch 17/50
1346/1346 - 142s - loss: 1.2167
Epoch 18/50
1346/1346 - 142s - loss: 1.2156
Epoch 19/50
1346/1346 - 142s - loss: 1.2144
Epoch 20/50
1346/1346 - 142s - loss: 1.2135
Epoch 21/50
1346/1346 - 142s - loss: 1.2121
Epoch 22/50
1346/1346 - 142s - loss: 1.2113
Epoch 23/50
1346/1346 - 142s - loss: 1.2099
Epoch 24/50
1346/1346 - 142s - loss: 1.2091
Epoch 25/50
1346/1346 - 142s - loss: 1.2079
Epoch 26/50
1346/1346 - 142s - loss: 1.2068
Epoch 27/50
1346/1346 - 142s - loss: 1.2059
Epoch 28/50
1346/1346 - 142s - loss: 1.2048
Epoch 29/50
1346/1346 - 142s - loss: 1.2036
Epoch 30/50
1346/1346 - 142s - loss: 1.2025
Epoch 31/50
1346/1346 - 142s - loss: 1.2018
Epoch 32/50
1346/1346 - 142s - loss: 1.2008
Epoch 33/50
1346/1346 - 142s - loss: 1.1998
Epoch 34/50
1346/1346 - 142s - loss: 1.1989
Epoch 35/50
1346/1346 - 142s - loss: 1.1977
Epoch 36/50
1346/1346 - 142s - loss: 1.1967
Epoch 37/50
1346/1346 - 142s - loss: 1.1959
Epoch 38/50
1346/1346 - 142s - loss: 1.1948
Epoch 39/50
1346/1346 - 142s - loss: 1.1938
Epoch 40/50
1346/1346 - 142s - loss: 1.1930
Epoch 41/50
1346/1346 - 142s - loss: 1.1918
Epoch 42/50
1346/1346 - 142s - loss: 1.1911
Epoch 43/50
1346/1346 - 142s - loss: 1.1900
Epoch 44/50
1346/1346 - 142s - loss: 1.1892
Epoch 45/50
1346/1346 - 142s - loss: 1.1883
Epoch 46/50
1346/1346 - 142s - loss: 1.1874
Epoch 47/50
1346/1346 - 142s - loss: 1.1863
Epoch 48/50
1346/1346 - 142s - loss: 1.1856
Epoch 49/50
1346/1346 - 142s - loss: 1.1845
Epoch 50/50
1346/1346 - 142s - loss: 1.1840
Out[14]:
<keras.callbacks.History at 0x7f9ed9590d50>

6. Generate Text Using Trained Model

Here, we have again generated 100 characters using our trained network, starting from the same text example as before. The network has generated slightly different text this time, and it has also generated a punctuation mark (a full stop). It is still repeating a few words, though ("the state of"). We'll train the network more to see whether that helps. Language models generally need many epochs of training before they give good results.

In [15]:
import random

random.seed(123)
idx = random.randint(0, len(X_train) - 1) ## randint() includes both end points, so subtract 1
pattern = X_train[idx].flatten().tolist()

print("Initial Pattern : {}".format("".join([tokenizer.index_word[idx] for idx in pattern])))

generated_text = []
for i in range(100):
    X_batch = np.array(pattern, dtype=np.int32).reshape(1, seq_length, 1) ## Design Batch
    preds = model.predict(X_batch) ## Make Prediction
    predicted_index = preds.argmax(axis=-1)[0] ## Retrieve token index
    generated_text.append(predicted_index) ## Add token index to result
    pattern.append(predicted_index) ## Add token index to original pattern
    pattern = pattern[1:] ## Resize pattern to bring again to seq_length length.

print("Generated Text : {}".format("".join([tokenizer.index_word[idx] for idx in generated_text])))
Initial Pattern : 1987 – 88 season where he was named the ihl 's co @-@ rookie of the year and most valuable player af
Generated Text : ter the second sound of the state of the state of the state of the state of the state of the storm .

7. Train Model Even More

In this section, we have trained the network for another 50 epochs with the learning rate set to 0.0001. The loss printed after each epoch decreases further, from 1.1716 to 1.1550, though more slowly than before. We'll test the model by generating text.

In [16]:
K.set_value(model.optimizer.learning_rate, 0.0001)

model.fit(X_train, Y_train, batch_size=1024, epochs=50, verbose=2)
Epoch 1/50
1346/1346 - 142s - loss: 1.1716
Epoch 2/50
1346/1346 - 142s - loss: 1.1709
Epoch 3/50
1346/1346 - 142s - loss: 1.1704
Epoch 4/50
1346/1346 - 142s - loss: 1.1701
Epoch 5/50
1346/1346 - 142s - loss: 1.1696
Epoch 6/50
1346/1346 - 142s - loss: 1.1693
Epoch 7/50
1346/1346 - 142s - loss: 1.1690
Epoch 8/50
1346/1346 - 142s - loss: 1.1686
Epoch 9/50
1346/1346 - 142s - loss: 1.1682
Epoch 10/50
1346/1346 - 142s - loss: 1.1679
Epoch 11/50
1346/1346 - 142s - loss: 1.1676
Epoch 12/50
1346/1346 - 142s - loss: 1.1671
Epoch 13/50
1346/1346 - 142s - loss: 1.1669
Epoch 14/50
1346/1346 - 142s - loss: 1.1665
Epoch 15/50
1346/1346 - 142s - loss: 1.1661
Epoch 16/50
1346/1346 - 142s - loss: 1.1658
Epoch 17/50
1346/1346 - 142s - loss: 1.1656
Epoch 18/50
1346/1346 - 142s - loss: 1.1651
Epoch 19/50
1346/1346 - 142s - loss: 1.1649
Epoch 20/50
1346/1346 - 142s - loss: 1.1645
Epoch 21/50
1346/1346 - 142s - loss: 1.1642
Epoch 22/50
1346/1346 - 142s - loss: 1.1639
Epoch 23/50
1346/1346 - 142s - loss: 1.1635
Epoch 24/50
1346/1346 - 142s - loss: 1.1632
Epoch 25/50
1346/1346 - 142s - loss: 1.1628
Epoch 26/50
1346/1346 - 142s - loss: 1.1625
Epoch 27/50
1346/1346 - 142s - loss: 1.1622
Epoch 28/50
1346/1346 - 142s - loss: 1.1620
Epoch 29/50
1346/1346 - 142s - loss: 1.1616
Epoch 30/50
1346/1346 - 142s - loss: 1.1613
Epoch 31/50
1346/1346 - 142s - loss: 1.1611
Epoch 32/50
1346/1346 - 142s - loss: 1.1607
Epoch 33/50
1346/1346 - 142s - loss: 1.1603
Epoch 34/50
1346/1346 - 142s - loss: 1.1600
Epoch 35/50
1346/1346 - 142s - loss: 1.1597
Epoch 36/50
1346/1346 - 142s - loss: 1.1594
Epoch 37/50
1346/1346 - 142s - loss: 1.1591
Epoch 38/50
1346/1346 - 142s - loss: 1.1588
Epoch 39/50
1346/1346 - 142s - loss: 1.1584
Epoch 40/50
1346/1346 - 142s - loss: 1.1582
Epoch 41/50
1346/1346 - 142s - loss: 1.1578
Epoch 42/50
1346/1346 - 142s - loss: 1.1575
Epoch 43/50
1346/1346 - 143s - loss: 1.1572
Epoch 44/50
1346/1346 - 142s - loss: 1.1569
Epoch 45/50
1346/1346 - 142s - loss: 1.1565
Epoch 46/50
1346/1346 - 142s - loss: 1.1562
Epoch 47/50
1346/1346 - 142s - loss: 1.1559
Epoch 48/50
1346/1346 - 142s - loss: 1.1556
Epoch 49/50
1346/1346 - 142s - loss: 1.1553
Epoch 50/50
1346/1346 - 142s - loss: 1.1550
Out[16]:
<keras.callbacks.History at 0x7f9ed9883890>

8. Generate Text

In this section, we have again generated 100 new characters using our model, starting from the same text example as earlier. The generated text is English-like and correctly spelled, but it quickly falls into a loop that repeats "the state of". Generation is still deterministic because we always choose the most probable character, and the results can be improved by trying different approaches. In the next section, we have suggested a few things which can help get good results for text generation tasks.

In [17]:
import random

random.seed(123)
idx = random.randint(0, len(X_train) - 1) ## randint() includes both end points, so subtract 1
pattern = X_train[idx].flatten().tolist()

print("Initial Pattern : {}".format("".join([tokenizer.index_word[idx] for idx in pattern])))

generated_text = []
for i in range(100):
    X_batch = np.array(pattern, dtype=np.int32).reshape(1, seq_length, 1) ## Design Batch
    preds = model.predict(X_batch) ## Make Prediction
    predicted_index = preds.argmax(axis=-1)[0] ## Retrieve token index
    generated_text.append(predicted_index) ## Add token index to result
    pattern.append(predicted_index) ## Add token index to original pattern
    pattern = pattern[1:] ## Resize pattern to bring again to seq_length length.

print("Generated Text : {}".format("".join([tokenizer.index_word[idx] for idx in generated_text])))
Initial Pattern : 1987 – 88 season where he was named the ihl 's co @-@ rookie of the year and most valuable player af
Generated Text : ter the state of the state of the state of the state of the state of the state of the state of the s

9. Further Recommendations

  1. Train the network for more epochs.
  2. Try different sequence lengths. We used 100-character sequences in our case.
  3. Try other ways of encoding characters, like character embeddings.
  4. Try an n-gram/word-based model instead of the character-based model.
  5. Try different output sizes for LSTM layers.
  6. Try adding more LSTM layers. (Please make a NOTE that adding more LSTM layers increases training time, as recurrent layers are slow to train.)
  7. Add more dense layers after LSTM layers.
  8. Try learning rate schedulers.
  9. Try other RNN layers like GRU, vanilla RNN, etc.
  10. Add some randomness to the prediction of the next character, e.g. sample from the predicted probabilities instead of always taking the most probable character.
Sunny Solanki
