Natural Language Understanding
You can go a long way capturing user inputs using regular expressions, but they obviously have their limitations. As you find yourself adding more and more functionality to your bot, you reach a point where you need Natural Language Understanding (NLU) capabilities.
NLU lets you capture user inputs by "intent" instead of parsing their raw text. An intent represents all the different ways users can express a unit of meaning that is valid for your bot. You can then map that intent to an action using a route.
Getting started with NLU
Botonic has its own NLU engine, which covers intent recognition tasks.
1. Install the Botonic NLU Plugin
If you are using the nlu example, you should already have Botonic NLU set up. If not, you can use the following command from within your bot's project to install it:
npm install @botonic/plugin-nlu
Note: Windows users should first use the command:
npm install --global --production windows-build-tools --vs2015
Followed by: npm install @botonic/plugin-nlu
2. Add Utterances and Intents
A user can express an intent in different ways. For example, the utterances "Hello", "Hi", and "Good morning" are all examples of a Greetings intent.
To create an intent, simply add a new text file under src/nlu/utterances/en/, for example src/nlu/utterances/en/Greetings.txt.
Within the Greetings.txt file, you can add possible ways a user may greet the bot, as shown below. Then you can reference the 'Greetings' intent in your routes.
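For illustration, Greetings.txt might contain utterances like these, one per line:
Hello!
hi
good morning
hey, how are you?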
3. Add Routes for Intents
You can add routes that capture different intents and their corresponding actions. For example, in your routes.js file:
import Start from './actions/start'
import NotFound from './actions/not-found'
export const routes = [
  { input: i => i.confidence < 0.7, action: NotFound },
  { intent: 'Greetings', action: Start },
]
{ input: i => i.confidence < 0.7, action: NotFound }
i.confidence references the confidence value of the input. The confidence value is between 0 and 1 and indicates the likelihood that an input has a certain intent. This route is used if the input doesn't match any intent with enough confidence.
{ intent: 'Greetings', action: Start }
This route will trigger the action Start when the user inputs a greeting.
Note: Routes are checked in order. To function correctly, you'll want to put the more specific ones first and the more generic ones at the end.
Next, you must create a couple of actions that respond to these intents in src/actions.
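For example, a minimal src/actions/start.js might look like this (a sketch using the Text component from @botonic/react; the reply copy is just an example):
import React from 'react'
import { Text } from '@botonic/react'

// Action triggered when the 'Greetings' intent is matched
export default class Start extends React.Component {
  render() {
    return <Text>Hello! How can I help you?</Text>
  }
}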
4. Run botonic train
Once you've added utterances to your intents, run botonic train in your command line. This will train your bot with the utterances in your directory.
5. Run botonic serve
You can run botonic serve to test your bot.
Experiment with adding more routes/actions for different intents.
When you're ready, you can deploy your bot to production and publish it to messaging platforms like Facebook Messenger.
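If you work with the Botonic CLI, deploying is typically done with:
botonic deploy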
Advanced use of Botonic NLU
Botonic NLU is a natural language understanding engine written in TypeScript for Node.js, based on TensorFlow.js (tfjs), that allows you to train an intent classification model with your own dataset of utterances.
Define your dataset of utterances
You can load your dataset with Botonic NLU from two different sources:
- A folder containing one file per intent, named IntentName.txt. Important: each file has to contain one sentence per line. E.g:
Greetings.txt
Hello!
hi
good morning
Farewell.txt
Bye
Goodbye
see you soon!
- A CSV file containing each sentence under a column named features and each intent under a column named label. Important: the separator can be specified when reading the data. E.g:
data.csv
features, label
Hello!, Greetings
hi, Greetings
good morning, Greetings
Bye, Farewell
Goodbye, Farewell
see you soon!, Farewell
Train your model
Once you have defined your dataset, you can import BotonicNLU to load the dataset into memory, train your model, and save it.
train-model.ts
import { BotonicNLU, ModelTemplatesType } from '@botonic/nlu'

const nlu = new BotonicNLU()

const dataPath = 'path/to/your/dataset-directory/'
// or alternatively
// const dataPath = 'path/to/your/dataset/file.csv'

const data = nlu.readData(
  {
    path: dataPath,
    language: 'en',
    maxSeqLen: 20,
  },
  { csvSeparator: ',' }
)

const [xTrain, xTest, yTrain, yTest] = nlu.trainTestSplit({
  data: data,
  testPercentage: 0.2,
})

;(async () => {
  await nlu.createModel({
    template: ModelTemplatesType.SIMPLE_NN,
    learningRate: 0.01,
  })
  await nlu.train(xTrain, yTrain, { epochs: 10 })
  const accuracy = await nlu.evaluate(xTest, yTest)
  console.log('Accuracy:', accuracy)
  await nlu.saveModel('path/to/models-directory/')
  console.log('Model saved.')
  nlu.predict('good afternoon') // --> will return the predicted intent
})()
Train your custom model
If you have some deep learning knowledge, you can also implement your own neural network model with the tfjs API.
train-model.ts
import { BotonicNLU } from '@botonic/nlu'
import { tokenizer } from './preprocessing-tools/tokenizer'
import {
  sequential,
  Sequential,
  LayersModel,
  train,
  layers,
} from '@tensorflow/tfjs-node'

function createCustomModel(maxSeqLen: number): Sequential | LayersModel {
  const model = sequential()
  // The input shape must match maxSeqLen
  model.add(layers.inputLayer({ inputShape: [maxSeqLen] }))
  model.add(layers.dense({ units: 128, activation: 'relu' }))
  // The number of output units must match the number of intents
  model.add(layers.dense({ units: 3, activation: 'softmax' }))
  model.compile({
    optimizer: train.adam(5e-3),
    loss: 'sparseCategoricalCrossentropy',
    metrics: ['accuracy'],
  })
  model.summary()
  return model
}

const nlu = new BotonicNLU({ tokenizer: tokenizer })

const dataPath = 'path/to/your/dataset-directory/'
const data = nlu.readData(
  {
    path: dataPath,
    language: 'en',
    maxSeqLen: 20,
  },
  { csvSeparator: ',' }
)

const [xTrain, xTest, yTrain, yTest] = nlu.trainTestSplit({
  data: data,
  testPercentage: 0.1,
  stratify: true,
})

nlu.model = createCustomModel(20)

;(async () => {
  await nlu.train(xTrain, yTrain, { epochs: 8 })
  await nlu.saveModel()
  console.log('Model saved.')
})()
Botonic NLU API
Initialization
constructor({ normalizer, tokenizer, stemmer }: {
  normalizer?: Normalizer;
  tokenizer?: Tokenizer;
  stemmer?: Stemmer;
} = {})
An instance of Botonic NLU can be initialized with the default preprocessing engines by passing no arguments (or an empty object).
E.g:
const nlu = new BotonicNLU()
Alternatively, BotonicNLU can be initialized with your own preprocessing engines. Each of these must be a class or object implementing the corresponding methods:
- Normalizer: transforms text into a single canonical form that guarantees consistency before applying further preprocessing.
interface Normalizer {
  normalize(sentence: string): string
}
- Tokenizer: splits the text into smaller units called tokens.
interface Tokenizer {
  tokenize(sentence: string): string[]
}
- Stemmer: reduces a word to its stem or root form.
interface Stemmer {
  stem(token: string, language: Language): string
}
E.g:
class CustomTokenizer {
  tokenize(sentence: string): string[] {
    return sentence.split(' ')
  }
}
const nlu = new BotonicNLU({ tokenizer: new CustomTokenizer() })
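Likewise, a custom normalizer only needs to implement the Normalizer interface (a minimal sketch; the normalization steps shown are illustrative):
class CustomNormalizer {
  normalize(sentence: string): string {
    // Trim surrounding whitespace and lowercase the sentence
    return sentence.trim().toLowerCase()
  }
}
const nlu = new BotonicNLU({ normalizer: new CustomNormalizer() })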
Dealing with data
Botonic NLU works internally with the following structure for data:
type DataSet = {
  label: string
  feature: string
}[]
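For instance, the Greetings/Farewell examples from the CSV above map onto this structure as follows:
const data: DataSet = [
  { label: 'Greetings', feature: 'Hello!' },
  { label: 'Greetings', feature: 'hi' },
  { label: 'Farewell', feature: 'Bye' },
]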
BotonicNLU.readData
It reads your data and converts it into the default structure.
readData(
  options: {
    path: string;
    language: Language;
    maxSeqLen: number;
  },
  readerConfig?: DataSetReaderConfig
): DataSet
Parameters:
- options:
  - path: path to your dataset directory (or file).
  - language: main language of the data.
  - maxSeqLen: number specifying the maximum length of each sequence of tokens.
- readerConfig:
  - csvSeparator: column separator for CSV files (';' by default).
Returns:
DataSet
E.g:
const data = nlu.readData({
  path: 'path/to/your/dataset-directory/',
  language: 'en',
  maxSeqLen: 20,
})
BotonicNLU.trainTestSplit
It splits the loaded dataset into two sets: one for training the model and the other for testing it.
trainTestSplit(options: {
  data: DataSet;
  testPercentage: number;
  stratify?: boolean;
}): [InputSet, InputSet, OutputSet, OutputSet]
Parameters:
- options:
  - data: a variable holding the dataset structure.
  - testPercentage: a number between 0 and 1 to split the data.
  - stratify: whether to maintain the data distribution of the different classes.
Returns:
[InputSet, InputSet, OutputSet, OutputSet]
E.g:
const [xTrain, xTest, yTrain, yTest] = nlu.trainTestSplit({
  data: data,
  testPercentage: 0.2,
})
Handling your model
Botonic NLU generates NLU models using neural networks (NNs). You can use predefined NN templates or create your own networks based on tfjs. Once a model has been trained, it is stored in a directory so that new predictions can be run in future sessions. The directory holds the following files:
- model.json: topology of the neural network.
- weights.bin: weights of the trained model.
- model-data.json: relevant information regarding the training process.
model-data.json holds the following information:
export interface ModelData {
  language: Language
  intents: IntentDecoder
  maxSeqLen: number
  vocabulary: Vocabulary
}
// ... where Vocabulary and IntentDecoder are:
export declare type Vocabulary = {
  [word: string]: number
}
export declare type IntentDecoder = {
  [id: number]: string
}
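For illustration, a hypothetical model-data.json for the Greetings/Farewell dataset above could look like this (the exact ids and vocabulary depend on your training run):
{
  "language": "en",
  "intents": { "0": "Farewell", "1": "Greetings" },
  "maxSeqLen": 20,
  "vocabulary": { "hello": 0, "hi": 1, "good": 2, "morning": 3, "bye": 4, "goodbye": 5 }
}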
BotonicNLU.createModel
It creates a model from the chosen Botonic NLU Model template.
The available Botonic NLU Model templates are:
- simple-nn: a simple Neural Network that uses Word Embeddings to create an embedding layer followed by an LSTM layer.
createModel(options: {
  template: ModelTemplatesType
  learningRate: number
  wordEmbeddingsType?: WordEmbeddingType
  wordEmbeddingsDimension?: WordEmbeddingDimension
  trainableEmbeddings?: boolean
}): Promise<void>
Parameters:
- options:
  - template: a constant from ModelTemplatesType to load a predefined neural network template.
  - learningRate: the amount that the weights are updated during training. Typical values range from 0.0001 up to 1.
  - wordEmbeddingsType: training with glove or fasttext pretrained embeddings.
  - wordEmbeddingsDimension: dimension of the word embeddings (defaults are 50 for glove and 300 for fasttext).
  - trainableEmbeddings: whether the values of the word-embeddings matrix are updated during training (default is false, i.e. the pretrained embeddings stay frozen). If you have a large dataset, we suggest you set this to false.
Returns:
Promise<void>
E.g:
import { ModelTemplatesType } from '@botonic/nlu'
await nlu.createModel({
  template: ModelTemplatesType.SIMPLE_NN,
  learningRate: 0.01,
  wordEmbeddingsType: 'glove',
  wordEmbeddingsDimension: 50,
  trainableEmbeddings: true,
})
BotonicNLU.model
Set your own tfjs model to train.
set model(model: Sequential | LayersModel)
Parameters:
- model: a model created with tf.sequential() or tf.model().
Returns:
void
E.g:
import * as tf from '@tensorflow/tfjs-node'
const nlu = new BotonicNLU()
const myCustomModel = tf.sequential()
myCustomModel.add(tf.layers.dense({ units: 32, inputShape: [50] }))
myCustomModel.add(tf.layers.dense({ units: 4 }))
nlu.model = myCustomModel
E.g:
import * as tf from '@tensorflow/tfjs-node'
const nlu = new BotonicNLU()
const input = tf.input({ shape: [5] })
const denseLayer1 = tf.layers.dense({ units: 10, activation: 'relu' })
const denseLayer2 = tf.layers.dense({ units: 4, activation: 'softmax' })
const output = denseLayer2.apply(denseLayer1.apply(input))
const myCustomModel = tf.model({ inputs: input, outputs: output })
nlu.model = myCustomModel
BotonicNLU.loadModel
It allows you to load a previously trained model.
loadModel(modelPath: string, modelDataPath: string): Promise<void>
Parameters:
- modelPath: path to model.json.
- modelDataPath: path to model-data.json.
Returns:
Promise<void>
E.g:
await nlu.loadModel('path/to/model.json', 'path/to/model-data.json')
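This lets you run predictions in a later session without retraining. For example, assuming a model previously stored with saveModel:
const nlu = new BotonicNLU()
await nlu.loadModel('path/to/model.json', 'path/to/model-data.json')
const prediction = nlu.predict('good morning')
console.log(prediction) // --> e.g. 'Greetings'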
BotonicNLU.saveModel
It saves the generated model in the specified path.
saveModel(path?: string): Promise<void>
Parameters:
- path: path to the directory where the model will be stored. By default, it is stored under /nlu/models/ within the directory from which you run the script.
Returns:
Promise<void>
E.g:
await nlu.saveModel('path/to/new-model-directory/')
// or alternatively:
await nlu.saveModel()
Methods to train and evaluate your model
Training and evaluating the model.
BotonicNLU.train
Train a model.
train(
  x: InputSet,
  y: OutputSet,
  options?: {
    epochs?: number
    batchSize?: number
    validationSplit?: number
  }
): Promise<void>
Parameters:
- x: set of samples to train on.
- y: set of labels to predict.
- options:
  - epochs: number of times the learning algorithm will work through the entire training dataset.
  - batchSize: number of samples that will be propagated through the network at once.
  - validationSplit: fraction of the training data held out to validate the model during training.
Returns:
Promise<void>
E.g:
const [xTrain, , yTrain] = nlu.trainTestSplit({
  data: data,
  testPercentage: 0.2,
})
await nlu.train(xTrain, yTrain, {
  epochs: 25,
  batchSize: 16,
  validationSplit: 0.1,
})
BotonicNLU.evaluate
It evaluates the accuracy of the model over the test set.
evaluate(x: InputSet, y: OutputSet): Promise<number>
Parameters:
- x: set of samples to evaluate.
- y: set of labels to evaluate.
Returns:
Promise<number>: resolves to a value between 0 and 1 with the accuracy.
E.g:
const [, xTest, , yTest] = nlu.trainTestSplit({
  data: data,
  testPercentage: 0.2,
})
// Once the model has been trained with nlu.train
const accuracy = await nlu.evaluate(xTest, yTest)
console.log('Accuracy:', accuracy)
Predicting your sentences
Predict results for your model.
BotonicNLU.predict
Predicts the intent related to a given sentence.
predict(sentence: string): string
Parameters:
- sentence: sentence to predict.
Returns:
string: name of the predicted intent.
E.g:
Input:
// Once the model has been trained with nlu.train
const prediction = nlu.predict('good afternoon!')
console.log(prediction)
Output:
Greetings
BotonicNLU.predictProbabilities
It returns a detailed result for each intent and its corresponding confidence.
predictProbabilities(sentence: string): DecodedPrediction[]
Parameters:
- sentence: sentence to predict.
Returns:
DecodedPrediction[]: array of objects containing the intentId and the confidence.
E.g:
Input:
// Once the model has been trained with nlu.train
const predictions = nlu.predictProbabilities('good afternoon!')
console.log(predictions)
Output:
[
  { intent: 'Farewell', confidence: 0.0120968222600000 },
  { intent: 'Greetings', confidence: 0.9879031777381897 }
]