Starspace

This agent contains a simple implementation of the StarSpace algorithm, slightly adapted for dialogue. To learn more, see the StarSpace paper (Wu et al., 2017, "StarSpace: Embed All The Things!").
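At its core, StarSpace embeds a query and candidate responses in the same space and trains so that the correct response scores higher than sampled negatives by a margin. A minimal pure-Python sketch of that objective, assuming bag-of-words sentence embeddings (the function names and toy vectors below are illustrative, not ParlAI's API):

```python
# Sketch of StarSpace's margin ranking objective. Sentences are embedded
# as the sum of their token embeddings, and training pushes
# sim(query, positive) above sim(query, negative) by at least a margin.
# Names here are illustrative, not ParlAI's API.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def embed(tokens, emb_table):
    """Bag-of-words sentence embedding: sum of the token vectors."""
    dim = len(next(iter(emb_table.values())))
    out = [0.0] * dim
    for t in tokens:
        vec = emb_table.get(t, [0.0] * dim)
        for i, x in enumerate(vec):
            out[i] += x
    return out

def margin_ranking_loss(query, positive, negatives, emb_table, margin=0.1):
    """Sum of hinge losses over the negative samples (cf. -k, -margin)."""
    q = embed(query, emb_table)
    pos_sim = dot(q, embed(positive, emb_table))
    loss = 0.0
    for neg in negatives:
        neg_sim = dot(q, embed(neg, emb_table))
        loss += max(0.0, margin - pos_sim + neg_sim)
    return loss
```

When the positive already beats every negative by the margin, the hinge terms are zero and the example contributes no gradient, which is what makes the number and quality of negatives (the `-k` option below) matter.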

Basic Examples

Train a StarSpace model on the "sentence SQuAD" task:

```shell
parlai train_model --task squad:sentence --model starspace -lr 0.01 -esz 512 -k 10 -mf /tmp/starspacesquad
```

DictionaryAgent Options

BPEHelper Arguments

| Argument | Description |
| --- | --- |
| `--bpe-vocab` | Path to pre-trained tokenizer vocab |
| `--bpe-merge` | Path to pre-trained tokenizer merge |
| `--bpe-dropout` | Use BPE dropout during training. |

StarspaceAgent Options

StarSpace Arguments

| Argument | Description |
| --- | --- |
| `-emb, --embedding-type` | Choose between different strategies for initializing word embeddings. The default is random, but embeddings can also be preinitialized from GloVe or fastText. Preinitialized embeddings can also be fixed so they are not updated during training.<br>Choices: `random`, `glove`, `glove-fixed`, `fasttext`, `fasttext-fixed`, `fasttext_cc`, `fasttext_cc-fixed`.<br>Default: `random`. |
| `-esz, --embeddingsize` | Size of the token embeddings.<br>Default: `128`. |
| `-enorm, --embeddingnorm` | Max norm of the word embeddings.<br>Default: `10`. |
| `-shareEmb, --share-embeddings` | Whether the LHS and RHS share embeddings.<br>Default: `True`. |
| `--lins` | If set to 1, add a linear layer between the LHS and RHS representations.<br>Default: `0`. |
| `-lr, --learningrate` | Learning rate.<br>Default: `0.1`. |
| `-margin, --margin` | Margin for the ranking loss.<br>Default: `0.1`. |
| `--input-dropout` | Fraction of input/output features to drop out during training.<br>Default: `0`. |
| `-opt, --optimizer` | Choose between PyTorch optimizers. Any member of `torch.optim` is valid and will be used with default parameters except the learning rate (as specified by `-lr`).<br>Choices: `adadelta`, `adagrad`, `adam`, `adamax`, `asgd`, `lbfgs`, `rmsprop`, `rprop`, `sgd`.<br>Default: `sgd`. |
| `-tr, --truncate` | Truncate input and output lengths to speed up training (may reduce accuracy). This fixes a maximum length for all inputs and outputs.<br>Default: `-1`. |
| `-k, --neg-samples` | Number k of negative samples per example.<br>Default: `10`. |
| `--parrot-neg` | Include the query itself as a negative.<br>Default: `0`. |
| `--tfidf` | Use frequency-based normalization for embeddings.<br>Default: `False`. |
| `-cs, --cache-size` | Size of the negative-sample cache to draw from.<br>Default: `1000`. |
| `-hist, --history-length` | Number of past tokens to remember.<br>Default: `10000`. |
| `-histr, --history-replies` | Keep replies in the history, or not.<br>Choices: `none`, `model`, `label`, `label_else_model`.<br>Default: `label_else_model`. |
| `-fixedCands, --fixed-candidates-file` | File of candidates to use for prediction. |
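The negative-sampling options above work together: labels seen during training fill a bounded cache (`-cs/--cache-size`), each example draws `-k/--neg-samples` negatives from that cache, and `--parrot-neg` additionally adds the query itself as a negative. A hedged pure-Python sketch of that bookkeeping (the class and method names are illustrative, not ParlAI's internals):

```python
import random

# Illustrative sketch of negative-sample selection for StarSpace-style
# training: labels fill a bounded FIFO cache (cf. -cs/--cache-size);
# each example draws k negatives from the cache (cf. -k/--neg-samples),
# optionally adding the query itself (cf. --parrot-neg).
# Names are illustrative, not ParlAI's internals.

class NegativeSampleCache:
    def __init__(self, cache_size=1000, neg_samples=10, parrot_neg=False):
        self.cache_size = cache_size
        self.k = neg_samples
        self.parrot_neg = parrot_neg
        self.cache = []

    def add(self, label):
        """Remember a seen label, evicting the oldest beyond cache_size."""
        self.cache.append(label)
        if len(self.cache) > self.cache_size:
            self.cache.pop(0)

    def draw(self, query, positive):
        """Draw up to k cached negatives distinct from the positive."""
        candidates = [c for c in self.cache if c != positive]
        negs = random.sample(candidates, min(self.k, len(candidates)))
        if self.parrot_neg:
            negs.append(query)
        return negs
```

A larger cache gives more diverse negatives at the cost of memory, while `--parrot-neg` is a cheap way to discourage the model from ranking a mere echo of the query highly.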

BPEHelper Arguments

| Argument | Description |
| --- | --- |
| `--bpe-vocab` | Path to pre-trained tokenizer vocab |
| `--bpe-merge` | Path to pre-trained tokenizer merge |
| `--bpe-dropout` | Use BPE dropout during training. |