# Starspace

This agent is a simple implementation of the StarSpace algorithm, slightly adapted for dialogue. To learn more about the algorithm, see the paper "StarSpace: Embed All The Things!" (Wu et al., 2017).
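At its core, StarSpace embeds inputs (LHS) and candidate labels (RHS) in the same space and trains the true label to outscore k sampled negatives by a margin. The sketch below illustrates that objective in pure Python with a bag-of-words embedding; the names and toy vectors are made up for illustration and are not this agent's actual code.

```python
# Illustrative sketch of the StarSpace margin ranking objective.
# Sentences are embedded as the sum of their word vectors (bag of words),
# and the true label must outscore each sampled negative by a margin.

def embed(tokens, emb):
    """Bag-of-words embedding: sum the vectors of the tokens."""
    dim = len(next(iter(emb.values())))
    out = [0.0] * dim
    for t in tokens:
        for i, v in enumerate(emb.get(t, [0.0] * dim)):
            out[i] += v
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ranking_loss(lhs, pos, negs, emb, margin=0.1):
    """Mean margin ranking loss over k negative candidates."""
    s_pos = dot(embed(lhs, emb), embed(pos, emb))
    losses = [max(0.0, margin - s_pos + dot(embed(lhs, emb), embed(n, emb)))
              for n in negs]
    return sum(losses) / len(losses)

# Toy 2-d embeddings (made up for illustration).
emb = {"hi": [1.0, 0.0], "hello": [0.9, 0.1], "bye": [-1.0, 0.0]}
loss = ranking_loss(["hi"], ["hello"], [["bye"]], emb, margin=0.1)  # 0.0: "hello" already outscores "bye"
```

When the true label already beats every negative by more than the margin, the loss is zero and no update is needed; training pushes paired embeddings together and negatives apart.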

## Basic Examples

Train a StarSpace model on the "sentence SQuAD" task:

```shell
parlai train_model --task squad:sentence --model starspace -lr 0.01 -esz 512 -k 10 -mf /tmp/starspacesquad
```

## DictionaryAgent Options

### BPEHelper Arguments

| Argument | Description |
| --- | --- |
| `--bpe-vocab` | Path to pre-trained tokenizer vocab |
| `--bpe-merge` | Path to pre-trained tokenizer merge |
| `--bpe-dropout` | Use BPE dropout during training. |

## StarspaceAgent Options

### StarSpace Arguments

| Argument | Description |
| --- | --- |
| `--embedding-type, --emb` | Strategy for initializing word embeddings. The default is random, but embeddings can also be preinitialized from GloVe or fastText; preinitialized embeddings can additionally be fixed so they are not updated during training. Choices: `random`, `glove`, `glove-fixed`, `fasttext`, `fasttext-fixed`, `fasttext_cc`, `fasttext_cc-fixed`. Default: `random`. |
| `--embeddingsize, --esz` | Size of the token embeddings. Default: `128`. |
| `--embeddingnorm, --enorm` | Max norm of the word embeddings. Default: `10`. |
| `--share-embeddings, --shareEmb` | Whether the LHS and RHS share embeddings. Default: `True`. |
| `--lins` | If set to 1, add a linear layer between the LHS and RHS. Default: `0`. |
| `--learningrate, --lr` | Learning rate. Default: `0.1`. |
| `--margin` | Margin for the ranking loss. Default: `0.1`. |
| `--input-dropout` | Fraction of input/output features to drop out during training. Default: `0`. |
| `--optimizer, --opt` | Choice of PyTorch optimizer. Any member of `torch.optim` is valid and will be used with default parameters, except the learning rate (as specified by `-lr`). Choices: `adadelta`, `adagrad`, `adam`, `adamax`, `asgd`, `lbfgs`, `rmsprop`, `rprop`, `sgd`. Default: `sgd`. |
| `--truncate, --tr` | Truncate input and output lengths to speed up training (may reduce accuracy). This fixes a maximum length for all inputs and outputs. Default: `-1`. |
| `--neg-samples, --k` | Number k of negative samples per example. Default: `10`. |
| `--parrot-neg` | Include the query itself as a negative candidate. Default: `0`. |
| `--tfidf` | Use frequency-based normalization for embeddings. Default: `False`. |
| `--cache-size, --cs` | Size of the negative-sample cache to draw from. Default: `1000`. |
| `--history-length, --hist` | Number of past tokens to remember. Default: `10000`. |
| `--history-replies, --histr` | Keep replies in the history, or not. Choices: `none`, `model`, `label`, `label_else_model`. Default: `label_else_model`. |
| `--fixed-candidates-file, --fixedCands` | File of candidates to use for prediction. |
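As a sketch of what GloVe-style preinitialization (`--embedding-type glove`) consumes, the snippet below parses word vectors in the common GloVe text format, one word followed by its float components per line. This only illustrates the file format; it is not the agent's actual loading code, and the sample vectors are invented.

```python
# Parse word vectors stored in the GloVe text format: each line is
# "word v1 v2 ... vd". Illustration only, not the agent's loading code.

def parse_glove_lines(lines):
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        word, values = parts[0], [float(x) for x in parts[1:]]
        vectors[word] = values
    return vectors

# Invented sample lines in GloVe format.
sample = [
    "hello 0.1 -0.2 0.3",
    "world 0.0 0.5 -0.5",
]
vecs = parse_glove_lines(sample)
```

A `-fixed` variant of the option would then keep these vectors frozen during training instead of updating them with gradients.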
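To make the `--truncate` behavior concrete, here is a minimal sketch. It assumes (for illustration) that truncation keeps the most recent tokens and that `-1` disables it; the agent's exact truncation side is not specified here.

```python
def truncate(tokens, max_len):
    # -1 disables truncation; otherwise keep the most recent tokens.
    # (Keeping the tail is an assumption for illustration; the agent
    # may truncate differently.)
    if max_len < 0:
        return tokens
    return tokens[-max_len:]

short = truncate(list(range(6)), 3)   # keeps [3, 4, 5]
full = truncate(list(range(6)), -1)   # unchanged
```

Capping lengths like this bounds the per-example cost of embedding and scoring, which is why it speeds up training at a possible cost in accuracy.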
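One common way to implement a negative-sample cache in the spirit of `--cache-size` and `--neg-samples` is a bounded buffer of recently seen labels from which k negatives are drawn. This is an illustration of the idea, not necessarily this agent's exact scheme; the class name is hypothetical.

```python
import random
from collections import deque

class NegativeCache:
    """Bounded cache of recently seen labels; draw k negatives from it.
    Hypothetical helper for illustration, not ParlAI code."""

    def __init__(self, cache_size=1000):
        self.cache = deque(maxlen=cache_size)

    def add(self, label):
        self.cache.append(label)

    def sample(self, k):
        k = min(k, len(self.cache))
        return random.sample(list(self.cache), k)

cache = NegativeCache(cache_size=3)
for lab in ["a", "b", "c", "d"]:
    cache.add(lab)   # "a" is evicted once the cache is full
negs = cache.sample(2)
```

A cache like this lets each example reuse labels from recent batches as negatives, avoiding a scan over the full candidate set on every update.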
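The `--history-replies` choices decide which reply, if any, is appended to the dialogue history each turn. A minimal sketch of that policy, using a hypothetical helper rather than the agent's actual code:

```python
# Sketch of the --history-replies policy: pick which reply to append
# to the dialogue history each turn. Hypothetical helper, not ParlAI code.

def reply_for_history(setting, label, model_reply):
    if setting == "none":
        return None
    if setting == "model":
        return model_reply
    if setting == "label":
        return label
    if setting == "label_else_model":
        # Prefer the gold label when one is available (e.g. at train
        # time); otherwise fall back to the model's own reply.
        return label if label is not None else model_reply
    raise ValueError(f"unknown setting: {setting}")
```

The default, `label_else_model`, keeps the history grounded in gold labels during training while still working at inference time, when no labels exist.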
