Which text encoding model you are using in this code?

In your paper, it says

> At first, we encode TB into an embedded vector vB either by many-hot encoding or using a pre-trained NLP model such as BERT, FastText, or Word2Vec

Can you tell me exactly which text encoding model you are using in your released code? Could you release the encoding model for custom images?