Transformer and BERT

1. What are the three different embeddings that are generated from an input sentence in a Transformer model?

Token, segment, and position embeddings
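
A minimal PyTorch sketch of how these three embeddings combine: each is looked up separately and the results are summed element-wise. The default sizes match BERT-base, but the class itself is illustrative, not a library API:

```python
import torch
import torch.nn as nn

class BertInputEmbeddings(nn.Module):
    # Defaults follow BERT-base (vocab 30522, hidden 768, max length 512)
    def __init__(self, vocab_size=30522, hidden=768, max_len=512, n_segments=2):
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden)    # one vector per wordpiece
        self.segment = nn.Embedding(n_segments, hidden)  # sentence A vs. sentence B
        self.position = nn.Embedding(max_len, hidden)    # learned position vectors
        self.norm = nn.LayerNorm(hidden)

    def forward(self, token_ids, segment_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        # The three embeddings are summed to form the model's input representation
        x = self.token(token_ids) + self.segment(segment_ids) + self.position(positions)
        return self.norm(x)
```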

2. What kind of transformer model is BERT?

Encoder-only model

3. What is the name of the language modeling technique used in Bidirectional Encoder Representations from Transformers (BERT)?

Masked language modeling (MLM), in which the model learns to predict randomly masked tokens from their surrounding context

4. What does fine-tuning a BERT model mean?

Training the model on a specific task with labeled data, updating the pre-trained weights in the process
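
A minimal fine-tuning sketch using the Hugging Face transformers library; the toy texts, label count, learning rate, and epoch count are placeholder assumptions, not a recommended recipe:

```python
import torch
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification

# Load the pre-trained weights; the classification head on top is newly initialized
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Placeholder labeled data; a real task would use a proper dataset
texts = ["great movie", "terrible plot"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                          # a few passes over the toy batch
    outputs = model(**batch, labels=labels)  # loss is computed internally
    outputs.loss.backward()                  # gradients flow into ALL weights
    optimizer.step()
    optimizer.zero_grad()
```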

5. What are the encoder and decoder components of a transformer model?

The encoder ingests an input sequence and produces a sequence of hidden states. The decoder takes in the hidden states from the encoder and produces an output sequence.
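
This data flow can be seen directly in PyTorch's built-in nn.Transformer module; the tensor shapes below are illustrative:

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8, batch_first=True)
src = torch.rand(1, 10, 512)  # input sequence fed to the encoder
tgt = torch.rand(1, 7, 512)   # output sequence so far, fed to the decoder
out = model(src, tgt)         # decoder attends to the encoder's hidden states
print(out.shape)              # torch.Size([1, 7, 512])
```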

6. What are the two sublayers of each encoder in a Transformer model?

Self-attention and feedforward
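
A rough PyTorch sketch of one encoder layer with these two sublayers, each wrapped in a residual connection and layer normalization; the dimensions follow the original paper's base model, but the class itself is illustrative:

```python
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        # Sublayer 1: multi-head self-attention
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Sublayer 2: position-wise feedforward network
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)  # queries, keys, values all come from x
        x = self.norm1(x + attn_out)      # residual connection + layer norm
        x = self.norm2(x + self.ff(x))
        return x
```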

7. What is the attention mechanism?

A way of determining how much each word in the input sequence should contribute when producing a given output word, for example weighting the source words when translating each word of the output sentence
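
A sketch of scaled dot-product attention, the core computation behind this mechanism: each query is scored against every key, the scores are normalized with a softmax, and the resulting weights average the value vectors:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # query-key similarity
    weights = torch.softmax(scores, dim=-1)  # importance weights, each row sums to 1
    return weights @ v                       # weighted sum of value vectors
```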

8. What is a transformer model?

A deep learning model that uses self-attention to learn relationships between different parts of a sequence.

Author: Ajay Ohri

http://about.me/ajayohri
