1. What are the three different embeddings that are generated from an input sentence in a Transformer model?
Token, segment, and position embeddings
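A minimal pure-Python sketch of how these three embeddings combine: each is a lookup table, and the model's input is their element-wise sum at every position. The table sizes, dimension, and random initialization here are toy assumptions for illustration, not BERT's real values.

```python
import random

random.seed(0)

VOCAB_SIZE, MAX_LEN, NUM_SEGMENTS, DIM = 100, 16, 2, 8

def make_table(rows, dim):
    """A lookup table of `rows` randomly initialized vectors of size `dim`."""
    return [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]

token_table = make_table(VOCAB_SIZE, DIM)      # one vector per vocabulary id
segment_table = make_table(NUM_SEGMENTS, DIM)  # one vector per sentence segment
position_table = make_table(MAX_LEN, DIM)      # one vector per position

def input_embeddings(token_ids, segment_ids):
    """Element-wise sum of token, segment, and position embeddings."""
    return [
        [t + s + p for t, s, p in zip(token_table[tok],
                                      segment_table[seg],
                                      position_table[pos])]
        for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids))
    ]

# Toy sentence pair: first three tokens in segment 0, next two in segment 1.
emb = input_embeddings([5, 17, 3, 42, 9], [0, 0, 0, 1, 1])
print(len(emb), len(emb[0]))  # 5 positions, each an 8-dimensional vector
```

In a trained model these tables are learned parameters; the sum is what the first encoder layer actually sees.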
2. What kind of Transformer model is BERT?
Encoder-only model
3. What is the name of the language modeling technique that is used in Bidirectional Encoder Representations from Transformers (BERT)?
Masked language modeling (MLM)
4. What does fine-tuning a BERT model mean?
Updating the pre-trained weights by further training the model on a specific downstream task using labeled data
5. What are the encoder and decoder components of a transformer model?
The encoder ingests an input sequence and produces a sequence of hidden states. The decoder takes in the hidden states from the encoder and produces an output sequence.
6. What are the two sublayers of each encoder layer in a Transformer model?
Self-attention and feedforward
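A pure-Python sketch of how the two sublayers stack inside one encoder layer, each wrapped in a residual (skip) connection. The `self_attention` here is a stand-in that attends uniformly to all positions, and `feed_forward` uses toy scalar weights; real layers learn these weights and also apply layer normalization, which is omitted for brevity.

```python
def relu(vec):
    return [max(0.0, x) for x in vec]

def self_attention(xs):
    # Stand-in for self-attention: every position attends equally to all
    # positions (a real layer computes learned query/key/value weights).
    n, d = len(xs), len(xs[0])
    avg = [sum(x[i] for x in xs) / n for i in range(d)]
    return [avg[:] for _ in xs]

def feed_forward(x):
    # Toy position-wise feed-forward network: expand, ReLU, project back.
    hidden = relu([2.0 * v for v in x])  # "expansion" with toy weight 2.0
    return [0.5 * v for v in hidden]     # "projection" with toy weight 0.5

def encoder_layer(xs):
    # Sublayer 1: self-attention, plus a residual connection.
    attn = self_attention(xs)
    xs = [[a + b for a, b in zip(x, h)] for x, h in zip(xs, attn)]
    # Sublayer 2: feed-forward applied independently at each position,
    # again with a residual connection.
    return [[a + b for a, b in zip(x, f)]
            for x, f in zip(xs, (feed_forward(x) for x in xs))]

out = encoder_layer([[1.0, 2.0], [3.0, 4.0]])
print(len(out), len(out[0]))  # shape is preserved: 2 positions x 2 dims
```

The key structural point is that both sublayers map a sequence of vectors to a sequence of the same shape, which is what lets encoder layers be stacked.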
7. What is the attention mechanism?
A mechanism that weighs the relevance of each element in a sequence when processing another element; in translation, for example, it determines how much each source word should influence the target word being generated
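A minimal sketch of scaled dot-product attention, the variant used in Transformers: each query is scored against every key, the scores are normalized with softmax, and the output is the resulting weighted average of the values. Pure Python with toy hand-picked vectors; a real model computes Q, K, and V with learned projections.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """For each query: score against all keys, softmax, then average values."""
    d_k = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)  # non-negative, sums to 1
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# One query that is more similar to the first key than the second.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = scaled_dot_product_attention(Q, K, V)
print(out)  # a convex combination of the value vectors, tilted toward V[0]
```

Self-attention is the special case where queries, keys, and values all come from the same sequence.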
8. What is a transformer model?
A deep learning model that uses self-attention to learn relationships between different parts of a sequence.