Logit Evolution - How the model decides
Rene Claus
The following figures explore how the model decides what the next token is. By examining how the model converges on the tokens it finally predicts, we can gain insight into how it arrives at those predictions.
These figures use an arbitrary test sentence as input¹. The sliders under each figure allow the plots to focus on any of the individual tokens within this text.
Figure 1 studies which layers and blocks affect the next token prediction. It focuses on the tokens to which the model assigns the highest scores and examines how those scores change with each attention and feed forward block. This intermediate score is calculated by skipping all subsequent attention and feed forward blocks and projecting the intermediate hidden_state directly against the final weight matrix.
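Concretely, this intermediate score can be sketched as a "logit lens"-style projection. The code below is illustrative rather than the exact computation behind the figures; the shapes, the optional final layer-norm handling, and the use of random weights in place of a trained model are all assumptions.

```python
import numpy as np

def intermediate_logits(hidden_state, W_unembed):
    """Project an intermediate hidden state directly against the final
    weight matrix, skipping all subsequent attention and feed forward
    blocks.

    hidden_state: (d_model,) residual-stream vector after some block
    W_unembed:    (d_model, vocab_size) final projection matrix
    """
    # Note: many models apply a final layer norm before this projection;
    # whether to include it for intermediate layers is a modeling choice.
    return hidden_state @ W_unembed

# Toy example with random weights; the real figures use the trained
# model's hidden states and unembedding matrix.
rng = np.random.default_rng(0)
d_model, vocab = 16, 100
h = rng.standard_normal(d_model)
W = rng.standard_normal((d_model, vocab))

scores = intermediate_logits(h, W)          # one score per vocab token
top10 = np.argsort(scores)[::-1][:10]       # the 10 highest-scoring token ids
```

Repeating this after every attention and feed forward block, for the same 10 final-top tokens, yields the curves in Figure 1.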
Figure 1: Logit Evolution
Each curve plots how the score of a possible next token prediction changes through stages of the model. A point midway along a curve is the score the model would assign to that token if we stopped computing the model at that particular layer or block. The 10 tokens shown are those to which the complete model assigns the highest scores.
Light segments correspond to the effect of the attention block, while the dark segments are the effect of the feed forward network.
The score examined in Figure 1 can be a bit difficult to interpret since it only matters relative to all the other tokens--if the score goes up, but all other tokens' scores also go up, then the model is not actually increasing its estimate of that token's likelihood. Figure 2 and Figure 3 address this by examining how the rank of that token compared to all other tokens changes. Figure 2 focuses on how that rank changes through the stages of the model--providing insight into when the model decides and which parts of the model contribute most. Figure 3 shows how the score compares to tokens the model considers less likely, providing insight into how separated these top tokens are from less likely predictions.
Figure 2: Rank Ordering Evolution
This figure visualizes the point in the model at which it decides on the next token. The rank of a token compared to all possible tokens is plotted along the y-axis, while progress through the model is along the x-axis. A large rank indicates that there are many tokens that the model considers more likely.
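The rank plotted here can be computed directly from the intermediate scores: a token's rank is the number of tokens that receive a strictly higher score. This is a sketch under that assumption, not the exact code behind the figure.

```python
import numpy as np

def token_rank(logits, token_id):
    """Rank of a token among all vocabulary tokens (0 = highest scored).

    A large rank means many tokens receive a higher score than this one.
    """
    return int((logits > logits[token_id]).sum())

# Tiny 5-token vocabulary for illustration.
logits = np.array([0.1, 2.5, -1.0, 2.5, 0.3])

token_rank(logits, 1)  # nothing scores strictly higher -> rank 0
token_rank(logits, 2)  # every other token scores higher -> rank 4
```

Evaluating this at each attention and feed forward block gives one rank curve per token, which is what Figure 2 traces along the x-axis.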
The x-axis corresponds to the layer and light segments highlight the effect of the attention block, while the dark segments indicate the effect of the feed forward network.
Figure 3 shows how the top next token predictions fit into the distribution of all possible next tokens. Moving the top slider selects different stages of the model, visualizing how the eventual predictions separate from all the other possible next tokens.
It is also interesting to note that, while the mean and standard deviation are normalized, the overall shape of the distribution in Figure 3 stays relatively similar across stages of the model.
Figure 3: Histogram of Logits
This figure shows the distribution of logits across all the possible output tokens. The output tokens shown in Figure 1 and Figure 2 are highlighted in a lighter color to indicate where they fall in the distribution. The distribution has been normalized to have the same mean and standard deviation across all layers (the top slider).
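The normalization is straightforward: each layer's logits are shifted and scaled so every histogram shares the same mean and standard deviation, making their shapes directly comparable. A minimal sketch, assuming zero mean and unit standard deviation as the common target:

```python
import numpy as np

def normalize_logits(logits):
    """Shift and scale a layer's logits to mean 0 and std 1 so that
    histograms from different layers are directly comparable."""
    return (logits - logits.mean()) / logits.std()

# Example: one layer's (toy) logits, before and after normalization.
layer_logits = np.array([3.0, 5.0, 7.0, 4.0, 6.0])
z = normalize_logits(layer_logits)
```

Because only the mean and scale are removed, any change in the overall shape of the distribution across layers is real rather than an artifact of growing logit magnitudes.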
Hovering over each bin shows a few sample tokens from that bin.
Footnotes
See Experiment Setup - Dataset for information about where this sentence came from. ↩