Logit Evolution - How the model decides
Rene Claus
The following figures explore how the model decides what the next token is. By examining how the model converges on the tokens it finally predicts, we can gain insight into how it arrives at those predictions.
These figures use an arbitrary test sentence as input¹. The sliders under each figure allow the plots to focus on any of the individual tokens within this text.
Figure 1 studies which layers and blocks affect the next token prediction. It focuses on the tokens to which the model assigns the highest scores and examines how those scores change with each attention and feed forward block. This intermediate score is calculated by skipping all subsequent attention and feed forward blocks and projecting the intermediate hidden_state directly against the final weight matrix.
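Concretely, this intermediate score can be sketched as a "logit lens"-style projection. The code below is illustrative rather than the exact computation behind the figures; the shapes, the optional final layer-norm handling, and the use of random weights in place of a trained model are all assumptions.

```python
import numpy as np

def intermediate_logits(hidden_state, W_unembed):
    """Project an intermediate hidden state directly against the final
    weight matrix, skipping all subsequent attention and feed forward
    blocks.

    hidden_state: (d_model,) residual-stream vector after some block
    W_unembed:    (d_model, vocab_size) final projection matrix
    """
    # Note: many models apply a final layer norm before this projection;
    # whether to include it for intermediate layers is a modeling choice.
    return hidden_state @ W_unembed

# Toy example with random weights; the real figures use the trained
# model's hidden states and unembedding matrix.
rng = np.random.default_rng(0)
d_model, vocab = 16, 100
h = rng.standard_normal(d_model)
W = rng.standard_normal((d_model, vocab))

scores = intermediate_logits(h, W)          # one score per vocab token
top10 = np.argsort(scores)[::-1][:10]       # the 10 highest-scoring token ids
```

Repeating this after every attention and feed forward block, for the same 10 final-top tokens, yields the curves in Figure 1.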
Figure 1: Logit Evolution
Each curve plots how the score of a possible next token prediction changes through stages of the model. A point midway along a curve is the score the model would assign to that token if we stopped computing the model at that particular layer or block. The 10 tokens shown are those to which the complete model assigns the highest scores.
Light segments correspond to the effect of the attention block, while the dark segments are the effect of the feed forward network.
The score examined in Figure 1 can be a bit difficult to interpret since it only matters relative to all the other tokens--if the score goes up, but all other tokens' scores also go up, then the model is not actually increasing its estimate of that token's likelihood. Figure 2 and Figure 3 address this by examining how the rank of that token compared to all other tokens changes. Figure 2 focuses on how that rank changes through the stages of the model--providing insight into when the model decides and which parts of the model contribute most. Figure 3 shows how the score compares to tokens the model considers less likely, providing insight into how separated these top tokens are from less likely predictions.
Figure 2: Rank Ordering Evolution
This figure visualizes the point in the model at which it decides on the next token. The rank of a token compared to all possible tokens is plotted along the y-axis, while progress through the model is along the x-axis. A large rank indicates that there are many tokens that the model considers more likely.
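The rank plotted here can be computed directly from the intermediate scores: a token's rank is the number of tokens that receive a strictly higher score. This is a sketch under that assumption, not the exact code behind the figure.

```python
import numpy as np

def token_rank(logits, token_id):
    """Rank of a token among all vocabulary tokens (0 = highest scored).

    A large rank means many tokens receive a higher score than this one.
    """
    return int((logits > logits[token_id]).sum())

# Tiny 5-token vocabulary for illustration.
logits = np.array([0.1, 2.5, -1.0, 2.5, 0.3])

token_rank(logits, 1)  # nothing scores strictly higher -> rank 0
token_rank(logits, 2)  # every other token scores higher -> rank 4
```

Evaluating this at each attention and feed forward block gives one rank curve per token, which is what Figure 2 traces along the x-axis.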
The x-axis corresponds to the layer and light segments highlight the effect of the attention block, while the dark segments indicate the effect of the feed forward network.
Figure 3 shows how the top next token predictions fit into the distribution of all possible next tokens. Moving the top slider selects different stages of the model, visualizing how the eventual predictions separate from all the other possible next tokens.
It is also interesting to note that, while the mean and standard deviation are normalized, the overall shape of the distribution in Figure 3 stays relatively similar across stages of the model.
Figure 3: Histogram of Logits
This figure shows the distribution of logits across all the possible output tokens. The output tokens shown in Figure 1 and Figure 2 are highlighted in a lighter color to indicate where they fall in the distribution. The distribution has been normalized to have the same mean and standard deviation across all layers (the top slider).
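The normalization is straightforward: each layer's logits are shifted and scaled so every histogram shares the same mean and standard deviation, making their shapes directly comparable. A minimal sketch, assuming zero mean and unit standard deviation as the common target:

```python
import numpy as np

def normalize_logits(logits):
    """Shift and scale a layer's logits to mean 0 and std 1 so that
    histograms from different layers are directly comparable."""
    return (logits - logits.mean()) / logits.std()

# Example: one layer's (toy) logits, before and after normalization.
layer_logits = np.array([3.0, 5.0, 7.0, 4.0, 6.0])
z = normalize_logits(layer_logits)
```

Because only the mean and scale are removed, any change in the overall shape of the distribution across layers is real rather than an artifact of growing logit magnitudes.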
Hovering over each bin shows a few sample tokens from that bin.
Footnotes
See Experiment Setup - Dataset for information about where this sentence came from. ↩