transformer_lens.utilities.exploratory_utils module

exploratory_utils.

This module contains utility functions related to exploratory analysis.

transformer_lens.utilities.exploratory_utils.test_prompt(prompt: str, answer: str | list[str], model, prepend_space_to_answer: bool = True, print_details: bool = True, prepend_bos: bool | None = None, top_k: int = 10) → None

Test if the Model Can Give the Correct Answer to a Prompt.

Intended for exploratory analysis. Prints the model's performance on the answer (rank, logit, probability), as well as the top k predicted tokens. Works for multi-token prompts and multi-token answers.

Warning:

This will print the results (it does not return them).

Examples:

>>> from transformer_lens import HookedTransformer, utils
>>> model = HookedTransformer.from_pretrained("tiny-stories-1M")
Loaded pretrained model tiny-stories-1M into HookedTransformer
>>> prompt = "Why did the elephant cross the"
>>> answer = "road"
>>> utils.test_prompt(prompt, answer, model)
Tokenized prompt: ['<|endoftext|>', 'Why', ' did', ' the', ' elephant', ' cross', ' the']
Tokenized answer: [' road']
Performance on answer token:
Rank: 2        Logit: 14.24 Prob:  3.51% Token: | road|
Top 0th token. Logit: 14.51 Prob:  4.59% Token: | ground|
Top 1th token. Logit: 14.41 Prob:  4.18% Token: | tree|
Top 2th token. Logit: 14.24 Prob:  3.51% Token: | road|
Top 3th token. Logit: 14.22 Prob:  3.45% Token: | car|
Top 4th token. Logit: 13.92 Prob:  2.55% Token: | river|
Top 5th token. Logit: 13.79 Prob:  2.25% Token: | street|
Top 6th token. Logit: 13.77 Prob:  2.21% Token: | k|
Top 7th token. Logit: 13.75 Prob:  2.16% Token: | hill|
Top 8th token. Logit: 13.64 Prob:  1.92% Token: | swing|
Top 9th token. Logit: 13.46 Prob:  1.61% Token: | park|
Ranks of the answer tokens: [(' road', 2)]
Parameters:
  • prompt – The prompt string, e.g. “Why did the elephant cross the”.

  • answer – The answer, e.g. “road”. Note that if you set prepend_space_to_answer to False, you need to consider whether the answer should include a leading space yourself (e.g. in this example the answer may really be “ road”, since the prompt ends without a trailing space). If this is a list of strings, then only the next-token completion is evaluated, and all of the strings are compared as possible model answers.

  • model – The model.

  • prepend_space_to_answer – Whether or not to prepend a space to the answer. Note this will only ever prepend a space if the answer doesn’t already start with one.

  • print_details – Whether to print the prompt (as a string, but broken up by token), the answer, and the top k tokens (each with logit, rank, and probability).

  • prepend_bos – Overrides self.cfg.default_prepend_bos if set. Whether to prepend the BOS token to the input (applicable when input is a string). Models generally learn to use the BOS token as a resting place for attention heads (i.e. a way for them to be “turned off”). This therefore often improves performance slightly.

  • top_k – Top k tokens to print details of (when print_details is set to True).

Returns:

None (just prints the results directly).
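The per-token statistics that test_prompt prints (rank, logit, probability) can be reproduced from the raw logits at the final position. A minimal sketch in NumPy, using a made-up toy logit vector rather than real model output (the helper name answer_stats is illustrative, not part of the library):

```python
import numpy as np

def answer_stats(logits: np.ndarray, answer_id: int):
    """Compute rank, logit, and probability of one token id,
    mirroring the per-token stats that test_prompt prints."""
    # Softmax over the vocabulary (shifted by the max for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Rank 0 means the answer is the model's top prediction.
    rank = int((logits > logits[answer_id]).sum())
    return rank, float(logits[answer_id]), float(probs[answer_id])

# Toy final-position logits over a 5-token vocabulary (illustrative only).
logits = np.array([2.0, 3.5, 1.0, 3.0, 0.5])
rank, logit, prob = answer_stats(logits, answer_id=3)
```

Here the answer token has the second-highest logit, so it gets rank 1, matching the convention in the printed output above where rank 0 is the top token.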