We identify the broad structure of a circuit associated with correctly predicting a gendered pronoun given the subject of a rhetorical question. We make progress towards identifying this circuit using existing tools, namely Conmy’s Automatic Circuit Discovery and Nanda’s Exploratory Analysis tools.

We present this report not only as a preliminary understanding of the broad structure of a gendered pronoun circuit, but also as a structured, re-implementable procedure (or perhaps just naive inspiration) for identifying circuits for other tasks in large transformer language models.

Further work is warranted to refine the proposed circuit and to better understand the associated human-interpretable algorithm.

Status	Released
Category	Tool
Author	cmathw

Download

Attention_Visualisation_Script.ipynb
Automatic_Circuits_Script.ipynb (13 kB)
Hackathon Report_Chris_Guillaume.pdf (1.1 MB)

Comments

cmathw, 2 years ago

Note a dumb bug: the names dataset should not include the names Carol, Karen, Julie and Judy. These are each tokenized as 2 tokens (unlike the others, which are 1 token). I'm leaving this error in the notebook so the results match the report, but watch out for these token errors :').

I believe prepend_bos should also be False across the notebook to match the ACDC implementation. Again, I'm leaving it unchanged in the notebook to match the report. This work was produced as part of a weekend hackathon, so beware of bugs :D
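
The token-count concern in this comment can be checked mechanically. The following is a minimal sketch, not the notebook's code: `split_by_token_count`, `toy_tokenize`, and `TOY_VOCAB` are hypothetical names introduced here for illustration, and the toy tokenizer is only a stand-in so the example runs without downloading a model. One thing worth keeping in mind is that GPT-2's BPE treats a word with a leading space differently from the same word without one, so the check should use whichever form actually appears in the prompt; the `prefix` parameter makes that choice explicit.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def split_by_token_count(
    names: Iterable[str],
    tokenize: Callable[[str], Sequence],
    prefix: str = " ",
) -> Tuple[List[str], List[str]]:
    """Return (single_token_names, multi_token_names) under `tokenize`.

    `prefix` is prepended to each name before tokenizing, because a
    mid-sentence name is normally preceded by a space.
    """
    single: List[str] = []
    multi: List[str] = []
    for name in names:
        n_tokens = len(tokenize(prefix + name))
        (single if n_tokens == 1 else multi).append(name)
    return single, multi


# Toy stand-in tokenizer (NOT GPT-2): strings in TOY_VOCAB are one token,
# everything else is split into two pieces. Hypothetical, for illustration.
TOY_VOCAB = {" Alice", " Bob"}


def toy_tokenize(text: str) -> List[str]:
    if text in TOY_VOCAB:
        return [text]
    mid = len(text) // 2
    return [text[:mid], text[mid:]]


single, multi = split_by_token_count(["Alice", "Bob", "Qwertyna"], toy_tokenize)
print("single-token:", single)
print("multi-token:", multi)

# With the real tokenizer (assumes the `transformers` package is installed
# and the "gpt2" vocabulary can be downloaded):
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("gpt2")
#   single, multi = split_by_token_count(names, tok.encode)
```

Running the check once with `prefix=" "` and once with `prefix=""` shows whether a name's token count depends on the leading space, which is one possible source of disagreement about whether a given name is one token or two.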

ArthurConmy, 2 years ago

The two-token name claim doesn't seem true? https://colab.research.google.com/drive/17pU4A_DHH6GczbCwoVQcuQAIspjGhkRV?usp=sh...

Butanium, 2 years ago

Impressive work!

Identifying a Preliminary Circuit for Predicting Gendered Pronouns in GPT-2 Small
