
Attention Head Visualiser: Mapping What Each Head in GPT-2 Actually Does

NEO built an attention head probe that automatically classifies GPT-2 attention heads into copying, induction, previous-token, and retrieval behaviors, then generates interactive HTML reports with heatmaps and head passport cards.


Problem Statement

We asked NEO to build a tool that automates attention head analysis for GPT-2: classifying each head into interpretable behavior types (copying, induction, previous-token, retrieval), scoring confidence per head, and producing interactive reports so teams can diagnose model behavior, compare architectures, and teach attention mechanics without writing custom code.


Solution Overview

NEO built the Attention Head Visualiser with:

  1. Four Head Behaviors — Copying (exact token reproduction), induction (pattern completion), previous-token (local syntax), retrieval (factual associations from context)
  2. Vectorized Scoring — Per-behavior scores in 0–1; model load takes ~2s (GPT-2 small) to <20s (1.5B); scoring adds <8s
  3. GPT-2 Variants — 124M, 355M, 774M, 1.5B; auto device (CUDA, MPS, CPU)
  4. Interactive HTML Reports — Heatmaps, head passport cards, summary statistics and layer breakdowns
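The vectorized scoring step can be sketched in plain NumPy on a single head's attention matrix. The function names and score definitions below are illustrative assumptions, not the repository's actual code: the previous-token score is taken as mean attention mass on the sub-diagonal, and the induction score as the mean attention a repeated token places on the position right after its earlier occurrence.

```python
import numpy as np

def previous_token_score(attn: np.ndarray) -> float:
    """Mean attention each query places on the immediately preceding
    token. attn is one head's (seq, seq) matrix; rows sum to 1."""
    seq = attn.shape[0]
    return float(attn[np.arange(1, seq), np.arange(seq - 1)].mean())

def induction_score(attn: np.ndarray, tokens) -> float:
    """Mean attention from each repeated token to the position that
    followed its earlier occurrence: [A][B] ... [A] -> attend [B]."""
    tokens = np.asarray(tokens)
    seq = len(tokens)
    per_query = []
    for q in range(1, seq):
        # earlier positions holding the same token as position q
        prev = np.where(tokens[:q] == tokens[q])[0]
        targets = prev + 1              # the token that came next last time
        targets = targets[targets < q]  # keep only valid (causal) keys
        if targets.size:
            per_query.append(attn[q, targets].sum())
    return float(np.mean(per_query)) if per_query else 0.0
```

A head scoring near 1.0 on one of these metrics and near 0 on the others is a strong candidate for that behavior class.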

Attention Head Visualiser Pipeline Architecture

Workflow / Pipeline

Step | Description
--- | ---
1. Model Load | Load GPT-2 variant (124M–1.5B); auto-detect CUDA / MPS / CPU
2. Behavior Scoring | Run vectorized scoring per behavior type (induction, previous-token, copying, retrieval with entity spans)
3. Classification & Confidence | Assign each head to a behavior; attach a confidence score
4. Report Generation | Output HTML with heatmaps, passport cards, aggregate stats, layer breakdowns
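Step 3 can be sketched as a small pure-Python routine. The argmax-with-threshold rule and the 0.3 cutoff below are assumptions for illustration, not the repository's implementation; confidence here is simply the winning behavior's score.

```python
BEHAVIORS = ("copying", "induction", "previous_token", "retrieval")

def classify_head(scores: dict, threshold: float = 0.3) -> tuple:
    """Pick the highest-scoring behavior for one head; heads whose
    best score falls below the threshold stay 'unclassified'."""
    behavior, score = max(scores.items(), key=lambda kv: kv[1])
    if score < threshold:
        return "unclassified", score
    return behavior, score

def classify_model(all_scores: dict) -> dict:
    """Map (layer, head) -> (behavior, confidence) for a whole model."""
    return {lh: classify_head(s) for lh, s in all_scores.items()}
```

For GPT-2 small this produces one (behavior, confidence) pair for each of the 12 × 12 heads, which is exactly the per-head data a passport card or heatmap cell needs.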

Head Behaviors Explained

  1. Copying: the head attends to earlier occurrences of the current token, supporting exact reproduction of context.
  2. Induction: the head completes repeated patterns; after seeing [A][B] ... [A], it attends to [B].
  3. Previous-token: the head attends to the immediately preceding position, a building block for local syntax.
  4. Retrieval: the head attends to entity spans earlier in the context, surfacing factual associations.
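As a concrete example of one behavior, here is a minimal copying score: the attention mass a query places on earlier occurrences of its own token. This definition is an illustrative assumption, not necessarily the one used in the repository.

```python
import numpy as np

def copying_score(attn: np.ndarray, tokens) -> float:
    """Mean attention each query places on earlier positions holding
    the *same* token -- a simple proxy for copying behavior."""
    tokens = np.asarray(tokens)
    seq = len(tokens)
    per_query = []
    for q in range(1, seq):
        # earlier positions whose token matches position q's token
        same = np.where(tokens[:q] == tokens[q])[0]
        if same.size:
            per_query.append(attn[q, same].sum())
    return float(np.mean(per_query)) if per_query else 0.0
```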


Repository & Artifacts

dakshjain-1616/Attention-Head-Visualiser (View on GitHub)

Generated Artifacts:


Use Cases



