Institute of Science Tokyo · AI / interpretability

Tyrone White

Swiss-Rwandan master's student in AI at the Institute of Science Tokyo. I work on mechanistic interpretability of large language models from a linguistics-oriented perspective: grammatical generalization, syntactic structure, lexical frequency effects, and how linguistic knowledge is represented inside neural networks.


Tyrone White standing on the bridge to Ukimidō, a temple on Lake Biwa.

Now

Updated May 2026

Researching grammatical generalization in LLMs: how lexical frequency shapes grammatical preferences, and whether models stay robust when minimal pairs use rare words. Manuscript under review for EMNLP 2026.
Studying for the JLPT N1 (sitting it this summer).
Reading The Three-Body Problem.

Recent

All research →

2026 Under review

Lexical frequency and grammatical generalization in LLMs

Under review (ARR 2026 / EMNLP 2026)
Feb 2026

Large Language Models Are Robust to Low-Frequency Words in Grammatical Evaluation

Association for Natural Language Processing (言語処理学会, NLP) 2026, poster

Selected projects

All projects →

Twenty Questions, Interpreted

2026

A mechanistic-interpretability study of whether an LLM truly commits to a secret in 20 Questions, using linear probes, activation patching, steering, and sparse autoencoders on Gemma-3.

interpretability · LLMs · research

Read the writeup

Two Heads or One? Multi-Agent LLM Reasoning

2025

Bachelor's thesis (UZH). Tests whether gains in multi-agent LLM reasoning come from genuinely separate model instances or just role-based perspective diversity. It compares two DeepSeek-V3 instances against a single model alternating roles, across Debate / Cooperative / Teacher-Student strategies on AIME, GPQA Diamond, and LiveBench Reasoning. Model separation helped most in critique-oriented dialogue; cooperative settings didn't require true independence.

LLMs · multi-agent · reasoning · thesis

Thesis (PDF)

Lexicon Meets Prosody

2025

Classifies overlapping speech in spontaneous multi-party conversation (AMI Meeting Corpus) as cooperative (e.g. backchannels) or competitive (e.g. interruptions). Combines Wav2Vec audio embeddings with lexical sentence embeddings from noisy ASR, trained via a weakly-supervised labeling pipeline (heuristics + LLM-assisted annotation). Adding lexical features improved performance, though competitive overlaps stayed hard.

speech · ASR · classification

Paper (PDF)

From the blog

All posts →

May 30, 2026

Summoned by the Question

What mechanistic interpretability says about whether an LLM commits to a secret in 20 Questions.

interpretability · LLMs · mech-interp

Read