Institute of Science Tokyo · AI / interpretability

Tyrone White

Swiss-Rwandan master's student in AI at the Institute of Science Tokyo. I work on mechanistic interpretability of large language models from a linguistics-oriented perspective: grammatical generalization, syntactic structure, lexical frequency effects, and how linguistic knowledge is represented inside neural networks.

Tyrone White standing on the bridge to Ukimidō, a temple on Lake Biwa.

Now

Updated May 2026

Recent

Selected projects

Twenty Questions, Interpreted

2026

A mechanistic-interpretability study of whether an LLM truly commits to a secret in 20 Questions, using linear probes, activation patching, steering, and sparse autoencoders on Gemma-3.

interpretability · LLMs · research

Two Heads or One? Multi-Agent LLM Reasoning

2025

Bachelor's thesis (UZH). Tests whether gains in multi-agent LLM reasoning come from genuinely separate model instances or just role-based perspective diversity. It compares two DeepSeek-V3 instances against a single model alternating roles, across Debate / Cooperative / Teacher-Student strategies on AIME, GPQA Diamond, and LiveBench Reasoning. Model separation helped most in critique-oriented dialogue; cooperative settings didn't require true independence.

LLMs · multi-agent · reasoning · thesis

Lexicon Meets Prosody

2025

Classifies overlapping speech in spontaneous multi-party conversation (AMI Meeting Corpus) as cooperative (e.g. backchannels) or competitive (e.g. interruptions). Combines Wav2Vec audio embeddings with lexical sentence embeddings from noisy ASR, and trains on labels from a weakly supervised pipeline (heuristics + LLM-assisted annotation). Adding lexical features improved performance, though competitive overlaps remained hard to classify.

speech · ASR · classification

From the blog
