
A study published by EleutherAI demonstrates that two TopK Sparse Autoencoders (SAEs) trained on the same dataset with identical batch order but different random initializations do not learn the same features: only about 53% of features are shared between the two models. Narrower SAEs show higher feature overlap, and overlap decreases as SAE width grows, highlighting how feature learning varies with model width and initialization. These findings have implications for understanding feature representation in neural networks and could influence future model training strategies.
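The kind of comparison described above can be approximated by matching decoder directions across the two SAEs. Below is a minimal sketch (not the study's exact procedure): it treats a feature in SAE A as "shared" if some feature in SAE B has cosine similarity above a chosen threshold; the function name and the 0.7 threshold are illustrative assumptions.

```python
import numpy as np

def feature_overlap(dec_a, dec_b, threshold=0.7):
    """Fraction of SAE A's features with a close counterpart in SAE B,
    using max cosine similarity between decoder rows as a proxy for
    feature identity (threshold is an illustrative choice)."""
    # Normalize each decoder direction to unit length.
    a = dec_a / np.linalg.norm(dec_a, axis=1, keepdims=True)
    b = dec_b / np.linalg.norm(dec_b, axis=1, keepdims=True)
    sims = a @ b.T                # pairwise cosine similarities
    best = sims.max(axis=1)       # best match in B for each feature of A
    return float((best >= threshold).mean())

# Toy check: an SAE compared against itself shares all of its features.
rng = np.random.default_rng(0)
dec = rng.normal(size=(64, 128))   # 64 features in a 128-dim space
print(feature_overlap(dec, dec))   # 1.0
```

A real analysis would also need to handle one-to-many matches (e.g. via Hungarian assignment) and the fact that wider SAEs may split one narrow feature into several finer ones.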