
A study published by EleutherAI demonstrates that two TopK Sparse Autoencoders (SAEs) trained on the same dataset with identical batch order but different random initializations do not learn the same features: only about 53% of features are shared between the two models. Narrower SAEs show higher feature overlap, and overlap decreases as SAE width grows, highlighting how feature learning varies with model width and initialization. These findings have implications for understanding feature representation in neural networks and could influence future model training strategies.
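The kind of comparison described above can be approximated by matching decoder directions across the two SAEs. Below is a minimal sketch (not the study's exact procedure): it treats a feature in SAE A as "shared" if some feature in SAE B has cosine similarity above a chosen threshold; the function name and the 0.7 threshold are illustrative assumptions.

```python
import numpy as np

def feature_overlap(dec_a, dec_b, threshold=0.7):
    """Fraction of SAE A's features with a close counterpart in SAE B,
    using max cosine similarity between decoder rows as a proxy for
    feature identity (threshold is an illustrative choice)."""
    # Normalize each decoder direction to unit length.
    a = dec_a / np.linalg.norm(dec_a, axis=1, keepdims=True)
    b = dec_b / np.linalg.norm(dec_b, axis=1, keepdims=True)
    sims = a @ b.T                # pairwise cosine similarities
    best = sims.max(axis=1)       # best match in B for each feature of A
    return float((best >= threshold).mean())

# Toy check: an SAE compared against itself shares all of its features.
rng = np.random.default_rng(0)
dec = rng.normal(size=(64, 128))   # 64 features in a 128-dim space
print(feature_overlap(dec, dec))   # 1.0
```

A real analysis would also need to handle one-to-many matches (e.g. via Hungarian assignment) and the fact that wider SAEs may split one narrow feature into several finer ones.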