98 - Analyzing Information Flow In Transformers, With Elena Voita
by NLP Highlights, published 2019-12-09T17:51:08Z

What function do the different attention heads serve in multi-headed attention models? In this episode, Lena describes how to use attribution methods to assess the importance and contribution of different heads across several tasks, and describes a gating mechanism, combined with an auxiliary loss, that prunes the number of effective heads. Then, we discuss Lena’s work on studying the evolution of representations of individual tokens in transformer models.

Lena’s homepage: https://lena-voita.github.io/
Blog posts:
https://lena-voita.github.io/posts/acl19_heads.html
https://lena-voita.github.io/posts/emnlp19_evolution.html
Papers:
https://arxiv.org/abs/1905.09418
https://arxiv.org/abs/1909.01380

Genre: Science

Comment by justHeuristic (2019-12-09T19:47:48Z):
Probably the best piece of insight into deep NLP since Karpathy's RNN guide! Btw, if you're looking for an extended version, there are blog posts with cool visualizations for both papers on Lena's blog: lena-voita.github.io/posts.html
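
The head-gating idea from the first paper (arXiv:1905.09418) pairs each attention head with a scalar gate and adds a relaxed L0 penalty to the loss, so that training drives the gates of unimportant heads to exactly zero. Below is a minimal PyTorch sketch of that mechanism, assuming the hard-concrete relaxation (Louizos et al.) the paper builds on; the class and parameter names (GatedHeads, beta, gamma, zeta) are illustrative, not taken from the authors' code.

```python
# Minimal sketch of per-head gating with a hard-concrete L0 relaxation,
# in the spirit of Voita et al. 2019 (arXiv:1905.09418). Names and
# hyperparameters here are illustrative assumptions, not the authors' code.
import math
import torch
import torch.nn as nn

class GatedHeads(nn.Module):
    """Scales each attention head's output by a stochastic gate in [0, 1].

    Training the gate logits under the L0-style penalty below pushes
    unneeded heads' gates to exactly zero, effectively pruning them.
    """
    def __init__(self, n_heads, beta=2/3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(n_heads))  # gate logits
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self, head_outputs):
        # head_outputs: (batch, n_heads, seq_len, head_dim)
        if self.training:
            # sample from the concrete distribution (reparameterized)
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        else:
            s = torch.sigmoid(self.log_alpha)
        # stretch to (gamma, zeta) then clip, so exact 0 and 1 are reachable
        gate = (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)
        return head_outputs * gate.view(1, -1, 1, 1)

    def l0_penalty(self):
        # differentiable surrogate for the expected number of open gates
        return torch.sigmoid(
            self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        ).sum()

# Illustrative training step: add the penalty to whatever task loss you have.
gates = GatedHeads(n_heads=8)
out = gates(torch.randn(2, 8, 5, 64))            # fake per-head outputs
loss = out.pow(2).mean() + 0.01 * gates.l0_penalty()  # stand-in task loss
loss.backward()
```

The penalty coefficient (0.01 above) trades off task performance against the number of surviving heads; after training, heads whose gates evaluate to zero can be dropped from the model entirely.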