MSWC Samples Combined by HSEAS published on 2021-12-16T16:56:40Z

SEAS researchers built an automated pipeline that identifies and extracts keywords from recorded speech and assembles them into a dataset. To build the dataset, the team used recordings from Mozilla Common Voice, a massive global project that collects donated voice recordings in a wide variety of spoken languages. The researchers applied a machine learning algorithm that recognizes keywords in the recorded Common Voice sentences and pulls them out. This is a compilation of those keywords:

English: hello, light, city, find, start
German: hallo, licht, stadt, finden, beginnen
Spanish: hola, luces, ciudad, encuentra, comenzar
French: bonjour, lumière, ville, trouver, commencer