Analyzing the Tatoeba dataset
Tatoeba is a website that crowdsources sentences translated into many languages, a resource that is very useful to language learners and to people interested in NLP.
I am a contributor to and a user of Tatoeba, where I mostly translate sentences into Italian. In 2020 Tatoeba organized an event called Kodoeba, in which I participated with an automated cloze deletion tool.
In this article I’m going to analyze the Tatoeba dataset and build some charts from it.
...