Recent Post

Analyzing the Tatoeba dataset

Tatoeba is a website that crowdsources sentences translated into many languages, a resource that is very useful to language learners and to people interested in NLP.

I am a contributor to and a user of Tatoeba, where I mostly translate sentences into Italian. In 2020 Tatoeba organized an event called Kodoeba, in which I participated with an automated cloze deletion tool.

In this article I’m going to analyze the Tatoeba dataset and build some charts from the results.

...