Tatoeba is a website that crowdsources sentences translated into many languages, a resource that is
very useful to language learners and to people interested in NLP.
I am a contributor and a user of Tatoeba, where I mostly translate sentences into Italian.
In 2020 Tatoeba organized an event called Kodoeba
in which I participated with an automated cloze deletion tool.
In this article I’m going to analyze the Tatoeba dataset and build some charts from it.
A few weeks ago I saw a talk about Skorch, a library that wraps a PyTorch neural network so that it can be used as a Scikit-learn model.
That is amazing: I can take an existing product based on, say, a random forest, and replace only the model without refactoring anything else, since the fit and predict functions keep the usual interface. At the same time, I can use the powerful tools offered by Scikit-learn, like grid search for hyperparameters and make_pipeline to chain encoders.
Note: this is an old article, and while the software is still available and I think the idea is still pretty neat, you probably want to take a look at this. The software described here is probably more flexible, but harder to use.
Some months ago, I stumbled across this amazing article about transforming an arbitrary English text into a patent application. The underlying pattern library makes it easy, among other nice things, to find patterns like “The [an adjective] [a noun] and the [a noun]”, to look up hypernyms (“is a” relationships between concepts, for example “animal” is a hypernym of “cat”) in WordNet, and to conjugate verbs in various languages, including Italian.