Jacopo Farina's blog

Recent Posts

Lessons learned using Postgres in production

On April 12th, 2022 I’ll give a talk at PyCon Berlin about how we use Postgres in a data science project at Flixbus. These are the slides for this presentation; you can contact me on the conference Discord, Twitter, GitHub or in person at the venue. Download the presentation...

Render a building in 3D from OpenStreetMap data

For quite some time I have been interested in GIS and rendering, and after experimenting with the two separately I decided to finally try to render geographical data from OpenStreetMap in 3D, focusing on a small scale, never bigger than a city. In this article I will go through the process of generating a triangle mesh from a building shape, then rendering it and exporting it in a format suitable for Blender or game engines like Godot....

Insert data into Postgres. Fast.

Ingesting data into Postgres is a common task in my job as a data engineer, and also in my side projects. Along the way I learned a few tricks that I’m going to discuss here, in particular about ingesting data from Python and merging it with existing rows. Before starting, I have to say that the fastest way to insert data into a Postgres DB is the COPY command, which has a counterpart, \copy, in the psql CLI tool that is useful for invoking it remotely....
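As a rough illustration of driving COPY from Python, here is a minimal sketch. It assumes a psycopg2 connection and uses its `copy_expert` cursor method; the helper names, the table `measurements` and its columns are hypothetical and only serve the example. It is not necessarily the approach the full article takes.

```python
import csv
import io


def rows_to_csv_buffer(rows):
    """Serialize an iterable of tuples into an in-memory CSV file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    buffer.seek(0)  # rewind so COPY reads from the start
    return buffer


def copy_rows(connection, table, columns, rows):
    """Stream rows into a table with COPY ... FROM STDIN.

    Assumption: `connection` is a psycopg2 connection, and `table` and
    `columns` are trusted identifiers (they are interpolated as-is).
    """
    sql = f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH (FORMAT csv)"
    with connection.cursor() as cursor:
        cursor.copy_expert(sql, rows_to_csv_buffer(rows))
    connection.commit()


# Hypothetical usage, given an open connection `conn`:
# copy_rows(conn, "measurements", ["id", "label"], [(1, "a"), (2, "b,c")])
```

Building the buffer in memory keeps the sketch short; for very large inputs a file on disk or a streaming file-like object would probably be a better fit.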

Generate a grammar quiz in 300+ languages using simple NLP

In this article I’ll explain how I populated the database that powers grammarquiz, a grammar quiz app that I created for the Kotoeba initiative. The code of the application is freely available. The backstory: as you may already know, Tatoeba is a database of sentences translated into different languages. The database is at this time (early 2021) almost 10 million sentences strong and keeps growing. The dataset can be downloaded and used under an open license, similar to Wikipedia or OpenStreetMap, which makes it very interesting for users who, like me, have an interest in NLP and languages....

Correlating sleep duration and flashcard performance

For a few years I have regularly used Anki, a flashcard system, to memorize and remember information, in particular German words. This daily activity sometimes feels like a breeze and sometimes more like an endless chore. It requires focus, and some days I have the feeling that I’m forgetting words and concepts that are normally a piece of cake. Anki provides reports about the usage, which in my case don’t show any particular pattern (e....

Add punctuation to text using Skorch

Note: The whole code for this article is freely available on GitHub. A few weeks ago I saw a talk about Skorch, a library that wraps a PyTorch neural network so it can be used as a Scikit-learn model. That is amazing: I can take an existing product based on, say, a random forest, and replace only the model without refactoring anything else, because the fit and predict functions have the usual interface. On the other hand, I can use the powerful tools offered by Scikit-learn, like the grid search for hyperparameters and make_pipeline to apply encoders....

Create an animated heatmap from a Google location data Takeout export

I love to go around by bike, and Berlin offers a good choice of paths to explore. However, after some years in the city I realized there were areas I had never visited and routes I had taken so often that they had become boring. Out of curiosity I tried to process my own location history to map the places I visited most often and, to tell the routine commute apart, visualize the time of day of each visit....

Visualize the functioning of supervised learning models – part 4: Neural networks

After trying regression using k-neighbours, linear and SVR models, I wanted to conclude with neural networks. I took the 5 deep learning courses from Andrew Ng on Coursera to get a grasp of these models, and decided to use Keras. This library makes defining, training and applying a model quite easy, once one has an idea of what to use. Artificial neural networks: the naming is suggestive and one may think the goal is to replicate the human brain, but it’s more akin to a generic and very flexible mathematical function....

Gensim: a generator is not an iterator

When using Gensim word2vec on a dataset stored in a database, I was pleased to see that the library accepts an iterator to represent the corpus, which makes it possible to process bigger-than-memory datasets. So, I wrote my generator function to stream text directly from a database, and came across a strange message: TypeError: You can't pass a generator as the sentences argument. Try an iterator. Looking at the code of Gensim, this is intended and is for a good reason: while Gensim is fine with iterating over the dataset, it may need to iterate over it more than once....
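The difference can be shown in plain Python, without Gensim. This is a minimal sketch: `FAKE_ROWS` is a hypothetical stand-in for rows streamed from a database, and `SentenceCorpus` is an illustrative name, not a Gensim class. A generator is exhausted after one pass, while an object whose `__iter__` starts a fresh pass each time can be traversed repeatedly.

```python
# Hypothetical stand-in for text rows streamed from a database.
FAKE_ROWS = ["the quick brown fox", "jumps over the lazy dog"]


def sentence_generator():
    """Generator function: the returned object can be consumed only once."""
    for row in FAKE_ROWS:
        yield row.split()


class SentenceCorpus:
    """Restartable iterable: every call to __iter__ begins a new pass."""

    def __iter__(self):
        # A real version would re-run the database query here.
        for row in FAKE_ROWS:
            yield row.split()


gen = sentence_generator()
first_pass = list(gen)
second_pass = list(gen)  # the generator is already exhausted

corpus = SentenceCorpus()
corpus_first = list(corpus)
corpus_second = list(corpus)  # a second full pass over the same data

print(len(first_pass), len(second_pass))      # 2 0
print(len(corpus_first), len(corpus_second))  # 2 2
```

An object like `corpus` still streams its data lazily, so it keeps the memory benefit while allowing the multiple passes described above.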

Visualize the functioning of supervised learning models – part 3: K-neighbours and decision trees

After trying regression using linear and SVR models, I wanted to try two other methods offered by scikit-learn that are based on different principles: K-nearest neighbors and decision trees. Nearest neighbors: K-nearest neighbor in this case is straightforward; with K=1 it transforms the picture into a mosaic (a Voronoi diagram based on the sampled points). [Figure: nearest neighbor model with K=1 and 1000 samples.] When the value of K is increased, the model uses more points to predict the color of each pixel, averaging them and as a consequence smoothing the zones:...
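To make the effect of K concrete, here is a minimal pure-Python sketch of the idea, not the scikit-learn implementation the article uses. The function name `knn_predict` and the three sample points are made up for illustration. With K=1 a pixel takes the color of its closest sample; with a larger K the colors of the K closest samples are averaged.

```python
def knn_predict(samples, query, k=1):
    """Predict an RGB color at `query` by averaging the k nearest samples.

    `samples` is a list of ((x, y), (r, g, b)) pairs.
    """
    def squared_distance(sample):
        (x, y), _ = sample
        return (x - query[0]) ** 2 + (y - query[1]) ** 2

    nearest = sorted(samples, key=squared_distance)[:k]
    return tuple(
        sum(color[channel] for _, color in nearest) / len(nearest)
        for channel in range(3)
    )


# Three hypothetical sampled pixels: red, blue and green.
samples = [((0, 0), (255, 0, 0)), ((10, 0), (0, 0, 255)), ((0, 10), (0, 255, 0))]

print(knn_predict(samples, (1, 1), k=1))  # (255.0, 0.0, 0.0)
print(knn_predict(samples, (4, 0), k=2))  # (127.5, 0.0, 127.5)
```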

Visualize the functioning of supervised learning models – part 2 – SVR and GridSearchCV

In the previous article we used a linear regression model to predict the color of an image pixel given a sample of other pixels, then used a hand-written function to enrich the coordinates and add non-linearity, and saw how it improves the result. Without it, we can only get a gradient image. Again, all the code is visible in the notebook. I had fun playing with the enrichment function, but scikit-learn offers kernel methods out of the box....

Visualize the functioning of supervised learning models

ConvnetJS offers a demo of a neural network which paints an image by learning to predict the color of a pixel given its coordinates. I liked the idea as it is immediate and visually appealing, and decided to create a visual comparison of various supervised learning models applied to this toy problem. In this and further articles I will review the results. All the code is in this Jupyter notebook. Given an image with the 3 RGB channels, a train and test dataset can be created just by sampling random pixels....
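The sampling step can be sketched in a few lines of standard-library Python. This is an assumption-laden illustration, not the notebook's code: the image is a tiny synthetic 4x4 gradient stored as nested lists, and `sample_pixels` is a hypothetical helper. Features are the pixel coordinates and targets are the RGB values.

```python
import random


def sample_pixels(image, n_samples, seed=0):
    """Pick random pixels; return (coordinates, colors) as features and targets.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    all_coords = [(x, y) for y in range(height) for x in range(width)]
    chosen = rng.sample(all_coords, n_samples)  # no repeats within one sample
    features = [[x / width, y / height] for x, y in chosen]  # scaled to [0, 1)
    targets = [image[y][x] for x, y in chosen]
    return features, targets


# Tiny synthetic image: red grows with x, green grows with y.
image = [[(x * 60, y * 60, 128) for x in range(4)] for y in range(4)]

X_train, y_train = sample_pixels(image, 8, seed=1)
X_test, y_test = sample_pixels(image, 4, seed=2)
print(len(X_train), len(X_test))  # 8 4
```

The two samples are drawn independently here, so a test pixel may also appear in the training set; drawing one sample and splitting it would avoid that overlap.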