Jacopo Farina's blog

Recent Posts

Making a fully static map, part 2: Vector tiles

In the previous post of this series we saw that an extract of the data from OpenStreetMap can be easily transformed into a set of raster tiles, essentially fragments of the map at different levels of zoom, arranged in a structure that enables a library like Leaflet.js to fetch them as needed when the user zooms and pans on the map. NOTE: a complete demo of the final result is available on GitHub...
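As a rough illustration of how a map library decides which tile file to request, here is a minimal sketch that converts a coordinate and zoom level into a tile path. It assumes the widely used XYZ ("slippy map") layout of `zoom/x/y.png` in Web Mercator; the function names and the `.png` extension are hypothetical, and the exact structure used in the series is not shown in this excerpt.

```python
import math


def tile_for(lat_deg, lon_deg, zoom):
    """Return the (x, y) indices of the XYZ tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    # Web Mercator projection of the latitude, scaled to the tile grid
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tile_path(lat_deg, lon_deg, zoom):
    """Build a relative path such as '1/1/1.png' (assumed layout)."""
    x, y = tile_for(lat_deg, lon_deg, zoom)
    return f"{zoom}/{x}/{y}.png"


print(tile_path(0.0, 0.0, 1))
```

Because the path depends only on the coordinate and the zoom level, the tiles can sit on disk as plain files and be served without any backend logic.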

Making a fully static map, part 1: Generate raster tiles from QGIS

In this article we are going to implement an interactive map that can be included in a fully static website. By fully static I mean that the map does not rely on any external service or backend: it is just a bunch of files served directly by nginx (like this blog) or even a CDN. This approach is generally cheaper and simpler to operate, maintain and migrate, without depending on external services whose terms of use may change....

Lessons learned using Postgres in production

On April 12th, 2022 I will give a talk at PyCon Berlin about how we use Postgres in a data science project at Flixbus. These are the slides for this presentation; you can contact me on the conference Discord, Twitter, GitHub or in person at the venue. Download the presentation...

Render a building in 3D from OpenStreetMap data

For quite some time I have had an interest in GIS and rendering, and after experimenting with the two separately I decided to finally try to render geographical data from OpenStreetMap in 3D, focusing on a small scale, never bigger than a city. In this article I will go through the process of generating a triangle mesh from a building shape, rendering it and exporting it in a format suitable for Blender or game engines like Godot....

Insert data into Postgres. Fast.

Ingesting data into Postgres is a common task in my job as a data engineer, and also in my side projects. As such, I learned a few tricks that I’m going to discuss here, in particular related to ingesting data from Python and merging it with existing rows. Before starting, I have to say the fastest way to insert data into a Postgres DB is the COPY command, which has a counterpart \copy on the psql CLI tool that is useful to invoke it remotely....

Generate a grammar quiz in 300+ languages using simple NLP

In this article I’ll explain how I populated the database that powers grammarquiz, a grammar quiz app that I created for the Kotoeba initiative. The code of the application is freely available. The backstory: As you may already know, Tatoeba is a database of sentences translated into different languages. The database is at this time (early 2021) almost 10 million sentences strong and keeps growing. The dataset can be downloaded and used with an open license, similar to Wikipedia or OpenStreetMap, which makes it very interesting for users who, like me, have an interest in NLP and languages....

Correlating sleep duration and flashcard performance

For a few years I have regularly used Anki, a flashcard system, to memorize and remember information, in particular German words. This daily activity sometimes feels like a breeze and sometimes more like an endless chore. It requires focus, and some days I have the feeling that I’m forgetting words and concepts that normally are a piece of cake. Anki provides reports about the usage, which in my case don’t show any particular pattern (e....

Add punctuation to text using Skorch

Note: The whole code for this article is freely available on GitHub. A few weeks ago I saw a talk about Skorch, a library that wraps a PyTorch neural network to use it as a Scikit-learn model. That is amazing: I can take an existing product based on, say, a random forest, and replace only the model without refactoring anything else, because the fit and predict functions have the usual interface. On the other hand, I can use the powerful tools offered by Scikit-learn, like the grid search for hyperparameters and make_pipeline to apply encoders....

Create an animated heatmap from a Google location data Takeout export

I love to go around by bike, and Berlin offers a good choice of paths to explore. However, after some years in the city I realized there were areas I had never visited and routes I had taken so often that they had become boring. Out of curiosity I tried to process my own location history to map the places I visited most often and, to tell the routine commute apart, visualize the time of day of each visit....

Visualize the functioning of supervised learning models – part 4: Neural networks

After trying regression using k-neighbours, linear and SVR models, I wanted to conclude using neural networks. I took the five deep learning courses from Andrew Ng on Coursera to get a grasp of these models, and decided to use Keras. This library makes defining, training and applying a model quite easy, once one has an idea of what to use. Artificial neural networks: The naming is suggestive and one may think the goal is to replicate the human brain, but it’s more akin to a generic and very flexible mathematical function....

Gensim: a generator is not an iterator

When using Gensim word2vec on a dataset stored in a database, I was pleased to see that the library accepts an iterator to represent the corpus, which allows processing bigger-than-memory datasets. So I wrote my generator function to stream text directly from a database, and came across a strange message: TypeError: You can't pass a generator as the sentences argument. Try an iterator. Looking at the code of Gensim, this is intended and for a good reason: while Gensim is fine with iterating over the dataset, it may need to iterate over it more than once....
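The difference can be shown without Gensim at all. The sketch below is only an illustration: `stream_sentences` is a hypothetical stand-in for a function streaming rows from a database, and `RestartableCorpus` is an assumed helper name, not part of Gensim. A generator object is exhausted after one pass, while an object whose `__iter__` creates a new generator on each call can be traversed as many times as needed.

```python
def stream_sentences():
    # Hypothetical stand-in for rows streamed from a database cursor.
    rows = ["the quick brown fox", "jumps over the lazy dog"]
    for row in rows:
        yield row.split()


class RestartableCorpus:
    """Iterable that starts a fresh stream of sentences on every pass."""

    def __init__(self, make_stream):
        # make_stream: zero-argument callable returning a new generator
        self._make_stream = make_stream

    def __iter__(self):
        return self._make_stream()


# A bare generator can only be consumed once.
generator = stream_sentences()
first_pass = list(generator)
second_pass = list(generator)  # already exhausted, so this is empty

# The wrapper restarts the stream each time it is iterated.
corpus = RestartableCorpus(stream_sentences)
pass_one = list(corpus)
pass_two = list(corpus)

print(len(first_pass), len(second_pass), len(pass_one), len(pass_two))
```

An object like `corpus` is the kind of value that can be iterated over more than once, which is what a multi-pass training loop needs.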

Visualize the functioning of supervised learning models – part 3: K-neighbours and decision trees

After trying regression using linear and SVR models, I wanted to try two other methods offered by scikit-learn, based on different principles: K-nearest neighbors and decision trees. Nearest Neighbors: K-nearest neighbor is straightforward in this case: with K=1 it transforms the picture into a mosaic (a Voronoi diagram based on the sampled points). [Figure: Nearest neighbor model with K=1 and 1000 samples] Increasing the value of K, the model uses more points to predict the color of each pixel, averaging them and as a consequence smoothing the zones:...
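To make the effect of K concrete, here is a minimal pure-Python sketch of nearest-neighbor regression. It is not the scikit-learn implementation used in the post: the function name `knn_predict` and the tiny sample set are made up for illustration, and a single number stands in for a pixel color.

```python
def knn_predict(samples, query, k):
    """Average the values of the k samples closest to the query point.

    samples: list of ((x, y), value) pairs; query: an (x, y) tuple.
    """
    def squared_distance(sample):
        (x, y), _ = sample
        return (x - query[0]) ** 2 + (y - query[1]) ** 2

    nearest = sorted(samples, key=squared_distance)[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Hypothetical sampled points with one value each.
samples = [((0, 0), 0.0), ((1, 0), 1.0), ((10, 10), 5.0)]

only_closest = knn_predict(samples, (0.1, 0), k=1)  # copies the nearest sample
two_closest = knn_predict(samples, (0.1, 0), k=2)   # averages the two nearest
print(only_closest, two_closest)
```

With K=1 every query point takes the value of its single closest sample, which is what produces the mosaic; larger K blends neighboring samples and softens the borders between zones.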