Syntax highlighting with my Tree-sitter grammar In the last months I worked on a new project, a dashboard/blog/dataviz experiment focused on the city of Milan, and doing so started focusing on ways to automate and validate the integration of WebAssembly, DuckDB SQL, JS, Markdown and other things.
A concept I come across again and again playing with this is parsing. I use tools to parse Markdown, transpile Typescript into something the browser understands, translate schemas from JSON to Parquet to Vega-lite charts and a lot of other transformations....
There are few things developers love more than having to handle timezones. One of them is having to handle timezones in different environments!
Lately I had to deal with some timezone operations across Python and Postgres and decided to document here the shenanigans and quirks of the two systems and how I try to avoid them.
TIMESTAMP WITH TIME ZONE does NOT store a timezone This is something I knew already but it irks me every time I remember it exists....
For quite some time I entertained the idea of implementing an emulator. My knowledge of low level programming is mostly teoretical and this would be a good chance to learn more, and also to experiment with optimizations I rarely encounter in my usual machine learning tasks (being based on libraries like Numpy and Scikit-learn which already take care of the heavier operations).
The Game Boy is an obvious candidate, being it a console I had as a kid, well documented and for which there are many existing implementations including a Python one....
NOTE: a complete interactive demo of the final result is here.
In the previous post of this series, we saw how to generate vector tiles starting from an OpenStreetMap PBF extract using Tilemaker.
After the article I refined the process and wrote a Python tool to automate it, adding the possibility to index named objects like streets and shops.
Usually, such a search would be performed using a geocoding service that can handle the full text search with all the nuances like alternative spellings, typos and ambiguities....
UPDATE: I created a Python tool to automate this process, including a refined style and packaging. I suggest using it.
In the previous post of this series we saw that an extract of the data from OpenStreetMap can be easily transformed into a set of raster tiles, essentially fragments of the map at different levels of zoom, arranged in a structure that enables a library like Leaflet.js to fetch them as needed when the user zooms and pans on the map....
In this article we are going to implement an interactive map that can be included in a fully static website.
By fully static I mean that the map does not rely on any external service nor a backend, it is just a bunch of files served directly by nginx (like this blog) or even a CDN. This approach is generally cheaper and simpler to operate, maintain and migrate, without depending on external services whose terms of use may change....
On April, 12th 2022 I willgive a talk at PyCon Berlin about how we use Postgres in a data science project at Flixbus.
These are the slides for this presentation, you can contact me on the conference Discord, Twitter, Github or in person at the venue.
Download the presentation...
Since quite some time I have an interest in GIS and rendering, and after experimenting with the two separately I decided to finally try and render geographical data from OpenStreetMap in 3D, focusing on a small scale never bigger than a city.
In this article I will go through the process of generating a triangle mesh from a building shape, rendering and exporting it in a format suitable for Blender or game engines like Godot....
The task of ingesting data into Postgres is a common one in my job as data engineer, and also in my side projects.
As such, I learned a few tricks that here I’m going to discuss, in particular related to ingesting data from Python and merging it with existing rows.
Before starting, I have to say the fastest way to insert data into a Postgres DB is the COPY command, which has a counterpart \copy on the psql CLI tool that is useful to invoke it remotely....
In this article I’ll explain how I populated the database that powers grammarquiz, a grammar quiz app that I created for the Kotoeba initiative. The code of the application is freely available.
The backstory As you may already know, Tatoeba is a database of sentences translated in different languages. The database is at this time (early 2021) almost 10 million sentences strong and keeps growing. The dataset can be downloaded and used with an open license, similar to Wikipedia or Openstreetmap, which makes it very interesting for users who, like me, have interest in NLP and languages....
Since a few years I regularly use Anki, a flashcard system, to memorize and remember information, in particular German words. This daily activity sometimes feels like a breeze and sometimes more like an endless chore. It requires focus, and some day I have the feeling that I’m forgetting words and concepts that normally are a piece of cake. Anki provides reports about the usage, which in my case don’t show any particular pattern (e....
Note: The whole code for this article is freely available on GitHub
A few weeks ago I saw a talk about Skorch, a library that wraps a PyTorch neural network to use it as a Scikit-learn model.
That is amazing: I can take an existing product based on, say, a random forest, and replace only the model without refactoring anything else: the fit and predict functions have the usual interface. On the other hand, I can use the powerful tools offered by Scikit-learn, like the grid search for hyperparameters and make_pipeline to apply encoders....