• May 28, 2020

    Introduction to Snakemake

    In this tutorial, we will learn how to use Snakemake to create executable workflows.

  • Jul 9, 2019

    Parsing TEI XML documents with Python

    In the previous blog post, we learned about GROBID, which takes PDFs as input and outputs TEI XML documents. Now we gain some hands-on experience with handling TEI XML documents.

  • Jul 1, 2019

    GROBID: Structured text from PDFs

    In this post, we learn how to turn a PDF into a structured text document. To this end, we will use a tool called GROBID, which outputs a corresponding XML document for each PDF. This approach has these advantages over OCR techniques to be…

    I’ll conclude with a brief discussion of the TEI format, which (semi-)structures a PDF, and with an application of GROBID.

  • Jun 21, 2019

    Welcome!

    Hi there, I am Max, and thank you for taking the time to check out my new blog about code, data and open science. In this blog post, I will give a brief overview of myself and this blog’s objective.