Compare PDF to Markdown Extraction Using the JFK Files

The release of the JFK Files provided a robust set of real-world examples of scanned and handwritten PDFs.

Many API services and visual LLMs now offer PDF-to-Markdown extraction, so here we compare their output on a single example PDF.

This PDF is three pages long and appears to be a scanned, classified message form.

For each output example below, we copied the output Markdown into Markdown Live Preview and took a screenshot of the rendered result.

Using Graphlit

This comparison shows how widely PDF extraction results vary across the available APIs and visual LLMs.

You will need to choose the right solution based on the layout and type of content you are starting with.

Each of these approaches also differs in cost per page, depending on the compute required.

Please email any questions on this article or the Graphlit Platform to questions@graphlit.com.