PDF to structured data
Sample 1: Student Admission List
This project used Python scripts to convert tabular data in a PDF into a CSV file. The tabula Python package and pandas were used together to create the output CSV file shown below. This approach is useful in a number of applications, such as exporting PDF data to a database.
PDF containing the data to be converted to CSV
Final output in CSV file
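The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it assumes the tabula-py package (which needs a Java runtime) and pandas are installed, and the function names `combine_tables` and `pdf_tables_to_csv` are hypothetical.

```python
import pandas as pd


def combine_tables(tables):
    """Stack per-page tables into one DataFrame, dropping fully empty rows."""
    frames = [t.dropna(how="all") for t in tables if not t.empty]
    if not frames:
        return pd.DataFrame()
    # Assumption: every page shares the same column layout.
    return pd.concat(frames, ignore_index=True)


def pdf_tables_to_csv(pdf_path, csv_path):
    """Extract every table tabula detects in pdf_path and write one CSV."""
    import tabula  # tabula-py; imported lazily because it requires Java

    # With multiple_tables=True, read_pdf returns a list of DataFrames.
    tables = tabula.read_pdf(str(pdf_path), pages="all", multiple_tables=True)
    combined = combine_tables(tables)
    combined.to_csv(csv_path, index=False)
    return combined
```

A call such as `pdf_tables_to_csv("admission_list.pdf", "admission_list.csv")` (placeholder file names) would write the combined table to disk. Real PDFs often need extra cleanup, for example removing header rows repeated on each page, which this sketch does not attempt.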
Sample 2: PDF to Word and HTML
A PDF report was converted to HTML and Word formats using the tabula and docx Python packages. The resulting data can be used for analysis such as NLP, or for converting PDF model manuals to HTML or Word documents.
Section of the original PDF report
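One possible shape for this conversion is sketched below. It is a hedged illustration, not the project's code: it assumes tabula-py, pandas and python-docx are installed, it handles only the tables that tabula can detect (not the report's running text), and the function names are hypothetical.

```python
import html

import pandas as pd


def tables_to_html(tables, title="Report"):
    """Render each DataFrame as an HTML table inside a minimal page."""
    body = "\n".join(t.to_html(index=False, na_rep="") for t in tables)
    head = f'<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>'
    return f"<!DOCTYPE html>\n<html>{head}\n<body>\n{body}\n</body></html>\n"


def tables_to_docx(tables, docx_path, title="Report"):
    """Write each DataFrame as a Word table using python-docx."""
    from docx import Document  # python-docx; imported lazily

    doc = Document()
    doc.add_heading(title, level=1)
    for t in tables:
        if len(t.columns) == 0:
            continue  # nothing to render for a table without columns
        table = doc.add_table(rows=1, cols=len(t.columns))
        # The first row holds the column names.
        for cell, name in zip(table.rows[0].cells, t.columns):
            cell.text = str(name)
        # One Word table row per DataFrame row; missing values become blanks.
        for row in t.itertuples(index=False):
            for cell, value in zip(table.add_row().cells, row):
                cell.text = "" if pd.isna(value) else str(value)
    doc.save(str(docx_path))


def pdf_report_to_html_and_docx(pdf_path, html_path, docx_path):
    """Extract tables from a PDF and write them as HTML and Word files."""
    import tabula  # tabula-py; requires a Java runtime

    tables = tabula.read_pdf(str(pdf_path), pages="all", multiple_tables=True)
    with open(html_path, "w", encoding="utf-8") as fh:
        fh.write(tables_to_html(tables))
    tables_to_docx(tables, docx_path)
```

The HTML and Word writers are kept separate from the extraction step so that either output can be produced from tables obtained in some other way.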