Using PDF (unstructured data) as a source

In some cases, you may want to convert a PDF file into a text document. The approach described here is to create a custom plugin that converts a PDF file into a tab-delimited text file. Because a PDF holds unstructured data, getting that data out in a structured format requires interpreting the text according to its position on the page (its x and y coordinates).
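To illustrate what interpreting text by coordinates means, here is a minimal Java sketch that groups text fragments into rows by their y coordinate, orders each row by x, and emits one tab-delimited line per row. It does not use the API from the usage guide. The TextFragment class, the toTsv method, and the row tolerance are hypothetical. The sketch assumes the fragments have already been extracted together with their coordinates, and that y increases down the page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class PositionedTextToTsv {

    // Hypothetical holder for one text fragment and its page coordinates.
    // How fragments are actually extracted depends on the PDF library in use.
    static final class TextFragment {
        final double x;
        final double y;
        final String text;

        TextFragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Builds tab-delimited text: fragments whose y lies within `tolerance` of a
    // row's first fragment share that row; each row is ordered left to right.
    static String toTsv(List<TextFragment> fragments, double tolerance) {
        List<TextFragment> sorted = new ArrayList<>(fragments);
        // Top to bottom (assumes y grows downward), then left to right.
        sorted.sort(Comparator.comparingDouble((TextFragment f) -> f.y)
                .thenComparingDouble(f -> f.x));

        List<List<TextFragment>> rows = new ArrayList<>();
        for (TextFragment f : sorted) {
            List<TextFragment> last = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            if (last != null && Math.abs(last.get(0).y - f.y) <= tolerance) {
                last.add(f); // same visual line
            } else {
                List<TextFragment> row = new ArrayList<>();
                row.add(f);
                rows.add(row); // start a new line
            }
        }

        StringBuilder out = new StringBuilder();
        for (List<TextFragment> row : rows) {
            row.sort(Comparator.comparingDouble((TextFragment f) -> f.x));
            out.append(row.stream().map(f -> f.text).collect(Collectors.joining("\t")));
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<TextFragment> page = List.of(
                new TextFragment(200, 100.5, "Qty"),
                new TextFragment(50, 100, "Item"),
                new TextFragment(50, 120, "Widget"),
                new TextFragment(200, 121, "4"));
        System.out.print(toTsv(page, 2.0)); // Item<TAB>Qty, then Widget<TAB>4
    }
}
```

A fixed tolerance is the simplest way to absorb small baseline differences between cells on the same visual line; real documents may need a tolerance tuned to the font size and line spacing.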

The attached PDF to Text Conversion Usage Guide describes the API you can use to transform a PDF document into a tab-delimited text file. From the tab-delimited text file, the data can then be converted into whatever format you need.
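As one example of converting the tab-delimited output onward, the following plain Java sketch turns it into CSV, quoting fields that contain commas or quotes. It is independent of Adeptia and of the usage guide's API; the TsvToCsv class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TsvToCsv {

    // Quotes a field for CSV output when it contains a comma, quote, or line break,
    // doubling any embedded quotes (RFC 4180 style).
    static String csvField(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Converts tab-delimited text (one record per line) into CSV text.
    static String convert(String tsv) {
        List<String> out = new ArrayList<>();
        for (String line : tsv.split("\\R")) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = line.split("\t", -1); // -1 keeps trailing empty cells
            List<String> quoted = new ArrayList<>();
            for (String cell : cells) {
                quoted.add(csvField(cell));
            }
            out.add(String.join(",", quoted));
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        System.out.println(convert("Item\tQty\nWidget, large\t4\n"));
    }
}
```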

To use the methods described in the guide, extract the JAR files from the attached PDFjars.zip archive into the \AdeptiaServer\ServerKernel\ext directory. You can create a subfolder, such as "PDFconversion", to hold the JAR files. Then restart the Adeptia services, and you can begin developing your custom plugin.
