Wednesday, April 23, 2014

Parsing a PDF file in Excel

Most Linux distros come with a handy utility called pdftotext. But you can use it on a Windows machine as well. Using the browser of your choice, visit http://www.foolabs.com/xpdf/download.html, and download the precompiled xpdfbin-win-3.03.zip for x86 Windows.

Different Windows versions and installs give you different default directories, so I'll tell you what I did.

1. The file you downloaded is a zip, so first unzip it. Then look in the subfolders - on my PC:

      C:\Users\Bruce\Documents\xpdfbin-win-3.03\xpdfbin-win-3.03\bin64

     If you are running a 32-bit OS, there is also a

      C:\Users\Bruce\Documents\xpdfbin-win-3.03\xpdfbin-win-3.03\bin32


2. Go to Start>Run and enter cmd. That puts me in C:\Users\Bruce. Then enter cd Documents. My DOS prompt now says

      C:\Users\Bruce\Documents

    That maps to the Documents folder on the Start menu. In an Explorer window, copy the pdftotext.exe file from the folder in step 1 to your Documents folder.

3. Put your PDF doc in the Documents folder. Now, from the DOS prompt, enter:

      pdftotext <filename>.pdf -layout

In the Explorer window, you should now see a file named <filename>.txt.

If that gives you the results you are looking for, then this Excel macro will probably make things easier - just change the line:

        exe = "C:\Users\Bruce\Documents\pdftotext.exe"

to reflect where your exe ended up:
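
A macro along these lines could look like the following. Treat it as a minimal sketch under stated assumptions rather than the original macro: the name ImportPdfText is made up for illustration, the PDF is chosen with a file picker, pdftotext is started through WScript.Shell so the macro can wait for it to finish, and each line of the resulting text file is written to column A of the active sheet. It also passes the output file name explicitly as the second argument instead of relying on the default <filename>.txt.

```vba
Option Explicit

' Hypothetical sketch: convert a PDF with pdftotext, then load the text into column A.
Sub ImportPdfText()
    Dim exe As String
    Dim pdfPath As Variant
    Dim txtPath As String
    Dim cmd As String
    Dim sh As Object
    Dim rc As Long
    Dim fileNo As Integer
    Dim textLine As String
    Dim r As Long

    ' Change this line to reflect where your exe ended up
    exe = "C:\Users\Bruce\Documents\pdftotext.exe"

    ' Ask for the PDF; GetOpenFilename returns False if the dialog is cancelled
    pdfPath = Application.GetOpenFilename("PDF files (*.pdf), *.pdf")
    If VarType(pdfPath) = vbBoolean Then Exit Sub

    ' Assumes the chosen file name ends in ".pdf" (4 characters)
    txtPath = Left$(pdfPath, Len(pdfPath) - 4) & ".txt"

    ' Build: "<exe>" -layout "<pdf>" "<txt>"  (quotes protect paths with spaces)
    cmd = """" & exe & """ -layout """ & pdfPath & """ """ & txtPath & """"

    ' Run hidden (0) and wait for completion (True); Run returns the exit code
    Set sh = CreateObject("WScript.Shell")
    rc = sh.Run(cmd, 0, True)
    If rc <> 0 Then
        MsgBox "pdftotext exited with code " & rc
        Exit Sub
    End If

    ' Format column A as text so Excel keeps each line as literal text
    ActiveSheet.Columns(1).NumberFormat = "@"

    ' Read the text file line by line into column A
    fileNo = FreeFile
    Open txtPath For Input As #fileNo
    r = 1
    Do While Not EOF(fileNo)
        Line Input #fileNo, textLine
        ActiveSheet.Cells(r, 1).Value = textLine
        r = r + 1
    Loop
    Close #fileNo
End Sub
```

To try it, press Alt+F11, choose Insert > Module, paste the code, and run ImportPdfText from a blank worksheet. Because -layout keeps the columns lined up with spaces, Data > Text to Columns with the fixed width option is one way to split column A further.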

