26/10/2017 How to use VBA to View PDF file and use an OCR object?

Question

I am having problems reading PDF files with VBA.

I have 2 questions:

1. How can I get the text contents of a PDF via VBA?
2. If the PDF is a scanned file, is there an OCR object that can convert the image to text and return the contents?

Monday, November 19, 2012 11:57 AM
MaerDam 0 Points

All replies

https://social.msdn.microsoft.com/Forums/en-US/2c2bb856-d8d9-4624-80a1-2425a5660e8c/how-to-use-vba-to-view-pdf-file-and-use-an-ocr-object?forum=isvvba

Hi MaerDam,

If you have OneNote, you can paste the scanned image onto a OneNote page and have that convert the image to text.

Regards, Jan Karel Pieterse|Excel MVP|http://www.jkp-ads.com

Monday, November 19, 2012 12:12 PM
Jan Karel Pieterse [MVP] JKP Application Developm... (MCC, Partner, MVP) 2,815 Points

Hi Jan,

Opening the PDF file, running OCR, and finding the keywords should all be done by one macro. In other words, I click one button and the whole job is done.

Monday, November 19, 2012 2:35 PM
MaerDam 0 Points


Hi MaerDam,

Have you got OneNote? If so, perhaps it is better to ask your question in a forum about OneNote.

Regards, Jan Karel Pieterse|Excel MVP|http://www.jkp-ads.com

Monday, November 19, 2012 3:34 PM
Jan Karel Pieterse [MVP] JKP Application Developm... (MCC, Partner, MVP) 2,815 Points

VBA can't do what you want on its own, so you need a utility app that can do it and that can be controlled by VBA. You have some research ahead of you!

Rod Gill
The one and only Project VBA Book
Rod Gill Project Management

Monday, November 19, 2012 8:12 PM
Rod Gill ACE Project Systems Limited 10,530 Points

You need the full version of Acrobat (not Reader).

Take a look at this procedure:

    Sub test_with_PDF()
        Dim objApp As Object
        Dim objPDDoc As Object
        Dim objjso As Object
        Dim wordsCount As Long
        Dim page As Long
        Dim i As Long
        Dim strData As String
        Dim strFileName As String

        strFileName = "C:\Temp\File.pdf"

        ' Start Acrobat (full version) and create a document object
        Set objApp = CreateObject("AcroExch.App")
        Set objPDDoc = CreateObject("AcroExch.PDDoc")

        'AD.1 Open the file; Open returns False if the file cannot be opened (e.g. it is damaged)
        If objPDDoc.Open(strFileName) Then
            Set objjso = objPDDoc.GetJSObject
            For page = 0 To objPDDoc.GetNumPages - 1
                wordsCount = objjso.getPageNumWords(page)
                ' Word indexes are zero-based, so the last word is wordsCount - 1
                For i = 0 To wordsCount - 1
                    'AD.2 Append each word to the variable strData
                    strData = strData & " " & objjso.getPageNthWord(page, i)
                Next i
            Next page
            objPDDoc.Close
            MsgBox strData
        Else
            MsgBox "error!"
        End If
        objApp.Exit
    End Sub

Oskar Shon, Office System MVP

Press if Helpful; Answer when a problem solved

Monday, November 19, 2012 9:39 PM
VBATools Veracomp SA (MCC, MVP) 39,964 Points

"Have you got OneNote? If so, perhaps it is better to ask your question in a forum about OneNote."

Yes, I already have OneNote. How do I use the OCR function of OneNote via VBA?

Tuesday, November 20, 2012 2:20 AM
MaerDam 0 Points

Hi MaerDam,

"How to use OCR function of OneNote via VBA?"

Ask your question here:

http://answers.microsoft.com/en-us/office/forum/onenote

Regards, Jan Karel Pieterse|Excel MVP|http://www.jkp-ads.com

Tuesday, November 20, 2012 6:35 AM
Jan Karel Pieterse [MVP] JKP Application Developm... (MCC, Partner, MVP) 2,815 Points


I did this several years ago using Office 2007. Office 2003/2007 comes with an OCR scanner. It is under Start >> Microsoft Office >> Microsoft Tools >> Document Imaging. It takes tif files and converts them to Word docs. You can control it with VBA. It only took a few lines of code to work. I used a free command line program called ghostview to convert the pdf to tif. I used VBA to call ghostview using Win32 commands. I think I did it page by page but maybe it works with the whole document. I seem to remember getting reasonable results. The pdf was a scan of a lost document. I set it all up in less than a day. I remember playing with the resolution of the tif document to improve the results.

Tuesday, November 20, 2012 4:31 PM
mogulman52 2,160 Points
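
The "few lines of code" mentioned above might look roughly like the following sketch. It assumes Microsoft Office Document Imaging (MODI) from Office 2003/2007 is installed and that the PDF has already been converted to a TIFF file. The function name OcrTiffWithModi and the file path in the usage line are illustrative and not from the post.

```vba
Option Explicit

' Hypothetical helper: returns the OCR text of every page in a TIFF file,
' or "" when the file is missing or MODI cannot be used.
Public Function OcrTiffWithModi(ByVal tiffPath As String) As String
    Const miLANG_ENGLISH As Long = 9      ' value of MODI's miLANG_ENGLISH constant
    Dim doc As Object
    Dim i As Long
    Dim result As String

    On Error GoTo Failed
    If Len(Dir$(tiffPath)) = 0 Then Exit Function   ' file not found: return ""

    Set doc = CreateObject("MODI.Document")         ' late bound, no reference needed
    doc.Create tiffPath
    doc.OCR miLANG_ENGLISH, True, True              ' auto-orient and straighten the image
    For i = 0 To doc.Images.Count - 1               ' the Images collection is zero-based
        result = result & doc.Images.Item(i).Layout.Text & vbCrLf
    Next i
    doc.Close False                                 ' do not save OCR data back into the file
    OcrTiffWithModi = result
    Exit Function

Failed:
    OcrTiffWithModi = ""
End Function
```

A call such as MsgBox OcrTiffWithModi("C:\Temp\File.tif") would then show the recognized text. Returning an empty string on any error keeps the sketch short; real code would probably want to report why OCR failed.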

But that does not work in Office 2010.

Monday, December 10, 2012 7:13 AM
MaerDam 0 Points

Hi Oskar Shon, thanks a lot for your answer.

Will the code get all the words from a PDF file?

Monday, December 10, 2012 7:16 AM
MaerDam 0 Points


Note that Microsoft does not support MODI (its OCR imaging component) in the 2010 versions. It works OK in 2003 (library 11) and 2007 (library 12):

C:\programfiles\commonfiles\microsoftshared\modi\12.0\modivc12.dll

(or 11 instead of 12 for 2003)

It has an excellent compiled help file from MS with good VBA examples, and it is an excellent OCR for images.

It kills my version of Excel 2010.

A-pdf.com has A-PDF Text Extractor as a free offer at the moment:

http://www.a-pdf.com

It does a good job on text from a PDF (is there anything better that is free?).

But does anyone know how to get MODI or some equivalent OCR working with Excel 2010, so that image text can be extracted without going to PDF or using Acrobat? MS has an article on this, but it works for setting up 2007 and not for me on 2010.

farmer

Thursday, December 13, 2012 8:10 PM
Harry Soren being retired 5 Points


You could use tesseract-ocr, an open free OCR program. I think Google uses it. There is a Windows version.

http://code.google.com/p/tesseract-ocr/downloads/list

You could use ghostview to convert pdf to tiff. You can call the programs using Win32 commands in VBA.

Saturday, December 15, 2012 9:06 PM
mogulman52 2,160 Points
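
Calling the two programs from VBA could be sketched as below. This is only an outline under several assumptions: the PDF-to-TIFF step is done with the Ghostscript console program (gswin32c.exe, the engine behind viewers such as GSview), both executables are on the PATH, and WScript.Shell is used in place of raw Win32 calls. All function names and file paths are illustrative.

```vba
Option Explicit

' Wraps a value in double quotes so paths with spaces survive the command line.
Private Function Quote(ByVal s As String) As String
    Quote = Chr$(34) & s & Chr$(34)
End Function

' Builds the Ghostscript command that renders a PDF to a bilevel (G4) TIFF.
Public Function BuildGsCommand(ByVal gsExe As String, ByVal pdfPath As String, _
                               ByVal tiffPath As String, ByVal dpi As Long) As String
    BuildGsCommand = Quote(gsExe) & " -dNOPAUSE -dBATCH -sDEVICE=tiffg4 -r" & dpi & _
                     " -sOutputFile=" & Quote(tiffPath) & " " & Quote(pdfPath)
End Function

' Builds the Tesseract command; Tesseract writes its result to outputBase & ".txt".
Public Function BuildTesseractCommand(ByVal tessExe As String, ByVal tiffPath As String, _
                                      ByVal outputBase As String) As String
    BuildTesseractCommand = Quote(tessExe) & " " & Quote(tiffPath) & " " & Quote(outputBase)
End Function

' Runs a command hidden (window style 0), waits for it, and returns its exit code.
Public Function RunAndWait(ByVal commandLine As String) As Long
    RunAndWait = CreateObject("WScript.Shell").Run(commandLine, 0, True)
End Function

' Converts the PDF and runs OCR; returns the path of the text file, or "" on failure.
Public Function PdfToTextFile(ByVal pdfPath As String, ByVal workBase As String) As String
    Dim tiffPath As String
    tiffPath = workBase & ".tif"
    If RunAndWait(BuildGsCommand("gswin32c.exe", pdfPath, tiffPath, 300)) <> 0 Then Exit Function
    If RunAndWait(BuildTesseractCommand("tesseract.exe", tiffPath, workBase)) <> 0 Then Exit Function
    PdfToTextFile = workBase & ".txt"
End Function
```

For example, PdfToTextFile("C:\Temp\File.pdf", "C:\Temp\File") would be expected to leave the recognized text in C:\Temp\File.txt. The 300 dpi value is a starting point only; as noted earlier in the thread, the TIFF resolution is worth experimenting with.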

It works only with the full version of Acrobat (and, of course, only if the PDF is not made from an image).

Oskar Shon, Office System MVP

Press if Helpful; Answer when a problem solved

Sunday, December 16, 2012 10:02 PM
VBATools Veracomp SA (MCC, MVP) 39,964 Points

Hi Oskar, how can I output the contents with their formatting?

Monday, December 17, 2012 7:28 AM
MaerDam 0 Points

As you have seen, there are two loops inside the code.

The variable strData collects the strings from the PDF file.

You can cut that into pieces if you need to, or write the text somewhere page by page inside the loop.

Oskar Shon, Office System MVP

Press if Helpful; Answer when a problem solved

Monday, December 17, 2012 1:00 PM
VBATools Veracomp SA (MCC, MVP) 39,964 Points

Got it! Thanks a lot.

I can use the InStr function to find the keywords.

But I find that the code cannot output characters. Also, I need to output the tables in the PDF. Can that be achieved?

Tuesday, December 18, 2012 3:10 AM
MaerDam 0 Points


You can use the Split function to find and cut sentences from the string.

Add them to a Variant array or to a Collection.

Oskar Shon, Office System MVP

Press if Helpful; Answer when a problem solved

Tuesday, December 18, 2012 8:29 AM
VBATools Veracomp SA (MCC, MVP) 39,964 Points
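
The Split and InStr suggestions from the last two posts can be combined as in the sketch below. It assumes sentences are separated by a period, which is a simplification, and the function name SentencesWithKeyword is illustrative and not from the thread.

```vba
Option Explicit

' Hypothetical helper: splits text into sentences at "." and returns, as a
' Collection, the sentences that contain the keyword (case-insensitive).
Public Function SentencesWithKeyword(ByVal text As String, ByVal keyword As String) As Collection
    Dim parts() As String
    Dim i As Long
    Dim sentence As String
    Dim hits As New Collection

    parts = Split(text, ".")                 ' cut the string into pieces
    For i = LBound(parts) To UBound(parts)
        sentence = Trim$(parts(i))
        If Len(sentence) > 0 Then
            ' InStr returns 0 when the keyword is not found
            If InStr(1, sentence, keyword, vbTextCompare) > 0 Then hits.Add sentence
        End If
    Next i
    Set SentencesWithKeyword = hits
End Function
```

With the strData string built by the earlier procedure, SentencesWithKeyword(strData, "invoice") would return every sentence that mentions the word, ready to be written to a worksheet one item at a time.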
