vba 从多个pdf文件复制数据
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/28835164/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Copying data from multiple pdf files
提问by Sam
I have pdf files from which I would like to copy all the data to a column in a spreadsheet.
我有 pdf 文件,我想从中将所有数据复制到电子表格中的一列。
Here is the code I have. All it does is open the pdf, use control-a, then control-c to copy then activates the workbook, finds an open column and pastes the data with a control-v Sendkey.
这是我的代码。它所做的只是打开 pdf,使用 control-a,然后使用 control-c 复制然后激活工作簿,找到一个打开的列并使用 control-v Sendkey 粘贴数据。
I have a range with path names it opens and copies data from all but only pastes the last one.
我有一个带有路径名的范围,它打开并复制所有数据但只粘贴最后一个。
Sub StartAdobe1()
Dim AdobeApp As String
Dim AdobeFile As String
Dim StartAdobe
Dim fname As Variant
Dim iRow As Integer
Dim Filename As String
For Each fname In Range("path")
AdobeApp = "C:\Program Files (x86)\Adobe\Reader 10.0\Reader\AcroRd32.exe"
StartAdobe = Shell("" & AdobeApp & " " & fname & "", 1)
Application.Wait Now + TimeValue("00:00:01")
SendKeys "^a", True
Application.Wait Now + TimeValue("00:00:01")
SendKeys "^c"
Application.Wait Now + TimeValue("00:00:01")
SendKeys ("%{F4}")
Windows("transfer (Autosaved).xlsm").Activate
Worksheets("new").Activate
ActiveSheet.Range("A1").Select
Selection.End(xlToRight).Offset(0, 1).Select
SendKeys "^v"
Application.Wait Now + TimeValue("00:00:2")
Next fname
回答by jlookup
Jeanno's right, if you have Acrobat then using its API library to work with the file directly is much better than the workarounds. I use this every day to convert pdf files into database entries.
Jeanno 是对的,如果你有 Acrobat,那么使用它的 API 库直接处理文件比解决方法要好得多。我每天都使用它来将 pdf 文件转换为数据库条目。
Your code has a few problems, but I suspect the biggest issue is the use of SendKeys "^v"to paste into Excel. You're better off selecting the cell you want then using Selection.Paste. Or even better, transfer the contents of the clipboard to a variable, then parse it out as needed on the backend before writing to your spreadsheet--but that adds a bunch of complexity and doesn't help you a lot in this case.
您的代码有一些问题,但我怀疑最大的问题是使用SendKeys "^v"粘贴到 Excel 中。你最好选择你想要的单元格然后使用Selection.Paste. 或者更好的是,将剪贴板的内容传输到一个变量,然后在写入电子表格之前根据需要在后端解析它——但这增加了一堆复杂性,在这种情况下对你没有多大帮助。
To use the code below, be sure to select your 'Acrobat x.x Type Library' under Tools>References.
要使用下面的代码,请确保在工具>参考下选择您的“Acrobat xx 类型库”。
Sub StartAdobe1()
Dim fName As Variant
Dim wbTransfer As Excel.Workbook
Dim wsNew As Excel.Worksheet
Dim dOpenCol As Double
Dim oPDFApp As AcroApp
Dim oAVDoc As AcroAVDoc
Dim oPDDoc As AcroPDDoc
'Define your spreadsheet
Set wbTransfer = Workbooks("transfer (Autosaved).xlsm")
Set wsNew = wbTransfer.Sheets("new")
'Find first open column
dOpenCol = ws.Cells(1, columns.count).End(xlToleft).Column + 1
'Instantiate Acrobat Objects
Set oPDFApp = CreateObject("AcroExch.App")
Set oAVDoc = CreateObject("AcroExch.AVDoc")
Set oPDDoc = CreateObject("AcroExch.PDDoc")
For Each fName In Range("path")
'Open the PDF file. The AcroAVDoc.Open function returns a true/false
'to tell you if it worked
If oAVDoc.Open(fName.Text, "") = True Then
Set oPDDoc = oAVDoc.GetPDDoc
Else
Debug.Assert False
End If
'Copy all using Acrobat menu
oPDFApp.MenuItemExecute ("SelectAll")
oPDFApp.MenuItemExecute ("Copy")
'Paste into open column
wbTransfer.Activate
wsNew.Cells(1, dOpenCol).Select
ActiveSheet.Paste
'Select next open column
dOpenCol = dOpenCol + 1
oAVDoc.Close (1) '(1)=Do not save changes
oPDDoc.Close
Next
'Clean up
Set wbTransfer = Nothing
Set wsNew = Nothing
Set oPDFApp = Nothing
Set oAVDoc = Nothing
Set oPDDoc = Nothing
End Sub
Note:
1-There is also a menu item oPDFApp.MenuItemExecute ("CopyFileToClipboard")that should do the select all and copy in one step, but I have had problems with it so I stick to the two-step method above.
注意:1-还有一个菜单项oPDFApp.MenuItemExecute ("CopyFileToClipboard")应该一步完成全选和复制,但我遇到了问题,所以我坚持上面的两步方法。
2-A pdf file consists of two objects, the oAVDocand the oPDDoc. Different aspects of the file are controlled by each. In this case you might only need the oAVDoc. Try commenting out the lines dealing with oPDDocand see if it works without them.
2-一个 pdf 文件由两个对象组成,oAVDoc和oPDDoc。文件的不同方面由每个方面控制。在这种情况下,您可能只需要oAVDoc. 尝试注释掉处理的行oPDDoc,看看它是否在没有它们的情况下工作。
回答by OpiesDad
I can't quite get your code to work, but my guess is that it's copying all of the data, but overwriting it each time through the loop. To fix this try:
我无法让您的代码正常工作,但我的猜测是它正在复制所有数据,但每次通过循环都会覆盖它。要解决此问题,请尝试:
ActiveSheet.Cells(1, ActiveSheet.Cells(1, Columns.Count).End(xlToLeft).Column + 1).Select
instead of the two lines that begin activesheet.range("A1").Select and Selection.End....
而不是以 activesheet.range("A1").Select 和 Selection.End .... 开头的两行
回答by saurabh255
try this code this might work:
试试这个可能有效的代码:
Sub Shell_Copy_Paste()
Dim o As Variant
Dim wkSheet As Worksheet
Set wkSheet = ActiveSheet
o = Shell("C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe C:\Users\saurabh.ad.sharma\Desktop\red.pdf", vbNormalFocus)
Application.Wait (Now + TimeSerial(0, 0, 2)) 'Wait for Acrobat to load
SendKeys "^a" 'Select All
SendKeys "^c" 'Copy
SendKeys "%{F4}" 'Close shell application
wkSheet.Range("B5").Select
SendKeys "^v" 'Paste
End Sub
回答by saurabh255
BELOW CODE WILL COPY DATA FROM PDF & will PASTE IT IN WORD THEN COPY DATA FROM WORD AND THEN PASTE IT TO THE EXCEL .
下面的代码将从 PDF 复制数据并将其粘贴到 WORD 中,然后从 WORD 中复制数据,然后将其粘贴到 Excel 中。
NOW Why I am copying data from pdf to word & then copying from word and pasting it to the excel because i want the data from the pdf in exact format to my excel sheet if i copy directly from pdf to excel it will paste the whole data from pdf into a single cell means even if i am having two columns or multiple rows it will paste all of my data into one column and that too in single cell but if i copy from word to excel it will retain its original format and two columns will get pasted as two columns only in excel.
现在为什么我将数据从 pdf 复制到 word,然后从 word 复制并将其粘贴到 excel 中,因为如果我直接从 pdf 复制到 excel,我希望将 pdf 中的数据以精确格式复制到 excel 它将粘贴整个数据从 pdf 到单个单元格意味着即使我有两列或多行,它也会将我的所有数据粘贴到一列中,也将粘贴到单个单元格中,但如果我从 word 复制到 excel,它将保留其原始格式和两列只会在 excel 中粘贴为两列。
Private Sub CommandButton3_Click() '(load pdf)
Dim o As Variant
Set appWord = CreateObject("Word.Application")
o = Shell("C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe C:\Users\saurabh.ad.sharma\Desktop\Book1.pdf", vbNormalFocus) 'loading adobe reader & pdf file from their location
Application.Wait (Now + TimeSerial(0, 0, 2))
SendKeys ("^a")
SendKeys ("^c")
SendKeys "%{F4}"
Application.Wait Now + TimeValue("00:00:01")
Set appWord = CreateObject("Word.Application")
appWord.Visible = True
appWord.Documents.Add.Content.Paste
With appWord
.ActiveDocument.SaveAs Filename:=ThisWorkbook.Path & "\pdf" & ".docx", FileFormat:=wdocument 'saving word file in docx format
.ActiveWindow.Close
.Quit
End With
MsgBox " pdf is loaded "
MsgBox " Paste to EXCEL "
Set appWord = CreateObject("Word.Application")
appWord.Visible = True
appWord.Documents.Open "C:\Users\saurabh.ad.sharma\Desktop\pdf.docx" 'opening word document
appWord.Selection.WholeStory
appWord.Selection.Copy
Set wkSheet = ActiveSheet
wkSheet.Range("A1").Select
wkSheet.Paste 'pasting to the excel file
End Sub
回答by saurabh255
This is the more modified version of my above code it will not save any document it will save data in clipboard and will do the execution fast..
这是我上面代码的更多修改版本,它不会保存任何文档,它会将数据保存在剪贴板中,并且会快速执行..
Private Sub CommandButton3_Click() '(load pdf)
Dim o As Variant
Set appWord = CreateObject("Word.Application")
o = Shell("C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe C:\Users\saurabh.ad.sharma\Desktop\Book1.pdf2", vbNormalFocus)
Application.Wait (Now + TimeSerial(0, 0, 2))
SendKeys ("^a")
SendKeys ("^c")
SendKeys "%{F4}"
Application.Wait Now + TimeValue("00:00:01")
Set appWord = CreateObject("Word.Application")
appWord.Visible = False
appWord.Documents.Add.Content.Paste
With appWord
.Selection.WholeStory
.Selection.Copy
.ActiveDocument.Close SaveChanges:=wdDoNotSaveChanges
.Quit
End With
MsgBox " pdf is loaded "
MsgBox " Paste to EXCEL "
Set wkSheet = ActiveSheet
wkSheet.Range("A1").Select
wkSheet.Paste
End Sub
回答by DMM Wallnut
I had similar problem. The best solution is, as it was mentioned before, to use Adobe API. In my case it was impossible because macro was intended for 100+ users without Adobe Pro on their PC.
我有类似的问题。正如前面提到的,最好的解决方案是使用 Adobe API。就我而言,这是不可能的,因为宏是为 100 多个用户设计的,他们的 PC 上没有 Adobe Pro。
Ultimate solution that I have developed recently was to build converted in C# (for free using Visual Studio and iText library), install it on end users computers and run whenever I need via VBA. Here are some links for more guidance:
我最近开发的终极解决方案是在 C# 中构建转换(免费使用 Visual Studio 和 iText 库),将其安装在最终用户计算机上,并在我需要时通过 VBA 运行。以下是更多指导的一些链接:
- How to develop pdf converter in C#: link
- How to create Excel Addin in C#: link
- How to run C# addin from VBA: link
Overall it's fairly complicated but once done works like a dream.
总的来说,它相当复杂,但一旦完成,就像做梦一样。
Another solution as mentioned before is to use sendkeys in VBA. My experience is that it requires some optimization to handle various opening and copying times (depending on file size). Below is code that worked for me, however it's not even near that fast and stable as C# converter.
前面提到的另一个解决方案是在 VBA 中使用 sendkeys。我的经验是它需要一些优化来处理各种打开和复制时间(取决于文件大小)。下面是对我有用的代码,但它甚至没有 C# 转换器那么快和稳定。
Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As LongPtr) 'Initialize sleep function via Windows API
Public Sub CopyToSheet(destinationSheet As Worksheet, pathToPdf as String)
'Copy data from PDF to worksheet
'Initialize timer
Dim StartTime As Double
StartTime = Timer
'Clear clipboard
Dim myData As DataObject
Set myData = New DataObject
myData.SetText text:=Empty
myData.PutInClipboard
Set myData = Nothing
'Build file paths
Dim pathToAdobe As String
pathToAdobe = """C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe"""
pathToPdf = """" & pathToPdf & """"
'Open PDF and wait untill it is open. If file is already opened it will be just activated
Dim pdfId As Long
pdfId = Shell(pathToAdobe & " " & pathToPdf, vbMaximizedFocus)
Do
Sleep (500)
If Round(Timer - StartTime, 2) > 9 Then MsgBox "Failed to open PDF": Exit Sub 'Safety check
Loop Until Me.IsPdfOpen(pathToPdf)
'Copy and wait until copying is completed
SendKeys "^a"
SendKeys "^c"
Do
Sleep (500)
If Round(Timer - StartTime, 2) > 18 Then MsgBox "Failed to copy data to clipboard": Exit Sub 'Safety check
Loop Until Me.GetClipboardStatus = "ClipboardHasData"
'Paste data into worksheet
destinationSheet.Activate
destinationSheet.Range("A1").Select
destinationSheet.Paste
'Close pdf
Call Shell("TaskKill /F /PID " & CStr(pdfId), vbHide)
'Clear clipboard
Set myData = New DataObject
myData.SetText text:=Empty
myData.PutInClipboard
Set myData = Nothing
End Sub
Function IsPdfOpen(pathToPdf) As Boolean
'Check if PDF is already opened
'Build window name (window name is name of the application on Windows task bar)
Dim windowName As String
windowName = pathToPdf
windowName = Mid(windowName, InStrRev(windowName, "\") + 1, Len(windowName) - InStrRev(windowName, "\") + 1)
windowName = windowName + " - Adobe Acrobat Reader DC"
'Try to activate application to check if is opened
On Error Resume Next
AppActivate windowName, True
Select Case Err.Number
Case 5: IsPdfOpen = False
Case 0: IsPdfOpen = True
Case Else: Debug.Assert False
End Select
On Error GoTo 0
End Function
Function GetClipboardStatus() As String
'Check if copying data to clipboard is completed
Dim tempString As String
Dim myData As DataObject
'Try to put data from clipboard to string to check if operations on clipboard are completed
On Error Resume Next
Set myData = New DataObject
myData.GetFromClipboard
tempString = myData.GetText(1)
If Err.Number = 0 Then
If tempString = "" Then
GetClipboardStatus = "ClipboardEmpty"
Else
GetClipboardStatus = "ClipboardHasData"
End If
Else
GetClipboardStatus = "ClipboardBusy"
End If
On Error GoTo 0
Set myData = Nothing
End Function

