Windows: Using Command Line Switches to Save a PDF as Text - Can it be done?
Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/1196148/
Asked by
I need to use command line switches to execute the 'Save as Text' command. Ideally, I want to:
- use a command line switch to open a PDF
- use a command line switch to convert the PDF to a text file by mimicking the 'Save as Text' command.
- use a command line to close the PDF.
Is this possible? If so, then does anyone know how to do this?
Answered by AutoDoc
Don't use CMD; use AutoIt. It is very easy to do and takes only a few lines:
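Opt("WinTitleMatchMode", 2)     ; match "Adobe" anywhere in the window title
ShellExecute("file.pdf")        ; open the PDF with its associated viewer
WinWaitActive("Adobe")          ; wait until the viewer window is active
; Send(...)                     ; whatever keystrokes are necessary to save as text
Send("{ENTER}")                 ; confirm the save dialog
Send("!{F4}")                   ; Alt+F4 closes the viewer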
Answered by Gareth Davidson
I don't understand why you wouldn't want to use free software (not freeware); pdftotext is the ideal solution. However, if you really want to open and save the PDF in an automated fashion through the Windows GUI, you could use VBScript and the SendKeys command.
Just use pdftotext, though: it would be much more reliable and won't cost you a whole box.
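A minimal sketch of driving pdftotext from a script, assuming the `pdftotext` executable (shipped with Xpdf or Poppler) is installed and on the PATH. The helper names `build_command` and `pdf_to_text` are made up for this illustration and are not part of any library.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, txt_path=None, keep_layout=False):
    """Return the pdftotext argument list for one PDF (hypothetical helper)."""
    pdf_path = Path(pdf_path)
    # Default output name: same folder and base name, with a .txt extension.
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # ask pdftotext to keep the physical page layout
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def pdf_to_text(pdf_path, txt_path=None, keep_layout=False):
    """Run pdftotext on one file; raises if the tool is missing or fails."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext was not found on PATH")
    cmd = build_command(pdf_path, txt_path, keep_layout)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

Calling `pdf_to_text("report.pdf")` would then write `report.txt` next to the PDF without opening any window, so it does not depend on keystrokes reaching a GUI.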
Answered by Acie
I think the VBScript below should do the trick. It takes all .pdf files in a given folder and saves them as .txt files. One major bummer is that it only works if your machine is not locked, since it uses the SendKeys command. If anyone has a solution that works while a computer is locked, please send it my way!
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set WshShell = WScript.CreateObject("WScript.Shell")
objStartFolder = "PATH_OF_ALL_PDFS_YOU_WANT_TO_CONVERT_HERE"
Set objFolder = objFSO.GetFolder(objStartFolder)
Set colFiles = objFolder.Files
For Each objFile In colFiles
    'GetExtensionName/GetBaseName also cope with short names and upper-case .PDF
    extension = LCase(objFSO.GetExtensionName(objFile.Name))
    file = objFSO.GetBaseName(objFile.Name)
    fullname = objFSO.BuildPath(objStartFolder, objFile.Name)
    fullname_txt = objFSO.BuildPath(objStartFolder, file & ".txt")
    If extension = "pdf" And Not objFSO.FileExists(fullname_txt) Then
        WScript.Echo fullname
        'open the PDF in its default viewer
        WshShell.Run """" & fullname & """"
        WScript.Sleep 1000
        'Alt, f, h, x: menu keystrokes used to reach 'Save as Text' (they may differ between viewer versions)
        WshShell.SendKeys "%"
        WScript.Sleep 100
        WshShell.SendKeys "f"
        WScript.Sleep 100
        WshShell.SendKeys "h"
        WScript.Sleep 100
        WshShell.SendKeys "x"
        WScript.Sleep 300
        WshShell.SendKeys "{ENTER}"
        'this little step prevents the loop from moving on to the next .pdf before the conversion to .txt is complete
        i = 0        'reset for every file, otherwise only the first file is waited for
        count = 0
        On Error Resume Next
        Do While i = 0 And count < 100
            Err.Clear
            '8 = ForAppending; this fails until the .txt file exists and can be opened
            Set MyFile = objFSO.OpenTextFile(fullname_txt, 8)
            If Err.Number = 0 Then
                MyFile.Close
                i = 1
            End If
            count = count + 1
            WScript.Sleep 20000
        Loop
        On Error GoTo 0
    End If
Next
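The wait loop in the script above amounts to polling until the output file can be opened, with an upper bound on the number of attempts. Here is a minimal sketch of that idea, written in Python rather than VBScript; the function name `wait_for_file` and its default timings are assumptions chosen for the illustration, not values taken from the answer.

```python
import time
from pathlib import Path


def wait_for_file(path, attempts=100, delay=0.2):
    """Poll until `path` exists and can be opened for appending.

    Returns True once the file opens, False after `attempts` tries.
    """
    path = Path(path)
    for _ in range(attempts):
        if path.exists():
            try:
                # Append mode leaves existing content untouched; the open
                # call raises OSError if the file cannot be opened yet.
                with open(path, "a"):
                    return True
            except OSError:
                pass
        time.sleep(delay)
    return False
```

Bounding the number of attempts keeps a failed conversion from stalling the whole batch, which is the role of `count < 100` in the VBScript.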
Answered by luochen1990
Maybe you can try this: https://github.com/luochen1990/nodejs-easy-pdf-parser
It is an npm package, and you need to install nodejs (and npm) to use it.
It can be used as a command line tool:
npm install -g easy-pdf-parser
pdf2text test.pdf > test.txt
This tool sorts text lines by their y coordinates, so it works well in most cases. It also handles Unicode and works across platforms.
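To illustrate what sorting text lines by their y coordinates means, here is a small sketch. It is not the package's actual code: the `(x, y, text)` tuple layout, the `tolerance` parameter, and the assumption that y grows down the page are all made up for the example.

```python
def lines_by_y(items, tolerance=2.0):
    """Group (x, y, text) fragments into lines, ordered top to bottom.

    A fragment whose y is within `tolerance` of the current line's first
    fragment joins that line; fragments within a line are ordered by x.
    """
    lines = []  # list of (line_y, [(x, text), ...])
    for x, y, text in sorted(items, key=lambda item: item[1]):
        if lines and abs(y - lines[-1][0]) <= tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append((y, [(x, text)]))
    return [" ".join(t for _, t in sorted(frags)) for _, frags in lines]
```

Fragments that a PDF stores out of reading order come back line by line, which is why this kind of sorting tends to give readable output for simple page layouts.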