Notice: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/1196148/

Time: 2020-09-09 06:37:06

Using Command Line Switches to Save a PDF as Text - Can it be done?

Tags: windows, pdf, automation

Asked by

I need to use command line switches to execute the 'Save as Text' command. Ideally, I want to:

  1. use a command line switch to open a PDF
  2. use a command line switch to convert the PDF to a text file by mimicking the 'Save as Text' command.
  3. use a command line to close the PDF.

Is this possible? If so, then does anyone know how to do this?

Answer by AutoDoc

Don't use CMD; use AutoIt. It is very easy to do and takes only a few lines:

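; Run() expects an executable; ShellExecute opens the PDF in its associated viewer
ShellExecute("file.pdf")
WinWait("Adobe")
; Send(...) ; placeholder from the answer: whatever keystrokes are necessary to save as text
Send("{ENTER}")
; Alt+F4 closes the viewer window
Send("!{F4}")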

Answer by Gareth Davidson

I don't understand why you wouldn't want to use free software (as opposed to freeware); pdftotext is the ideal solution. However, if you really want to open and save the PDF in an automated fashion using the Windows GUI, you could use VBScript and the SendKeys command.

Just use pdftotext, though; it would be much more reliable and won't cost you a whole box.
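
For illustration, here is a minimal Python sketch of calling pdftotext on a single file. It assumes the pdftotext executable (from Xpdf or Poppler) is installed and on PATH; the function names `build_pdftotext_command` and `convert_pdf` are hypothetical helpers made up for this example, not part of any library.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path=None, keep_layout=False):
    """Return the argument list for a pdftotext call (nothing is executed)."""
    pdf = Path(pdf_path)
    # Default output: same name as the PDF, with a .txt extension
    txt = Path(txt_path) if txt_path is not None else pdf.with_suffix(".txt")
    command = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical layout of the page
        command.append("-layout")
    command += [str(pdf), str(txt)]
    return command


def convert_pdf(pdf_path, txt_path=None, keep_layout=False):
    """Run pdftotext; raises if it is missing or exits with an error."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext was not found on PATH")
    subprocess.run(
        build_pdftotext_command(pdf_path, txt_path, keep_layout),
        check=True,
    )
```

Calling `convert_pdf("file.pdf")` would then write file.txt next to the PDF without opening any window.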

Answer by Acie

I think the VBScript below should do the trick. It takes all .pdf files in a given folder and saves them as .txt files. One major bummer is that it only works if your machine is not locked, since it uses the SendKeys command. If anyone has a solution that works while a computer is locked, please send it my way!

' Convert every .pdf in a folder to .txt by driving the viewer's menu with SendKeys
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set WshShell = WScript.CreateObject("WScript.Shell")
objStartFolder = "PATH_OF_ALL_PDFS_YOU_WANT_TO_CONVERT_HERE"
Set objFolder = objFSO.GetFolder(objStartFolder)

Set colFiles = objFolder.Files
For Each objFile In colFiles
  ' GetExtensionName/GetBaseName also cope with short names and upper-case extensions
  extension = LCase(objFSO.GetExtensionName(objFile.Name))
  file = objFSO.GetBaseName(objFile.Name)
  fullname = objFSO.BuildPath(objStartFolder, objFile.Name)
  fullname_txt = objFSO.BuildPath(objStartFolder, file & ".txt")

  ' Skip non-PDF files and PDFs that already have a .txt next to them
  If extension = "pdf" And Not objFSO.FileExists(fullname_txt) Then
    WScript.Echo fullname
    ' Open the PDF in its associated viewer
    WshShell.Run """" & fullname & """"
    WScript.Sleep 1000
    ' Alt, F, H, X: the menu keystrokes this answer uses for 'Save as Text'
    WshShell.SendKeys "%"
    WScript.Sleep 100
    WshShell.SendKeys "f"
    WScript.Sleep 100
    WshShell.SendKeys "h"
    WScript.Sleep 100
    WshShell.SendKeys "x"
    WScript.Sleep 300
    WshShell.SendKeys "{ENTER}"

    ' This step prevents the loop from moving on to the next .pdf before the conversion to .txt is complete
    i = 0 ' reset for every file; otherwise only the first file is waited for
    count = 0
    Do While i = 0 And count < 100
      On Error Resume Next
      Set MyFile = objFSO.OpenTextFile(fullname_txt, 8)
      If Err.Number = 0 Then
        i = 1
        MyFile.Close
      End If
      On Error GoTo 0
      count = count + 1
      WScript.Sleep 20000
    Loop
  End If
Next
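
As a possible answer to the locked-machine problem, the same folder loop can be sketched in Python without any keystrokes. This assumes pdftotext (the tool recommended in another answer) is installed and on PATH; `pending_conversions` and `convert_folder` are hypothetical names chosen for this example. Since no GUI is driven, it should not depend on the desktop being unlocked, though that has not been verified here.

```python
import subprocess
from pathlib import Path


def pending_conversions(folder):
    """Return (pdf, txt) path pairs for PDFs that have no .txt next to them yet."""
    pairs = []
    for pdf in sorted(Path(folder).iterdir()):
        if pdf.is_file() and pdf.suffix.lower() == ".pdf":
            txt = pdf.with_suffix(".txt")
            # Mirror the script above: skip files that were already converted
            if not txt.exists():
                pairs.append((pdf, txt))
    return pairs


def convert_folder(folder):
    """Convert every pending PDF in the folder with pdftotext."""
    for pdf, txt in pending_conversions(folder):
        print(pdf)
        # subprocess.run blocks until pdftotext exits, so no polling loop is needed
        subprocess.run(["pdftotext", str(pdf), str(txt)], check=True)
```

Because each call blocks until the conversion finishes, the wait loop and the Sleep calls from the SendKeys version are not needed.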

Answer by luochen1990

Maybe you can try this: https://github.com/luochen1990/nodejs-easy-pdf-parser

It is an npm package, so you need to install Node.js (and npm) to use it.

It can be used as a command line tool:

npm install -g easy-pdf-parser
pdf2text test.pdf > test.txt

This tool sorts text lines by their y coordinates, so it works well in most cases. It also handles Unicode well and is cross-platform.
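
To illustrate the idea of ordering text by y coordinate, here is a small Python sketch. It is not the package's implementation; the `(x, y, text)` fragment shape, the `fragments_to_lines` name, and the `y_tolerance` parameter are assumptions made for this example, and y is assumed to be measured from the top of the page.

```python
def fragments_to_lines(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into lines by y, then order each line by x."""
    lines = []  # each entry: [line_y, [(x, text), ...]]
    for x, y, text in sorted(fragments, key=lambda f: (f[1], f[0])):
        # Fragments whose y is close to the current line's y join that line
        if lines and abs(y - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # Within a line, read fragments left to right
    return [" ".join(t for _, t in sorted(parts)) for _, parts in lines]


fragments = [
    (50, 10, "world"),
    (10, 30, "second"),
    (10, 10.5, "hello"),
    (60, 30, "line"),
]
print(fragments_to_lines(fragments))  # ['hello world', 'second line']
```

The tolerance matters because fragments on the same visual line often have slightly different y values.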
