vba 将多个文本文件导入并解析为单个 XLS?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/16925563/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Importing and parsing multiple text files into single XLS?
提问by Ikarian
I have been googling at this all day, but as I have zero VBA skills, I can't find anything that works properly and that I can adapt to my needs.
我整天都在谷歌上搜索这个,但是由于我的 VBA 技能为零,我找不到任何可以正常工作并且可以适应我的需求的东西。
I have a folder with about 4500 text files with the outputs of a hardware test run on computers we process. The text file has a lot of info I don't need. What I do need is to import all these files into a single spreadsheet and then parse them down to 3 desired fields each - and in such a way that is easily readable.
我有一个包含大约 4500 个文本文件的文件夹,其中包含在我们处理的计算机上运行的硬件测试的输出。文本文件有很多我不需要的信息。我需要做的是将所有这些文件导入到一个电子表格中,然后将它们分别解析为 3 个所需的字段 - 并且以易于阅读的方式进行。
I have had a hard time even finding a VBA macro that import the text files without an error I can't get past. I did find the following one, however it is set up to look for a second column in the file, which in my case screws up the inputted data and throws around values all over the place (think minor formatting issue times 4500 records).
我什至很难找到一个 VBA 宏来导入文本文件而没有我无法通过的错误。我确实找到了以下一个,但是它被设置为在文件中查找第二列,在我的情况下,它会弄乱输入的数据并在所有地方抛出值(想想次要格式问题乘以 4500 条记录)。
Here is the macro:
这是宏:
Sub test()
Dim myDir As String, fn As String, ff As Integer, txt As String
Dim delim As String, n As Long, b(), flg As Boolean, x
myDir = "c:\test" '<- change to actual folder path
delim = vbTab '<- delimiter (assuming Tab delimited)
Redim b(1 To Rows.Count, 1 To 2)
fn = Dir(myDir & "\*.txt")
Do While fn <> ""
ff = FreeFile
Open myDir & "\" & fn For Input As #ff
Do While Not EOF(ff)
Line Input #ff, txt
x = Split(txt, delim)
n = n + 1
If Not flg Then b(n,2) = fn
If UBound(x) > 0 Then
b(n,1) = x(1)
End If
flg = True
Loop
Close #ff
flg = False
fn = Dir()
Loop
ThisWorkbooks.Sheets(1).Range("a1").Resize(n,2).Value = b
End Sub
And here is a sample of one of the text files:
这是其中一个文本文件的示例:
ILPN Number: I01128204
MAC Address: E0DB55820F85
Hardware:
CPU:Intel(R) Core(TM) i3-2370M CPU @ 2.40GHz
MOTHERBOARD:Dell Inc. - 0G8TPV - A02
BIOS VERSION/DATE:A02 - 08/15/2012
RAM DETECTED:4096 MB (BANK 0: 0/DDR3/1333 - BANK 2: 0/Unknown/0)
MANUFACTURER:Dell Inc.
PRODUCT:Inspiron 3520
SERIAL:B1JW9V1
NIC SPEED/NAME:100 Mbps - Realtek PCIe FE Family Controller, V:8.1
GRAPHIC RES:1366 x 768 (32 bits)
OPTIC DRIVE:HL-DT-ST, DVD+-RW GT80N , A103 (E:)
FIXED DISK:ST500LM012 HN-M500MBB - 465.76 GB (C: - GPT)
- 0.00 GB (D: - MBR)
WINPE:Microsoft Windows 8 Ultimate Edition, 32-bit (build 9200)
Module 126: Result: Ok
Initializing module version='1.0.0.17' with ''
Module Initialization done.
Starting module with: Param1=0x00010010 - Param2=0xFFFF0000
Module start: 14/05/2013 10:38:40
Set language module to: en-US
Found disk: 0 - 'ST500LM012 HN-M500MBB'
All PHYSICALDRIVE will be used as valid target...
Module started properly.
Cleaning element 'Disk ID: 0 - Model: ST500LM012 HN-M500MBB - Size: 465.762 GB' with algorithm 'Basic (random)'...
Clean process Successful
Process took 6099 sec to clean 476940.02MB ~ 78.20 MB/Sec
Closing module with code '1'...
Closing module done with result '0'
Releasing module...
Releasing module done at: 14/05/2013 12:20:19
So, from all that, I need to pull out the "ILPN number", The MAC address, and the line that says "Module 126: Result OK". Everything else can go away. It would be nice if I could get it sorted with columns for the three values, and then each record in its own row.
因此,从所有这些中,我需要提取“ILPN 号码”、MAC 地址和显示“模块 126:结果正常”的行。其他一切都可以消失。如果我能用三个值的列对它进行排序,然后每个记录在它自己的行中,那就太好了。
This is probably a two step process. Either or both solutions would be tremendously helpful. Thank you!
这大概是一个两步过程。任何一个或两个解决方案都会非常有帮助。谢谢!
回答by Ansgar Wiechers
A VBScript for reading the three lines from multiple input files into an Excel sheet could look like this:
用于将多个输入文件中的三行读取到 Excel 工作表中的 VBScript 可能如下所示:
Set xl = CreateObject("Excel.Application")
xl.Visible = True
Set wb = xl.Workbooks.Add
Set ws = wb.Sheets(1)
row = 1
ws.Cells(row, 1).Value = "ILPN"
ws.Cells(row, 2).Value = "MAC Address"
ws.Cells(row, 3).Value = "Module 126"
Set fso = CreateObject("Scripting.FileSystemObject")
For Each f In fso.GetFolder("C:\your\folder").Files
If LCase(fso.GetExtensionName(f.Name)) = "txt" Then
row = row + 1
Set stream = f.OpenAsTextStream
ws.Cells(row, 1).Value = Trim(Split(stream.ReadLine, ":")(1))
ws.Cells(row, 2).Value = Trim(Split(stream.ReadLine, ":")(1))
Do Until stream.AtEndOfStream
line = stream.ReadLine
If Left(line, 10) = "Module 126" Then
ws.Cells(row, 3).Value = Trim(Split(line, ":")(2))
Exit Do
End If
Loop
stream.Close
End If
Next
wb.SaveAs "C:\some\folder\output.xls", -4143, , , , False
wb.Close
xl.Quit
As a VBA macro the following should work:
作为 VBA 宏,以下应该起作用:
Sub LoadDataFromFiles
row = 1
ActiveSheet.Cells(row, 1).Value = "ILPN"
ActiveSheet.Cells(row, 2).Value = "MAC Address"
ActiveSheet.Cells(row, 3).Value = "Module 126"
Set fso = CreateObject("Scripting.FileSystemObject")
For Each f In fso.GetFolder("C:\your\folder").Files
If LCase(fso.GetExtensionName(f.Name)) = "txt" Then
row = row + 1
Set stream = f.OpenAsTextStream
ActiveSheet.Cells(row, 1).Value = Trim(Split(stream.ReadLine, ":")(1))
ActiveSheet.Cells(row, 2).Value = Trim(Split(stream.ReadLine, ":")(1))
Do Until stream.AtEndOfStream
line = stream.ReadLine
If Left(line, 10) = "Module 126" Then
ActiveSheet.Cells(row, 3).Value = Trim(Split(line, ":")(2))
Exit Do
End If
Loop
stream.Close
End If
Next
ActiveWorkbook.Save
End Sub
回答by Ekkehard.Horner
Restating the task/problem:
重述任务/问题:
Given a lot of structured text files in a folder, extract from each file three (or a multiple of three?) data items (ILPN Number, MAC Address, Test Result) and put them as rows in a .txt/.csv file that can be red by/imported into Excel.
给定文件夹中的大量结构化文本文件,从每个文件中提取三个(或三个的倍数?)数据项(ILPN 编号、MAC 地址、测试结果)并将它们作为行放在 .txt/.csv 文件中可以被红色/导入到 Excel 中。
Idea:
主意:
Loop over the files, read each into memory, parse/extract the data triple(s) using a RegExp, write them into the output file; use Excel to open the output file, do further manipulation manually.
循环文件,将每个文件读入内存,使用 RegExp 解析/提取数据三元组,将它们写入输出文件;使用 Excel 打开输出文件,手动进行进一步操作。
Plan:
计划:
Use a VBScript from the command line to keep things simple.
从命令行使用 VBScript 使事情变得简单。
[If that look good to you and you can answer the question "Does each input file contain one or more info triple(s)?", I'm willing to add some proof of concept code to this concept.]
[如果这对您来说看起来不错,并且您可以回答“每个输入文件是否包含一个或多个信息三元组?”的问题,我愿意为此概念添加一些概念验证代码。]
While you wait:
在您等待时:
Given your additional info, I believe that @Ansgar's assumptions about your input data are mostly correct, and I fixed two typos in the code. So give his script (+1) a try by starting a "Command Prompt", creating/changing to a suitable directory, copy the code into a file - say ansgar.vbs
-, adapt the folder/output file specs to your need, and run it via cscript ansgar.vbs
.
鉴于您的其他信息,我相信 @Ansgar 对您输入数据的假设大多是正确的,并且我修复了代码中的两个错别字。因此,通过启动“命令提示符”,创建/更改到合适的目录,将代码复制到文件中来尝试他的脚本(+1) - 比如说ansgar.vbs
-,根据您的需要调整文件夹/输出文件规范,然后运行它通过cscript ansgar.vbs
.
Small changes - like putting the whole "Module" line into the 3rd item
小的变化——比如把整个“模块”行放到第三个项目中
If Left(line, 10) = "Module 126" Then
ws.Cells(row, 3).Value = Trim(Split(line, ":")(2))
==>
If Left(line, 7) = "Module " Then
ws.Cells(row, 3).Value = line
are not to hard. If you are lucky, your problem is solved.
不难。如果你幸运的话,你的问题就解决了。
Update (wrt comment/subscript):
更新(写评论/下标):
I thinkthe
我想了
ws.Cells(row, 3).Value = Trim(Split(line, ":")(2))
line is to blame. The Split() on ":" should split input lines like
线是罪魁祸首。":" 上的 Split() 应该拆分输入行,如
Module 126: Result: Ok
into an array of three elements "Module 126", " Result", and " Ok" numbered/indexed/subscripted from 0 to 2. If Split() does not get two : separators in the line, the resulting array is smaller and the access to element #2 will fail.
成一个由三个元素“模块 126”、“结果”和“确定”组成的数组,从 0 到 2 编号/索引/下标。如果 Split() 没有得到两个:行中的分隔符,则结果数组更小,访问元素#2 将失败。
You'll have to assign the return value from Split() to a variable, check the size of the array (UBound), look at the faulty lines, and decide whether to ignore them or to change the guarding If condition.
您必须将 Split() 的返回值分配给一个变量,检查数组的大小 (UBound),查看错误行,并决定是忽略它们还是更改保护 If 条件。