Note: this page is a translated copy of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/18865058/
Extract Values between two strings in a text file using python
Asked by user2790219
Let's say I have a text file with the following content:
fdsjhgjhg
fdshkjhk
Start
Good Morning
Hello World
End
dashjkhjk
dsfjkhk
Now I need to write Python code that reads the text file and copies the contents between Start and End to another file.
I wrote the following code.
inFile = open("data.txt")
outFile = open("result.txt", "w")
buffer = []
keepCurrentSet = True
for line in inFile:
    buffer.append(line)
    if line.startswith("Start"):
        #---- starts a new data set
        if keepCurrentSet:
            outFile.write("".join(buffer))
        #now reset our state
        keepCurrentSet = False
        buffer = []
    elif line.startswith("End"):
        keepCurrentSet = True
inFile.close()
outFile.close()
I'm not getting the desired output; I'm just getting Start. What I want is all the lines between Start and End, excluding Start and End themselves.
Accepted answer by inspectorG4dget
In case your text file has multiple "Start" and "End" markers, this copies all the data between them into one output file, excluding the "Start" and "End" lines themselves.
with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
            continue
        elif line.strip() == "End":
            copy = False
            continue
        elif copy:
            outfile.write(line)
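To see how the flag approach behaves with more than one block, here is a minimal sketch that applies the same logic to an in-memory list of lines instead of files. The function name extract_between and the sample data are made up for illustration; they are not part of the answer.

```python
def extract_between(lines, start="Start", end="End"):
    """Collect the lines that sit between start and end marker lines."""
    kept = []
    copying = False
    for line in lines:
        marker = line.strip()
        if marker == start:
            copying = True       # begin copying after this line
        elif marker == end:
            copying = False      # stop copying; the marker itself is skipped
        elif copying:
            kept.append(line)
    return kept


# Hypothetical input with two Start/End blocks
sample = [
    "junk\n", "Start\n", "Good Morning\n", "End\n",
    "more junk\n", "Start\n", "Hello World\n", "End\n",
]
result = extract_between(sample)
print("".join(result), end="")
```

Because the markers are compared after strip(), a line such as "Started" is not treated as a marker, which is a difference from the startswith checks used elsewhere on this page.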
Answer by Rafi Kamal
I'm not a Python expert, but this code should do the job.
inFile = open("data.txt")
outFile = open("result.txt", "w")
keepCurrentSet = False
for line in inFile:
    if line.startswith("End"):
        keepCurrentSet = False
    if keepCurrentSet:
        outFile.write(line)
    if line.startswith("Start"):
        keepCurrentSet = True
inFile.close()
outFile.close()
Answer by TerryA
If the text file isn't very large, you can read its whole content and then use a regular expression:
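import re

with open('data.txt') as myfile:
    content = myfile.read()

# The capture group holds only the text between the markers,
# so "Start" and "End" themselves are left out of the result
text = re.search(r'Start\n(.*?)End', content, re.DOTALL).group(1)
with open("result.txt", "w") as myfile2:
    myfile2.write(text)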
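If the file may contain several blocks, one possible variation is re.findall with the markers anchored to whole lines. This is only a sketch: the pattern, the names BLOCK and extract_blocks, and the sample string are assumptions for illustration, not something from the answer.

```python
import re

# ^ and $ match at line boundaries because of re.MULTILINE;
# re.DOTALL lets "." span newlines so a block can cover several lines.
BLOCK = re.compile(r"^Start[ \t]*\n(.*?)^End[ \t]*$", re.DOTALL | re.MULTILINE)


def extract_blocks(content):
    """Return the text of every Start...End block, markers excluded."""
    return BLOCK.findall(content)


# Hypothetical file content with two blocks and a misleading "Startling" line
content = (
    "junk\nStart\nGood Morning\nHello World\nEnd\n"
    "Startling\nStart\nBye\nEnd\n"
)
blocks = extract_blocks(content)
print(blocks)
```

Anchoring the markers means a line like "Startling" or "Ending" is not mistaken for a marker. The whole file is still read into memory, so the size caveat above applies here too.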
Answer by pts
Move the outFile.write call into the 2nd if:
inFile = open("data.txt")
outFile = open("result.txt", "w")
buffer = []
for line in inFile:
    if line.startswith("Start"):
        buffer = ['']
    elif line.startswith("End"):
        outFile.write("".join(buffer))
        buffer = []
    elif buffer:
        buffer.append(line)
inFile.close()
outFile.close()
Answer by falsetru
Using itertools.dropwhile, itertools.takewhile, itertools.islice:
import itertools
with open('data.txt') as f, open('result.txt', 'w') as fout:
    it = itertools.dropwhile(lambda line: line.strip() != 'Start', f)
    it = itertools.islice(it, 1, None)
    it = itertools.takewhile(lambda line: line.strip() != 'End', it)
    fout.writelines(it)
UPDATE: As inspectorG4dget commented, the above code copies only the first block. To copy multiple blocks, use the following:
import itertools
with open('data.txt', 'r') as f, open('result.txt', 'w') as fout:
    while True:
        it = itertools.dropwhile(lambda line: line.strip() != 'Start', f)
        if next(it, None) is None: break
        fout.writelines(itertools.takewhile(lambda line: line.strip() != 'End', it))
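Answer by Gaurav
import re

inFile = open("data.txt")
outFile = open("result.txt", "w")
buffer1 = ""
# Collect the whole file into one string
for line in inFile:
    buffer1 = buffer1 + line
# Capture the text after each "Start" line up to the next "End";
# re.DOTALL lets "." match newlines so a block can span several lines
blocks = re.findall(r"(?<=Start\n)(.*?)(?=End)", buffer1, re.DOTALL)
outFile.write("".join(blocks))
inFile.close()
outFile.close()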
Answer by user2787688
I would handle it like this:
inFile = open("data.txt")
outFile = open("result.txt", "w")
data = inFile.readlines()
outFile.write("".join(data[data.index('Start\n')+1:data.index('End\n')]))
inFile.close()
outFile.close()
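One thing to watch with the index approach: list.index raises ValueError when a marker is missing, and 'End\n' will not match a final "End" line that has no trailing newline. A minimal sketch of a more forgiving variant follows; the helper name slice_between and the sample lists are hypothetical, chosen only for illustration.

```python
def slice_between(lines, start="Start", end="End"):
    """Return the lines between the first start marker and the next end marker."""
    # Compare stripped copies so a missing trailing newline does not matter
    stripped = [line.strip() for line in lines]
    try:
        first = stripped.index(start) + 1
        last = stripped.index(end, first)   # search only after the start marker
    except ValueError:
        return []                           # a marker is missing: nothing to copy
    return lines[first:last]


# Hypothetical input whose last line has no trailing newline
lines = ["fdsjhgjhg\n", "Start\n", "Good Morning\n", "Hello World\n", "End"]
selected = slice_between(lines)
print("".join(selected), end="")
```

Like the original one-liner, this handles only the first Start/End pair.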