Original URL: http://stackoverflow.com/questions/43986126/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
How to search for a 'text' or 'number' in a CSV file with Python and, if it exists, print only the first and second column values to a new CSV file
Question by rcubefather
I want to do the following using Python.
Step 1: Read the third column of a CSV file using Python.
Step 2: Create a list of the values obtained in Step 1.
Step 3: Take the value at index 0 of that list, search for it in a CSV file, and if it is present, write only the values of columns 1 and 2 to a new CSV file (there are 6 columns). If it is not present, just ignore it and go on to the next search.
file1.csv:
Country,Location,number,letter,name,pup-name,null
a,ab,1,qw,abcd,test1,3
b,cd,1,df,efgh,test2,4
c,ef,2,er,fgh,test3,5
d,gh,3,sd,sds,test4,
e,ij,5,we,sdrt,test5,
f,kl,6,sc,asdf,test6,
g,mn,7,df,xcxc,test7,
h,op,8,gb,eretet,test8,
i,qr,8,df,hjjh,test9,
Python script I wrote for this:
import csv
import time
from collections import defaultdict

columns = defaultdict(list)  # column index -> list of values in that column

with open('file1.csv') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        for (i, v) in enumerate(row):
            columns[i].append(v)

#print(columns[2])
b = columns[2]  # values of the third ("number") column
for x in b[:]:
    time.sleep(1)
    print(x)
Output of the above script:
MacBook-Pro:test_usr$ python csv_file.py
1
1
2
3
5
6
7
8
8
MacBook-Pro:test_usr$
I am able to do Steps 1 and 2.
Please guide me on Step 3: how do I search for a text/string in a CSV file and, if it is present, extract only specific column values to a new CSV file?
The output file should look like this:
a,ab
b,cd
c,ef
d,gh
e,ij
f,kl
g,mn
h,op
i,qr
Note: the search strings will come from another CSV file. Please don't suggest simply printing the values of columns 1 and 2 directly.
My final code looks like this:
import csv
import time
from collections import defaultdict

columns = defaultdict(list)

with open('file1.csv') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        for (i, v) in enumerate(row):
            columns[i].append(v)

b = columns[2]  # values of the "number" column

# For each value from file1.csv, re-read file2.csv from the top
for x in b[:]:
    # 'a' (append) mode: output accumulates across iterations and across runs
    with open('file2.csv') as f, open('file3.csv', 'a') as g:
        reader = csv.reader(f)
        #next(reader, None)  # discard the header
        writer = csv.writer(g)
        for row in reader:
            if row[2] == x:
                writer.writerow(row[:2])
file1.csv:
Country,Location,number,letter,name,pup-name,null
a,ab,1,qw,abcd,test1,3
b,cd,1,df,efgh,test2,4
c,ef,2,er,fgh,test3,5
d,gh,3,sd,sds,test4,
e,ij,5,we,sdrt,test5,
f,kl,6,sc,asdf,test6,
g,mn,7,df,xcxc,test7,
h,op,8,gb,eretet,test8,
i,qr,8,df,hjjh,test9,
file2.csv:
count,name,number,Type,status,Config Version,,IP1,port
1,bob,1,TRAFFIC,end,1.2,,1.1.1.1,1
2,john,1,TRAFFIC,end,2.1,,1.1.1.2,2
4,foo,2,TRAFFIC,end,1.1,,1.1.1.3,3
5.333333333,test,3,TRAFFIC,end,3.1,,1.1.1.4,4
6.833333333,raa,5,TRAFFIC,end,5.1,,1.1.1.5,5
8.333333333,kaa,6,TRAFFIC,end,7.1,,1.1.1.6,6
9.833333333,thaa,7,TRAFFIC,end,9.1,,1.1.1.7,7
11.33333333,paa,8,TRAFFIC,end,11.1,,1.1.1.8,8
12.83333333,maa,8,TRAFFIC,end,13.1,,1.1.1.9,9
If I run the above script, the contents of file3.csv are:
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
.
.
.
It goes on like this in a loop.
But the output should look like this:
count,name
1,bob,
2,john,
4,foo,
5.333333333,test,
6.833333333,raa,
8.333333333,kaa,
9.833333333,thaa,
11.33333333,paa,
12.83333333,maa,
Accepted answer by juanpa.arrivillaga
I think you should reconsider your approach. You can achieve your goal simply by iterating over the CSV file, without creating intermediate dicts and lists, and since you want to work with specific columns, you'll make your life easier and your code more readable by using DictReader and DictWriter.
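import csv

search_string = "whatever"

# Python 3: open in text mode with newline='', as the csv docs recommend
with open('file1.csv', newline='') as f, open('file2.csv', 'w', newline='') as g:
    reader = csv.DictReader(f)
    c1, c2, c3, *_ = reader.fieldnames  # names of the first three columns
    writer = csv.DictWriter(g, fieldnames=(c1, c2))
    writer.writeheader()  # writes the header row, e.g. "Country,Location"
    for row in reader:
        if row[c3] == search_string:
            writer.writerow({c1: row[c1], c2: row[c2]})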
Keep in mind that the csv module always returns strings. You have to handle data-type conversions yourself if you need them (I've left that out of the code above).
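To illustrate the point about strings, here is a minimal sketch (not part of the original answer) that converts columns explicitly after reading. The sample data is a hypothetical three-column excerpt of the question's file2.csv, and the helper name typed_rows is made up for illustration. It also shows why a comparison like row[c3] == search_string needs a string such as "1", not the integer 1.

```python
import csv
import io

# Hypothetical sample: three columns from the question's file2.csv.
SAMPLE = """count,name,number
1,bob,1
5.333333333,test,3
11.33333333,paa,8
"""


def typed_rows(text):
    """Parse CSV text and convert the numeric columns from str."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "count": float(row["count"]),   # "5.333333333" -> 5.333333333
            "name": row["name"],            # stays a str
            "number": int(row["number"]),   # "3" -> 3
        })
    return rows


raw = next(csv.DictReader(io.StringIO(SAMPLE)))
print(type(raw["number"]).__name__)          # str: csv never converts for you
print(raw["number"] == 1)                    # False: "1" is not the integer 1
print(typed_rows(SAMPLE)[0]["number"] == 1)  # True after int()
```

Which columns to convert, and to which type, depends on your data; here count holds floats and number holds integers.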
If you don't want to use DictReader/DictWriter, which I suppose is a little more verbose, and don't want a header in your output file:
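import csv

search_string = "whatever"

with open('file1.csv', newline='') as f, open('file2.csv', 'w', newline='') as g:
    reader = csv.reader(f)
    next(reader, None)  # discard the header
    writer = csv.writer(g)
    for row in reader:
        if row[2] == search_string:   # third column ("number")
            writer.writerow(row[:2])  # keep only the first two columns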
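Neither version above reads the search values from another file, which the question asks for. One way to combine the two steps, sketched here under the assumption that the values to look for are in the third column of file1.csv and are matched against the third column of file2.csv (as in the question's final code), is to load the search values into a set once and then make a single pass over the data file. Because the data file is read only once and the output is opened in 'w' mode, each matching row is written once, rather than once per duplicate search value and again on every run, as can happen with the nested loop and 'a' mode in the question. The function name extract_matching and its parameters are hypothetical.

```python
import csv


def extract_matching(search_path, data_path, out_path, key_col=2, keep=2):
    """Write the first `keep` columns of every row in data_path whose
    key_col value appears in the key_col column of search_path."""
    # Collect the search values once; duplicates (e.g. 1, 1, 8, 8) collapse.
    with open(search_path, newline='') as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        targets = {row[key_col] for row in reader if len(row) > key_col}

    # Single pass over the data file, so each row is written at most once.
    with open(data_path, newline='') as f, \
            open(out_path, 'w', newline='') as g:
        reader = csv.reader(f)
        writer = csv.writer(g)
        header = next(reader, None)
        if header is not None:
            writer.writerow(header[:keep])  # e.g. "count,name"
        for row in reader:
            # len() guard skips blank or short lines instead of raising
            if len(row) > key_col and row[key_col] in targets:
                writer.writerow(row[:keep])


if __name__ == '__main__':
    extract_matching('file1.csv', 'file2.csv', 'file3.csv')
```

For the question's sample files, every number in file1.csv also occurs in file2.csv, so file3.csv ends up with the count,name header followed by all nine data rows, each exactly once (without the trailing commas shown in the question's expected output).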
Answer by Jon Kiparsky
That is how to search for text/string in csv file and if present how to extract only specific column values to a new csv file?
This is two questions.
First question: to search for text in a file, the simplest answer would be to read the file text into memory and look for the text. If you want to look for the text in a specific column of the csv you're reading in, you can use a DictReader to make life easy:
for row in reader:  # reader = csv.DictReader(f)
    if search_target in row[header]:
        # found it!
        print(row)
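Note that in tests for a substring, so a target of "1" would also match "11"; use == when you need an exact match. A small sketch contrasting the two, using a hypothetical in-memory excerpt of file1.csv and a made-up helper find_rows (not part of the original answer):

```python
import csv
import io

# Hypothetical in-memory excerpt of the question's file1.csv.
SAMPLE = """Country,Location,number,letter,name,pup-name,null
d,gh,3,sd,sds,test4,
e,ij,5,we,sdrt,test5,
f,kl,6,sc,asdf,test6,
g,mn,7,df,xcxc,test7,
"""


def find_rows(text, header, search_target, exact=False):
    """Return rows whose `header` column contains (or, if exact, equals)
    search_target."""
    matches = []
    for row in csv.DictReader(io.StringIO(text)):
        value = row[header]
        hit = (value == search_target) if exact else (search_target in value)
        if hit:
            matches.append(row)
    return matches


print([r["Country"] for r in find_rows(SAMPLE, "name", "sd")])
# ['d', 'e', 'f']  ("sds", "sdrt" and "asdf" all contain "sd")
print([r["Country"] for r in find_rows(SAMPLE, "name", "sds", exact=True)])
# ['d']
```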
Second question: One way to write specific columns to a new csv would be as follows:
keys = ["Country", "Location"]
new_rows = [{key: row[key] for key in keys} for row in reader]
writer = csv.DictWriter(somefile, keys)
writer.writerows(new_rows)
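If you would rather pass whole rows to the writer than build a new dict for each one, csv.DictWriter accepts extrasaction='ignore', which silently drops keys that are not listed in fieldnames. A minimal sketch under that approach, with a hypothetical helper select_columns, an optional filter on one column, and in-memory sample data taken from file1.csv. It also calls writeheader(), which the snippet above leaves out, so the output starts with a Country,Location line.

```python
import csv
import io

# Hypothetical in-memory sample taken from the question's file1.csv.
SAMPLE = """Country,Location,number,letter,name,pup-name,null
a,ab,1,qw,abcd,test1,3
b,cd,1,df,efgh,test2,4
c,ef,2,er,fgh,test3,5
"""


def select_columns(text, keys, column=None, value=None):
    """Return CSV text containing only `keys`, optionally keeping just the
    rows where row[column] == value."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    # extrasaction='ignore' drops keys not in fieldnames, so a whole row
    # dict can go straight to writerow(). lineterminator='\n' replaces the
    # default '\r\n' to keep the returned text easy to read.
    writer = csv.DictWriter(out, fieldnames=keys, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if column is None or row[column] == value:
            writer.writerow(row)
    return out.getvalue()


print(select_columns(SAMPLE, ["Country", "Location"], column="number", value="1"))
```

The default extrasaction='raise' would instead raise ValueError as soon as a row contains a key outside fieldnames, which is why the answer's snippet builds a reduced dict first.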