Original question: http://stackoverflow.com/questions/32224363/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Python convert comma separated list to pandas dataframe
Asked by user636322
I am struggling to convert a comma-separated list into a multi-column (7) DataFrame.
print(type(mylist))
<type 'list'>
print(mylist)
['AN,2__AAS000,26,20150826113000,-283.000,20150826120000,-283.000', 'AN,2__AE000,26,20150826113000,0.000,20150826120000,0.000',.........
The following creates a DataFrame with a single column:
df = pd.DataFrame(mylist)
I have reviewed the built-in CSV functionality in pandas, but my CSV data is held in a list. How can I simply convert the list into a 7-column DataFrame?
Thanks in advance.
Accepted answer by Padraic Cunningham
You need to split each string in your list:
import pandas as pd

# Split each comma-separated string into a list of 7 fields, one list per row.
df = pd.DataFrame([sub.split(",") for sub in mylist])
print(df)
Output:
    0         1   2               3         4               5         6
0  AN  2__AS000  26  20150826113000  -283.000  20150826120000  -283.000
1  AN   2__A000  26  20150826113000     0.000  20150826120000     0.000
2  AN  2__AE000  26  20150826113000  -269.000  20150826120000  -269.000
3  AN  2__AE000  26  20150826113000  -255.000  20150826120000  -255.000
4  AN   2__AE00  26  20150826113000  -254.000  20150826120000  -254.000
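Since the question mentions the built-in CSV support, another option is to pass the list to read_csv through an in-memory buffer. The following is a minimal sketch, assuming every element of the list is a well-formed CSV row; the two-row mylist is abbreviated sample data taken from the output above. Unlike split(","), this lets pandas infer numeric dtypes instead of leaving every field as a string.

```python
import io

import pandas as pd

# Abbreviated sample rows in the same shape as the question's list.
mylist = [
    "AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000",
    "AN,2__A000,26,20150826113000,0.000,20150826120000,0.000",
]

# Join the rows into one CSV document and let read_csv parse it.
# header=None stops pandas from treating the first row as column names.
df = pd.read_csv(io.StringIO("\n".join(mylist)), header=None)

print(df.dtypes)  # numeric columns are inferred rather than kept as strings
```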
If you know how many lines to skip in your CSV file, you can do it all with read_csv using skiprows=lines_of_metadata:
import pandas as pd
df = pd.read_csv("in.csv",skiprows=3,header=None)
print(df)
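To try skiprows without a file on disk, here is a small sketch that reads from an in-memory buffer. The text contents are hypothetical: three metadata lines followed by two of the data rows shown earlier.

```python
import io

import pandas as pd

# Hypothetical file contents: three metadata lines, then the data rows.
text = """various
metadata
lines
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
"""

# skiprows=3 discards the first three lines before parsing starts.
df = pd.read_csv(io.StringIO(text), skiprows=3, header=None)
print(df.shape)
```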
Or, if each line of the metadata starts with a certain character, you can use the comment parameter:
df = pd.read_csv("in.csv",header=None,comment="#")
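The comment parameter accepts a single character, and the parser also ignores the rest of any line after that character, so it is only safe when the character never appears inside the data. A minimal sketch with hypothetical in-memory contents:

```python
import io

import pandas as pd

# Hypothetical file contents: metadata lines marked with "#", then data rows.
text = """# various
# metadata
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
"""

# Lines that consist only of a comment are skipped entirely.
df = pd.read_csv(io.StringIO(text), header=None, comment="#")
print(df.shape)
```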
If you need to match more than one character, you can combine csv.reader with itertools.dropwhile, which skips the leading lines that start with a given prefix (#!! in this example):
import csv
from itertools import dropwhile

import pandas as pd

with open("in.csv", newline="") as f:
    # Skip the leading lines that start with the "#!!" prefix.
    rows = dropwhile(lambda x: x.startswith("#!!"), f)
    r = csv.reader(rows)
    df = pd.DataFrame.from_records(r)
print(df)
Using your input data, with some added lines that start with #!!:
#!! various
#!! metadata
#!! lines
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
AN,2__AE000,26,20150826113000,-269.000,20150826120000,-269.000
AN,2__AE000,26,20150826113000,-255.000,20150826120000,-255.000
AN,2__AE00,26,20150826113000,-254.000,20150826120000,-254.000
Output:
    0         1   2               3         4               5         6
0  AN  2__AS000  26  20150826113000  -283.000  20150826120000  -283.000
1  AN   2__A000  26  20150826113000     0.000  20150826120000     0.000
2  AN  2__AE000  26  20150826113000  -269.000  20150826120000  -269.000
3  AN  2__AE000  26  20150826113000  -255.000  20150826120000  -255.000
4  AN   2__AE00  26  20150826113000  -254.000  20150826120000  -254.000
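Note that dropwhile stops dropping at the first line that does not match, so a #!! line that appears later in the file would still reach the parser. If that can happen, one possible variation is to filter every line instead. This is a sketch with hypothetical in-memory contents, not part of the original answer:

```python
import csv
import io

import pandas as pd

# Hypothetical input with a "#!!" line in the middle of the data as well.
text = """#!! various
#!! metadata
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
#!! note between rows
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
"""

with io.StringIO(text) as f:
    # Keep every line that does not start with the prefix, wherever it appears.
    data_lines = (line for line in f if not line.startswith("#!!"))
    df = pd.DataFrame(list(csv.reader(data_lines)))

print(len(df))
```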
Answer by AFault
I encountered a similar problem and solved it this way.
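import pandas as pd

def lrsplit(line):
    # The first and last '-'-separated fields stand alone; the rest is rejoined.
    left, *middle, right = line.split('-')
    mid = '-'.join(middle)
    return left, mid, right.strip()

with open("example.csv") as f:
    example = pd.DataFrame(lrsplit(line) for line in f)
example.columns = ['location', 'position', 'company']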
Result:
   location            position    company
0     india             manager      intel
1     india       sales-manager     amazon
2  banglore  ccm- head - county  jp morgan
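The same first/middle/last split can also be written with the maxsplit argument of str.split and str.rsplit, which avoids splitting and rejoining the middle part. This is a sketch only: split_ends is a hypothetical helper name, and the input lines are assumed, reconstructed from the result table above. It raises ValueError for a line with fewer than two '-' characters.

```python
import pandas as pd

# Assumed input lines, reconstructed from the result table above.
lines = [
    "india-manager-intel\n",
    "india-sales-manager-amazon\n",
    "banglore-ccm- head - county-jp morgan\n",
]


def split_ends(line):
    """Split off the first and last '-'-separated fields; keep the rest intact."""
    left, rest = line.strip().split("-", 1)  # cut at the first '-'
    mid, right = rest.rsplit("-", 1)         # cut at the last '-'
    return left, mid, right


example = pd.DataFrame(
    [split_ends(line) for line in lines],
    columns=["location", "position", "company"],
)
print(example)
```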
Answer by Wanji
You can convert the data into a 7-column DataFrame in the following way, assuming it is stored in a CSV file rather than in a list:
import pandas as pd

# filename is the path to the CSV file; header=None keeps the first row as data.
df = pd.read_csv(filename, sep=',', header=None)
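One detail worth checking with read_csv: by default the first line is used as the header row, so data without a header loses its first record unless header=None is passed. A small sketch of the difference, using in-memory text in place of a file; the column names in cols are hypothetical labels chosen for illustration.

```python
import io

import pandas as pd

text = (
    "AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000\n"
    "AN,2__A000,26,20150826113000,0.000,20150826120000,0.000\n"
)

# Default: the first line is consumed as column names, leaving one data row.
with_header = pd.read_csv(io.StringIO(text), sep=",")

# header=None keeps both rows; names= assigns labels (hypothetical ones here).
cols = ["type", "id", "code", "start", "start_value", "end", "end_value"]
no_header = pd.read_csv(io.StringIO(text), sep=",", header=None, names=cols)

print(len(with_header), len(no_header))
```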