Python: Converting a comma-separated list to a Pandas DataFrame

Question

Asked by user636322

I am struggling to convert a comma-separated list into a multi-column (7) data frame.

print(type(mylist))

<type 'list'>

print(mylist)

['AN,2__AAS000,26,20150826113000,-283.000,20150826120000,-283.000', 'AN,2__AE000,26,20150826113000,0.000,20150826120000,0.000',.........

The following creates a frame with a single column:

df = pd.DataFrame(mylist)

I have reviewed the built-in CSV functionality in Pandas, but my CSV data is held in a list. How can I simply convert the list into a 7-column data frame?

Thanks in advance.

Answer 1

Accepted answer by Padraic Cunningham

You need to split each string in your list:

import pandas as pd

# mylist is the list of comma-separated strings from the question
df = pd.DataFrame([sub.split(",") for sub in mylist])
print(df)

Output:

   0         1   2               3         4               5         6
0  AN  2__AS000  26  20150826113000  -283.000  20150826120000  -283.000
1  AN   2__A000  26  20150826113000     0.000  20150826120000     0.000
2  AN  2__AE000  26  20150826113000  -269.000  20150826120000  -269.000
3  AN  2__AE000  26  20150826113000  -255.000  20150826120000  -255.000
4  AN   2__AE00  26  20150826113000  -254.000  20150826120000  -254.000
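Since the question says the CSV data is already held in a list, here is a minimal sketch of two other ways to get seven columns without writing a file first. The two rows in mylist are illustrative sample data in the same shape as the question's list, and the names buffer, df_parsed and df_split are made up for this example. Wrapping the joined rows in io.StringIO lets read_csv parse them and infer numeric types, whereas Series.str.split with expand=True keeps every column as a string, just like the list comprehension above.

```python
import io

import pandas as pd

# Illustrative sample rows in the same shape as the question's list.
mylist = [
    "AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000",
    "AN,2__A000,26,20150826113000,0.000,20150826120000,0.000",
]

# Option 1: join the rows into one in-memory CSV document and let
# read_csv parse it; numeric-looking columns are inferred as numbers.
buffer = io.StringIO("\n".join(mylist))
df_parsed = pd.read_csv(buffer, header=None)

# Option 2: vectorised split on a Series; every column stays a string.
df_split = pd.Series(mylist).str.split(",", expand=True)

print(df_parsed.dtypes)
print(df_split.dtypes)
```

Which option fits better depends on whether you want type inference up front or prefer to convert columns yourself later.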

If you know how many lines of metadata to skip in your CSV file, you can do it all with read_csv using skiprows=lines_of_metadata:

import pandas as pd

df = pd.read_csv("in.csv", skiprows=3, header=None)
print(df)

Or, if each line of the metadata starts with a certain character, you can use the comment parameter:

df = pd.read_csv("in.csv", header=None, comment="#")
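To try the skiprows and comment options without creating in.csv, the sketch below feeds the same text to read_csv through io.StringIO. The sample text, with three metadata lines that each start with #, and the names raw, df_skip and df_comment are assumptions made for illustration.

```python
import io

import pandas as pd

# Illustrative file content: three metadata lines, then two data rows.
raw = (
    "# various\n"
    "# metadata\n"
    "# lines\n"
    "AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000\n"
    "AN,2__A000,26,20150826113000,0.000,20150826120000,0.000\n"
)

# Skip a known number of leading lines.
df_skip = pd.read_csv(io.StringIO(raw), skiprows=3, header=None)

# Or ignore every line that begins with the comment character.
df_comment = pd.read_csv(io.StringIO(raw), header=None, comment="#")

print(df_skip.equals(df_comment))
```

One caveat: comment accepts a single character, and it also cuts off the rest of any data line in which that character appears, so it is only safe when the character cannot occur in the data itself.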

If the metadata prefix is longer than one character, you can use itertools.dropwhile, which will drop the leading lines starting with that prefix, together with csv.reader:

import csv
from itertools import dropwhile

import pandas as pd

with open("in.csv") as f:
    # Skip the leading lines that start with the metadata prefix.
    data_lines = dropwhile(lambda line: line.startswith("#!!"), f)
    reader = csv.reader(data_lines)
    df = pd.DataFrame.from_records(reader)

Using your input data with some added lines starting with #!!:

#!! various
#!! metadata
#!! lines
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
AN,2__AE000,26,20150826113000,-269.000,20150826120000,-269.000
AN,2__AE000,26,20150826113000,-255.000,20150826120000,-255.000
AN,2__AE00,26,20150826113000,-254.000,20150826120000,-254.000

Output:

   0         1   2               3         4               5         6
0  AN  2__AS000  26  20150826113000  -283.000  20150826120000  -283.000
1  AN   2__A000  26  20150826113000     0.000  20150826120000     0.000
2  AN  2__AE000  26  20150826113000  -269.000  20150826120000  -269.000
3  AN  2__AE000  26  20150826113000  -255.000  20150826120000  -255.000
4  AN   2__AE00  26  20150826113000  -254.000  20150826120000  -254.000
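Both str.split and csv.reader return text, so every column in the frames above holds strings. The sketch below shows one way to convert them afterwards. The column names are hypothetical, because the question does not say what the fields mean, and treating the 14-digit fields as YYYYMMDDHHMMSS timestamps is an assumption based only on how the values look.

```python
import pandas as pd

# Illustrative sample rows in the same shape as the question's data.
rows = [
    "AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000",
    "AN,2__A000,26,20150826113000,0.000,20150826120000,0.000",
]
df = pd.DataFrame([row.split(",") for row in rows])

# Hypothetical column names, chosen only for this example.
df.columns = ["kind", "code", "group", "start", "start_value", "end", "end_value"]

# Assumption: the 14-digit fields are timestamps in YYYYMMDDHHMMSS form.
for col in ["start", "end"]:
    df[col] = pd.to_datetime(df[col], format="%Y%m%d%H%M%S")

# Convert the numeric-looking columns from strings to numbers.
for col in ["group", "start_value", "end_value"]:
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
```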

Answer 2

Answer by AFault

I encountered a similar problem, where the separator could also appear inside the middle field. I solved it this way:

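import pandas as pd

def lrsplit(line):
    # The first and last fields are kept; everything in between is re-joined.
    left, *middle, right = line.split('-')
    mid = '-'.join(middle)
    return left, mid, right.strip()

with open("example.csv") as f:
    example = pd.DataFrame([lrsplit(line) for line in f])
example.columns = ['location', 'position', 'company']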

Result:

   location            position    company
0     india             manager      intel
1     india       sales-manager     amazon
2  banglore  ccm- head - county  jp morgan
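The content of example.csv is not shown in this answer, so the sketch below reconstructs plausible input lines from the result table (an assumption) and runs the same idea on an in-memory list. The sep parameter and the lines variable are additions made for illustration.

```python
import pandas as pd

def lrsplit(line, sep="-"):
    # Keep the first and last fields; re-join whatever lies between them.
    left, *middle, right = line.split(sep)
    return left, sep.join(middle), right.strip()

# Assumed input lines, reconstructed from the result table above.
lines = [
    "india-manager-intel\n",
    "india-sales-manager-amazon\n",
    "banglore-ccm- head - county-jp morgan\n",
]

example = pd.DataFrame(
    [lrsplit(line) for line in lines],
    columns=["location", "position", "company"],
)
print(example)
```

A line that contains no separator at all raises ValueError during unpacking, so real input may need a guard for that case.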

Answer 3

Answer by Wanji

If the data is stored in a file, you can read it into a 7-column data frame in the following way:

import pandas as pd

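# filename is the path to the CSV file; header=None keeps the first data row from being used as column names
df = pd.read_csv(filename, sep=',', header=None)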
