Python pandas.read_csv: split a column into multiple new columns using a comma as the separator
Source: http://stackoverflow.com/questions/31682798/
This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors on StackOverflow.
Asked by J.A.Cado
I've used pandas.read_csv to load in a file.
I've stored the file in a variable. The first column is a series of numbers separated by commas (,). I want to split these numbers and put each one in a new column.
I can't seem to find the right functionality for this in pandas.DataFrame.
Side note: I would prefer a different library for loading my file, but pandas provides other functionality that I need.
My code:
Data = pandas.read_csv(pathFile, header=None)
Running print Data gives me:
                           0  1     2 ...
0  [2014, 8, 26, 5, 30, 0.0]  0  0.25 ...
(As you can see, it's a date.)
Question: How do I split out each number and save it in a new array?
P.S. I'm trying to achieve the same thing the MATLAB function datevec() does.
Answered by unutbu
If the CSV data looks like
"[2014, 8, 26, 5, 30, 0.0]",0,0.25
then
import pandas as pd
import json
df = pd.read_csv('data', header=None)
dates, df = df[0], df.iloc[:, 1:]
df = pd.concat([df, dates.apply(lambda x: pd.Series(json.loads(x)))], axis=1,
ignore_index=True)
print(df)
yields
   0     1     2  3   4  5   6  7
0  0  0.25  2014  8  26  5  30  0
with the values parsed as numeric values.
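To try the snippet above without a file on disk, the same steps can be fed from an in-memory string. This is only a minimal sketch: csv_text and the use of io.StringIO are stand-ins that are not part of the original answer, and the names rest, expanded and result are chosen here for illustration.

```python
import io
import json

import pandas as pd

# Hypothetical in-memory stand-in for the CSV file; the quotes keep the
# bracketed list together as a single field.
csv_text = '"[2014, 8, 26, 5, 30, 0.0]",0,0.25\n'

df = pd.read_csv(io.StringIO(csv_text), header=None)

# Peel off the first column (strings) and keep the remaining columns.
dates, rest = df[0], df.iloc[:, 1:]

# Parse each string with json.loads and expand the list into columns.
expanded = dates.apply(lambda x: pd.Series(json.loads(x)))

# Glue the two pieces together side by side and renumber the columns.
result = pd.concat([rest, expanded], axis=1, ignore_index=True)
print(result)
```

How the numbers are displayed (for example 2014 versus 2014.0) may vary with the pandas version, so no output is shown here.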
How it works:
dates, df = df[0], df.iloc[:, 1:]
peels off the first column, and reassigns df to the rest of the DataFrame:
In [217]: dates
Out[217]:
0 [2014, 8, 26, 5, 30, 0.0]
Name: 0, dtype: object
dates contains strings:
In [218]: dates.iloc[0]
Out[218]: '[2014, 8, 26, 5, 30, 0.0]'
We can convert these to a list using json.loads:
In [219]: import json
In [220]: json.loads(dates.iloc[0])
Out[220]: [2014, 8, 26, 5, 30, 0.0]
In [221]: type(json.loads(dates.iloc[0]))
Out[221]: list
We can do this for each row of dates by using apply:
In [222]: dates.apply(lambda x: pd.Series(json.loads(x)))
Out[222]:
      0  1   2  3   4  5
0  2014  8  26  5  30  0
Because the lambda above returns a Series, apply returns a DataFrame,
with the index of each Series becoming the column index of the DataFrame.
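The difference between returning a scalar and returning a Series from the applied function can be seen on a toy example. This is an illustrative sketch only; the data and the labels "left" and "right" are made up for the demonstration.

```python
import pandas as pd

s = pd.Series(["a-b", "c-d"])

# A function that returns a scalar keeps the result a Series.
lengths = s.apply(len)

# A function that returns a Series expands the result into a DataFrame;
# the index of each returned Series ("left", "right") becomes the columns.
halves = s.apply(lambda x: pd.Series(x.split("-"), index=["left", "right"]))

print(type(lengths).__name__)  # Series
print(type(halves).__name__)   # DataFrame
print(list(halves.columns))    # ['left', 'right']
```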
Now we can use pd.concat to concatenate this DataFrame with df:
In [228]: df = pd.concat([df, dates.apply(lambda x: pd.Series(json.loads(x)))], axis=1, ignore_index=True)
In [229]: df
Out[229]:
   0     1     2  3   4  5   6  7
0  0  0.25  2014  8  26  5  30  0
In [230]: df.dtypes
Out[230]:
0      int64
1    float64
2    float64
3    float64
4    float64
5    float64
6    float64
7    float64
dtype: object
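The dtypes above show every date part as float64, because each parsed row passes through a single pd.Series that also holds the float seconds value. If integer columns are preferred, one possible variant (not part of the original answer) is to build the DataFrame directly from the parsed lists, so that each column gets its own dtype. The sample data and the column names in names are assumptions made for this sketch.

```python
import json

import pandas as pd

# Hypothetical sample of the first column after read_csv.
dates = pd.Series(["[2014, 8, 26, 5, 30, 0.0]", "[2015, 1, 2, 3, 4, 5.5]"])

# Hypothetical column labels; any labels of the right length would do.
names = ["year", "month", "day", "hour", "minute", "second"]

# Parse every string into a list, then let the DataFrame constructor
# infer a dtype per column instead of per row.
parts = pd.DataFrame(dates.map(json.loads).tolist(),
                     index=dates.index, columns=names)

print(parts.dtypes)
```

With this input, the first five columns come out as integers and only the seconds column is floating point.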
Answered by dermen
How about this:
df
# datestr
#0 2014, 8, 26, 5, 30, 0.0
#1 2014, 8, 26, 5, 30, 0.0
#2 2014, 8, 26, 5, 30, 0.0
#3 2014, 8, 26, 5, 30, 0.0
#4 2014, 8, 26, 5, 30, 0.0
# each entry is a string
df.datestr[0]
#'2014, 8, 26, 5, 30, 0.0'
Then
date_order = ('year', 'month', 'day', 'hour', 'minute', 'sec')  # order matters here; it should match the datestr column
for i, col in enumerate(date_order):
    # take the i-th comma-separated piece of each string and strip surrounding spaces
    df[col] = df.datestr.map(lambda x: x.split(',')[i].strip())
#df
# datestr year month day hour minute sec
#0 2014, 8, 26, 5, 30, 0.0 2014 8 26 5 30 0.0
#1 2014, 8, 26, 5, 30, 0.0 2014 8 26 5 30 0.0
#2 2014, 8, 26, 5, 30, 0.0 2014 8 26 5 30 0.0
#3 2014, 8, 26, 5, 30, 0.0 2014 8 26 5 30 0.0
#4 2014, 8, 26, 5, 30, 0.0 2014 8 26 5 30 0.0
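Note that the loop above leaves the new columns as strings. A possible alternative, sketched here under the same assumed datestr layout, is to split every row in one call with Series.str.split(expand=True) and then convert the pieces with pd.to_numeric. The sample DataFrame is made up for the example; this is not the original answerer's code.

```python
import pandas as pd

# Hypothetical sample data in the same shape as the answer's df.
df = pd.DataFrame({"datestr": ["2014, 8, 26, 5, 30, 0.0"] * 3})
date_order = ["year", "month", "day", "hour", "minute", "sec"]

# expand=True returns a DataFrame with one column per comma-separated piece.
parts = df["datestr"].str.split(",", expand=True)
parts.columns = date_order

# The pieces are still strings with leading spaces; strip them and
# convert each column to a numeric dtype.
parts = parts.apply(lambda col: pd.to_numeric(col.str.strip()))

# Attach the numeric columns next to the original string column.
df = df.join(parts)
print(df.dtypes)
```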

