Python pandas.read_csv: split a comma-separated column into multiple new columns

Question

Asked by J.A.Cado

I used pandas.read_csv to load a file.

I've stored the file in a variable. The first column holds a series of numbers separated by commas. I want to split these numbers and put each one in a new column.

I can't seem to find the write functionality for pandas.DataFrame.

Side note: I would prefer a different library for loading my file, but pandas provides other functionality that I need.

My code:

Data = pandas.read_csv(pathFile,header=None)

Running print Data gives me:

   0                          1         2          ...
0 [2014, 8, 26, 5, 30, 0.0]   0         0.25       ...

(As you can see, it's a date.)

Question: How do I split out each number and save it in a new array?

P.S. I'm trying to achieve the same thing that the MATLAB function datevec() does.

Answer 1

Answered by unutbu

If the CSV data looks like

"[2014, 8, 26, 5, 30, 0.0]",0,0.25

then

import pandas as pd
import json

df = pd.read_csv('data', header=None)
dates, df = df[0], df.iloc[:, 1:]
df = pd.concat([df, dates.apply(lambda x: pd.Series(json.loads(x)))], axis=1,
               ignore_index=True)
print(df)

yields

   0     1     2  3   4  5   6  7
0  0  0.25  2014  8  26  5  30  0

with the values parsed as numeric values.

How it works:

dates, df = df[0], df.iloc[:, 1:]

peels off the first column and reassigns df to the rest of the DataFrame:

In [217]: dates
Out[217]: 
0    [2014, 8, 26, 5, 30, 0.0]
Name: 0, dtype: object

dates contains strings:

In [218]: dates.iloc[0]
Out[218]: '[2014, 8, 26, 5, 30, 0.0]'

We can convert these to a list using json.loads:

In [219]: import json

In [220]: json.loads(dates.iloc[0])
Out[220]: [2014, 8, 26, 5, 30, 0.0]

In [221]: type(json.loads(dates.iloc[0]))
Out[221]: list

We can do this for each row of dates by using apply:

In [222]: dates.apply(lambda x: pd.Series(json.loads(x)))
Out[222]: 
      0  1   2  3   4  5
0  2014  8  26  5  30  0

Because the lambda above returns a Series, apply returns a DataFrame, with the index of each Series becoming the column index of the DataFrame.
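As a side note on this step, the same expansion can be sketched without building one Series per row: parse every string with json.loads, then hand the resulting list of lists to the DataFrame constructor. This is only an illustrative alternative, not part of the original answer; the two-row dates Series below is made-up sample data, and expanded is a hypothetical variable name.

```python
import json
import pandas as pd

# Made-up sample data in the same format as the first CSV column.
dates = pd.Series(['[2014, 8, 26, 5, 30, 0.0]', '[2015, 1, 2, 3, 4, 5.5]'])

# Parse each string into a Python list, then build the DataFrame in one step.
# Passing index=dates.index keeps the rows aligned with the original data.
expanded = pd.DataFrame(dates.map(json.loads).tolist(), index=dates.index)
print(expanded)
```

One difference from the apply version: here each column gets its own dtype, so the integer fields stay integers instead of being upcast to float along with the seconds.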

Now we can use pd.concat to concatenate this DataFrame with df:

In [228]: df = pd.concat([df, dates.apply(lambda x: pd.Series(json.loads(x)))], axis=1, ignore_index=True)

In [229]: df
Out[229]: 
   0     1     2  3   4  5   6  7
0  0  0.25  2014  8  26  5  30  0

In [230]: df.dtypes
Out[230]: 
0      int64
1    float64
2    float64
3    float64
4    float64
5    float64
6    float64
7    float64
dtype: object
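For comparison, here is a minimal sketch of a string-based variant of the same idea, assuming the first column always has the bracketed, comma-separated form shown above. It uses the pandas .str accessor instead of json.loads. The in-memory CSV text, the second data row, and the column names are assumptions made for illustration only.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for the file on disk.
csv_text = (
    '"[2014, 8, 26, 5, 30, 0.0]",0,0.25\n'
    '"[2015, 1, 2, 3, 4, 5.5]",1,0.5\n'
)
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Strip the brackets, drop the spaces, split on commas into separate
# columns, and convert the resulting strings to floats.
parts = (
    df[0]
    .str.strip("[]")
    .str.replace(" ", "", regex=False)
    .str.split(",", expand=True)
    .astype(float)
)
# Illustrative column names; the original answer keeps integer labels.
parts.columns = ["year", "month", "day", "hour", "minute", "second"]

# Put the remaining columns and the new date columns side by side.
result = pd.concat([df.iloc[:, 1:], parts], axis=1)
print(result)
```

This avoids a Python-level function call per row, but it relies on the column being a plain bracketed list; json.loads is the safer choice if the format varies.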

Answer 2

Answered by dermen

How about

df
#                   datestr
#0  2014, 8, 26, 5, 30, 0.0
#1  2014, 8, 26, 5, 30, 0.0
#2  2014, 8, 26, 5, 30, 0.0
#3  2014, 8, 26, 5, 30, 0.0
#4  2014, 8, 26, 5, 30, 0.0

# each entry is a string
df.datestr[0]
#'2014, 8, 26, 5, 30, 0.0'

Then

date_order = ('year', 'month','day','hour','minute','sec') # order matters here, should match the datestr column 

for i,col in enumerate( date_order):
    df[col] = df.datestr.map( lambda x: x.split(',')[i].strip() )

#df
#                   datestr  year month day hour minute  sec
#0  2014, 8, 26, 5, 30, 0.0  2014     8  26    5     30  0.0
#1  2014, 8, 26, 5, 30, 0.0  2014     8  26    5     30  0.0
#2  2014, 8, 26, 5, 30, 0.0  2014     8  26    5     30  0.0
#3  2014, 8, 26, 5, 30, 0.0  2014     8  26    5     30  0.0
#4  2014, 8, 26, 5, 30, 0.0  2014     8  26    5     30  0.0
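Note that the loop above leaves every new column holding strings. A possible follow-up, sketched here under the assumption that numeric columns (and perhaps a single timestamp) are what is ultimately wanted, is to pass each piece through pd.to_numeric and then assemble the parts with pd.to_datetime. The sample rows, the timestamp column, and the use of 'second' rather than 'sec' as a column name are choices made for this sketch, not part of the original answer.

```python
import pandas as pd

# Made-up sample rows in the same format as the datestr column.
df = pd.DataFrame({"datestr": ["2014, 8, 26, 5, 30, 0.0",
                               "2015, 1, 2, 3, 4, 5.5"]})

# Order matters here and should match the datestr column.
date_order = ("year", "month", "day", "hour", "minute", "second")

for i, col in enumerate(date_order):
    # Bind i as a default argument so the lambda does not depend on the loop variable.
    pieces = df["datestr"].map(lambda x, i=i: x.split(",")[i].strip())
    # Convert the string pieces to integers or floats as appropriate.
    df[col] = pd.to_numeric(pieces)

# pd.to_datetime can assemble a timestamp from columns with these names.
df["timestamp"] = pd.to_datetime(df[list(date_order)])
print(df.dtypes)
```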
