Simple column splitting in Pandas

Question

Asked by Xodarap777

I have something like this in a CSV:

 phone                            name     area
 (444) 444-4444, (000) 000-0000   Foo      cityname, ST
 (555) 555-5555                   Bar      othercity, SN

How would I arrive at this most simply:

 phone            name     area       State
 (444) 444-4444   Foo      cityname   ST
 (555) 555-5555   Bar      othercity  SN

These are two basic splits. In the first, I want to drop everything after the first entry in ['phone']. In the second, I want to move everything after the comma in ['area'] into a new ['State'] column. I figured it would be great to learn both methods.

In the actual file, the CSV is comma-separated and fields are quoted: it's a standard CSV. I used the whitespace-aligned table above only to show the problem.

Answer 1

Accepted answer by cyborg

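import pandas as pd

# To load the real file instead:
# df = pd.read_csv('file.csv', dtype={'area': str, 'phone': str})
df = pd.DataFrame(columns=['phone', 'name', 'area'],
                  data=[['(444) 444-4444, (000) 000-0000', 'Foo', 'cityname, ST'],
                        ['(555) 555-5555', 'Bar', 'othercity, SN']])
print(df)

# State: the text after the comma in 'area', stripped of the leading space
# (empty string when there is no comma)
df['State'] = df['area'].apply(lambda x: x.split(',')[1].strip() if ',' in x else '')
# area and phone: keep only the text before the first comma
df['area'] = df['area'].apply(lambda x: x.split(',')[0])
df['phone'] = df['phone'].apply(lambda x: x.split(',')[0])
print(df)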

Out:

                            phone name           area
0  (444) 444-4444, (000) 000-0000  Foo   cityname, ST
1                  (555) 555-5555  Bar  othercity, SN
            phone name       area State
0  (444) 444-4444  Foo   cityname    ST
1  (555) 555-5555  Bar  othercity    SN
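
The same two splits can also be written with pandas' vectorized string methods instead of `apply` with a lambda. The following is only a sketch of that alternative, not part of the accepted answer. It assumes every value in `phone` and `area` is a string and uses `Series.str.partition`, which splits on the first comma and returns an empty string for the right-hand part when no comma is present.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['phone', 'name', 'area'],
    data=[['(444) 444-4444, (000) 000-0000', 'Foo', 'cityname, ST'],
          ['(555) 555-5555', 'Bar', 'othercity, SN']],
)

# partition(',') returns three columns: text before the first comma (0),
# the comma itself (1), and everything after it (2).
df['phone'] = df['phone'].str.partition(',')[0].str.strip()

area_parts = df['area'].str.partition(',')
df['area'] = area_parts[0].str.strip()
df['State'] = area_parts[2].str.strip()  # '' when 'area' had no comma

print(df)
```

Unlike `str.split(',', expand=True)`, `partition` always yields the same three columns, so the `State` assignment does not fail if no row happens to contain a comma.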
