Simple Column Split in Pandas
Source: http://stackoverflow.com/questions/21081045/
This page reproduces a Stack Overflow question and answer under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep the same license.
Asked by Xodarap777
I have something like this in a CSV:
phone                            name   area
(444) 444-4444, (000) 000-0000   Foo    cityname, ST
(555) 555-5555                   Bar    othercity, SN
How would I arrive at this most simply:
phone            name   area        State
(444) 444-4444   Foo    cityname    ST
(555) 555-5555   Bar    othercity   SN
It's two basic splits - in the first, I want to get rid of everything past the first index from ['phone'], but in the second, I want to add everything after the comma from ['area'] into ['State'] - I figured it would be great to learn both methods.
In the actual file, the CSV is split with commas and fields use quotation marks: it's a standard csv. I used the whitespace table to show the problem.
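As a rough sketch of how such a file might be loaded: the snippet below assumes the real CSV resembles the sample table, with every field quoted. The file contents and the name `raw_csv` are made up for illustration and are not from the original file. With quoted fields, `pd.read_csv` should keep the embedded commas inside a single cell, so the splitting can be done afterwards on the columns.

```python
import io
import pandas as pd

# Hypothetical file contents, modeled on the table in the question.
raw_csv = '''"phone","name","area"
"(444) 444-4444, (000) 000-0000","Foo","cityname, ST"
"(555) 555-5555","Bar","othercity, SN"
'''

# Quoted fields keep their embedded commas; dtype=str avoids type guessing.
df = pd.read_csv(io.StringIO(raw_csv), dtype=str)
print(df.shape)
print(df.loc[0, 'phone'])
```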
Accepted answer by cyborg
import pandas as pd

# df = pd.read_csv('file.csv', dtype={'area': str, 'phone': str})
df = pd.DataFrame(columns=['phone', 'name', 'area'],
                  data=[['(444) 444-4444, (000) 000-0000', 'Foo', 'cityname, ST'],
                        ['(555) 555-5555', 'Bar', 'othercity, SN']])
print(df)

# Text after the first comma in 'area' becomes 'State' (empty string if there is no comma);
# strip() drops the space that follows the comma.
df['State'] = df.area.apply(lambda x: x.split(',')[1].strip() if ',' in x else '')
# Keep only the text before the first comma in 'area' and 'phone'.
df.area = df.area.apply(lambda x: x.split(',')[0])
df.phone = df.phone.apply(lambda x: x.split(',')[0])
print(df)
Out:
                            phone name           area
0  (444) 444-4444, (000) 000-0000  Foo   cityname, ST
1                  (555) 555-5555  Bar  othercity, SN
            phone name       area State
0  (444) 444-4444  Foo   cityname    ST
1  (555) 555-5555  Bar  othercity    SN
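The same two splits can also be sketched with pandas' vectorized string methods instead of `apply`. This is an alternative to the accepted answer, not part of it. It assumes at least one row of 'area' contains a comma (otherwise `expand=True` returns a single column and `parts[1]` does not exist); the name `parts` is just a local helper.

```python
import pandas as pd

df = pd.DataFrame(columns=['phone', 'name', 'area'],
                  data=[['(444) 444-4444, (000) 000-0000', 'Foo', 'cityname, ST'],
                        ['(555) 555-5555', 'Bar', 'othercity, SN']])

# Keep only the text before the first comma in 'phone'.
df['phone'] = df['phone'].str.split(',').str[0]

# Split 'area' once, on the first comma, into two helper columns.
parts = df['area'].str.split(',', n=1, expand=True)
df['area'] = parts[0].str.strip()
# Rows without a comma have a missing value here; use '' for those.
df['State'] = parts[1].fillna('').str.strip()
print(df)
```

Splitting with `n=1` keeps any further commas in the second part, which may or may not be what you want for longer 'area' values.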

