Original source: http://stackoverflow.com/questions/51290134/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Using Pandas, how do I split based on the first space?
Asked by Jessica Warren
I have a column of codes in "dataset.csv":
0020-004241 purple
00532 - Blue
00121 - Yellow
055 - Greem
0025-097 - Orange
Desired output:
code name_of_code
0020-004241 purple
00532 blue
I want the codes and the words for the codes to be split into two different columns.
I tried:
df =pandas.read_csv(dataset.txt)
df = pandas.concat([df, df.columnname.str.split('/s', expand=True)], 1)
df = pandas.concat([df, df.columnname.str.split('-', expand=True)], 1)
It gave the unexpected output of: purple none blue none yellow none green none orange none
How should I split this data correctly?
Answered by Rakesh
Use str.split(" ", n=1). Note that pandas 2.0 and later accept n only as a keyword argument.
Example:
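import pandas as pd
# filename is the path to the data file; names= labels the single input column
df = pd.read_csv(filename, names=['code'])
# n=1 splits on the first space only (n must be passed by keyword in pandas 2.x)
df[['code', 'name_of_code']] = df["code"].str.split(" ", n=1, expand=True)
# drop the leading "- " that some rows carry before the name
df["name_of_code"] = df["name_of_code"].str.lstrip("- ")
print(df)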
Output:
code name_of_code
0 0020-004241 purple
1 00532 Blue
2 00121 Yellow
3 055 Greem
4 0025-097 Orange
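For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same idea. The column name "raw" and the intermediate variable parts are made up for illustration; the rows are the ones from the question.

```python
import pandas as pd

# Sample rows copied from the question; the column name "raw" is hypothetical.
df = pd.DataFrame({"raw": ["0020-004241 purple", "00532 - Blue",
                           "00121 - Yellow", "055 - Greem",
                           "0025-097 - Orange"]})

# n=1 limits the split to the first space, so each row yields two parts.
parts = df["raw"].str.split(" ", n=1, expand=True)
df["code"] = parts[0]
# Remove the optional "- " prefix that some rows carry before the name.
df["name_of_code"] = parts[1].str.lstrip("- ")

print(df[["code", "name_of_code"]])
```

Passing n by keyword should work on both older and newer pandas releases, which is why the sketch avoids the positional form.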
Answered by jpp
You can process this via a couple of split calls:
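import pandas as pd
df = pd.DataFrame({'col': ['0020-004241 purple', '00532 - Blue',
                           '00121 - Yellow', '055 - Greem',
                           '0025-097 - Orange']})
# with no pattern given, split on whitespace; n=1 stops after the first split
df[['col1', 'col2']] = df['col'].str.split(n=1, expand=True)
# keep only the last whitespace-separated token, which discards the "-"
df['col2'] = df['col2'].str.split().str[-1]
print(df)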
col col1 col2
0 0020-004241 purple 0020-004241 purple
1 00532 - Blue 00532 Blue
2 00121 - Yellow 00121 Yellow
3 055 - Greem 055 Greem
4 0025-097 - Orange 0025-097 Orange
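One thing to watch with the second split: taking .str[-1] keeps only the last token, so a multi-word name would be cut down to its final word. The sketch below compares that with stripping just the leading hyphen. The "Light Orange" row is invented for illustration and is not part of the question's data.

```python
import pandas as pd

# "Light Orange" is a hypothetical multi-word name added for this comparison.
s = pd.Series(["00532 - Blue", "0025-097 - Light Orange"])

# Split once on whitespace: column 0 is the code, column 1 is the remainder.
parts = s.str.split(n=1, expand=True)

# Approach from the answer: keep only the last whitespace-separated token.
last_token = parts[1].str.split().str[-1]

# Alternative: remove only a leading hyphen and the whitespace after it.
prefix_stripped = parts[1].str.replace(r"^-\s*", "", regex=True)

print(last_token.tolist())
print(prefix_stripped.tolist())
```

If every name in the real data is a single word, both approaches should give the same result.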
Answered by Oleh Rybalchenko
You can use a regex as the separator when loading the CSV, which avoids any further splitting.
from io import StringIO
import pandas as pd
file = StringIO(
"""0020-004241 purple
00532 - Blue
00121 - Yellow
055 - Greem
0025-097 - Orange"""
)
df = pd.read_csv(file, sep=r'\s+-*\s*', header=None, engine='python')
Of course, you may add the headers, but I'm trying to stay close to your initial input with this example.
Now read_csv produces the following DataFrame:
0 1
0 0020-004241 purple
1 00532 Blue
2 00121 Yellow
3 055 Greem
4 0025-097 Orange
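A variation on the same regex separator, as a rough sketch: it names the columns at load time and passes dtype=str. Both of those are additions of mine rather than part of the answer; dtype=str is there because a file whose codes are all plain digits could otherwise be parsed as integers and lose its leading zeros.

```python
from io import StringIO

import pandas as pd

# Same sample rows as in the question, held in memory instead of a file.
raw = StringIO(
    "0020-004241 purple\n"
    "00532 - Blue\n"
    "00121 - Yellow\n"
    "055 - Greem\n"
    "0025-097 - Orange\n"
)

# The separator is whitespace, optional hyphens, then optional whitespace.
# A regex separator needs the Python parsing engine; dtype=str keeps codes as text.
df = pd.read_csv(raw, sep=r"\s+-*\s*", header=None,
                 names=["code", "name_of_code"], engine="python", dtype=str)

print(df)
```

The hyphen inside a code such as 0020-004241 is not treated as a separator because the pattern requires whitespace first.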
Answered by Grant Shannon
Two lines of code using lambdas:
df['code'] = df['code_and_name_of_code'].apply(lambda x: x.split(" ", 1)[0])
df['name_of_code'] = df['code_and_name_of_code'].apply(lambda x: x.split(" ", 1)[1].lstrip('- '))
Inputs:
import pandas as pd
df = pd.read_csv('data.txt')
code_and_name_of_code
0 0020-004241 purple
1 00532 - Blue
2 00121 - Yellow
3 055 - Greem
4 0025-097 - Orange
Apply lambdas:
df['code'] = df['code_and_name_of_code'].apply(lambda x: x.split(" ", 1)[0])
df['name_of_code'] = df['code_and_name_of_code'].apply(lambda x: x.split(" ", 1)[1].lstrip('- '))
Note:
- x.split(" ", 1) splits on the first space only
- x.split(" ", 1) returns a list, where [0] is whatever comes before the first space and [1] is whatever comes after it
Outputs:
code_and_name_of_code code name_of_code
0 0020-004241 purple 0020-004241 purple
1 00532 - Blue 00532 Blue
2 00121 - Yellow 00121 Yellow
3 055 - Greem 055 Greem
4 0025-097 - Orange 0025-097 Orange
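To make the notes on x.split(" ", 1) concrete, here is a small plain-Python sketch. It also shows one possible way to handle a row with no space at all, where indexing [1] on the split result would raise an IndexError. The helper name split_first_space is made up for this example.

```python
def split_first_space(text):
    """Return (code, name) split at the first space; name is '' if there is no space."""
    # partition always returns three parts, even when the separator is missing.
    code, _, rest = text.partition(" ")
    # Drop the optional "- " prefix in front of the name.
    return code, rest.lstrip("- ")


print("00532 - Blue".split(" ", 1))        # ['00532', '- Blue']
print(split_first_space("00532 - Blue"))   # ('00532', 'Blue')
print(split_first_space("00532"))          # ('00532', '')
```

Such a helper could be passed to .apply in place of the lambdas if the data might contain rows without a name.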
Answered by ALollz
You can also use .str.extract with a regular expression.
df[['code', 'name_of_code']] = df.col.str.extract(r'(.*\d+)\s-?\s?(.*)', expand=True)
print(df)
col code name_of_code
0 0020-004241 purple 0020-004241 purple
1 00532 - Blue 00532 Blue
2 00121 - Yellow 00121 Yellow
3 055 - Greem 055 Greem
4 0025-097 - Orange 0025-097 Orange
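As a variation on the extract approach, named groups let .str.extract label the output columns directly. This is only a sketch with a pattern of my own, not the one from the answer: it assumes the code is the first run of non-space characters and that the name may be preceded by a single hyphen.

```python
import pandas as pd

df = pd.DataFrame({"col": ["0020-004241 purple", "00532 - Blue",
                           "00121 - Yellow", "055 - Greem",
                           "0025-097 - Orange"]})

# (?P<code>...) and (?P<name_of_code>...) become the column names of the result.
# (?:-\s*)? skips an optional hyphen and the whitespace after it without capturing.
pattern = r"(?P<code>\S+)\s+(?:-\s*)?(?P<name_of_code>.*)"
extracted = df["col"].str.extract(pattern)

print(extracted)
```

The result is a separate DataFrame, so it can be joined back onto df or assigned column by column as needed.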