How to extract the first 8 characters from a string in pandas
Original source: http://stackoverflow.com/questions/51607400/
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Asked by Rahul rajan
I have a column in a DataFrame and I am trying to extract the first 8 digits from each string. How can I do it?
Input
Shipment ID
20180504-S-20000
20180514-S-20537
20180514-S-20541
20180514-S-20644
20180514-S-20644
20180516-S-20009
20180516-S-20009
20180516-S-20009
20180516-S-20009
Expected Output
Order_Date
20180504
20180514
20180514
20180514
20180514
20180516
20180516
20180516
20180516
I tried the code below, but it didn't work.
data['Order_Date'] = data['Shipment ID'][:8]
Answered by jezrael
You are close; you need indexing with str, which is applied to each value of the Series:
data['Order_Date'] = data['Shipment ID'].str[:8]
For better performance, if there are no NaN values:
data['Order_Date'] = [x[:8] for x in data['Shipment ID']]
print (data)
Shipment ID Order_Date
0 20180504-S-20000 20180504
1 20180514-S-20537 20180514
2 20180514-S-20541 20180514
3 20180514-S-20644 20180514
4 20180514-S-20644 20180514
5 20180516-S-20009 20180516
6 20180516-S-20009 20180516
7 20180516-S-20009 20180516
8 20180516-S-20009 20180516
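The "no NaN values" condition matters because the two approaches treat missing data differently. The sketch below uses a small made-up sample (the missing row is an assumption, not part of the question's data) to show that .str[:8] passes NaN through, while the list comprehension tries to slice a float and fails.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value, for illustration only.
data = pd.DataFrame({'Shipment ID': ['20180504-S-20000', np.nan, '20180516-S-20009']})

# .str[:8] slices each string and leaves the missing value as NaN.
sliced = data['Shipment ID'].str[:8]

# The list comprehension slices every element directly,
# so the float NaN raises a TypeError.
try:
    fast = [x[:8] for x in data['Shipment ID']]
except TypeError as exc:
    fast = None
    error_message = str(exc)

print(sliced)
print(fast)
```

If the column may contain missing values, the .str accessor is the safer choice; the list comprehension is only a speed-up for clean string data.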
If you omit str, the code slices the column by position instead and returns the first N rows, like:
print (data['Shipment ID'][:2])
0 20180504-S-20000
1 20180514-S-20537
Name: Shipment ID, dtype: object
Answered by Rakesh
You can also use str.extract.
Example:
import pandas as pd
df = pd.DataFrame({'Shipment ID': ['20180504-S-20000', '20180514-S-20537', '20180514-S-20541', '20180514-S-20644', '20180514-S-20644', '20180516-S-20009', '20180516-S-20009', '20180516-S-20009', '20180516-S-20009']})
df["Order_Date"] = df["Shipment ID"].str.extract(r"(\d{8})")
print(df)
Output:
Shipment ID Order_Date
0 20180504-S-20000 20180504
1 20180514-S-20537 20180514
2 20180514-S-20541 20180514
3 20180514-S-20644 20180514
4 20180514-S-20644 20180514
5 20180516-S-20009 20180516
6 20180516-S-20009 20180516
7 20180516-S-20009 20180516
8 20180516-S-20009 20180516
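Note that the pattern (\d{8}) is not tied to the start of the string: it returns the first run of 8 digits found anywhere. A minimal sketch, using made-up IDs that are not in the question's data, compares it with a pattern anchored by ^. Passing expand=False makes str.extract return a Series for a single capture group; rows with no match become NaN.

```python
import pandas as pd

# Hypothetical IDs: only the first one starts with 8 digits.
ids = pd.Series(['20180504-S-20000', 'S-20180514-20537', 'bad-id'],
                name='Shipment ID')

# Unanchored: finds the first run of 8 digits anywhere in the string.
anywhere = ids.str.extract(r"(\d{8})", expand=False)

# Anchored with ^: matches only when the string starts with 8 digits.
leading = ids.str.extract(r"^(\d{8})", expand=False)

print(anywhere)
print(leading)
```

For the data in the question both patterns give the same result, since every ID begins with the date.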
Answered by Onyambu
You can also delete everything from -S to the end:
df["Order_Date"]=df['Shipment ID'].replace(regex=r"\-.*",value="")
df
Shipment ID Order_Date
0 20180504-S-20000 20180504
1 20180514-S-20537 20180514
2 20180514-S-20541 20180514
3 20180514-S-20644 20180514
4 20180514-S-20644 20180514
5 20180516-S-20009 20180516
6 20180516-S-20009 20180516
7 20180516-S-20009 20180516
8 20180516-S-20009 20180516
You can also capture the first 8 digits and replace the whole string with a backreference to the captured group:
df['Shipment ID'].replace(regex=r"(\d{8}).*",value=r"\1")
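One detail worth checking when using a backreference: in Python source the replacement should be a raw string. Without the r prefix, "\1" is an octal escape for a single control character, not a reference to group 1. The sketch below, on a two-row sample taken from the question's data, shows the difference and assigns the result to a new column.

```python
import pandas as pd

df = pd.DataFrame({'Shipment ID': ['20180504-S-20000', '20180514-S-20537']})

octal_escape = "\1"      # one control character, chr(1)
backreference = r"\1"    # two characters: a backslash and "1"

# With the raw string, the whole match is replaced by capture group 1.
df['Order_Date'] = df['Shipment ID'].replace(regex=r"(\d{8}).*",
                                             value=backreference)
print(df)
```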