Python: How to remove the last two digits in a column of integer type?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/33034559/

Date: 2020-08-19 12:38:38  Source: igfitidea

How to remove the last two digits in a column that is of integer type?

Tags: python, pandas, dataframe, integer

Asked by Techno04335

How can I remove the last two digits of a DataFrame column of type int64?

For example, df['DATE'] includes:

DATE
20110708
20110709
20110710
20110711
20110712
20110713
20110714
20110815
20110816
20110817

What I would like is:

DATE
201107
201107
201107
201107
201107
201107
201107
201108
201108
201108

What is the simplest way of achieving this?

Accepted answer by EdChum

Convert the dtype to str using astype, then use the vectorised str accessor to slice off the last two characters, and finally convert back to the int64 dtype:

In [184]:
import numpy as np
df['DATE'] = df['DATE'].astype(str).str[:-2].astype(np.int64)
df

Out[184]:
     DATE
0  201107
1  201107
2  201107
3  201107
4  201107
5  201107
6  201107
7  201108
8  201108
9  201108

In [185]:    
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 1 columns):
DATE    10 non-null int64
dtypes: int64(1)
memory usage: 160.0 bytes

Hmm...

It turns out there is a built-in method, floordiv:

In [191]:
df['DATE'].floordiv(100)

Out[191]:
0    201107
1    201107
2    201107
3    201107
4    201107
5    201107
6    201107
7    201108
8    201108
9    201108
Name: DATE, dtype: int64
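
To try both approaches outside an IPython session, here is a minimal sketch that rebuilds the sample column and checks that string slicing and floordiv agree. The way df is constructed is an assumption based on the values shown in the question; the original answer does not show how its frame was created.

```python
import numpy as np
import pandas as pd

# Sample values copied from the question; building the frame this way is an assumption.
df = pd.DataFrame({
    "DATE": [20110708, 20110709, 20110710, 20110711, 20110712,
             20110713, 20110714, 20110815, 20110816, 20110817]
})

# Approach 1: go through strings, drop the last two characters, convert back.
via_str = df["DATE"].astype(str).str[:-2].astype(np.int64)

# Approach 2: integer floor division by 100.
via_floordiv = df["DATE"].floordiv(100)

# Compare the two results and show the resulting dtype.
print(via_str.equals(via_floordiv))
print(via_floordiv.dtype)
```

Neither approach modifies df in place here; assign the result back to df['DATE'] if that is what you want.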

Update

For a 1000-row df, the floordiv method is considerably faster:

%timeit df['DATE'].astype(str).str[:-2].astype(np.int64)
%timeit df['DATE'].floordiv(100)

100 loops, best of 3: 2.92 ms per loop
1000 loops, best of 3: 203 μs per loop

Here we observe a roughly 14x speedup (2.92 ms vs. 203 μs).
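
The %timeit lines only work inside IPython. A rough equivalent using the standard timeit module might look like the sketch below. The 1000-row frame is built by repeating the question's ten sample values, which is an assumption, since the answer does not show how its test frame was made. Absolute numbers will vary with the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical 1000-row frame: the ten sample values repeated 100 times.
sample = [20110708, 20110709, 20110710, 20110711, 20110712,
          20110713, 20110714, 20110815, 20110816, 20110817]
df = pd.DataFrame({"DATE": sample * 100})

def via_str():
    # String round trip: int -> str -> slice -> int64.
    return df["DATE"].astype(str).str[:-2].astype(np.int64)

def via_floordiv():
    # Pure integer arithmetic.
    return df["DATE"].floordiv(100)

# Take the best of 3 repeats, similar in spirit to what %timeit reports.
n = 100
t_str = min(timeit.repeat(via_str, number=n, repeat=3)) / n
t_div = min(timeit.repeat(via_floordiv, number=n, repeat=3)) / n

print(f"str slicing: {t_str * 1e6:.0f} us per loop")
print(f"floordiv:    {t_div * 1e6:.0f} us per loop")
```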

Answer by Alex Riley

You could use floor division (//) to drop the last two digits and preserve the integer type:

>>> df['DATE'] // 100
0    201107
1    201107
2    201107
3    201107
4    201107
5    201107
6    201107
7    201108
8    201108
9    201108
Name: DATE, dtype: int64
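
The same idea extends to dropping any number of trailing digits: divide by the matching power of ten. The helper name drop_last_digits below is hypothetical, not a pandas function. One caveat: floor division rounds toward negative infinity, so for negative values the result is not the same as chopping characters off the string.

```python
import pandas as pd

def drop_last_digits(series, n):
    """Drop the last n decimal digits of a non-negative integer Series (hypothetical helper)."""
    return series // 10 ** n

dates = pd.Series([20110708, 20110815], name="DATE")

print(drop_last_digits(dates, 2).tolist())  # [201107, 201108]
print(drop_last_digits(dates, 4).tolist())  # [2011, 2011]

# Floor division rounds down, so a negative value does not simply lose its digits.
negative = pd.Series([-20110708]) // 100
print(negative.tolist())  # [-201108]
```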