Python: How to remove the last two digits in a column that is of integer type?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/33034559/
Asked by Techno04335
How can I remove the last two digits of a DataFrame column of type int64?
For example, df['DATE'] contains:
DATE
20110708
20110709
20110710
20110711
20110712
20110713
20110714
20110815
20110816
20110817
What I would like is:
DATE
201107
201107
201107
201107
201107
201107
201107
201108
201108
201108
What is the simplest way of achieving this?
Accepted answer by EdChum
Convert the dtype to str using astype, then use the vectorised str method to slice the string, and then convert back to int64 dtype again:
In [184]:
df['DATE'] = df['DATE'].astype(str).str[:-2].astype(np.int64)
df
Out[184]:
DATE
0 201107
1 201107
2 201107
3 201107
4 201107
5 201107
6 201107
7 201108
8 201108
9 201108
In [185]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 1 columns):
DATE 10 non-null int64
dtypes: int64(1)
memory usage: 160.0 bytes
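As a self-contained sketch of the string-slicing approach above (the sample values are copied from the question; the shortened four-row frame is only for illustration):

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (shortened to four rows).
df = pd.DataFrame({"DATE": [20110708, 20110709, 20110710, 20110815]})

# int64 -> str, drop the last two characters, then convert back to int64.
df["DATE"] = df["DATE"].astype(str).str[:-2].astype(np.int64)

print(df["DATE"].tolist())  # [201107, 201107, 201107, 201108]
print(df["DATE"].dtype)     # int64
```

Note that this relies on every value having at least three digits: a one- or two-digit value would slice to an empty string, which would be expected to fail when converted back to an integer.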
Hmm...
Turns out there is a built-in method, floordiv:
In [191]:
df['DATE'].floordiv(100)
Out[191]:
0 201107
1 201107
2 201107
3 201107
4 201107
5 201107
6 201107
7 201108
8 201108
9 201108
Name: DATE, dtype: int64
Update
For a 1000-row df, the floordiv method is considerably faster:
%timeit df['DATE'].astype(str).str[:-2].astype(np.int64)
%timeit df['DATE'].floordiv(100)
100 loops, best of 3: 2.92 ms per loop
1000 loops, best of 3: 203 μs per loop
Here we observe a more than 10x speedup (2.92 ms vs. 203 μs).
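Outside IPython, a comparable measurement could be sketched with the standard timeit module. The 1000-row frame below is a made-up stand-in (one value from the question, repeated), and the helper names via_str and via_floordiv are hypothetical; absolute timings will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical 1000-row frame: one value from the question, repeated.
df = pd.DataFrame({"DATE": np.full(1000, 20110708, dtype=np.int64)})


def via_str(frame):
    # String round trip: int64 -> str -> slice -> int64.
    return frame["DATE"].astype(str).str[:-2].astype(np.int64)


def via_floordiv(frame):
    # Pure integer arithmetic.
    return frame["DATE"].floordiv(100)


# Both approaches should agree before their speed is compared.
assert via_str(df).equals(via_floordiv(df))

runs = 100
t_str = timeit.timeit(lambda: via_str(df), number=runs)
t_div = timeit.timeit(lambda: via_floordiv(df), number=runs)
print(f"str slicing: {t_str / runs * 1e6:.0f} μs per call")
print(f"floordiv:    {t_div / runs * 1e6:.0f} μs per call")
```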
Answered by Alex Riley
You could use floor division (//) to drop the last two digits and preserve the integer type:
>>> df['DATE'] // 100
DATE
0 201107
1 201107
2 201107
3 201107
4 201107
5 201107
6 201107
7 201108
8 201108
9 201108
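A small runnable sketch of the floor-division approach, assuming a two-row sample taken from the question. It also illustrates one caveat that is not part of the original answers: floor division rounds toward negative infinity, so negative values do not simply lose their last two digits.

```python
import pandas as pd

# Two sample values copied from the question.
df = pd.DataFrame({"DATE": [20110708, 20110815]})

truncated = df["DATE"] // 100

# The // operator and the floordiv method give the same result.
assert truncated.equals(df["DATE"].floordiv(100))
# The integer dtype is preserved.
assert truncated.dtype == df["DATE"].dtype
print(truncated.tolist())  # [201107, 201108]

# Caveat: for negative input the result is rounded down, not truncated.
negative = pd.Series([-20110708]) // 100
print(negative.tolist())  # [-201108]
```

To keep the result, assign it back with df['DATE'] = df['DATE'] // 100.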