Original question: http://stackoverflow.com/questions/49215099/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Get length of values in pandas dataframe column
Asked by modLmakur
I'm trying to get the length of each zipCd value in the dataframe mentioned below. When I run the code below I get 958 for every record. I'm expecting to get something more like '4'. Does anyone see what the issue is?
Code:
zipDfCopy['zipCd'].str.len()
Data:
print zipDfCopy[1:5]
   Zip Code  Place Name          State State Abbreviation     County  \
1       544  Holtsville       New York                 NY    Suffolk
2      1001      Agawam  Massachusetts                 MA    Hampden
3      1002     Amherst  Massachusetts                 MA  Hampshire
4      1003     Amherst  Massachusetts                 MA  Hampshire

   Latitude  Longitude                    zipCd
1   40.8154   -73.0451  0 501\n1 544\n2 1001...
2   42.0702   -72.6227  0 501\n1 544\n2 1001...
3   42.3671   -72.4646  0 501\n1 544\n2 1001...
4   42.3919   -72.5248  0 501\n1 544\n2 1001...
Answered by jpp
One way is to convert to string and use pd.Series.map with the len built-in.
pd.Series.str is used for vectorized string functions, while pd.Series.astype is used to change column type.
import pandas as pd
df = pd.DataFrame({'ZipCode': [341, 4624, 536, 123, 462, 4642]})
df['ZipLen'] = df['ZipCode'].astype(str).map(len)
#    ZipCode  ZipLen
# 0      341       3
# 1     4624       4
# 2      536       3
# 3      123       3
# 4      462       3
# 5     4642       4
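The question does not show how zipCd was built, so the following is only a guess at the cause. Every zipCd value in the printed data looks like the text of a whole Series ("0 501\n1 544\n2 1001..."), which is what you would get by calling str() on the column instead of converting element by element. A minimal sketch under that assumption (zip_df and zipCd_bad are hypothetical names; only 'Zip Code' and zipCd come from the post):

```python
import pandas as pd

# Hypothetical stand-in for the question's data.
zip_df = pd.DataFrame({'Zip Code': [501, 544, 1001, 1002, 1003]})

# Assumed cause: str() on a whole Series returns its printed representation
# as one string, and assigning that scalar copies it into every row.
zip_df['zipCd_bad'] = str(zip_df['Zip Code'])

# Element-wise conversion yields one short string per row instead.
zip_df['zipCd'] = zip_df['Zip Code'].astype(str)

bad_lengths = zip_df['zipCd_bad'].str.len()
good_lengths = zip_df['zipCd'].str.len()

print(bad_lengths.nunique())   # a single, large length shared by all rows
print(good_lengths.tolist())   # per-row lengths
```

If this is what happened, rebuilding zipCd with astype(str) makes the original zipDfCopy['zipCd'].str.len() call return per-row lengths.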
A more explicit alternative is to use np.log10:
import numpy as np
df['ZipLen'] = np.floor(np.log10(df['ZipCode'].values)).astype(int) + 1
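As a rough check, the two approaches can be compared on the same sample data. The sketch below assumes strictly positive integers; np.log10 returns -inf for 0, so the numeric version is not a drop-in replacement in general. The padded Series is a hypothetical example showing that leading zeros, which real ZIP codes can have, are only counted when the codes are stored as strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ZipCode': [341, 4624, 536, 123, 462, 4642]})

# String-based digit count.
len_by_str = df['ZipCode'].astype(str).map(len)

# Numeric digit count; valid only for strictly positive integers.
len_by_log = np.floor(np.log10(df['ZipCode'].values)).astype(int) + 1

# Check that both methods agree on this sample.
same = (len_by_str.values == len_by_log).all()
print(same)

# Hypothetical string-typed codes: leading zeros are kept and counted.
padded = pd.Series(['00501', '00544', '01001'])
print(padded.str.len().tolist())
```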