Notice: this page is a Chinese/English translation of a popular StackOverflow question, shared under the CC BY-SA 4.0 license. You are free to use and share it, but you must likewise follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/49215099/

Date: 2020-09-14 05:18:16  Source: igfitidea

Get length of values in pandas dataframe column

Tags: python, python-2.7, pandas

Asked by modLmakur

I'm trying to get the length of each zipCd value in the dataframe mentioned below. When I run the code below I get 958 for every record. I'm expecting to get something more like '4'. Does anyone see what the issue is?

Code:
zipDfCopy['zipCd'].str.len()

Data:
print zipDfCopy[1:5]

   Zip Code  Place Name          State State Abbreviation     County  \
1       544  Holtsville       New York                 NY    Suffolk   
2      1001      Agawam  Massachusetts                 MA    Hampden   
3      1002     Amherst  Massachusetts                 MA  Hampshire   
4      1003     Amherst  Massachusetts                 MA  Hampshire   

   Latitude  Longitude                                              zipCd  
1   40.8154   -73.0451  0          501\n1          544\n2         1001...  
2   42.0702   -72.6227  0          501\n1          544\n2         1001...  
3   42.3671   -72.4646  0          501\n1          544\n2         1001...  
4   42.3919   -72.5248  0          501\n1          544\n2         1001...  
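
Every zipCd cell in the printout begins with "0 501\n1 544\n2 1001...", which looks like the text representation of an entire Series rather than a single zip code. That would explain why every row reports the same length (958). The question does not show how zipCd was built, so the following is only a hypothetical reconstruction, assuming the column was created by calling str() on the whole Zip Code Series:

```python
import pandas as pd

# Hypothetical small frame with integer zip codes (not the asker's real data)
df = pd.DataFrame({'Zip Code': [501, 544, 1001, 1002, 1003]})

# Assumed mistake: str() on the whole Series yields ONE long string,
# which pandas broadcasts as a scalar to every row of the new column.
df['zipCd'] = str(df['Zip Code'])
print(df['zipCd'].str.len())  # the same length on every row

# Fix: convert element-wise so each row holds its own zip code as a string.
df['zipCd'] = df['Zip Code'].astype(str)
print(df['zipCd'].str.len())  # per-row lengths: 3, 3, 4, 4, 4
```

If this assumption holds, the original zipDfCopy['zipCd'].str.len() call is fine; only the way the column was filled needs to change.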

Answer by jpp

One way is to convert the values to strings and use pd.Series.map with the built-in len.

pd.Series.str provides vectorized string functions, while pd.Series.astype changes the column's type, so astype(str) is needed first to turn numeric zip codes into strings.

import pandas as pd

df = pd.DataFrame({'ZipCode': [341, 4624, 536, 123, 462, 4642]})

df['ZipLen'] = df['ZipCode'].astype(str).map(len)

#    ZipCode  ZipLen
# 0      341       3
# 1     4624       4
# 2      536       3
# 3      123       3
# 4      462       3
# 5     4642       4
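
A closely related variant, sketched here rather than taken from the answer, keeps the vectorized .str.len() the question was already calling; it works once astype(str) has turned each zip code into its own string:

```python
import pandas as pd

df = pd.DataFrame({'ZipCode': [341, 4624, 536, 123, 462, 4642]})

# astype(str) gives one string per row, so the .str accessor is available;
# .str.len() then counts the characters in each value.
df['ZipLen'] = df['ZipCode'].astype(str).str.len()
print(df)
```

This should print the same ZipLen values as the map(len) version above.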

A more explicit, purely numeric alternative is to count the digits with np.log10:

import numpy as np
df['ZipLen'] = np.floor(np.log10(df['ZipCode'].values)).astype(int) + 1
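
The logarithm trick relies on a positive integer n having floor(log10(n)) + 1 digits, so it breaks for 0 (np.log10 returns -inf) and for negative values. A quick sanity check that both approaches agree on the sample data, assuming every zip code is a positive integer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ZipCode': [341, 4624, 536, 123, 462, 4642]})

# String-based digit count
str_len = df['ZipCode'].astype(str).map(len)

# Log-based digit count; only valid for n >= 1, so check that first.
assert (df['ZipCode'] > 0).all(), "log10 method needs positive integers"
log_len = np.floor(np.log10(df['ZipCode'].values)).astype(int) + 1

print((str_len.values == log_len).all())  # True
```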