Python: Convert string objects to int/float using Pandas

Question

Asked by tejesh95

import pandas as pd

path1 = "/home/supertramp/Desktop/100&life_180_data.csv"

mydf =  pd.read_csv(path1)

numcigar = {"Never":0 ,"1-5 Cigarettes/day" :1,"10-20 Cigarettes/day":4}

print mydf['Cigarettes']

mydf['CigarNum'] = mydf['Cigarettes'].apply(numcigar.get).astype(float)

print mydf['CigarNum']

mydf.to_csv('/home/supertramp/Desktop/powerRangers.csv')

The CSV file "100&life_180_data.csv" contains columns such as Age, BMI, Cigarettes, Alcohol, etc.

No                int64
Age               int64
BMI             float64
Alcohol          object
Cigarettes       object
dtype: object

The Cigarettes column contains "Never", "1-5 Cigarettes/day", and "10-20 Cigarettes/day". I want to assign weights to these values (Never, 1-5 Cigarettes/day, ...).

The expected output is a new appended column, CigarNum, that contains only the numbers 0, 1, 2. CigarNum is as expected for about the first 8 rows and then shows NaN through the last row of the column.

0                     Never
1                     Never
2        1-5 Cigarettes/day
3                     Never
4                     Never
5                     Never
6                     Never
7                     Never
8                     Never
9                     Never
10                    Never
11                    Never
12     10-20 Cigarettes/day
13       1-5 Cigarettes/day
14                    Never
...
167                    Never
168                    Never
169     10-20 Cigarettes/day
170                    Never
171                    Never
172                    Never
173                    Never
174                    Never
175                    Never
176                    Never
177                    Never
178                    Never
179                    Never
180                    Never
181                    Never
Name: Cigarettes, Length: 182, dtype: object

The output I get is shown below; it shouldn't give NaN after the first few rows.

0      0
1      0
2      1
3      0
4      0
5      0
6      0
7      0
8      0
9      0
10   NaN
11   NaN
12   NaN
13   NaN
14     0
...
167   NaN
168   NaN
169   NaN
170   NaN
171   NaN
172   NaN
173   NaN
174   NaN
175   NaN
176   NaN
177   NaN
178   NaN
179   NaN
180   NaN
181   NaN
Name: CigarNum, Length: 182, dtype: float64

Answer 1

Accepted answer by EdChum

OK, the first problem is that the values contain embedded spaces, which cause the function to be applied incorrectly.
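One way to confirm the whitespace problem is to look at the raw unique values. The snippet below is a minimal sketch on a hypothetical sample Series (the real CSV is not available here, so the sample values and the names `cigs`, `raw_values` and `unmatched` are assumptions); it shows which values fail to match the dictionary keys exactly.

```python
import pandas as pd

# Hypothetical sample: two values carry stray leading/trailing spaces.
cigs = pd.Series(["Never", "Never ", " 1-5 Cigarettes/day", "Never"])

numcigar = {"Never": 0, "1-5 Cigarettes/day": 1, "10-20 Cigarettes/day": 4}

# repr() makes otherwise invisible whitespace visible.
raw_values = [repr(v) for v in cigs.unique()]
print(raw_values)

# Values that are not exact dictionary keys are the ones that turn into NaN.
unmatched = sorted(set(cigs) - set(numcigar))
print(unmatched)
```

Any value listed in `unmatched` is looked up as `None` by `numcigar.get`, which is what ends up as NaN after `astype(float)`.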

Fix this using the vectorised str accessor:

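# strip only leading/trailing whitespace; removing every space would also break keys such as "1-5 Cigarettes/day"
mydf['Cigarettes'] = mydf['Cigarettes'].str.strip()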

Now creating your new column should just work:

mydf['CigarNum'] = mydf['Cigarettes'].apply(numcigar.get).astype(float)
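Put together, the cleanup and the lookup might look like the sketch below. It assumes the stray spaces sit at the start or end of the values, so it uses `str.strip()` rather than removing every space (which would also alter keys such as "1-5 Cigarettes/day"). The small DataFrame is a hypothetical stand-in for the CSV, and `Series.map` is used here in place of `apply(numcigar.get)`.

```python
import pandas as pd

numcigar = {"Never": 0, "1-5 Cigarettes/day": 1, "10-20 Cigarettes/day": 4}

# Hypothetical stand-in for mydf, with stray whitespace around some values.
mydf = pd.DataFrame(
    {"Cigarettes": ["Never", "Never ", " 1-5 Cigarettes/day", "10-20 Cigarettes/day "]}
)

# Strip only the ends so the spaces inside the dictionary keys survive.
mydf["Cigarettes"] = mydf["Cigarettes"].str.strip()

# map() looks each value up in the dict; any unmatched value would become NaN.
mydf["CigarNum"] = mydf["Cigarettes"].map(numcigar).astype(float)

print(mydf["CigarNum"].tolist())
```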

UPDATE

Thanks to @Jeff as always for pointing out superior ways to do things:

So you can call replace instead of calling apply:

mydf['CigarNum'] = mydf['Cigarettes'].replace(numcigar)
# now convert the types
mydf['CigarNum'] = mydf['CigarNum'].convert_objects(convert_numeric=True)

You can also use the factorize method.
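As a rough illustration of the factorize approach, the sketch below runs `pd.factorize` on a hypothetical sample Series. Note that the codes follow the order in which values first appear, so they are arbitrary ids rather than chosen weights such as the 4 assigned to "10-20 Cigarettes/day" in the dictionary.

```python
import pandas as pd

# Hypothetical sample of the Cigarettes column (already free of stray spaces).
cigs = pd.Series(["Never", "1-5 Cigarettes/day", "Never", "10-20 Cigarettes/day"])

# codes: one integer id per row; uniques: the distinct values in order of first appearance.
codes, uniques = pd.factorize(cigs)

print(list(codes))
print(list(uniques))
```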

Thinking about it, why not just set the dict values to floats in the first place, so that you avoid the type conversion?

So:

numcigar = {"Never":0.0 ,"1-5 Cigarettes/day" :1.0,"10-20 Cigarettes/day":4.0}

Version 0.17.0 or newer

convert_objects is deprecated since 0.17.0; it has been replaced with to_numeric:

mydf['CigarNum'] = pd.to_numeric(mydf['CigarNum'], errors='coerce')

Here errors='coerce' returns NaN where a value cannot be converted to a numeric value; without it, an exception is raised.
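The difference between the two behaviours can be seen in a small sketch. The sample Series and the names `s`, `converted` and `raised` are made up for illustration.

```python
import pandas as pd

# Hypothetical object column: three numeric strings and one that cannot be parsed.
s = pd.Series(["0", "1", "4", "unknown"], dtype=object)

# With errors='coerce', the unparseable entry becomes NaN.
converted = pd.to_numeric(s, errors="coerce")
print(converted)

# With the default errors='raise', the same input raises a ValueError.
raised = False
try:
    pd.to_numeric(s)
except ValueError:
    raised = True
print(raised)
```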

Answer 2

Answer by Apogentus

Try using this function for all problems of this kind:

import numpy as np
import pandas as pd


def get_series_ids(x):
    '''Function returns a pandas series consisting of ids,
       corresponding to objects in input pandas series x
       Example:
       get_series_ids(pd.Series(['a','a','b','b','c']))
       returns Series([0,0,1,1,2], dtype=int)'''

    # sorted unique values of x
    values = np.unique(x)
    # map each unique value to its position in the sorted order
    values2nums = dict(zip(values, range(len(values))))
    return x.replace(values2nums)
