Python AttributeError: 'float' object has no attribute 'split'

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/42224700/

AttributeError: 'float' object has no attribute 'split'

Tags: python, csv, parsing, nan

Asked by Dhruv Ghulati

I am calling this line:

lang_modifiers = [keyw.strip() for keyw in row["language_modifiers"].split("|") if not isinstance(row["language_modifiers"], float)]

This works when row["language_modifiers"] is a word (atlas method, central), but not when it comes up as nan.

I thought my if not isinstance(row["language_modifiers"], float) check would catch the cases where the value comes up as nan, but it does not.

Background: row["language_modifiers"] is a cell in a tsv file, and it comes up as nan when that cell is empty in the tsv being parsed.
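A note on why the guard in that line does not help: in a list comprehension, the expression after "in" is evaluated once, before the "if" clause is tested for any element, so .split("|") has already been called on the float nan by the time the check runs. The sketch below reproduces this with a hypothetical row dict standing in for the parsed tsv row, and moves the check in front of the split. The helper name parse_modifiers is an assumption made for illustration.

```python
def parse_modifiers(value):
    """Split a pipe-separated cell; treat a missing (float nan) cell as empty."""
    if isinstance(value, float):  # empty cells arrive as float nan
        return []
    return [keyw.strip() for keyw in value.split("|")]


# Hypothetical row: a plain dict standing in for one parsed tsv row.
row = {"language_modifiers": float("nan")}

# The original one-liner: the split runs before the `if` is ever consulted.
try:
    [keyw.strip() for keyw in row["language_modifiers"].split("|")
     if not isinstance(row["language_modifiers"], float)]
    failed = False
except AttributeError:
    failed = True

empty_result = parse_modifiers(row["language_modifiers"])
word_result = parse_modifiers("atlas method | central")
```

With the check placed first, an empty cell yields an empty list instead of raising.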

Answered by Ozgur Ozturk

You are right, such errors are mostly caused by NaN representing empty cells. It is common to filter out such rows before applying further operations, using this idiom on your DataFrame df:

df_new = df[df['ColumnName'].notnull()]
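Applied to the question, a minimal sketch might look like the following. The small DataFrame here is made up for illustration (the real data comes from the tsv file); only the column name language_modifiers is taken from the question.

```python
import numpy as np
import pandas as pd

# Made-up sample data; the second row mimics an empty tsv cell.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "language_modifiers": ["atlas method|central", np.nan, "left | right"],
})

# Keep only the rows where the cell is present.
df_new = df[df["language_modifiers"].notnull()]

# Every remaining cell is a string, so split() is safe.
modifiers = [
    [keyw.strip() for keyw in cell.split("|")]
    for cell in df_new["language_modifiers"]
]
```

Note that the filtered rows are gone from df_new, so this fits best when rows without modifiers are not needed downstream.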

Alternatively, it may be handier to use the fillna() method to impute (replace) null values with a default. For example, all null or NaN values can be replaced with the average value of their column:

housing['LotArea'] = housing['LotArea'].fillna(housing['LotArea'].mean())

or they can be replaced with an empty string "" or another default value:

housing['GarageCond']=housing['GarageCond'].fillna("")
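For the column in the question, the empty-string default could be used as in this sketch. The sample data is made up. One detail to keep in mind is that "".split("|") returns [""], so empty pieces are filtered out after stripping.

```python
import numpy as np
import pandas as pd

# Made-up sample data; the second row mimics an empty tsv cell.
df = pd.DataFrame({"language_modifiers": ["atlas method|central", np.nan]})

# Replace missing cells with "" so every value is a string.
filled = df["language_modifiers"].fillna("")

# Split each cell and drop empty pieces, so a missing cell gives [].
modifiers = [
    [keyw.strip() for keyw in cell.split("|") if keyw.strip()]
    for cell in filled
]
```

Unlike the filtering idiom, this keeps every row, which helps when the result must stay aligned with the other columns.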

Answered by hpl002

You might also use df = df.dropna(thresh=n), where n is the tolerance: a row must have at least n non-NA values to be kept.

Mind you, this approach removes the affected rows from the DataFrame.

For example, if you have a DataFrame with 5 columns, df.dropna(thresh=5) drops any row that does not have 5 valid (non-NA) values.

In your case you might only want to keep valid rows; if so, you can set the threshold to the number of columns you have.

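A small sketch of how the threshold behaves, using a made-up three-column DataFrame (column names a, b, c are placeholders):

```python
import numpy as np
import pandas as pd

# Made-up data: row 0 is complete, rows 1 and 2 each miss one value.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": ["x", "y", np.nan],
    "c": [10, 20, 30],
})

# Threshold equal to the number of columns: only complete rows survive.
complete = df.dropna(thresh=len(df.columns))

# A lower threshold tolerates one missing value per row.
lenient = df.dropna(thresh=2)
```

Here complete keeps only the first row, while lenient keeps all three because each row still has at least two non-NA values.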
pandas documentation on dropna
