Python AttributeError: 'float' object has no attribute 'split'
Note: this page reproduces a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/42224700/
AttributeError: 'float' object has no attribute 'split'
Asked by Dhruv Ghulati
I am calling this line:
lang_modifiers = [keyw.strip() for keyw in row["language_modifiers"].split("|") if not isinstance(row["language_modifiers"], float)]
This seems to work where row["language_modifiers"] is a word (atlas method, central), but not when it comes up as nan.
I thought my if not isinstance(row["language_modifiers"], float) check would catch the cases where the value comes up as nan, but that is not the case.
Background: row["language_modifiers"] is a cell in a TSV file, and comes up as nan when that cell was empty in the TSV being parsed.
Answered by Ozgur Ozturk
You are right, such errors are mostly caused by NaN representing empty cells. It is common to filter out such data before applying further operations, using this idiom on your dataframe df:
df_new = df[df['ColumnName'].notnull()]
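As a minimal sketch of the filtering idiom above, here it is applied to a small made-up frame. The column name language_modifiers comes from the question; the sample values and the helper split_modifiers are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; pandas reads an empty TSV cell as NaN (a float).
df = pd.DataFrame({"language_modifiers": ["atlas method | central", np.nan, "central"]})

# Keep only rows whose cell is not null, so .split() is only called on strings.
df_new = df[df["language_modifiers"].notnull()]


def split_modifiers(cell):
    """Split a pipe-separated cell into stripped keywords (hypothetical helper)."""
    return [keyw.strip() for keyw in cell.split("|")]


lang_modifiers = [split_modifiers(cell) for cell in df_new["language_modifiers"]]
print(lang_modifiers)  # [['atlas method', 'central'], ['central']]
```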
Alternatively, it may be handier to use the fillna() method to impute (replace) null values with a default. For example, all null or NaN values in a column can be replaced with that column's average value:
housing['LotArea'] = housing['LotArea'].fillna(housing['LotArea'].mean())
or replaced with an empty string "" or another default value:
housing['GarageCond'] = housing['GarageCond'].fillna("")
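Applied to the question's column, the empty-string default might look like the sketch below. The sample frame and the results list are assumptions for illustration. The guard in the original comprehension cannot help because the `if` filter is evaluated per item, only after row["language_modifiers"].split("|") has already been called.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the parsed TSV.
df = pd.DataFrame({"language_modifiers": ["atlas method | central", np.nan]})

# Replace NaN with an empty string so every cell is a str.
df["language_modifiers"] = df["language_modifiers"].fillna("")

results = []
for _, row in df.iterrows():
    # .split() is now safe on every row; empty pieces are skipped.
    lang_modifiers = [
        keyw.strip()
        for keyw in row["language_modifiers"].split("|")
        if keyw.strip()
    ]
    results.append(lang_modifiers)

print(results)  # [['atlas method', 'central'], []]
```

With this approach no rows are lost; a formerly empty cell simply yields an empty list.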
Answered by hpl002
You might also use df = df.dropna(thresh=n), where n is the tolerance: a row must have at least n non-NA values to be kept.
Mind you, this approach removes the entire row.
For example, if you have a dataframe with 5 columns, df.dropna(thresh=5) would drop any row that does not have 5 valid (non-NA) values.
In your case you might only want to keep valid rows; if so, you can set the threshold to the number of columns you have.
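A small sketch of how thresh behaves, using a made-up three-column frame (the column names and values are assumptions, not data from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 has 3 non-NA values, row 1 has 2, row 2 has 1.
df = pd.DataFrame({
    "word": ["atlas", "central", np.nan],
    "language_modifiers": ["atlas method", np.nan, np.nan],
    "count": [3, 5, 7],
})

# Keep rows with at least 2 non-NA values; only row 2 is dropped.
at_least_two = df.dropna(thresh=2)

# Setting thresh to the number of columns keeps only fully populated rows.
complete_rows = df.dropna(thresh=len(df.columns))

print(list(at_least_two.index))   # [0, 1]
print(list(complete_rows.index))  # [0]
```

Setting thresh to the column count gives the same result as a plain df.dropna(), which by default drops any row containing at least one NA value.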