Remove NaN and convert to float32 in Python Pandas

Disclaimer: this page is a translation of a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/32749211/

Date: 2020-09-13 23:55:30  Source: igfitidea


Tags: python, csv, pandas

Asked by tsotsi

I am reading data from a CSV file into a DataFrame, trying to remove all rows that contain NaNs and then convert it from float64 to float32. I have tried various solutions I've found online, but nothing seems to work. Any thoughts?
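A minimal sketch of the starting point, assuming a small in-memory CSV (the csv_text string is hypothetical and stands in for the real file): read_csv parses empty fields as NaN and stores decimal columns as float64 by default.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical file contents: row 1 is missing 'a', row 2 is missing 'b'.
csv_text = "a,b\n1.5,2.0\n,3.0\n4.5,\n"

df = pd.read_csv(io.StringIO(csv_text))

print(df)         # the empty fields show up as NaN
print(df.dtypes)  # both columns are float64 by default
```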

Accepted answer by DalekSec

I think this does what you want:

import numpy as np
import pandas as pd

pd.read_csv('Filename.csv').dropna().astype(np.float32)
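A runnable sketch of the one-liner above, using io.StringIO and a hypothetical csv_text in place of 'Filename.csv' so it works without a file on disk. dropna() with its default settings removes every row that contains at least one NaN, and astype(np.float32) then downcasts the remaining columns.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for 'Filename.csv': rows 1 and 2 each contain one NaN.
csv_text = "a,b\n1.5,2.0\n,3.0\n4.5,\n"

# Drop every row containing a NaN, then convert float64 -> float32.
df = pd.read_csv(io.StringIO(csv_text)).dropna().astype(np.float32)

print(df)         # only row 0 remains; its original index label is kept
print(df.dtypes)  # a and b are both float32
```

Note that dropna() keeps the original index labels; call reset_index(drop=True) afterwards if you want a fresh 0..n-1 index.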

To drop only the rows in which every value is NaN (keeping rows that have just some NaN values), do this:

pd.read_csv('Filename.csv').dropna(how='all').astype(np.float32)
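To make the difference between the two how= settings concrete, here is a small sketch. The DataFrame is built directly with np.nan as a hypothetical stand-in for what read_csv would return: row 1 is partly NaN and row 2 is entirely NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a freshly read CSV.
raw = pd.DataFrame({'a': [1.5, np.nan, np.nan], 'b': [2.0, 3.0, np.nan]})

# how='any' (the default) drops a row if it has at least one NaN.
any_nan_dropped = raw.dropna(how='any').astype(np.float32)

# how='all' drops a row only if every value in it is NaN.
all_nan_dropped = raw.dropna(how='all').astype(np.float32)

print(any_nan_dropped.index.tolist())  # [0]
print(all_nan_dropped.index.tolist())  # [0, 1]
```

Rows kept with how='all' can still contain NaN; that is fine for float32, which has its own NaN value.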

To replace each NaN with a number instead of dropping rows, do this:

pd.read_csv('Filename.csv').fillna(1e6).astype(np.float32)

(I replaced NaN with 1,000,000 just as an example.)
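A sketch of the fillna variant on the same kind of hypothetical in-memory CSV. The scalar form matches the answer; the dict form, which gives each column its own fill value, is an alternative shown here with arbitrary example values.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical file contents: row 1 is missing 'a', row 2 is missing 'b'.
csv_text = "a,b\n1.5,2.0\n,3.0\n4.5,\n"
raw = pd.read_csv(io.StringIO(csv_text))

# Replace every NaN with the same sentinel value, then downcast.
filled = raw.fillna(1e6).astype(np.float32)

# Alternative: a dict maps each column to its own fill value (values are arbitrary).
per_column = raw.fillna({'a': 0.0, 'b': -1.0}).astype(np.float32)

print(filled)
print(per_column)
```

No rows are lost this way. A sentinel such as 1e6 survives the float32 conversion exactly, since every integer up to 2**24 is representable in float32, but pick a value that cannot occur in your real data.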

Answer by Alexander

You can also specify the dtype when you read the CSV file. From the read_csv documentation:

dtype : Type name or dict of column -> type
    Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}

pd.read_csv(my_file, dtype={col: np.float32 for col in ['col_1', 'col_2']})
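A small sketch of the dict form, assuming a hypothetical in-memory file with a third column that is deliberately left out of the dict. Only the listed columns are parsed as float32; the others keep the default type, and empty fields still become NaN.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical file contents; col_3 is not listed in the dtype dict.
csv_text = "col_1,col_2,col_3\n1.5,2.0,7.25\n,3.0,8.5\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={col: np.float32 for col in ['col_1', 'col_2']},
)

print(df.dtypes)  # col_1 and col_2 are float32; col_3 keeps the default float64
```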

Example:

import numpy as np
import pandas as pd

df_out = pd.DataFrame(np.random.random([5, 5]), columns=list('ABCDE'))
df_out.iat[1, 0] = np.nan
df_out.iat[2, 1] = np.nan
df_out.to_csv('my_file.csv')  # the index is written as an unnamed first column

df = pd.read_csv('my_file.csv', dtype={col: np.float32 for col in list('ABCDE')})
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 6 columns):
Unnamed: 0    5 non-null int64
A             4 non-null float32
B             4 non-null float32
C             5 non-null float32
D             5 non-null float32
E             5 non-null float32
dtypes: float32(5), int64(1)
memory usage: 180.0 bytes

>>> df.dropna(axis=0, how='any')
   Unnamed: 0         A         B         C         D         E
0           0  0.176224  0.943918  0.322430  0.759862  0.028605
3           3  0.723643  0.105813  0.884290  0.589643  0.913065
4           4  0.654378  0.400152  0.763818  0.416423  0.847861
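A sketch combining both answers, assuming you control how the file is written. Passing index=False to to_csv keeps the "Unnamed: 0" column from appearing on re-read, so a single type name (dtype=np.float32, the "Type name" form from the documentation quoted above) can cover every column, and dropna() then removes rows 1 and 2. The seeded generator and the in-memory CSV string are illustrative choices, not part of the original answer.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data mirroring the example above: a 5x5 frame with two NaNs.
rng = np.random.default_rng(0)
df_out = pd.DataFrame(rng.random((5, 5)), columns=list('ABCDE'))
df_out.iat[1, 0] = np.nan
df_out.iat[2, 1] = np.nan

# With no path, to_csv returns the CSV text; index=False omits the index column.
csv_text = df_out.to_csv(index=False)

# One type name applies float32 to all columns; then drop rows with any NaN.
df = pd.read_csv(io.StringIO(csv_text), dtype=np.float32).dropna(how='any')

print(df.dtypes)
print(df.index.tolist())  # rows 1 and 2 were dropped
```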