Remove NaN and convert to float32 in Python Pandas
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/32749211/
Asked by tsotsi
I am reading data from a CSV file into a DataFrame, trying to remove all rows that contain NaNs and then convert it from float64 to float32. I have tried various solutions I've found online, but nothing seems to work. Any thoughts?
Accepted answer by DalekSec
I think this does what you want:
pd.read_csv('Filename.csv').dropna().astype(np.float32)
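To try the one-liner without a file on disk, here is a minimal sketch that reads a small in-memory CSV. The column names and values are made up for illustration, and `io.StringIO` stands in for 'Filename.csv'.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content; two of the four rows have a missing value.
csv_text = "a,b\n1.5,2.0\n,3.0\n4.0,\n5.5,6.5\n"

df = pd.read_csv(io.StringIO(csv_text))
clean = df.dropna().astype(np.float32)

print(clean.dtypes)  # both columns should now be float32
print(len(clean))    # only the rows without NaN remain
```

Note that `dropna()` returns a new DataFrame rather than modifying `df`, so the result has to be assigned or chained as shown.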
To keep rows that have only some NaN values (dropping only the rows that are entirely NaN), do this:
pd.read_csv('Filename.csv').dropna(how='all').astype(np.float32)
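The difference between the default `how='any'` and `how='all'` can be seen on a small frame. This is only a sketch; the frame below is invented for the comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is partly NaN, row 2 is entirely NaN.
df = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [2.0, 3.0, np.nan]})

any_dropped = df.dropna(how="any")  # default: drop rows with at least one NaN
all_dropped = df.dropna(how="all")  # drop only rows where every value is NaN

print(len(any_dropped), len(all_dropped))
# NaN is a valid float32 value, so the cast still works on the partly-NaN row.
print(all_dropped.astype(np.float32))
```

With `how='all'` the remaining NaN values survive the cast, so later code still needs to handle them.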
To replace each NaN with a number instead of dropping rows, do this:
pd.read_csv('Filename.csv').fillna(1e6).astype(np.float32)
(I replaced NaN with 1,000,000 just as an example.)
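As a rough illustration of the fill approach, the sketch below compares a single fill value with per-column fill values passed as a dict. The frame and the chosen fill values are arbitrary assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 2.0, 4.0]})

# Same fill value everywhere, as in the answer above.
filled_const = df.fillna(1e6).astype(np.float32)

# A different fill value per column, passed as a dict (values chosen arbitrarily).
filled_per_col = df.fillna({"a": 0.0, "b": df["b"].mean()}).astype(np.float32)

print(filled_const)
print(filled_per_col)
```

Which fill value is appropriate depends on the data; a large sentinel such as 1e6 will skew statistics like the column mean.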
Answer by Alexander
You can also specify the dtype when you read the CSV file:
dtype : Type name or dict of column -> type. Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
pd.read_csv(my_file, dtype={col: np.float32 for col in ['col_1', 'col_2']})
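Since `dtype` also accepts a single type name, a file whose columns are all numeric can be read as float32 without listing the columns. A minimal sketch, assuming made-up CSV content held in memory:

```python
import io

import numpy as np
import pandas as pd

csv_text = "col_1,col_2\n1.5,2\n,3\n"  # hypothetical, all-numeric data

# A single type applies to every column; missing cells still load as NaN.
df = pd.read_csv(io.StringIO(csv_text), dtype=np.float32)
print(df.dtypes)
```

This only works when every column can be parsed as a float; a text column would make the read fail, in which case the per-column dict is the safer choice.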
Example:
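import numpy as np
import pandas as pd

# Build a 5x5 frame of random floats and set two cells to NaN.
df_out = pd.DataFrame(np.random.random([5, 5]), columns=list('ABCDE'))
df_out.iat[1, 0] = np.nan
df_out.iat[2, 1] = np.nan
df_out.to_csv('my_file.csv')  # the index is written too, hence 'Unnamed: 0' below
# Read it back, requesting float32 for the five data columns.
df = pd.read_csv('my_file.csv', dtype={col: np.float32 for col in list('ABCDE')})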
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 6 columns):
Unnamed: 0    5 non-null int64
A             4 non-null float32
B             4 non-null float32
C             5 non-null float32
D             5 non-null float32
E             5 non-null float32
dtypes: float32(5), int64(1)
memory usage: 180.0 bytes
>>> df.dropna(axis=0, how='any')
   Unnamed: 0         A         B         C         D         E
0           0  0.176224  0.943918  0.322430  0.759862  0.028605
3           3  0.723643  0.105813  0.884290  0.589643  0.913065
4           4  0.654378  0.400152  0.763818  0.416423  0.847861
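The 'Unnamed: 0' column in the output above is the saved index being read back as data. One possible variation, sketched here with an in-memory buffer instead of 'my_file.csv', is to pass `index_col=0` so that column becomes the index again and every data column ends up float32.

```python
import io

import numpy as np
import pandas as pd

# Rebuild a CSV like the one in the example, in memory rather than on disk.
df_out = pd.DataFrame(np.random.random([5, 5]), columns=list("ABCDE"))
df_out.iat[1, 0] = np.nan
df_out.iat[2, 1] = np.nan
buf = io.StringIO()
df_out.to_csv(buf)
buf.seek(0)

# index_col=0 turns the first CSV column back into the index,
# so no 'Unnamed: 0' column appears.
df = pd.read_csv(buf, index_col=0, dtype={col: np.float32 for col in "ABCDE"})
df = df.dropna(axis=0, how="any")  # dropna returns a new frame; reassign to keep it

print(df.dtypes)
print(df.index.tolist())
```

The row labels of the surviving rows are kept as they were; call `reset_index(drop=True)` afterwards if a contiguous index is needed.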

