Original source: http://stackoverflow.com/questions/43598862/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
pandas read in csv column as float and set empty cells to 0
Asked by mgig
Is it possible to read in a CSV as a pandas DataFrame and set spaces (or empty cells) to 0 in one line? Below is an illustration of the problem.
Input:
$ csvlook data.csv
|------+---+------|
| a | b | c |
|------+---+------|
| | a | 0.0 |
| 0 | b | 1.0 |
| 1.5 | c | 2.5 |
| 2.1 | d | 3.0 |
|------+---+------|
What I Want:
python% print(df)
a b c
0 0 a 0.0
1 0 b 1.0
2 1.5 c 2.5
3 2.1 d 3.0
What I've Tried:
df = pd.read_csv('data.csv', dtype={'a': float, 'b': str, 'c': float})
This raises a ValueError because of the whitespace in row 0 of column a:
ValueError: could not convert string to float:
Is there a way of replacing strings with 0s when reading in a CSV with pandas?
Code to Generate Test Data:
If you want to try it out, here are the lines I used to generate the test data in the above example:
import pandas as pd
df = pd.DataFrame({'a':[' ', 0, 1.5, 2.1], 'b':['a', 'b', 'c', 'd'], 'c': [0, 1, 2.5, 3]})
df.to_csv('data.csv', index=False)
Answered by VictorGGl
Pandas automatically reads empty values as NaN, so from there just fill them with the fillna method, passing the desired new value (in this case 0).
import pandas as pd
df = pd.read_csv('data.csv').fillna(value = 0)
Which yields:
a b c
0 0.0 a 0.0
1 0.0 b 1.0
2 1.5 c 2.5
3 2.1 d 3.0
You can also set a different fill value for each column by passing a dict. Imagine we have the following CSV file:
a b c
0 NaN a 0.0
1 0.0 b 1.0
2 1.5 NaN 2.5
3 2.1 d NaN
If we want it to be the same as before, we should do:
pd.read_csv('data.csv').fillna(value = {'a':0,'b':'c','c':3})
Yielding again:
a b c
0 0.0 a 0.0
1 0.0 b 1.0
2 1.5 c 2.5
3 2.1 d 3.0
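As a self-contained check of the per-column fill, here is a minimal sketch. The CSV text is an assumption made up for illustration (it is not the question's file); it has one truly empty cell in each column so that read_csv produces the NaN layout shown above.

```python
import io
import pandas as pd

# Hypothetical CSV content: one empty cell per column (assumed for illustration).
csv_text = "a,b,c\n,a,0.0\n0,b,1.0\n1.5,,2.5\n2.1,d,\n"

# Empty cells are read as NaN by default.
df = pd.read_csv(io.StringIO(csv_text))

# Each column gets its own replacement value via the dict.
filled = df.fillna(value={"a": 0, "b": "c", "c": 3})
print(filled)
```

Columns that are not named in the dict are left untouched, so any NaN values in them would remain.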
Answered by paulochf
Almost in one line, and might not work in a real case.
You can set which values are mapped to NaN in read_csv with na_values:
import pandas as pd
df = pd.read_csv('data.csv', na_values=" ")
yielding:
a b c
0 NaN a 0.0
1 0.0 b 1.0
2 1.5 c 2.5
3 2.1 d 3.0
Then you can run fillna to change the NaNs to 0.
Hence, the following line does it all:
df = pd.read_csv('data.csv', na_values=" ").fillna(0)
which gives:
a b c
0 0.0 a 0.0
1 0.0 b 1.0
2 1.5 c 2.5
3 2.1 d 3.0
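To try this without writing a file, the sketch below feeds the same kind of content through io.StringIO. The CSV text is an assumption meant to mirror what the question's generator code writes: the first cell of column a is a single space.

```python
import io
import pandas as pd

# Assumed to mirror the question's data.csv: the first cell of column a is " ".
csv_text = "a,b,c\n ,a,0.0\n0,b,1.0\n1.5,c,2.5\n2.1,d,3.0\n"

# Treat a single space as missing, then fill every missing value with 0.
df = pd.read_csv(io.StringIO(csv_text), na_values=" ").fillna(0)

print(df)
print(df.dtypes)
```

Note that na_values=" " only matches a cell that is exactly one space; cells holding several spaces or tabs would need their own entries, which may be why the answer warns about real-world data.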
Answered by Aravind Krishnakumar
df.replace(r'\s+', 0, regex=True)
a b c
0 0.0 a 0.0
1 0.0 b 1.0
2 1.5 c 2.5
3 2.1 d 3.0
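One caveat, as far as I understand pandas' behavior: when the replacement is not a string, a pattern such as r'\s+' replaces the whole cell if it matches anywhere, so a value like 'a b' would also become 0. The sketch below is a variation under that assumption, not the answer's exact code: it anchors the pattern so only whitespace-only cells match, restricts the replacement to column a, and converts the result to a numeric dtype. It uses the in-memory test data from the question.

```python
import pandas as pd

# Test data from the question, built in memory.
df = pd.DataFrame({
    "a": [" ", 0, 1.5, 2.1],
    "b": ["a", "b", "c", "d"],
    "c": [0, 1, 2.5, 3],
})

# ^\s*$ matches only cells that are empty or consist entirely of whitespace.
cleaned = df["a"].replace(r"^\s*$", 0, regex=True)

# The column may still have object dtype after the replacement; convert explicitly.
df["a"] = pd.to_numeric(cleaned)
print(df)
```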