pandas: read CSV columns as floats and set empty cells to 0

Question

Asked by mgig

Is it possible to read in a CSV as a pandas DataFrame and set spaces (or empty cells) to 0 in one line? Below is an illustration of the problem.

Input:

$ csvlook data.csv    
|------+---+------|
|  a   | b | c    |
|------+---+------|
|      | a | 0.0  |
|  0   | b | 1.0  |
|  1.5 | c | 2.5  |
|  2.1 | d | 3.0  |
|------+---+------|

What I Want:

python% print(df)
    a   b   c
0   0   a   0.0
1   0   b   1.0
2   1.5 c   2.5
3   2.1 d   3.0

What I've Tried:

df = pd.read_csv('data.csv', dtype={'a': float, 'b': str, 'c': float})

This raises a ValueError because of the whitespace in row 0 of column a:

ValueError: could not convert string to float:

Is there a way to replace blank strings with 0 when reading a CSV with pandas?

Code to Generate Test Data:

If you want to try it out, here are the lines I used to generate the test data in the above example:

import pandas as pd
df = pd.DataFrame({'a':[' ', 0, 1.5, 2.1], 'b':['a', 'b', 'c', 'd'], 'c': [0, 1, 2.5, 3]})
df.to_csv('data.csv', index=False)

Answer 1

Answered by VictorGGl

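pandas reads empty cells as NaN by default, so you can simply fill them with the fillna method, passing the desired replacement value (0 in this case). Note that this relies on the cell being truly empty; a cell that holds only a space, as in the question's generated test data, may not be treated as missing by default.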

import pandas as pd

df = pd.read_csv('data.csv').fillna(value=0)

Which yields:

     a  b    c
0  0.0  a  0.0
1  0.0  b  1.0
2  1.5  c  2.5
3  2.1  d  3.0
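
As a minimal sketch of the fillna approach, the snippet below reads a small in-memory CSV instead of data.csv. The CSV text and the variable names are assumptions made for illustration, and the first cell of column a is truly empty (no space in it).

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: the first cell of column "a" is empty.
csv_text = "a,b,c\n,a,0.0\n0,b,1.0\n1.5,c,2.5\n2.1,d,3.0\n"

# An empty field is parsed as NaN, so column "a" is still inferred as float.
raw = pd.read_csv(io.StringIO(csv_text))

# Replace every NaN in the frame with 0.
filled = raw.fillna(0)
print(filled)
```

Because the missing cell becomes NaN before fillna runs, column a keeps a float dtype without any dtype argument.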

You can also set a different fill value for each column by passing a dict. Suppose the CSV file reads in as the following DataFrame:

     a    b    c
0  NaN    a  0.0
1  0.0    b  1.0
2  1.5  NaN  2.5
3  2.1    d  NaN

To get the same result as before, pass a dict that maps each column to its fill value:

pd.read_csv('data.csv').fillna(value={'a': 0, 'b': 'c', 'c': 3})

This again yields:

     a  b    c
0  0.0  a  0.0
1  0.0  b  1.0
2  1.5  c  2.5
3  2.1  d  3.0
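
The per-column variant can be sketched the same way. The CSV text below is an assumed reconstruction of a file that would read in with one missing value per column, and fill_values is a name chosen for this example.

```python
import io

import pandas as pd

# Hypothetical CSV with one empty cell in each column.
csv_text = "a,b,c\n,a,0.0\n0.0,b,1.0\n1.5,,2.5\n2.1,d,\n"
df = pd.read_csv(io.StringIO(csv_text))

# Map each column name to the value that should replace its NaNs.
fill_values = {"a": 0, "b": "c", "c": 3}
filled = df.fillna(value=fill_values)
print(filled)
```

Columns that are not listed in the dict are left untouched, which is useful when only some columns should default to 0.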

Answer 2

Answered by paulochf

This is almost a one-liner, though it might not hold up on real-world data.

You can tell read_csv which values to treat as missing (NaN) with the na_values parameter:

import pandas as pd
df = pd.read_csv('data.csv', na_values=" ")

This yields:

     a  b    c
0  NaN  a  0.0
1  0.0  b  1.0
2  1.5  c  2.5
3  2.1  d  3.0

Then you can call fillna to change the NaNs to 0.

Hence, the following line does it all:

df = pd.read_csv('data.csv', na_values=" ").fillna(0)

which gives:

     a  b    c
0  0.0  a  0.0
1  0.0  b  1.0
2  1.5  c  2.5
3  2.1  d  3.0
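
For anyone who wants to try this without writing a file, here is a small self-contained sketch. The in-memory CSV mirrors the question's test data (a single space in the first cell of column a), and restricting fillna to column a is a choice made for this example, not part of the answer above.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: the first cell of column "a" is one space.
csv_text = "a,b,c\n ,a,0.0\n0,b,1.0\n1.5,c,2.5\n2.1,d,3.0\n"

# Treat a lone space as missing, then fill only column "a" with 0.
df = pd.read_csv(io.StringIO(csv_text), na_values=" ").fillna({"a": 0})
print(df)
```

Filling only column a avoids putting 0 into text columns such as b if they happen to contain missing values too.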

Answer 3

Answered by Aravind Krishnakumar

df.replace(r'\s+', 0, regex=True)

     a  b    c
0  0.0  a  0.0
1  0.0  b  1.0
2  1.5  c  2.5
3  2.1  d  3.0
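
One thing to watch with the pattern above: \s+ matches whitespace anywhere in a cell, so a text value such as "New York" would also be replaced. A more conservative sketch, under the assumption that only column a needs cleaning, anchors the pattern to whitespace-only cells and converts the column to float explicitly. The in-memory CSV and the anchored pattern are choices made for this example.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: the first cell of column "a" is one space.
csv_text = "a,b,c\n ,a,0.0\n0,b,1.0\n1.5,c,2.5\n2.1,d,3.0\n"

# Because of the " " cell, column "a" is expected to be read as strings.
df = pd.read_csv(io.StringIO(csv_text))

# Swap whitespace-only cells for "0", then convert the column to float.
df["a"] = df["a"].replace(r"^\s*$", "0", regex=True).astype(float)
print(df)
```

The explicit astype(float) makes the final dtype independent of whether replace itself changes the column type.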
