Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/46728593/

Filter dataframe columns values greater than zero?

python, pandas, dataframe

Asked by Dheeraj

I have a CSV file that I read with pd.read_csv(File), and I am trying to keep only the rows whose values are greater than zero.

The dataframe has some empty cells, some negative values, and some numbers in exponential notation such as -1.72E+10.

Time              A      B       C       D       E       F         G
9/8/2017 8:40   1.29    0.27    1.78    0.23    0.33    0.05    -13.72
9/8/2017 9:00   1.28    0.26    1.78    0.22    0.35    0.02    -13.59
9/8/2017 9:20   1.43                         
9/8/2017 9:40   1.44    0.29    1.93    0.25    0.28    0.01    -13.92
9/8/2017 10:00  1.36    0.27    1.84    0.23    0.31    0.02    -13.77
9/8/2017 10:20  1.38    0.27    1.89    0.23    0.31    0.01    -13.83
9/8/2017 10:40      -1.72E+10   -1.72E+10   -1.72E+10   -1.72E+10   -1.72E+10   -1.72E+10
9/8/2017 11:00  1.4     0.28    1.88    0.24    0.28    0.02    -13.92
9/8/2017 11:20  1.43    0.28    1.92    0.24    0.29    0.02    -13.83

Whenever I run the following code, it doesn't filter those rows out:

df = df[df > 0]

The type of the columns is str instead of numpy.float64.

Can someone tell me what the problem is?

I want to keep only the rows of the dataframe whose values are greater than 0.

Answered by jezrael

I think you need any to check that at least one value in the row is True:

df = df[(df > 0).any(axis=1)]

Or all to check that every value in the row is True:

df = df[(df > 0).all(axis=1)]
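
To make the difference between any and all concrete, here is a minimal sketch on a small made-up frame. The data and the names positive, any_rows and all_rows are assumptions for illustration, not the asker's file. Only numeric columns are included, and NaN compares as False, so a row with missing values can never pass the all test.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data (numeric columns only).
df = pd.DataFrame({
    "A": [1.29, 1.43, -1.72e10, 1.43],
    "B": [0.27, np.nan, -1.72e10, 0.28],
    "G": [-13.72, np.nan, -1.72e10, 13.83],
})

# Element-wise boolean mask; NaN > 0 evaluates to False.
positive = df > 0

# Rows with at least one positive value.
any_rows = df[positive.any(axis=1)]

# Rows where every value is positive.
all_rows = df[positive.all(axis=1)]

print(any_rows.index.tolist())
print(all_rows.index.tolist())
```

In this sketch any keeps every row except the all-negative one, while all keeps only the last row.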


# the last row and the first numeric column were modified to contain no negative values
print (df)
             Time             A             B             C             D  \
0   9/8/2017 8:40  1.290000e+00  2.700000e-01  1.780000e+00  2.300000e-01   
1   9/8/2017 9:00  1.280000e+00  2.600000e-01  1.780000e+00  2.200000e-01   
2   9/8/2017 9:20  1.430000e+00           NaN           NaN           NaN   
3   9/8/2017 9:40  1.440000e+00  2.900000e-01  1.930000e+00  2.500000e-01   
4  9/8/2017 10:00  1.360000e+00  2.700000e-01  1.840000e+00  2.300000e-01   
5  9/8/2017 10:20  1.380000e+00  2.700000e-01  1.890000e+00  2.300000e-01   
6  9/8/2017 10:40  1.720000e+10 -1.720000e+10 -1.720000e+10 -1.720000e+10   
7  9/8/2017 11:00  1.400000e+00  2.800000e-01  1.880000e+00  2.400000e-01   
8  9/8/2017 11:20  1.430000e+00  2.800000e-01  1.920000e+00  2.400000e-01   

              E             F      G  
0  3.300000e-01  5.000000e-02 -13.72  
1  3.500000e-01  2.000000e-02 -13.59  
2           NaN           NaN    NaN  
3  2.800000e-01  1.000000e-02 -13.92  
4  3.100000e-01  2.000000e-02 -13.77  
5  3.100000e-01  1.000000e-02 -13.83  
6 -1.720000e+10 -1.720000e+10    NaN  
7  2.800000e-01  2.000000e-02 -13.92  
8  2.900000e-01  2.000000e-02  13.83  


df1 = df[(df > 0).all(axis=1)]
print (df1)
             Time     A     B     C     D     E     F      G
8  9/8/2017 11:20  1.43  0.28  1.92  0.24  0.29  0.02  13.83


Or select the columns in which all values are greater than zero:

df1 = df.loc[:, (df > 0).all()]
print (df1)
             Time             A
0   9/8/2017 8:40  1.290000e+00
1   9/8/2017 9:00  1.280000e+00
2   9/8/2017 9:20  1.430000e+00
3   9/8/2017 9:40  1.440000e+00
4  9/8/2017 10:00  1.360000e+00
5  9/8/2017 10:20  1.380000e+00
6  9/8/2017 10:40  1.720000e+10
7  9/8/2017 11:00  1.400000e+00
8  9/8/2017 11:20  1.430000e+00
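
The filters above compare every column with 0, including the string column Time. On Python 3 with a recent pandas, comparing strings with an integer usually raises a TypeError, so the outputs shown here may not be reproducible as written. One possible workaround, sketched under that assumption, is to build the mask from the numeric columns only and then apply it to the whole frame. The sample data and the names numeric, row_mask and positive_cols are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a string Time column and numeric value columns.
df = pd.DataFrame({
    "Time": ["9/8/2017 8:40", "9/8/2017 9:20", "9/8/2017 11:20"],
    "A": [1.29, 1.43, 1.43],
    "B": [0.27, np.nan, 0.28],
    "G": [-13.72, np.nan, 13.83],
})

# Compare only the numeric columns, so Time never takes part in "> 0".
numeric = df.select_dtypes(include="number")

# Row filter: keep rows whose numeric values are all positive (Time is kept).
row_mask = (numeric > 0).all(axis=1)
df1 = df[row_mask]

# Column filter: keep Time plus the numeric columns that are positive throughout.
positive_cols = numeric.columns[(numeric > 0).all()]
df2 = df[["Time", *positive_cols]]

print(df1)
print(df2)
```

Because the mask shares the frame's index, indexing the full frame with it keeps the Time column in the result.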

EDIT1:

To convert all columns except Time to floats:

cols = df.columns.difference(['Time'])
df[cols] = df[cols].astype(float)
print (df.dtypes)
Time     object
A       float64
B       float64
C       float64
D       float64
E       float64
F       float64
G       float64
dtype: object
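
If some cells hold text that cannot be parsed as a number (an empty string, for example), astype(float) raises a ValueError. A more forgiving variant, sketched here on a made-up frame whose columns are assumed to have arrived as strings, uses pd.to_numeric with errors="coerce" so that unparseable cells become NaN instead.

```python
import pandas as pd

# Hypothetical frame mimicking columns that were read as strings,
# including empty cells and a value in exponential notation.
df = pd.DataFrame({
    "Time": ["9/8/2017 9:00", "9/8/2017 9:20", "9/8/2017 10:40"],
    "A": ["1.28", "1.43", ""],
    "B": ["0.26", "", "-1.72E+10"],
})

# Every column except Time.
cols = df.columns.difference(["Time"])

# Convert column by column; values that cannot be parsed become NaN.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
print(df)
```

After this conversion the comparison with 0 works on the value columns, and the coerced NaN cells simply compare as False.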

df1 = df.loc[:, (df > 0).all()]
print (df1)
             Time             A
0   9/8/2017 8:40  1.290000e+00
1   9/8/2017 9:00  1.280000e+00
2   9/8/2017 9:20  1.430000e+00
3   9/8/2017 9:40  1.440000e+00
4  9/8/2017 10:00  1.360000e+00
5  9/8/2017 10:20  1.380000e+00
6  9/8/2017 10:40  1.720000e+10
7  9/8/2017 11:00  1.400000e+00
8  9/8/2017 11:20  1.430000e+00