pandas: Drop rows based on null values in certain columns

Question

Asked by gesingle

I know how to drop a row from a DataFrame containing all nulls OR a single null but can you drop a row based on the nulls for a specified set of columns?

For example, say I am working with data containing geographical info (city, latitude, and longitude) in addition to numerous other fields. I want to keep the rows that at a minimum contain a value for city OR for lat and long but drop rows that have null values for all three.

I am having trouble finding functionality for this in pandas documentation. Any guidance would be appreciated.

Answer 1

Answered by Gene Burinsky

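You can use DataFrame.dropna, but instead of combining how='all' with subset=[...], you can pass the thresh parameter along with subset. thresh sets the minimum number of non-NA values a row must contain to be kept, so any row with fewer is dropped. In the city, long/lat example, thresh=2 gives the desired result on the sample data: a row is dropped when fewer than two of the three columns hold a value. Note that a row containing only a city would also be dropped under this rule. Using the great data example set up by MaxU, we would do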

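import pandas as pd

## get the data (copy the sample table to the clipboard first)
df = pd.read_clipboard()

## keep rows with at least 2 non-null values among the three columns
df.dropna(axis=0, subset=['city', 'longitude', 'latitude'], thresh=2)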

This yields:

In [5]: df.dropna(axis=0, subset=['city', 'longitude', 'latitude'], thresh=2)
Out[5]:
  city  latitude  longitude  a  b
0  aaa   11.1111        NaN  1  2
1  bbb       NaN    22.2222  5  6
3  NaN   11.1111    33.3330  1  2
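
To make the thresh semantics concrete, here is a minimal sketch. The DataFrame is rebuilt by hand from the sample data in this thread, and row 5 (a city with no coordinates) is a hypothetical addition that does not appear in the original example; the name geo_cols is also just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the example in this thread.
# Row 5 is a hypothetical extra row: city present, no coordinates.
df = pd.DataFrame({
    "city": ["aaa", "bbb", np.nan, np.nan, np.nan, "ccc"],
    "latitude": [11.1111, np.nan, np.nan, 11.1111, np.nan, np.nan],
    "longitude": [np.nan, 22.2222, np.nan, 33.333, 44.444, np.nan],
    "a": [1, 5, 3, 1, 1, 7],
    "b": [2, 6, 4, 2, 1, 8],
})

geo_cols = ["city", "latitude", "longitude"]

# Count the non-null values per row among the three columns.
non_null_counts = df[geo_cols].notna().sum(axis=1)
print(non_null_counts.tolist())  # [2, 2, 0, 2, 1, 1]

# thresh=2 keeps only rows with at least two non-null values in geo_cols.
kept = df.dropna(subset=geo_cols, thresh=2)
print(kept.index.tolist())  # [0, 1, 3]
```

Row 5 is dropped even though it has a city, so thresh=2 matches the "city OR (lat AND long)" rule only when every row that has a city also has at least one coordinate.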

Answer 2

Answered by MaxU

Try this:

In [25]: df
Out[25]:
  city  latitude  longitude  a  b
0  aaa   11.1111        NaN  1  2
1  bbb       NaN    22.2222  5  6
2  NaN       NaN        NaN  3  4
3  NaN   11.1111    33.3330  1  2
4  NaN       NaN    44.4440  1  1

In [26]: df.query("city == city or (latitude == latitude and longitude == longitude)")
Out[26]:
  city  latitude  longitude  a  b
0  aaa   11.1111        NaN  1  2
1  bbb       NaN    22.2222  5  6
3  NaN   11.1111    33.3330  1  2
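
The query above relies on NaN never comparing equal to itself, so a condition like latitude == latitude is True only where the value is present. A minimal sketch of that idea, with the DataFrame rebuilt by hand from the sample data and an explicit notna() mask (one possible equivalent, not part of the original answer) for comparison:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["aaa", "bbb", np.nan, np.nan, np.nan],
    "latitude": [11.1111, np.nan, np.nan, 11.1111, np.nan],
    "longitude": [np.nan, 22.2222, np.nan, 33.333, 44.444],
    "a": [1, 5, 3, 1, 1],
    "b": [2, 6, 4, 2, 1],
})

# NaN != NaN, so a column compared with itself is False exactly where it is null.
self_equal = df["latitude"] == df["latitude"]
print(self_equal.tolist())  # [True, False, False, True, False]

# The same rule spelled out with notna(): city present, or both coordinates present.
mask = df["city"].notna() | (df["latitude"].notna() & df["longitude"].notna())
via_mask = df[mask]

via_query = df.query(
    "city == city or (latitude == latitude and longitude == longitude)"
)
print(via_mask.index.tolist())  # [0, 1, 3]
```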

If I understand the OP correctly, the row with index 4 must be dropped, because its two coordinates are not both non-null. So dropna() with how='all' won't work "properly" in this case:

In [62]: df.dropna(subset=['city','latitude','longitude'], how='all')
Out[62]:
  city  latitude  longitude  a  b
0  aaa   11.1111        NaN  1  2
1  bbb       NaN    22.2222  5  6
3  NaN   11.1111    33.3330  1  2
4  NaN       NaN    44.4440  1  1   # this row should be dropped...

Answer 3

Answered by Boud

dropna has a parameter to apply the tests only on a subset of columns:

df.dropna(axis=0, how='all', subset=['city', 'latitude', 'longitude'])

Answer 4

Answered by piRSquared

Using a boolean mask and a clever dot product (this is for @Boud)

subset = ['city', 'latitude', 'longitude']
df[df[subset].notnull().dot([2, 1, 1]).ge(2)]

  city  latitude  longitude  a  b
0  aaa   11.1111        NaN  1  2
1  bbb       NaN    22.2222  5  6
3  NaN   11.1111    33.3330  1  2
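
To see why the weights [2, 1, 1] encode the rule, here is a minimal sketch that prints the intermediate score. The DataFrame is rebuilt by hand from the sample data, and the names present and score are illustrative only. A city alone contributes 2, each coordinate contributes 1, so a score of at least 2 means "city present, or both coordinates present".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["aaa", "bbb", np.nan, np.nan, np.nan],
    "latitude": [11.1111, np.nan, np.nan, 11.1111, np.nan],
    "longitude": [np.nan, 22.2222, np.nan, 33.333, 44.444],
    "a": [1, 5, 3, 1, 1],
    "b": [2, 6, 4, 2, 1],
})

subset = ["city", "latitude", "longitude"]

# Boolean frame: True where a value is present.
present = df[subset].notna()

# Weighted sum per row: city counts 2, latitude and longitude count 1 each.
score = present.dot([2, 1, 1])
print(score.tolist())  # [3, 3, 0, 2, 1]

# Keep rows whose score reaches 2.
result = df[score.ge(2)]
print(result.index.tolist())  # [0, 1, 3]
```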

Answer 5

Answered by Jimmy C

You can perform selection by exploiting the bitwise operators.

import pandas as pd

## create example data
df = pd.DataFrame({'City': ['Gothenburg', None, None], 'Lat': [1, None, 1], 'Long': [None, 1, 1]})

## bitwise/logical operators
~df.City.isnull() | (~df.Lat.isnull() & ~df.Long.isnull())
0     True
1    False
2     True
dtype: bool

## subset using above statement
df[~df.City.isnull() | (~df.Lat.isnull() & ~df.Long.isnull())]
         City  Lat  Long
0  Gothenburg  1.0   NaN
2        None  1.0   1.0
