Python pandas: filtering out NaN from a data selection of a column of strings

Question

Asked by ccsv

Without using groupby, how would I filter out data without NaN?

Let's say I have a matrix where some customers fill in 'N/A', 'n/a' or one of its variations and others leave the field blank:

import pandas as pd
import numpy as np


df = pd.DataFrame({'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
                  'rating': [3., 4., 5., np.nan, np.nan, np.nan],
                  'name': ['John', np.nan, 'N/A', 'Graham', np.nan, np.nan]})

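# expand=False makes extract return a Series, so it can be compared element-wise with df['name']
nbs = df['name'].str.extract('^(N/A|NA|na|n/a)', expand=False)
nms = df[df['name'] != nbs]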

Output:

>>> nms
  movie    name  rating
0   thg    John       3
1   thg     NaN       4
3   mol  Graham     NaN
4   lob     NaN     NaN
5   lob     NaN     NaN

How would I filter out NaN values so I can get results to work with like this:

  movie    name  rating
0   thg    John       3
3   mol  Graham     NaN

I am guessing I need something like ~np.isnan, but the tilde does not work with strings.
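The limitation mentioned here can be reproduced with a small sketch. The array `names` and the flag `isnan_worked` are made up for illustration; the point is only that `np.isnan` is defined for numeric dtypes, while `pd.isna` and `pd.notna` also accept object arrays that mix strings and missing values.

```python
import numpy as np
import pandas as pd

# Illustrative values only: strings and one missing entry, stored as objects.
names = np.array(['John', np.nan, 'N/A'], dtype=object)

try:
    np.isnan(names)        # np.isnan is only defined for numeric dtypes
    isnan_worked = True
except TypeError:
    isnan_worked = False

# pd.isna / pd.notna accept object arrays and flag only the missing entries.
missing = pd.isna(names)
present = pd.notna(names)
print(isnan_worked, missing, present)
```

Note that the string 'N/A' is not treated as missing by `pd.isna`; only the real NaN is.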

Answer 1

Accepted answer by EdChum

Just drop them:

nms.dropna(thresh=2)

This drops every row that has fewer than two non-NaN values; thresh is the minimum number of non-NaN values a row needs in order to be kept.

You could then drop the rows where name is NaN:

In [87]:

nms
Out[87]:
  movie    name  rating
0   thg    John       3
1   thg     NaN       4
3   mol  Graham     NaN
4   lob     NaN     NaN
5   lob     NaN     NaN

[5 rows x 3 columns]
In [89]:

nms = nms.dropna(thresh=2)
In [90]:

nms[nms.name.notnull()]
Out[90]:
  movie    name  rating
0   thg    John       3
3   mol  Graham     NaN

[2 rows x 3 columns]

EDIT

Actually, looking at what you originally wanted, you can do just this without the dropna call:

nms[nms.name.notnull()]
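A self-contained sketch of the whole filter, starting from the question's DataFrame and ending with the two desired rows. The names `placeholders` and `result` and the exact list of placeholder spellings are assumptions for illustration, and `isin` is swapped in here for the question's `str.extract` comparison because it does not depend on regex matching.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
                   'rating': [3., 4., 5., np.nan, np.nan, np.nan],
                   'name': ['John', np.nan, 'N/A', 'Graham', np.nan, np.nan]})

# Assumed list of "not available" spellings; extend it as needed.
placeholders = ['N/A', 'NA', 'na', 'n/a']

# Keep rows whose name is present and is not one of the placeholders.
# isin() returns False for NaN, so notnull() is still needed.
result = df[df['name'].notnull() & ~df['name'].isin(placeholders)]
print(result)
```

Only the rows for John and Graham remain; the row with a missing rating is kept because only the name column is tested.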

UPDATE

Looking at this question 3 years later, there was a mistake in my original explanation: the thresh argument keeps rows with at least n non-NaN values, so in fact the output should be:

In [4]:
nms.dropna(thresh=2)

Out[4]:
  movie    name  rating
0   thg    John     3.0
1   thg     NaN     4.0
3   mol  Graham     NaN

It's possible that I was either mistaken 3 years ago or that the version of pandas I was running had a bug; both scenarios are entirely possible.

Answer 2

Answered by Gil Baggio

Simplest of all solutions:

filtered_df = df[df['name'].notnull()]

This keeps only the rows that do not have NaN in the 'name' column.

For multiple columns:

filtered_df = df[df[['name', 'country', 'region']].notnull().all(axis=1)]
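A minimal runnable sketch of the multi-column case. The question's DataFrame has no 'country' or 'region' column, so the data below is entirely hypothetical and exists only to show how the mask behaves.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'country' and 'region' are not in the question's frame.
df = pd.DataFrame({'name': ['John', np.nan, 'Graham', 'Eric'],
                   'country': ['UK', 'UK', np.nan, 'US'],
                   'region': ['North', 'South', 'East', np.nan]})

cols = ['name', 'country', 'region']

# notnull() gives a boolean frame; all(axis=1) is True for a row only
# when every one of the listed columns is present in that row.
filtered_df = df[df[cols].notnull().all(axis=1)]
print(filtered_df)
```

Each of the last three rows is missing exactly one of the listed columns, so only the first row survives.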

Answer 3

Answered by Bashar Mohammad

df = pd.DataFrame({'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
                   'rating': [3., 4., 5., np.nan, np.nan, np.nan],
                   'name': ['John', 'James', np.nan, np.nan, np.nan, np.nan]})

# Drop, column by column, every row that has a null in that column
for col in df.columns:
    df = df[~pd.isnull(df[col])]
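As a hedged side note, the loop above ends up keeping only rows with no null in any column, which appears to be what `DataFrame.dropna()` with default arguments does. The sketch below compares the two on the same data; `looped` and `dropped` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
                   'rating': [3., 4., 5., np.nan, np.nan, np.nan],
                   'name': ['John', 'James', np.nan, np.nan, np.nan, np.nan]})

# The loop from the answer, applied to a copy so df stays intact.
looped = df.copy()
for col in looped.columns:
    looped = looped[~pd.isnull(looped[col])]

# dropna() with defaults drops any row containing at least one null.
dropped = df.dropna()

print(looped.equals(dropped))
```

Both approaches leave the rows for John and James on this data.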

Answer 4

Answered by JacoSolari

df.dropna(subset=['columnName1', 'columnName2'])
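Applied to the question's DataFrame, the call might look like the sketch below. 'columnName1' and 'columnName2' are replaced with the question's real column names; `by_name` and `by_both` are illustrative variable names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
                   'rating': [3., 4., 5., np.nan, np.nan, np.nan],
                   'name': ['John', np.nan, 'N/A', 'Graham', np.nan, np.nan]})

# Drop rows where 'name' is NaN; other columns are not checked.
by_name = df.dropna(subset=['name'])

# Drop rows where either 'name' or 'rating' is NaN.
by_both = df.dropna(subset=['name', 'rating'])

print(by_name)
print(by_both)
```

The string 'N/A' is an ordinary value, not NaN, so that row is not removed by dropna and would need a separate filter.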
