pandas python: Remove all rows that contain a string from a pandas DataFrame

Question

Asked by natsuki_2002

I have a pandas DataFrame called data, and I want to remove every row that contains a string in any column. For example, below the 'gdp' column has a string at index 3, and the 'cap' column has one at index 1.

data =

    y  gdp  cap
0   1    2    5
1   2    3    ab
2   8    7    2
3   3    bc   7
4   6    7    7
5   4    8    3
...

I've been trying something like the script below, because I won't know ahead of time what exp_list contains. Unfortunately, "data.var_name" raises this error: 'DataFrame' object has no attribute 'var_name'. I also don't know ahead of time what the strings will be, so is there any way to generalize that as well?

exp_list = ['gdp', 'cap']

for var_name in exp_list:
    data = data[data.var_name != 'ab']

Answer 1

Accepted answer by Acorbe

You can apply a function that tests your DataFrame row by row for the presence of strings. For example, say that df is your DataFrame:

 rows_with_strings = df.apply(
     # str on Python 3; use basestring on Python 2
     lambda row: any(isinstance(e, str) for e in row),
     axis=1)

This produces a boolean mask for your DataFrame that marks the rows containing at least one string. You can then select the rows without strings by inverting the mask:

 df_with_no_strings = df[~rows_with_strings]

Example:

 import pandas as pd

 a = [[1, 2], ['a', 2], [3, 4], [7, 'd']]
 df = pd.DataFrame(a, columns=['a', 'b'])


 df 
   a  b
0  1  2
1  a  2
2  3  4
3  7  d

select = df.apply(lambda r: any(isinstance(e, str) for e in r), axis=1)  # basestring on Python 2

df[~select]

    a  b
 0  1  2
 2  3  4
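
To tie this back to the loop in the question, here is a minimal sketch of the same isinstance test applied only to the columns named in exp_list. The sample values are copied from the question's table. It assumes Python 3 (hence str) and assumes the stray values really are Python strings sitting next to numbers in the same column. If the frame came from a CSV file, a column with one bad value is typically read entirely as strings, and this test would then flag every row.

```python
import pandas as pd

# Sample frame mirroring the table in the question.
data = pd.DataFrame({
    "y":   [1, 2, 8, 3, 6, 4],
    "gdp": [2, 3, 7, "bc", 7, 8],
    "cap": [5, "ab", 2, 7, 7, 3],
})
exp_list = ["gdp", "cap"]

# data[var_name] looks up the column named by the variable, whereas
# data.var_name looks for a column literally called "var_name".
for var_name in exp_list:
    is_string = data[var_name].map(lambda e: isinstance(e, str))
    data = data[~is_string]

print(data)
```

Because the test checks the type rather than a particular value such as 'ab', the strings do not need to be known ahead of time.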

Answer 2

Answered by JoeCondron

You can take the transpose, call `convert_objects`, which works column-wise, and then compare the data types to get a boolean key like this:

df[df.T.convert_objects().dtypes != object]
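
Note that convert_objects was deprecated and then removed in pandas 1.0, so the line above fails on current versions. A rough equivalent, offered as a sketch rather than as the answerer's own method, is to coerce every column with pd.to_numeric and keep the rows where nothing failed to parse. The behaviour is not identical to the isinstance test: numeric-looking strings such as '5' are kept, and rows that already contain NaN are dropped.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], ["a", 2], [3, 4], [7, "d"]], columns=["a", "b"])

# Coerce each column to numeric; values that cannot be parsed become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# True for rows in which every value was parsed as a number.
all_numeric = numeric.notna().all(axis=1)

# Select from the original frame so the surviving values are left unchanged.
clean = df[all_numeric]
print(clean)
```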
