python: remove all rows in pandas dataframe that contain a string
Original question: http://stackoverflow.com/questions/19860389/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by natsuki_2002
I've got a pandas dataframe called data and I want to remove all rows that contain a string in any column. For example, below we see the 'gdp' column has a string at index 3, and 'cap' at index 1.
data =
   y gdp cap
0  1   2   5
1  2   3  ab
2  8   7   2
3  3  bc   7
4  6   7   7
5  4   8   3
...
I've been trying to use something like the script below, because I will not know ahead of time what exp_list contains. Unfortunately, "data.var_name" raises this error: 'DataFrame' object has no attribute 'var_name'. I also won't know ahead of time what the strings will be, so is there any way to generalize that as well?
exp_list = ['gdp', 'cap']
for var_name in exp_list:
    data = data[data.var_name != 'ab']
Accepted answer by Acorbe
You can apply a function that tests each row of your DataFrame for the presence of strings. For example, if df is your DataFrame:
rows_with_strings = df.apply(
    lambda row: any(isinstance(e, str) for e in row),  # use basestring on Python 2
    axis=1)
This produces a boolean mask for your DataFrame indicating which rows contain at least one string. You can then select the rows without strings by inverting the mask:
df_with_no_strings = df[~rows_with_strings]
Example:
import pandas as pd

a = [[1, 2], ['a', 2], [3, 4], [7, 'd']]
df = pd.DataFrame(a, columns=['a', 'b'])
df
   a  b
0  1  2
1  a  2
2  3  4
3  7  d
# On Python 2, test against basestring instead of str
select = df.apply(lambda r: any(isinstance(e, str) for e in r), axis=1)
df[~select]
   a  b
0  1  2
2  3  4
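As a rough sketch of how this mask could be applied to the data from the question: the frame below contains only the six rows shown there (the elided rows are not reconstructed), and the helper name drop_rows_with_strings is hypothetical, introduced here purely for illustration. It assumes Python 3, where str takes the place of basestring.

```python
import pandas as pd

# Only the six sample rows visible in the question; columns that mix
# numbers and strings end up with dtype object.
data = pd.DataFrame({
    "y":   [1, 2, 8, 3, 6, 4],
    "gdp": [2, 3, 7, "bc", 7, 8],
    "cap": [5, "ab", 2, 7, 7, 3],
})

def drop_rows_with_strings(df):
    """Hypothetical helper: keep only rows in which no cell is a str."""
    has_string = df.apply(
        lambda row: any(isinstance(e, str) for e in row), axis=1
    )
    return df[~has_string]

cleaned = drop_rows_with_strings(data)
print(cleaned)
```

Because the test looks at every cell, there is no need to know the column names or the offending strings in advance. If a loop over column names is still wanted, bracket indexing such as data[var_name] looks up the column named by the variable, whereas data.var_name looks for a column literally called var_name.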
Answer by JoeCondron
You can take the transpose, call `convert_objects`, which works column-wise, and then compare the data types to get a boolean key like this:
df[df.T.convert_objects().dtypes != object]
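As far as I know, convert_objects was deprecated and is no longer available in pandas 1.0 and later, so the line above may fail on a current install. A minimal sketch of a comparable idea, swapping in pd.to_numeric with errors="coerce" (this is a substitute technique, not the method from this answer), using the sample frame from the accepted answer:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], ['a', 2], [3, 4], [7, 'd']], columns=['a', 'b'])

# Coerce each column to numeric; cells that cannot be parsed become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Keep only the rows in which every cell survived the coercion.
result = df[numeric.notna().all(axis=1)]
print(result)
```

Two caveats with this sketch: rows that already contain NaN are dropped as well, and numeric-looking strings such as "3" are treated as numbers rather than strings, which may or may not be what you want.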

