Select rows from a DataFrame based on the type of the object (i.e. str)
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute the content to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/39275533/
Asked by wolframalpha
Say there's a DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'A': [1, 2, 'Three', 4],
...     'B': [1, 'Two', 3, 4]})
>>> df
       A    B
0      1    1
1      2  Two
2  Three    3
3      4    4
I want to select the rows where the value in a particular column is of type str.
For example, I want to select the rows where the value in column A is a str, so it should print something like:
A B
2 Three 3
The intuitive code would be something like:
df[type(df.A) == str]
Which obviously doesn't work!
Thanks, please help!
Answer by DrTRD
This works:
df[df['A'].apply(lambda x: isinstance(x, str))]
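As a hedged illustration of why the question's attempt fails while this one works, here is a minimal sketch that rebuilds the DataFrame from the question. The variable names whole_column_check, mask, and result are introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 'Three', 4],
                   'B': [1, 'Two', 3, 4]})

# type(df.A) is the type of the container (a pandas Series), so comparing it
# to str gives one False for the whole column rather than a per-row mask.
whole_column_check = type(df.A) == str

# Applying isinstance to each element gives one boolean per row,
# which is what boolean indexing needs.
mask = df['A'].apply(lambda x: isinstance(x, str))
result = df[mask]
print(result)
```

The key difference is that the check runs once per element instead of once on the Series object.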
Answer by Ami Tavory
You can do something similar to what you're asking with:
In [14]: df[pd.to_numeric(df.A, errors='coerce').isnull()]
Out[14]:
A B
2 Three 3
Why only similar? Because pandas assigns each column a single dtype. Even though you constructed the DataFrame from heterogeneous values, each column falls back to the lowest common denominator, which here is object:
In [16]: df.A.dtype
Out[16]: dtype('O')
Consequently, the dtype cannot tell you which rows hold which type: the whole column reports the same dtype. What you can do is try to convert the entries to numbers and check where the conversion failed (this is what the code above does).
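The following sketch, which assumes the same DataFrame as in the question, shows the column-level dtype next to the types of the individual entries, and then the conversion check described above. The names column_dtype, element_types, converted, and failed are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 'Three', 4],
                   'B': [1, 'Two', 3, 4]})

# The column as a whole reports a single dtype (object for mixed values).
column_dtype = df['A'].dtype

# Each stored entry is still an ordinary Python object with its own type.
element_types = [type(v).__name__ for v in df['A']]

# errors='coerce' replaces values that cannot be parsed as numbers with NaN,
# so isnull() marks the rows where the conversion failed.
converted = pd.to_numeric(df['A'], errors='coerce')
failed = converted.isnull()
print(df[failed])
```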
Answer by jpp
It's generally a bad idea to use a series to hold mixed numeric and non-numeric types. This will cause your series to have dtype object, which is nothing more than a sequence of pointers, much like list. Indeed, many operations on such series can be processed more efficiently with list.
With this disclaimer, you can use Boolean indexing via a list comprehension:
res = df[[isinstance(value, str) for value in df['A']]]
print(res)
A B
2 Three 3
The equivalent is possible with pd.Series.apply, but this is no more than a thinly veiled loop and may be slower than the list comprehension:
res = df[df['A'].apply(lambda x: isinstance(x, str))]
If you are certain all non-numeric values must be strings, then you can convert to numeric and look for nulls, i.e. values that cannot be converted:
res = df[pd.to_numeric(df['A'], errors='coerce').isnull()]
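To illustrate why the certainty condition above matters, here is a small sketch on a hypothetical series (not from the question) that contains a numeric string and a missing value. In this case the isinstance check and the to_numeric check give different answers.

```python
import pandas as pd

# Hypothetical data: '5' is a str that parses as a number,
# and None is neither a str nor a number.
s = pd.Series([1, '5', 'Three', None], dtype=object)

# True where the entry is actually a Python str.
is_str = [isinstance(v, str) for v in s]

# True where conversion to a number failed (or the value was missing).
not_numeric = pd.to_numeric(s, errors='coerce').isnull().tolist()

print(is_str)
print(not_numeric)
```

The two masks agree only on the entries that are plain numbers or non-numeric strings, so the conversion approach is best reserved for columns where those are the only possibilities.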