pandas: select rows from a DataFrame based on the type of the object (i.e. str)

Question

Asked by wolframalpha

Say there is a DataFrame:

>>> df = pd.DataFrame({
...                 'A':[1,2,'Three',4],
...                 'B':[1,'Two',3,4]})
>>> df
       A    B
0      1    1
1      2  Two
2  Three    3
3      4    4

I want to select the rows where the value in a particular column is of type str.

For example, I want to select the rows where the value in column A is a str, so it should print something like:

   A      B
2  Three  3

The intuitive code would be something like:

df[type(df.A) == str]

Which obviously doesn't work!

Thanks, please help!

Answer 1

Answered by DrTRD

This works:

df[df['A'].apply(lambda x: isinstance(x, str))]
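
For reference, here is a self-contained sketch of this approach on the question's DataFrame. The names `container_is_str`, `mask` and `result` are only illustrative. It also shows why the attempt in the question fails: `type(df.A)` is the type of the whole Series, not of its elements.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 'Three', 4],
                   'B': [1, 'Two', 3, 4]})

# type(df.A) describes the container (a Series), so this comparison is a single False
container_is_str = type(df.A) == str

# Element-wise check: True where the value stored in column A is a Python str
mask = df['A'].apply(lambda x: isinstance(x, str))

# Boolean indexing keeps only the rows where the mask is True
result = df[mask]
print(result)
```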

Answer 2

Answered by Ami Tavory

You can do something similar to what you're asking with

In [14]: df[pd.to_numeric(df.A, errors='coerce').isnull()]
Out[14]: 
       A  B
2  Three  3

Why only similar? Because pandas stores data in homogeneous columns: every column has a single dtype. Even though you constructed the DataFrame from heterogeneous values, each column was given the lowest-common-denominator dtype that can hold all of its values, which here is object:

In [16]: df.A.dtype
Out[16]: dtype('O')

Consequently, you can't ask pandas which rows of a column have which dtype: they all share the column's dtype. (The individual Python objects in an object column do keep their own types, which is what an isinstance check inspects.) What you can do is try to convert the entries to numbers and check where the conversion failed (this is what the code above does).
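
As a rough illustration of the difference between the column's dtype and the types of the stored values, the sketch below rebuilds the question's DataFrame and inspects both. The variable names are hypothetical and the snippet is only meant to make the explanation concrete.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 'Three', 4],
                   'B': [1, 'Two', 3, 4]})

# The column as a whole has one dtype: object
column_dtype = df['A'].dtype

# Each stored value is still an ordinary Python object with its own type
element_is_str = [isinstance(value, str) for value in df['A']]

# Converting to numbers fails only for 'Three', which is coerced to NaN
converted = pd.to_numeric(df['A'], errors='coerce')
failed = converted.isnull()

print(column_dtype)
print(element_is_str)
print(list(failed))
```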

Answer 3

Answered by jpp

It's generally a bad idea to use a Series to hold mixed numeric and non-numeric types. This will cause your Series to have dtype object, which is nothing more than a sequence of pointers, much like a list. Indeed, many operations on such a Series can be processed more efficiently with a list.

With this disclaimer, you can use Boolean indexing via a list comprehension:

res = df[[isinstance(value, str) for value in df['A']]]

print(res)

       A  B
2  Three  3

The equivalent is possible with pd.Series.apply, but this is no more than a thinly veiled loop and may be slower than the list comprehension:

res = df[df['A'].apply(lambda x: isinstance(x, str))]

If you are certain all non-numeric values must be strings, then you can convert to numeric and look for nulls, i.e. values that cannot be converted:

res = df[pd.to_numeric(df['A'], errors='coerce').isnull()]
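
The two approaches are not interchangeable in general. The sketch below uses a hypothetical Series (not taken from the question) containing a numeric string and a missing value to show where the isinstance check and the to_numeric check disagree.

```python
import pandas as pd

# Hypothetical column: an int, a numeric string, a word, and a missing value
s = pd.Series([1, '5', 'Three', float('nan')], dtype=object)

# True only for values that are actual Python strings
is_str = [isinstance(value, str) for value in s]

# True for values that cannot be converted to a number, and for values already missing
not_numeric = list(pd.to_numeric(s, errors='coerce').isnull())

print(is_str)       # '5' is a str, NaN is not
print(not_numeric)  # '5' converts to 5.0, NaN stays null
```

So the conversion-based check is a reasonable shortcut only when the column holds no numeric-looking strings and no missing values.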
