Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/17957890/

Time: 2020-08-19 09:34:50  Source: igfitidea

pandas select from Dataframe using startswith

python numpy pandas

Asked by dartdog

This works (using Pandas 12 dev)

table2 = table[table['SUBDIVISION'] == 'INVERNESS']

Then I realized I needed to select the field using "starts with", since the exact match was missing a bunch of rows. So, following the pandas docs as closely as I could, I tried:

criteria = table['SUBDIVISION'].map(lambda x: x.startswith('INVERNESS'))
table2 = table[criteria]

And got AttributeError: 'float' object has no attribute 'startswith'

So I tried an alternate syntax with the same result

table[[x.startswith('INVERNESS') for x in table['SUBDIVISION']]]

Reference: http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing (Section 4): List comprehensions and the map method of Series can also be used to produce more complex criteria:

What am I missing?

Accepted answer by Andy Hayden

You can use the str.startswith Series method to give more consistent results:

In [11]: s = pd.Series(['a', 'ab', 'c', 11, np.nan])

In [12]: s
Out[12]:
0      a
1     ab
2      c
3     11
4    NaN
dtype: object

In [13]: s.str.startswith('a', na=False)
Out[13]:
0     True
1     True
2    False
3    False
4    False
dtype: bool

and the boolean indexing will work just fine (I prefer to use loc, but it works just the same without):

In [14]: s.loc[s.str.startswith('a', na=False)]
Out[14]:
0     a
1    ab
dtype: object

It looks like at least one of the elements in your Series/column is a float, which doesn't have a startswith method, hence the AttributeError. The list comprehension should raise the same error.
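
A minimal sketch of that failure, assuming a made-up table (the values below are illustrative, not the asker's data). A missing value in an object column is stored as the float NaN, so the plain lambda fails on it, while an isinstance guard treats non-strings as non-matches:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value in an otherwise string column.
table = pd.DataFrame(
    {"SUBDIVISION": pd.Series(
        ["INVERNESS", "INVERNESS POINT", "OAKS", np.nan], dtype=object)}
)

# The missing value is a plain float, which has no startswith method.
missing_type = type(table["SUBDIVISION"].iloc[3])

# The unguarded lambda from the question raises on that element.
try:
    table["SUBDIVISION"].map(lambda x: x.startswith("INVERNESS"))
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

# Guarded version: anything that is not a str counts as "no match".
criteria = table["SUBDIVISION"].map(
    lambda x: isinstance(x, str) and x.startswith("INVERNESS")
)
table2 = table[criteria]
```

The guard is only one possible workaround; the str.startswith call with na=False shown earlier is usually shorter.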

Answer by Vinoj John Hosan

To retrieve all the rows which start with the required string:

dataFrameOut = dataFrame[dataFrame['column name'].str.match('string')]

To retrieve all the rows which contain the required string:

dataFrameOut = dataFrame[dataFrame['column name'].str.contains('string')]
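
One caveat worth sketching: str.match and str.contains treat the pattern as a regular expression by default, so a prefix containing characters such as "." can match more than intended. The example below uses a hypothetical column name and made-up values, and compares the regex behaviour with literal matching via re.escape, str.startswith, and regex=False:

```python
import re
import pandas as pd

# Hypothetical column; the values are made up for illustration.
df = pd.DataFrame(
    {"name": pd.Series(["a.b", "axb", "xa.b", None], dtype=object)}
)

prefix = "a."

# str.match anchors at the start but reads "." as "any character".
regex_prefix = df["name"].str.match(prefix, na=False)

# Escaping the pattern makes str.match behave like a literal prefix test.
literal_prefix = df["name"].str.match(re.escape(prefix), na=False)

# str.startswith always compares literally.
starts = df["name"].str.startswith(prefix, na=False)

# str.contains with regex=False searches for the literal substring anywhere.
contains_literal = df["name"].str.contains(prefix, regex=False, na=False)

rows_with_prefix = df[starts]
```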

Answer by AleAve81

You can use apply to easily apply any string matching function to your column elementwise.

table2 = table[table['SUBDIVISION'].apply(lambda x: x.startswith('INVERNESS'))]

This assumes that your "SUBDIVISION" column is of the correct type (string).

Edit: fixed missing parenthesis
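
A rough sketch of how the "correct type" assumption could be checked before using apply. The table contents here are hypothetical, and filtering out non-string rows first is just one way to handle them:

```python
import numpy as np
import pandas as pd

# Hypothetical data with two non-string entries (a NaN and an int).
table = pd.DataFrame(
    {"SUBDIVISION": pd.Series(["INVERNESS", "OAKS", np.nan, 42], dtype=object)}
)

# Flag which elements are real Python strings.
is_str = table["SUBDIVISION"].apply(lambda x: isinstance(x, str))
all_strings = bool(is_str.all())

# Rows that would break the plain startswith lambda.
offending = table[~is_str]

# Apply the lambda only to rows known to hold strings.
strings_only = table[is_str]
table2 = strings_only[
    strings_only["SUBDIVISION"].apply(lambda x: x.startswith("INVERNESS"))
]
```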

Answer by Saurabh

Using startswith for a particular column value

df = df.loc[df["SUBDIVISION"].str.startswith('INVERNESS', na=False)]
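
To illustrate why na=False matters here, a small sketch on an invented DataFrame (the "LOT" column and all values are assumptions). Without na=False, the mask for an object column keeps a missing entry where the value was NaN, which is generally not usable as a boolean indexer; with na=False the mask is plain booleans:

```python
import numpy as np
import pandas as pd

# Hypothetical table; "LOT" is an invented second column.
df = pd.DataFrame(
    {
        "SUBDIVISION": pd.Series(
            ["INVERNESS", "INVERNESS HILLS", "OAKS", np.nan], dtype=object
        ),
        "LOT": [1, 2, 3, 4],
    }
)

# Default behaviour: the missing value stays missing in the mask.
mask_default = df["SUBDIVISION"].str.startswith("INVERNESS")
has_missing = bool(mask_default.isna().any())

# na=False turns missing values into False, giving a clean boolean mask.
mask = df["SUBDIVISION"].str.startswith("INVERNESS", na=False)
result = df.loc[mask]
```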