Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/27020707/

Time: 2020-09-13 22:40:59  Source: igfitidea

Pandas: get second character of the string, from every row

python string pandas dataframe character

Asked by jjjayn

I have an array of data in Pandas and I'm trying to print the second character of every string in col1. I can't figure out how to do it. I can easily print the second character of each string individually, for example:

array.col1[0][1]

However I'd like to print the second character from every row, so there would be a "list" of second characters.

I've tried

array.col1[0:][1]

but that just returns the whole second row of col1.

Any advice?

Answered by Alex Riley

You can use the str accessor to reach the string methods for the column/Series and then slice the strings as normal:

>>> import pandas as pd
>>> df = pd.DataFrame(['foo', 'bar', 'baz'], columns=['col1'])
>>> df
  col1
0  foo
1  bar
2  baz

>>> df.col1.str[1]
0    o
1    a
2    a

The str attribute also gives you access to a variety of very useful vectorised string methods, many of which are instantly recognisable from Python's own assortment of built-in string methods (split, replace, etc.).
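
As a minimal sketch of the above (the sample values and variable names are illustrative, not taken from the original question), the snippet below slices with .str[1], turns the result into a plain Python list, and calls two of the vectorised methods. Strings that are too short, and missing values, come back as NaN instead of raising an error.

```python
import pandas as pd

# Illustrative data: includes a one-character string and a missing value.
df = pd.DataFrame({'col1': ['foo', 'bar', 'baz', 'x', None]})

# Second character of every row; too-short or missing entries become NaN.
second = df['col1'].str[1]

# A plain Python list of the second characters, skipping the NaN entries.
second_list = second.dropna().tolist()
print(second_list)  # ['o', 'a', 'a']

# Other vectorised string methods are reached through the same accessor.
upper = df['col1'].str.upper()
replaced = df['col1'].str.replace('a', '4', regex=False)
```

If a list is all that is needed and the data has no missing values, df['col1'].str[1].tolist() is enough.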

Answered by jpp

As of Pandas 0.23.0, if your data is clean, you will find that Pandas "vectorised" string methods via pd.Series.str generally underperform simple iteration via a list comprehension or map.

For example:

from operator import itemgetter

import pandas as pd

df = pd.DataFrame(['foo', 'bar', 'baz'], columns=['col1'])

df = pd.concat([df]*100000, ignore_index=True)

%timeit pd.Series([i[1] for i in df['col1']])            # 33.7 ms
%timeit pd.Series(list(map(itemgetter(1), df['col1'])))  # 42.2 ms
%timeit df['col1'].str[1]                                # 214 ms
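
The %timeit lines above are IPython magics. Outside IPython, a comparable measurement can be sketched with the standard-library timeit module. The helper function names below are illustrative, and absolute timings depend on the machine and the Pandas version, so no figures are quoted here.

```python
import timeit
from operator import itemgetter

import pandas as pd

df = pd.DataFrame(['foo', 'bar', 'baz'], columns=['col1'])
df = pd.concat([df] * 100000, ignore_index=True)

def via_comprehension():
    return pd.Series([s[1] for s in df['col1']])

def via_map():
    return pd.Series(list(map(itemgetter(1), df['col1'])))

def via_str_accessor():
    return df['col1'].str[1]

# Sanity check: on clean data all three approaches give the same values.
expected = via_str_accessor().tolist()
assert via_comprehension().tolist() == expected
assert via_map().tolist() == expected

for fn in (via_comprehension, via_map, via_str_accessor):
    # Best of 3 repeats, 3 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f'{fn.__name__}: {best * 1000:.1f} ms')
```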

A special case is when you have a large number of repeated strings, in which case you can benefit from converting your series to a categorical:

df['col1'] = df['col1'].astype('category')

%timeit df['col1'].str[1]  # 4.9 ms
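
A small sketch of why the categorical route can help, assuming (as the Pandas documentation describes for categorical data) that the string operation is applied to the distinct categories instead of to every row. The variable names are illustrative. The result comes back as an ordinary Series, not a categorical, and the values match the non-categorical version.

```python
import pandas as pd

df = pd.DataFrame(['foo', 'bar', 'baz'], columns=['col1'])
df = pd.concat([df] * 1000, ignore_index=True)

plain = df['col1'].str[1]

# Only three distinct strings exist, so the categorical has three categories.
cat = df['col1'].astype('category')
print(len(cat.cat.categories))  # 3

from_cat = cat.str[1]

# Same values either way; the result is not itself categorical.
assert from_cat.tolist() == plain.tolist()
assert not isinstance(from_cat.dtype, pd.CategoricalDtype)
```

The gain shrinks as the number of distinct strings approaches the number of rows, so this is worth trying only when values repeat heavily.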