pandas: add a DataFrame column using len() of another column's values

Question

Asked by halycos

I'm having a problem trying to get a character count column of the string values in another column, and haven't figured out how to do it efficiently.

for index in range(len(df)):
    df['char_length'][index] = len(df['string'][index])

This apparently involves first creating a column of nulls and then rewriting it, and it takes a really long time on my data set. So what's the most effective way of getting something like

'string'     'char_length'
abcd          4
abcde         5

I've checked around quite a bit, but I haven't been able to figure it out.

Answer 1

Answered by Alex Riley

Pandas has a vectorised string method for this: str.len(). To create the new column you can write:

df['char_length'] = df['string'].str.len()

For example:

>>> df
  string
0   abcd
1  abcde

>>> df['char_length'] = df['string'].str.len()
>>> df
  string  char_length
0   abcd            4
1  abcde            5

This should be considerably faster than looping over the DataFrame with a Python for loop.
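The speed claim can be checked on your own data with a rough sketch like the one below. The data set, its size, and the helper names with_loop and with_str_len are assumptions made for illustration, not part of the original answer; actual timings depend on your machine and pandas version, so none are quoted here.

```python
import timeit

import pandas as pd

# Hypothetical data set: 20,000 short strings in a column named 'string'.
df = pd.DataFrame({"string": ["abcd", "abcde"] * 10_000})


def with_loop(frame):
    # Row-by-row Python loop, similar in spirit to the approach in the question.
    return [len(frame["string"].iloc[i]) for i in range(len(frame))]


def with_str_len(frame):
    # Vectorised string accessor from the answer.
    return frame["string"].str.len()


loop_result = with_loop(df)
vectorised_result = with_str_len(df)

# Both approaches should produce the same lengths.
same = vectorised_result.tolist() == loop_result

# Time a single run of each approach.
loop_seconds = timeit.timeit(lambda: with_loop(df), number=1)
vectorised_seconds = timeit.timeit(lambda: with_str_len(df), number=1)
print(f"loop: {loop_seconds:.4f}s, str.len(): {vectorised_seconds:.4f}s")
```

A single run is only a rough measurement; raising number in timeit.timeit gives a steadier comparison.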

Many other familiar string methods from Python are also available in Pandas through the .str accessor. For example, lower (for converting to lowercase letters), count (for counting occurrences of a particular substring), and replace (for swapping one substring with another).
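As a small illustration of those three methods, here is a sketch on a made-up Series (the data and variable names are assumptions). Note that str.count treats its argument as a regular expression, and regex=False is passed to str.replace so the pattern is matched literally.

```python
import pandas as pd

# Hypothetical sample data.
s = pd.Series(["Abcd", "abcde abc"])

# Convert every value to lowercase.
lowered = s.str.lower()

# Count occurrences of a pattern (interpreted as a regular expression).
# The match is case-sensitive, so "Abcd" contains no "abc".
counts = s.str.count("abc")

# Swap one literal substring for another.
replaced = s.str.replace("abc", "xyz", regex=False)

print(lowered.tolist())
print(counts.tolist())
print(replaced.tolist())
```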

Answer 2

Answered by Zero

Here's one way to do it.

In [3]: df
Out[3]:
  string
0   abcd
1  abcde

In [4]: df['len'] = df['string'].str.len()

In [5]: df
Out[5]:
  string  len
0   abcd    4
1  abcde    5
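One detail that may matter on real data, and that neither answer shows, is how str.len() treats missing values. The sketch below assumes a made-up frame with one missing entry; the char_length_int column name is hypothetical. Missing values stay missing rather than raising an error, and converting to the nullable Int64 dtype is one way to keep whole-number counts alongside them.

```python
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({"string": ["abcd", "abcde", None]})

# Missing values propagate as missing instead of raising an error,
# which a plain len() call on None would do.
df["char_length"] = df["string"].str.len()

# Optional: the nullable integer dtype stores whole-number counts
# together with a missing marker.
df["char_length_int"] = df["char_length"].astype("Int64")

print(df)
```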
