Python: add a string prefix to each value in a string column using Pandas

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/20025882/

Date: 2020-08-18 19:18:11  Source: igfitidea

add a string prefix to each value in a string column using Pandas

Tags: python, string, pandas, dataframe

Asked by TheChymera

I would like to append a string to the start of each value in a said column of a pandas dataframe (elegantly). I already figured out how to kind-of do this and I am currently using:


df.loc[df['col'] != False, 'col'] = 'str' + df.loc[df['col'] != False, 'col']

This seems one hell of an inelegant thing to do - do you know any other way (which maybe also adds the character to rows where that column is 0 or NaN)?


In case this is yet unclear, I would like to turn:


    col 
1     a
2     0

into:


       col 
1     stra
2     str0

Accepted answer by Roman Pekar

df['col'] = 'str' + df['col'].astype(str)

Example:


>>> import pandas as pd
>>> df = pd.DataFrame({'col':['a',0]})
>>> df
  col
0   a
1   0
>>> df['col'] = 'str' + df['col'].astype(str)
>>> df
    col
0  stra
1  str0
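
The question also mentions NaN. Depending on the pandas version, astype(str) may turn a missing value into the literal text 'nan', so the prefix would produce 'strnan'. The following is a minimal sketch, not part of the original answer, of one way to prefix every real value (including 0) while leaving missing values missing. The three-row sample frame is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data (assumed for illustration): a string, a zero and a missing value.
df = pd.DataFrame({'col': ['a', 0, np.nan]})

# Prefix everything, then keep the result only where the original value
# was not missing; Series.where puts NaN back in the other rows.
prefixed = 'str' + df['col'].astype(str)
df['col'] = prefixed.where(df['col'].notna())

print(df)
```

If the missing rows should be prefixed as well, dropping the where step is enough.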

Answer by Cleb

As an alternative, you can also use apply combined with format (or, better, with f-strings), which I find slightly more readable if, for example, you also want to add a suffix or manipulate the element itself:


df = pd.DataFrame({'col':['a', 0]})

df['col'] = df['col'].apply(lambda x: "{}{}".format('str', x))

which also yields the desired output:


    col
0  stra
1  str0

If you are using Python 3.6+, you can also use f-strings:


df['col'] = df['col'].apply(lambda x: f"str{x}")

yielding the same output.
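
To illustrate the readability point about suffixes and element manipulation, here is a small sketch in the same style. The '_end' suffix and the upper-casing step are made up for the example; they are not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col': ['a', 0]})

# Prefix, transform and suffix each element in one f-string.
# str(x) is needed because the column mixes strings and integers.
df['col'] = df['col'].apply(lambda x: f"str{str(x).upper()}_end")

print(df)
#         col
# 0  strA_end
# 1  str0_end
```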


The f-string version is almost as fast as @RomanPekar's solution (measured with Python 3.6.4):


df = pd.DataFrame({'col':['a', 0]*200000})

%timeit df['col'].apply(lambda x: f"str{x}")
117 ms ± 451 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit 'str' + df['col'].astype(str)
112 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using format, however, is noticeably slower, taking roughly 1.6 times as long in this test:


%timeit df['col'].apply(lambda x: "{}{}".format('str', x))
185 ms ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
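
The %timeit lines above only work in IPython or Jupyter. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module. The smaller frame size and the repeat counts are arbitrary choices for a quick run, and absolute numbers will differ by machine and by pandas and Python version.

```python
import timeit

import pandas as pd

# Smaller than the 400,000 rows used above so the script finishes quickly.
df = pd.DataFrame({'col': ['a', 0] * 20000})

candidates = {
    'astype + concat': lambda: 'str' + df['col'].astype(str),
    'apply + f-string': lambda: df['col'].apply(lambda x: f"str{x}"),
    'apply + format': lambda: df['col'].apply(lambda x: "{}{}".format('str', x)),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 5 calls each, reported as seconds per call.
    timings[name] = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print(f"{name}: {timings[name] * 1000:.2f} ms per call")
```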

Answer by Vasyl Vaskivskyi

If you load your table file with dtype=str,
or convert the column to strings with df['a'] = df['a'].astype(str),
then you can use this approach:


df['a'] = 'col' + df['a'].str[:]

This approach lets you prepend to, append to, and take substrings of the string values in df.
Works on Pandas v0.23.4 and v0.24.1; I don't know about earlier versions.
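
A short sketch of the three operations mentioned here, assuming the column has already been converted to strings. The sample values and the '_x' suffix are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['apple', 7, 'kiwi']})
df['a'] = df['a'].astype(str)    # make every value a string first

prepended = 'col' + df['a']      # prefix each value
appended = df['a'] + '_x'        # suffix each value
first_two = df['a'].str[:2]      # slice each value via the .str accessor

print(prepended.tolist())  # ['colapple', 'col7', 'colkiwi']
print(appended.tolist())   # ['apple_x', '7_x', 'kiwi_x']
print(first_two.tolist())  # ['ap', '7', 'ki']
```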


Answer by Lukas

Another solution with .loc:


df = pd.DataFrame({'col': ['a', 0]})
df.loc[df.index, 'col'] = 'string' + df['col'].astype(str)

This is not as quick as the solutions above (more than 1 ms per loop slower), but it may be useful when you need a conditional change, for example:


mask = (df['col'] == 0)
df.loc[mask, 'col'] = 'string' + df['col'].astype(str)
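
For a self-contained view of the conditional case, here is a sketch with a slightly larger sample frame (assumed for illustration). It applies the mask on both sides of the assignment so that only the selected rows are computed; relying on index alignment, as above, gives the same result.

```python
import pandas as pd

df = pd.DataFrame({'col': ['a', 0, 'b', 0]})

# Select only the rows whose value equals 0.
mask = df['col'] == 0

# Prefix just those rows; all other rows keep their original value.
df.loc[mask, 'col'] = 'string' + df.loc[mask, 'col'].astype(str)

print(df['col'].tolist())  # ['a', 'string0', 'b', 'string0']
```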

Answer by Boxtell

You can use pandas.Series.map:


df['col'].map('str{}'.format)

This prepends "str" to every value. Note that map returns a new Series, so assign the result back to df['col'] to keep it.
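
A minimal sketch of the map approach with the result stored back in the column. The na_action='ignore' argument is an optional addition, not part of the original answer: it makes map skip missing values rather than formatting them. The sample frame with a NaN row is assumed for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['a', 0, np.nan]})

# map calls 'str{}'.format on each value; with na_action='ignore'
# missing values are passed through untouched.
df['col'] = df['col'].map('str{}'.format, na_action='ignore')

print(df)
```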
