Original source: http://stackoverflow.com/questions/41111955/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
pandas dataframe : add & remove prefix/suffix from all cell values of entire dataframe
Asked by murphy1310
To add a prefix/suffix to every value in a dataframe, I usually do the following. For instance, to add the suffix '@':
df = df.astype(str) + '@'
This appends '@' to all cell values.
I would like to know how to remove this suffix. Is there a method available directly on the pandas.DataFrame class that removes a particular prefix/suffix character from the entire DataFrame?
I've tried iterating through the rows (as Series) while using rstrip('@'), as follows:
for index in range(df.shape[0]):
    row = df.iloc[index]
    row = row.str.rstrip('@')
Now, in order to make a dataframe out of this series:
new_df = pd.DataFrame(columns=list(df))
new_df = new_df.append(row)
However, this doesn't work; it gives an empty dataframe.
Is there something really basic that I am missing?
Accepted answer by AlexG
You could use applymap to apply your string method to each element:
df = df.applymap(lambda x: str(x).rstrip('@'))
Note: I wouldn't expect this to be as fast as the vectorized approach, pd.Series.str.rstrip, i.e. transforming each column separately.
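As a minimal sketch of the two options mentioned above, the snippet below strips the suffix element by element and then column by column. The sample data is made up for illustration. One assumption worth flagging: newer pandas releases (2.1 and later) renamed DataFrame.applymap to DataFrame.map, so the sketch picks whichever of the two is available.

```python
import pandas as pd

# Illustrative data, not taken from the question.
df = pd.DataFrame({"a": ["dog@", "lazy@"], "b": ["quick@", "the@"]})

# Element-wise: DataFrame.map on pandas >= 2.1, DataFrame.applymap on older versions.
elementwise_fn = df.map if hasattr(df, "map") else df.applymap
elementwise = elementwise_fn(lambda x: str(x).rstrip("@"))

# Column-wise: apply the vectorized Series.str.rstrip to each column.
columnwise = df.apply(lambda col: col.str.rstrip("@"))

print(elementwise)
print(columnwise)
```

Both calls return a new DataFrame and leave df untouched, so the result has to be assigned back if the stripped values should replace the originals.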
Answer by juanpa.arrivillaga
You can use apply and the str.strip method of pd.Series:
In [13]: df
Out[13]:
       a       b      c
0    dog   quick    the
1   lazy    lazy    fox
2  brown   quick    dog
3  quick     the   over
4  brown    over   lazy
5    fox   brown  quick
6  quick     fox    the
7    dog  jumped    the
8   lazy   brown    the
9    dog    lazy    the
In [14]: df = df + "@"
In [15]: df
Out[15]:
        a        b       c
0    dog@   quick@    the@
1   lazy@    lazy@    fox@
2  brown@   quick@    dog@
3  quick@     the@   over@
4  brown@    over@   lazy@
5    fox@   brown@  quick@
6  quick@     fox@    the@
7    dog@  jumped@    the@
8   lazy@   brown@    the@
9    dog@    lazy@    the@
In [16]: df = df.apply(lambda S:S.str.strip('@'))
In [17]: df
Out[17]:
       a       b      c
0    dog   quick    the
1   lazy    lazy    fox
2  brown   quick    dog
3  quick     the   over
4  brown    over   lazy
5    fox   brown  quick
6  quick     fox    the
7    dog  jumped    the
8   lazy   brown    the
9    dog    lazy    the
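One caveat with the session above: str.strip removes the character from both ends, and removes every repeated occurrence. The sketch below contrasts it with str.rstrip and with Series.str.removesuffix / removeprefix, which drop exactly one occurrence. The sample values are invented for illustration, and the last two methods assume pandas 1.4 or newer.

```python
import pandas as pd

# Illustrative values chosen to expose the differences.
s = pd.Series(["@dog@", "fox@@"])

both_ends = s.str.strip("@")         # strips all '@' from both ends
right_only = s.str.rstrip("@")       # strips all trailing '@'
one_suffix = s.str.removesuffix("@") # removes a single trailing '@'

# The same idea for prefixes: add one, then remove exactly that one.
prefixed = "#" + s
unprefixed = prefixed.str.removeprefix("#")

print(both_ends.tolist(), right_only.tolist(), one_suffix.tolist())
```

To apply any of these to a whole DataFrame, wrap the call in df.apply(lambda col: ...) as in the answer.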
Note, your approach doesn't work because when you do the following assignment in your for-loop:
row = row.str.rstrip('@')
This merely assigns the result of row.str.rstrip to the name row without mutating the DataFrame. This is the same behavior for all Python objects and simple name assignment:
In [18]: rows = [[1,2,3],[4,5,6],[7,8,9]]
In [19]: print(rows)
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [20]: for row in rows:
    ...:     row = ['look','at','me']
    ...:
In [21]: print(rows)
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
To actually change the underlying data structure you need to use a mutator method:
In [22]: rows
Out[22]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [23]: for row in rows:
    ...:     row.append("LOOKATME")
    ...:
In [24]: rows
Out[24]: [[1, 2, 3, 'LOOKATME'], [4, 5, 6, 'LOOKATME'], [7, 8, 9, 'LOOKATME']]
Note that slice-assignment is just syntactic sugar for a mutator method:
In [26]: rows
Out[26]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [27]: for row in rows:
    ...:     row[:] = ['look','at','me']
    ...:
    ...:
In [28]: rows
Out[28]: [['look', 'at', 'me'], ['look', 'at', 'me'], ['look', 'at', 'me']]
This is analogous to pandas loc or iloc based assignment.
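As a hedged sketch of that idea applied to the loop from the question, the version below writes each stripped row back through iloc instead of rebinding the name row. The sample data is illustrative, and converting to a NumPy array before assignment is a choice made here to sidestep index alignment, not something the answer prescribes.

```python
import pandas as pd

# Illustrative data, not taken from the question.
df = pd.DataFrame({"a": ["dog@", "lazy@"], "b": ["quick@", "the@"]})

for index in range(df.shape[0]):
    # Strip the suffix from the row, then assign it back into the DataFrame.
    stripped = df.iloc[index].str.rstrip("@")
    df.iloc[index] = stripped.to_numpy()

print(df)
```

This mutates df in place, but a row-by-row loop is still likely to be slower than the column-wise apply shown earlier.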
Answer by SummerEla
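You could make this really easy and just use the pandas.DataFrame.replace() method to replace every "@" with "". Pass regex=True so that "@" is matched inside each value rather than only against cells that equal "@" exactly, and assign the result back, since replace() returns a new DataFrame:
df = df.replace("@", "", regex=True)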
If you are worried about "@" being replaced anywhere other than at the end of your values, anchor the pattern with "$":
df = df.replace("@$", "", regex=True)
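To make the difference between these calls concrete, here is a small sketch on made-up data. It compares replace() without regex, which only touches cells whose entire value is "@", with the two regex forms.

```python
import pandas as pd

# Illustrative data with '@' in the middle of a value and a cell that is just '@'.
df = pd.DataFrame({"a": ["dog@", "a@b@"], "b": ["@", "fox@"]})

exact = df.replace("@", "")                     # whole-cell match only
everywhere = df.replace("@", "", regex=True)    # every '@' in every value
suffix_only = df.replace("@$", "", regex=True)  # only a trailing '@'

print(exact)
print(everywhere)
print(suffix_only)
```

Only the anchored pattern keeps the "@" in the middle of "a@b@" while still removing the suffix.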