Notice: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/42349572/

Time: 2020-08-19 21:36:03  Source: igfitidea

Remove first x number of characters from each row in a column of a Python dataframe

Tags: python, string, pandas, dataframe, replace

Asked by d84_n1nj4

I have a Python dataframe with about 1,500 rows and 15 columns. In one specific column, I would like to remove the first 3 characters of each value. As a simple example, here is a dataframe:

import pandas as pd

d = {
    'Report Number': ['8761234567', '8679876543', '8994434555'],
    'Name': ['George', 'Bill', 'Sally'],
}

d = pd.DataFrame(d)

I would like to remove the first three characters from each field in the Report Number column of dataframe d.

Answered by EdChum

Use vectorised str methods to slice each string entry:

In [11]:
d['Report Number'] = d['Report Number'].str[3:]
d

Out[11]:
     Name Report Number
0  George       1234567
1    Bill       9876543
2   Sally       4434555
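
For anyone running this outside an IPython session, here is a minimal self-contained sketch of the same slicing. The names df and n are hypothetical, chosen only for illustration; .str.slice(start=n) is shown as an alternative spelling that should give the same result as .str[n:].

```python
import pandas as pd

# Same sample data as in the question; 'df' is a hypothetical name.
df = pd.DataFrame({
    'Report Number': ['8761234567', '8679876543', '8994434555'],
    'Name': ['George', 'Bill', 'Sally'],
})

n = 3  # number of leading characters to drop (illustrative variable)

# Slice every string in the column, keeping everything from position n on.
sliced = df['Report Number'].str[n:]

# Alternative spelling of the same operation via the .str.slice method.
same = df['Report Number'].str.slice(start=n)

# Write the shortened strings back into the column.
df['Report Number'] = sliced
print(df)
```

Note that the values stay strings after slicing; converting them to numbers would be a separate step.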

Answered by jpp

It is worth noting Pandas "vectorised" str methods are no more than Python-level loops.

Assuming clean data, you will often find a list comprehension more efficient:

# Python 3.6.0, Pandas 0.19.2

d = pd.concat([d]*10000, ignore_index=True)

%timeit d['Report Number'].str[3:]           # 12.1 ms per loop
%timeit [i[3:] for i in d['Report Number']]  # 5.78 ms per loop
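
The %timeit lines above need IPython. As a rough sketch, the same comparison could be run as a plain script with the standard timeit module. The names base, big, and the two helper functions are hypothetical, and the timings will depend on your machine and on the Python and pandas versions, so no numbers are claimed here.

```python
import timeit

import pandas as pd

# Rebuild the enlarged frame from the answer (3 rows repeated 10,000 times).
base = pd.DataFrame({'Report Number': ['8761234567', '8679876543', '8994434555']})
big = pd.concat([base] * 10000, ignore_index=True)


def with_str_accessor():
    # pandas string accessor: returns a Series
    return big['Report Number'].str[3:]


def with_list_comprehension():
    # plain Python loop over the column: returns a list
    return [s[3:] for s in big['Report Number']]


for fn in (with_str_accessor, with_list_comprehension):
    # Best of 3 repeats, 10 calls each, reported as time per call.
    seconds = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f'{fn.__name__}: {seconds * 1000:.2f} ms per call')
```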

Note these aren't equivalent, since the list comprehension does not deal with null data and other edge cases. For these situations, you may prefer the Pandas solution.
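
To make the difference concrete, here is a small sketch assuming a column with one missing value. The Series s and the guarded variant are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Hypothetical column with a missing entry in the middle.
s = pd.Series(['8761234567', None, '8994434555'])

# The .str accessor passes the missing value through as missing.
via_str = s.str[3:]

# The plain list comprehension raises, because a missing value
# (None or NaN) cannot be sliced.
error = None
try:
    via_list = [x[3:] for x in s]
except TypeError as exc:
    error = exc

# One possible guard: only slice actual strings, leave the rest untouched.
guarded = [x[3:] if isinstance(x, str) else x for x in s]

print(via_str)
print(repr(error))
print(guarded)
```

With the guard in place the comprehension copes with missing values, though the extra isinstance check may eat into its speed advantage.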
