Python Pandas working with dataframes in functions

Disclaimer: this is a popular StackOverflow question provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/20863323/

Time: 2020-09-13 21:30:11  Source: igfitidea

python pandas

Asked by TristanMatthews

I have a DataFrame that I want to pass to a function, derive some information from, and then return that information. Originally I set up my code like this:

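import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
})

def test_function(df):
    # Add a working column of random values. df is the caller's object,
    # not a copy, so this also adds 'D' to the original DataFrame.
    df['D'] = np.random.rand(len(df))

    # Return the first 'D' value in each group of 'A'.
    grouped = df.groupby('A')
    return grouped.first()['D']

Ds = test_function(df)

print(df)
print(Ds)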

Which prints:

    A  B  C         D
0   1  5  1  0.582319
1   1  5  1  0.269779
2   1  6  1  0.421593
3   1  7  1  0.797121
4   2  5  1  0.366410
5   2  6  1  0.486445
6   2  6  1  0.001217
7   3  7  1  0.262586
8   3  7  1  0.146543
9   4  6  1  0.985894
10  4  7  1  0.312070
11  4  7  1  0.498103
A
1    0.582319
2    0.366410
3    0.262586
4    0.985894
Name: D, dtype: float64

My thinking was along these lines: I don't want to copy my large DataFrame, so I'll add a working column to it and then return just the information I want, without affecting the original DataFrame. Of course this doesn't work. Because I didn't copy the DataFrame, adding a column inside the function adds it to the caller's DataFrame too. Currently I'm doing something like:

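def test_function(df):
    df['D'] = np.random.rand(len(df))         # add column
    results = df.groupby('A').first()['D']    # derive information
    del df['D']                               # delete column
    return results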

which feels a bit kludgy to me, but I can't think of a better way to do it without copying the dataframe. Any suggestions?
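The column leaks out because Python passes the DataFrame object itself into the function, not a copy. Any in-place change made inside the function, such as `frame['D'] = ...`, is therefore visible to the caller. The small sketch below shows this. The helper name `add_working_column` and the three-row frame are made up for illustration.

```python
import numpy as np
import pandas as pd

def add_working_column(frame):
    # `frame` names the same DataFrame object the caller passed in (no copy
    # is made), so this assignment adds column 'D' to the caller's DataFrame.
    frame['D'] = np.random.rand(len(frame))
    return frame.groupby('A')['D'].first()

df = pd.DataFrame({'A': [1, 1, 2], 'B': [5, 6, 7]})

before = list(df.columns)
result = add_working_column(df)
after = list(df.columns)

print(before)  # ['A', 'B']
print(after)   # ['A', 'B', 'D']
```

Rebinding the parameter name does not affect the caller. The original `df = grouped.first()` is an example of this. Only in-place changes like adding a column do.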

Accepted answer by unutbu

If you do not want to add a column to your original DataFrame, you could create an independent Series and apply the groupby method to the Series instead:

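def test_function(df):
    # Build a standalone Series that shares df's index, so it lines up with
    # df['A'] when grouping. df itself is never modified.
    ser = pd.Series(np.random.rand(len(df)), index=df.index)
    grouped = ser.groupby(df['A'])
    return grouped.first()

Ds = test_function(df)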

yields

A
1    0.017537
2    0.392849
3    0.451406
4    0.234016
dtype: float64

Thus, test_function does not modify df at all. Notice that ser.groupby can be passed a sequence of values (such as df['A']) to group by, instead of just the name of a column.
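The sketch below shows what `Series.groupby` accepts as keys. It uses deterministic values in place of `np.random.rand` so the results can be checked. The names `values`, `shifted`, `by_series`, `by_array` and `misaligned` are illustrative only.

- A key given as a Series, such as `df['A']`, is matched to the grouped Series by index label.
- A plain NumPy array of the same length is matched by position.
- The last case shows why the new Series and `df` should share an index. If the labels do not line up, the keys become NaN and those rows are dropped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
})

# Deterministic stand-in for np.random.rand(len(df)): 0.0, 1.0, ..., 11.0
values = pd.Series(np.arange(len(df), dtype=float), index=df.index)

# Key given as a Series: aligned with `values` by index label.
by_series = values.groupby(df['A']).first()

# Key given as a NumPy array of the same length: matched by position.
by_array = values.groupby(df['A'].to_numpy()).first()

# Same values, but with index labels that do not match df's index.
# pandas reindexes the key Series to these labels, so every key is NaN,
# and with the default dropna=True every row is dropped.
shifted = pd.Series(values.to_numpy(), index=df.index + 100)
misaligned = shifted.groupby(df['A']).first()

print(by_series)
print(by_array)
print(misaligned.empty)   # True
print(list(df.columns))   # ['A', 'B', 'C']: df was never modified
```

The first two results hold the same values (0.0, 4.0, 7.0, 9.0 for groups 1 to 4). The Series key also carries its name `A` over to the result's index.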