Python Pandas working with dataframes in functions
Original question: http://stackoverflow.com/questions/20863323/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep the same license.
Asked by TristanMatthews
I have a DataFrame which I want to pass to a function, derive some information from and then return that information. Originally I set up my code like:
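import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
})

def test_function(df):
    # Add a working column; this modifies the DataFrame the caller passed in
    df['D'] = 0
    df.D = np.random.rand(len(df))
    grouped = df.groupby('A')
    # Rebinding the local name df does not affect the caller's DataFrame
    df = grouped.first()
    df = df['D']
    return df

Ds = test_function(df)
print(df)
print(Ds)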
Which prints:
    A  B  C         D
0   1  5  1  0.582319
1   1  5  1  0.269779
2   1  6  1  0.421593
3   1  7  1  0.797121
4   2  5  1  0.366410
5   2  6  1  0.486445
6   2  6  1  0.001217
7   3  7  1  0.262586
8   3  7  1  0.146543
9   4  6  1  0.985894
10  4  7  1  0.312070
11  4  7  1  0.498103
A
1    0.582319
2    0.366410
3    0.262586
4    0.985894
Name: D, dtype: float64
My thinking was that I don't want to copy my large DataFrame, so I would add a working column to it and then return just the information I want, without affecting the original DataFrame. This of course doesn't work: because I didn't copy the DataFrame, adding a column inside the function adds it to the caller's DataFrame as well. Currently I'm doing something like:
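def test_function(df):
    df['D'] = np.random.rand(len(df))        # add the working column
    results = df.groupby('A')['D'].first()   # derive the information
    del df['D']                              # delete the working column
    return results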
which feels a bit kludgy to me, but I can't think of a better way to do it without copying the dataframe. Any suggestions?
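To see why the working column leaks out of the function, here is a minimal sketch (the function names and the small sample frame are made up for illustration). Assigning a column mutates the object the caller passed in, whereas rebinding the parameter name, as `df = grouped.first()` does in the question, only changes the local name.

```python
import numpy as np
import pandas as pd

def add_column_in_place(frame):
    # Column assignment mutates the object the caller passed in.
    frame['D'] = np.random.rand(len(frame))

def rebind_local_name(frame):
    # Rebinding the parameter only changes the local name; the caller's
    # DataFrame is left untouched.
    frame = frame.groupby('A').first()
    return frame

original = pd.DataFrame({'A': [1, 1, 2], 'B': [5, 6, 7]})

summary = rebind_local_name(original)
columns_after_rebind = list(original.columns)
print(columns_after_rebind)      # ['A', 'B']

add_column_in_place(original)
columns_after_assign = list(original.columns)
print(columns_after_assign)      # ['A', 'B', 'D']
```

So the side effect comes from the column assignment alone, which is why avoiding it needs either a cleanup step or a working object that is separate from the DataFrame.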
Accepted answer by unutbu
If you do not want to add a column to your original DataFrame, you could create an independent Series and apply the groupby method to the Series instead:
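def test_function(df):
    # Keep the working values in a standalone Series instead of a new column on df
    ser = pd.Series(np.random.rand(len(df)))
    # Group the Series by the values of df['A']; df itself is never modified
    grouped = ser.groupby(df['A'])
    return grouped.first()

Ds = test_function(df)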
yields
A
1    0.017537
2    0.392849
3    0.451406
4    0.234016
dtype: float64
Thus, test_function does not modify df at all. Notice that ser.groupby can be passed a sequence of values (such as df['A']) by which to group, instead of just the name of a column.
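One caveat is worth checking. When a Series is passed as the grouping key, pandas aligns it with the grouped object on the index. The answer's `pd.Series(np.random.rand(len(df)))` gets a default 0..n-1 index, which lines up with `df['A']` only when `df` also has the default index. The sketch below passes `index=df.index` so the two stay aligned for any index. The function name `first_random_per_group`, its `key` parameter, and the sample frame are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

def first_random_per_group(df, key='A'):
    # Hypothetical variant of test_function: the working values share
    # df's index, so they line up row for row with df[key].
    ser = pd.Series(np.random.rand(len(df)), index=df.index)
    return ser.groupby(df[key]).first()

# Sample frame with a non-default (string) index.
df = pd.DataFrame(
    {'A': [1, 1, 2, 2, 3], 'B': [5, 6, 7, 8, 9]},
    index=list('vwxyz'),
)
columns_before = list(df.columns)

result = first_random_per_group(df)

print(result)
print(list(df.columns) == columns_before)  # True: df was not modified
```

For a DataFrame with the default integer index, as in the question, this behaves the same as the answer's version.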

