Python Pandas: working with a DataFrame inside a function

Question

Asked by TristanMatthews

I have a DataFrame which I want to pass to a function, derive some information from and then return that information. Originally I set up my code like:

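import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
})

def test_function(df):
    # Working column; this assignment modifies the caller's DataFrame.
    df['D'] = np.random.rand(len(df))

    grouped = df.groupby('A')
    df = grouped.first()   # rebinds the local name only
    df = df['D']

    return df


Ds = test_function(df)

print(df)
print(Ds)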

Which returns:

    A  B  C         D
0   1  5  1  0.582319
1   1  5  1  0.269779
2   1  6  1  0.421593
3   1  7  1  0.797121
4   2  5  1  0.366410
5   2  6  1  0.486445
6   2  6  1  0.001217
7   3  7  1  0.262586
8   3  7  1  0.146543
9   4  6  1  0.985894
10  4  7  1  0.312070
11  4  7  1  0.498103
A
1    0.582319
2    0.366410
3    0.262586
4    0.985894
Name: D, dtype: float64

My thinking was that I don't want to copy my large DataFrame, so I would add a working column to it and then return just the information I want, without affecting the original DataFrame. This of course doesn't work: because I didn't copy the DataFrame, adding a column inside the function adds it to the original. Currently I'm doing something like:

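def test_function(df):
    df['D'] = np.random.rand(len(df))       # add column
    results = df.groupby('A')['D'].first()  # derive information
    del df['D']                             # delete column
    return results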

which feels a bit kludgy to me, but I can't think of a better way to do it without copying the dataframe. Any suggestions?
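
To illustrate the behaviour described above, here is a minimal sketch (the helper names `mutate` and `rebind` are made up for this example and are not part of the original code). Assigning a column inside a function changes the object the caller passed in, whereas rebinding the local name, as in `df = grouped.first()`, leaves the caller's DataFrame alone:

```python
import numpy as np
import pandas as pd

def mutate(frame):
    # Column assignment modifies the object the caller passed in.
    frame['D'] = np.random.rand(len(frame))

def rebind(frame):
    # Rebinding only changes what the local name `frame` refers to;
    # the caller's object is untouched.
    frame = frame.groupby('A').first()
    return frame

df = pd.DataFrame({'A': [1, 1, 2], 'B': [5, 6, 7]})

rebind(df)
print(list(df.columns))   # ['A', 'B']

mutate(df)
print(list(df.columns))   # ['A', 'B', 'D']
```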

Answer 1

Accepted answer by unutbu

If you do not want to add a column to your original DataFrame, you could create an independent Series and apply the groupby method to the Series instead:

def test_function(df):
    ser = pd.Series(np.random.rand(len(df)))
    grouped = ser.groupby(df['A'])
    return grouped.first()

Ds = test_function(df)

yields

A
1    0.017537
2    0.392849
3    0.451406
4    0.234016
dtype: float64

Thus, test_function does not modify df at all. Notice that ser.groupby can be passed a sequence of values (such as df['A']) by which to group, instead of just the name of a column.
