Disclaimer: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow post: http://stackoverflow.com/questions/51645195/

Time: 2020-09-14 05:52:23  Source: igfitidea

pandas align() function : illustrative example

python pandas

Asked by ashunigion

I came across this line of code:

app_train_poly, app_test_poly = app_train_poly.align(app_test_poly, join='inner', axis=1)

Here app_train_poly and app_test_poly are pandas DataFrames.

I know that align() lets you perform some sort of combining of the two dataframes, but I am not able to visualize how it actually works.

I searched the documentation but could not find any illustrative example.

Answered by Andrew Guy

You are on the right track, except that DataFrame.align doesn't combine two dataframes; rather, it aligns them so that the two dataframes have the same row and/or column configuration. Let's try an example:

Initialising two dataframes with some descriptive column names and toy data:

import pandas as pd

# df1 and df2 share columns A, B, D and row label 2; everything else differs.
df1 = pd.DataFrame([[1,2,3,4], [6,7,8,9]], columns=['D', 'B', 'E', 'A'], index=[1,2])
df2 = pd.DataFrame([[10,20,30,40], [60,70,80,90], [600,700,800,900]], columns=['A', 'B', 'C', 'D'], index=[2,3,4])

Now, let's view these data frames by themselves:

print(df1)
   D  B  E  A
1  1  2  3  4
2  6  7  8  9
print(df2)
     A    B    C    D
2   10   20   30   40
3   60   70   80   90
4  600  700  800  900

Let's align these two dataframes, aligning by columns (axis=1), and performing an outer join on column labels (join='outer'):

a1, a2 = df1.align(df2, join='outer', axis=1)
print(a1)
print(a2)
   A  B   C  D  E
1  4  2 NaN  1  3
2  9  7 NaN  6  8
     A    B    C    D   E
2   10   20   30   40 NaN
3   60   70   80   90 NaN
4  600  700  800  900 NaN

A few things to notice here:

  • The columns in df1 have been rearranged so they align with the columns in df2.
  • There is a column labelled 'C' that has been added to df1, and a column labelled 'E' that has been added to df2. These columns have been filled with NaN. This is because we performed an outer join on the column labels.
  • None of the values inside the DataFrames have been altered.
  • Note that the row labels are not aligned; df2 has rows 3 and 4, whereas df1 does not. This is because we requested alignment on columns (axis=1).
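
The observations about the outer join can also be checked programmatically. The following is a minimal sketch that rebuilds the toy frames from this answer; the helper variable names (cols_match, values_kept and so on) are illustrative and not part of pandas.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3, 4], [6, 7, 8, 9]],
                   columns=['D', 'B', 'E', 'A'], index=[1, 2])
df2 = pd.DataFrame([[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
                   columns=['A', 'B', 'C', 'D'], index=[2, 3, 4])

a1, a2 = df1.align(df2, join='outer', axis=1)

# Both results now share one set of column labels (the union of the two).
cols_match = a1.columns.equals(a2.columns)
col_list = list(a1.columns)

# Row labels are untouched because only axis=1 was aligned.
rows_a1 = list(a1.index)
rows_a2 = list(a2.index)

# Selecting df1's original columns from a1 gives back the original values.
values_kept = bool((a1[df1.columns] == df1).all().all())

# The newly added column holds only NaN.
c_is_all_nan = bool(a1['C'].isna().all())

print(cols_match, col_list, rows_a1, rows_a2, values_kept, c_is_all_nan)
```

Note that align() returns new objects; df1 and df2 themselves keep their original columns.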

What happens if we align on both rows and columns, but change the join parameter to 'right'?

a1, a2 = df1.align(df2, join='right', axis=None)
print(a1)
print(a2)
     A    B   C    D
2  9.0  7.0 NaN  6.0
3  NaN  NaN NaN  NaN
4  NaN  NaN NaN  NaN
     A    B    C    D
2   10   20   30   40
3   60   70   80   90
4  600  700  800  900

Note that:

  • Only the columns and rows that are found in the "right" dataframe (df2) are retained. Column 'E' and row 1 are no longer present. This is because we made a right join on both the column and row labels.
  • Rows with labels 3 and 4, and column 'C', have been added to df1, filled with NaN. This is because we requested alignment on both rows and columns (axis=None).
  • Row labels are now aligned as well as column labels.
  • Again, note that none of the actual values within the dataframes have been altered. The surviving df1 values are shown as floats (9.0 rather than 9) only because the added NaN entries force the integer columns to a float dtype.
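
If the NaN placeholders are not wanted, align() also accepts a fill_value argument. The sketch below reuses the toy frames from this answer; the choice of 0 as the fill value is an arbitrary assumption made for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3, 4], [6, 7, 8, 9]],
                   columns=['D', 'B', 'E', 'A'], index=[1, 2])
df2 = pd.DataFrame([[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
                   columns=['A', 'B', 'C', 'D'], index=[2, 3, 4])

a1, a2 = df1.align(df2, join='right', axis=None)

# With join='right', both axes of a1 now match df2 exactly.
same_rows = a1.index.equals(df2.index)
same_cols = a1.columns.equals(df2.columns)

# The values at the surviving labels are unchanged, apart from the float dtype.
kept = a1.loc[2, ['A', 'B', 'D']].tolist()

# fill_value replaces the default NaN placeholder (0 is an arbitrary choice).
z1, z2 = df1.align(df2, join='right', axis=None, fill_value=0)
missing_after_fill = int(z1.isna().sum().sum())

print(same_rows, same_cols, kept, missing_after_fill)
```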

Finally, let's have a look at the code in the question, with join='inner' and axis=1:

a1, a2 = df1.align(df2, join='inner', axis=1)
print(a1)
print(a2)
   D  B  A
1  1  2  4
2  6  7  9
     D    B    A
2   40   20   10
3   90   70   60
4  900  700  600

  • Only column labels are aligned (axis=1).
  • Only column labels that are present in both df1 and df2 are retained (join='inner').
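
One plausible reason for a line like the one in the question is that training and test sets end up with different columns after one-hot encoding. The sketch below illustrates that situation under assumptions: the train and test frames and their 'age' and 'city' columns are hypothetical and do not come from the original question.

```python
import pandas as pd

# Hypothetical toy data: the 'city' levels differ between the two sets.
train = pd.DataFrame({'age': [25, 40, 31], 'city': ['Paris', 'Rome', 'Oslo']})
test = pd.DataFrame({'age': [52, 19], 'city': ['Rome', 'Lima']})

# One-hot encoding yields different column sets for each frame.
train_enc = pd.get_dummies(train)  # age, city_Oslo, city_Paris, city_Rome
test_enc = pd.get_dummies(test)    # age, city_Lima, city_Rome

# Keep only the columns both frames share, in the same order.
train_enc, test_enc = train_enc.align(test_enc, join='inner', axis=1)

print(list(train_enc.columns))
print(list(test_enc.columns))
```

The row counts are unchanged; only columns missing from either frame are dropped, so both frames can be passed to code that expects identical feature columns.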

In summary, use DataFrame.align() when you want to make sure the arrangement of rows and/or columns is the same between two dataframes, without altering any of the data contained within the two dataframes.
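
The examples above align columns (axis=1) or both axes (axis=None). For completeness, here is a short sketch of aligning row labels only (axis=0), using the same toy frames; the names r1 and r2 are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3, 4], [6, 7, 8, 9]],
                   columns=['D', 'B', 'E', 'A'], index=[1, 2])
df2 = pd.DataFrame([[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
                   columns=['A', 'B', 'C', 'D'], index=[2, 3, 4])

# Align row labels only; join='inner' keeps labels present in both frames.
r1, r2 = df1.align(df2, join='inner', axis=0)

print(list(r1.index), list(r2.index))      # shared row labels
print(list(r1.columns), list(r2.columns))  # columns keep their original order
```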