Original question: http://stackoverflow.com/questions/51645195/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
pandas align() function : illustrative example
Asked by ashunigion
I came across this line of code
app_train_poly, app_test_poly = app_train_poly.align(app_test_poly, join = 'inner', axis = 1)
Here app_train_poly and app_test_poly are pandas DataFrames.
I know that align() performs some sort of combining of the two dataframes, but I am not able to visualize how it actually works.
I searched the documentation but could not find any illustrative example.
Answered by Andrew Guy
You are on the right track, except that DataFrame.align doesn't combine two dataframes; rather, it aligns them so that the two dataframes have the same row and/or column configuration. Let's try an example:
Initialising two dataframes with some descriptive column names and toy data:
import pandas as pd

df1 = pd.DataFrame([[1,2,3,4], [6,7,8,9]], columns=['D', 'B', 'E', 'A'], index=[1,2])
df2 = pd.DataFrame([[10,20,30,40], [60,70,80,90], [600,700,800,900]], columns=['A', 'B', 'C', 'D'], index=[2,3,4])
Now, let's view these data frames by themselves:
print(df1)
   D  B  E  A
1  1  2  3  4
2  6  7  8  9
print(df2)
     A    B    C    D
2   10   20   30   40
3   60   70   80   90
4  600  700  800  900
Let's align these two dataframes, aligning by columns (axis=1), and performing an outer join on column labels (join='outer'):
a1, a2 = df1.align(df2, join='outer', axis=1)
print(a1)
print(a2)
   A  B   C  D  E
1  4  2 NaN  1  3
2  9  7 NaN  6  8

     A    B    C    D   E
2   10   20   30   40 NaN
3   60   70   80   90 NaN
4  600  700  800  900 NaN
A few things to notice here:
- The columns in df1 have been rearranged so they align with the columns in df2.
- There is a column labelled 'C' that has been added to df1, and a column labelled 'E' that has been added to df2. These columns have been filled with NaN. This is because we performed an outer join on the column labels.
- None of the values inside the DataFrames have been altered.
- Note that the row labels are not aligned; df2 has rows 3 and 4, whereas df1 does not. This is because we requested alignment on columns (axis=1).
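The observations above can also be checked programmatically rather than by eye. The following is a minimal sketch that rebuilds the same toy frames; the variable names holding the checks are illustrative only and are not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3, 4], [6, 7, 8, 9]],
                   columns=['D', 'B', 'E', 'A'], index=[1, 2])
df2 = pd.DataFrame([[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
                   columns=['A', 'B', 'C', 'D'], index=[2, 3, 4])

a1, a2 = df1.align(df2, join='outer', axis=1)

# Both results now carry the same column labels, in the same order.
same_columns = a1.columns.equals(a2.columns)

# The columns that had to be added are entirely NaN.
c_is_all_nan = a1['C'].isna().all()
e_is_all_nan = a2['E'].isna().all()

# Row labels are untouched because only axis=1 was aligned.
rows_a1 = list(a1.index)
rows_a2 = list(a2.index)

# Values that existed before the call are still there.
d_unchanged = (a1['D'] == df1['D']).all()

print(same_columns, c_is_all_nan, e_is_all_nan, rows_a1, rows_a2, d_unchanged)
```

Checking label equality with Index.equals is stricter than comparing sets, because it also confirms that the order of the labels matches.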
What happens if we align on both rows and columns, but change the join parameter to 'right'?
a1, a2 = df1.align(df2, join='right', axis=None)
print(a1)
print(a2)
     A    B   C    D
2  9.0  7.0 NaN  6.0
3  NaN  NaN NaN  NaN
4  NaN  NaN NaN  NaN

     A    B    C    D
2   10   20   30   40
3   60   70   80   90
4  600  700  800  900
Note that:
- Only the columns and rows that are found in the "right" dataframe (df2) are retained. Column 'E' is no longer present, and neither is row 1. This is because we made a right join on both the column and row labels.
- Rows with labels 3 and 4 have been added to df1, filled with NaN. This is because we requested alignment on both rows and columns (axis=None).
- Row labels are now aligned as well as column labels.
- Again, note that none of the actual values within the dataframes have been altered.
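As a rough check of the right-join behaviour, the sketch below repeats the call and inspects the labels directly. The check variables are illustrative names, not part of the original answer. It also shows why the surviving values of df1 print as 9.0 rather than 9: inserting NaN forces the integer columns to a float dtype, although the numbers themselves are the same.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3, 4], [6, 7, 8, 9]],
                   columns=['D', 'B', 'E', 'A'], index=[1, 2])
df2 = pd.DataFrame([[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
                   columns=['A', 'B', 'C', 'D'], index=[2, 3, 4])

a1, a2 = df1.align(df2, join='right', axis=None)

# With join='right', both results take their labels from df2.
rows_match = a1.index.equals(df2.index) and a2.index.equals(df2.index)
cols_match = a1.columns.equals(df2.columns) and a2.columns.equals(df2.columns)

# Row 1 and column 'E' existed only in df1, so they are gone from a1.
dropped_row = 1 not in a1.index
dropped_col = 'E' not in a1.columns

# Rows 3 and 4 are new to a1 and therefore entirely NaN.
new_rows_all_nan = a1.loc[[3, 4]].isna().all().all()

# NaN needs a float column, so the surviving integers are upcast to float.
a_is_float = a1['A'].dtype.kind == 'f'
value_kept = a1.loc[2, 'A'] == 9

print(rows_match, cols_match, dropped_row, dropped_col,
      new_rows_all_nan, a_is_float, value_kept)
```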
Finally, let's have a look at the code in the question, with join='inner' and axis=1:
a1, a2 = df1.align(df2, join='inner', axis=1)
print(a1)
print(a2)
   D  B  A
1  1  2  4
2  6  7  9

     D    B    A
2   40   20   10
3   90   70   60
4  900  700  600
- Only column labels are aligned (axis=1).
- Only column labels that are present in both df1 and df2 are retained (join='inner').
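The question does not show how app_train_poly and app_test_poly were built, so the following is only a sketch with hypothetical stand-in frames. The names train and test and all of their columns are assumptions made for illustration. It shows the likely purpose of the call: keeping only the columns that both frames share, in a common order.

```python
import pandas as pd

# Hypothetical stand-ins for the question's app_train_poly / app_test_poly.
train = pd.DataFrame({'age': [25, 40], 'income': [30, 55], 'city_Oslo': [1, 0]})
test = pd.DataFrame({'income': [42], 'age': [33], 'city_Rome': [1]})

train_aligned, test_aligned = train.align(test, join='inner', axis=1)

# Only columns present in both frames survive.
shared = sorted(train_aligned.columns)

# Both results list those columns in the same order.
same_order = train_aligned.columns.equals(test_aligned.columns)

# Rows are not touched, so each frame keeps its own row count.
row_counts = (len(train_aligned), len(test_aligned))

print(shared, same_order, row_counts)
```

After a call like this, the two frames can be fed to code that expects identical columns, while each keeps its own rows.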
In summary, use DataFrame.align() when you want to make sure the arrangement of rows and/or columns is the same between two dataframes, without altering any of the data contained within the two dataframes.
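The examples so far aligned columns, or rows and columns together. For completeness, here is a small sketch of the remaining case, aligning rows only with axis=0, using the same toy frames. It is an illustration under those assumptions, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3, 4], [6, 7, 8, 9]],
                   columns=['D', 'B', 'E', 'A'], index=[1, 2])
df2 = pd.DataFrame([[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
                   columns=['A', 'B', 'C', 'D'], index=[2, 3, 4])

a1, a2 = df1.align(df2, join='inner', axis=0)

# Only the row label shared by both frames is kept.
rows_a1 = list(a1.index)
rows_a2 = list(a2.index)

# Column labels are left exactly as they were in each input.
cols_a1 = list(a1.columns)
cols_a2 = list(a2.columns)

# align() returns new objects; the inputs keep their original shape.
originals_untouched = df1.shape == (2, 4) and df2.shape == (3, 4)

print(rows_a1, rows_a2, cols_a1, cols_a2, originals_untouched)
```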