Original source: http://stackoverflow.com/questions/39283339/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
What is the meaning of "axis" attribute in a Pandas DataFrame?
Asked by lxdthriller
Taking the following example:
>>> import pandas as pd
>>> df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5],
...                     "y": [3, 4, 5, 6, 7]},
...                    index=['a', 'b', 'c', 'd', 'e'])
>>> df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9],
...                     "z": [9, 8, 7, 6, 5]},
...                    index=['b', 'c', 'd', 'e', 'f'])
>>> pd.concat([df1, df2], join='inner')
The output is:
   y
a  3
b  4
c  5
d  6
e  7
b  1
c  3
d  5
e  7
f  9
Since axis=0 is the columns, I think that concat() only considers columns that are found in both dataframes. But the actual output considers rows that are found in both dataframes.
What exactly is the meaning of the axis parameter?
Answered by MaxU
Data:
In [55]: df1
Out[55]:
   x  y
a  1  3
b  2  4
c  3  5
d  4  6
e  5  7

In [56]: df2
Out[56]:
   y  z
b  1  9
c  3  8
d  5  7
e  7  6
f  9  5
Concatenated horizontally (axis=1), using index elements found in both DFs (aligned by index for joining):
In [57]: pd.concat([df1, df2], join='inner', axis=1)
Out[57]:
   x  y  y  z
b  2  4  1  9
c  3  5  3  8
d  4  6  5  7
e  5  7  7  6
Concatenated vertically (DEFAULT: axis=0), using columns found in both DFs:
In [58]: pd.concat([df1, df2], join='inner')
Out[58]:
   y
a  3
b  4
c  5
d  6
e  7
b  1
c  3
d  5
e  7
f  9
If you don't use the inner join method, you get this:
In [62]: pd.concat([df1, df2])
Out[62]:
     x  y    z
a  1.0  3  NaN
b  2.0  4  NaN
c  3.0  5  NaN
d  4.0  6  NaN
e  5.0  7  NaN
b  NaN  1  9.0
c  NaN  3  8.0
d  NaN  5  7.0
e  NaN  7  6.0
f  NaN  9  5.0

In [63]: pd.concat([df1, df2], axis=1)
Out[63]:
     x    y    y    z
a  1.0  3.0  NaN  NaN
b  2.0  4.0  1.0  9.0
c  3.0  5.0  3.0  8.0
d  4.0  6.0  5.0  7.0
e  5.0  7.0  7.0  6.0
f  NaN  NaN  9.0  5.0
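
The outputs above suggest a rule of thumb: axis picks the direction in which the frames are glued together, and join='inner' intersects the labels on the other axis. The following is a minimal sketch of that reading using the same df1 and df2; the variable names vertical and horizontal are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]},
                   index=list("abcde"))
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9], "z": [9, 8, 7, 6, 5]},
                   index=list("bcdef"))

# axis=0 stacks rows; join='inner' keeps only the columns shared by both frames.
vertical = pd.concat([df1, df2], join="inner", axis=0)

# axis=1 places the frames side by side; join='inner' keeps only the shared index labels.
horizontal = pd.concat([df1, df2], join="inner", axis=1)

print(list(vertical.columns), vertical.shape)
print(list(horizontal.index), horizontal.shape)
```

Comparing the printed labels with Out[57] and Out[58] should show which axis was intersected in each case.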
Answered by Rod292
This is my trick for axis: just add the operation in your mind to make it clear:
- axis 0 = rows
- axis 1 = columns
If you “sum” through axis=0, you are summing all rows, and the output will be a single row with the same number of columns. If you “sum” through axis=1, you are summing all columns, and the output will be a single column with the same number of rows.
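
To check the shape claim, here is a small sketch on a hypothetical 3x2 frame (the data and the names col_totals and row_totals are made up for illustration, not taken from the question).

```python
import pandas as pd

# Hypothetical frame with 3 rows and 2 columns, used only for illustration.
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

col_totals = df.sum(axis=0)  # collapses the rows: one value per column
row_totals = df.sum(axis=1)  # collapses the columns: one value per row

print(col_totals)
print(row_totals)
```

The first result has one entry per column label and the second has one entry per row label, which matches the "single row" and "single column" description above.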
Answered by Tai
First, OP misunderstood the rows and columns in his/her dataframe.
But the actual output considers rows that are found in both dataframes. (the only common row element 'y')
OP thought the label y is for a row. However, y is a column name.
df1 = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5],   # <-- looks like row x but actually col x
     "y": [3, 4, 5, 6, 7]},  # <-- looks like row y but actually col y
    index=['a', 'b', 'c', 'd', 'e'])
print(df1)

          \col    x    y
index or row\
          a       1    3    |  a
          b       2    4    v  x
          c       3    5    r  i
          d       4    6    o  s
          e       5    7    w  0

                -> column
                a x i s 1
It is very easy to be misled, since in the dictionary it looks like y and x are two rows.
If you generate df1 from a list of lists, it should be more intuitive:
df1 = pd.DataFrame([[1, 3],
                    [2, 4],
                    [3, 5],
                    [4, 6],
                    [5, 7]],
                   index=['a', 'b', 'c', 'd', 'e'], columns=["x", "y"])
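
As a quick sanity check, the sketch below builds the frame both ways and compares them. The names from_dict and from_rows are illustrative only; the point is that the dictionary keys end up as column labels, just like the columns argument in the list-of-lists form.

```python
import pandas as pd

idx = ['a', 'b', 'c', 'd', 'e']

# Dictionary form: each key becomes a COLUMN, each list holds that column's values.
from_dict = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]}, index=idx)

# List-of-lists form: each inner list is one ROW.
from_rows = pd.DataFrame([[1, 3], [2, 4], [3, 5], [4, 6], [5, 7]],
                         index=idx, columns=["x", "y"])

same = from_dict.equals(from_rows)
print(same)
print(from_dict.shape)
```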
So, back to the problem: concat is shorthand for concatenate (to link together in a series or chain [source]). Performing concat along axis 0 means linking two objects along axis 0.
   1
   1   <-- series 1
   1

^  ^  ^
|  |  |                 1
c  a  a                 1
o  l  x                 1
n  o  i   gives you     2
c  n  s                 2
a  g  0                 2
t  |  |
|  V  V
v

   2
   2   <--- series 2
   2
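
The picture can be reproduced with two small Series. This is a sketch under the assumption that the two series are simply three 1s and three 2s, as drawn; the names s1, s2 and stacked are illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 1, 1])  # series 1 from the diagram
s2 = pd.Series([2, 2, 2])  # series 2 from the diagram

# Link the two series one after the other along axis 0 (down the rows).
stacked = pd.concat([s1, s2], axis=0)

print(stacked.tolist())
print(stacked.index.tolist())
```

Note that the original index labels are carried over unchanged, so the labels 0, 1, 2 appear twice in the result, in the same way that b, c, d, e are repeated in the question's output.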
So, I think you have the feeling now. What about the sum function in pandas? What does sum(axis=0) mean?
Suppose the data looks like
1 2
1 2
1 2
Maybe... summing along axis 0, you may guess. Yes!!
^  ^  ^
|  |  |
s  a  a
u  l  x
m  o  i   gives you two values 3 6 !
|  n  s
v  g  0
   |  |
   V  V
What about dropna? Suppose you have the data
  1    2    NaN
NaN    3      5
  2    4      6
and you only want to keep
2
3
4
In the documentation, it says: "Return object with labels on given axis omitted where alternately any or all of the data are missing".
Should you put dropna(axis=0) or dropna(axis=1)? Think about it and try it out with
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan],
                   [np.nan, 3, 5],
                   [2, 4, 6]])
# df.dropna(axis=0) or df.dropna(axis=1) ?
Hint: think about the word along.
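
One way to try it out is to run both calls and compare each result with the target column (2, 3, 4). The sketch below only prints the two results; the variable names drop_along_0 and drop_along_1 are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan],
                   [np.nan, 3, 5],
                   [2, 4, 6]])

# Run both variants and compare each printed result with the target (2, 3, 4).
drop_along_0 = df.dropna(axis=0)
drop_along_1 = df.dropna(axis=1)

print(drop_along_0)
print(drop_along_1)
```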
Answered by Boud
Interpret axis=0 as applying the algorithm down each column, or to the row labels (the index). A more detailed schema is linked from the original answer.
If you apply that general interpretation to your case, the algorithm here is concat. Thus for axis=0, it means:
for each column, take all the rows down (across all the dataframes passed to concat), and concatenate them when the column is common to all of them (because you selected join=inner).
So the meaning would be to take all columns x and concatenate them down the rows, which would stack each chunk of rows one after another. However, x is not present in every dataframe, so it is not kept in the final result. The same applies to z. For y, the result is kept because y is in all dataframes. This is the result you have.
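
The description above can be mimicked by hand: find the columns common to all frames, then stack the rows of each frame restricted to those columns. This is only a sketch of that reading, not a claim about how pandas implements concat internally; the names common, manual and builtin are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]},
                   index=list("abcde"))
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9], "z": [9, 8, 7, 6, 5]},
                   index=list("bcdef"))

# Columns present in every frame: the ones join='inner' keeps when axis=0.
common = df1.columns.intersection(df2.columns)

# Stack the rows of each frame, restricted to the shared columns.
manual = pd.concat([df1[common], df2[common]], axis=0)
builtin = pd.concat([df1, df2], join="inner", axis=0)

print(list(common))
print(manual.equals(builtin))
```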