Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original question: http://stackoverflow.com/questions/39283339/

Date: 2020-08-19 22:05:00  Source: igfitidea

What is the meaning of "axis" attribute in a Pandas DataFrame?

Tags: python, pandas, dataframe, axis

Asked by lxdthriller

Taking the following example:

>>> import pandas as pd
>>> df1 = pd.DataFrame({"x":[1, 2, 3, 4, 5], 
...                     "y":[3, 4, 5, 6, 7]}, 
...                    index=['a', 'b', 'c', 'd', 'e'])

>>> df2 = pd.DataFrame({"y":[1, 3, 5, 7, 9], 
...                     "z":[9, 8, 7, 6, 5]}, 
...                    index=['b', 'c', 'd', 'e', 'f'])

>>> pd.concat([df1, df2], join='inner')

The output is:

   y
a  3
b  4
c  5
d  6
e  7
b  1
c  3
d  5
e  7
f  9

Since axis=0 is the columns, I think that concat() only considers columns that are found in both dataframes. But the actual output considers rows that are found in both dataframes.

What is the exact meaning of the axis parameter?

Answer by debaonline4u

If someone needs visual description, here is the image:

[Image: Axis 0 or 1 in Pandas Python]

Answer by MaxU

Data:

In [55]: df1
Out[55]:
   x  y
a  1  3
b  2  4
c  3  5
d  4  6
e  5  7

In [56]: df2
Out[56]:
   y  z
b  1  9
c  3  8
d  5  7
e  7  6
f  9  5

Concatenated horizontally (axis=1), using index elements found in both DFs (aligned by index for joining):

In [57]: pd.concat([df1, df2], join='inner', axis=1)
Out[57]:
   x  y  y  z
b  2  4  1  9
c  3  5  3  8
d  4  6  5  7
e  5  7  7  6

Concatenated vertically (DEFAULT: axis=0), using columns found in both DFs:

In [58]: pd.concat([df1, df2], join='inner')
Out[58]:
   y
a  3
b  4
c  5
d  6
e  7
b  1
c  3
d  5
e  7
f  9

If you don't use the inner join method, you get this instead:

In [62]: pd.concat([df1, df2])
Out[62]:
     x  y    z
a  1.0  3  NaN
b  2.0  4  NaN
c  3.0  5  NaN
d  4.0  6  NaN
e  5.0  7  NaN
b  NaN  1  9.0
c  NaN  3  8.0
d  NaN  5  7.0
e  NaN  7  6.0
f  NaN  9  5.0

In [63]: pd.concat([df1, df2], axis=1)
Out[63]:
     x    y    y    z
a  1.0  3.0  NaN  NaN
b  2.0  4.0  1.0  9.0
c  3.0  5.0  3.0  8.0
d  4.0  6.0  5.0  7.0
e  5.0  7.0  7.0  6.0
f  NaN  NaN  9.0  5.0
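
The outputs above follow one rule, sketched below under the assumption of pandas 1.0 or later (where concat does not sort the other axis by default): join only affects the labels of the axis you are NOT concatenating along. With join='inner' pandas keeps the intersection of those labels; with the default join='outer' it keeps their union and fills the gaps with NaN. The variable names are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]},
                   index=["a", "b", "c", "d", "e"])
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9], "z": [9, 8, 7, 6, 5]},
                   index=["b", "c", "d", "e", "f"])

# axis=0 (the default) stacks rows, so join= decides which COLUMN labels survive
rows_inner = pd.concat([df1, df2], join="inner")
rows_outer = pd.concat([df1, df2])
print(list(rows_inner.columns))  # ['y']            intersection
print(list(rows_outer.columns))  # ['x', 'y', 'z']  union

# axis=1 puts frames side by side, so join= decides which ROW labels survive
cols_inner = pd.concat([df1, df2], join="inner", axis=1)
cols_outer = pd.concat([df1, df2], axis=1)
print(list(cols_inner.index))    # ['b', 'c', 'd', 'e']
print(list(cols_outer.index))    # ['a', 'b', 'c', 'd', 'e', 'f']
```

Note also that the outer result turns x into a float column, because NaN cannot be stored in an integer column.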

Answer by Rod292

Here is my trick for axis: just add the operation in your mind and it becomes clear:

  • axis 0 = rows
  • axis 1 = columns

If you “sum” through axis=0, you are summing all rows, and the output will be a single row with the same number of columns. If you “sum” through axis=1, you are summing all columns, and the output will be a single column with the same number of rows.
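
A quick check of this trick on a small hypothetical frame (the data is made up for illustration). Strictly speaking, pandas returns a Series in both cases: for axis=0 it is indexed by the column labels, for axis=1 by the row labels.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

col_totals = df.sum(axis=0)  # collapses the rows: one value per column
row_totals = df.sum(axis=1)  # collapses the columns: one value per row

print(col_totals.tolist(), list(col_totals.index))  # [6, 60] ['x', 'y']
print(row_totals.tolist(), list(row_totals.index))  # [11, 22, 33] [0, 1, 2]
```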

Answer by Tai

First, OP misunderstood the rows and columns in his/her dataframe.

OP wrote: "But the actual output considers rows that are found in both dataframes." (Here the only common "row" element would be 'y'.)

OP thought the label y refers to a row. However, y is a column name.

import pandas as pd

df1 = pd.DataFrame(
         {"x":[1, 2, 3, 4, 5],  # <-- looks like row x but is actually column x
          "y":[3, 4, 5, 6, 7]}, # <-- looks like row y but is actually column y
          index=['a', 'b', 'c', 'd', 'e'])
print(df1)

            \col   x    y
 index or row\
          a       1     3   |   a
          b       2     4   v   x
          c       3     5   r   i
          d       4     6   o   s
          e       5     7   w   0

               -> column
                 a x i s 1

It is very easy to be misled, since in the dictionary it looks like y and x are two rows.

If you generate df1 from a list of lists, it should be more intuitive:

df1 = pd.DataFrame([[1,3], 
                    [2,4],
                    [3,5],
                    [4,6],
                    [5,7]],
                    index=['a', 'b', 'c', 'd', 'e'], columns=["x", "y"])
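
To confirm the two constructions describe the same frame, a minimal sketch (the variable names are hypothetical) that compares them and prints the labels on each axis. DataFrame.axes returns the row labels (axis 0) first and the column labels (axis 1) second.

```python
import pandas as pd

# dict of columns: each key becomes a COLUMN label (axis 1)
from_dict = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                          "y": [3, 4, 5, 6, 7]},
                         index=["a", "b", "c", "d", "e"])

# list of rows: each inner list becomes one ROW (axis 0)
from_rows = pd.DataFrame([[1, 3], [2, 4], [3, 5], [4, 6], [5, 7]],
                         index=["a", "b", "c", "d", "e"], columns=["x", "y"])

print(from_dict.equals(from_rows))  # True

row_labels, col_labels = from_dict.axes
print(row_labels.tolist())  # ['a', 'b', 'c', 'd', 'e']  -> axis 0
print(col_labels.tolist())  # ['x', 'y']                 -> axis 1
```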

So, back to the problem: concat is shorthand for concatenate (meaning to link together in a series or chain [source]). Performing concat along axis 0 means linking two objects along axis 0.

   1
   1   <-- series 1
   1
^  ^  ^
|  |  |               1
c  a  a               1
o  l  x               1
n  o  i   gives you   2
c  n  s               2
a  g  0               2
t  |  |
|  V  V
v 
   2
   2   <--- series 2
   2
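
The same picture in code, as a sketch with two hypothetical Series standing in for "series 1" and "series 2". Concatenating along axis 0 keeps each object's original index labels, which is also why b, c, d and e appear twice in the question's output; ignore_index=True is the option that renumbers them.

```python
import pandas as pd

series1 = pd.Series([1, 1, 1])
series2 = pd.Series([2, 2, 2])

# along axis 0: series2 is chained on below series1
stacked = pd.concat([series1, series2], axis=0)
# same, but throw away the original labels and number the result 0..n-1
renumbered = pd.concat([series1, series2], axis=0, ignore_index=True)

print(stacked.tolist())           # [1, 1, 1, 2, 2, 2]
print(stacked.index.tolist())     # [0, 1, 2, 0, 1, 2]
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]
```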

So... I think you get the feeling now. What about the sum function in pandas? What does sum(axis=0) mean?

Suppose the data looks like

   1 2
   1 2
   1 2

Maybe... summing along axis 0, you may guess. Yes!!

^  ^  ^
|  |  |               
s  a  a               
u  l  x                
m  o  i   gives you two values 3 6 !
|  n  s               
v  g  0               
   |  |
   V  V
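
A sketch of the same 3 x 2 example (the frame is built here only for illustration): summing along axis 0 gives the same values as adding the rows together one after another.

```python
import pandas as pd

data = pd.DataFrame([[1, 2],
                     [1, 2],
                     [1, 2]])

along_axis0 = data.sum(axis=0)
# the same thing spelled out: walk down axis 0, adding row after row
row_by_row = data.iloc[0] + data.iloc[1] + data.iloc[2]

print(along_axis0.tolist())            # [3, 6]
print(along_axis0.equals(row_by_row))  # True
```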

What about dropna? Suppose you have data

   1  2  NaN
  NaN 3   5
   2  4   6

and you only want to keep

2
3
4

The documentation says: "Return object with labels on given axis omitted where alternately any or all of the data are missing".

Should you use dropna(axis=0) or dropna(axis=1)? Think about it and try it out with

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan],
                   [np.nan, 3, 5],
                   [2, 4, 6]])

# df.dropna(axis=0) or df.dropna(axis=1) ?

Hint: think about the word along.
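
If you want to check your answer afterwards, here is a sketch that runs both calls on the frame above and reports what survives. With the default how='any', dropna(axis=0) drops every row label whose row contains a NaN, and dropna(axis=1) drops every column label whose column contains a NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan],
                   [np.nan, 3, 5],
                   [2, 4, 6]])

kept_rows = df.dropna(axis=0)  # only rows without any NaN remain
kept_cols = df.dropna(axis=1)  # only columns without any NaN remain

print(kept_rows.index.tolist())    # [2]
print(kept_cols.columns.tolist())  # [1]
print(kept_cols[1].tolist())       # [2, 3, 4]
```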

Answer by Boud

Interpret axis=0 as applying the algorithm down each column, or to the row labels (the index). A more detailed schema here.
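
To see this interpretation in action, a sketch using df1 from the question: with axis=0, drop works on the index labels and apply hands each column to the function; with axis=1, drop works on the column labels and apply hands each row over. The lambda, which just reports the name of what it receives, is only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]},
                   index=["a", "b", "c", "d", "e"])

# axis=0: operate on the row labels (the index) ...
no_row_a = df1.drop("a", axis=0)
# ... and pass each COLUMN to the function; s.name is the column label
seen_axis0 = df1.apply(lambda s: s.name, axis=0)

# axis=1: operate on the column labels, and pass each ROW to the function
no_col_x = df1.drop("x", axis=1)
seen_axis1 = df1.apply(lambda s: s.name, axis=1)

print(no_row_a.index.tolist())    # ['b', 'c', 'd', 'e']
print(seen_axis0.tolist())        # ['x', 'y']
print(no_col_x.columns.tolist())  # ['y']
print(seen_axis1.tolist())        # ['a', 'b', 'c', 'd', 'e']
```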

If you apply that general interpretation to your case, the algorithm here is concat. Thus for axis=0, it means:

For each column, take all the rows down (across all the dataframes passed to concat), and concatenate them when the column is common to all of them (because you selected join=inner).

So the meaning would be to take all the x columns and concat them down the rows, which stacks each chunk of rows one after another. However, here x is not present everywhere, so it is not kept in the final result. The same applies to z. For y, the result is kept, as y is in all dataframes. This is the result you have.
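
This column-by-column reading can be reproduced by hand, as a sketch (the names frames, common, manual and builtin are hypothetical): keep only the columns present in every frame, stack the remaining chunks of rows, and compare with pd.concat(..., join='inner').

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]},
                   index=["a", "b", "c", "d", "e"])
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9], "z": [9, 8, 7, 6, 5]},
                   index=["b", "c", "d", "e", "f"])
frames = [df1, df2]

# keep a column only if every frame has it (join="inner" on axis 0)
common = [col for col in df1.columns if all(col in f.columns for f in frames)]
# then stack each chunk of rows one after another
manual = pd.concat([f[common] for f in frames])

builtin = pd.concat(frames, join="inner")
print(common)                  # ['y']
print(manual.equals(builtin))  # True
```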
