Ambiguity in Pandas Dataframe / Numpy Array "axis" definition
Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/25773245/
Asked by hlin117
I've been very confused about how python axes are defined, and whether they refer to a DataFrame's rows or columns. Consider the code below:
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]], columns=["col1", "col2", "col3", "col4"])
>>> df
   col1  col2  col3  col4
0     1     1     1     1
1     2     2     2     2
2     3     3     3     3
So if we call df.mean(axis=1), we'll get a mean across the rows:
>>> df.mean(axis=1)
0    1.0
1    2.0
2    3.0
dtype: float64
However, if we call df.drop(name, axis=1), we actually drop a column, not a row:
>>> df.drop("col4", axis=1)
   col1  col2  col3
0     1     1     1
1     2     2     2
2     3     3     3
Can someone help me understand what is meant by an "axis" in pandas/numpy/scipy?
A side note: DataFrame.mean just might be defined wrong. The documentation for DataFrame.mean says that axis=1 is supposed to mean a mean over the columns, not the rows...
Accepted answer by Alex Riley
It's perhaps simplest to remember it as 0=down and 1=across.
This means:
- Use axis=0 to apply a method down each column, or to the row labels (the index).
- Use axis=1 to apply a method across each row, or to the column labels.
[Figure: the parts of a DataFrame that each axis refers to.]
It's also useful to remember that Pandas follows NumPy's use of the word axis. The usage is explained in NumPy's glossary of terms:
Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1). [my emphasis]
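The glossary wording can be checked on a plain NumPy array. The following is only a minimal sketch: the array mirrors the DataFrame from the question, and the names arr, col_means and row_means are chosen here for illustration.

```python
import numpy as np

# A 3x4 array with the same values as the DataFrame in the question.
arr = np.array([[1, 1, 1, 1],
                [2, 2, 2, 2],
                [3, 3, 3, 3]])

# axis 0 runs vertically downwards across rows, so reducing along it
# collapses the rows and leaves one value per column.
col_means = arr.mean(axis=0)

# axis 1 runs horizontally across columns, so reducing along it
# collapses the columns and leaves one value per row.
row_means = arr.mean(axis=1)

print(col_means.shape, col_means)
print(row_means.shape, row_means)
```

Reducing along an axis removes that axis from the shape, which is why axis=1 yields one value per row.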
So the method in the question, df.mean(axis=1), seems to be correctly defined. It takes the mean of entries horizontally across columns, that is, along each individual row. On the other hand, df.mean(axis=0) would be an operation acting vertically downwards across rows.
Similarly, df.drop(name, axis=1) refers to an action on column labels, because they intuitively go across the horizontal axis. Specifying axis=0 would make the method act on rows instead.
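To see both readings side by side, here is a small sketch that reuses the DataFrame from the question. The variable names (mean_down, mean_across, dropped_row, dropped_col) are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
                  columns=["col1", "col2", "col3", "col4"])

# Reductions: the axis is the direction that gets collapsed.
mean_down = df.mean(axis=0)      # one mean per column, labelled col1..col4
mean_across = df.mean(axis=1)    # one mean per row, labelled 0..2

# drop: the axis says which set of labels to search.
dropped_row = df.drop(0, axis=0)        # label 0 is looked up in the index
dropped_col = df.drop("col4", axis=1)   # "col4" is looked up in the columns

print(mean_down)
print(mean_across)
print(dropped_row.shape, dropped_col.shape)
```

In both cases axis=1 touches the column dimension: mean collapses it, and drop removes one label from it.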
Answer by o0omycomputero0o
Another way to explain:
# Not realistic but ideal for understanding the axis parameter
import pandas as pd
df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
                  columns=["idx1", "idx2", "idx3", "idx4"],
                  index=["idx1", "idx2", "idx3"]
                 )
---------------------------------------1
|          idx1  idx2  idx3  idx4
|    idx1     1     1     1     1
|    idx2     2     2     2     2
|    idx3     3     3     3     3
0
About df.drop (axis means the position)
A: I wanna remove idx3.
B: **Which one**? // typing while waiting response: df.drop("idx3",
A: The one which is on axis 1
B: OK then it is >> df.drop("idx3", axis=1)
// Result
---------------------------------------1
|          idx1  idx2  idx4
|    idx1     1     1     1
|    idx2     2     2     2
|    idx3     3     3     3
0
About df.apply (axis means direction)
A: I wanna apply sum.
B: Which direction? // typing while waiting response: df.apply(lambda x: x.sum(),
A: The one which is parallel to axis 0
B: OK then it is >> df.apply(lambda x: x.sum(), axis=0)
// Result
idx1    6
idx2    6
idx3    6
idx4    6
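Because the label "idx3" exists on both axes in this example, the axis argument is the only thing that decides what happens. A short sketch of all four calls, with variable names invented here for illustration:

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
                  columns=["idx1", "idx2", "idx3", "idx4"],
                  index=["idx1", "idx2", "idx3"])

# drop: axis is the position of the label to remove.
without_row = df.drop("idx3", axis=0)   # removes the row labelled idx3
without_col = df.drop("idx3", axis=1)   # removes the column labelled idx3

# apply: axis is the direction each function call travels.
col_sums = df.apply(lambda x: x.sum(), axis=0)   # down each column
row_sums = df.apply(lambda x: x.sum(), axis=1)   # across each row

print(without_row)
print(without_col)
print(col_sums)
print(row_sums)
```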
Answer by jeongmin.cha
There are already good answers, but here is another example with more than 2 dimensions.
The parameter axis means the axis to be changed.
For example, consider a dataframe with dimensions a x b x c.
- df.mean(axis=1) returns a dataframe with dimensions a x 1 x c.
- df.drop("col4", axis=1) returns a dataframe with dimensions a x (b-1) x c.
Here, axis=1 means the second axis, which is b, so the value of b is what changes in these examples.
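A pandas DataFrame itself has only two dimensions, so the a x b x c picture is easiest to try on a NumPy array. This is a sketch under that assumption: the sizes 2, 3, 4 are arbitrary, and np.delete stands in for drop.

```python
import numpy as np

a, b, c = 2, 3, 4          # arbitrary sizes for illustration
arr = np.zeros((a, b, c))

# Reducing along axis 1 with keepdims=True leaves that axis with length 1.
reduced = arr.mean(axis=1, keepdims=True)

# Without keepdims (the default), the reduced axis disappears entirely.
squeezed = arr.mean(axis=1)

# Removing one entry along axis 1 shrinks only that axis.
trimmed = np.delete(arr, 0, axis=1)

print(reduced.shape)    # a x 1 x c
print(squeezed.shape)   # a x c
print(trimmed.shape)    # a x (b-1) x c
```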
Answer by Ted Petrou
It should be more widely known that the string aliases 'index' and 'columns' can be used in place of the integers 0/1. The aliases are much more explicit and help me remember how the calculations take place. Another alias for 'index' is 'rows'.
When axis='index' is used, the calculations happen down the columns, which is confusing. But I remember it as getting a result that is the same size as another row.
Let's get some data on the screen to see what I am talking about:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 4), columns=list('abcd'))
          a         b         c         d
0  0.990730  0.567822  0.318174  0.122410
1  0.144962  0.718574  0.580569  0.582278
2  0.477151  0.907692  0.186276  0.342724
3  0.561043  0.122771  0.206819  0.904330
4  0.427413  0.186807  0.870504  0.878632
5  0.795392  0.658958  0.666026  0.262191
6  0.831404  0.011082  0.299811  0.906880
7  0.749729  0.564900  0.181627  0.211961
8  0.528308  0.394107  0.734904  0.961356
9  0.120508  0.656848  0.055749  0.290897
When we want to take the mean of all the columns, we use axis='index' to get the following:
df.mean(axis='index')
a    0.562664
b    0.478956
c    0.410046
d    0.546366
dtype: float64
The same result is obtained with:
df.mean() # default is axis=0
df.mean(axis=0)
df.mean(axis='rows')
To apply an operation left to right across the rows, use axis='columns'. I remember it by thinking that an additional column may be added to my DataFrame:
df.mean(axis='columns')
0    0.499784
1    0.506596
2    0.478461
3    0.448741
4    0.590839
5    0.595642
6    0.512294
7    0.427054
8    0.654669
9    0.281000
dtype: float64
The same result is obtained with:
df.mean(axis=1)
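The equivalence of the integer and string forms can be checked directly. A minimal sketch, assuming a seeded random frame so the comparison is reproducible (the seed and variable names are not from the original answer):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
df = pd.DataFrame(rng.random((10, 4)), columns=list("abcd"))

# 'index' and 0 both reduce down the columns: one value per column.
by_index = df.mean(axis="index")
by_zero = df.mean(axis=0)

# 'columns' and 1 both reduce across the rows: one value per row.
by_columns = df.mean(axis="columns")
by_one = df.mean(axis=1)

print(by_index.equals(by_zero))
print(by_columns.equals(by_one))
print(len(by_index), len(by_columns))
```

The result for axis='index' has one entry per column (4 here), and the result for axis='columns' has one entry per row (10 here).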
Add a new row with axis=0/index/rows
Let's use these results to add additional rows or columns to complete the explanation. Whenever you use axis=0/index/rows, it's like getting a new row of the DataFrame. Let's add a row:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([df, df.mean(axis='rows').to_frame().T], ignore_index=True)
           a         b         c         d
0   0.990730  0.567822  0.318174  0.122410
1   0.144962  0.718574  0.580569  0.582278
2   0.477151  0.907692  0.186276  0.342724
3   0.561043  0.122771  0.206819  0.904330
4   0.427413  0.186807  0.870504  0.878632
5   0.795392  0.658958  0.666026  0.262191
6   0.831404  0.011082  0.299811  0.906880
7   0.749729  0.564900  0.181627  0.211961
8   0.528308  0.394107  0.734904  0.961356
9   0.120508  0.656848  0.055749  0.290897
10  0.562664  0.478956  0.410046  0.546366
Add a new column with axis=1/columns
Similarly, when axis=1/columns is used, it creates data that can easily be made into its own column:
df.assign(e=df.mean(axis='columns'))
          a         b         c         d         e
0  0.990730  0.567822  0.318174  0.122410  0.499784
1  0.144962  0.718574  0.580569  0.582278  0.506596
2  0.477151  0.907692  0.186276  0.342724  0.478461
3  0.561043  0.122771  0.206819  0.904330  0.448741
4  0.427413  0.186807  0.870504  0.878632  0.590839
5  0.795392  0.658958  0.666026  0.262191  0.595642
6  0.831404  0.011082  0.299811  0.906880  0.512294
7  0.749729  0.564900  0.181627  0.211961  0.427054
8  0.528308  0.394107  0.734904  0.961356  0.654669
9  0.120508  0.656848  0.055749  0.290897  0.281000
It appears that you can see all the aliases with the following private variables (these are internal attributes and may not exist in every pandas version):
df._AXIS_ALIASES
{'rows': 0}
df._AXIS_NUMBERS
{'columns': 1, 'index': 0}
df._AXIS_NAMES
{0: 'index', 1: 'columns'}
Answer by newbie
When axis='rows' or axis=0, it means accessing elements in the direction of the rows, top to bottom. Applying sum along axis=0 gives us the total of each column.
When axis='columns' or axis=1, it means accessing elements in the direction of the columns, left to right. Applying sum along axis=1 gives us the total of each row.
Still confusing! But the above makes it a bit easier for me.
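The two cases in this answer can be tried on a tiny frame where the totals are easy to verify by hand. The data and names below are invented for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Moving top to bottom (axis=0 / 'index') adds up each column.
column_totals = df.sum(axis=0)

# Moving left to right (axis=1 / 'columns') adds up each row.
row_totals = df.sum(axis=1)

print(column_totals)
print(row_totals)
```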

