What does axis mean in Python pandas?
Original URL: http://stackoverflow.com/questions/22149584/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
What does axis in pandas mean?
Asked by jerry_sjtu
Here is my code to generate a dataframe:
import pandas as pd
import numpy as np
dff = pd.DataFrame(np.random.randn(1,2),columns=list('AB'))
Then I got the dataframe:
+------------+----------+---------+
|            |    A     |    B    |
+------------+----------+---------+
|      0     | 0.626386 | 1.52325 |
+------------+----------+---------+
When I type the command:
dff.mean(axis=1)
I got:
0 1.074821
dtype: float64
According to the pandas reference, axis=1 stands for columns, so I expected the result of the command to be:
A 0.626386
B 1.523255
dtype: float64
So here is my question: what does axis in pandas mean?
Accepted answer by zhangxaochen
It specifies the axis along which the means are computed. By default axis=0. This is consistent with the numpy.mean usage when axis is specified explicitly (in numpy.mean, axis=None by default, which computes the mean over the flattened array), in which axis=0 runs along the rows (namely, the index in pandas) and axis=1 runs along the columns. For added clarity, one may choose to specify axis='index' (instead of axis=0) or axis='columns' (instead of axis=1).
+------------+----------+---------+
|            |    A     |    B    |
+------------+----------+---------+
|      0     | 0.626386 | 1.52325 |----axis=1----->
+------------+----------+---------+
             |          |
             |  axis=0  |
             ↓          ↓
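As a minimal sketch of the point above, the frame from the question can be rebuilt with fixed values (used here in place of np.random.randn so the result is repeatable) and reduced along each axis. The variable names col_means and row_means are illustrative only.

```python
import pandas as pd

# Fixed values copied from the question instead of random ones
dff = pd.DataFrame([[0.626386, 1.523255]], columns=list("AB"))

col_means = dff.mean(axis=0)   # collapses the rows: one value per column (A, B)
row_means = dff.mean(axis=1)   # collapses the columns: one value per row (0)

print(col_means)
print(row_means)

# The string aliases select the same axes as the integers
assert dff.mean(axis="index").equals(col_means)
assert dff.mean(axis="columns").equals(row_means)
```

So axis=1 returns a single value labelled 0, which is what the asker saw, while axis=0 returns the per-column result the asker expected.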
Answer by nos
The designer of pandas, Wes McKinney, used to work intensively on finance data. Think of the columns as stock names and the index as the days on which prices are recorded. You can then guess what the default behavior (i.e., axis=0) is with respect to this finance data. axis=1 can simply be thought of as 'the other direction'.
For example, the statistics functions, such as mean(), sum(), describe(), count(), all default to column-wise because it makes more sense to compute them for each stock. sort_index(by=) also defaults to column. fillna(method='ffill') will fill along the column because it is the same stock. dropna() defaults to row because you probably just want to discard the prices on that day instead of throwing away all prices of that stock.
Similarly, square-bracket indexing refers to the columns, since it's more common to pick a stock than to pick a day.
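The defaults described in this answer can be checked on a small made-up price table. This is only a sketch: the tickers AAA and BBB and all prices are hypothetical, and ffill() is used as the forward-fill call because it is available across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical prices: one column per stock, one row per trading day
prices = pd.DataFrame(
    {"AAA": [10.0, np.nan, 12.0], "BBB": [20.0, 21.0, np.nan]},
    index=pd.date_range("2020-01-01", periods=3),
)

mean_per_stock = prices.mean()   # default axis=0: one mean per stock
filled = prices.ffill()          # default axis=0: carry each stock's last price forward
complete_days = prices.dropna()  # default axis=0: drop the days (rows) with a missing price
one_stock = prices["AAA"]        # square brackets pick a column, i.e. a stock

print(mean_per_stock)
print(filled)
print(complete_days)
```

Only the first day survives dropna() here, because each of the other two days is missing one price.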
Answer by Michael
The easiest way for me to understand it is to ask whether you are calculating a statistic for each column (axis = 0) or for each row (axis = 1). If you calculate a statistic, say a mean, with axis = 0, you will get that statistic for each column. So if each observation is a row and each variable is a column, you get the mean of each variable. If you set axis = 1, you calculate the statistic for each row. In our example, you would get the mean for each observation across all of your variables (perhaps you want the average of related measures).
axis = 0: by column = column-wise = along the rows
axis = 1: by row = row-wise = along the columns
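One way to see the "for each column" versus "for each row" reading is DataFrame.apply, which hands the function one whole column or one whole row at a time. The sketch below uses hypothetical data and a hypothetical helper named value_range.

```python
import pandas as pd

# Hypothetical data: each row is an observation, each column a variable
df = pd.DataFrame({"height": [1.0, 2.0, 3.0], "weight": [10.0, 20.0, 60.0]})

def value_range(s):
    # s is a Series: one whole column when axis=0, one whole row when axis=1
    return s.max() - s.min()

per_column = df.apply(value_range, axis=0)  # result is labelled by column name
per_row = df.apply(value_range, axis=1)     # result is labelled by row index

print(per_column)
print(per_row)
```

The labels of the result show which dimension survived: column names for axis = 0, row labels for axis = 1.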
Answer by Safak Ozkan
axis refers to the dimension of the array; in the case of pd.DataFrames, axis=0 is the dimension that points downwards and axis=1 the one that points to the right.
Example: Think of an ndarray with shape (3,5,7).
a = np.ones((3,5,7))
a is a 3-dimensional ndarray, i.e. it has 3 axes ("axes" is the plural of "axis"). The layout of a looks like 3 slices of bread, where each slice has dimensions 5-by-7. a[0,:,:] refers to the 0-th slice, a[1,:,:] refers to the 1-st slice, etc.
a.sum(axis=0) applies sum() along the 0-th axis of a. You add up all the slices and end up with one slice of shape (5,7).
a.sum(axis=0) is equivalent to
b = np.zeros((5,7))
for i in range(5):
    for j in range(7):
        # add up position (i, j) across all 3 slices
        b[i,j] += a[:,i,j].sum()
b and a.sum(axis=0) will both look like this:
array([[ 3., 3., 3., 3., 3., 3., 3.],
[ 3., 3., 3., 3., 3., 3., 3.],
[ 3., 3., 3., 3., 3., 3., 3.],
[ 3., 3., 3., 3., 3., 3., 3.],
[ 3., 3., 3., 3., 3., 3., 3.]])
In a pd.DataFrame, axes work the same way as in numpy arrays: axis=0 applies sum() or any other reduction function to each column.
N.B. In @zhangxaochen's answer, I find the phrases "along the rows" and "along the columns" slightly confusing. axis=0 should refer to "along each column", and axis=1 to "along each row".
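The claim that DataFrame axes behave like NumPy axes can be checked directly. This is a small sketch with made-up integer data; the column labels A, B, C are arbitrary.

```python
import numpy as np
import pandas as pd

values = np.arange(6).reshape(2, 3)            # 2 rows, 3 columns
df = pd.DataFrame(values, columns=list("ABC"))

# axis=0 collapses the rows, leaving one number per column
assert (df.sum(axis=0).to_numpy() == values.sum(axis=0)).all()
# axis=1 collapses the columns, leaving one number per row
assert (df.sum(axis=1).to_numpy() == values.sum(axis=1)).all()

print(df.sum(axis=0))
print(df.sum(axis=1))
```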
Answer by Mark09
From a programming point of view, an axis is a position in the shape tuple. Here is an example:
import numpy as np
a=np.arange(120).reshape(2,3,4,5)
a.shape
Out[3]: (2, 3, 4, 5)
np.sum(a,axis=0).shape
Out[4]: (3, 4, 5)
np.sum(a,axis=1).shape
Out[5]: (2, 4, 5)
np.sum(a,axis=2).shape
Out[6]: (2, 3, 5)
np.sum(a,axis=3).shape
Out[7]: (2, 3, 4)
Reducing along an axis (with sum, mean, and so on) removes that dimension from the shape.
Referring to the original question, the shape of dff is (1,2). Using axis=1 changes the shape to (1,).
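The shape rule can be written as a loop over every axis. This is a sketch: the keepdims line is an addition not mentioned in the answer, and dff_values is a hypothetical plain-array stand-in for the asker's 1x2 frame.

```python
import numpy as np

a = np.arange(120).reshape(2, 3, 4, 5)

for axis in range(a.ndim):
    # Reducing along `axis` drops that one entry from the shape tuple
    expected = a.shape[:axis] + a.shape[axis + 1:]
    assert a.sum(axis=axis).shape == expected

# keepdims=True keeps the reduced axis with length 1 instead of removing it
kept = a.sum(axis=1, keepdims=True).shape

# The 1x2 data from the question: reducing axis=1 leaves shape (1,)
dff_values = np.array([[0.626386, 1.523255]])
reduced = dff_values.mean(axis=1).shape

print(kept, reduced)
```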
Answer by Ken Wallace
These answers do help explain this, but it still isn't perfectly intuitive for a non-programmer (i.e. someone like me who is learning Python for the first time in the context of data science coursework). I still find the terms "along" or "for each" confusing when applied to rows and columns.
What makes more sense to me is to say it this way:
- Axis 0 will act on all the ROWS in each COLUMN
- Axis 1 will act on all the COLUMNS in each ROW
So a mean on axis 0 will be the mean of all the rows in each column, and a mean on axis 1 will be a mean of all the columns in each row.
Ultimately this is saying the same thing as @zhangxaochen and @Michael, but in a way that is easier for me to internalize.
Answer by HeadAndTail
axis = 0 means top to bottom; axis = 1 means left to right.
sums[key] = lang_sets[key].iloc[:,1:].sum(axis=0)
The given example takes the sum down each column (all but the first) of lang_sets[key].
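The answer does not show what lang_sets contains, so the following is only a sketch under an assumption: that it is a dict of DataFrames whose first column holds a label rather than numbers. The key "en" and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for lang_sets: a dict of DataFrames whose first column is a label
lang_sets = {
    "en": pd.DataFrame(
        {"word": ["a", "b"], "count_2019": [1, 2], "count_2020": [3, 4]}
    ),
}

sums = {}
for key in lang_sets:
    # iloc[:, 1:] skips the first column; sum(axis=0) then adds down each remaining column
    sums[key] = lang_sets[key].iloc[:, 1:].sum(axis=0)

print(sums["en"])
```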
Answer by Patrick
Arrays are laid out so that axis=0 runs vertically, down the rows, and axis=1 runs horizontally, across the columns. Axis refers to a dimension of the array.

Answer by Anu
Let's visualize it (you'll always remember it this way).
In Pandas:
- axis=0 means along "indexes". It's a row-wise operation.
Suppose we perform a concat() operation on dataframe1 and dataframe2. We take the 1st row out of dataframe1 and place it into the new DF, then take the next row out of dataframe1 and put it into the new DF, and repeat this until we reach the bottom of dataframe1. Then we do the same for dataframe2.
Basically, stacking dataframe2 on top of dataframe1 or vice versa.
E.g. making a pile of books on a table or floor.
- axis=1 means along "columns". It's a column-wise operation.
Suppose we perform a concat() operation on dataframe1 and dataframe2. We take out the 1st complete column (a.k.a. the 1st Series) of dataframe1 and place it into the new DF, then take out the second column of dataframe1 and put it next to the first (sideways), and repeat this until all columns are done. Then we repeat the same process for dataframe2. Basically, stacking dataframe2 sideways.
E.g. arranging books on a bookshelf.
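The pile-of-books and bookshelf pictures correspond to pd.concat with axis=0 and axis=1. A short sketch with two small made-up frames:

```python
import pandas as pd

dataframe1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
dataframe2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# axis=0: dataframe2 goes underneath dataframe1 (pile of books)
stacked = pd.concat([dataframe1, dataframe2], axis=0)
# axis=1: dataframe2 goes to the right of dataframe1 (bookshelf)
side_by_side = pd.concat([dataframe1, dataframe2], axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)
```

Note that the original labels are kept in both cases, so stacked has a repeated index and side_by_side has repeated column names.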
What's more, arrays represent a nested n-dimensional structure better than matrices do, so thinking in terms of arrays helps you see how axis matters once you generalize beyond two dimensions. You can print, write, or draw an array of any number of dimensions, whereas writing or drawing the same data in matrix form on paper is impossible beyond 3 dimensions.
Answer by Nkrish
My thinking: axis = n, where n = 0, 1, etc., means that the matrix is collapsed (folded) along that axis. So in a 2D matrix, when you collapse along 0 (rows), you are really operating on one column at a time. The same holds for higher-order matrices.
This is not the same as the normal reference to a dimension in a matrix, where 0 -> row and 1 -> column. Similarly for other dimensions in an N dimension array.

