Original URL: http://stackoverflow.com/questions/33042633/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Selecting last n columns and excluding last n columns in dataframe
Question by Toly
How do I:
- Select last 3 columns in a dataframe and create a new dataframe?
I tried:
y = dataframe.iloc[:,-3:]
- Exclude last 3 columns and create a new dataframe?
I tried:
X = dataframe.iloc[:,:-3]
Is this correct?
I am getting array dimension errors later in my code and want to make sure this step is correct.
Thank you
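A quick way to confirm that the two iloc slices in the question behave as intended is to check their shapes and column labels. The sketch below uses a small hypothetical 2×5 frame (the column names a to e are made up for illustration); both results stay two-dimensional DataFrames, and together they cover every original column exactly once.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: 2 rows, 5 columns labelled "a".."e"
dataframe = pd.DataFrame(np.arange(10).reshape(2, 5), columns=list("abcde"))

y = dataframe.iloc[:, -3:]  # last 3 columns (positional slice)
X = dataframe.iloc[:, :-3]  # every column except the last 3

print(y.shape, list(y.columns))  # (2, 3) ['c', 'd', 'e']
print(X.shape, list(X.columns))  # (2, 2) ['a', 'b']
```

Slicing with a range such as -3: always returns a DataFrame; only a single integer position (for example iloc[:, -1]) drops to a one-dimensional Series, which is a common source of shape mismatches further down a pipeline.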
Answer by EdChum
Just do:
y = dataframe[dataframe.columns[-3:]]
This slices the column labels (df.columns), and the resulting Index is then used to sub-select those columns from the df.
Example:
In [221]:
df = pd.DataFrame(columns=np.arange(10))
df[df.columns[-3:]]
Out[221]:
Empty DataFrame
Columns: [7, 8, 9]
Index: []
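For a frame with unique column labels, selecting by the sliced column Index gives the same result as the positional iloc slice from the question. The sketch below checks that on a small frame with made-up data; df.columns[-3:] is itself an Index, which df[...] then treats as a list of labels.

```python
import numpy as np
import pandas as pd

# Small made-up frame with integer column labels 0..9, as in the example above
df = pd.DataFrame(np.arange(20).reshape(2, 10), columns=np.arange(10))

last3_labels = df.columns[-3:]  # slice of the column labels
by_label = df[last3_labels]     # label-based selection
by_position = df.iloc[:, -3:]   # positional selection

print(last3_labels.tolist())          # [7, 8, 9]
print(by_label.equals(by_position))   # True
```

One difference worth knowing: if column labels are duplicated, df[df.columns[-3:]] returns every column carrying those labels, while iloc[:, -3:] always returns exactly the last three positions.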
I think the issue here is that, because you have taken a slice of the df, pandas treats the result as a possible view of the original, and depending on what the rest of your code does, it raises a warning (SettingWithCopyWarning) when you assign to it. You can make an explicit copy by calling .copy() to remove the warnings.
So if we take a copy, assignment affects only the copy and not the original df:
In [15]:
df = pd.DataFrame(np.random.randn(5,10), columns= np.arange(10))
df
Out[15]:
0 1 2 3 4 5 6 \
0 0.568284 -1.488447 0.970365 -1.406463 -0.413750 -0.934892 -1.421308
1 1.186414 -0.417366 -1.007509 -1.620530 -1.322004 0.294540 1.205115
2 -1.073894 -0.214972 1.516563 -0.705571 0.068666 1.690654 -0.252485
3 0.923524 -0.856752 0.226294 -0.660085 1.259145 0.400596 0.559028
4 0.259807 0.135300 1.130347 -0.317305 -1.031875 0.232262 0.709244
7 8 9
0 1.741925 -0.475619 -0.525770
1 2.137546 0.215665 1.908362
2 1.180281 -0.144652 0.870887
3 -0.609804 -0.833186 -1.033656
4 0.480943 1.971933 1.928037
In [16]:
y = df[df.columns[-3:]].copy()
y
Out[16]:
7 8 9
0 1.741925 -0.475619 -0.525770
1 2.137546 0.215665 1.908362
2 1.180281 -0.144652 0.870887
3 -0.609804 -0.833186 -1.033656
4 0.480943 1.971933 1.928037
In [17]:
y[y>0] = 0
print(y)
df
7 8 9
0 0.000000 -0.475619 -0.525770
1 0.000000 0.000000 0.000000
2 0.000000 -0.144652 0.000000
3 -0.609804 -0.833186 -1.033656
4 0.000000 0.000000 0.000000
Out[17]:
0 1 2 3 4 5 6 \
0 0.568284 -1.488447 0.970365 -1.406463 -0.413750 -0.934892 -1.421308
1 1.186414 -0.417366 -1.007509 -1.620530 -1.322004 0.294540 1.205115
2 -1.073894 -0.214972 1.516563 -0.705571 0.068666 1.690654 -0.252485
3 0.923524 -0.856752 0.226294 -0.660085 1.259145 0.400596 0.559028
4 0.259807 0.135300 1.130347 -0.317305 -1.031875 0.232262 0.709244
7 8 9
0 1.741925 -0.475619 -0.525770
1 2.137546 0.215665 1.908362
2 1.180281 -0.144652 0.870887
3 -0.609804 -0.833186 -1.033656
4 0.480943 1.971933 1.928037
Here no warning is raised and the original df is untouched.
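The same pattern works with the iloc slices from the question. A minimal sketch with made-up integer data (the frame and the snapshot named original exist only for this illustration): taking explicit copies of both pieces means later in-place edits on y or X leave the source frame untouched.

```python
import numpy as np
import pandas as pd

# Made-up data: values -4..5 arranged in 2 rows x 5 columns
df = pd.DataFrame(np.arange(-4, 6).reshape(2, 5), columns=list("abcde"))
original = df.copy()  # snapshot used only to compare against afterwards

y = df.iloc[:, -3:].copy()  # independent copy of the last 3 columns
X = df.iloc[:, :-3].copy()  # independent copy of all other columns

y[y > 0] = 0  # in-place edit on the copy only

print(y)
print(df.equals(original))  # True: the original frame is unchanged
```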
Answer by Ananta R. Pant
This is because of integer indices: with an integer index, ix interprets -3 as a label rather than a position, and this is by design (see integer indexing in the pandas "gotchas"*).
*In newer versions of pandas, prefer loc or iloc to remove the ambiguity of ix between position and label (ix was deprecated in pandas 0.20 and removed in 1.0):
df.iloc[-3:] (note that this selects the last 3 rows; the last 3 columns are df.iloc[:, -3:]); see the docs.
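The label-versus-position gotcha is easy to see on a frame with a default integer index. Since .ix no longer exists in current pandas, the sketch below (made-up data) uses .loc to show the label-based behaviour: .loc[-3:] is a label slice starting at the label -3, and because no such label exists and the index is sorted, the slice starts at the beginning and returns every row, whereas .iloc[-3:] counts positions from the end.

```python
import pandas as pd

# Made-up frame with the default integer index 0..5
df = pd.DataFrame({"value": range(6)})

by_label = df.loc[-3:]      # label slice from label -3 onward: all 6 rows
by_position = df.iloc[-3:]  # positional slice: the last 3 rows

print(len(by_label))               # 6
print(by_position.index.tolist())  # [3, 4, 5]
```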
As Wes points out, in this specific case you should just use tail!
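Both tail(n) and iloc[-n:] work on rows, so it may help to see them side by side with the column selection the question actually asks about. A small sketch with a made-up 5×4 frame:

```python
import pandas as pd

# Made-up 5x4 frame with columns "a".."d"
df = pd.DataFrame({c: range(5) for c in "abcd"})

last_rows = df.tail(3)         # last 3 rows
last_rows_iloc = df.iloc[-3:]  # same rows, by position
last_cols = df.iloc[:, -3:]    # last 3 columns (what the question asks for)

print(last_rows.equals(last_rows_iloc))  # True
print(list(last_cols.columns))           # ['b', 'c', 'd']
```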
Note also that in pandas before 0.14, iloc raises an IndexError on an out-of-bounds slice, while .head() and .tail() do not:
>>> pd.__version__
'0.12.0'
>>> df = pd.DataFrame([{"a": 1}, {"a": 2}])
>>> df.iloc[-5:]
...
IndexError: out-of-bounds on slice (end)
>>> df.tail(5)
   a
0  1
1  2

Old answer (deprecated method):
You can use the irow DataFrame method to overcome this ambiguity:
In [11]: df1.irow(slice(-3, None))
Out[11]:
    STK_ID  RPT_Date  TClose   sales  discount
8      568  20080331   38.75  12.668       NaN
9      568  20080630   30.09  21.102       NaN
10     568  20080930   26.00  30.769       NaN

Note: Series has a similar iget method.
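For comparison with the pre-0.14 transcript above: since pandas 0.14, out-of-bounds slice bounds in iloc are clipped, as with Python lists, so iloc[-5:] on a two-row frame returns both rows just like tail(5). A minimal check, assuming pandas 0.14 or later:

```python
import pandas as pd

df = pd.DataFrame([{"a": 1}, {"a": 2}])

# In pandas >= 0.14 an out-of-range slice bound is clipped instead of raising
print(df.iloc[-5:].equals(df.tail(5)))  # True
print(df.iloc[:, -5:].shape)            # (2, 1): column slices are clipped too
```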
Answer by decision_scientist_noah
The most direct way is positional slicing with iloc:
1. Select last n columns
df1 = df.iloc[:,-n:]
2. Exclude last n columns
df1 = df.iloc[:,:-n]
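One caveat with the -n form: when n is 0, -0 is just 0, so df.iloc[:, -0:] returns all columns and df.iloc[:, :-0] returns none, the opposite of what is wanted. A minimal sketch of two hypothetical helpers (last_n_columns and drop_last_n_columns are names made up here, not pandas functions) that compute the split point from the column count instead, assuming n >= 0:

```python
import pandas as pd


def last_n_columns(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return the last n columns (n >= 0); n=0 gives an empty selection."""
    start = max(df.shape[1] - n, 0)  # clip so n larger than the width is safe
    return df.iloc[:, start:]


def drop_last_n_columns(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return every column except the last n (n >= 0); n=0 keeps them all."""
    stop = max(df.shape[1] - n, 0)
    return df.iloc[:, :stop]


df = pd.DataFrame({c: [1, 2] for c in "abcde"})

print(list(last_n_columns(df, 2).columns))       # ['d', 'e']
print(list(drop_last_n_columns(df, 2).columns))  # ['a', 'b', 'c']
print(last_n_columns(df, 0).shape)               # (2, 0)
print(drop_last_n_columns(df, 0).shape)          # (2, 5)
```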