Python: select the last n columns and exclude the last n columns of a dataframe

Question

Asked by Toly

How do I:

Select last 3 columns in a dataframe and create a new dataframe?

I tried:

y = dataframe.iloc[:,-3:]

Exclude last 3 columns and create a new dataframe?

I tried:

X = dataframe.iloc[:,:-3]

Is this correct?

I am getting array dimension errors later in my code and want to make sure this step is correct.

Thank you

Answer 1

Answered by EdChum

Just do:

y = dataframe[dataframe.columns[-3:]]

This slices the column index and uses the resulting labels to sub-select from the df.

Example:

In [221]:
df = pd.DataFrame(columns=np.arange(10))
df[df.columns[-3:]]

Out[221]:
Empty DataFrame
Columns: [7, 8, 9]
Index: []
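
The label-based slice above and the positional slice from the question should select the same columns as long as the column labels are unique. A minimal sketch to check this, assuming small made-up frames (the labels "a" to "e" and the frame named dup are illustrative only, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical 4x5 frame; the labels "a".."e" are made up for illustration.
df = pd.DataFrame(np.arange(20).reshape(4, 5), columns=list("abcde"))

by_label = df[df.columns[-3:]]   # select using the last three column labels
by_position = df.iloc[:, -3:]    # select using the last three column positions

print(by_label.equals(by_position))
print(list(by_label.columns))

# Hypothetical frame with a repeated label: label-based selection returns
# every column carrying that label, positional selection returns exactly one.
dup = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "b"])
print(dup[dup.columns[-1:]].shape)
print(dup.iloc[:, -1:].shape)
```

If the labels can repeat, the positional form is probably the safer choice, since it always returns exactly the requested number of columns.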

I think the issue here is that, because you have taken a slice of the df, pandas may treat the result as a view of the original, and depending on what the rest of your code does with it this can raise a warning (typically SettingWithCopyWarning). You can make an explicit copy by calling .copy() to avoid the warning.

So if we take a copy, assignment only affects the copy and not the original df:

In [15]:
df = pd.DataFrame(np.random.randn(5,10), columns= np.arange(10))
df

Out[15]:
          0         1         2         3         4         5         6  \
0  0.568284 -1.488447  0.970365 -1.406463 -0.413750 -0.934892 -1.421308   
1  1.186414 -0.417366 -1.007509 -1.620530 -1.322004  0.294540  1.205115   
2 -1.073894 -0.214972  1.516563 -0.705571  0.068666  1.690654 -0.252485   
3  0.923524 -0.856752  0.226294 -0.660085  1.259145  0.400596  0.559028   
4  0.259807  0.135300  1.130347 -0.317305 -1.031875  0.232262  0.709244   

          7         8         9  
0  1.741925 -0.475619 -0.525770  
1  2.137546  0.215665  1.908362  
2  1.180281 -0.144652  0.870887  
3 -0.609804 -0.833186 -1.033656  
4  0.480943  1.971933  1.928037  

In [16]:    
y = df[df.columns[-3:]].copy()
y

Out[16]:
          7         8         9
0  1.741925 -0.475619 -0.525770
1  2.137546  0.215665  1.908362
2  1.180281 -0.144652  0.870887
3 -0.609804 -0.833186 -1.033656
4  0.480943  1.971933  1.928037

In [17]:    
y[y>0] = 0
print(y)
df

          7         8         9
0  0.000000 -0.475619 -0.525770
1  0.000000  0.000000  0.000000
2  0.000000 -0.144652  0.000000
3 -0.609804 -0.833186 -1.033656
4  0.000000  0.000000  0.000000
Out[17]:
          0         1         2         3         4         5         6  \
0  0.568284 -1.488447  0.970365 -1.406463 -0.413750 -0.934892 -1.421308   
1  1.186414 -0.417366 -1.007509 -1.620530 -1.322004  0.294540  1.205115   
2 -1.073894 -0.214972  1.516563 -0.705571  0.068666  1.690654 -0.252485   
3  0.923524 -0.856752  0.226294 -0.660085  1.259145  0.400596  0.559028   
4  0.259807  0.135300  1.130347 -0.317305 -1.031875  0.232262  0.709244   

          7         8         9  
0  1.741925 -0.475619 -0.525770  
1  2.137546  0.215665  1.908362  
2  1.180281 -0.144652  0.870887  
3 -0.609804 -0.833186 -1.033656  
4  0.480943  1.971933  1.928037

Here no warning is raised and the original df is untouched.
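
The same check can be made programmatically rather than by eye. A minimal sketch, assuming a small made-up frame and the positional slice from the question (the names original and y are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 frame of floats 0.0 .. 11.0.
df = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4))
original = df.copy()            # snapshot kept only for comparison

y = df.iloc[:, -2:].copy()      # explicit, independent copy of the last two columns
y[y > 5] = 0                    # boolean-mask assignment touches the copy only

print(df.equals(original))      # the source frame should be unchanged
print(y)
```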

Answer 2

Answered by Ananta R. Pant

This happens because the index uses integer labels: given -3, ix selects by label rather than by position, and this is by design (see integer indexing in the pandas "gotchas"*).

*In newer versions of pandas, prefer loc or iloc, which remove the ambiguity of ix between position and label:

df.iloc[-3:] (see the docs). Note that this selects the last three rows by position; the column equivalent is df.iloc[:, -3:].

As Wes points out, in this specific case you should just use tail!
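
Since tail and a bare iloc[-n:] both work on rows while the question asks about columns, the difference may be worth spelling out. A minimal sketch, assuming a made-up two-row frame and pandas 0.14 or later (where out-of-bounds iloc slices no longer raise):

```python
import pandas as pd

# Hypothetical frame with 2 rows and 3 columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

last_rows = df.iloc[-1:]        # rows: last row, all columns
last_cols = df.iloc[:, -1:]     # columns: all rows, last column

print(last_rows.shape, last_cols.shape)
print(df.tail(1).equals(last_rows))   # tail(n) matches iloc[-n:] on rows

# Asking for more rows than exist returns the whole frame in both forms
# (for iloc this assumes pandas >= 0.14).
print(df.tail(5).shape, df.iloc[-5:].shape)
```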

Note also that in pandas versions before 0.14, iloc raises an IndexError on an out-of-bounds access, while .head() and .tail() do not:

In [1]: pd.__version__
Out[1]: '0.12.0'

In [2]: df = pd.DataFrame([{"a": 1}, {"a": 2}])

In [3]: df.iloc[-5:]
...
IndexError: out-of-bounds on slice (end)

In [4]: df.tail(5)
Out[4]:
   a
0  1
1  2

Old answer (deprecated method):

You can use the irow DataFrame method to overcome this ambiguity:

In [11]: df1.irow(slice(-3, None))
Out[11]:
    STK_ID  RPT_Date  TClose   sales  discount
8      568  20080331   38.75  12.668       NaN
9      568  20080630   30.09  21.102       NaN
10     568  20080930   26.00  30.769       NaN

Note: Series has a similar iget method.

Answer 3

Answered by decision_scientist_noah

The most direct way is positional slicing with iloc:

1. Select last n columns

df1 = df.iloc[:,-n:]

2. Exclude last n columns

df1 = df.iloc[:,:-n]
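
The two slices can be sanity-checked by comparing shapes, which may also help track down dimension errors further along. A minimal sketch, assuming a made-up 4x6 frame and n = 3 (the names X and y follow the question; everything else is illustrative):

```python
import numpy as np
import pandas as pd

n = 3  # number of trailing columns to split off
df = pd.DataFrame(np.arange(24).reshape(4, 6))  # hypothetical 4x6 frame

y = df.iloc[:, -n:]   # last n columns
X = df.iloc[:, :-n]   # everything except the last n columns

# The two pieces should partition the columns with no overlap.
print(X.shape, y.shape)
assert X.shape[1] + y.shape[1] == df.shape[1]

# Caveat: with n = 0 the slices flip, because -0 == 0.
print(df.iloc[:, -0:].shape)   # all columns
print(df.iloc[:, :-0].shape)   # no columns

# A slice keeps the result 2-D even for one column; a scalar index gives 1-D.
print(df.iloc[:, -1:].shape)   # DataFrame with a single column
print(df.iloc[:, -1].shape)    # Series
```

If the slicing step checks out like this, one possible source of dimension errors is the last case: code that expects a 1-D target receives a single-column DataFrame, or the other way round.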
