Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/16782323/

Time: 2020-08-18 23:40:50  Source: igfitidea

Python pandas: Keep selected column as DataFrame instead of Series

python, pandas

Question

When selecting a single column from a pandas DataFrame (say df.iloc[:, 0], df['A'], or df.A, etc.), the result is automatically converted to a Series instead of a single-column DataFrame. However, I am writing some functions that take a DataFrame as an input argument. Therefore, I prefer to deal with a single-column DataFrame instead of a Series, so that the function can assume that, say, df.columns is accessible. Right now I have to explicitly convert the Series into a DataFrame by using something like pd.DataFrame(df.iloc[:, 0]). This doesn't seem like the cleanest method. Is there a more elegant way to index a DataFrame directly so that the result is a single-column DataFrame instead of a Series?
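To make the behaviour described in the question concrete, here is a minimal sketch. The frame and its column names A and B are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Illustrative data; the column names "A" and "B" are assumptions for this sketch.
df = pd.DataFrame({"A": [1, 3], "B": [2, 4]})

by_position = df.iloc[:, 0]   # scalar position -> Series
by_label = df["A"]            # scalar label -> Series
by_attribute = df.A           # attribute access -> Series

for result in (by_position, by_label, by_attribute):
    print(type(result).__name__)

# The workaround mentioned in the question: wrap the Series explicitly.
wrapped = pd.DataFrame(df.iloc[:, 0])
print(type(wrapped).__name__, list(wrapped.columns))
```

All three selections give a Series, which has no columns attribute, so a function written against df.columns would fail on them.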

Accepted answer by Andy Hayden

As @Jeff mentions, there are a few ways to do this, but I recommend using loc/iloc to be more explicit (and to raise errors early if you're trying something ambiguous):

In [10]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [11]: df
Out[11]:
   A  B
0  1  2
1  3  4

In [12]: df[['A']]

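In [13]: df[[0]]  # worked positionally in old pandas; raises KeyError in current versions

In [14]: df.loc[:, ['A']]

In [15]: df.iloc[:, [0]]

Out[12-15]:  # they all return the same thing (In [13] only on old pandas):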
   A
0  1
1  3

The latter two choices remove ambiguity in the case of integer column names (precisely why loc/iloc were created). For example:

In [16]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 0])

In [17]: df
Out[17]:
   A  0
0  1  2
1  3  4

In [18]: df[[0]]  # ambiguous: old pandas picked position 0 (shown below); current pandas picks the label 0
Out[18]:
   A
0  1
1  3
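As a sketch of how loc and iloc settle the ambiguity, the same frame can be indexed both ways. The variable names by_label and by_position are introduced here for illustration only.

```python
import pandas as pd

# Same frame as in the session above: one string label and one integer label.
df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", 0])

by_label = df.loc[:, [0]]      # label 0 -> the second column
by_position = df.iloc[:, [0]]  # position 0 -> the first column, "A"

print(by_label)
print(by_position)
```

Both results are single-column DataFrames because the column selector is a list; only the meaning of 0 differs.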

Answer by Sumanth Lazarus

As Andy Hayden recommends, using .iloc/.loc to index out a single-column DataFrame is the way to go. Another point to note is how to express the index positions: pass the index labels/positions as a list when specifying the argument values in order to get a DataFrame back; otherwise the result is a 'pandas.core.series.Series'.

Input:

    A_1 = train_data.loc[:,'Fraudster']
    print('A_1 is of type', type(A_1))
    A_2 = train_data.loc[:, ['Fraudster']]
    print('A_2 is of type', type(A_2))
    A_3 = train_data.iloc[:,12]
    print('A_3 is of type', type(A_3))
    A_4 = train_data.iloc[:,[12]]
    print('A_4 is of type', type(A_4))

Output:

    A_1 is of type <class 'pandas.core.series.Series'>
    A_2 is of type <class 'pandas.core.frame.DataFrame'>
    A_3 is of type <class 'pandas.core.series.Series'>
    A_4 is of type <class 'pandas.core.frame.DataFrame'>

Answer by Snehil

You can use df.iloc[:, 0:1]; in this case the result will be a DataFrame and not a Series.
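A minimal sketch of the slice approach, using a small made-up frame (the data and the names sliced and scalar are illustrative):

```python
import pandas as pd

# Illustrative two-column frame.
df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

sliced = df.iloc[:, 0:1]   # slice of width one -> DataFrame
scalar = df.iloc[:, 0]     # scalar position -> Series

print(type(sliced).__name__, sliced.shape)
print(type(scalar).__name__, scalar.shape)
```

The slice keeps the result two-dimensional, so it still carries a columns attribute.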

Answer by Null_Vallue_

These three approaches have been mentioned:

pd.DataFrame(df.loc[:, 'A'])  # Approach of the original post
df.loc[:, ['A']]              # Approach 2 (note: use iloc for positional indexing)
df[['A']]                     # Approach 3

pd.Series.to_frame() is another approach.

Because it is a method, it can be used in situations where the second and third approaches above do not apply. In particular, it is useful when applying some method to a column in your dataframe and you want to convert the output into a dataframe instead of a series. For instance, in a Jupyter Notebook a series will not have pretty output, but a dataframe will.

# Basic use case: 
df['A'].to_frame()

# Use case 2 (this will give you pretty output in a Jupyter Notebook): 
df['A'].describe().to_frame()

# Use case 3: 
df['A'].str.strip().to_frame()

# Use case 4: 
def some_function(num): 
    ...

df['A'].apply(some_function).to_frame()
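The use cases above can be tried end to end with a small made-up frame. This sketch also uses the optional name argument of Series.to_frame to rename the resulting column; the data and the column name A_length are assumptions for illustration.

```python
import pandas as pd

# Illustrative data; the strings carry stray whitespace on purpose.
df = pd.DataFrame({"A": [" x ", "y "], "B": [1, 2]})

# The resulting column keeps the Series name "A".
default_name = df["A"].str.strip().to_frame()

# Passing name= sets the column label during the conversion.
renamed = df["A"].str.len().to_frame(name="A_length")

print(default_name)
print(renamed)
```

Because to_frame is called at the end of the chain, the intermediate steps can return a Series and the final result is still a DataFrame.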