Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/25920932/

Date: 2020-09-13 22:29:20  Source: igfitidea

Why do I get a Pandas DataFrame with only one column instead of a Series?

Tags: python, pandas, dataframe, series

Asked by paulsef11

I've run into single-column data frames a couple of times, much to my chagrin (examples below), whereas in most other cases a one-column result would just be a Series. Is there any rhyme or reason as to why a one-column DF would be returned?

Examples:

1) when indexing columns by a boolean mask where the mask only has one true value:

import pandas as pd

df = pd.DataFrame([list('abc'), list('def')], columns=['foo', 'bar', 'tar'])
mask = [False, True, False]
type(df.loc[:, mask])  # DataFrame; .loc replaces .ix, which was removed in pandas 1.0

2) when setting an index on a DataFrame that only has two columns to begin with:

df = pd.DataFrame([list('ab'), list('de'), list('fg')], columns=['foo', 'bar'])
type(df.set_index('foo'))  # DataFrame with the single column 'bar'

I feel like if I'm expecting a DF with only one column, I can deal with it by just calling

pd.Series(df.values.ravel(), index=df.index)  # .values is an attribute, not a method

But in most other cases a one-column data frame would just be a Series. Is there any rhyme or reason as to why a one column DF would be returned?

Answered by BrenBarn

In general, a one-column DataFrame will be returned when the operation could return a multicolumn DataFrame. For instance, when you use a boolean column index, a multicolumn DataFrame would have to be returned if there were more than one True value, so a DataFrame will always be returned, even if it has only one column. Likewise, when setting an index, if your DataFrame had more than two columns, the result would still have to be a DataFrame after removing one for the index, so it will still be a DataFrame even if it has only one column left.
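
As a rough illustration of the two cases from the question, the sketch below uses .loc in place of the older .ix indexer and reuses the question's toy frames. The variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([list("abc"), list("def")], columns=["foo", "bar", "tar"])

# A boolean column mask could select any number of columns,
# so the result is a DataFrame even when exactly one entry is True.
one_col = df.loc[:, [False, True, False]]
two_cols = df.loc[:, [True, True, False]]
print(type(one_col).__name__, one_col.shape)    # DataFrame (2, 1)
print(type(two_cols).__name__, two_cols.shape)  # DataFrame (2, 2)

# set_index removes one column; whatever is left stays a DataFrame.
small = pd.DataFrame([list("ab"), list("de"), list("fg")], columns=["foo", "bar"])
indexed = small.set_index("foo")
print(type(indexed).__name__, indexed.shape)    # DataFrame (3, 1)
```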

In contrast, if you do something like df.loc[:,'col'] (df.ix[:,'col'] in pandas versions before 1.0, where .ix still existed), it returns a Series, because passing a single column name can never select more than one column.
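
A small sketch of that contrast, assuming the column labels are unique (with duplicate labels, a single name can match several columns and pandas returns a DataFrame). It uses .loc, and the same toy frame as the question.

```python
import pandas as pd

df = pd.DataFrame([list("abc"), list("def")], columns=["foo", "bar", "tar"])

as_series = df.loc[:, "bar"]    # scalar label: exactly one column, so a Series
as_frame = df.loc[:, ["bar"]]   # list of labels: could be many, so a DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```

Wrapping the label in a list is therefore one way to ask for a DataFrame deliberately, even for a single column.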

The idea is that doing an operation should not sometimes return a DataFrame and sometimes a Series based on features specific to the operands (i.e., how many columns they happen to have, how many values are True in your boolean mask). When you do df.set_index('col'), it's simpler if you know that you will always get a DataFrame, without having to worry about how many columns the original happened to have.
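
To see why a stable return type is convenient, consider the sketch below. The helper selected_names is hypothetical (it is not part of pandas); it relies only on the fact that a boolean column mask always yields a DataFrame, so .columns is available no matter how many entries are True.

```python
import pandas as pd

df = pd.DataFrame([list("abc"), list("def")], columns=["foo", "bar", "tar"])

def selected_names(frame, mask):
    """Hypothetical helper: names of the columns kept by a boolean mask."""
    # The selection is a DataFrame for zero, one, or many True values,
    # so there is no need to special-case a Series here.
    return list(frame.loc[:, mask].columns)

for mask in ([False, False, False], [False, True, False], [True, True, True]):
    print(selected_names(df, mask))
# []
# ['bar']
# ['foo', 'bar', 'tar']
```

If the one-True case returned a Series instead, the helper would need a type check before touching .columns.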

Note that there is also the DataFrame method .squeeze() for turning a one-column DataFrame into a Series.
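
A minimal sketch of that conversion, applied to the set_index example from the question. Passing axis="columns" is a choice made here, not something the answer specifies: it squeezes only the column axis, so a frame that happens to have a single row is not collapsed all the way to a scalar.

```python
import pandas as pd

df = pd.DataFrame([list("ab"), list("de"), list("fg")], columns=["foo", "bar"])
one_col = df.set_index("foo")       # DataFrame with the single column "bar"

s = one_col.squeeze("columns")      # Series named "bar", indexed by "foo"
print(type(s).__name__, s.name, list(s.index))  # Series bar ['a', 'd', 'f']

# Positional selection of the only column gives the same Series.
print(s.equals(one_col.iloc[:, 0]))  # True
```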