Original source: http://stackoverflow.com/questions/36803632/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Accessing data in a Pandas dataframe with one row
Asked by user1718097
I use Pandas dataframes to manipulate data and I usually visualise them as virtual spreadsheets, with rows and columns defining the positions of individual cells. I'm happy with the methods to slice and dice the dataframes but there seems to be some odd behaviour when the dataframe contains a single row. Basically, I want to select rows of data from a large parent dataframe that meet certain criteria and then pass those results as a daughter dataframe to a separate function for further processing. Sometimes there will only be a single record in the parent dataframe that meets the defined criteria and, therefore, the daughter dataframe will only contain a single row. Nevertheless, I still need to be able to access data in the daughter dataframe in the same way as for the parent dataframe. To illustrate my point, consider the following dataframe:
import pandas as pd
tempDF = pd.DataFrame({'group': [1, 1, 1, 1, 2, 2, 2, 2],
                       'string': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']})
print(tempDF)
Which looks like:
   group string
0      1      a
1      1      b
2      1      c
3      1      d
4      2      a
5      2      b
6      2      c
7      2      d
As an example, I can now select those rows where 'group' == 2 and 'string' == 'c', which yields just a single row. As expected, the length of the dataframe is 1 and it's possible to print just a single cell using .loc based on index values in the original dataframe:
tempDF2 = tempDF.loc[((tempDF['group']==2) & (tempDF['string']=='c')),['group','string']]
print(tempDF2)
print('Length of tempDF2 = ',tempDF2.index.size)
print(tempDF2.loc[6,['string']])
Output:
   group string
6      2      c
Length of tempDF2 =  1
string    c
Name: 6, dtype: object
However, if I select a single row using .loc, then the dataframe is printed in a transposed form and the length of the dataframe is now given as 2 (rather than 1). Clearly, it's no longer possible to select single cell values based on index of original parent dataframe:
tempDF3 = tempDF.loc[6,['group','string']]
print(tempDF3)
print('Length of tempDF3 = ',tempDF3.index.size)
Output:
group 2
string c
Name: 6, dtype: object
Length of tempDF3 = 2
In my mind, both these methods are actually doing the same thing, namely selecting a single row of data. However, in the second example, the rows and columns are transposed making it impossible to extract data in an expected way.
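For reference, the second result is not a transposed dataframe but a one-dimensional Series, which is why its length is reported as 2. A minimal sketch, reusing the example data above (the variable name row is introduced here for illustration only), shows what the object contains and how a value can still be read from it:

```python
import pandas as pd

tempDF = pd.DataFrame({'group': [1, 1, 1, 1, 2, 2, 2, 2],
                       'string': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']})

# A scalar row label makes .loc return the row as a Series, not a DataFrame.
row = tempDF.loc[6, ['group', 'string']]

print(type(row).__name__)  # the class of the returned object
print(list(row.index))     # the former column labels become the index
print(row.name)            # the original row label is kept as the Series name
print(row['string'])       # a single value, looked up by column label
```

The index of the Series holds the two column labels, so index.size counts columns rather than rows.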
Why should these 2 behaviours exist? What is the point of transposing a single row of a dataframe as a default behaviour? How can I make sure that a dataframe containing a single row isn't transposed when I pass it to another function?
Answered by piRSquared
tempDF3 = tempDF.loc[6,['group','string']]
The 6 in the first position of the .loc selection dictates that the return type will be a Series, hence your problem. Instead, use [6]:
tempDF3 = tempDF.loc[[6],['group','string']]
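If the row label sometimes arrives as a scalar and sometimes as a list, one option is to wrap it before indexing. The sketch below assumes a hypothetical helper named select_rows (not part of pandas) and reuses the example data from the question:

```python
import pandas as pd

tempDF = pd.DataFrame({'group': [1, 1, 1, 1, 2, 2, 2, 2],
                       'string': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']})


def select_rows(df, labels):
    """Hypothetical helper: return a DataFrame even for a single row label."""
    # Wrap a scalar label in a list so .loc keeps the two-dimensional shape.
    if not isinstance(labels, (list, tuple, pd.Index)):
        labels = [labels]
    return df.loc[list(labels)]


single = select_rows(tempDF, 6)        # one row, still a DataFrame
several = select_rows(tempDF, [5, 6])  # two rows

print(type(single).__name__)
print(single.index.size)
print(single.loc[6, 'string'])
```

Because single is still a DataFrame indexed by the original row label, a function that receives it can use the same .loc[row, column] access as on the parent dataframe.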