Python Pandas: subset the values of column x based on the unique values in column y

Question

Asked by Cole Robertson

I have a dataframe ("df") equivalent to:

In other words, I have a category column and a data column. The data values do not vary within a category, but they may repeat across different categories (e.g. categories 'x' and 'z' both have the value 0.112). This means that I need to select one data point from each category, rather than just subsetting on the unique values of "Data".
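
The question's table is not shown here, so the frame below is a hypothetical one chosen to match the output printed in the answer. It illustrates why unique values of "Data" are not enough: 'x' and 'z' share 0.112, so they collapse into one value, while taking one value per category keeps all three. The per-category step uses groupby(...).first(), a swapped-in technique shown only for contrast.

```python
import pandas as pd

# Hypothetical sample frame (assumption): built to match the answer's output.
df = pd.DataFrame({
    "Cat": ["x", "x", "y", "y", "z", "z"],
    "Data": [0.112, 0.112, 0.223, 0.223, 0.112, 0.112],
})

# Unique values of "Data": 'x' and 'z' both contribute 0.112, so only two remain.
unique_data = df["Data"].unique()

# One value per category: all three categories survive.
per_category = df.groupby("Cat")["Data"].first()

print(unique_data)
print(per_category)
```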

The way I've done it is like this:

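    import pandas as pd

    # df has a category column 'Cat' and a value column 'Data'
    aLst = []  # categories already seen
    bLst = []  # index label of the first row of each category
    for i in df.index:
        if df.loc[i, 'Cat'] not in aLst:
            aLst += [df.loc[i, 'Cat']]
            bLst += [i]

    # select 'Data' for those first rows only
    new_series = pd.Series(df.loc[bLst, 'Data'])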

Then I can do whatever I want with it. But the problem is this just seems like a clunky, un-pythonic way of doing things. Any suggestions?
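
The loop keeps the index label of the first row it meets for each category. A vectorised equivalent can be sketched with Series.duplicated (a swapped-in technique, not the asker's code), again on a hypothetical frame; the check at the end confirms both approaches select the same rows.

```python
import pandas as pd

# Hypothetical sample frame (assumption); the question's real data is not shown.
df = pd.DataFrame({
    "Cat": ["x", "x", "y", "y", "z", "z"],
    "Data": [0.112, 0.112, 0.223, 0.223, 0.112, 0.112],
})

# The question's loop: record the first index label seen for each category.
aLst = []
bLst = []
for i in df.index:
    if df.loc[i, "Cat"] not in aLst:
        aLst += [df.loc[i, "Cat"]]
        bLst += [i]
new_series = pd.Series(df.loc[bLst, "Data"])

# Vectorised version: duplicated() is True for every repeat of a category after
# its first occurrence, so its negation selects exactly the first rows.
first_rows = ~df["Cat"].duplicated()
vectorised = df.loc[first_rows, "Data"]

print(new_series.equals(vectorised))
```

Besides being shorter, the mask avoids the `not in aLst` list scan, which grows with the number of categories on every row.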

Answer 1

Answered by jezrael

I think you need drop_duplicates:

    # by column Cat
    print(df.drop_duplicates(['Cat']))
      Cat   Data
    0   x  0.112
    2   y  0.223
    4   z  0.112
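
Building on the call above: drop_duplicates keeps the first row of each category by default (keep="first"), so selecting "Data" afterwards gives the same Series the question's loop builds. A minimal sketch on a hypothetical frame matching the printed output; keep="last" is shown only to illustrate the parameter.

```python
import pandas as pd

# Hypothetical sample frame (assumption) matching the output printed above.
df = pd.DataFrame({
    "Cat": ["x", "x", "y", "y", "z", "z"],
    "Data": [0.112, 0.112, 0.223, 0.223, 0.112, 0.112],
})

# First row per category (keep="first" is the default), then the Data column.
new_series = df.drop_duplicates(subset="Cat")["Data"]

# keep="last" keeps the final row of each category instead.
last_rows = df.drop_duplicates(subset="Cat", keep="last")

print(new_series)
print(last_rows)
```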

Or:

    # by columns Cat and Data
    print(df.drop_duplicates(['Cat','Data']))
      Cat   Data
    0   x  0.112
    2   y  0.223
    4   z  0.112
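
With the question's data both calls return the same rows, because each category has only one Data value. They differ once that assumption fails. A minimal sketch on a hypothetical frame in which category 'x' carries two different values:

```python
import pandas as pd

# Hypothetical frame (assumption) that breaks the "one value per category" rule.
df = pd.DataFrame({
    "Cat": ["x", "x", "y", "z"],
    "Data": [0.112, 0.5, 0.223, 0.112],
})

# Deduplicating on Cat alone always yields one row per category.
by_cat = df.drop_duplicates(["Cat"])

# Deduplicating on the (Cat, Data) pair keeps both 'x' rows, because
# (x, 0.112) and (x, 0.5) are distinct pairs.
by_pair = df.drop_duplicates(["Cat", "Data"])

print(len(by_cat), len(by_pair))
```

So when the goal is strictly one row per category, subsetting on Cat alone is the safer choice.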