Pandas: Changing the order of crosstab results

Question

Asked by Denis Kulagin

How do I change the order of rows and columns in the result of pd.crosstab?

pd.crosstab(df['col1'], df['col2'])

I would like to be able to sort by:

unique values of either df['col1'] or df['col2'] (cols/rows of the crosstab result)
by marginal values (e.g. showing higher-count values of df['col1'] closer to the top)

Answer 1

Answered by epattaro

It would be easier to give you a solution if you had provided an example of your data, since the right approach depends heavily on it. I will build an example scenario and a possible solution below.

If we take the example data and crosstab:

import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
       'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)

c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
       'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)

CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

CT

We have the following output:

c    dull  shiny  weird
a
bar     2      1      1
foo     3      4      0

That's a regular DataFrame object; it has just been cross-tabulated (or, put another way, pivoted).

You would like to show:

unique values of either df['col1'] or df['col2'] (cols/rows of the crosstab result)
by marginal values (e.g. showing higher-count values of df['col1'] closer to the top)
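If item 1 is read as ordering the rows and columns by their labels, one possible sketch uses `sort_index` for alphabetical order and `reindex` for an explicit order. The particular label orders chosen here are only illustrative assumptions, not something stated in the question.

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Reverse-alphabetical row labels.
rows_desc = CT.sort_index(ascending=False)

# Explicit, hand-picked order for both axes (illustrative choice).
custom = CT.reindex(index=['foo', 'bar'],
                    columns=['weird', 'shiny', 'dull'])

print(rows_desc)
print(custom)
```

Labels passed to `reindex` that do not exist in the table would produce rows or columns filled with NaN, so the lists should match the actual unique values.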

So let's start with item 1:

There are different ways to do that. A simple solution is to show the same DataFrame with boolean values marking the singular cases (cells whose count is exactly 1):

CT == 1

However, that format might not be what you want for large DataFrames.

You could instead print only the positive cases, or collect them in a list. A simple example:

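# Walk every cell and report the ones whose count is exactly 1.
for col in CT.columns:
    for index in CT.index:
        if CT.loc[index, col] == 1:
            # Printing a tuple gives the same output on Python 2 and Python 3.
            print((index, col, 'singular'))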

Output:

('bar', 'shiny', 'singular')
('bar', 'weird', 'singular')
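For larger tables, the nested loop can be avoided. The following is a minimal sketch, assuming the same example arrays as above, that flattens the crosstab with `stack` and filters the cells whose count is 1. The variable names `counts` and `singular_pairs` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Flatten the table into a Series indexed by (row label, column label).
counts = CT.stack()

# Keep only the cells whose count is exactly 1.
singular_pairs = list(counts[counts == 1].index)
print(singular_pairs)
```

Some pandas versions may emit a FutureWarning about the implementation of `stack`; the result for this table should be the same either way.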

The second item is more complicated. You want to order by higher counts, but the orderings can conflict: the row order that sorts one column from highest to lowest will most likely not sort another column the same way, since both columns share the same index.

Hence, you can choose to order by one specific column:

CT.sort_values('column_name', ascending=False)
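As a concrete sketch with the example table, 'column_name' could be 'shiny'. The same method can also reorder the columns by the values in one row when `axis=1` is passed; the choice of 'shiny' and 'foo' here is only illustrative.

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Order the rows by the counts in the 'shiny' column, highest first.
by_shiny = CT.sort_values('shiny', ascending=False)

# Order the columns by the counts in the 'foo' row, highest first.
cols_by_foo = CT.sort_values('foo', axis=1, ascending=False)

print(by_shiny)
print(cols_by_foo)
```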

Or you can define a metric to order by (for example, the row mean) and sort accordingly.
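One such metric is the marginal total the question mentions. A minimal sketch, assuming the example table from above: compute the row and column sums, sort them, and use the resulting labels to reorder the table with `.loc`. The names `row_totals`, `col_totals`, and `by_margins` are hypothetical.

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Marginal totals: one value per row label and one per column label.
row_totals = CT.sum(axis=1)
col_totals = CT.sum(axis=0)

# Labels ordered from the highest total to the lowest.
row_order = row_totals.sort_values(ascending=False).index
col_order = col_totals.sort_values(ascending=False).index

# Reorder both axes at once.
by_margins = CT.loc[row_order, col_order]
print(by_margins)

# The row mean works the same way as any other per-row metric.
by_row_mean = CT.loc[CT.mean(axis=1).sort_values(ascending=False).index]
print(by_row_mean)
```

In this example 'dull' and 'shiny' have the same column total, so their relative order is not determined by the metric alone.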

Hope that helps!
