Original URL: http://stackoverflow.com/questions/42770379/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
Pandas: change order of crosstab result
Question by Denis Kulagin
How can I change the order of the rows and columns in the result of pd.crosstab?
```python
pd.crosstab(df['col1'], df['col2'])
```
I would like to be able to sort by:
- unique values of either df['col1'] or df['col2'] (cols/rows of the crosstab result)
- marginal values (e.g. showing higher-count values of df['col1'] closer to the top)
Answer by epattaro
Well, it would be easier to give you a solution if you provided an example of your data, since the best approach can vary a lot depending on it. I will try to build an example scenario and a possible solution below.
Let's take the following example data and crosstab:
```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
# Count how often each (a, c) combination occurs
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])
print(CT)
```
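We get a table of counts with rows bar and foo and columns dull, shiny and weird: the bar row has 2 dull, 1 shiny and 1 weird; the foo row has 3 dull, 4 shiny and 0 weird.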
That's a regular DataFrame object; it has just been "crosstabbed", or better yet "pivot-tabled".
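Because the result is an ordinary DataFrame, the usual label-reordering methods apply to it. A minimal sketch reusing the example data: sort_index orders rows or columns by their labels, and reindex imposes an explicit order of your choosing (the orders below are arbitrary picks for illustration, not part of the original answer).

```python
import numpy as np
import pandas as pd

# Same example data as above
a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Rows sorted by label, descending ('foo' before 'bar')
rows_desc = CT.sort_index(ascending=False)

# Columns sorted by label, descending
cols_desc = CT.sort_index(axis=1, ascending=False)

# Explicit order on both axes (labels chosen only for illustration)
custom = CT.reindex(index=['foo', 'bar'], columns=['weird', 'shiny', 'dull'])
print(custom)
```

Note that reindex fills any label missing from the table with NaN instead of raising, so a typo in the order list shows up as an empty row or column.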
You would like to show:
- unique values of either df['col1'] or df['col2'] (cols/rows of the crosstab result)
- by marginal values (e.g. showing higher-count values of df['col1'] closer to the top)
So let's start with item 1:
There are different ways to do that. A simple solution is to show the same DataFrame with boolean values marking the singular cases (cells whose count is exactly 1):
```python
CT == 1  # True where a count is exactly 1
```
However, that format may not be what you want for large DataFrames.
You could instead print only the positive cases, or collect them in a list. A simple example:
```python
# Visit every cell and report those with a count of exactly 1
for col in CT.columns:
    for index in CT.index:
        if CT.loc[index, col] == 1:
            print((index, col, 'singular'))
```
Output:
```
('bar', 'shiny', 'singular')
('bar', 'weird', 'singular')
```
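For large tables, a loop-free alternative (a sketch, not from the original answer) is to stack the table into a Series indexed by (row, column) pairs and keep only the cells equal to 1:

```python
import numpy as np
import pandas as pd

# Same example data as above
a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# stack() turns the 2-D table into a Series indexed by (a, c) pairs
cells = CT.stack()

# Keep only the (row, column) pairs whose count is exactly 1
singular = cells[cells == 1].index.tolist()
print(singular)
```

This yields the same pairs as the loop, as a list of tuples, without visiting each cell in Python.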
The second item is more complicated. You want to order by higher values, but the columns may disagree: the row order that puts the highest values of one column first will most likely differ from the order that does so for another column, even though both columns share the same index.
Hence, you can choose to order by one specific column:
```python
# 'column_name' stands for one of CT's columns, e.g. 'shiny'
CT.sort_values('column_name', ascending=False)
```
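To make that divergence concrete with the example table (the columns and row picked here are only for illustration): sorting the rows by shiny puts foo first, sorting by weird puts bar first, and the same method with axis=1 orders the columns by the counts in one row. A minimal sketch:

```python
import numpy as np
import pandas as pd

# Same example data as above
a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Rows ordered by the 'shiny' column, highest count first
by_shiny = CT.sort_values('shiny', ascending=False)
# Rows ordered by the 'weird' column, highest count first
by_weird = CT.sort_values('weird', ascending=False)
# Columns ordered by the counts in the 'foo' row, highest first
cols_by_foo = CT.sort_values('foo', axis=1, ascending=False)

print(by_shiny.index.tolist())      # foo first (4 vs 1)
print(by_weird.index.tolist())      # bar first (1 vs 0)
print(cols_by_foo.columns.tolist())
```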
Or you can define a metric to order by (for example, the row mean) and sort accordingly.
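A minimal sketch of the metric-based approach, using the row mean as suggested above. Since every row has the same number of columns, ordering by the row mean gives the same order as ordering by the row total, i.e. the marginal counts the question asks about; column totals work the same way along the other axis.

```python
import numpy as np
import pandas as pd

# Same example data as above
a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Row metric: mean count per row (bar: 4/3, foo: 7/3)
row_mean = CT.mean(axis=1)
# Reorder the rows so the highest metric comes first
by_mean = CT.loc[row_mean.sort_values(ascending=False).index]

# Column marginals: total count per column (dull 5, shiny 5, weird 1)
col_totals = CT.sum(axis=0)
# Reorder the columns by total; dull and shiny tie, so their relative
# order is not guaranteed, but weird always comes last
by_col_total = CT[col_totals.sort_values(ascending=False).index]

print(by_mean)
print(by_col_total)
```

Alternatively, pd.crosstab(..., margins=True) adds an All row and column holding the totals; keep in mind that the All row then takes part in any row sort.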
Hope that helps!