Single variable category scatter plot pandas

Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/37194968/

Time: 2020-09-14 01:14:17  Source: igfitidea

Single variable category scatter plot pandas

Tags: python, pandas, matplotlib, seaborn, bokeh

Asked by Sitz Blogz

Is it possible to plot a single variable as a scatter plot? I can plot it as a line with markers by computing the CCDFs, but I would like to know whether an alternative is available.

Input:

Input 1

tweetcricscore 51 high active

Input 2

tweetcricscore 46 event based
tweetcricscore 12 event based
tweetcricscore 46 event based

Input 3

tweetcricscore 1 viewers 
tweetcricscore 178 viewers

Input 4

tweetcricscore 46 situational
tweetcricscore 23 situational
tweetcricscore 1 situational
tweetcricscore 8 situational
tweetcricscore 56 situational

I can write scatter plot code with bokeh and pandas when I have both x and y values. But what about the case of a single value?

All the inputs are merged into one input and grouped by col[3]; the values to plot are in col[2].
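
To make the merged input concrete, here is a minimal sketch that builds it as one DataFrame and groups the values by category. The column names `name`, `val` and `category` are assumptions for illustration (the question only refers to col[2] and col[3]), and treating the trailing text of each row as a single category label is also an assumption about the file layout.

```python
import pandas as pd

# Rows transcribed from Inputs 1-4 in the question. Treating the trailing
# text of each row as one category label is an assumption.
rows = [
    ("tweetcricscore", 51, "high active"),
    ("tweetcricscore", 46, "event based"),
    ("tweetcricscore", 12, "event based"),
    ("tweetcricscore", 46, "event based"),
    ("tweetcricscore", 1, "viewers"),
    ("tweetcricscore", 178, "viewers"),
    ("tweetcricscore", 46, "situational"),
    ("tweetcricscore", 23, "situational"),
    ("tweetcricscore", 1, "situational"),
    ("tweetcricscore", 8, "situational"),
    ("tweetcricscore", 56, "situational"),
]

# Hypothetical column names; the question does not name its columns
df = pd.DataFrame(rows, columns=["name", "val", "category"])

# One list of values per category: the single variable to be plotted
values_by_category = df.groupby("category")["val"].apply(list).to_dict()
print(values_by_category)
```

Each category then carries only one numeric variable, which is why an ordinary x/y scatter does not apply directly.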

The code below is for a data set with two variables:

import pandas as pd
# Note: bokeh.charts is a legacy API that was removed from later Bokeh releases
from bokeh.charts import Scatter, output_file, show

df = pd.read_csv('input.csv', header = None)

df.columns = ['col1','col2','col3','col4']

scatter = Scatter( df, x='col2', y='col3', color='col4', marker='col4', title='plot', legend=True)

output_file('output.html', title='output')

show(scatter)

Sample output:

[Image: sample output plot]

Accepted answer by MaxU

UPDATE:

Look at the Bokeh and Seaborn galleries; they might help you see what kind of plot fits your needs.

You may try a violinplot like this:

sns.violinplot(x="category", y="val", data=df)

[Image: violin plot of val by category]
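
If individual dots are preferred over a density shape, seaborn's `stripplot` is a close relative of the violin plot call above. The following is a minimal sketch rather than part of the original answer: it reuses the `val` and `category` column names from this answer, fills them with the values from the question, and the output file name is a placeholder.

```python
import pandas as pd
import seaborn as sns

# Values from the question; the category labels are an assumed reading
# of the trailing text in each input row.
df = pd.DataFrame({
    "val": [51, 46, 12, 46, 1, 178, 46, 23, 1, 8, 56],
    "category": (["high active"] + ["event based"] * 3
                 + ["viewers"] * 2 + ["situational"] * 5),
})

# stripplot draws one dot per row, with the categories along the x axis;
# jitter spreads overlapping dots horizontally within each category
ax = sns.stripplot(data=df, x="category", y="val", jitter=True)
ax.set_title("val by category")

# Placeholder file name; plt.show() would work as well in an interactive session
ax.figure.savefig("category_stripplot.png")
```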

Or a HeatMap:

import numpy as np
import pandas as pd
from bokeh.charts import HeatMap, output_file, show

cats = ['active', 'based', 'viewers', 'situational']
df = pd.DataFrame({'val': np.random.randint(1,100, 1000), 'category': np.random.choice(cats, 1000)})

hm = HeatMap(df)
output_file('d:/temp/heatmap.html')
show(hm)

Answer by Grr

You could try a boxplot or violinplot. Alternatively, if you don't like these and just want a vertical distribution of dots, you could force a scatter to plot along a single x value. To do this, create an array of a fixed value (say 1) that is the same length as the array you will be plotting:

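import matplotlib.pyplot as plt

# data: the 1-D sequence of values to plot
# One x value (1) per data point, so every point lands on the same vertical line
ones = [1] * len(data)

plt.scatter(ones, data)
plt.show()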

That will give you something like this:

[Image: scatter of the data along a single x value]
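
The same fixed-x idea extends to several categories by giving each one its own x position. The sketch below assumes the question's values keyed by category label (the labels are an assumed reading of the input rows) and uses a placeholder output file name.

```python
import numpy as np
import matplotlib.pyplot as plt

# Values transcribed from the question, keyed by an assumed category label
groups = {
    "high active": [51],
    "event based": [46, 12, 46],
    "viewers": [1, 178],
    "situational": [46, 23, 1, 8, 56],
}

fig, ax = plt.subplots()
for x_pos, (label, values) in enumerate(groups.items()):
    # np.full repeats the same x position once per value in the group
    xs = np.full(len(values), x_pos)
    ax.scatter(xs, values, label=label)

# Replace the numeric x positions with the category names
ax.set_xticks(range(len(groups)))
ax.set_xticklabels(list(groups))
ax.set_ylabel("val")

# Placeholder file name; plt.show() would work as well in an interactive session
fig.savefig("fixed_x_scatter.png")
```

Identical values within a group (46 appears twice under "event based") overlap exactly here; adding a small random offset to `xs` is one way to separate them.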

Answer by H_J

You can plot the index on the x-axis and the column value on the y-axis:

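import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame(np.random.randint(0,10,size=(100, 1)), columns=list('A'))
# Passing a single column plots its values against the index
sns.scatterplot(data=df['A'])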

[Image: scatter plot of column A against the index]

Answer by Yaakov Bressler

Something I use rather regularly is a "size plot", a visualization similar to the one you're requesting, where a single feature can be compared across groups. Here is an example using your data:

[Image: a size plot made using matplotlib]

Here is the code to achieve this size plot:

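import matplotlib.pyplot as plt

# df is assumed to hold the question's data in columns "type" and "tweetcricscore"
fig, ax = plt.subplots(1,1, figsize=(8,5))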

colors = ['blue','green','orange','pink']

yticks = {"ticks":[],"labels":[]}
xticks = {"ticks":[],"labels":[]}

agg_functions = ["mean","std","sum"]

# Set size plot
for i, (label, group_df) in enumerate(df.groupby('type', as_index=False)):

    # Set tick
    yticks["ticks"].append(i)
    yticks["labels"].append(label)

    agg_values = group_df["tweetcricscore"].aggregate(agg_functions)

    # Series.iteritems() was removed in pandas 2.0; items() is the replacement
    for ii, (agg_f, x) in enumerate(agg_values.items()):
        ax.scatter(x=ii, y = i, label=agg_f, s=x, color=colors[i])


        # Add your x axis
        if ii not in xticks["ticks"]:
            xticks["ticks"].append(ii)
            xticks["labels"].append(agg_f)


# Set yticks:
ax.set_yticks(yticks["ticks"]) 
ax.set_yticklabels(yticks["labels"], fontsize=12)

ax.set_xticks(xticks["ticks"]) 
ax.set_xticklabels(xticks["labels"], fontsize=12)


plt.show()
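
The code above expects a DataFrame `df` that the answer does not show. As a sketch under assumptions, the frame below uses the column names the code refers to (`type` and `tweetcricscore`) filled with the values from the question; the category labels are an assumed reading of the input rows. It also prints the aggregates that the loop turns into marker sizes.

```python
import pandas as pd

# Assumed layout: one row per observation, with the category in "type"
# and the numeric value in "tweetcricscore" (names taken from the code above)
df = pd.DataFrame({
    "type": (["high active"] + ["event based"] * 3
             + ["viewers"] * 2 + ["situational"] * 5),
    "tweetcricscore": [51, 46, 12, 46, 1, 178, 46, 23, 1, 8, 56],
})

# The same three aggregates the size plot uses, computed per group
agg = df.groupby("type")["tweetcricscore"].aggregate(["mean", "std", "sum"])
print(agg)
```

Note that a group with a single observation, such as "high active" here, has no defined sample standard deviation, so its `std` entry is NaN.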