
Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/41551658/

Time: 2020-09-14 02:45:18  Source: igfitidea

How to create a frequency distribution table on given data with Python in a Jupyter notebook with as little code as possible?

python pandas statistics jupyter-notebook

Asked by Mainul Islam

Develop a frequency distribution summarizing this data. The data is the demand for an object over a period of 20 days.


2 1 0 2 1 3 0 2 4 0 3 2 3 4 2 2 2 4 3 0. The task is to create a table in a Jupyter notebook with the columns Demand and Frequency. Note: Demand has to be in ascending order. This is what I did:


list_of_days = [2, 1, 0, 2, 1, 3, 0, 2, 4, 0, 3, 2 ,3, 4, 2, 2, 2, 4, 3, 0] # created a list of the data
import pandas as pd
series_of_days = pd.Series(list_of_days) # converted the list to series
series_of_days.value_counts(ascending = True) # the frequency was ascending but not the demand
test = dict(series_of_days.value_counts())
freq_table =  pd.Series(test)
pd.DataFrame({"Demand":freq_table.index, "Frequency":freq_table.values})

The output has to be like this:


<table border = "1">

  <tr>
    <td>Demand</td>
    <td>Frequency</td>
  </tr>
  <tr>
    <td>0</td>
    <td>4</td>
  </tr>
  <tr>
    <td>1</td>
    <td>2</td>
  </tr>
  <tr>
    <td>2</td>
    <td>7</td>
  </tr>
</table>

and so on. Is there a better way to shorten the Python code? Or make it more efficient?


Answered by jezrael

You can use value_counts with reset_index, then sort by the values with sort_values:


# the chain is wrapped in parentheses so it can span several lines
df1 = (pd.Series(list_of_days).value_counts()
         .reset_index()
         .sort_values('index')
         .reset_index(drop=True))
df1.columns = ['Demand', 'Frequency']
print(df1)
   Demand  Frequency
0       0          4
1       1          2
2       2          7
3       3          4
4       4          3
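The column renaming can also be folded into the chain. The following is a minimal sketch, not part of the original answer, assuming a reasonably recent pandas; the name freq_table is only illustrative. It names the index with rename_axis before reset_index turns it into a column, so no separate assignment to columns is needed.

```python
import pandas as pd

list_of_days = [2, 1, 0, 2, 1, 3, 0, 2, 4, 0, 3, 2, 3, 4, 2, 2, 2, 4, 3, 0]

# freq_table is an illustrative name, not taken from the answers
freq_table = (
    pd.Series(list_of_days)
    .value_counts()                  # count each distinct demand value
    .sort_index()                    # order by demand, not by frequency
    .rename_axis("Demand")           # name the index so it becomes the "Demand" column
    .reset_index(name="Frequency")   # move the index into a column and name the counts
)
print(freq_table)
```

In a notebook, leaving freq_table as the last expression in a cell renders it as a table.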

Another similar solution with sorting by sort_index:


# the chain is wrapped in parentheses so it can span several lines
df1 = (pd.Series(list_of_days)
         .value_counts()
         .sort_index()
         .reset_index()
         .reset_index(drop=True))
df1.columns = ['Demand', 'Frequency']
print(df1)
   Demand  Frequency
0       0          4
1       1          2
2       2          7
3       3          4
4       4          3

Answered by Mohammad Athar

import collections
collections.Counter(list_of_days)

Should do what you're describing
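Counter returns a dict-like object of counts rather than a table, and it does not order its keys. As a rough sketch of how it could be turned into the requested two-column table (assuming pandas is available; the names counts and freq_table are illustrative, not from the answer), the items can be sorted and passed to a DataFrame:

```python
from collections import Counter

import pandas as pd

list_of_days = [2, 1, 0, 2, 1, 3, 0, 2, 4, 0, 3, 2, 3, 4, 2, 2, 2, 4, 3, 0]

counts = Counter(list_of_days)  # maps each demand value to its frequency

# sorted() orders the (demand, frequency) pairs by demand, ascending
freq_table = pd.DataFrame(sorted(counts.items()), columns=["Demand", "Frequency"])
print(freq_table)
```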


Answered by piRSquared

I'm going for the literal creation of the HTML table you posted.


pd.Series([2,1,0,2,1,3,0,2,4,0,3,2,3,4,2,2,2,4,3,0]).value_counts().to_frame(name='Frequency').rename_axis('Demand', axis=1).sort_index()


<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>Demand</th>
      <th>Frequency</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>4</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
    </tr>
    <tr>
      <th>2</th>
      <td>7</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4</td>
    </tr>
    <tr>
      <th>4</th>
      <td>3</td>
    </tr>
  </tbody>
</table>
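If the goal is the HTML markup itself rather than the notebook rendering, a sketch along these lines may help. It is an assumption-laden variant, not the answer's method: it builds the table with Demand as an ordinary column and asks to_html to omit the row index, and the names freq_table and html are illustrative.

```python
import pandas as pd

list_of_days = [2, 1, 0, 2, 1, 3, 0, 2, 4, 0, 3, 2, 3, 4, 2, 2, 2, 4, 3, 0]

# illustrative names; Demand is kept as a regular column here
freq_table = (
    pd.Series(list_of_days)
    .value_counts()
    .sort_index()
    .rename_axis("Demand")
    .reset_index(name="Frequency")
)

# index=False leaves the 0..4 row labels out of the generated markup
html = freq_table.to_html(index=False)
print(html)
```

In a Jupyter notebook the resulting string could be rendered with IPython.display.HTML(html).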

Answered by Po Stevanus Andrianta

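If you want the shortest version, this is probably it. Note that Counter does not sort its keys (in Python 3.7+ it keeps first-seen order), so wrap the items in sorted() to get Demand in ascending order.

from collections import Counter

list_of_days = [2, 1, 0, 2, 1, 3, 0, 2, 4, 0, 3, 2, 3, 4, 2, 2, 2, 4, 3, 0]
day_counter = sorted(Counter(list_of_days).items())  # (demand, frequency) pairs, ascending by demand
data = [[a, b] for a, b in day_counter]
print(data)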

[[0, 4], [1, 2], [2, 7], [3, 4], [4, 3]]
