Python 如何将值从循环存储到数据帧？

Question

提问by JOHN

I have created a loop that generates some values. I want to store those values in a data frame. For example, completed one loop, append to the first row.

我创建了一个生成一些值的循环。我想将这些值存储在数据框中。例如，完成一个循环，追加到第一行。

def calculate (allFiles):

    result = pd.DataFrame(columns = ['Date','Mid Ebb Total','Mid Flood Total','Mid Ebb Control','Mid Flood Control'])

    total_Mid_Ebb = 0
    total_Mid_Flood = 0
    total_Mid_EbbControl = 0
    total_Mid_FloodControl = 0

    for file_ in allFiles:
        xls = pd.ExcelFile(file_)
        df = xls.parse('General Impact')
        Mid_Ebb = df[df['Tidal Mode'] == "Mid-Ebb"] #filter 
        Mid_Ebb_control = df[df['Station'].isin(['C1','C2','C3'])] #filter control
        Mid_Flood = df[df['Tidal Mode'] == "Mid-Flood"] #filter
        Mid_Flood_control = df[df['Station'].isin(['C1','C2','C3', 'SR2'])] #filter control
        total_Mid_Ebb += Mid_Ebb.Station.nunique() #count unique stations = sample number
        total_Mid_Flood += Mid_Flood.Station.nunique()
        total_Mid_EbbControl += Mid_Ebb_control.Station.nunique()
        total_Mid_FloodControl += Mid_Flood_control.Station.nunique()

    Mid_Ebb_withoutControl = total_Mid_Ebb - total_Mid_EbbControl
    Mid_Flood_withoutControl = total_Mid_Flood - total_Mid_FloodControl

    print('Ebb Tide: The total number of sample is {}. Number of sample without control station is {}. Number of sample in control station is {}'.format(total_Mid_Ebb, Mid_Ebb_withoutControl, total_Mid_EbbControl))
    print('Flood Tide: The total number of sample is {}. Number of sample without control station is {}. Number of sample in control station is {}'.format(total_Mid_Flood, Mid_Flood_withoutControl, total_Mid_FloodControl))

The dataframe result contains 4 columns. The date is fixed. I would like to put total_Mid_Ebb, Mid_Ebb_withoutControl, total_Mid_EbbControl to the dataframe.

数据框结果包含 4 列。日期是固定的。我想将 total_Mid_Ebb、Mid_Ebb_withoutControl、total_Mid_EbbControl 放入数据帧。

Answer 1

采纳答案by jezrael

I believe you need append scalars in loop to list of tuples and then use DataFrameconstructor. Last count differences in resultDataFrame:

我相信您需要将循环中的标量附加到元组列表中，然后使用DataFrame构造函数。resultDataFrame 中的最后计数差异：

def calculate (allFiles):

    data = []
    for file_ in allFiles:
        xls = pd.ExcelFile(file_)
        df = xls.parse('General Impact')
        Mid_Ebb = df[df['Tidal Mode'] == "Mid-Ebb"] #filter 
        Mid_Ebb_control = df[df['Station'].isin(['C1','C2','C3'])] #filter control
        Mid_Flood = df[df['Tidal Mode'] == "Mid-Flood"] #filter
        Mid_Flood_control = df[df['Station'].isin(['C1','C2','C3', 'SR2'])] #filter control
        total_Mid_Ebb = Mid_Ebb.Station.nunique() #count unique stations = sample number
        total_Mid_Flood = Mid_Flood.Station.nunique()
        total_Mid_EbbControl = Mid_Ebb_control.Station.nunique()
        total_Mid_FloodControl = Mid_Flood_control.Station.nunique()
        data.append((total_Mid_Ebb, 
                     total_Mid_Flood, 
                     total_Mid_EbbControl, 
                     total_Mid_FloodControl))

    cols=['total_Mid_Ebb','total_Mid_Flood','total_Mid_EbbControl','total_Mid_FloodControl']

    result = pd.DataFrame(data, columns=cols)
    result['Mid_Ebb_withoutControl'] = result.total_Mid_Ebb - result.total_Mid_EbbControl
    result['Mid_Flood_withoutControl']=result.total_Mid_Flood-result.total_Mid_FloodControl

    #if want check all totals
    total = result.sum()
    print (total)


    return result

Answer 2

回答by Anil_M

Here is an example of loading data per column in a dataframe after each iteration of a loop. While this is not THE only method, it's one that helps understand concept better.

这是在每次循环迭代后加载数据帧中每列数据的示例。虽然这不是唯一的方法，但它有助于更好地理解概念。

Necessary imports

必要的进口

import pandas as pd
from random import randint

First define an empty data-frame of 5 columns to match your problem

首先定义一个 5 列的空数据框来匹配您的问题

df = pd.DataFrame(columns=['A','B','C','D','E'])

Next we iterate through for loop and generate value using randint()and add one value at a time to each column Staring with 'A' all the way to 'E',

接下来，我们遍历 for 循环并使用生成值randint()并一次向每一列添加一个值从“A”一直盯着“E”，

for i in range(5): #add 5 rows of data
    df.loc[i, ['A']] = randint(0,99)
    df.loc[i, ['B']] = randint(0,99)
    df.loc[i, ['C']] = randint(0,99)
    df.loc[i, ['D']] = randint(0,99)
    df.loc[i, ['E']] = randint(0,99)

We get a DF whose 5 rows are populated.

我们得到一个填充了 5 行的 DF。

>>> df
    A   B   C   D   E
0   4  74  71  37  90
1  41  80  77  81   8
2  14  16  82  98  89
3   1  77   3  56  91
4  34   9  85  44  19

Hope above helps and you are able to tailor to your needs.

希望以上有所帮助，您可以根据自己的需求进行定制。

Answer 3

回答by chthonicdaemon

Notethis does not produce a row per file as requested, but it more of a comment about general use of Pandas for problems like this - it is often easier to read all the data then process using the pandas files than to write your own loops over different cases.

请注意，这不会按要求为每个文件生成一行，但它更多的是对 Pandas 对此类问题的一般使用的评论 - 读取所有数据然后使用 Pandas 文件进行处理通常比编写自己的循环更容易不同的情况。

I think you are not using pandas in the idiomatic way here. I think you will save a lot of code and get a more understandable result if you do it this way:

我认为您在这里没有以惯用的方式使用熊猫。我认为如果您这样做，您将节省大量代码并获得更易于理解的结果：

controlstations = ['C1', 'C2', 'C3', 'SR2']
df = pd.concat(pd.read_excel(file_, sheetname='General Impact') for file_ in files)
df['Control'] = df.Station.isin(controlstations)
counts = df.groupby(['Control', 'Tidal Mode']).Station.agg('nunique')

So here you are reading all the excel files into a single dataframe first, then adding a column to indicate if that is a control station or not, then using groupby to count the different combinations.

因此，在这里您首先将所有 excel 文件读入单个数据帧，然后添加一列以指示它是否是控制站，然后使用 groupby 计算不同的组合。

countsis a series with a two-dimensional index (for some made up data):

counts是一个带有二维索引的系列（对于一些合成数据）：

Control  Tidal Mode
False    Mid-Ebb       2
         Mid-Flood     2
True     Mid-Ebb       2
         Mid-Flood     2

You can access the values you have in your function like this:

您可以像这样访问函数中的值：

total_Mid_Ebb = counts['Mid-Ebb'].sum()
total_Mid_Ebb_Control = counts['Mid-Ebb', True]
total_Mid_Flood = counts['Mid-Flood'].sum()
total_Mid_Flood_Control = counts['Mid-Flood', True]

After which you can easily add them to a DataFrame:

之后，您可以轻松地将它们添加到 DataFrame 中：

import datetime
today = datetime.datetime.today()
totals = [total_Mid_Ebb, total_Mid_Flood, total_Mid_Ebb_Control, total_Mid_Flood_Control]
result = pd.DataFrame(data=[totals], columns=['Mid Ebb Total', 'Mid Flood Total', 'Mid Ebb Control', 'Mid Flood Control'],
                       index=[today])

Python 如何将值从循环存储到数据帧？

提问by JOHN

采纳答案by jezrael

回答by Anil_M

回答by chthonicdaemon

相关推荐

最近更新

标签

Python 如何将值从循环存储到数据帧？

提问by JOHN

采纳答案by jezrael

回答by Anil_M

回答by chthonicdaemon

相关推荐

Python 什么时候在 Django 中使用 get、get_queryset、get_context_data？

在python中使用特定列名过滤pandas数据框

Python 如何在 Keras 中返回验证丢失的历史记录

Python 二进制交叉熵损失计算中 np.dot 和 np.multiply 与 np.sum 的区别

相关推荐

最近更新

标签