Python 如何将值从循环存储到数据帧?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/48314657/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-19 18:38:59  来源:igfitidea点击:

How to store values from loop to a dataframe?

pythonpandas

提问by JOHN

I have created a loop that generates some values. I want to store those values in a data frame. For example, completed one loop, append to the first row.

我创建了一个生成一些值的循环。我想将这些值存储在数据框中。例如,完成一个循环,追加到第一行。

def calculate (allFiles):

    result = pd.DataFrame(columns = ['Date','Mid Ebb Total','Mid Flood Total','Mid Ebb Control','Mid Flood Control'])

    total_Mid_Ebb = 0
    total_Mid_Flood = 0
    total_Mid_EbbControl = 0
    total_Mid_FloodControl = 0

    for file_ in allFiles:
        xls = pd.ExcelFile(file_)
        df = xls.parse('General Impact')
        Mid_Ebb = df[df['Tidal Mode'] == "Mid-Ebb"] #filter 
        Mid_Ebb_control = df[df['Station'].isin(['C1','C2','C3'])] #filter control
        Mid_Flood = df[df['Tidal Mode'] == "Mid-Flood"] #filter
        Mid_Flood_control = df[df['Station'].isin(['C1','C2','C3', 'SR2'])] #filter control
        total_Mid_Ebb += Mid_Ebb.Station.nunique() #count unique stations = sample number
        total_Mid_Flood += Mid_Flood.Station.nunique()
        total_Mid_EbbControl += Mid_Ebb_control.Station.nunique()
        total_Mid_FloodControl += Mid_Flood_control.Station.nunique()

    Mid_Ebb_withoutControl = total_Mid_Ebb - total_Mid_EbbControl
    Mid_Flood_withoutControl = total_Mid_Flood - total_Mid_FloodControl

    print('Ebb Tide: The total number of sample is {}. Number of sample without control station is {}. Number of sample in control station is {}'.format(total_Mid_Ebb, Mid_Ebb_withoutControl, total_Mid_EbbControl))
    print('Flood Tide: The total number of sample is {}. Number of sample without control station is {}. Number of sample in control station is {}'.format(total_Mid_Flood, Mid_Flood_withoutControl, total_Mid_FloodControl))

The dataframe result contains 4 columns. The date is fixed. I would like to put total_Mid_Ebb, Mid_Ebb_withoutControl, total_Mid_EbbControl to the dataframe.

数据框结果包含 4 列。日期是固定的。我想将 total_Mid_Ebb、Mid_Ebb_withoutControl、total_Mid_EbbControl 放入数据帧。

采纳答案by jezrael

I believe you need append scalars in loop to list of tuples and then use DataFrameconstructor. Last count differences in resultDataFrame:

我相信您需要将循环中的标量附加到元组列表中,然后使用DataFrame构造函数。resultDataFrame 中的最后计数差异:

def calculate (allFiles):

    data = []
    for file_ in allFiles:
        xls = pd.ExcelFile(file_)
        df = xls.parse('General Impact')
        Mid_Ebb = df[df['Tidal Mode'] == "Mid-Ebb"] #filter 
        Mid_Ebb_control = df[df['Station'].isin(['C1','C2','C3'])] #filter control
        Mid_Flood = df[df['Tidal Mode'] == "Mid-Flood"] #filter
        Mid_Flood_control = df[df['Station'].isin(['C1','C2','C3', 'SR2'])] #filter control
        total_Mid_Ebb = Mid_Ebb.Station.nunique() #count unique stations = sample number
        total_Mid_Flood = Mid_Flood.Station.nunique()
        total_Mid_EbbControl = Mid_Ebb_control.Station.nunique()
        total_Mid_FloodControl = Mid_Flood_control.Station.nunique()
        data.append((total_Mid_Ebb, 
                     total_Mid_Flood, 
                     total_Mid_EbbControl, 
                     total_Mid_FloodControl))

    cols=['total_Mid_Ebb','total_Mid_Flood','total_Mid_EbbControl','total_Mid_FloodControl']

    result = pd.DataFrame(data, columns=cols)
    result['Mid_Ebb_withoutControl'] = result.total_Mid_Ebb - result.total_Mid_EbbControl
    result['Mid_Flood_withoutControl']=result.total_Mid_Flood-result.total_Mid_FloodControl

    #if want check all totals
    total = result.sum()
    print (total)


    return result

回答by Anil_M

Here is an example of loading data per column in a dataframe after each iteration of a loop. While this is not THE only method, it's one that helps understand concept better.

这是在每次循环迭代后加载数据帧中每列数据的示例。虽然这不是唯一的方法,但它有助于更​​好地理解概念。

Necessary imports

必要的进口

import pandas as pd
from random import randint

First define an empty data-frame of 5 columns to match your problem

首先定义一个 5 列的空数据框来匹配您的问题

df = pd.DataFrame(columns=['A','B','C','D','E'])

Next we iterate through for loop and generate value using randint()and add one value at a time to each column Staring with 'A' all the way to 'E',

接下来,我们遍历 for 循环并使用生成值randint()并一次向每一列添加一个值从“A”一直盯着“E”,

for i in range(5): #add 5 rows of data
    df.loc[i, ['A']] = randint(0,99)
    df.loc[i, ['B']] = randint(0,99)
    df.loc[i, ['C']] = randint(0,99)
    df.loc[i, ['D']] = randint(0,99)
    df.loc[i, ['E']] = randint(0,99)

We get a DF whose 5 rows are populated.

我们得到一个填充了 5 行的 DF。

>>> df
    A   B   C   D   E
0   4  74  71  37  90
1  41  80  77  81   8
2  14  16  82  98  89
3   1  77   3  56  91
4  34   9  85  44  19

Hope above helps and you are able to tailor to your needs.

希望以上有所帮助,您可以根据自己的需求进行定制。

回答by chthonicdaemon

Notethis does not produce a row per file as requested, but it more of a comment about general use of Pandas for problems like this - it is often easier to read all the data then process using the pandas files than to write your own loops over different cases.

请注意,这不会按要求为每个文件生成一行,但它更多的是对 Pandas 对此类问题的一般使用的评论 - 读取所有数据然后使用 Pandas 文件进行处理通常比编写自己的循环更容易不同的情况。

I think you are not using pandas in the idiomatic way here. I think you will save a lot of code and get a more understandable result if you do it this way:

我认为您在这里没有以惯用的方式使用熊猫。我认为如果您这样做,您将节省大量代码并获得更易于理解的结果:

controlstations = ['C1', 'C2', 'C3', 'SR2']
df = pd.concat(pd.read_excel(file_, sheetname='General Impact') for file_ in files)
df['Control'] = df.Station.isin(controlstations)
counts = df.groupby(['Control', 'Tidal Mode']).Station.agg('nunique')

So here you are reading all the excel files into a single dataframe first, then adding a column to indicate if that is a control station or not, then using groupby to count the different combinations.

因此,在这里您首先将所有 excel 文件读入单个数据帧,然后添加一列以指示它是否是控制站,然后使用 groupby 计算不同的组合。

countsis a series with a two-dimensional index (for some made up data):

counts是一个带有二维索引的系列(对于一些合成数据):

Control  Tidal Mode
False    Mid-Ebb       2
         Mid-Flood     2
True     Mid-Ebb       2
         Mid-Flood     2

You can access the values you have in your function like this:

您可以像这样访问函数中的值:

total_Mid_Ebb = counts['Mid-Ebb'].sum()
total_Mid_Ebb_Control = counts['Mid-Ebb', True]
total_Mid_Flood = counts['Mid-Flood'].sum()
total_Mid_Flood_Control = counts['Mid-Flood', True]

After which you can easily add them to a DataFrame:

之后,您可以轻松地将它们添加到 DataFrame 中:

import datetime
today = datetime.datetime.today()
totals = [total_Mid_Ebb, total_Mid_Flood, total_Mid_Ebb_Control, total_Mid_Flood_Control]
result = pd.DataFrame(data=[totals], columns=['Mid Ebb Total', 'Mid Flood Total', 'Mid Ebb Control', 'Mid Flood Control'],
                       index=[today])