Python 如何将值从循环存储到数据帧?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/48314657/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to store values from loop to a dataframe?
提问by JOHN
I have created a loop that generates some values. I want to store those values in a data frame. For example, completed one loop, append to the first row.
我创建了一个生成一些值的循环。我想将这些值存储在数据框中。例如,完成一个循环,追加到第一行。
def calculate (allFiles):
result = pd.DataFrame(columns = ['Date','Mid Ebb Total','Mid Flood Total','Mid Ebb Control','Mid Flood Control'])
total_Mid_Ebb = 0
total_Mid_Flood = 0
total_Mid_EbbControl = 0
total_Mid_FloodControl = 0
for file_ in allFiles:
xls = pd.ExcelFile(file_)
df = xls.parse('General Impact')
Mid_Ebb = df[df['Tidal Mode'] == "Mid-Ebb"] #filter
Mid_Ebb_control = df[df['Station'].isin(['C1','C2','C3'])] #filter control
Mid_Flood = df[df['Tidal Mode'] == "Mid-Flood"] #filter
Mid_Flood_control = df[df['Station'].isin(['C1','C2','C3', 'SR2'])] #filter control
total_Mid_Ebb += Mid_Ebb.Station.nunique() #count unique stations = sample number
total_Mid_Flood += Mid_Flood.Station.nunique()
total_Mid_EbbControl += Mid_Ebb_control.Station.nunique()
total_Mid_FloodControl += Mid_Flood_control.Station.nunique()
Mid_Ebb_withoutControl = total_Mid_Ebb - total_Mid_EbbControl
Mid_Flood_withoutControl = total_Mid_Flood - total_Mid_FloodControl
print('Ebb Tide: The total number of sample is {}. Number of sample without control station is {}. Number of sample in control station is {}'.format(total_Mid_Ebb, Mid_Ebb_withoutControl, total_Mid_EbbControl))
print('Flood Tide: The total number of sample is {}. Number of sample without control station is {}. Number of sample in control station is {}'.format(total_Mid_Flood, Mid_Flood_withoutControl, total_Mid_FloodControl))
The dataframe result contains 4 columns. The date is fixed. I would like to put total_Mid_Ebb, Mid_Ebb_withoutControl, total_Mid_EbbControl to the dataframe.
数据框结果包含 4 列。日期是固定的。我想将 total_Mid_Ebb、Mid_Ebb_withoutControl、total_Mid_EbbControl 放入数据帧。
采纳答案by jezrael
I believe you need append scalars in loop to list of tuples and then use DataFrame
constructor. Last count differences in result
DataFrame:
我相信您需要将循环中的标量附加到元组列表中,然后使用DataFrame
构造函数。result
DataFrame 中的最后计数差异:
def calculate (allFiles):
data = []
for file_ in allFiles:
xls = pd.ExcelFile(file_)
df = xls.parse('General Impact')
Mid_Ebb = df[df['Tidal Mode'] == "Mid-Ebb"] #filter
Mid_Ebb_control = df[df['Station'].isin(['C1','C2','C3'])] #filter control
Mid_Flood = df[df['Tidal Mode'] == "Mid-Flood"] #filter
Mid_Flood_control = df[df['Station'].isin(['C1','C2','C3', 'SR2'])] #filter control
total_Mid_Ebb = Mid_Ebb.Station.nunique() #count unique stations = sample number
total_Mid_Flood = Mid_Flood.Station.nunique()
total_Mid_EbbControl = Mid_Ebb_control.Station.nunique()
total_Mid_FloodControl = Mid_Flood_control.Station.nunique()
data.append((total_Mid_Ebb,
total_Mid_Flood,
total_Mid_EbbControl,
total_Mid_FloodControl))
cols=['total_Mid_Ebb','total_Mid_Flood','total_Mid_EbbControl','total_Mid_FloodControl']
result = pd.DataFrame(data, columns=cols)
result['Mid_Ebb_withoutControl'] = result.total_Mid_Ebb - result.total_Mid_EbbControl
result['Mid_Flood_withoutControl']=result.total_Mid_Flood-result.total_Mid_FloodControl
#if want check all totals
total = result.sum()
print (total)
return result
回答by Anil_M
Here is an example of loading data per column in a dataframe after each iteration of a loop. While this is not THE only method, it's one that helps understand concept better.
这是在每次循环迭代后加载数据帧中每列数据的示例。虽然这不是唯一的方法,但它有助于更好地理解概念。
Necessary imports
必要的进口
import pandas as pd
from random import randint
First define an empty data-frame of 5 columns to match your problem
首先定义一个 5 列的空数据框来匹配您的问题
df = pd.DataFrame(columns=['A','B','C','D','E'])
Next we iterate through for loop and generate value using randint()
and add one value at a time to each column Staring with 'A' all the way to 'E',
接下来,我们遍历 for 循环并使用生成值randint()
并一次向每一列添加一个值从“A”一直盯着“E”,
for i in range(5): #add 5 rows of data
df.loc[i, ['A']] = randint(0,99)
df.loc[i, ['B']] = randint(0,99)
df.loc[i, ['C']] = randint(0,99)
df.loc[i, ['D']] = randint(0,99)
df.loc[i, ['E']] = randint(0,99)
We get a DF whose 5 rows are populated.
我们得到一个填充了 5 行的 DF。
>>> df
A B C D E
0 4 74 71 37 90
1 41 80 77 81 8
2 14 16 82 98 89
3 1 77 3 56 91
4 34 9 85 44 19
Hope above helps and you are able to tailor to your needs.
希望以上有所帮助,您可以根据自己的需求进行定制。
回答by chthonicdaemon
Notethis does not produce a row per file as requested, but it more of a comment about general use of Pandas for problems like this - it is often easier to read all the data then process using the pandas files than to write your own loops over different cases.
请注意,这不会按要求为每个文件生成一行,但它更多的是对 Pandas 对此类问题的一般使用的评论 - 读取所有数据然后使用 Pandas 文件进行处理通常比编写自己的循环更容易不同的情况。
I think you are not using pandas in the idiomatic way here. I think you will save a lot of code and get a more understandable result if you do it this way:
我认为您在这里没有以惯用的方式使用熊猫。我认为如果您这样做,您将节省大量代码并获得更易于理解的结果:
controlstations = ['C1', 'C2', 'C3', 'SR2']
df = pd.concat(pd.read_excel(file_, sheetname='General Impact') for file_ in files)
df['Control'] = df.Station.isin(controlstations)
counts = df.groupby(['Control', 'Tidal Mode']).Station.agg('nunique')
So here you are reading all the excel files into a single dataframe first, then adding a column to indicate if that is a control station or not, then using groupby to count the different combinations.
因此,在这里您首先将所有 excel 文件读入单个数据帧,然后添加一列以指示它是否是控制站,然后使用 groupby 计算不同的组合。
counts
is a series with a two-dimensional index (for some made up data):
counts
是一个带有二维索引的系列(对于一些合成数据):
Control Tidal Mode
False Mid-Ebb 2
Mid-Flood 2
True Mid-Ebb 2
Mid-Flood 2
You can access the values you have in your function like this:
您可以像这样访问函数中的值:
total_Mid_Ebb = counts['Mid-Ebb'].sum()
total_Mid_Ebb_Control = counts['Mid-Ebb', True]
total_Mid_Flood = counts['Mid-Flood'].sum()
total_Mid_Flood_Control = counts['Mid-Flood', True]
After which you can easily add them to a DataFrame:
之后,您可以轻松地将它们添加到 DataFrame 中:
import datetime
today = datetime.datetime.today()
totals = [total_Mid_Ebb, total_Mid_Flood, total_Mid_Ebb_Control, total_Mid_Flood_Control]
result = pd.DataFrame(data=[totals], columns=['Mid Ebb Total', 'Mid Flood Total', 'Mid Ebb Control', 'Mid Flood Control'],
index=[today])