pandas: Create an empty CSV file with pandas

Note: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original: http://stackoverflow.com/questions/35916378/

Date: 2020-09-14 00:50:43  Source: igfitidea

Create empty csv file with pandas

Tags: python, csv, pandas, is-empty

Asked by PaulBarr

I am iterating through a number of CSV files and want to append the mean temperatures to a blank CSV file. How do you create an empty CSV file with pandas?

for EachMonth in MonthsInAnalysis:
    TheCurrentMonth = pd.read_csv('MonthlyDataSplit/Day/Day%s.csv' % EachMonth)
    MeanDailyTemperaturesForCurrentMonth = TheCurrentMonth.groupby('Day')['AirTemperature'].mean().reset_index(name='MeanDailyAirTemperature')
    with open('my_csv.csv', 'a') as f:
        # write the frame computed above (the original referenced an undefined df)
        MeanDailyTemperaturesForCurrentMonth.to_csv(f, header=False)

So in the above code, how do I create my_csv.csv prior to the for loop?

Just a note: I know you can create a data frame and then save it to CSV, but I am interested in whether this step can be skipped.

In terms of context I have the following csv files:

enter image description here

Each of them has the following structure:

enter image description here

The Day column runs up to 30 days in each file.

I would like to output a csv file that looks like this:

enter image description here

But obviously it should include all the days for all the months.

My issue is that I don't know in advance which months are included in each analysis. Hence I wanted a for loop that uses a list holding that information to access the relevant CSVs, calculate the mean temperature, and then save it all into one CSV.

Input as text:

    Unnamed: 0  AirTemperature  AirHumidity SoilTemperature SoilMoisture    LightIntensity  WindSpeed   Year    Month   Day Hour    Minute  Second  TimeStamp   MonthCategorical    TimeOfDay
6   6   18  84  17  41  40  4   2016    1   1   6   1   1   10106   January Day
7   7   20  88  22  92  31  0   2016    1   1   7   1   1   10107   January Day
8   8   23  1   22  59  3   0   2016    1   1   8   1   1   10108   January Day
9   9   23  3   22  72  41  4   2016    1   1   9   1   1   10109   January Day
10  10  24  63  23  83  85  0   2016    1   1   10  1   1   10110   January Day
11  11  29  73  27  50  1   4   2016    1   1   11  1   1   10111   January Day

Accepted answer by MaxU

I would do it this way: first read all your CSV files (but only the columns you really need) into one DF, then apply groupby(['Year','Month','Day']).mean() and save the resulting DF to a CSV file:

import glob
import pandas as pd

fmask = 'MonthlyDataSplit/Day/Day*.csv'
df = pd.concat((pd.read_csv(f, sep=',', usecols=['Year','Month','Day','AirTemperature']) for f in glob.glob(fmask)))
df.groupby(['Year','Month','Day']).mean().to_csv('my_csv.csv')

And if you want to ignore the year:

import glob
import pandas as pd

fmask = 'MonthlyDataSplit/Day/Day*.csv'
df = pd.concat((pd.read_csv(f, sep=',', usecols=['Month','Day','AirTemperature']) for f in glob.glob(fmask)))
df.groupby(['Month','Day']).mean().to_csv('my_csv.csv')

Some details:

(pd.read_csv(f, sep=',', usecols=['Month','Day','AirTemperature']) for f in glob.glob('*.csv'))

is a generator expression that yields one data frame per CSV file (it is not a tuple, despite the parentheses)

pd.concat(...)

concatenates them into a single DF

df.groupby(['Year','Month','Day']).mean()

produces the wanted report as a data frame, which can then be saved to a new CSV file:

.to_csv('my_csv.csv')
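
The steps above can be tried end to end with a small sketch. The two input files, their values, and the temporary directory are made up for illustration; only the column names and the Day*.csv naming come from the question. Note that no empty file has to be created first, since to_csv creates the output file itself.

```python
import glob
import os
import tempfile

import pandas as pd

# Build two tiny stand-in input files in a temporary directory (hypothetical data).
workdir = tempfile.mkdtemp()
pd.DataFrame({
    'Year': [2016, 2016, 2016],
    'Month': [1, 1, 1],
    'Day': [1, 1, 2],
    'AirTemperature': [18, 20, 23],
    'AirHumidity': [84, 88, 1],
}).to_csv(os.path.join(workdir, 'Day1.csv'))
pd.DataFrame({
    'Year': [2016, 2016],
    'Month': [2, 2],
    'Day': [1, 1],
    'AirTemperature': [10, 14],
    'AirHumidity': [50, 60],
}).to_csv(os.path.join(workdir, 'Day2.csv'))

fmask = os.path.join(workdir, 'Day*.csv')

# Read only the needed columns from every matching file and stack the rows.
frames = [
    pd.read_csv(f, usecols=['Year', 'Month', 'Day', 'AirTemperature'])
    for f in sorted(glob.glob(fmask))
]
df = pd.concat(frames, ignore_index=True)

# One mean per (Year, Month, Day), with a descriptive column name.
report = (
    df.groupby(['Year', 'Month', 'Day'])['AirTemperature']
      .mean()
      .reset_index(name='MeanDailyAirTemperature')
)

# to_csv creates the output file; nothing needs to exist beforehand.
out_path = os.path.join(workdir, 'my_csv.csv')
report.to_csv(out_path, index=False)
print(report)
```

Selecting the AirTemperature column before calling mean() and using reset_index(name=...) is a small variation on the answer's code, chosen here so the output column gets the name used in the question.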

Answer by Stop harming Monica

Just open the file in write mode to create it.

with open('my_csv.csv', 'w'):
    pass

That said, I do not think you should open and close the file so many times. It is better to open the file once and write to it several times.

with open('my_csv.csv', 'w') as f:
    for EachMonth in MonthsInAnalysis:
        TheCurrentMonth = pd.read_csv('MonthlyDataSplit/Day/Day%s.csv' % EachMonth)
        MeanDailyTemperaturesForCurrentMonth = TheCurrentMonth.groupby('Day')['AirTemperature'].mean().reset_index(name='MeanDailyAirTemperature')
        # write the frame computed above (the original referenced an undefined df)
        MeanDailyTemperaturesForCurrentMonth.to_csv(f, header=False)
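
A minimal sketch of the open-once pattern, with two additions that are not in the answer above: a Month column and a header written only for the first chunk. The monthly_frames dictionary and its values are hypothetical stand-ins for the monthly files, and an in-memory buffer replaces the real file so the sketch runs anywhere.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the monthly files: month number -> data frame.
monthly_frames = {
    1: pd.DataFrame({'Day': [1, 1, 2], 'AirTemperature': [18, 20, 23]}),
    2: pd.DataFrame({'Day': [1, 1], 'AirTemperature': [10, 14]}),
}

# Any writable text handle works; with a real file this could be
# open('my_csv.csv', 'w', newline='') instead of StringIO.
buffer = io.StringIO()
for position, (month, frame) in enumerate(monthly_frames.items()):
    means = (
        frame.groupby('Day')['AirTemperature']
             .mean()
             .reset_index(name='MeanDailyAirTemperature')
    )
    # Record which month each row belongs to.
    means.insert(0, 'Month', month)
    # Write the header only for the first chunk, and never the row index.
    means.to_csv(buffer, header=(position == 0), index=False)

print(buffer.getvalue())
```

Writing the header once keeps the combined output readable by pd.read_csv without extra arguments, and index=False avoids a meaningless 0, 1, 2 column restarting for every month.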

Answer by Shinto Joseph

Creating a blank CSV file is as simple as this:

import pandas as pd

pd.DataFrame({}).to_csv("filename.csv")
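
If the file should start with a header row instead of being blank, one possible variation is to write a data frame that has column names but no rows. This is only a sketch: the column names are assumed from the question's code, and the temporary path is there just to keep the example self-contained.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), 'my_csv.csv')

# Assumed column names, taken from the question's code.
columns = ['Month', 'Day', 'MeanDailyAirTemperature']

# A frame with column names but no rows writes just the header line.
pd.DataFrame(columns=columns).to_csv(path, index=False)

with open(path) as f:
    content = f.read()
print(repr(content))

# Reading it back gives an empty frame with the expected columns.
check = pd.read_csv(path)
```

Later appends with header=False then line up under this header.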

Answer by Chris

The problem is a little unclear, but assuming you have to iterate month by month and apply the groupby as stated, just use:

 #Before loops
 dflist=[]

Then in each loop do something like:

 dflist.append(MeanDailyTemperaturesForCurrentMonth)

Then at the end:

 # pass the list itself; wrapping it in another list makes concat fail
 final_df = pd.concat(dflist, axis=1)

and this will join everything into one dataframe.
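
A small sketch of the collect-then-concat pattern follows. It departs from the snippet above in one respect: it uses the default axis=0 so the months are stacked as rows, which seems closer to the output described in the question, whereas axis=1 would place them side by side. The january and february frames and the Month column are hypothetical.

```python
import pandas as pd

# Hypothetical per-month results standing in for MeanDailyTemperaturesForCurrentMonth.
january = pd.DataFrame({'Day': [1, 2], 'MeanDailyAirTemperature': [19.0, 23.0]})
february = pd.DataFrame({'Day': [1, 2], 'MeanDailyAirTemperature': [12.0, 15.0]})

# Before the loop
dflist = []

# In each iteration, keep the result instead of writing it out immediately.
for month, means in [(1, january), (2, february)]:
    dflist.append(means.assign(Month=month))

# At the end: pass the list itself; the default axis=0 stacks the rows.
final_df = pd.concat(dflist, ignore_index=True)

# Serialise once; pass a file path as the first argument to write to disk instead.
csv_text = final_df.to_csv(index=False)
print(csv_text)
```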

See:

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html

http://pandas.pydata.org/pandas-docs/stable/merging.html