pandas: Create an empty csv file with pandas
Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/35916378/
Create empty csv file with pandas
Asked by PaulBarr
I am iterating through a number of csv files and want to append the mean temperatures to a blank csv file. How do you create an empty csv file with pandas?
for EachMonth in MonthsInAnalysis:
    TheCurrentMonth = pd.read_csv('MonthlyDataSplit/Day/Day%s.csv' % EachMonth)
    MeanDailyTemperaturesForCurrentMonth = TheCurrentMonth.groupby('Day')['AirTemperature'].mean().reset_index(name='MeanDailyAirTemperature')
    with open('my_csv.csv', 'a') as f:
        df.to_csv(f, header=False)
So in the above code, how do I create my_csv.csv prior to the for loop?
Just a note: I know you can create a data frame and then save it to csv, but I am interested in whether this step can be skipped.
For context, I have the following csv files:
Each of them has the following structure:
The Day column runs up to 30 days in each file.
I would like to output a csv file that looks like this:
But that obviously includes all the days for all the months.
My issue is that I don't know in advance which months are included in each analysis. Hence I want a for loop over a list holding that information to access the relevant csv files, calculate the mean temperature, and then save it all into one csv.
Input as text:
    Unnamed: 0  AirTemperature  AirHumidity  SoilTemperature  SoilMoisture  LightIntensity  WindSpeed  Year  Month  Day  Hour  Minute  Second  TimeStamp  MonthCategorical  TimeOfDay
 6           6              18           84               17            41              40          4  2016      1    1     6       1       1      10106           January        Day
 7           7              20           88               22            92              31          0  2016      1    1     7       1       1      10107           January        Day
 8           8              23            1               22            59               3          0  2016      1    1     8       1       1      10108           January        Day
 9           9              23            3               22            72              41          4  2016      1    1     9       1       1      10109           January        Day
10          10              24           63               23            83              85          0  2016      1    1    10       1       1      10110           January        Day
11          11              29           73               27            50               1          4  2016      1    1    11       1       1      10111           January        Day
Accepted answer by MaxU
I would do it this way: first read all your CSV files (but only the columns that you really need) into one DF, then apply groupby(['Year','Month','Day']).mean() and save the resulting DF to a CSV file:
import glob
import pandas as pd
fmask = 'MonthlyDataSplit/Day/Day*.csv'
df = pd.concat((pd.read_csv(f, sep=',', usecols=['Year','Month','Day','AirTemperature']) for f in glob.glob(fmask)))
df.groupby(['Year','Month','Day']).mean().to_csv('my_csv.csv')
And if you want to ignore the year:
import glob
import pandas as pd
fmask = 'MonthlyDataSplit/Day/Day*.csv'
df = pd.concat((pd.read_csv(f, sep=',', usecols=['Month','Day','AirTemperature']) for f in glob.glob(fmask)))
df.groupby(['Month','Day']).mean().to_csv('my_csv.csv')
Some details:
(pd.read_csv(f, sep=',', usecols=['Month','Day','AirTemperature']) for f in glob.glob('*.csv'))
is a generator expression that yields one data frame per CSV file (it is lazy, not a tuple)
pd.concat(...)
will concatenate them into a single resulting DF
df.groupby(['Year','Month','Day']).mean()
will produce the wanted report as a data frame, which can then be saved to a new CSV file:
.to_csv('my_csv.csv')
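The steps above can be tried end to end on throwaway data. This is only a sketch: the temporary directory, the two sample months and their temperature values are made up for illustration, and only the column names and the Day*.csv naming come from the question. Note that to_csv creates the output file itself, so no empty file has to exist beforehand.

```python
import glob
import os
import tempfile

import pandas as pd

# Build two tiny monthly files in a temporary directory (made-up data).
tmpdir = tempfile.mkdtemp()
samples = {
    1: pd.DataFrame({"Year": 2016, "Month": 1, "Day": [1, 1, 2], "AirTemperature": [18, 20, 23]}),
    2: pd.DataFrame({"Year": 2016, "Month": 2, "Day": [1, 1, 2], "AirTemperature": [10, 14, 9]}),
}
for month, frame in samples.items():
    frame.to_csv(os.path.join(tmpdir, "Day%s.csv" % month), index=False)

# Read only the needed columns from every matching file and stack them.
fmask = os.path.join(tmpdir, "Day*.csv")
frames = (
    pd.read_csv(f, usecols=["Year", "Month", "Day", "AirTemperature"])
    for f in sorted(glob.glob(fmask))
)
df = pd.concat(frames, ignore_index=True)

# One mean per (Year, Month, Day); to_csv creates the output file itself.
report = df.groupby(["Year", "Month", "Day"]).mean()
out_path = os.path.join(tmpdir, "my_csv.csv")
report.to_csv(out_path)
print(report)
```

The grouping keys end up as the index of report, which is why to_csv is called without index=False here: dropping the index would drop Year, Month and Day from the file.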
Answer by Stop harming Monica
Just open the file in write mode to create it.
with open('my_csv.csv', 'w'):
    pass
Anyway, I do not think you should be opening and closing the file so many times. It is better to open the file once and write to it several times.
with open('my_csv.csv', 'w') as f:
    for EachMonth in MonthsInAnalysis:
        TheCurrentMonth = pd.read_csv('MonthlyDataSplit/Day/Day%s.csv' % EachMonth)
        MeanDailyTemperaturesForCurrentMonth = TheCurrentMonth.groupby('Day')['AirTemperature'].mean().reset_index(name='MeanDailyAirTemperature')
        # Write the frame computed above (the original snippet referred to an undefined df)
        MeanDailyTemperaturesForCurrentMonth.to_csv(f, header=False)
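A variation on the same idea, as a rough sketch: open the file once and write the header only with the first chunk, so the result can be read back with named columns. The per-month frames and the temporary output path below are made-up stand-ins for the groupby results in the loop. The newline="" argument follows the pandas recommendation for file objects passed to to_csv.

```python
import os
import tempfile

import pandas as pd

# Made-up per-month results standing in for the groupby output in the loop.
monthly_means = [
    pd.DataFrame({"Day": [1, 2], "MeanDailyAirTemperature": [19.0, 23.0]}),
    pd.DataFrame({"Day": [1, 2], "MeanDailyAirTemperature": [12.0, 9.0]}),
]

out_path = os.path.join(tempfile.mkdtemp(), "my_csv.csv")

# Opening in 'w' mode creates (or truncates) the file, so no separate
# "create an empty csv" step is needed before the loop.
with open(out_path, "w", newline="") as f:
    for i, month_df in enumerate(monthly_means):
        # Write the header only for the first chunk.
        month_df.to_csv(f, header=(i == 0), index=False)

combined = pd.read_csv(out_path)
print(combined)
```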
Answer by Shinto Joseph
Creating a blank csv file is as simple as this:
import pandas as pd
pd.DataFrame({}).to_csv("filename.csv")
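Two possible alternatives, sketched under assumptions: a file written from an empty DataFrame may not be strictly zero bytes, since pandas can still emit a line for the index header. If a zero-byte file is wanted, pathlib can create it without pandas. If a header row is wanted, a DataFrame with columns but no rows does that. The file names and column names below are hypothetical.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd

tmpdir = Path(tempfile.mkdtemp())

# Option 1: a truly empty (zero-byte) file, no pandas involved.
empty_path = tmpdir / "empty.csv"
empty_path.touch()

# Option 2: a file holding only a header row, via a DataFrame with columns but no rows.
header_path = tmpdir / "header_only.csv"
columns = ["Day", "MeanDailyAirTemperature"]  # hypothetical column names
pd.DataFrame(columns=columns).to_csv(header_path, index=False)

# Later rows can be appended without repeating the header.
rows = pd.DataFrame({"Day": [1, 2], "MeanDailyAirTemperature": [19.0, 23.0]})
rows.to_csv(header_path, mode="a", header=False, index=False)

result = pd.read_csv(header_path)
print(os.path.getsize(empty_path))
print(result)
```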
Answer by Chris
The problem is a little unclear, but assuming you have to iterate month by month and apply the groupby as stated, just use:
# Before the loop
dflist = []
Then in each iteration of the loop do something like:
dflist.append(MeanDailyTemperaturesForCurrentMonth)
Then at the end:
final_df = pd.concat(dflist, axis=1)  # pass the list itself; pd.concat([dflist]) raises a TypeError
and this will join everything into one dataframe.
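Put together, the collect-then-concat pattern might look like the sketch below. The input frames are made up, and two things differ from the snippet above on purpose: the frames are stacked as rows (axis=0, the default) rather than side by side with axis=1, on the assumption that one row per month and day is wanted, and a Month column is added so the rows stay distinguishable.

```python
import pandas as pd

# Made-up per-month inputs; in the question these come from pd.read_csv inside the loop.
monthly_inputs = {
    1: pd.DataFrame({"Day": [1, 1, 2], "AirTemperature": [18, 20, 23]}),
    2: pd.DataFrame({"Day": [1, 1, 2], "AirTemperature": [10, 14, 9]}),
}

dflist = []
for month, month_df in monthly_inputs.items():
    means = (
        month_df.groupby("Day")["AirTemperature"]
        .mean()
        .reset_index(name="MeanDailyAirTemperature")
    )
    means.insert(0, "Month", month)  # keep track of which month each row belongs to
    dflist.append(means)

# axis=0 (the default) stacks the monthly frames as rows.
final_df = pd.concat(dflist, ignore_index=True)
print(final_df)
```

A single final_df.to_csv('my_csv.csv', index=False) at the end then writes everything in one go, so the output file never has to be created or appended to inside the loop.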
See:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html