Python - Calculate average for every column in a csv file
Original source: http://stackoverflow.com/questions/25597477/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Pabloo LR
I'm new to Python and I'm trying to get the average of every column (or row) of a csv file, and then select the values that are higher than double the average of their column (or row). My file has hundreds of columns and contains float values like these:
845.123,452.234,653.23,...
432.123,213.452,421.532,...
743.234,532,432.423,...
I've tried several changes to my code to get the average for every column (separately), but at the moment my code looks like this:
def AverageColumn (c):
    f=open(csv,"r")
    average=0
    Sum=0
    column=len(f)
    for i in range(0,column):
        for n in i.split(','):
            n=float(n)
            Sum += n
        average = Sum / len(column)
    return 'The average is:', average
    f.close()

csv="MDT25.csv"
print AverageColumn(csv)
But I always get an error like "f has no len()" or "'int' object is not iterable"...
I'd really appreciate it if someone could show me how to get the average for every column (or row, whichever you prefer), and then select the values that are higher than double the average of their column (or row). I'd rather not import modules such as csv, but I'm open to whatever you prefer. Thanks!
Accepted answer by monkut
Here's a cleaned-up version of your function, but it probably doesn't do what you want. Currently, it computes the average of all values in all columns:
def average_column(csv):
    f = open(csv, "r")
    Sum = 0
    value_count = 0
    for row in f:
        for column in row.split(','):
            n = float(column)
            Sum += n
            value_count += 1
    # divide by the number of values read, not by len(column),
    # which would be the length of the last string parsed
    average = Sum / value_count
    f.close()
    return 'The average is:', average
I would use the csv module (which makes csv parsing easier), with a Counter object to manage the column totals and a context manager to open the file (no need for a close()):
import csv
from collections import Counter

def average_column(csv_filepath):
    column_totals = Counter()
    # newline="" is what the csv module expects for files opened in Python 3
    with open(csv_filepath, newline="") as f:
        reader = csv.reader(f)
        row_count = 0
        for row in reader:
            for column_idx, column_value in enumerate(row):
                try:
                    n = float(column_value)
                    column_totals[column_idx] += n
                except ValueError:
                    print("Error -- ({}) Column({}) could not be converted to float!".format(column_value, column_idx))
            # count each row once, after all of its columns are processed
            row_count += 1
    # make sure column index keys are in order
    column_indexes = sorted(column_totals)
    # calculate per column averages using a list comprehension
    averages = [column_totals[idx] / row_count for idx in column_indexes]
    return averages
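The question also asks for the values that exceed twice their column's average, which the function above does not cover. Here is a minimal sketch of that second step, assuming every cell parses as a float and all rows have the same number of columns. The name `values_above_double_average` and the `(row, column, value)` result shape are illustrative choices, not part of the original answer.

```python
import csv


def values_above_double_average(lines):
    """Return (row_idx, column_idx, value) for cells > 2 * their column average.

    `lines` is any iterable of CSV text lines, for example an open file object.
    Assumes numeric cells and equal-length rows (assumptions of this sketch).
    """
    rows = [[float(cell) for cell in row] for row in csv.reader(lines) if row]
    if not rows:
        return []
    # zip(*rows) regroups the row-major data by column
    averages = [sum(column) / len(rows) for column in zip(*rows)]
    return [
        (row_idx, column_idx, value)
        for row_idx, row in enumerate(rows)
        for column_idx, value in enumerate(row)
        if value > 2 * averages[column_idx]
    ]


# Possible usage with the file name from the question:
# with open("MDT25.csv", newline="") as f:
#     hits = values_above_double_average(f)
```

Holding all rows in memory keeps the sketch short; for very large files a two-pass approach (averages first, then selection) would avoid that.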
Answer by Adam Smith
If you want to do it without stdlib modules for some reason:
with open('path/to/csv') as infile:
    columns = list(map(float, next(infile).split(',')))
    for how_many_entries, line in enumerate(infile, start=2):
        for (idx, running_avg), new_data in zip(enumerate(columns), line.split(',')):
            columns[idx] += (float(new_data) - running_avg) / how_many_entries
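The loop above keeps a running mean per column instead of a running sum: after the n-th row, each average is updated as avg += (x - avg) / n, so only one row needs to be in memory at a time. Below is a small sketch of the same update rule applied to in-memory lines, assuming no header row and the same number of numeric fields on every line. `running_column_means` is a name made up for this illustration.

```python
def running_column_means(lines):
    """Incrementally compute per-column means from an iterable of CSV lines."""
    means = []
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        count += 1
        values = [float(field) for field in line.split(",")]
        if count == 1:
            means = values  # the mean of a single value is the value itself
            continue
        for idx, value in enumerate(values):
            # new_mean = old_mean + (value - old_mean) / count
            means[idx] += (value - means[idx]) / count
    return means
```

Because a file object is itself an iterable of lines, the same function could be called as `running_column_means(infile)` inside a `with open(...)` block.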
Answer by Amadan
First of all, as people say, the CSV format looks simple, but it can be quite nontrivial, especially once strings come into play. monkut already gave you two solutions: the cleaned-up version of your code, and one that uses the csv library. I'll give yet another option: no libraries, but plenty of idiomatic code to chew on, which gives you averages for all columns at once.
def get_averages(csv):
    with open(csv) as file:
        lines = file.readlines()
        rows_of_numbers = [map(float, line.split(',')) for line in lines]
        sums = map(sum, zip(*rows_of_numbers))
        averages = [sum_item / len(lines) for sum_item in sums]
        return averages
Things to note: In your code, f is a file object. You try to close it after you have already returned the value. That code will never be reached: nothing executes after a return has been processed, unless you have a try...finally construct or a with construct (like the one I am using, which will automatically close the stream).
map(f, l), or the equivalent [f(x) for x in l], creates a new list whose elements are obtained by applying function f to each element of l. (In Python 3, map returns a lazy iterator instead of a list; the code above works either way.)
f(*l) will "unpack" the list l before function invocation, passing each element to function f as a separate argument.
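To make these notes concrete, here is a short sketch of how `zip(*rows)` turns rows into columns, and of how a `with` block still closes its stream when the function returns from inside it. The sample numbers are the first three values of two rows from the question. `read_first_line` is a hypothetical helper, and `io.StringIO` stands in for a real file so the example needs no file on disk.

```python
import io

rows = [
    [845.123, 452.234, 653.23],
    [743.234, 532.0, 432.423],
]

# zip(*rows) is the same call as zip(rows[0], rows[1]): it pairs the first
# elements, then the second elements, and so on, which yields the columns.
columns = list(zip(*rows))

# map(sum, columns) applies sum to each column; list() forces evaluation,
# because map is lazy in Python 3.
sums = list(map(sum, columns))
averages = [total / len(rows) for total in sums]


def read_first_line(text):
    """Return the first line plus the stream, to show the stream gets closed."""
    with io.StringIO(text) as stream:
        # Returning from inside the with block still closes the stream.
        return stream.readline(), stream


first, stream = read_first_line("845.123,452.234\n743.234,532\n")
print(averages)
print(first.strip(), stream.closed)
```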
Answer by Code-Apprentice
I suggest breaking this into several smaller steps:
- Read the CSV file into a 2D list or 2D array.
- Calculate the averages of each column.
These two steps can be implemented as two separate functions. (In a realistic situation where the CSV file is large, reading the complete file into memory might be prohibitive due to space constraints. However, for a learning exercise, this is a great way to gain an understanding of writing your own functions.)
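A minimal sketch of that two-function split, assuming a comma-separated file with no header row and only numeric cells. The names `read_csv_rows` and `column_averages` are hypothetical, chosen just for this example.

```python
def read_csv_rows(path):
    """Step 1: read the file into a 2D list of floats (one inner list per row)."""
    rows = []
    with open(path) as infile:
        for line in infile:
            line = line.strip()
            if line:  # ignore blank lines
                rows.append([float(field) for field in line.split(",")])
    return rows


def column_averages(rows):
    """Step 2: average each column of a 2D list; assumes equal-length rows."""
    if not rows:
        return []
    totals = [0.0] * len(rows[0])
    for row in rows:
        for idx, value in enumerate(row):
            totals[idx] += value
    return [total / len(rows) for total in totals]


# Possible usage with the file name from the question:
# averages = column_averages(read_csv_rows("MDT25.csv"))
```

Keeping the parsing and the arithmetic apart means `column_averages` can be tried on a hand-written list before any file is involved.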
Answer by John Wilkins
I hope this helps. Here is what I would do, which is to use numpy:
import csv
import numpy as np

# Assume a header row followed by two columns:
# (1) question 1; (2) question 2

# 'filename.csv' is the file that holds your original data
with open('filename.csv', 'r') as f:
    data = list(csv.reader(f))

header = data.pop(0)  # strip the header row

q1 = []
q2 = []
for row in data:
    q1.append(int(row[0]))  # column indexes are zero-based
    q2.append(int(row[1]))

# === Means/Variance - Work-up Section ===
print('Mean - Question-1: ', np.mean(q1))
print('Variance,Question-1: ', np.var(q1))
print('==============================================')
print('Mean - Question-2: ', np.mean(q2))
print('Variance,Question-2: ', np.var(q2))
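If installing numpy is not an option, the standard library's `statistics` module can produce the same two numbers. This is a hedged sketch, assuming the column has already been read into a list; `summarize` is a hypothetical helper and the sample values are made up. `statistics.pvariance` is used because `np.var` computes the population variance by default (`ddof=0`).

```python
import statistics


def summarize(values):
    """Return (mean, population variance) of a non-empty list of numbers."""
    return statistics.mean(values), statistics.pvariance(values)


# Made-up sample data standing in for one column of the csv file
mean, variance = summarize([2, 4, 4, 4, 5, 5, 7, 9])
print("Mean:", mean)
print("Variance:", variance)
```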
Answer by Sahana M
This definitely worked for me!
import numpy as np
import csv

# raw string, so the backslashes in the Windows path are not read as escapes
with open(r'C:\...\your_file_name.csv', 'r') as f:
    data = list(csv.reader(f))

# in case you have a header/title in the first row of your csv file, run the next line; otherwise skip it
data.pop(0)

# your_column_number is a placeholder: replace it with the zero-based index of your column
q1 = []
for row in data:
    q1.append(int(row[your_column_number]))
print('Mean of your_column_number : ', np.mean(q1))

