Python: cannot perform reduce with flexible type
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/43442415/
cannot perform reduce with flexible type
Asked by chamscab
I have this dataset:
         Game1  Game2  Game3  Game4  Game5
Player1      2      6      5      2      2
Player2      6      4      1      8      4
Player3      8      3      2      1      5
Player4      4      9      4      7      9
I want to calculate the sum of the 5 games for every player.
This is my code:
import csv
f=open('Games','rb')
f=csv.reader(f,delimiter=';')
lst=list(f)
lst
import numpy as np
myarray = np.asarray(lst)
x=myarray[1,1:] #First player
y=np.sum(x)
I got the error "cannot perform reduce with flexible type". I'm very new to Python and I need your help.
Thank you
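A minimal sketch of what is probably happening in the code above, assuming the 'Games' file is semicolon-separated with an empty top-left cell (the file contents here are reconstructed from the table, not taken from the original file). csv.reader returns every cell as a string, so the resulting array has a string dtype, which NumPy counts among its "flexible" types; converting the numeric part to integers makes the sum work.

```python
import csv
import io

import numpy as np

# Assumption: the 'Games' file is semicolon-separated, with an empty top-left cell.
games_text = """;Game1;Game2;Game3;Game4;Game5
Player1;2;6;5;2;2
Player2;6;4;1;8;4
Player3;8;3;2;1;5
Player4;4;9;4;7;9
"""

rows = list(csv.reader(io.StringIO(games_text), delimiter=';'))
myarray = np.asarray(rows)

# csv.reader returns every cell as a string, so the array holds text, not numbers.
print(myarray.dtype.kind)                         # 'U' means unicode string
print(np.issubdtype(myarray.dtype, np.flexible))  # string dtypes count as "flexible"

# Drop the header row and the name column, then convert the rest to integers.
scores = myarray[1:, 1:].astype(int)
player_totals = scores.sum(axis=1)
print(player_totals)
```

Slicing before the conversion matters: the header row and the player names cannot be turned into integers, so only the numeric block is passed to astype.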
Accepted answer by MaxU
Consider using the Pandas module:
import pandas as pd
df = pd.read_csv('/path/to.file.csv', sep=';')
Resulting DataFrame:
In [196]: df
Out[196]:
         Game1  Game2  Game3  Game4  Game5
Player1      2      6      5      2      2
Player2      6      4      1      8      4
Player3      8      3      2      1      5
Player4      4      9      4      7      9
Sum:
In [197]: df.sum(axis=1)
Out[197]:
Player1 17
Player2 23
Player3 19
Player4 33
dtype: int64
In [198]: df.sum(1).values
Out[198]: array([17, 23, 19, 33], dtype=int64)
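For the player names to end up as the row labels, as in the DataFrame shown above, the first column has to be used as the index. The sketch below assumes a semicolon-separated layout with an empty top-left cell and reads it from an in-memory string instead of a file; the index_col=0 argument is an addition for illustration and is not part of the original answer.

```python
import io

import pandas as pd

# Assumption: semicolon-separated data whose first column holds the player names.
csv_text = """;Game1;Game2;Game3;Game4;Game5
Player1;2;6;5;2;2
Player2;6;4;1;8;4
Player3;8;3;2;1;5
Player4;4;9;4;7;9
"""

# index_col=0 turns the first column into the row labels instead of a data column.
df = pd.read_csv(io.StringIO(csv_text), sep=';', index_col=0)

totals = df.sum(axis=1)  # one total per player (row-wise sum)
print(totals)
```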
Answer by NaN
You can still use a structured array as long as you familiarize yourself with the dtypes. Since your data set is extremely small, the following may serve as an example of using numpy in conjunction with list comprehensions when your dtype is uniform but named:
import numpy as np

dt = [('Game1', '<i4'), ('Game2', '<i4'), ('Game3', '<i4'),
      ('Game4', '<i4'), ('Game5', '<i4')]
a = np.array([(2, 6, 5, 2, 2),
              (6, 4, 1, 8, 4),
              (8, 3, 2, 1, 5),
              (4, 9, 4, 7, 9)], dtype=dt)
nms = a.dtype.names
by_col = [(i, a[i].sum()) for i in nms if a[i].dtype.kind in ('i', 'f')]
by_col
[('Game1', 20), ('Game2', 22), ('Game3', 12), ('Game4', 18), ('Game5', 20)]
by_row = [("player {}".format(i), sum(a[i])) for i in range(a.shape[0])]
by_row
[('player 0', 17), ('player 1', 23), ('player 2', 19), ('player 3', 33)]
In this example, it would be a real pain to get each sum individually for each column name. That is where the ... a[i] for i in nms bit is useful, since the list of names was retrieved with nms = a.dtype.names. Because you are computing a sum, you want to restrict the summation to integer and float types only, hence the a[i].dtype.kind check.
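The dtype.kind filter only starts to matter once the fields differ in type. As a small sketch, the hypothetical variant below adds a text field named 'Player' (not present in the answer's array) next to two integer fields and shows that the filter skips it.

```python
import numpy as np

# Hypothetical variant of the data: a text field 'Player' next to two integer fields.
dt = [('Player', 'U10'), ('Game1', '<i4'), ('Game2', '<i4')]
a = np.array([('Player1', 2, 6),
              ('Player2', 6, 4),
              ('Player3', 8, 3),
              ('Player4', 4, 9)], dtype=dt)

# Only fields whose kind is integer ('i') or float ('f') are summed;
# the 'Player' field has kind 'U' (unicode text) and is skipped.
by_col = [(name, int(a[name].sum())) for name in a.dtype.names
          if a[name].dtype.kind in ('i', 'f')]
print(by_col)
```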
Summing by row is just as easy, but you will notice that I didn't use the method syntax; I used a slightly different one to avoid the error message:
a[0].sum() # massive failure
....snip out huge error stuff...
TypeError: cannot perform reduce with flexible type
# whereas, this works....
sum(a[0]) # use list/tuple summation
Perhaps 'flexible' data types don't live up to their name. You can still work with structured arrays and recarrays if that is the form your data arrives in. You can become adept at reformatting your data by slicing and altering dtypes to suit your purpose. For example, since your fields all have the same data type and you don't have a monstrous dataset, you can use many methods to convert to a simple, unstructured array.
b = np.array([list(a[i]) for i in range(a.shape[0])])
b
array([[2, 6, 5, 2, 2],
[6, 4, 1, 8, 4],
[8, 3, 2, 1, 5],
[4, 9, 4, 7, 9]])
b.sum(axis=0)
array([20, 22, 12, 18, 20])
b.sum(axis=1)
array([17, 23, 19, 33])
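Another way to do the same conversion, not mentioned in the answer, is the structured_to_unstructured helper from numpy.lib.recfunctions (available in NumPy 1.16 and later). A brief sketch using the same array as above:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = [('Game1', '<i4'), ('Game2', '<i4'), ('Game3', '<i4'),
      ('Game4', '<i4'), ('Game5', '<i4')]
a = np.array([(2, 6, 5, 2, 2),
              (6, 4, 1, 8, 4),
              (8, 3, 2, 1, 5),
              (4, 9, 4, 7, 9)], dtype=dt)

# Turn the named fields into an ordinary 2-D array: one row per player, one column per game.
b = rfn.structured_to_unstructured(a)

col_totals = b.sum(axis=0)  # totals per game
row_totals = b.sum(axis=1)  # totals per player
print(col_totals, row_totals)
```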
So you have many options when dealing with structured arrays. Depending on whether you need to work in pure Python, numpy, pandas or a hybrid, you should familiarize yourself with all of them.
ADDENDUM
As a shortcut, I failed to mention taking 'views' of arrays that are structured in nature but whose fields all share the same dtype. In the case above, a simple way to get an array suited to row or column calculations is as follows. Earlier a copy of the array was made, but that is not necessary:
b = a.view(np.int32).reshape(len(a), -1)
b
array([[2, 6, 5, 2, 2],
[6, 4, 1, 8, 4],
[8, 3, 2, 1, 5],
[4, 9, 4, 7, 9]])
b.dtype
dtype('int32')
b.sum(axis=0)
array([20, 22, 12, 18, 20])
b.sum(axis=1)
array([17, 23, 19, 33])
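To illustrate that the view really shares its data with the structured array rather than copying it, here is a short sketch (the value 100 is arbitrary and chosen only for the demonstration):

```python
import numpy as np

dt = [('Game1', '<i4'), ('Game2', '<i4'), ('Game3', '<i4'),
      ('Game4', '<i4'), ('Game5', '<i4')]
a = np.array([(2, 6, 5, 2, 2),
              (6, 4, 1, 8, 4),
              (8, 3, 2, 1, 5),
              (4, 9, 4, 7, 9)], dtype=dt)

# Reinterpret the same memory as plain int32 values, one row per record.
b = a.view(np.int32).reshape(len(a), -1)

# Writing through the view changes the structured array too: no copy was made.
b[0, 0] = 100
print(a['Game1'][0])
```

The flip side is that a view is not a safe scratch copy; call b.copy() first if the original data must stay untouched.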
Answer by Sigve Karolius
The complication with using numpy is that one has two sources of error (and documentation to read), namely Python itself as well as numpy.
I believe your problem here is that you are working with a so-called structured (numpy) array.
Consider the following example:
>>> import numpy as np
>>> a = np.array([(1,2), (4,5)], dtype=[('Game 1', '<f8'), ('Game 2', '<f8')])
>>> a.sum()
TypeError: cannot perform reduce with flexible type
Now, I first select the data I want to use:
>>> import numpy as np
>>> a = np.array([(1,2), (4,5)], dtype=[('Game 1', '<f8'), ('Game 2', '<f8')])
>>> a["Game 1"].sum()
5.0
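The same field-by-field selection extends to all fields at once. The sketch below, which goes slightly beyond the answer's example, computes a total per field and a total per record by looping over a.dtype.names:

```python
import numpy as np

a = np.array([(1, 2), (4, 5)], dtype=[('Game 1', '<f8'), ('Game 2', '<f8')])

# One total per field (column): select each field by name, then sum it.
col_totals = {name: float(a[name].sum()) for name in a.dtype.names}

# One total per record (row): add the field arrays together element-wise.
row_totals = sum(a[name] for name in a.dtype.names)

print(col_totals)
print(row_totals)
```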
Which is what I wanted.
You might consider using pandas (a Python library), or switching to R.
Personal opinions
Even though numpy certainly is a mighty library, I still avoid using it for data science and other "activities" where the program is designed around "flexible" data types. Personally, I use numpy when I need something to be fast and maintainable (it is easy to write "code for the future") but do not have the time to write a C program.
As far as Pandas goes, it is convenient for us "Python hackers" because it is "R data structures implemented in Python", whereas R is (obviously) an entirely new language. I personally use R, as I consider Pandas to be under rapid development, which makes it difficult to write "code with the future in mind".
As suggested in a comment (by @jorijnsmit, I believe), there is no need to introduce large dependencies, such as pandas, for "simple" cases. The minimalistic example below, which is compatible with both Python 2 and 3, uses "typical" Python tricks to massage the data in the question.
import csv

## Data-file
data = \
'''
, Game1, Game2, Game3, Game4, Game5
Player1, 2, 6, 5, 2, 2
Player2, 6, 4 , 1, 8, 4
Player3, 8, 3 , 2, 1, 5
Player4, 4, 9 , 4, 7, 9
'''

# Write data to file
with open('data.csv', 'w') as FILE:
    FILE.write(data)

print("Raw data:")
print(data)

# 1) Read the data-file (and strip away spaces), the result is a list of rows.
#    Text mode ('r') is used so that csv.reader also works in Python 3.
with open('data.csv', 'r') as FILE:
    raw = [[item.strip() for item in line]
           for line in csv.reader(FILE, delimiter=',') if line]

print("Data after Read:")
print(raw)

# 2) Convert numerical data to integers ("float" would also work)
for (i, line) in enumerate(raw[1:], 1):
    for (j, item) in enumerate(line[1:], 1):
        raw[i][j] = int(item)

print("Data after conversion:")
print(raw)

# 3) Use the data...
print("Use the data")
for i in range(1, len(raw)):
    print("Sum for Player %d: %d" % (i, sum(raw[i][1:])))

# Transpose the rows to get the columns; loop over every game column
# (the number of columns, not the number of rows, sets the range).
columns = list(zip(*raw))
for j in range(1, len(columns)):
    print("Total points in Game %d: %d" % (j, sum(columns[j][1:])))

The output would be:

Raw data:
, Game1, Game2, Game3, Game4, Game5
Player1, 2, 6, 5, 2, 2
Player2, 6, 4 , 1, 8, 4
Player3, 8, 3 , 2, 1, 5
Player4, 4, 9 , 4, 7, 9
Data after Read:
[['', 'Game1', 'Game2', 'Game3', 'Game4', 'Game5'], ['Player1', '2', '6', '5', '2', '2'], ['Player2', '6', '4', '1', '8', '4'], ['Player3', '8', '3', '2', '1', '5'], ['Player4', '4', '9', '4', '7', '9']]
Data after conversion:
[['', 'Game1', 'Game2', 'Game3', 'Game4', 'Game5'], ['Player1', 2, 6, 5, 2, 2], ['Player2', 6, 4, 1, 8, 4], ['Player3', 8, 3, 2, 1, 5], ['Player4', 4, 9, 4, 7, 9]]
Use the data
Sum for Player 1: 17
Sum for Player 2: 23
Sum for Player 3: 19
Sum for Player 4: 33
Total points in Game 1: 20
Total points in Game 2: 22
Total points in Game 3: 12
Total points in Game 4: 18
Total points in Game 5: 20
Answer by Anthony Perot
You don't need numpy at all, just do this:
import csv
from collections import OrderedDict

with open('games') as f:
    reader = csv.reader(f, delimiter=';')
    data = list(reader)

sums = OrderedDict()
for row in data[1:]:
    player, games = row[0], row[1:]
    sums[player] = sum(map(int, games))
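The same pure-Python approach can also give the totals per game. The sketch below reads from an in-memory string so it runs without a file; the semicolon-separated contents are an assumption reconstructed from the table in the question.

```python
import csv
import io
from collections import OrderedDict

# Assumption: semicolon-separated data with an empty top-left cell.
games_text = """;Game1;Game2;Game3;Game4;Game5
Player1;2;6;5;2;2
Player2;6;4;1;8;4
Player3;8;3;2;1;5
Player4;4;9;4;7;9
"""

data = list(csv.reader(io.StringIO(games_text), delimiter=';'))
header, rows = data[0], data[1:]

# Total per player: convert every score in the row to int and add them up.
player_sums = OrderedDict((row[0], sum(map(int, row[1:]))) for row in rows)

# zip(*rows) transposes the rows; the first column holds the player names, so skip it.
columns = list(zip(*rows))[1:]
game_sums = OrderedDict((name, sum(map(int, col)))
                        for name, col in zip(header[1:], columns))

print(player_sums)
print(game_sums)
```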