Setting values in pandas Series is slow, why?
Note: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/30267338/
Asked by jkokorian
Question
Does anyone know why setting an item directly on a pandas Series is so incredibly slow? Am I doing something wrong, or is that just the way it is?
I ran a couple of tests to find the fastest way to set a value on a pandas Series object. Here are the results, ordered from fastest to slowest (all snippets assume numpy is imported as np and pandas as pd):
initialize array, set using integer index, create series
%%timeit
a = np.empty(1000, dtype='float')
for i in range(len(a)):
    a[i] = 1.0
s = pd.Series(data=a)
1000 loops, best of 3: 630 μs per loop
create empty list, add item using append, create series
%%timeit
l = []
for i in range(1000):
    l.append(1.0)
s = pd.Series(data=l)
1000 loops, best of 3: 1.05 ms per loop
initialize array, create series, set using set_value
%%timeit
a = np.empty(1000, dtype='float')
s = pd.Series(data=a)
for i in range(len(a)):
    s.set_value(i, 1.0)
100 loops, best of 3: 18.5 ms per loop
initialize array, create series, set using integer index
%%timeit
a = np.empty(1000, dtype='float')
s = pd.Series(data=a)
for i in range(len(a)):
    s[i] = 1.0
10 loops, best of 3: 30.2 ms per loop
initialize array, create series, set using iat
%%timeit
a = np.empty(1000, dtype='float')
s = pd.Series(data=a)
for i in range(len(a)):
    s.iat[i] = 1.0
10 loops, best of 3: 36.2 ms per loop
initialize array, create series, set using iloc
%%timeit
a = np.empty(1000, dtype='float')
s = pd.Series(data=a)
for i in range(len(a)):
    s.iloc[i] = 1.0
1 loops, best of 3: 280 ms per loop
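For readers without IPython, the comparison above can be approximated with the standard timeit module. This is a minimal sketch, not the original benchmark: the function names and the best_time helper are made up for illustration, set_value is left out because it was deprecated and later removed from pandas (in 1.0), and absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

N = 1000  # same length as in the tests above


def fill_array_then_wrap():
    # Set values on the raw ndarray, then build the Series once at the end.
    a = np.empty(N, dtype="float")
    for i in range(len(a)):
        a[i] = 1.0
    return pd.Series(data=a)


def fill_with_iat():
    # Set values one by one through the positional scalar accessor.
    s = pd.Series(data=np.empty(N, dtype="float"))
    for i in range(len(s)):
        s.iat[i] = 1.0
    return s


def fill_with_iloc():
    # Set values one by one through the general positional indexer.
    s = pd.Series(data=np.empty(N, dtype="float"))
    for i in range(len(s)):
        s.iloc[i] = 1.0
    return s


def best_time(func, repeat=3, number=5):
    # Hypothetical helper: best average seconds per call over `repeat` runs.
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


for func in (fill_array_then_wrap, fill_with_iat, fill_with_iloc):
    print(f"{func.__name__}: {best_time(func) * 1e3:.3f} ms per call")
```

All three functions produce the same Series; only the cost of each individual assignment differs.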
Accepted answer by EdChum
From the docs:
Since indexing with [] must handle a lot of cases (single-label access, slicing, boolean indexing, etc.), it has a bit of overhead in order to figure out what you're asking for.
So I get the following, which should be comparable:
In [13]:
%%timeit
a = np.empty(1000,dtype='float')
s = pd.Series(data=a)
for i in range(len(a)):
    s.iat[i] = 1.0
10 loops, best of 3: 23.3 ms per loop
In [14]:
%%timeit
a = np.empty(1000,dtype='float')
s = pd.Series(data=a)
for i in range(len(a)):
    s.iloc[i] = 1.0
10 loops, best of 3: 159 ms per loop
For the other tests:
In [15]:
%%timeit
l = []
for i in range(1000):
    l.append(1.0)
s = pd.Series(data=l)
1000 loops, best of 3: 525 μs per loop
In [16]:
%%timeit
a = np.empty(1000,dtype='float')
s = pd.Series(data=a)
for i in range(len(a)):
    s.set_value(i,1.0)
100 loops, best of 3: 10.1 ms per loop
In [17]:
%%timeit
a = np.empty(1000,dtype='float')
s = pd.Series(data=a)
for i in range(len(a)):
    s[i] = 1.0
100 loops, best of 3: 17.5 ms per loop
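To make the docs passage quoted in this answer concrete, here is a small sketch of the different kinds of keys that [] has to tell apart, next to the scalar accessors at and iat, which the pandas indexing documentation recommends for single-value access. The example Series and its labels are invented for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5, dtype="float"), index=list("abcde"))

# [] has to inspect the key to decide which of these was meant:
single = s["b"]                 # single-label access
window = s["b":"d"].tolist()    # label slice (both endpoints included)
masked = s[s > 2.0].tolist()    # boolean indexing

# The scalar accessors take exactly one label (at) or one position (iat),
# so they can skip that dispatch step.
s.at["b"] = 10.0
s.iat[2] = 20.0

print(single, window, masked, s.tolist())
```

Because at and iat only ever handle one scalar, they avoid most of the case analysis that [] and iloc perform on every call.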
Answer by Alexander
I think these methods are even faster for initializing a series to a constant value:
Baseline
%%timeit
a = np.empty(1000, dtype='float')
for i in range(len(a)):
    a[i] = 1.0
s = pd.Series(data=a)
10000 loops, best of 3: 121 μs per loop
Alternatives
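%%timeit
# note: np.empty leaves the values uninitialized, so this series is not guaranteed to hold a constant
s = pd.Series(np.empty(1000, dtype='float')) * 1.
10000 loops, best of 3: 99.5 μs per loop
%%timeit
constant = 5.
s = pd.Series(np.ones(1000)) * constant
10000 loops, best of 3: 85.3 μs per loop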
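As a variation on the alternatives above (not part of the original answer, and not timed here), the constant can also be written directly without a multiplication step. A minimal sketch using np.full and the scalar form of the Series constructor:

```python
import numpy as np
import pandas as pd

constant = 5.0

# np.full allocates the array and fills it with the constant in one step.
s_full = pd.Series(np.full(1000, constant))

# Passing a scalar together with an index broadcasts it to every row.
s_scalar = pd.Series(constant, index=range(1000))

print(s_full.head(3).tolist(), s_scalar.head(3).tolist())
```

Unlike the np.empty variant, both forms guarantee that every element equals the constant.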
Answer by jkokorian
I figured out how to get past the indexing overhead when setting values on a Series object directly:
a = np.empty(1000,dtype='float')
s = pd.Series(data=a)
for i in range(len(a)):
    a[i] = 1.0
When initializing the Series from a numpy array, the data is not copied. If you keep a reference to the original array, you can just set values on that!
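Whether the constructor really shares the buffer depends on the pandas version and its copy settings; as far as I know, releases with Copy-on-Write enabled copy the array by default. A quick check, sketched here with np.shares_memory, tells you whether the trick applies in your environment before you rely on it:

```python
import numpy as np
import pandas as pd

a = np.zeros(5, dtype="float")
s = pd.Series(data=a)

# True if the Series is backed by the same buffer as `a`.
shares = bool(np.shares_memory(a, s.to_numpy()))

a[0] = 1.0  # write through the original array reference

# The Series only reflects the write when the buffer is shared.
series_sees_write = bool(s.iat[0] == 1.0)

print("shares memory:", shares)
print("Series sees the write:", series_sees_write)
```

If the first line prints False, writes to the array will not show up in the Series, and building the Series after filling the array is the safer pattern.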

