pandas: split data frame based on integer index

Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but any reuse must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me). Source: StackOverflow, http://stackoverflow.com/questions/17457329/

Time: 2020-09-13 20:58:27  Source: igfitidea

pandas

Question by user2426361

In pandas, how do I split a Series/DataFrame into two Series/DataFrames, with the odd rows in one and the even rows in the other? Right now I am using

rng = range(0, n, 2)  # n = len(df); positions 0, 2, 4, ...
odd_rows = df.iloc[rng]

This is pretty slow.
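A self-contained sketch of the approach in the question, assuming `n` is `len(df)`; the frame and its column name `x` are made up for illustration. It also shows that `range(0, n, 2)` picks positions 0, 2, 4, which are the "odd" rows only when counting from 1.

```python
import pandas as pd

# Hypothetical example frame with a non-numeric index.
df = pd.DataFrame({"x": [10, 20, 30, 40, 50]}, index=list("abcde"))
n = len(df)

# Build explicit positions and select them with iloc, as in the question.
# range(0, n, 2) yields positions 0, 2, 4: the 1st, 3rd and 5th rows.
odd_rows = df.iloc[range(0, n, 2)]
even_rows = df.iloc[range(1, n, 2)]

print(odd_rows["x"].tolist())   # [10, 30, 50]
print(even_rows["x"].tolist())  # [20, 40]
```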

Answer by Andy Hayden

Use slicing:

In [11]: s = pd.Series([1,2,3,4])

In [12]: s.iloc[::2]  # even
Out[12]:
0    1
2    3
dtype: int64

In [13]: s.iloc[1::2]  # odd
Out[13]:
1    2
3    4
dtype: int64
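The same slicing applies to a DataFrame through `iloc`. A minimal sketch with a made-up frame, splitting it into two parts by position; because the slices are positional, the index labels (here 10 to 50) play no role.

```python
import pandas as pd

# Hypothetical frame with non-default integer labels.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": list("vwxyz")},
                  index=[10, 20, 30, 40, 50])

# A stepped positional slice needs no explicit list of positions.
even_pos = df.iloc[::2]   # positions 0, 2, 4
odd_pos = df.iloc[1::2]   # positions 1, 3

print(even_pos.index.tolist())  # [10, 30, 50]
print(odd_pos.index.tolist())   # [20, 40]
```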

Answer by Jeff

Here are some comparisons:

In [100]: df = DataFrame(randn(100000,10))

Simple method (I suspect `range` is what makes it slow), but it works regardless of the index (e.g. the index does not have to be numeric):

In [96]: %timeit df.iloc[range(0,len(df),2)]
10 loops, best of 3: 21.2 ms per loop
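To check that this simple method and slicing select the same rows, here is a small sketch on a hypothetical frame with a string index; per the timings in this answer, only the speed differs.

```python
import pandas as pd

# Hypothetical frame with a string index.
df = pd.DataFrame({"v": range(6)}, index=["a", "b", "c", "d", "e", "f"])

# Positional selection ignores the labels, so a non-numeric index is fine.
by_range = df.iloc[range(0, len(df), 2)]
by_slice = df.iloc[::2]

print(by_range.equals(by_slice))  # True
print(by_range.index.tolist())    # ['a', 'c', 'e']
```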

The following require a range-based Int64Index (easy to get: just call reset_index()).

In [107]: %timeit df.iloc[(df.index % 2).astype(bool)]
100 loops, best of 3: 5.67 ms per loop

In [108]: %timeit df.loc[(df.index % 2).astype(bool)]
100 loops, best of 3: 5.48 ms per loop
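A sketch of why these masks need a range-based index, on a hypothetical frame whose labels are not 0..n-1 (as happens after filtering). The mask is built from the index as in the answer, but converted to a NumPy array first so it behaves the same across pandas versions. The last line, a mask built from positions with `np.arange`, is my substitution and not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose labels are not 0..n-1.
df = pd.DataFrame({"v": [5, 6, 7, 8]}, index=[3, 4, 7, 9])

# Parity of the labels: flags labels 3, 7 and 9, not positions 1 and 3.
label_mask = np.asarray(df.index) % 2 == 1
print(df.loc[label_mask, "v"].tolist())  # [5, 7, 8]

# After reset_index(drop=True) the labels are 0..n-1,
# so label parity equals position parity.
reset = df.reset_index(drop=True)
pos_mask = np.asarray(reset.index) % 2 == 1
print(reset.loc[pos_mask, "v"].tolist())  # [6, 8]

# Substitution (not from the answer): build the mask from positions directly.
print(df.iloc[np.arange(len(df)) % 2 == 1]["v"].tolist())  # [6, 8]
```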

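Make sure to give it index positions. Note that `df.index % 2` is an array of 0s and 1s, so as written this takes rows 0 and 1 over and over rather than every other row; to select the odd rows, pass their positions instead, e.g. `np.flatnonzero(df.index % 2)`.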

In [98]: %timeit df.take(df.index % 2)
100 loops, best of 3: 3.06 ms per loop

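Same as above, but without converting negative indices (the `convert` argument has since been deprecated and removed from pandas, so current versions do not accept this call):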

In [99]: %timeit df.take(df.index % 2,convert=False)
100 loops, best of 3: 2.44 ms per loop
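A small sketch of the `take` pitfall noted above and of negative positions, on a made-up five-row frame, assuming a recent pandas in which `take` no longer accepts `convert`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"v": [10, 11, 12, 13, 14]})

# Same values as df.index % 2 in the answer: [0, 1, 0, 1, 0].
parity = np.asarray(df.index) % 2

# take() reads its argument as positions, so this repeats rows 0 and 1.
repeated = df.take(parity)
print(repeated["v"].tolist())  # [10, 11, 10, 11, 10]

# The positions of the odd rows (1 and 3) give the intended selection.
odd = df.take(np.flatnonzero(parity))
print(odd["v"].tolist())  # [11, 13]

# Negative positions count from the end, as with Python lists.
print(df.take([-1, -2])["v"].tolist())  # [14, 13]
```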

The winner is @AndyHayden's slicing solution applied to the underlying NumPy array; this only works well on a single dtype (with mixed dtypes, `df.values` is upcast to a common dtype, often object).

In [118]: %timeit DataFrame(df.values[::2],index=df.index[::2])
10000 loops, best of 3: 63.5 us per loop
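A sketch of the single-dtype caveat, using a hypothetical frame with a float column and a bool column. It passes `columns=df.columns`, which the one-liner above omits (harmless there because the benchmark frame has default column labels).

```python
import pandas as pd

# Hypothetical frame with two dtypes: float64 and bool.
df = pd.DataFrame({"num": [1.0, 2.0, 3.0, 4.0],
                   "flag": [True, False, True, False]})

# df.values has to fit every column into one NumPy array, so it upcasts to object.
print(df.values.dtype)  # object

# Rebuilding from a slice of that array leaves every column as object.
fast = pd.DataFrame(df.values[::2], index=df.index[::2], columns=df.columns)
print(fast["num"].dtype)  # object

# iloc slicing keeps the original per-column dtypes.
print(df.iloc[::2]["num"].dtype)  # float64
```

If the per-column dtypes matter, `df.iloc[::2]` is the safer choice; `fast.infer_objects()` should recover float64 and bool here, at the cost of an extra pass.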