Pandas - get first n rows based on percentage

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/50173283/

Time: 2020-09-14 05:31:41  Source: igfitidea


python pandas percentage

Asked by Mohamed Thasin ah

I have a dataframe and I want to take a certain number of records from it. Instead of passing a number, I want to pass a percentage value.

For example:

df.head(n=10)


This returns the first 10 records of the data set. I want a small change: instead of 10 records, I want the first 5% of the records in my data set. How can I do this in pandas?

I'm looking for code like this:

df.head(frac=0.05)


Is there a simple way to do this?

Accepted answer by Mihai Alexandru-Ionut

I want to pop first 5% of record


There is no built-in method, but you can do this:

You can multiply the total number of rows by your percentage and use the result as the parameter for the head method.

n = 5
df.head(int(len(df)*(n/100)))

So if your dataframe contains 1000 rows and n = 5%, you will get the first 50 rows.
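As a minimal sketch of the same idea, the calculation can be wrapped in a small helper that takes a fraction rather than a percentage. The name head_frac and the sample dataframe are assumptions made for illustration; head_frac is not part of the pandas API.

```python
import pandas as pd

def head_frac(df, frac):
    """Return the first `frac` share of rows (hypothetical helper, not a pandas API)."""
    # int() truncates toward zero, so the row count is rounded down
    n_rows = int(len(df) * frac)
    return df.head(n_rows)

# Illustrative data: 1000 rows, as in the example above
df = pd.DataFrame({"value": range(1000)})
first = head_frac(df, 0.05)
print(len(first))  # 50
```

Because int() rounds down, a very small dataframe or fraction can give zero rows; if at least one row is always needed, wrapping the count in max(1, ...) is one option.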

Answer by Julian

I've extended Mihai's answer for my own use, and it may be useful to others. The purpose is automated selection of the first n records for time series sampling, so that older records are used for training and more recent records for testing.

# having 
# import pandas as pd 
# df = pd.DataFrame... 

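def sample_first_prows(data, perc=0.7):
    # perc is a fraction of the rows (0.7 = first 70%), not a percentage
    return data.head(int(len(data) * perc))

train = sample_first_prows(df)
# iloc is positional, so slice from the number of training rows;
# max(train.index) would repeat the last training row in the test set
test = df.iloc[len(train):]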
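For a time series whose index holds dates rather than row numbers, splitting by position avoids relying on the index labels. The following is only a sketch under assumptions: the rows are already sorted oldest first, and the name split_by_fraction and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical time series: 10 daily observations, oldest first
ts = pd.DataFrame(
    {"value": range(10)},
    index=pd.date_range("2020-01-01", periods=10, freq="D"),
)

def split_by_fraction(data, perc=0.7):
    """Split by position into (first `perc` share of rows, remaining rows)."""
    cut = int(len(data) * perc)
    return data.iloc[:cut], data.iloc[cut:]

train, test = split_by_fraction(ts)
print(len(train), len(test))  # 7 3
```

Both slices use the same cut position, so no row appears in both sets and none is dropped.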