Pandas - get first n rows based on percentage
Source: http://stackoverflow.com/questions/50173283/
License: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Mohamed Thasin ah
I have a dataframe from which I want to take a certain number of records, but instead of a row count I want to pass a percentage value.
For example,
df.head(n=10)
This returns the first 10 records from the data set. I want a small change: instead of 10 records, I want the first 5% of the records in my data set. How can I do this in pandas?
I'm looking for code like this:
df.head(frac=0.05)
Is there any simple way to get this?
Accepted answer by Mihai Alexandru-Ionut
I want to pop first 5% of record
There is no built-in method but you can do this:
You can multiply the total number of rows by your percentage and use the result as the parameter for the head method.
n = 5
df.head(int(len(df)*(n/100)))
So if your dataframe contains 1000 rows and n = 5%, you will get the first 50 rows.
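As a self-contained illustration of the approach above, here is a minimal sketch. The helper name head_frac is hypothetical (it is not part of pandas), and the sample dataframe is made up for the example. Like the answer, it truncates the row count with int, so fractional row counts are rounded down.

```python
import pandas as pd


def head_frac(df: pd.DataFrame, frac: float) -> pd.DataFrame:
    """Return the first `frac` share of rows (hypothetical helper, not a pandas API)."""
    if not 0 <= frac <= 1:
        raise ValueError("frac must be between 0 and 1")
    n_rows = int(len(df) * frac)  # truncates toward zero, as in the answer above
    return df.head(n_rows)


# made-up data: 1000 rows, so 5% should give the first 50 rows
df = pd.DataFrame({"value": range(1000)})
first_5_percent = head_frac(df, 0.05)
print(len(first_5_percent))  # 50
```

Because the product is a float, some combinations of length and fraction can land just below a whole number before truncation, so it may be worth checking the resulting row count when an exact size matters.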
Answer by Julian
I've extended Mihai's answer for my usage and it may be useful to people out there. The purpose is automated top-n records selection for time series sampling, so you're sure you're taking old records for training and recent records for testing.
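# assuming:
# import pandas as pd
# df = pd.DataFrame...
def sample_first_prows(data, perc=0.7):
    # return the first `perc` share of the rows (row count rounded down)
    return data.head(int(len(data) * perc))

train = sample_first_prows(df)
# the remaining rows form the test set; slicing by position avoids
# overlapping with train and works with any index type
test = df.iloc[len(train):]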
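To make the chronological split concrete, here is a small sketch on a toy time series. The function name split_first_rows and the sample data are assumptions made for illustration. It slices by position with iloc, so it does not depend on the index holding integer labels, and it assumes the rows are already sorted from oldest to newest.

```python
import pandas as pd


def split_first_rows(data: pd.DataFrame, perc: float = 0.7):
    """Split into the first `perc` share of rows and the remainder (illustrative helper)."""
    cut = int(len(data) * perc)  # number of rows that go into the first part
    return data.iloc[:cut], data.iloc[cut:]


# toy daily time series, already sorted from oldest to newest
ts = pd.DataFrame(
    {"value": range(100)},
    index=pd.date_range("2020-01-01", periods=100, freq="D"),
)
train, test = split_first_rows(ts)
print(len(train), len(test))  # 70 30
```

With this split every training row is older than every test row, and no row appears in both parts.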