Pandas - get the first n rows based on a percentage

Question

Asked by Mohamed Thasin ah

I have a dataframe and I want to pop a certain number of records, but instead of a number I want to pass a percentage value.

For example,

df.head(n=10)

This pops out the first 10 records from the data set. I want a small change: instead of 10 records, I want to pop the first 5% of the records from my data set. How can I do this in pandas?

I'm looking for code like this:

df.head(frac=0.05)

Is there any simple way to get this?

Answer 1

Accepted answer by Mihai Alexandru-Ionut

"I want to pop the first 5% of the records"

There is no built-in method, but you can do this:

You can multiply the total number of rows by your percentage and use the result as the parameter for the head method.

n = 5
df.head(int(len(df)*(n/100)))

So if your dataframe contains 1000 rows and n = 5 (that is, 5%), you will get the first 50 rows.
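The calculation above can be wrapped in a small helper that takes a fraction directly, closer to the `frac=0.05` call the question asks for. This is only a sketch: `head_frac` is a hypothetical name, not a pandas method, and the example dataframe is made up for illustration.

```python
import pandas as pd


def head_frac(df, frac):
    """Return the first `frac` share of rows.

    Hypothetical helper for illustration; pandas has no such method.
    """
    if not 0 <= frac <= 1:
        raise ValueError("frac must be between 0 and 1")
    # int() truncates, so a fractional row count is rounded down.
    n_rows = int(len(df) * frac)
    return df.head(n_rows)


# Made-up data: 1000 rows, as in the example in the answer.
df = pd.DataFrame({"value": range(1000)})
top = head_frac(df, 0.05)
print(len(top))  # 50
```

Note that `DataFrame.sample(frac=0.05)` does exist in pandas, but it returns a random 5% of the rows rather than the first 5%.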

Answer 2

Answered by Julian

I've extended Mihai's answer for my own use, and it may be useful to others. The purpose is automated selection of the first n records when sampling a time series, so you can be sure you are taking old records for training and recent records for testing.

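# assuming
# import pandas as pd
# df = pd.DataFrame(...)

def sample_first_prows(data, perc=0.7):
    # Return the first `perc` fraction of rows (row count is truncated to an integer).
    return data.head(int(len(data) * perc))

train = sample_first_prows(df)
# Split by position so the test set starts right after the last training row
# (slicing from max(train.index) would repeat that row in both sets).
test = df.iloc[len(train):]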
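As a minimal sketch of the same chronological split done purely by position, the example below assumes the rows are already sorted oldest first. The dates, the column name, and the 0.75 fraction are made up for illustration. Splitting at a single row position keeps the two parts disjoint whatever the index labels are, including a datetime index.

```python
import pandas as pd

# Made-up daily series, sorted oldest first.
df = pd.DataFrame(
    {"value": range(8)},
    index=pd.date_range("2024-01-01", periods=8, freq="D"),
)

perc = 0.75
split = int(len(df) * perc)  # number of rows that go to training

train = df.iloc[:split]  # oldest rows
test = df.iloc[split:]   # most recent rows, no overlap with train

print(len(train), len(test))  # 6 2
```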
