Pandas Dataframe - Get index values based on condition

Notice: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/51672531/

Time: 2020-09-14 05:52:53  Source: igfitidea

Tags: python, python-3.x, pandas, numpy, dataframe

Asked by Neal Titus Thomas

I have a text file called data.txt containing tabular data that looks like this:

                        PERIOD
CHANNELS    1      2      3      4       5 
0         1.51   1.61   1.94   2.13   1.95 
5         1.76   1.91   2.29   2.54   2.38 
6         2.02   2.22   2.64   2.96   2.81 
7         2.27   2.52   2.99   3.37   3.24 
8         2.53   2.83   3.35   3.79   3.67 
9         2.78   3.13   3.70   4.21   4.09 
10        3.04   3.44   4.05   4.63   4.53

The CHANNELS column holds the channel numbers of an instrument, and the other five columns give the maximum energy that each channel can detect in periods 1, 2, 3, 4 and 5 respectively.

I want to write Python code that takes three inputs from the user (period, lower energy and higher energy) and then prints the channel numbers corresponding to the lower energy and the higher energy for the given period.

For example:

Enter the period:
>1
Enter the Lower energy:
>1.0
Enter the Higher energy:
>2.0
#Output
The lower energy channel is 0
The higher energy channel is 6

This is what I have written so far:

import numpy as np
import pandas as pd

period = int(input('Enter the period: '))
lower_energy = float(input('Enter the lower energy value: '))
higher_energy = float(input('Enter the higher energy value: '))
row_names = [0, 5, 6, 7, 8, 9, 10]
column_names = [1, 2, 3, 4, 5] 
data_list = []
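# Skip the two header lines, then parse the energy values on each row
with open('data.txt') as f:
    lines = f.readlines()[2:]
for line in lines:
    # Drop the leading channel number; the remaining fields are the energies
    arr = [float(num) for num in line.split()[1:]]
    data_list.append(arr)
df = pd.DataFrame(data_list, columns=column_names, index=row_names)
print(df, '\n')
print(df[period])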

Help me add to this.

Answered by Bryce Ramgovind

You can add the following code:

Retrieve the index based on the condition. This assumes the energy increases steadily going down the channels.

lower_channel_energy = df[df[period]>lower_energy].index[0]
high_channel_energy =  df[(df[period]<higher_energy).shift(-1)==False].index[0]

Printing the channels that we calculated:

print("The lower energy channel is {}".format(lower_channel_energy))
print("The higher energy channel is {}".format(high_channel_energy))

This solution assumes that, within a period, the energy increases as you move down the channel rows.
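
The increasing-energy assumption can be checked explicitly before picking an index. The following is only a minimal sketch, not part of the original answer: the helper name `first_channel_above` is hypothetical, the table is copied inline from the question instead of being read from data.txt, and the lookup is a plain "first row whose energy exceeds the threshold" mask rather than the `shift`-based expression above.

```python
import pandas as pd

# Sample data copied from the question's table (index = channel numbers)
df = pd.DataFrame(
    {
        1: [1.51, 1.76, 2.02, 2.27, 2.53, 2.78, 3.04],
        2: [1.61, 1.91, 2.22, 2.52, 2.83, 3.13, 3.44],
        3: [1.94, 2.29, 2.64, 2.99, 3.35, 3.70, 4.05],
        4: [2.13, 2.54, 2.96, 3.37, 3.79, 4.21, 4.63],
        5: [1.95, 2.38, 2.81, 3.24, 3.67, 4.09, 4.53],
    },
    index=[0, 5, 6, 7, 8, 9, 10],
)


def first_channel_above(frame, period, energy):
    """Hypothetical helper: first channel whose maximum energy exceeds `energy`.

    Returns None when no channel in the period reaches that energy.
    """
    column = frame[period]
    # The lookup only makes sense if energies grow down the rows
    if not column.is_monotonic_increasing:
        raise ValueError("energies in period {} are not increasing".format(period))
    above = column[column > energy]
    if above.empty:
        return None
    return above.index[0]


lower_channel = first_channel_above(df, 1, 1.0)
higher_channel = first_channel_above(df, 1, 2.0)
print("The lower energy channel is {}".format(lower_channel))
print("The higher energy channel is {}".format(higher_channel))
```

With the question's inputs (period 1, energies 1.0 and 2.0) this sketch selects channels 0 and 6, the values the question lists as expected. Returning None for an out-of-range energy is a design choice made here for illustration; raising an error would be equally reasonable.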

Answered by ejb

You can actually read your file directly with Pandas to simplify the program. I can reproduce the output you are expecting with:

import pandas as pd

df = pd.read_csv('data.txt', engine='python', header=1, sep=r'\s{2,}')

period = input('Enter the period: ')
lower_energy = float(input('Enter the lower energy value: '))
higher_energy = float(input('Enter the higher energy value: '))

# select the channels within the ranges provided
lo_e_range = (df[period] > lower_energy)
hi_e_range = (df[period] > higher_energy)

# Indices of the lower and higher energy channels
lec = df[period][lo_e_range].index[0]
hec = df[period][hi_e_range].index[0]

print('The lower energy channel is {}'.format(df['CHANNELS'][lec]))
print('The higher energy channel is {}'.format(df['CHANNELS'][hec]))

I have edited the code to take into account your comment.

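As a variation on reading the file directly with Pandas, the sketch below uses the channel numbers as the index and converts the period labels to integers, so the separate `CHANNELS` lookup is not needed. It is an illustration under assumptions, not the answer's own code: an inline string stands in for data.txt, and `sep=r"\s+"` with `skiprows=1` replaces the `header=1` and `\s{2,}` settings used above.

```python
import io

import pandas as pd

# Inline copy of the question's table, standing in for data.txt (assumption)
raw = "\n".join([
    "                        PERIOD",
    "CHANNELS    1      2      3      4       5",
    "0         1.51   1.61   1.94   2.13   1.95",
    "5         1.76   1.91   2.29   2.54   2.38",
    "6         2.02   2.22   2.64   2.96   2.81",
    "7         2.27   2.52   2.99   3.37   3.24",
    "8         2.53   2.83   3.35   3.79   3.67",
    "9         2.78   3.13   3.70   4.21   4.09",
    "10        3.04   3.44   4.05   4.63   4.53",
])

# skiprows=1 drops the "PERIOD" banner line; any run of whitespace separates fields
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", skiprows=1, index_col="CHANNELS")

# Header labels are parsed as strings ("1" to "5"); convert them so an int period works
df.columns = df.columns.astype(int)

period = 1
higher_energy = 2.0

# Boolean mask of channels whose maximum energy exceeds the threshold
above = df[period] > higher_energy
print("The higher energy channel is {}".format(above[above].index[0]))
```

Because the channel numbers are the index here, `above[above].index[0]` is already the channel number. Like the answer's code, this raises an IndexError if no channel exceeds the requested energy.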