Pandas Dataframe - Get index values based on condition
Original question: http://stackoverflow.com/questions/51672531/
License: this content comes from StackOverflow and is provided under CC BY-SA 4.0. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by Neal Titus Thomas
I have a text file called data.txt containing tabular data that looks like this:
            PERIOD
CHANNELS    1       2       3       4       5
0           1.51    1.61    1.94    2.13    1.95
5           1.76    1.91    2.29    2.54    2.38
6           2.02    2.22    2.64    2.96    2.81
7           2.27    2.52    2.99    3.37    3.24
8           2.53    2.83    3.35    3.79    3.67
9           2.78    3.13    3.70    4.21    4.09
10          3.04    3.44    4.05    4.63    4.53
In the CHANNELS column are the channel numbers of an instrument and in the other 5 columns are the maximum energy that that particular channel can detect in periods 1, 2, 3, 4 and 5 respectively.
I want to write Python code that takes three inputs from the user (period, lower energy and higher energy) and then prints the channel numbers corresponding to the lower energy and the higher energy for the given period.
For example:
Enter the period:
>1
Enter the Lower energy:
>1.0
Enter the Higher energy:
>2.0
#Output
The lower energy channel is 0
The higher energy channel is 6
This is what I have written so far:
import pandas as pd

period = int(input('Enter the period: '))
lower_energy = float(input('Enter the lower energy value: '))
higher_energy = float(input('Enter the higher energy value: '))

# Channel numbers (rows) and period numbers (columns) of the table
row_names = [0, 5, 6, 7, 8, 9, 10]
column_names = [1, 2, 3, 4, 5]

data_list = []
with open('data.txt') as f:
    # Skip the two header lines (PERIOD and CHANNELS ...)
    lines = f.readlines()[2:]
    for line in lines:
        # Drop the first field (the channel number) and keep the energies
        arr = [float(num) for num in line.split()[1:]]
        data_list.append(arr)

df = pd.DataFrame(data_list, columns=column_names, index=row_names)
print(df, '\n')
print(df[period])
Help me add to this.
Answered by Bryce Ramgovind
You can add the following code:
Retrieve the index based on the condition. This assumes that, within a period, the energy increases steadily going down the channels.
lower_channel_energy = df[df[period]>lower_energy].index[0]
high_channel_energy = df[(df[period]<higher_energy).shift(-1)==False].index[0]
Printing the channels that we calculated:
print("The lower energy channel is {}".format(lower_channel_energy))
print("The higher energy channel is {}".format(high_channel_energy))
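To make the selection rule concrete, here is a minimal sketch on the question's sample table. The table is inlined rather than read from data.txt, and the variable names are only illustrative. It separates two possible readings of "higher energy channel": the last channel still below the upper bound, and the first channel above it. The shift-based expression above appears to implement the first reading, while the example output in the question matches the second, so it is worth checking which one is intended.

```python
import pandas as pd

# Sample table from the question, inlined here instead of being read from data.txt.
rows = [
    [1.51, 1.61, 1.94, 2.13, 1.95],
    [1.76, 1.91, 2.29, 2.54, 2.38],
    [2.02, 2.22, 2.64, 2.96, 2.81],
    [2.27, 2.52, 2.99, 3.37, 3.24],
    [2.53, 2.83, 3.35, 3.79, 3.67],
    [2.78, 3.13, 3.70, 4.21, 4.09],
    [3.04, 3.44, 4.05, 4.63, 4.53],
]
df = pd.DataFrame(rows, columns=[1, 2, 3, 4, 5], index=[0, 5, 6, 7, 8, 9, 10])

# Example inputs taken from the question.
period, lower_energy, higher_energy = 1, 1.0, 2.0
col = df[period]

# First channel whose maximum energy is above the lower bound.
first_above_lower = col[col > lower_energy].index[0]

# Reading 1: last channel whose maximum energy is still below the upper bound.
last_below_higher = col[col < higher_energy].index[-1]

# Reading 2: first channel whose maximum energy is above the upper bound.
first_above_higher = col[col > higher_energy].index[0]

print(first_above_lower, last_below_higher, first_above_higher)  # 0 5 6
```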
This solution assumes that the energy increases with the channel number, that is, going down the rows of the table.
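If that assumption is in doubt, it can be checked before indexing. The following is a small sketch, not part of the original answer: the helper name `increasing_by_period` is hypothetical, and only two periods of the sample table are inlined to keep it short. It relies on the pandas `Series.is_monotonic_increasing` property.

```python
import pandas as pd

# Two periods of the question's table, inlined for illustration.
df = pd.DataFrame(
    {
        1: [1.51, 1.76, 2.02, 2.27, 2.53, 2.78, 3.04],
        2: [1.61, 1.91, 2.22, 2.52, 2.83, 3.13, 3.44],
    },
    index=[0, 5, 6, 7, 8, 9, 10],
)


def increasing_by_period(table):
    """Hypothetical helper: map each period to whether its energies never decrease."""
    return {period: bool(table[period].is_monotonic_increasing) for period in table.columns}


checks = increasing_by_period(df)
print(checks)

# A table with two rows swapped fails the check for the affected period.
swapped = df.copy()
swapped.loc[5, 1], swapped.loc[6, 1] = 2.02, 1.76
swapped_checks = increasing_by_period(swapped)
print(swapped_checks)
```

When a period fails the check, `index[0]` on a boolean selection no longer identifies a clean threshold crossing, so the result should be treated with care.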
Answered by ejb
You can actually read your file directly with Pandas to simplify the program. I can reproduce the output you are expecting with:
import pandas as pd
df = pd.read_csv('data.txt', engine='python', header=1, sep=r'\s{2,}')
period = input('Enter the period: ')
lower_energy = float(input('Enter the lower energy value: '))
higher_energy = float(input('Enter the higher energy value: '))
# select the channels within the ranges provided
lo_e_range = (df[period] > lower_energy)
hi_e_range = (df[period] > higher_energy)
# Indices of the lower and higher energy channels
lec = df[period][lo_e_range].index[0]
hec = df[period][hi_e_range].index[0]
print('The lower energy channel is {}'.format(df['CHANNELS'][lec]))
print('The higher energy channel is {}'.format(df['CHANNELS'][hec]))
I have edited the code to take into account your comment.
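One thing to watch for with `.index[0]` is that it raises an `IndexError` when no channel exceeds the requested energy. Below is a hedged variant of the same idea that returns `None` in that case. It is a sketch under some assumptions: the file contents are inlined through `io.StringIO` instead of data.txt, the columns are split on any run of whitespace (`sep=r"\s+"`) rather than on two or more spaces, and the names `load_table` and `first_channel_above` are made up for this example.

```python
import io

import pandas as pd

# Inlined copy of the question's file; a real run would pass the path 'data.txt'.
SAMPLE = """\
            PERIOD
CHANNELS    1       2       3       4       5
0           1.51    1.61    1.94    2.13    1.95
5           1.76    1.91    2.29    2.54    2.38
6           2.02    2.22    2.64    2.96    2.81
7           2.27    2.52    2.99    3.37    3.24
8           2.53    2.83    3.35    3.79    3.67
9           2.78    3.13    3.70    4.21    4.09
10          3.04    3.44    4.05    4.63    4.53
"""


def load_table(source):
    """Read the table, using CHANNELS as the index and integer period labels."""
    # skiprows=1 drops the "PERIOD" banner line above the real header.
    table = pd.read_csv(source, sep=r"\s+", skiprows=1, index_col="CHANNELS")
    table.columns = table.columns.astype(int)
    return table


def first_channel_above(table, period, energy):
    """Return the first channel whose maximum energy exceeds `energy`, or None."""
    col = table[period]
    matches = col[col > energy].index
    return matches[0] if len(matches) else None


df = load_table(io.StringIO(SAMPLE))
print(first_channel_above(df, 1, 1.0))   # 0
print(first_channel_above(df, 1, 2.0))   # 6
print(first_channel_above(df, 1, 10.0))  # None
```

Using the channel numbers as the index means the result is the channel itself, so the extra lookup through `df['CHANNELS']` is not needed in this variant.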