Original question: http://stackoverflow.com/questions/22403469/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Locate first and last non-NaN values in a Pandas DataFrame
Asked by Jason
I have a Pandas DataFrame indexed by date. There are a number of columns, but many of them are only populated for part of the time series. I'd like to find where the first and last non-NaN values are located so that I can extract the dates and see how long the time series is for a particular column.
Could somebody point me in the right direction as to how I could go about doing something like this? Thanks in advance.
Accepted answer by Jason
@behzad.nouri's solution worked perfectly: Series.first_valid_index and Series.last_valid_index return the index labels of the first and last non-NaN values, respectively.
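To tie this back to the question, here is a minimal sketch of measuring how long a column's populated range is in a date-indexed DataFrame. The column name `price` and the sample dates are made up for illustration; only `first_valid_index` and `last_valid_index` come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: "price" is only populated for part of the range.
idx = pd.date_range("2014-01-01", periods=10, freq="D")
df = pd.DataFrame(
    {"price": [np.nan, np.nan, 1.0, 2.0, np.nan, 3.0, 4.0, np.nan, np.nan, np.nan]},
    index=idx,
)

first = df["price"].first_valid_index()  # label (Timestamp) of the first non-NaN value
last = df["price"].last_valid_index()    # label (Timestamp) of the last non-NaN value
span = last - first                      # Timedelta between the two dates

# Trim the column to its populated range; label-based slicing includes both ends.
trimmed = df.loc[first:last, "price"]
```

Note that NaN values inside the range (such as the gap in the middle here) are kept by the slice; only the leading and trailing NaN rows are dropped.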
Answer by cs95
Here are some helpful examples.
Series
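import numpy as np
import pandas as pd
# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
s = pd.Series([np.nan, 1, np.nan, 3, np.nan], index=list('abcde'))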
s
a NaN
b 1.0
c NaN
d 3.0
e NaN
dtype: float64
# first valid index
s.first_valid_index()
# 'b'
# first valid position
s.index.get_loc(s.first_valid_index())
# 1
# last valid index
s.last_valid_index()
# 'd'
# last valid position
s.index.get_loc(s.last_valid_index())
# 3
Alternative solution using notna and idxmax:
# first valid index
s.notna().idxmax()
# 'b'
# last valid index
s.notna()[::-1].idxmax()
# 'd'
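One difference between the two approaches is worth keeping in mind: when there are no valid values at all, `first_valid_index` returns None, while `idxmax` on an all-False mask still returns a label. The sketch below illustrates this with a made-up all-NaN Series; the guarded variant is just one possible way to handle it.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with no valid values at all.
empty = pd.Series([np.nan, np.nan, np.nan], index=list("xyz"))

# first_valid_index signals "nothing found" by returning None.
result_valid = empty.first_valid_index()

# notna() is all False here, so idxmax falls back to the first label.
result_idxmax = empty.notna().idxmax()

# Guarded variant: only trust idxmax when at least one value is present.
mask = empty.notna()
guarded = mask.idxmax() if mask.any() else None
```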
DataFrame
df = pd.DataFrame({
'A': [np.nan, 1, np.nan, 3, np.nan],
'B': [1, np.nan, np.nan, np.nan, np.nan]
})
df
A B
0 NaN 1.0
1 1.0 NaN
2 NaN NaN
3 3.0 NaN
4 NaN NaN
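Called directly on a DataFrame, (first|last)_valid_index works on whole rows: it returns a single label, the first or last row containing at least one non-NaN value, rather than one result per column. To get a result for each column, apply the Series methods column by column using apply.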
# first valid index for each column
df.apply(pd.Series.first_valid_index)
A 1
B 0
dtype: int64
# last valid index for each column
df.apply(pd.Series.last_valid_index)
A 3
B 0
dtype: int64
As before, you can also use notna and idxmax. This is slightly more natural syntax.
# first valid index
df.notna().idxmax()
A 1
B 0
dtype: int64
# last valid index
df.notna()[::-1].idxmax()
A 3
B 0
dtype: int64
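When a column may contain no valid values at all, `idxmax` on the boolean mask still returns a label for it, which can be misleading. A minimal sketch of one way to guard against that, assuming a hypothetical all-NaN column `C` that is not part of the original example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [np.nan, 1, np.nan, 3, np.nan],
    "B": [1, np.nan, np.nan, np.nan, np.nan],
    "C": [np.nan] * 5,  # hypothetical all-NaN column, added for illustration
})

mask = df.notna()
has_data = mask.any()  # per column: True if at least one value is present

# Blank out the idxmax result for columns that have no valid values.
first = mask.idxmax().where(has_data)
last = mask[::-1].idxmax().where(has_data)
```

Because the missing entry is represented as NaN, the resulting Series are float rather than integer in this example.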

