Python Pandas - Get first row value of a given column
Original source: http://stackoverflow.com/questions/25254016/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas - Get first row value of a given column
Asked by Ahmed Haque
This seems like a ridiculously easy question... but I'm not seeing the easy answer I was expecting.
So, how do I get the value at the nth row of a given column in Pandas? (I am particularly interested in the first row, but would be interested in a more general practice as well.)
For example, let's say I want to pull the 1.2 value in Btime as a variable.
What's the right way to do this?
df_test =
ATime X Y Z Btime C D E
0 1.2 2 15 2 1.2 12 25 12
1 1.4 3 12 1 1.3 13 22 11
2 1.5 1 10 6 1.4 11 20 16
3 1.6 2 9 10 1.7 12 29 12
4 1.9 1 1 9 1.9 11 21 19
5 2.0 0 0 0 2.0 8 10 11
6 2.4 0 0 0 2.4 10 12 15
Accepted answer by unutbu
To select the ith row, use iloc:
In [31]: df_test.iloc[0]
Out[31]:
ATime 1.2
X 2.0
Y 15.0
Z 2.0
Btime 1.2
C 12.0
D 25.0
E 12.0
Name: 0, dtype: float64
To select the ith value in the Btime column, you could use:
In [30]: df_test['Btime'].iloc[0]
Out[30]: 1.2
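As a self-contained illustration of the lookup above, the following sketch rebuilds df_test from the table in the question (the constructor call is reconstructed here, not part of the original post) and reads the first and the fourth Btime values by position:

```python
import pandas as pd

# Rebuild the question's df_test from the table shown in the question.
df_test = pd.DataFrame({
    "ATime": [1.2, 1.4, 1.5, 1.6, 1.9, 2.0, 2.4],
    "X": [2, 3, 1, 2, 1, 0, 0],
    "Y": [15, 12, 10, 9, 1, 0, 0],
    "Z": [2, 1, 6, 10, 9, 0, 0],
    "Btime": [1.2, 1.3, 1.4, 1.7, 1.9, 2.0, 2.4],
    "C": [12, 13, 11, 12, 11, 8, 10],
    "D": [25, 22, 20, 29, 21, 10, 12],
    "E": [12, 11, 16, 12, 19, 11, 15],
})

first_btime = df_test["Btime"].iloc[0]   # value in the first row
fourth_btime = df_test["Btime"].iloc[3]  # value in the nth row, here n = 3
print(first_btime, fourth_btime)
```

The same pattern works for any zero-based row position n.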
There is a difference between df_test['Btime'].iloc[0] (recommended) and df_test.iloc[0]['Btime']:
DataFrames store data in column-based blocks (where each block has a single
dtype). If you select by column first, a view can be returned (which is
quicker than returning a copy) and the original dtype is preserved. In contrast,
if you select by row first, and if the DataFrame has columns of different
dtypes, then Pandas copies the data into a new Series of object dtype. So
selecting columns is a bit faster than selecting rows. Thus, although
df_test.iloc[0]['Btime'] works, df_test['Btime'].iloc[0] is a little bit
more efficient.
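The dtype effect described above can be checked with a small sketch. It assumes a hypothetical two-column frame with one integer and one float column. With only numeric columns the row is upcast to a common numeric dtype (float64) rather than to object, which matches the Out[31] display earlier, where the integer column X shows up as 2.0:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one integer column, one float column.
df = pd.DataFrame({"X": [2, 3, 1], "Btime": [1.2, 1.3, 1.4]})

col_first = df["X"].iloc[0]  # column first: the integer dtype of X is kept
row_first = df.iloc[0]["X"]  # row first: the whole row is upcast to float64

print(type(col_first), type(row_first))
print(df.iloc[0].dtype)
```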
There is a big difference between the two when it comes to assignment.
df_test['Btime'].iloc[0] = x affects df_test, but df_test.iloc[0]['Btime'] = x may not. See below for an explanation of why. Because a subtle difference in
the order of indexing makes a big difference in behavior, it is better to use single indexing assignment:
df.iloc[0, df.columns.get_loc('Btime')] = x
df.iloc[0, df.columns.get_loc('Btime')] = x (recommended):
The recommended way to assign new values to a DataFrame is to avoid chained indexing, and instead use the method shown by andrew,
df.loc[df.index[n], 'Btime'] = x
or
df.iloc[n, df.columns.get_loc('Btime')] = x
The latter method is a bit faster, because df.loc has to convert the row and column labels to
positional indices, so there is a little less conversion necessary if you use
df.iloc instead.
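Both single-indexing forms can be tried side by side. This is a minimal sketch on a hypothetical three-row frame with a non-sorted integer index, where the row positions and the assigned values are chosen only for illustration:

```python
import pandas as pd

# Hypothetical frame with an integer index that is not in sorted order.
df = pd.DataFrame(
    {"foo": list("ABC"), "Btime": [1.2, 1.3, 1.4]},
    index=[0, 2, 1],
)

# Label-based: translate the row position into its label first.
df.loc[df.index[1], "Btime"] = 9.9

# Position-based: translate the column label into its position.
df.iloc[2, df.columns.get_loc("Btime")] = 7.7

print(df)
```

Each statement is a single indexing operation on df itself, so no intermediate object is created that could turn out to be a copy.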
df['Btime'].iloc[0] = x works, but is not recommended:
Although this works, it is taking advantage of the way DataFrames are currently implemented. There is no guarantee that Pandas will work this way in the future. In particular, it is taking advantage of the fact that (currently) df['Btime'] always returns a
view (not a copy), so df['Btime'].iloc[n] = x can be used to assign a new value
at the nth location of the Btime column of df.
Since Pandas makes no explicit guarantees about when indexers return a view versus a copy, assignments that use chained indexing generally raise a SettingWithCopyWarning, even though in this case the assignment succeeds in modifying df:
In [22]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [24]: df['bar'] = 100
In [25]: df['bar'].iloc[0] = 99
/home/unutbu/data/binky/bin/ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
In [26]: df
Out[26]:
foo bar
0 A 99 <-- assignment succeeded
2 B 100
1 C 100
df.iloc[0]['Btime'] = x does not work:
In contrast, assignment with df.iloc[0]['bar'] = 123 does not work because df.iloc[0] is returning a copy:
In [66]: df.iloc[0]['bar'] = 123
/home/unutbu/data/binky/bin/ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
In [67]: df
Out[67]:
foo bar
0 A 99 <-- assignment failed
2 B 100
1 C 100
Warning: I had previously suggested df_test.ix[i, 'Btime']. But this is not guaranteed to give you the ith value, since ix tries to index by label before trying to index by position. So if the DataFrame has an integer index which is not in sorted order starting at 0, then using ix[i] will return the row labeled i rather than the ith row. For example,
In [1]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [2]: df
Out[2]:
foo
0 A
2 B
1 C
In [4]: df.ix[1, 'foo']
Out[4]: 'C'
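The .ix indexer was removed in pandas 1.0, so the transcript above only runs on older releases. The label-versus-position distinction it illustrates can be reproduced with the explicit indexers. This is a sketch on the same three-row frame:

```python
import pandas as pd

df = pd.DataFrame({"foo": list("ABC")}, index=[0, 2, 1])

by_label = df.loc[1, "foo"]                          # the row labeled 1
by_position = df.iloc[1, df.columns.get_loc("foo")]  # the second row

print(by_label, by_position)
```

Here loc returns 'C' (the row whose label is 1) while iloc returns 'B' (the row at position 1).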
Answer by andrew
Note that the answer from @unutbu is correct for reading a value, but it will not work for setting the value to something new if your DataFrame is a view.
In [4]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [5]: df['bar'] = 100
In [6]: df['bar'].iloc[0] = 99
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.16.0_19_g8d2818e-py2.7-macosx-10.9-x86_64.egg/pandas/core/indexing.py:118: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
Another approach that will consistently work with both setting and getting is:
In [7]: df.loc[df.index[0], 'foo']
Out[7]: 'A'
In [8]: df.loc[df.index[0], 'bar'] = 99
In [9]: df
Out[9]:
foo bar
0 A 99
2 B 100
1 C 100
Answer by nikhil
df.iloc[0].head(1) - Only the first value of the first row.
df.iloc[0] - The entire first row, across all columns.
Answer by anis
More generally, if you want to pick the first N rows of column J from a pandas DataFrame, a straightforward way is positional indexing with iloc (J is the integer position of the column):
data = dataframe.iloc[0:N, J]
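A minimal sketch of selecting the first N rows of column J by position with iloc, assuming J is an integer column position. The frame contents and the values of N and J are hypothetical:

```python
import pandas as pd

# Hypothetical frame; column "b" sits at position 1.
dataframe = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})
N, J = 2, 1  # first two rows of the column at position 1

data = dataframe.iloc[0:N, J]  # a Series holding the selected values
print(data.tolist())
```

If J were a column label instead of a position, dataframe[J].iloc[0:N] would select the same values.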
Answer by Abdulrahman Bres
Another way to do this:
first_value = df['Btime'].values[0]
This way seems to be faster than using .iloc:
In [1]: %timeit -n 1000 df['Btime'].values[20]
5.82 μs ± 142 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [2]: %timeit -n 1000 df['Btime'].iloc[20]
29.2 μs ± 1.28 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
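For comparison, here is a sketch of two related accessors that are not part of the original answer: Series.to_numpy(), which the pandas documentation recommends over .values, and the scalar accessor .iat. No timing claim is made for either. The frame is a hypothetical three-row example:

```python
import pandas as pd

df = pd.DataFrame({"Btime": [1.2, 1.3, 1.4]})

via_values = df["Btime"].values[0]        # the approach from this answer
via_to_numpy = df["Btime"].to_numpy()[0]  # same idea via to_numpy()
via_iat = df["Btime"].iat[0]              # positional scalar access

print(via_values, via_to_numpy, via_iat)
```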
Answer by Hunaphu
Another way of getting the first row and preserving the index (this requires a DatetimeIndex):
x = df.first('d') # Returns the first day. '3d' gives first three days.
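DataFrame.first only works with a DatetimeIndex and has been deprecated since pandas 2.1. As a swapped-in alternative (not the method from this answer), head(1) also returns the first row as a one-row DataFrame and keeps its index label, whatever the index type. This sketch uses a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame with an arbitrary integer index.
df = pd.DataFrame({"foo": list("ABC")}, index=[7, 2, 1])

first_row = df.head(1)  # one-row DataFrame; the index label 7 is preserved
print(first_row)
```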
Answer by Alex Ortner
For example, to get the value from column 'test' in the first row, you can use
df[['test']].values[0][0]
since df[['test']].values[0] on its own gives back an array.
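The extra [0] is needed because selecting with a list of columns produces a two-dimensional array. A short sketch on a hypothetical 'test' column shows the shapes involved:

```python
import pandas as pd

df = pd.DataFrame({"test": [5, 6, 7]})

two_d = df[["test"]].values  # list of columns: 2-D array of shape (3, 1)
one_d = df["test"].values    # single column: 1-D array of shape (3,)

print(two_d.shape, two_d[0], two_d[0][0])
print(one_d.shape, one_d[0])
```

Selecting the column with a plain label, as in df['test'].values[0], reaches the scalar with a single index.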

