How to take out the column index name in a pandas dataframe
Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Source: http://stackoverflow.com/questions/11334098/
Asked by user1234440
                      Open   High    Low  Close   Volume  Adj Close
Date
1990-01-02 00:00:00  35.25  37.50  35.00  37.25  6555600       8.70
1990-01-03 00:00:00  38.00  38.00  37.50  37.50  7444400       8.76
1990-01-04 00:00:00  38.25  38.75  37.25  37.63  7928800       8.79
1990-01-05 00:00:00  37.75  38.25  37.00  37.75  4406400       8.82
1990-01-08 00:00:00  37.50  38.00  37.00  38.00  3643200       8.88
How can I get rid of the Date index name in the above dataframe? It should be in the same row as the other column names, but it's not, which is causing problems.
Thanks
Answered by Wes McKinney
Try using the reset_index method, which moves the DataFrame's index into a column (which is what you want, I think).
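A minimal sketch of that suggestion, assuming a small frame with a named Date index. The two rows are copied from the question; the variable names `df` and `flat` are illustrative only:

```python
import pandas as pd

# Two rows from the question, with the dates as a named index.
df = pd.DataFrame(
    {"Open": [35.25, 38.00], "Close": [37.25, 37.50]},
    index=pd.Index(pd.to_datetime(["1990-01-02", "1990-01-03"]), name="Date"),
)

# reset_index() returns a new frame: the old index becomes an ordinary
# column named "Date" and a default integer index takes its place.
flat = df.reset_index()

print(flat.columns.tolist())  # ['Date', 'Open', 'Close']
print(flat.index.tolist())    # [0, 1]
```

Note that `df` itself is left unchanged; assign the result back if you want to keep it.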
Answered by ely
Short answer: you can't, and it's not clear why this would ever "cause problems". The 'Date' name labels the Index of the DataFrame, which is distinct from any of the columns. It is printed at this offset specifically so that you will not confuse it with a column of the frame. You cannot select the dates with DataFrame['Date'], as shown below:
>>> import numpy as np; import pandas; import datetime
>>> dfrm = pandas.DataFrame(np.random.rand(10,3),
... columns=['A','B','C'],
... index = pandas.Index(
... [datetime.date(2012,6,elem) for elem in range(1,11)],
... name="Date"))
>>> dfrm
                  A        B        C
Date
2012-06-01 0.283724 0.863012 0.798891
2012-06-02 0.097231 0.277564 0.872306
2012-06-03 0.821461 0.499485 0.126441
2012-06-04 0.887782 0.389486 0.374118
2012-06-05 0.248065 0.032287 0.850939
2012-06-06 0.101917 0.121171 0.577643
2012-06-07 0.225278 0.161301 0.708996
2012-06-08 0.906042 0.828814 0.247564
2012-06-09 0.733363 0.924076 0.393353
2012-06-10 0.273837 0.318013 0.754807
>>> dfrm['Date']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1458, in __getitem__
return self._get_item_cache(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 294, in _get_item_cache
values = self._data.get(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 625, in get
_, block = self._find_block(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 715, in _find_block
self._check_have(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 722, in _check_have
raise KeyError('no item named %s' % str(item))
KeyError: 'no item named Date'
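To make the distinction concrete, here is a small sketch showing that the dates are reached through the index rather than through column lookup. The frame is a shortened stand-in for dfrm, and `first_dates` is an illustrative name:

```python
import numpy as np
import pandas as pd

dfrm = pd.DataFrame(
    np.random.rand(3, 2),
    columns=["A", "B"],
    index=pd.Index(pd.date_range("2012-06-01", periods=3), name="Date"),
)

print(dfrm.index.name)         # Date  (the label printed on its own row)
print("Date" in dfrm.columns)  # False (it is not a column)

# Slice the dates through the index itself.
first_dates = dfrm.index[:2]
```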
Longer answer:
You can change your DataFrame by adding the index as its own column if you'd like it to print that way. For example:
>>> dfrm['Date'] = dfrm.index
>>> dfrm
                  A        B        C       Date
Date
2012-06-01 0.283724 0.863012 0.798891 2012-06-01
2012-06-02 0.097231 0.277564 0.872306 2012-06-02
2012-06-03 0.821461 0.499485 0.126441 2012-06-03
2012-06-04 0.887782 0.389486 0.374118 2012-06-04
2012-06-05 0.248065 0.032287 0.850939 2012-06-05
2012-06-06 0.101917 0.121171 0.577643 2012-06-06
2012-06-07 0.225278 0.161301 0.708996 2012-06-07
2012-06-08 0.906042 0.828814 0.247564 2012-06-08
2012-06-09 0.733363 0.924076 0.393353 2012-06-09
2012-06-10 0.273837 0.318013 0.754807 2012-06-10
After this, you could simply change the name of the index so that nothing prints:
>>> dfrm.reindex(pandas.Series(dfrm.index.values, name=''))
                  A        B        C       Date
2012-06-01 0.283724 0.863012 0.798891 2012-06-01
2012-06-02 0.097231 0.277564 0.872306 2012-06-02
2012-06-03 0.821461 0.499485 0.126441 2012-06-03
2012-06-04 0.887782 0.389486 0.374118 2012-06-04
2012-06-05 0.248065 0.032287 0.850939 2012-06-05
2012-06-06 0.101917 0.121171 0.577643 2012-06-06
2012-06-07 0.225278 0.161301 0.708996 2012-06-07
2012-06-08 0.906042 0.828814 0.247564 2012-06-08
2012-06-09 0.733363 0.924076 0.393353 2012-06-09
2012-06-10 0.273837 0.318013 0.754807 2012-06-10
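If the only goal is to stop the name from printing, clearing the index name should be enough on reasonably recent pandas versions. A minimal sketch, assuming a shortened stand-in for dfrm; `rename_axis` and `Index.name` are standard pandas, but this variant is not part of the original answer:

```python
import numpy as np
import pandas as pd

dfrm = pd.DataFrame(
    np.random.rand(3, 2),
    columns=["A", "B"],
    index=pd.Index(pd.date_range("2012-06-01", periods=3), name="Date"),
)

# rename_axis(None) returns a new frame whose index has no name,
# so no separate "Date" row is printed above the data.
unnamed = dfrm.rename_axis(None)

# The same effect in place, by clearing the attribute directly.
dfrm.index.name = None
```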
The reindex call above seems like overkill. Another option is to just change the index to integers (or something similar) after adding the Date as a column:
>>> dfrm.reset_index()
or, if you already moved the index into a column manually, then just
>>> dfrm.index = range(len(dfrm))
>>> dfrm
         A        B        C       Date
0 0.283724 0.863012 0.798891 2012-06-01
1 0.097231 0.277564 0.872306 2012-06-02
2 0.821461 0.499485 0.126441 2012-06-03
3 0.887782 0.389486 0.374118 2012-06-04
4 0.248065 0.032287 0.850939 2012-06-05
5 0.101917 0.121171 0.577643 2012-06-06
6 0.225278 0.161301 0.708996 2012-06-07
7 0.906042 0.828814 0.247564 2012-06-08
8 0.733363 0.924076 0.393353 2012-06-09
9 0.273837 0.318013 0.754807 2012-06-10
Or the following, if you care about the order in which the columns appear:
>>> dfrm.iloc[:, [-1] + list(range(len(dfrm.columns) - 1))]
        Date        A        B        C
0 2012-06-01 0.283724 0.863012 0.798891
1 2012-06-02 0.097231 0.277564 0.872306
2 2012-06-03 0.821461 0.499485 0.126441
3 2012-06-04 0.887782 0.389486 0.374118
4 2012-06-05 0.248065 0.032287 0.850939
5 2012-06-06 0.101917 0.121171 0.577643
6 2012-06-07 0.225278 0.161301 0.708996
7 2012-06-08 0.906042 0.828814 0.247564
8 2012-06-09 0.733363 0.924076 0.393353
9 2012-06-10 0.273837 0.318013 0.754807
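The positional reordering above can also be written by column name, which may be easier to read. A sketch under the assumption that the frame has a named Date index as in the examples; `flat`, `manual`, and `ordered` are illustrative names:

```python
import numpy as np
import pandas as pd

dfrm = pd.DataFrame(
    np.random.rand(3, 3),
    columns=["A", "B", "C"],
    index=pd.Index(pd.date_range("2012-06-01", periods=3), name="Date"),
)

# On a frame whose index is still named "Date", reset_index() inserts
# the former index as the first column, so no reordering is needed.
flat = dfrm.reset_index()

# If "Date" was instead appended by hand as the last column,
# move it to the front by listing the columns by name.
manual = dfrm.copy()
manual["Date"] = manual.index
manual.index = range(len(manual))
ordered = manual[["Date"] + [c for c in manual.columns if c != "Date"]]
```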
Added
Here are a few helpful functions to include in an IPython configuration script (so that they are loaded on startup), or to put in a module that you can easily load when working in Python.
###########
# Imports #
###########
import pandas
import datetime
import numpy as np
from dateutil import relativedelta
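# pandas.io.data was removed from pandas; DataReader now lives in the
# separate pandas-datareader package.
from pandas_datareader import data as pdata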
############################################
# Functions to retrieve Yahoo finance data #
############################################
# Utility to get generic stock symbol data from Yahoo finance.
# Starts two days prior to present (or most recent business day)
# and goes back a specified number of days.
def getStockSymbolData(sym_list, end_date=datetime.date.today()+relativedelta.relativedelta(days=-1), num_dates=30):
    dReader = pdata.DataReader
    start_date = end_date + relativedelta.relativedelta(days=-num_dates)
    return dict((sym, dReader(sym, "yahoo", start=start_date, end=end_date)) for sym in sym_list)
###
# Utility function to get some AAPL data when needed
# for testing.
def getAAPL(end_date=datetime.date.today()+relativedelta.relativedelta(days=-1), num_dates=30):
    dReader = pdata.DataReader
    return getStockSymbolData(['AAPL'], end_date=end_date, num_dates=num_dates)
###
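One caveat about the defaults used above: Python evaluates a default argument once, when the function is defined, so datetime.date.today() is fixed at the moment the script is loaded. In a session that stays open across days, a None default resolved inside the function avoids stale dates. The sketch below illustrates only the date arithmetic, with no network access; `date_window` is a hypothetical helper that is not part of the original answer, and it uses `datetime.timedelta` in place of relativedelta:

```python
import datetime


def date_window(end_date=None, num_dates=30):
    """Return (start_date, end_date) for a look-back window.

    Hypothetical helper, shown only to illustrate the None-default pattern.
    """
    if end_date is None:
        # Resolved on every call rather than once at definition time.
        end_date = datetime.date.today() - datetime.timedelta(days=1)
    start_date = end_date - datetime.timedelta(days=num_dates)
    return start_date, end_date


start, end = date_window(end_date=datetime.date(2012, 6, 30), num_dates=30)
print(start, end)  # 2012-05-31 2012-06-30
```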
I also made a class below to hold some data for common stocks:
#####
# Define a 'Stock' class that can hold simple info
# about a security, like SEDOL and CUSIP info. This
# is mainly for debugging things and quickly getting
# info for a single security.
class MyStock():
    def __init__(self, ticker='None', sedol='None', country='None'):
        self.ticker = ticker
        self.sedol = sedol
        self.country = country
    ###
    def getData(self, end_date=datetime.date.today()+relativedelta.relativedelta(days=-1), num_dates=30):
        return pandas.DataFrame(getStockSymbolData([self.ticker], end_date=end_date, num_dates=num_dates)[self.ticker])
    ###
#####
# Make some default stock objects for common stocks.
AAPL = MyStock(ticker='AAPL', sedol='03783310', country='US')
SAP = MyStock(ticker='SAP', sedol='484628', country='DE')

