pandas AttributeError: 'numpy.int64' 对象没有属性 'to_timestamp'

Question

提问by liv2hak

I am trying to plot a time series from a python data frame. The code is below.

我正在尝试从 python 数据框绘制时间序列。代码如下。

import requests
from bs4 import BeautifulSoup
import pandas as pd
import datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, YearLocator, MonthLocator
plt.style.use('ggplot')


def plot(df, filename, heading=None):

    fig, ax = plt.subplots(figsize=(8, 4))

    min_date = None
    max_date = None
    for col_name in df.columns.values:

        # plot the column
        col = df[col_name]
        col = col[col.notnull()] # drop NAs
        dates = [zzz.to_timestamp().date() for zzz in col.index]
        ax.plot_date(x=dates, y=col, fmt='-', label=col_name,
            tz=None, xdate=True, ydate=False, linewidth=1.5)

        # establish the date range for the data
        if min_date:
            min_date = min(min_date, min(dates))
        else:
            min_date = min(dates)
        if max_date:
            max_date = max(max_date, max(dates))
        else:
            max_date = max(dates)

    # give a bit of space at each end of the plot - aesthetics
    span = max_date - min_date
    extra = int(span.days * 0.03) * datetime.timedelta(days=1)
    ax.set_xlim([min_date - extra, max_date + extra])

    # format the x tick marks
    ax.xaxis.set_major_formatter(DateFormatter('%Y'))
    ax.xaxis.set_minor_formatter(DateFormatter('\n%b'))
    ax.xaxis.set_major_locator(YearLocator())
    ax.xaxis.set_minor_locator(MonthLocator(bymonthday=1, interval=2))

    # grid, legend and yLabel
    ax.grid(True)
    ax.legend(loc='best', prop={'size':'x-small'})
    ax.set_ylabel('Percent')

    # heading
    if heading:
        fig.suptitle(heading, fontsize=12)
    fig.tight_layout(pad=1.5)

    # footnote
    fig.text(0.99, 0.01, 'nse-timeseries-plot', ha='right',
        va='bottom', fontsize=8, color='#999999')

    # save to file
    fig.savefig(filename, dpi=125)


    url = "https://www.google.com/finance/historical?cid=207437&startdate=Jan%201%2C%201971&enddate=Jul%201%2C%202017&start={0}&num=30"
    how_many_pages=138
    start=0

    for i in range(how_many_pages):
        new_url = url.format(start)
        page = requests.get(new_url)
        soup = BeautifulSoup(page.content, "lxml")
        table = soup.find_all('table', class_='gf-table historical_price')[0]

        columns_header = [th.getText() for th in table.findAll('tr')[0].findAll('th')]
        data_rows=table.findAll('tr')[1:]
        data=[[td.getText() for td in data_rows[i].findAll(['td'])] for i in range(len(data_rows))]

        if start == 0:
            final_df = pd.DataFrame(data, columns=columns_header)
        else:
            df = pd.DataFrame(data, columns=columns_header)
            final_df = pd.concat([final_df, df],axis=0)
        start += 30
        final_df.to_csv('nse_data.csv', sep='\t', encoding='utf-8')


    plot(final_df,'nsetsplot')

When I run the code I get the error AttributeError: 'numpy.int64' object has no attribute 'to_timestamp'

当我运行代码时出现错误 AttributeError: 'numpy.int64' object has no attribute 'to_timestamp'

when I do

当我做

dates = [zzz.to_timestamp().date() for zzz in col.index]

I am using Anaconda 64-bit on Windows 7 (x86_64)

我在 Windows 7 (x86_64) 上使用 Anaconda 64 位

Answer 1

回答by Warren Weckesser

Apparently the index of your DataFrame is not a pandas.PeriodIndex. Instead, the index appears hold integers. The code that you posted requires the index of the data frame to be a PeriodIndex. E.g.

显然，您的 DataFrame 的索引不是pandas.PeriodIndex. 相反，索引显示为保持整数。您发布的代码要求数据框的索引为PeriodIndex. 例如

In [36]: df
Out[36]: 
                a         b
2012-01  1.457900  7.084201
2012-02  1.775861  6.448277
2012-03  1.069051  7.861898

In [37]: df.index
Out[37]: PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]', freq='M')

When the index is the correct type, the following code (similar to the line in the code you posted) works:

当索引类型正确时，以下代码（类似于您发布的代码中的行）有效：

In [39]: dates = [zzz.to_timestamp().date() for zzz in df.index]

In [40]: dates
Out[40]: 
[datetime.date(2012, 1, 1),
 datetime.date(2012, 2, 1),
 datetime.date(2012, 3, 1)]

Answer 2

回答by Kadir A. Peker

This may be due to a excel format issue if you imported your dataframe from excel. I had a similar problem: The dates appear fine in excel, but appear as integers (the integer representation of the date in excel) in the imported dataframe. This solved the problem for me: I select the whole column of dates in excel, and apply date format to the column. When I import as a dataframe after this, dates come out as dates.

如果您从 excel 导入数据框，这可能是由于 excel 格式问题。我有一个类似的问题：日期在 excel 中看起来很好，但在导入的数据框中显示为整数（excel 中日期的整数表示）。这为我解决了问题：我在 excel 中选择了整列日期，并将日期格式应用于该列。当我在此之后作为数据框导入时，日期作为日期出现。

pandas AttributeError: 'numpy.int64' 对象没有属性 'to_timestamp'

提问by liv2hak

回答by Warren Weckesser

回答by Kadir A. Peker

相关推荐

最近更新

标签

pandas AttributeError: 'numpy.int64' 对象没有属性 'to_timestamp'

提问by liv2hak

回答by Warren Weckesser

回答by Kadir A. Peker

相关推荐

pandas 计算pandas中每行具有某些值的列数

从一列中的唯一值创建 Pandas DataFrame

pandas 按一列分组并在熊猫中找到另一列的总和和最大值

在一个图中绘制来自多个 Pandas 数据框的数据

相关推荐

最近更新

标签