pandas ValueError: Length mismatch: Expected axis has 6 elements, new values have 1 elements

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/44876939/

Time: 2020-09-14 03:55:13  Source: igfitidea


Tags: python, pandas, matplotlib, time-series

Asked by liv2hak

import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt

plt.style.use('ggplot')


url = "https://www.google.com/finance/historical?cid=207437&startdate=Jan%201%2C%201971&enddate=Jul%201%2C%202017&start={0}&num=30"
how_many_pages=3
start=0

for i in range(how_many_pages):
    new_url = url.format(start)
    page = requests.get(new_url)
    soup = BeautifulSoup(page.content, "lxml")
    table = soup.find_all('table', class_='gf-table historical_price')[0]

    columns_header = [th.getText() for th in table.findAll('tr')[0].findAll('th')]
    data_rows=table.findAll('tr')[1:]
    data=[[td.getText() for td in data_rows[i].findAll(['td'])] for i in range(len(data_rows))]

    if start == 0:
        final_df = pd.DataFrame(data, columns=columns_header)
    else:
        df = pd.DataFrame(data, columns=columns_header)
        final_df = pd.concat([final_df, df],axis=0)
    start += 30
    final_df.to_csv('nse_data.csv', sep='\t', encoding='utf-8')


final_df.columns = ['Date']
final_df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d', utc=True)


df.plot(x='Date', y='Close')


plt.savefig('foo.png')

The downloaded data is in the following format:

    "Date
"   "Open
"   "High
"   "Low
"   "Close
"   "Volume
"
0   "Jun 30, 2017
"   "9,478.50
"   "9,535.80
"   "9,448.75
"   "9,520.90
"   "-
"
1   "Jun 29, 2017
"   "9,522.95
"   "9,575.80
"   "9,493.80
"   "9,504.10
"   "-

For the time being, I only want to plot Date (on the X-axis) against Close (on the Y-axis).

However, I am getting the error:

ValueError: Length mismatch: Expected axis has 6 elements, new values have 1 elements


Answered by sim0ne

  • Your headers and data contain newline characters. print(final_df.columns) returns:

    Index(['Date\n', 'Open\n', 'High\n', 'Low\n', 'Close\n', 'Volume\n'], dtype='object')
    

    Use rstrip to get rid of them:

    columns_header = [th.getText().rstrip() for th in table.findAll('tr')[0].findAll('th')]
    

    and

    data = [[td.getText().rstrip() for td in data_rows[i].findAll(['td'])] for i in range(len(data_rows))]
    
  • final_df.columns = ['Date'] produces your error. A DataFrame requires as many column labels as it has columns, so in your case a list of 6 elements is expected. I'm not sure what you want to do here; I think you can simply remove this line.

  • The format you specify for date parsing ('%Y-%m-%d') does not match your data ['Apr 4, 2017', 'Apr 5, 2017', 'Apr 6, 2017',...]. See the Python documentation on strftime format codes. Use instead:

    final_df['Date'] = pd.to_datetime(final_df['Date'], format='%b %d, %Y')
    
  • Convert your data to numeric values so you can plot them:

    final_df['Close'] = [float(val.replace(',', '')) for val in final_df['Close']]
    
  • Finally you can call:

    final_df.plot(x='Date', y='Close')
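
The steps above can be sketched end to end without hitting the network. This is only a minimal illustration, not the answerer's exact code: the helper name `clean_prices` and the inline sample lists `raw_header` and `raw_rows` are hypothetical, with values copied from the two rows shown in the question. It uses `str.strip()` and `pd.to_numeric` in place of `rstrip()` and the `float(...)` list comprehension, which should behave the same for this data.

```python
import pandas as pd

# Hypothetical sample mirroring the scraped table: every cell ends with "\n".
raw_header = ["Date\n", "Open\n", "High\n", "Low\n", "Close\n", "Volume\n"]
raw_rows = [
    ["Jun 30, 2017\n", "9,478.50\n", "9,535.80\n", "9,448.75\n", "9,520.90\n", "-\n"],
    ["Jun 29, 2017\n", "9,522.95\n", "9,575.80\n", "9,493.80\n", "9,504.10\n", "-\n"],
]


def clean_prices(header, rows):
    """Build a DataFrame with stripped text, parsed dates and a numeric Close."""
    # Strip the trailing newlines from the column labels and from every cell.
    columns = [name.strip() for name in header]
    data = [[cell.strip() for cell in row] for row in rows]
    df = pd.DataFrame(data, columns=columns)

    # The dates look like "Jun 30, 2017", so the matching format is "%b %d, %Y".
    df["Date"] = pd.to_datetime(df["Date"], format="%b %d, %Y")

    # Drop the thousands separator before converting Close to a number.
    df["Close"] = pd.to_numeric(df["Close"].str.replace(",", "", regex=False))
    return df


prices = clean_prices(raw_header, raw_rows)

# Reproduce the original error on a copy: one label for six columns.
mismatch_message = None
try:
    prices.copy().columns = ["Date"]
except ValueError as exc:
    mismatch_message = str(exc)
```

With the frame cleaned this way, `prices.plot(x='Date', y='Close')` should have a datetime X column and a numeric Y column to work with. If the goal of the failing line was to rename a single column rather than all six, `prices.rename(columns={...})` is the usual alternative to assigning to `.columns`.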
    