pandas ValueError: Cannot cast DatetimeIndex to dtype datetime64[us]
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me) at StackOverflow.
Original question: http://stackoverflow.com/questions/38516251/
Asked by Rock Pereira
I'm trying to create a PostgreSQL table of 30-minute data for the S&P 500 ETF (spy30new, for testing freshly inserted data) from a table of several stocks with 15-minute data (all15). all15 has an index on 'dt' (timestamp) and 'instr' (stock symbol). I would like spy30new to have an index on 'dt'.
import numpy as np
import pandas as pd
from datetime import datetime, date, time, timedelta
from dateutil import parser
from sqlalchemy import create_engine
# Query all15
engine = create_engine('postgresql://user:passwd@localhost:5432/stocks')
new15Df = (pd.read_sql_query("SELECT dt, o, h, l, c, v FROM all15 WHERE (instr = 'SPY') AND (date(dt) BETWEEN '2016-06-27' AND '2016-07-15');", engine)).sort_values('dt')
# Correct for Time Zone.
new15Df['dt'] = (new15Df['dt'].copy()).apply(lambda d: d + timedelta(hours=-4))
# spy0030Df contains the 15-minute data at 00 & 30 minute time points
# spy1545Df contains the 15-minute data at 15 & 45 minute time points
spy0030Df = (new15Df[new15Df['dt'].apply(lambda d: d.minute % 30) == 0]).reset_index(drop=True)
spy1545Df = (new15Df[new15Df['dt'].apply(lambda d: d.minute % 30) == 15]).reset_index(drop=True)
high = pd.concat([spy1545Df['h'], spy0030Df['h']], axis=1).max(axis=1)
low = pd.concat([spy1545Df['l'], spy0030Df['l']], axis=1).min(axis=1)
volume = spy1545Df['v'] + spy0030Df['v']
# spy30Df assembled and pushed to PostgreSQL as table spy30new
spy30Df = pd.concat([spy0030Df['dt'], spy1545Df['o'], high, low, spy0030Df['c'], volume], ignore_index = True, axis=1)
spy30Df.columns = ['dt', 'o', 'h', 'l', 'c', 'v']
spy30Df.set_index(['dt'], inplace=True)
spy30Df.to_sql('spy30new', engine, if_exists='append', index_label='dt')
This gives the error "ValueError: Cannot cast DatetimeIndex to dtype datetime64[us]"
What I've tried so far (I have successfully pushed CSV files to PG using pandas, but here the source is a PG database):
1. Not placing an index on 'dt':

   spy30Df.set_index(['dt'], inplace=True)  # Remove this line
   spy30Df.to_sql('spy30new', engine, if_exists='append')  # Delete the index_label option

2. Converting 'dt' from type pandas.tslib.Timestamp to datetime.datetime using to_pydatetime() (in case psycopg2 can work with python datetime, but not pandas Timestamp):

   u = (spy0030Df['dt']).tolist()
   timesAsPyDt = np.asarray(map((lambda d: d.to_pydatetime()), u))
   spy30Df = pd.concat([spy1545Df['o'], high, low, spy0030Df['c'], volume], ignore_index = True, axis=1)
   newArray = np.c_[timesAsPyDt, spy30Df.values]
   colNames = ['dt', 'o', 'h', 'l', 'c', 'v']
   newDf = pd.DataFrame(newArray, columns=colNames)
   newDf.set_index(['dt'], inplace=True)
   newDf.to_sql('spy30new', engine, if_exists='append', index_label='dt')

3. Using datetime.utcfromtimestamp():

   timesAsDt = (spy0030Df['dt']).apply(lambda d: datetime.utcfromtimestamp(d.tolist()/1e9))

4. Using pd.to_datetime():

   timesAsDt = pd.to_datetime(spy0030Df['dt'])
Accepted answer by Rock Pereira
Using pd.to_datetime() on each element worked. Option 4, which doesn't work, applies pd.to_datetime() to the entire series. Perhaps the Postgres driver understands python datetime, but not the datetime64 dtype used by pandas and numpy. Option 4 produced the correct output, but I got the ValueError (see title) when sending the DataFrame to Postgres:
timesAsPyDt = (spy0030Df['dt']).apply(lambda d: pd.to_datetime(str(d)))
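As a self-contained illustration of the accepted fix, here is a minimal sketch feeding to_sql (the tiny frame, credentials, and table name are invented for the example; only the apply(...) line is the answer's technique):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://user:passwd@localhost:5432/stocks')  # placeholder credentials
df = pd.DataFrame({'dt': pd.date_range('2016-06-27 09:30', periods=3, freq='30min'),
                   'c': [209.5, 209.8, 210.1]})
df['dt'] = df['dt'].apply(lambda d: pd.to_datetime(str(d)))  # element-wise conversion, as above
df.set_index('dt', inplace=True)
df.to_sql('spy30new', engine, if_exists='append', index_label='dt')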
Answered by Praveenkumar Beedanal
Actually, this was my data frame.
Biomass  Fossil Brown coal/Lignite  Fossil Coal-derived gas  Fossil Gas  Fossil Hard coal  Fossil Oil  Geothermal  Hydro Pumped Storage  Hydro Run-of-river and poundage  Hydro Water Reservoir  Nuclear  Other  Other renewable  Solar  Waste  Wind Offshore  Wind Onshore
2018-02-02 00:00:00+01:00  4835.0  16275.0  446.0  1013.0  4071.0  155.0  5.0  7.0  1906.0  35.0  8924.0  3643.0  142.0  0.0  595.0  2517.0  19999.0
2018-02-02 00:15:00+01:00  4834.0  16272.0  446.0  1010.0  3983.0  155.0  5.0  7.0  1908.0  71.0  8996.0  3878.0  142.0  0.0  594.0  2364.0  19854.0
2018-02-02 00:30:00+01:00  4828.0  16393.0  446.0  1019.0  4015.0  155.0  5.0  ...
I was trying to insert into a SQL database but was getting the same error as in the question above. What I did was convert the index of the data frame into a column with the label 'index':
df.reset_index(level=0, inplace=True)
Rename the column 'index' to 'DateTime' using this code:
df = df.rename(columns={'index': 'DateTime'})
Change the datatype to 'datetime64':
df['DateTime'] = df['DateTime'].astype('datetime64')
Store it in the SQL database using this code:
engine = create_engine('mysql+mysqlconnector://root:Password@localhost/generation_data', echo=True)
df.to_sql(con=engine, name='test', if_exists='replace')
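Putting those steps together, here is a minimal end-to-end sketch (the two-row frame and the connection details are placeholders; note that on recent pandas versions astype('datetime64') may raise for timezone-aware values, in which case dt.tz_localize(None), as in the next answer, is the alternative):

import pandas as pd
from sqlalchemy import create_engine

df = pd.DataFrame({'Biomass': [4835.0, 4834.0]},
                  index=pd.to_datetime(['2018-02-02 00:00:00+01:00',
                                        '2018-02-02 00:15:00+01:00']))
df.reset_index(level=0, inplace=True)                 # the index becomes a column named 'index'
df = df.rename(columns={'index': 'DateTime'})         # give the column a meaningful name
df['DateTime'] = df['DateTime'].astype('datetime64')  # cast to naive datetime64[ns]
engine = create_engine('mysql+mysqlconnector://root:Password@localhost/generation_data')
df.to_sql(con=engine, name='test', if_exists='replace')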
Answered by Wilhelm
I had the same problem, and applying pd.to_datetime() on each element worked as well. But it is orders of magnitude slower than running pd.to_datetime() on the entire series. For a dataframe with over 1 million rows:
(df['Time']).apply(lambda d: pd.to_datetime(str(d)))
takes approximately 70 seconds
and
pd.to_datetime(df['Time'])
takes approximately 0.01 seconds
The actual problem is that timezone information is being included. To remove it:
t = pd.to_datetime(df['Time'])
t = t.tz_localize(None)
This should be much faster!
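To make the timezone point concrete, here is a small self-contained sketch (the column name and sample offsets are invented; since t is a Series here, the values-level accessor .dt.tz_localize(None) is used):

import pandas as pd

df = pd.DataFrame({'Time': ['2018-02-02 00:00:00+01:00',
                            '2018-02-02 00:15:00+01:00']})
t = pd.to_datetime(df['Time'])  # vectorized parse -> tz-aware datetime64[ns, UTC+01:00]
t = t.dt.tz_localize(None)      # drop the timezone -> naive datetime64[ns]
print(t.dtype)                  # datetime64[ns]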