Pandas TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/48272540/
Asked by Chris
I've got some order data that I want to analyse. What currently interests me is: how often was each SKU bought in each month?
Here is a small example:
import datetime
import pandas as pd
import numpy as np
d = {'sku': ['RT-17']}
df_skus = pd.DataFrame(data=d)
print(df_skus)
d = {'date': ['2017/02/17', '2017/03/17', '2017/04/17', '2017/04/18', '2017/05/02'], 'item_sku': ['HT25', 'RT-17', 'HH30', 'RT-17', 'RT-19']}
df_orders = pd.DataFrame(data=d)
print(df_orders)
for i in df_orders.index:
    print("\n toll")
    df_orders.loc[i,'date']=pd.to_datetime(df_orders.loc[i, 'date'])
df_orders = df_orders[df_orders["item_sku"].isin(df_skus["sku"])]
monthly_sales = df_orders.groupby(["item_sku", pd.Grouper(key="date",freq="M")]).size()
monthly_sales = monthly_sales.unstack(0)
print(monthly_sales)
That works fine, but when I use my real order data (from a CSV file), after a few minutes I get:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
The problem comes from this line:
monthly_sales = df_orders.groupby(["item_sku", pd.Grouper(key="date",freq="M")]).size()
Is it possible to skip over the error? I tried a try/except block:
try:
    monthly_sales = df_orders.groupby(["item_sku", pd.Grouper(key="date",freq="M")]).size()
    monthly_sales = monthly_sales.unstack(0)
except:
    print("\n Here seems to be one issue")
Then print(monthly_sales) gives me:
Empty DataFrame
Columns: [txn_id, date, item_sku, quantity]
Index: []
So it seems that something in my data empties or breaks the grouping?
How can I 'clean' my data?
I'd even be fine with losing the data of a sale here and there if I could just 'skip' over the error. Is this possible?
Answered by cs95
When reading your CSV, use the parse_dates argument:
df_orders = pd.read_csv('file.csv', parse_dates=['date'])
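A quick way to see whether parse_dates succeeded is to check the column's dtype after loading. The sketch below is only an illustration: the two in-memory CSV strings are hypothetical stand-ins for the real order file, and the names clean and dirty are made up for this example. It relies on the documented read_csv behaviour that a column containing an unparseable value is returned unaltered rather than converted.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for the real order file.
clean_csv = "date,item_sku\n2017/02/17,HT25\n2017/03/17,RT-17\n"
dirty_csv = "date,item_sku\n2017/02/17,HT25\nunknown,RT-17\n"

clean = pd.read_csv(io.StringIO(clean_csv), parse_dates=["date"])
dirty = pd.read_csv(io.StringIO(dirty_csv), parse_dates=["date"])

# True only if the column really holds datetimes.
clean_is_datetime = pd.api.types.is_datetime64_any_dtype(clean["date"])
dirty_is_datetime = pd.api.types.is_datetime64_any_dtype(dirty["date"])

print(clean["date"].dtype, clean_is_datetime)
print(dirty["date"].dtype, dirty_is_datetime)
```

If the check comes back False for the real data, at least one value in the column could not be parsed, and grouping on that column by month cannot work until it is converted.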
parse_dates automatically converts the date column to datetime. If that doesn't work, you'll need to load the column as a string and then use the errors='coerce' argument with pd.to_datetime, which turns any value it cannot parse into NaT instead of raising:
df_orders['date'] = pd.to_datetime(df_orders['date'], errors='coerce')
Note that you can pass Series objects (among other things) to pd.to_datetime, so there is no need to convert the column row by row in a loop.
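As a minimal sketch of the 'cleaning' step, the example below coerces a column that contains one deliberately malformed entry, lists the offending raw strings, and then drops those rows. The sample data and the names raw_dates, bad_rows and df_clean are assumptions made for illustration, not part of the original question.

```python
import pandas as pd

# Hypothetical order data; the second date is deliberately malformed.
df_orders = pd.DataFrame({
    "date": ["2017/03/17", "unknown", "2017/04/18"],
    "item_sku": ["RT-17", "RT-17", "RT-17"],
})

# Keep the raw strings so the bad entries can be inspected afterwards.
raw_dates = df_orders["date"].copy()

# Vectorised conversion: unparseable values become NaT instead of raising.
df_orders["date"] = pd.to_datetime(df_orders["date"], errors="coerce")

# Original strings of the rows that failed to parse.
bad_rows = raw_dates[df_orders["date"].isna()]
print(bad_rows)

# Drop the rows without a usable date before grouping.
df_clean = df_orders.dropna(subset=["date"])
print(df_clean)
```

Looking at bad_rows before dropping anything shows how many sales would be lost and whether the bad values follow a pattern that could be repaired instead.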
Next, filter and group as you've been doing, and it should work:
df_orders[df_orders["item_sku"].isin(df_skus["sku"])]\
.groupby(['item_sku', pd.Grouper(key='date', freq='M')]).size()
item_sku  date
RT-17     2017-03-31    1
          2017-04-30    1
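Putting the pieces together, here is one possible end-to-end sketch using the sample data from the question. Note one deliberate substitution: instead of pd.Grouper(key='date', freq='M'), it groups on dt.to_period("M"), because the "M" alias for month-end grouping is deprecated in pandas 2.2 in favour of "ME", while "M" remains valid for periods. The names wanted and month are introduced only for this example.

```python
import pandas as pd

df_skus = pd.DataFrame({"sku": ["RT-17"]})
df_orders = pd.DataFrame({
    "date": ["2017/02/17", "2017/03/17", "2017/04/17", "2017/04/18", "2017/05/02"],
    "item_sku": ["HT25", "RT-17", "HH30", "RT-17", "RT-19"],
})

# Convert the whole column at once and discard rows that fail to parse.
df_orders["date"] = pd.to_datetime(df_orders["date"], errors="coerce")
df_orders = df_orders.dropna(subset=["date"])

# Keep only the SKUs of interest.
wanted = df_orders[df_orders["item_sku"].isin(df_skus["sku"])]

# Monthly period of each order, used as the second grouping key.
month = wanted["date"].dt.to_period("M").rename("month")

# Count orders per SKU and month, then move the SKUs into columns.
monthly_sales = wanted.groupby(["item_sku", month]).size().unstack(0)
print(monthly_sales)
```

The result has one row per month and one column per SKU, which is the same layout the question's unstack(0) call was aiming for, with the index shown as monthly periods rather than month-end dates.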