Original URL: http://stackoverflow.com/questions/15580234/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
pandas plot time series ['numpy.ndarray' object has no attribute 'find']
Asked by Harry Moreno
I have the following code attempting to plot a timeseries. Note, I drop the second column because it's not relevant. And I drop the first and last rows.
import pandas as pd
activity = pd.read_csv('activity.csv', index_col=2)
activity = activity.ix[1:-1] #drop first and last rows due to invalid data
series = activity['activity']
series.plot()
I get the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-10-36df40c11065> in <module>()
----> 1 series.plot()
.../pandas/tools/plotting.pyc in plot_series(series, label, kind, use_index,
rot, xticks, yticks, xlim, ylim,
ax, style, grid, logy,
secondary_y, **kwds)
1326 secondary_y=secondary_y, **kwds)
1327
-> 1328 plot_obj.generate()
1329 plot_obj.draw()
1330
.../pandas/tools/plotting.pyc in generate(self)
573 self._compute_plot_data()
574 self._setup_subplots()
--> 575 self._make_plot()
576 self._post_plot_logic()
577 self._adorn_subplots()
.../pandas/tools/plotting.pyc in _make_plot(self)
916 args = (ax, x, y, style)
917
--> 918 newline = plotf(*args, **kwds)[0]
919 lines.append(newline)
920 leg_label = label
.../matplotlib/axes.pyc in plot(self, *args, **kwargs)
3991 lines = []
3992
-> 3993 for line in self._get_lines(*args, **kwargs):
3994 self.add_line(line)
3995 lines.append(line)
.../matplotlib/axes.pyc in _grab_next_args(self, *args, **kwargs)
328 return
329 if len(remaining) <= 3:
--> 330 for seg in self._plot_args(remaining, kwargs):
331 yield seg
332 return
.../matplotlib/axes.pyc in _plot_args(self, tup, kwargs)
287 ret = []
288 if len(tup) > 1 and is_string_like(tup[-1]):
--> 289 linestyle, marker, color = _process_plot_format(tup[-1])
290 tup = tup[:-1]
291 elif len(tup) == 3:
.../matplotlib/axes.pyc in _process_plot_format(fmt)
94 # handle the multi char special cases and strip them from the
95 # string
---> 96 if fmt.find('--')>=0:
97 linestyle = '--'
98 fmt = fmt.replace('--', '')
AttributeError: 'numpy.ndarray' object has no attribute 'find'
If I try it with a small dataset such as:
target, weekday, timestamp
0, Sat, 08 Dec 2012 16:26:26:625000
0, Sat, 08 Dec 2012 16:26:27:625000
0, Sat, 08 Dec 2012 16:26:28:625000
0, Sat, 08 Dec 2012 16:26:29:625000
1, Sat, 08 Dec 2012 16:26:30:625000
2, Sat, 08 Dec 2012 16:26:31:625000
0, Sat, 08 Dec 2012 16:26:32:625000
0, Sat, 08 Dec 2012 16:26:33:625000
1, Sat, 08 Dec 2012 16:26:34:625000
2, Sat, 08 Dec 2012 16:26:35:625000
it works, but not on my full dataset: https://dl.dropbox.com/u/60861504/activity.csv
Also, I tried it with the first 10 items from my dataset and got the same error, but if I assign one value manually (series[10] = 5), the plot shows up. I'm stumped.
Accepted answer by user1827356
In my experience, this happens because of non-numeric columns in the DataFrame.
pd.read_csv tries to infer the datatype of each column. I suspect your corrupted rows are confusing this process, so you end up with columns of non-numeric types in your DataFrame.
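A minimal sketch of how this inference can be checked, assuming a made-up two-column CSV in which a single entry ("HEND0") is not a number. The column names and values here are illustrative only and are not taken from the asker's file:

```python
import io

import pandas as pd

# Hypothetical sample: one non-numeric entry ("HEND0") in the "activity" column.
csv_text = "target,activity\n0,1\n1,HEND0\n0,2\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)  # "target" is inferred as an integer column, "activity" is not

# True only when the column holds numbers and can be plotted directly.
activity_is_numeric = pd.api.types.is_numeric_dtype(df["activity"])
target_is_numeric = pd.api.types.is_numeric_dtype(df["target"])
print(activity_is_numeric, target_is_numeric)
```

A single bad value is enough to make the whole column non-numeric, which would be consistent with the asker seeing the error even on the first 10 rows.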
Answer by herrfz
The answer is in the error message:
AttributeError: 'numpy.ndarray' object has no attribute 'find'
The inferred datatype of your series is string (try type(series[0])).
If you first convert the datatype:
series = series.astype(int)
series.plot()
then the plot should work.
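Note that astype(int) raises an error if any entry cannot be parsed as an integer. A hedged sketch of one alternative uses pd.to_numeric with errors="coerce", which turns unparseable entries into NaN. The sample values below are made up for illustration:

```python
import pandas as pd

# Hypothetical values as read_csv might deliver them: strings, one of them invalid.
series = pd.Series(["0", "1", "HEND0", "2"], dtype=object)

# errors="coerce" turns anything unparseable into NaN instead of raising.
numeric = pd.to_numeric(series, errors="coerce")

# Drop the invalid rows, then convert what remains to integers.
cleaned = numeric.dropna().astype(int)
print(cleaned.tolist())
```

Whether dropping the invalid rows is acceptable depends on the data. Keeping the NaN values and plotting `numeric` directly is another option.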
Answer by HYRY
There are two problems:
pandas can't parse the datetime string because of the last colon: 08 Dec 2012 16:26:26:625000
The value in the second row of the file is not an integer, which causes the dtype of the column to become object (str).
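The first problem can be looked at in isolation. This sketch rewrites the final colon of each line into a decimal point, using two sample lines in the layout shown in the question. The regular expression is one possible way to do it, not necessarily the only one:

```python
import re

# Two sample lines in the same layout as the data shown in the question.
sample = (
    "0, Sat, 08 Dec 2012 16:26:26:625000\n"
    "1, Sat, 08 Dec 2012 16:26:30:625000"
)

# With re.MULTILINE, "$" matches at the end of every line, so only the
# final ":digits" group on each line is rewritten to ".digits".
fixed = re.sub(r":(\d+)$", r".\1", sample, flags=re.MULTILINE)
print(fixed)
```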
The following code works with your data:
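import re
from io import StringIO  # Python 3; on Python 2 this was `from StringIO import StringIO`

import pandas as pd

with open('activity.csv') as f:
    # Turn the last colon on each line into a decimal point:
    # 16:26:26:625000 -> 16:26:26.625000
    str_data = re.sub(r":(\d+)$", r".\1", f.read(), flags=re.MULTILINE)

data = StringIO(str_data)
# Treat the invalid "HEND0" entries as NaN so the column stays numeric
activity = pd.read_csv(data, index_col=2, parse_dates=True, dayfirst=True, na_values=["HEND0"])
activity = activity.iloc[1:-1]  # drop first and last rows (.ix was removed in pandas 1.0)
series = activity['activity']
series.plot()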
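As an alternative to rewriting the file text, the timestamp layout can be given to pd.to_datetime explicitly. This is a sketch under the assumption that every timestamp follows the "08 Dec 2012 16:26:26:625000" pattern from the question. The sample values are made up, and this is not the approach the answer itself uses:

```python
import pandas as pd

# Hypothetical raw values, including the leading space left after the comma.
raw = pd.Series([" 08 Dec 2012 16:26:26:625000", " 08 Dec 2012 16:26:27:625000"])

# %f consumes the microseconds that follow the final colon.
stamps = pd.to_datetime(raw.str.strip(), format="%d %b %Y %H:%M:%S:%f")
print(stamps)
```

The parsed values could then be assigned as the index before plotting, which avoids the regex step at the cost of hard-coding the format string.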

