Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/45051882/

Time: 2020-09-14 03:58:44  Source: igfitidea

ValueError: arrays must all be same length in python using pandas DataFrame

python-3.x pandas dataframe resampling

Asked by paulc1111

I'm new to Python and am using DataFrame from the pandas package (Python 3.6).

I set it up with the code below:

df = DataFrame({'list1': list1, 'list2': list2, 'list3': list3, 'list4': list4, 'list5': list5, 'list6': list6})

and it raises this error: ValueError: arrays must all be same length

So I checked the lengths of all the lists, and list1 and list2 have one more element than the other lists. If I want to add one element to each of the other four lists (list3, list4, list5, list6) using pd.resample, how should I write the code?

Also, those lists are time series sampled at 1-minute intervals.

Does anybody have an idea, or can anyone help me out here?

Thanks in advance.

EDIT: So I changed it as EdChum suggested, and added a time column at the front. It looks like this:

2017-04-01 0:00 895.87  730 12.8    4   19.1    380
2017-04-01 0:01 894.4   730 12.8    4   19.1    380
2017-04-01 0:02 893.08  730 12.8    4   19.3    380
2017-04-01 0:03 890.41  730 12.8    4   19.7    380
2017-04-01 0:04 889.28  730 12.8    4   19.93   380

and I ran code like this:

df.resample('1min', how='mean', fill_method='pad')

And it gives me this error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
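For readers hitting the same TypeError: resample only works on a time-based index, and the frame above still has the default RangeIndex. The following is a minimal sketch of one way around it, not the asker's actual code. The column names `time`, `list1`, and `list2` are hypothetical, the values come from the sample rows above (times written zero-padded), and the 00:02 row is left out on purpose to create a gap. It also assumes a newer pandas, where `how=` and `fill_method=` are no longer accepted and are written as chained method calls instead.

```python
import pandas as pd

# Hypothetical column names; values taken from the sample rows in the question.
# The 00:02 row is deliberately omitted so that resampling has a gap to fill.
df = pd.DataFrame({
    'time': ['2017-04-01 00:00', '2017-04-01 00:01',
             '2017-04-01 00:03', '2017-04-01 00:04'],
    'list1': [895.87, 894.4, 890.41, 889.28],
    'list2': [730, 730, 730, 730],
})

# resample needs a DatetimeIndex, so parse the strings and move them into the index
df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')

# Chained equivalent of how='mean', fill_method='pad'
resampled = df.resample('1min').mean().ffill()
print(resampled)
```

The missing minute reappears as a row and is forward-filled from the previous one. Note that this fixes the TypeError only; it does not address the original unequal-length problem.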

Answered by EdChum

I'd just construct a Series for each list and then concat them all:

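In [38]:
import pandas as pd

# Two lists of different lengths
l1 = list('abc')
l2 = [1,2,3,4]
# One named Series per list; the name becomes the column label
s1 = pd.Series(l1, name='list1')
s2 = pd.Series(l2, name='list2')
# concat aligns on the index and pads the shorter column with NaN
df = pd.concat([s1,s2], axis=1)
df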

Out[38]: 
  list1  list2
0     a      1
1     b      2
2     c      3
3   NaN      4

Because you can pass a name argument to the Series constructor, each column in the df is named automatically, and NaN is placed wherever the column lengths don't match.
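The same idea extends to any number of lists. Here is a minimal sketch, assuming the lists are collected in a dict keyed by the desired column name; the dict `lists` and its values are hypothetical stand-ins for the six lists in the question.

```python
import pandas as pd

# Hypothetical stand-ins for the lists in the question:
# list1 and list2 have one more element than list3.
lists = {
    'list1': [1, 2, 3, 4],
    'list2': [5, 6, 7, 8],
    'list3': [9, 10, 11],
}

# Build one named Series per list, then concatenate them column-wise.
# concat aligns on the index, so shorter columns are padded with NaN.
df = pd.concat([pd.Series(values, name=name) for name, values in lists.items()],
               axis=1)
print(df)
```

Keep in mind that a column padded with NaN can no longer hold plain integers, so pandas will typically store it as float.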

resample applies when you have a DatetimeIndex whose length you want to rebase or adjust based on some time period, which is not what you want here. What you are asking for is reindex, which I think is unnecessary and messy:

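In [40]:
import pandas as pd

l1 = list('abc')
l2 = [1,2,3,4]
s1 = pd.Series(l1)
s2 = pd.Series(l2)
# Stretch the shorter Series to the longer one's index; the new slot becomes NaN
df = pd.DataFrame({'list1':s1.reindex(s2.index), 'list2':s2})
df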

Out[40]: 
  list1  list2
0     a      1
1     b      2
2     c      3
3   NaN      4

Here you'd need to know the longest length and then reindex all Series using that index, whereas if you just concat, it will automatically adjust the lengths and fill missing elements with NaN.
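To make the extra bookkeeping of the reindex route concrete, here is a rough sketch of it for several lists. The names `lists`, `longest`, and `full_index` are made up for illustration.

```python
import pandas as pd

# Hypothetical input: lists of unequal length keyed by column name
lists = {'list1': list('abc'), 'list2': [1, 2, 3, 4]}

# Step 1: find the longest length yourself
longest = max(len(values) for values in lists.values())

# Step 2: build an index of that length and reindex every Series to it
full_index = pd.RangeIndex(longest)
df = pd.DataFrame({name: pd.Series(values).reindex(full_index)
                   for name, values in lists.items()})
print(df)
```

The result matches what concat produces, but the longest length has to be computed by hand first, which is the messiness mentioned above.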

Answered by Doe Jowns

According to this documentation, it looks quite difficult to do this with pd.resample(): you would have to calculate a frequency that adds only one value to your df, and the function really doesn't seem made for this ^^ (it seems to allow easy changes of frequency, e.g. 1 min to 30 s or 1 h)! You'd better try what EdChum did :P
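As a small illustration of what resample is meant for, here is a sketch that changes the frequency of a 1-minute series to 30 seconds and to 1 hour. The series values and the start date are made up for the example.

```python
import pandas as pd

# Made-up 1-minute series with three points
s = pd.Series([1.0, 2.0, 3.0],
              index=pd.date_range('2017-04-01', periods=3, freq='1min'))

# Upsample to 30 seconds: new slots appear between the points and are forward-filled
up = s.resample('30s').ffill()

# Downsample to 1 hour: all three points fall into one bin and are averaged
down = s.resample('1h').mean()

print(up)
print(down)
```

In both cases the number of rows is determined by the time span and the target frequency, which is why there is no natural way to ask resample for "exactly one more row".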