pandas numpy arrays dimension mismatch

Disclaimer: this page reproduces a popular StackOverflow question and its answer under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me). Original source: http://stackoverflow.com/questions/22619288/

numpy arrays dimension mismatch

python, numpy, pandas

Asked by alternated direction

I am using numpy and pandas to attempt to concatenate a number of heterogeneous values into a single array.

np.concatenate((tmp, id, freqs))

Here are the exact values:

tmp = np.array([u'DNMT3A', u'p.M880V', u'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
id = "id_23728"

The dimensions of tmp, 17232, and freqs are as follows:

[in]  tmp.shape
[out] (4,)
[in]  np.array(17232).shape
[out] ()
[in]  freqs.shape
[out] (1,)
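
Put together, a minimal runnable reproduction of the failing call (a sketch using the values above; note that the name id shadows Python's built-in id(), kept here only to match the question):

import numpy as np

tmp = np.array([u'DNMT3A', u'p.M880V', u'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
id = "id_23728"  # shadows the built-in id(); kept to match the question

# np.concatenate converts id with np.asarray, producing a 0-d array
# (shape ()), so this raises ValueError (the exact message varies
# across numpy versions).
np.concatenate((tmp, id, freqs))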

I have also tried casting them all as numpy arrays to no avail.

Note that the variable freqs will frequently have more than one value.

However, with both the np.concatenate and np.append functions I get the following error:

*** ValueError: all the input arrays must have same number of dimensions

These all have the same number of columns (0), so why can't I concatenate them with either of the numpy methods described above?

All I'm looking to obtain is [(tmp), 17232, (freqs)] in a single one-dimensional array, which is to be appended onto the end of a pandas dataframe.

Thanks.

Update

It appears I can concatenate the two existing arrays:

np.concatenate([tmp, freqs],axis=0)
array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 0.022831050228310501], dtype=object)

However, the integer cannot be used in concatenate, even when cast to an array.

np.concatenate([tmp, np.array(17571)],axis=0)
*** ValueError: all the input arrays must have same number of dimensions

What does work, however, is nesting append and concatenate:

np.concatenate((np.append(tmp, 17571), freqs),)
array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 17571,
       0.022831050228310501], dtype=object)

This is kind of messy, though. Does anyone have a better solution for concatenating a number of heterogeneous arrays?

Accepted answer by gg349

The problem is that id, and later the integer np.array(17571), are not array_like objects. See here how numpy decides whether an object can be converted automatically to a numpy array or not.

The solution is to make id array_like, i.e. to make it an element of a list or tuple, so that numpy understands that id belongs to a 1D array_like structure.

It all boils down to

concatenate((tmp, (id,), freqs))

or

concatenate((tmp, [id], freqs))
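
For example, a quick check with the values from the question (using the np. prefix rather than the bare names):

import numpy as np

tmp = np.array([u'DNMT3A', u'p.M880V', u'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
id = "id_23728"

np.concatenate((tmp, [id], freqs))
# array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 'id_23728',
#        0.022831050228310501], dtype=object)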

To avoid this sort of problem when dealing with input variables in functions that use numpy, you can use atleast_1d, as pointed out by @askewchan. See this question/answer for more about it.

Basically, if you are unsure whether your variable id will be a single str or a list of str in different scenarios, you are better off using

concatenate((tmp, atleast_1d(id), freqs))
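
A small illustration of why atleast_1d covers both cases (a sketch, not part of the original answer):

import numpy as np

np.atleast_1d("id_23728").shape        # (1,)  a bare str becomes a one-element array
np.atleast_1d(["id_1", "id_2"]).shape  # (2,)  an existing list stays 1D
np.atleast_1d(17571).shape             # (1,)  bare scalars are promoted too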

because the two options above will fail if id is already a list/tuple of strings.

EDIT: It may not be obvious why np.array(17571) is not an array_like object. This happens because np.array(17571).shape == (), so it is not iterable, as it has no dimensions.
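
To see this concretely (a quick interactive check, not part of the original answer):

import numpy as np

a = np.array(17571)
a.shape                 # () -- zero dimensions
a.ndim                  # 0
# len(a)                # would raise TypeError: len() of unsized object
np.atleast_1d(a).shape  # (1,) -- promoted, so concatenate accepts it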