Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/47094437/

Date: 2020-09-14 04:43:34  Source: igfitidea

Adding np arrays to existing pandas dataframe

Tags: python, arrays, pandas, numpy, dataframe

Asked by

I've been stuck on a problem for a while now and cannot find a solution.

I've created a pandas DataFrame that is already filled with values, say with shape (4, 3):

df=
  A    B    C
0 valX valX valX
1 valY valY valY
2 valZ valZ valZ
3 valW valW valW

What I want to do now is append ten additional columns, each containing a numpy array filled with 38 zeros.

My solution seems to work when I first cast the array to a string and then add it to the original df.

However, pandas does not accept a plain numpy array when I assign it the same way. I need the value in each column to be a numpy array, because I will later run some sklearn computations on them.

Later in my code, I substitute certain columns with a one-hot encoding of certain characters. The remaining columns act as a zero-padding.
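
As a rough illustration of the plan described above, the sketch below one-hot encodes the characters of a short string into length-38 vectors and leaves the unused slots as zero padding. The 38-symbol alphabet (`ALPHABET`), the helper `encode_word`, and the sample word are assumptions made up for this example; the question does not say which characters are encoded.

```python
import string

import numpy as np

# Hypothetical 38-symbol alphabet: 26 letters, 10 digits, '-' and '_'.
ALPHABET = string.ascii_lowercase + string.digits + "-_"
N_SLOTS = 10  # matches the ten char_i columns in the question


def encode_word(word, n_slots=N_SLOTS):
    """Return n_slots vectors: one-hot per character, all-zero padding afterwards."""
    vectors = [np.zeros(len(ALPHABET)) for _ in range(n_slots)]
    for pos, ch in enumerate(word[:n_slots]):
        # ALPHABET.index raises ValueError for characters outside the alphabet.
        vectors[pos][ALPHABET.index(ch)] = 1.0
    return vectors


vectors = encode_word("abc")
```

Each vector could then be stored in the matching `char_i` cell of one row, with slots 3 to 9 staying as zero padding for this sample word.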

Example of my code (which works for adding 10 columns):

import numpy as np

# create an array of 38 zeros
x = np.zeros(38)
for i in range(10):
    col_name = "char_" + str(i)
    # works, but stores the string representation of x, not the array itself
    df[col_name] = str(x)

The problem here is that I need to cast x to a string. If I keep it as a numpy array, it throws this error:

ValueError: Length of values does not match length of index
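
The error comes from how pandas reads the right-hand side of a column assignment: a 1-D array is taken as one value per row, so 38 values cannot fit a 4-row frame, while a string is a scalar that is broadcast to every row. A minimal sketch that reproduces both behaviours, using a stand-in frame with placeholder values (the real data is not shown in the question):

```python
import numpy as np
import pandas as pd

# Stand-in for the 4x3 frame from the question; the values are placeholders.
df = pd.DataFrame({"A": list("XYZW"), "B": list("XYZW"), "C": list("XYZW")})
x = np.zeros(38)

# A string is a scalar, so pandas broadcasts it to all 4 rows.
df["as_string"] = str(x)

# A 1-D array is interpreted as a whole column: 38 values for 4 rows.
try:
    df["as_array"] = x
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```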

Accepted answer by jezrael

Use:

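import numpy as np
import pandas as pd

x = np.zeros(38)
for i in range(10):
    col_name = "char_" + str(i)
    # wrap x in a list so pandas stores the array itself as the cell value
    df[col_name] = pd.Series([x], index=df.index)

# check that a cell really holds a numpy array
print(type(df.loc[0, 'char_9']))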
<class 'numpy.ndarray'>
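
Depending on the pandas version, a one-element list paired with a longer index may be rejected with a length-mismatch error instead of being broadcast. A hedged alternative sketch builds one array per row explicitly, which also gives every row its own array, so a later in-place edit (such as writing a one-hot vector) does not leak into other rows. The stand-in frame and the final `np.stack` step toward a 2-D matrix for sklearn are assumptions added for illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Stand-in for the 4x3 frame from the question; the values are placeholders.
df = pd.DataFrame({"A": list("XYZW"), "B": list("XYZW"), "C": list("XYZW")})

for i in range(10):
    col_name = "char_" + str(i)
    # One independent zero array per row, aligned on the frame's index.
    cells = [np.zeros(38) for _ in range(len(df))]
    df[col_name] = pd.Series(cells, index=df.index)

print(type(df.loc[0, "char_9"]))

# Stack one column's cells into an (n_rows, 38) numeric matrix.
matrix = np.stack(df["char_0"].tolist())
print(matrix.shape)
```

The stacked matrix is an ordinary 2-D float array, which is the shape most sklearn estimators expect as input.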