Python: adding pandas columns of different lengths
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/27126511/
add columns different length pandas
Asked by aleksander
I have a problem adding columns in pandas. I have a DataFrame of dimension n×k. Along the way I will need to add columns of dimension m×1, where m is in [1, n], but I don't know m in advance.
When I try to do it:
df['Name column'] = data
# type(data) = list
Result:
AssertionError: Length of values does not match length of index
Can I add columns of different lengths?
Accepted answer by EdChum
Use concat and pass axis=1 and ignore_index=True:
In [38]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':np.arange(5)})
df1 = pd.DataFrame({'b':np.arange(4)})
print(df1)
df
   b
0  0
1  1
2  2
3  3
Out[38]:
   a
0  0
1  1
2  2
3  3
4  4
In [39]:
pd.concat([df,df1], ignore_index=True, axis=1)
Out[39]:
   0    1
0  0    0
1  1    1
2  2    2
3  3    3
4  4  NaN
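As a minimal sketch of what the call above does to the result, the snippet below rebuilds the same two frames and inspects the combined one. The variable name `combined` is only illustrative and is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(5)})
df1 = pd.DataFrame({'b': np.arange(4)})

combined = pd.concat([df, df1], ignore_index=True, axis=1)

# With axis=1 and ignore_index=True the column labels 'a' and 'b'
# are replaced by the integers 0 and 1.
print(list(combined.columns))

# The shorter frame has no row 4, so that cell is filled with NaN.
print(pd.isna(combined.iloc[4, 1]))

# NaN is a float, so the padded column is stored with a float dtype.
print(combined.dtypes)
```

If the original labels matter, they have to be restored afterwards or `ignore_index` has to be left out.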
Answer by The Red Pea
If you use the accepted answer, you'll lose your column names, as shown in the accepted answer's example and described in the documentation (emphasis added):
The resulting axis will be labeled 0, ..., n - 1. This is useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information.
It looks like column names ('Name column') are meaningful to the original poster's question.
To keep the column names, use pandas.concat, but don't pass ignore_index (its default value is False, so you can omit that argument altogether). Continue to use axis=1:
import pandas
# Note these columns have 3 rows of values:
original = pandas.DataFrame({
    'Age': [10, 12, 13],
    'Gender': ['M', 'F', 'F']
})
# Note this column has 4 rows of values:
additional = pandas.DataFrame({
    'Name': ['Nate A', 'Jessie A', 'Daniel H', 'John D']
})
new = pandas.concat([original, additional], axis=1)
# Identical:
# new = pandas.concat([original, additional], ignore_index=False, axis=1)
print(new.head())
#    Age Gender      Name
# 0   10      M    Nate A
# 1   12      F  Jessie A
# 2   13      F  Daniel H
# 3  NaN    NaN    John D
Notice that John D has neither an Age nor a Gender.
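The question starts from a plain list rather than a second DataFrame. One way to apply the same idea, sketched here with made-up data, is to wrap the list in a named Series first. The frame contents and the list `data` are hypothetical. The second option relies on the fact that assigning a Series aligns it on the index, so it only suits a list that is no longer than the frame.

```python
import pandas as pd

df = pd.DataFrame({'Age': [10, 12, 13, 11],
                   'Gender': ['M', 'F', 'F', 'M']})
data = ['Nate A', 'Jessie A']  # hypothetical list, shorter than df

# Option 1: wrap the list in a named Series and concatenate column-wise.
# The Series name becomes the column label.
named = pd.Series(data, name='Name column')
result = pd.concat([df, named], axis=1)

# Option 2: assign a Series instead of the raw list. Assignment aligns
# on the index, so rows without a value become NaN. Values whose index
# is not in df would be dropped, hence len(data) <= len(df) is assumed.
df2 = df.copy()
df2['Name column'] = pd.Series(data)

print(result)
print(df2)
```

Both options keep the existing column names and leave the missing rows as NaN.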
Answer by Manivannan Murugavel
We can add lists of different sizes to a DataFrame.
Example
a = [0,1,2,3]
b = [0,1,2,3,4,5,6,7,8,9]
c = [0,1]
Find the length of each list
la,lb,lc = len(a),len(b),len(c)
# now find the max
max_len = max(la,lb,lc)
Pad every shorter list with empty strings up to the determined max length:
if not max_len == la:
    a.extend(['']*(max_len-la))
if not max_len == lb:
    b.extend(['']*(max_len-lb))
if not max_len == lc:
    c.extend(['']*(max_len-lc))
Now all the lists have the same length, so create the DataFrame:
import pandas as pd
pd.DataFrame({'A':a,'B':b,'C':c})
The final output is:
   A  B  C
0  0  0  0
1  1  1  1
2  2  2
3  3  3
4     4
5     5
6     6
7     7
8     8
9     9
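A possible variation on the padding approach, sketched here rather than taken from the answer, is to let pandas do the padding by wrapping each list in a Series. The missing cells then hold NaN instead of empty strings, and the original lists are not modified.

```python
import pandas as pd

a = [0, 1, 2, 3]
b = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
c = [0, 1]

# Each Series gets its own 0..len-1 index; the DataFrame constructor
# aligns them on the union of those indexes and fills gaps with NaN.
df = pd.DataFrame({'A': pd.Series(a), 'B': pd.Series(b), 'C': pd.Series(c)})

print(df.shape)
print(df.isna().sum())
```

The trade-off is that the padded columns become float columns, whereas padding with '' keeps the integers but turns the column into a mixed object column.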
Answer by Marjan Radfar
I had the same issue: two different dataframes without a common column. I just needed to put them side by side in a CSV file.
- Merge: In this case, "merge" does not work, even after adding a temporary column to both dfs and then dropping it, because this method gives both dfs the same length: it repeats the rows of the shorter dataframe to match the longer dataframe's length.
- Concat: The Red Pea's idea didn't work for me. It just appended the shorter df to the longer one (row-wise) while leaving an empty column (NaNs) above the shorter df's column.
- Solution: You need to do the following:
df1 = df1.reset_index()
df2 = df2.reset_index()
df = [df1, df2]
df_final = pd.concat(df, axis=1)
df_final.to_csv(filename, index=False)
This way, you'll see your dfs beside each other (column-wise), each with its own length.
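The behaviour described in this answer is consistent with concat matching rows by index label when axis=1 is used. The sketch below uses two made-up frames whose indexes do not overlap to show the effect. It passes drop=True, which is an addition here and not part of the answer, so that the old index is discarded instead of being written out as an extra 'index' column.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2, 3, 4]})
# Hypothetical second frame whose index does not start at 0,
# for example one left over from filtering a larger frame.
df2 = pd.DataFrame({'y': ['a', 'b']}, index=[10, 11])

# Rows are matched by index label, so nothing lines up and the
# result has one row per distinct label (6 rows here).
misaligned = pd.concat([df1, df2], axis=1)

# After resetting both indexes to 0..n-1 the frames sit side by side.
side_by_side = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)],
    axis=1,
)

print(misaligned)
print(side_by_side)
```

Without drop=True, reset_index() keeps the old index as a new column named 'index' in each frame.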

