Pandas:连接文件但跳过第一个文件以外的标题
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/45658478/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Pandas: Concatenate files but skip the headers except the first file
提问by MCG Code
I have 3 files representing the same dataset split in 3 and I need to concatenate:
我有 3 个文件表示相同的数据集分成 3 个,我需要连接:
import pandas
df1 = pandas.read_csv('path1')
df2 = pandas.read_csv('path2')
df3 = pandas.read_csv('path3')
df = pandas.concat([df1,df2,df3])
But this will keep the headers in the middle of the dataset, I need to remove the headers (column names) from the 2nd and 3rd file. How do I do that?
但这会将标题保留在数据集的中间,我需要从第二个和第三个文件中删除标题(列名)。我怎么做?
采纳答案by jezrael
I think you need numpy.concatenate
with DataFrame
constructor:
我认为你需要numpy.concatenate
用DataFrame
构造函数:
df = pd.DataFrame(np.concatenate([df1.values, df2.values, df3.values]), columns=df1.columns)
Another solution is replace columns names in df2
and df3
:
另一种解决方案是替换列名称中df2
和df3
:
df2.columns = df1.columns
df3.columns = df1.columns
df = pd.concat([df1,df2,df3], ignore_index=True)
Samples:
样品:
np.random.seed(100)
df1 = pd.DataFrame(np.random.randint(10, size=(2,3)), columns=list('ABF'))
print (df1)
A B F
0 8 8 3
1 7 7 0
df2 = pd.DataFrame(np.random.randint(10, size=(1,3)), columns=list('ERT'))
print (df2)
E R T
0 4 2 5
df3 = pd.DataFrame(np.random.randint(10, size=(3,3)), columns=list('HTR'))
print (df3)
H T R
0 2 2 2
1 1 0 8
2 4 0 9
print (np.concatenate([df1.values, df2.values, df3.values]))
[[8 8 3]
[7 7 0]
[4 2 5]
[2 2 2]
[1 0 8]
[4 0 9]]
df = pd.DataFrame(np.concatenate([df1.values, df2.values, df3.values]), columns=df1.columns)
print (df)
A B F
0 8 8 3
1 7 7 0
2 4 2 5
3 2 2 2
4 1 0 8
5 4 0 9
df = pd.concat([df1,df2,df3], ignore_index=True)
print (df)
A B F
0 8 8 3
1 7 7 0
2 4 2 5
3 2 2 2
4 1 0 8
5 4 0 9
回答by Serenity
回答by Gustavo Bertoli
Use:
用:
df = pd.merge(df1, df2, how='outer')
Merge rows that appear in either or both df1 and df2 (union).
合并出现在 df1 和 df2(联合)中或两者中的行。