Original URL: http://stackoverflow.com/questions/56763245/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
How to append a column from one csv file to second csv (with different indices)
Asked by Matt
I am concatenating many csv files and want to take one column from a multicolumn csv and append it as a new column in a second csv. The problem is that the two files have different numbers of rows, so the new column I add to the existing csv gets cut off once the last row index of the existing csv is reached.
I have tried to read in the new column as a second dataframe and then add that dataframe as a new column to the existing csv.
import pandas as pd

df = pd.read_csv("Existing CSV.csv")
df2 = pd.read_csv("New CSV.csv", usecols=['Desired Column'])
df["New CSV"] = df2  # aligned to df's index, so rows past df's last index are dropped
"Existing CSV" has 1200 rows of data while "New CSV" has 1500 rows. When I run the code, the "New CSV" column is added to "Existing CSV"; however, only the first 1200 rows of data are included.
Ideally, all 1500 rows from "New CSV" will be included and the 300 rows missing from "Existing CSV" will be left blank.
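The truncation described above can be reproduced without any files. This is a minimal sketch that assumes small in-memory frames (3 and 5 rows) as hypothetical stand-ins for the 1200-row and 1500-row CSVs, and it assigns a Series rather than the one-column DataFrame used in the question:

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files: 3 existing rows, 5 new rows.
df = pd.DataFrame({"A": [10, 20, 30]})
df2 = pd.DataFrame({"Desired Column": [1, 2, 3, 4, 5]})

# Column assignment aligns the new values to df's index (0..2).
# Labels 3 and 4 do not exist in df, so those values are silently dropped.
df["New CSV"] = df2["Desired Column"]

print(len(df))                 # 3
print(df["New CSV"].tolist())  # [1, 2, 3]
```

The existing frame's index decides how many rows survive, which is why the index itself has to be extended before (or while) the new column is added.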
Answered by Peter Leimbigler
By default, read_csv gives the resulting DataFrame an integer index, so I can think of a couple of options to try.
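To see the default index in practice, here is a small sketch. The CSV text is a hypothetical in-memory stand-in for a file on disk, and the "Other" column is invented for illustration:

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO lets read_csv treat it like a file.
csv_text = "Desired Column,Other\n1,a\n2,b\n3,c\n"
frame = pd.read_csv(io.StringIO(csv_text), usecols=["Desired Column"])

# With no index_col given, rows are labelled 0, 1, 2, ... by position.
print(frame.index)
```

Because both files get this positional index, row 0 of one file lines up with row 0 of the other, which is what the methods below rely on.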
Setup
import pandas as pd

df = pd.read_csv("Existing CSV.csv")
df2 = pd.read_csv("New CSV.csv", usecols=['Desired Column'])
Method 1: join
df = df.join(df2['Desired Column'], how='right')
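A quick way to check what the right join does is to run it on toy data. The frames below are hypothetical stand-ins (3 and 5 rows instead of 1200 and 1500), and the column "A" is invented for illustration:

```python
import pandas as pd

# Hypothetical stand-ins for the existing and new CSV contents.
df = pd.DataFrame({"A": [10, 20, 30]})
df2 = pd.DataFrame({"Desired Column": [1, 2, 3, 4, 5]})

# how='right' keeps every index label of the right-hand Series (0..4);
# rows that df does not have are filled with NaN in df's columns.
joined = df.join(df2["Desired Column"], how="right")
print(joined)
```

The result has all five rows, with NaN in "A" for the last two. When written back out with to_csv, NaN becomes an empty field by default, which matches the "left blank" requirement. If the existing file could ever be the longer one, how='outer' would keep the rows of both sides.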
Method 2: reindex and assign
# reindex(df2.index) is used here rather than reindex_like(df2), which would also conform the columns to df2
df = df.reindex(df2.index).assign(**{'Desired Column': df2['Desired Column']})
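One detail worth checking with the reindex-based approach: reindex_like conforms both the row index and the columns to the other frame, whereas reindex(df2.index) only changes the rows. A minimal sketch on hypothetical stand-in data (the column "A" is invented for illustration):

```python
import pandas as pd

# Hypothetical stand-ins for the existing and new CSV contents.
df = pd.DataFrame({"A": [10, 20, 30]})
df2 = pd.DataFrame({"Desired Column": [1, 2, 3, 4, 5]})

# reindex_like matches df2 on every axis, so df's own column "A" is dropped.
like = df.reindex_like(df2)
print(list(like.columns))

# reindex(df2.index) only extends the rows; "A" survives, padded with NaN.
extended = df.reindex(df2.index).assign(**{"Desired Column": df2["Desired Column"]})
print(extended)
```

If the existing columns need to be kept, which is the goal stated in the question, the row-only reindex is the safer choice.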