Python: Adding values to existing columns in pandas
Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/50066608/
Adding values to existing columns in pandas
Asked by Cyrille MODIANO
I loop over the CSV files in a directory and read them with pandas. Each CSV file has a category and a marketplace. I then need to get the category id and the marketplace id from the database; both ids apply to the whole CSV file.
finalDf is a DataFrame containing all the products from all the CSV files, and I need to append the data from the current CSV to it.
The list of products in the current CSV is retrieved using:
df['PRODUCT']
I need to append them to finalDf, and I used:
finalDf['PRODUCT'] = finalDf['PRODUCT'].append(df['PRODUCT'],ignore_index=True)
This seems to work fine, and I now have to insert catid and marketid into the corresponding columns of finalDf. Because catid and marketid are constant across the current CSV file, I just need to add each of them as many times as there are rows in the df DataFrame. This is what I'm trying to accomplish in the code below.
finalDf = pd.DataFrame(columns=['PRODUCT', 'CAT_ID', 'MARKET_ID'])
finalDf['PRODUCT'] = finalDf.PRODUCT.astype('category')
df = pd.read_csv(filename, header=None,
                 names=['PRODUCT', 'URL_PRODUCT', 'RANK', 'URL_IMAGE', 'STARS', 'PRICE', 'NAME', 'SNAPDATE',
                        'CATEGORY', 'MARKETPLACE', 'PARENTCAT', 'LISTTYPE', 'VERSION', 'LEVEL'], sep='\t')
finalDf['PRODUCT'] = finalDf['PRODUCT'].append(df['PRODUCT'],ignore_index=True)
# Here I have a single value to add n times, n corresponding to the number of rows in the dataframe df
catid = 2113
marketid = 13
catids = pd.Series([catid]*len(df.index))
marketids = pd.Series([marketid]*len(df.index))
finalDf['CAT_ID'] = finalDf['CAT_ID'].append(catids, ignore_index=True)
finalDf['MARKET_ID'] = finalDf['MARKET_ID'].append(marketids, ignore_index=True)
print finalDf.head()
   PRODUCT  CAT_ID  MARKET_ID
0      ABC     NaN        NaN
1      ABB     NaN        NaN
2      ABE     NaN        NaN
3      DCB     NaN        NaN
4      EFT     NaN        NaN
As you can see, I get only NaN values instead of the actual values. Expected output:
   PRODUCT  CAT_ID  MARKET_ID
0      ABC    2113         13
1      ABB    2113         13
2      ABE    2113         13
3      DCB    2113         13
4      EFT    2113         13
After several CSV files have been processed, finalDf would look like:
    PRODUCT  CAT_ID  MARKET_ID
0       ABC    2113         13
1       ABB    2113         13
2       ABE    2113         13
3       DCB    2113         13
4       EFT    2113         13
5       SDD    2114         13
6       ERT    2114         13
7       GHJ    2114         13
8       MOD    2114         13
9       GTR    2114         13
10      WLY    2114         13
11      WLO    2115         13
12      KOP    2115         13
Any ideas?
Thanks.
Answered by Cyrille MODIANO
I finally found a solution, though I don't know why the original approach didn't work. This one is simpler:
tempDf = pd.DataFrame(columns=['PRODUCT','CAT_ID','MARKET_ID'])
tempDf['PRODUCT'] = df['PRODUCT']
tempDf['CAT_ID'] = catid
tempDf['MARKET_ID'] = 13
finalDf = pd.concat([finalDf,tempDf])
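For anyone who wants to try this approach across several files, here is a minimal sketch. It is not the author's exact script: the `batches` list is a hypothetical stand-in for the per-file data that would normally come from `pd.read_csv` and the database lookup, and `DataFrame.assign` is swapped in for the temporary-frame column assignments above.

```python
import pandas as pd

# Hypothetical stand-ins for each CSV file: (products, catid, marketid).
# In the original script these come from pd.read_csv and a database lookup.
batches = [
    (["ABC", "ABB", "ABE"], 2113, 13),
    (["SDD", "ERT"], 2114, 13),
]

frames = []
for products, catid, marketid in batches:
    df = pd.DataFrame({"PRODUCT": products})
    # assign() broadcasts each scalar to every row of this file's frame.
    frames.append(df[["PRODUCT"]].assign(CAT_ID=catid, MARKET_ID=marketid))

# Concatenate once at the end; ignore_index=True renumbers the rows 0..n-1.
finalDf = pd.concat(frames, ignore_index=True)
print(finalDf)
```

Collecting the per-file frames in a list and concatenating once avoids rebuilding finalDf on every iteration. Passing `ignore_index=True` also avoids the repeated index labels that a plain `pd.concat([finalDf, tempDf])` would carry over from each file. Note that `Series.append`, used in the question, was removed in pandas 2.0, so `pd.concat` is the option that works across versions.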
Answered by Paul-Darius
You don't actually need catids and marketids:
finalDf['CAT_ID'] = catid
finalDf['MARKET_ID'] = marketid
This works because assigning a scalar to a column broadcasts it to every existing row.
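The difference between assigning a scalar and assigning a Series is one plausible reading of the NaN output in the question. The sketch below uses made-up data and a hypothetical `shifted` Series to show it: a scalar fills every row, while a Series is matched by index label, so values sitting at labels that the frame does not have are dropped.

```python
import pandas as pd

finalDf = pd.DataFrame({"PRODUCT": ["ABC", "ABB", "ABE"]})

# A scalar is broadcast to every existing row.
finalDf["CAT_ID"] = 2113

# A Series is aligned on index labels, not on position. Labels 3..5 do not
# exist in finalDf (its labels are 0..2), so nothing matches and the column
# is filled with NaN.
shifted = pd.Series([13, 13, 13], index=[3, 4, 5])
finalDf["MARKET_ID"] = shifted

print(finalDf)
```

In the question, appending catids to a column that already held one NaN per product put the real values at labels n..2n-1, which is the same situation as `shifted` here.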
For the rest of the script, I would probably simplify things like this:
finalDf = pd.DataFrame()
finalDf['PRODUCT'] = df['PRODUCT'].reset_index(drop=True)
This assumes that you are not interested in df's original index, as your code implies.
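As a small illustration of discarding the original index, the sketch below uses a hypothetical frame whose labels start at 10. It passes `drop=True` to `reset_index`, since without it a Series is turned into a two-column DataFrame (the old index plus the values) rather than a renumbered Series.

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1, e.g. after filtering rows.
df = pd.DataFrame({"PRODUCT": ["ABC", "ABB", "ABE"]}, index=[10, 11, 12])

# drop=True discards the old labels and keeps the result a Series.
products = df["PRODUCT"].reset_index(drop=True)

# An empty DataFrame adopts the index of the first Series assigned to it.
finalDf = pd.DataFrame()
finalDf["PRODUCT"] = products
print(finalDf)
```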