Python: adding values to an existing column in Pandas

Question

Asked by Cyrille MODIANO

I loop over the CSV files in a directory and read them with pandas. Each CSV file has a category and a marketplace. I then need to get the category id and the marketplace id from the database; these are valid for the whole CSV file.

finalDf is a dataframe containing all the products from all the CSV files, and I need to append the data from the current CSV to it.

The list of products in the current CSV is retrieved using:

df['PRODUCT']

I need to append them to the finalDf and I used:

finalDf['PRODUCT'] =  finalDf['PRODUCT'].append(df['PRODUCT'],ignore_index=True)

This seems to work fine, and I now have to insert catid and marketid into the corresponding columns of finalDf. Because catid and marketid are consistent across the current CSV file, I just need to add them as many times as there are rows in the df dataframe, which is what I'm trying to accomplish in the code below.

finalDf = pd.DataFrame(columns=['PRODUCT', 'CAT_ID', 'MARKET_ID'])
finalDf['PRODUCT'] = finalDf.PRODUCT.astype('category')

df = pd.read_csv(filename, header=None,
                             names=['PRODUCT', 'URL_PRODUCT', 'RANK', 'URL_IMAGE', 'STARS', 'PRICE', 'NAME', 'SNAPDATE',
                                    'CATEGORY', 'MARKETPLACE', 'PARENTCAT', 'LISTTYPE', 'VERSION', 'LEVEL'], sep='\t')

finalDf['PRODUCT'] = finalDf['PRODUCT'].append(df['PRODUCT'],ignore_index=True)
# Here I have a single value to add n times, n corresponding to the number of rows in the dataframe df
catid = 2113
marketid = 13
catids = pd.Series([catid]*len(df.index))
marketids = pd.Series([marketid]*len(df.index))
finalDf['CAT_ID'] = finalDf['CAT_ID'].append(catids, ignore_index=True)
finalDf['MARKET_ID'] = finalDf['MARKET_ID'].append(marketids, ignore_index=True)

print(finalDf.head())

        PRODUCT  CAT_ID  MARKET_ID
    0    ABC       NaN    NaN
    1    ABB       NaN    NaN
    2    ABE       NaN    NaN
    3    DCB       NaN    NaN
    4    EFT       NaN    NaN

As you can see, I only get NaN values instead of the actual values. Expected output:

        PRODUCT  CAT_ID  MARKET_ID
    0    ABC       2113    13
    1    ABB       2113    13
    2    ABE       2113    13
    3    DCB       2113    13
    4    EFT       2113    13

A finalDf built from several CSV files would look like:

        PRODUCT  CAT_ID  MARKET_ID
    0    ABC       2113    13
    1    ABB       2113    13
    2    ABE       2113    13
    3    DCB       2113    13
    4    EFT       2113    13
    5    SDD       2114    13
    6    ERT       2114    13
    7    GHJ       2114    13
    8    MOD       2114    13
    9    GTR       2114    13
   10    WLY       2114    13
   11    WLO       2115    13
   12    KOP       2115    13

Any ideas?

Thanks

Answer 1

Answered by Cyrille MODIANO

I finally found a solution, though I don't know why the original approach didn't work. This one is also simpler:

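# Build a frame for the current CSV; the scalar ids are broadcast to every row
tempDf = pd.DataFrame(columns=['PRODUCT','CAT_ID','MARKET_ID'])
tempDf['PRODUCT'] = df['PRODUCT']
tempDf['CAT_ID'] = catid
tempDf['MARKET_ID'] = marketid

# Append it to the accumulated frame and renumber the index 0..n-1
finalDf = pd.concat([finalDf, tempDf], ignore_index=True)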
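
A likely explanation for the NaN values in the original attempt is index alignment. The first assignment gives the empty finalDf as many rows as the current CSV, with CAT_ID already filled with NaN. Appending the new ids to that column then yields a Series twice as long, and assigning it back keeps only the labels that already exist in the frame, which are the NaN half. The sketch below reproduces this with a small made-up frame; the sample values and the snake_case names are assumptions, and pd.concat stands in for Series.append, which was removed in pandas 2.0.

```python
import pandas as pd

# Made-up stand-in for the current CSV
df = pd.DataFrame({"PRODUCT": ["ABC", "ABB", "ABE"]})
final_df = pd.DataFrame(columns=["PRODUCT", "CAT_ID", "MARKET_ID"])

# Step 1: assigning to an empty frame adopts the Series index,
# so final_df now has 3 rows and CAT_ID is NaN in all of them.
final_df["PRODUCT"] = pd.concat(
    [final_df["PRODUCT"], df["PRODUCT"]], ignore_index=True
)

# Step 2: the existing CAT_ID column (3 NaN) plus 3 new ids is 6 values long.
cat_ids = pd.Series([2113] * len(df))
combined = pd.concat([final_df["CAT_ID"], cat_ids], ignore_index=True)

# Step 3: column assignment aligns on the index. Only labels 0..2 exist
# in final_df, and those positions in `combined` hold the old NaN values.
final_df["CAT_ID"] = combined

print(len(combined))
print(final_df)
```

Building a separate frame per file and concatenating whole frames avoids this, because rows are added to all columns at once.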

Answer 2

Answered by Paul-Darius

You actually do not need catids and marketids:

finalDf['CAT_ID'] = catid
finalDf['MARKET_ID'] = marketid

This will work, because assigning a scalar to a column broadcasts it to every row.
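
A minimal sketch of that scalar assignment, using a small made-up frame (the sample values and the name final_df are assumptions). Note that the scalar is written to every row already in the frame, so inside a loop over several CSV files it would also overwrite the ids stored for earlier files.

```python
import pandas as pd

# Made-up frame that already holds the products of one CSV
final_df = pd.DataFrame({"PRODUCT": ["ABC", "ABB", "ABE"]})

# A scalar on the right-hand side is broadcast to all existing rows
final_df["CAT_ID"] = 2113
final_df["MARKET_ID"] = 13

print(final_df)
```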

For the rest of the script, I would probably have simplified things like this:

finalDf = pd.DataFrame()
finalDf['PRODUCT'] = df['PRODUCT'].reset_index(drop=True)  # drop=True keeps a Series; without it reset_index() returns a two-column DataFrame

This assumes you are not interested in df's original index, as your code implies.
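
For the multi-file case described in the question, one possible sketch is to build one small frame per CSV and concatenate them all once at the end. Everything named here is hypothetical: csv_files simulates the files in memory, ids_for_file stands in for the database lookup, and the files are reduced to a single PRODUCT column for brevity.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the CSV files on disk
csv_files = {
    "cat_2113.tsv": "ABC\nABB\nABE\n",
    "cat_2114.tsv": "SDD\nERT\n",
}
# Hypothetical stand-in for the database lookup: file -> (catid, marketid)
ids_for_file = {
    "cat_2113.tsv": (2113, 13),
    "cat_2114.tsv": (2114, 13),
}

frames = []
for name, content in csv_files.items():
    df = pd.read_csv(io.StringIO(content), header=None,
                     names=["PRODUCT"], sep="\t")
    catid, marketid = ids_for_file[name]
    # assign() broadcasts each scalar to the rows of this file only
    frames.append(df[["PRODUCT"]].assign(CAT_ID=catid, MARKET_ID=marketid))

# One concat at the end; ignore_index=True renumbers the rows 0..n-1
final_df = pd.concat(frames, ignore_index=True)
print(final_df)
```

Collecting the frames in a list and calling pd.concat once is generally cheaper than growing finalDf inside the loop, since each concat copies the accumulated data.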
