pandas: passing openpyxl data to pandas

Question

Asked by mattrweaver

I am splitting "full name" fields into "first name", "middle name" and "last name" fields, using data from an Excel file. I couldn't figure out how to do that in pandas, so I turned to openpyxl and got the values split the way I wanted. But since adding columns for the new fields is not easy in openpyxl, I thought I would pass the values to pandas.

I'm generating the dataframe that I need when I run the code, but once I send the df to ExcelWriter, only the last row is added to the Excel file. The data is in the right places, though.

Here's the code:

for cellObj in range(2, sheet.max_row+1):
    #print cellObj
    id = sheet['A' + str(cellObj)].value
    fullname = sheet['B' + str(cellObj)].value.strip()
    namelist = fullname.split(' ')  
    for i in namelist:
        firstname = namelist[0]
        if len(namelist) == 2:
            lastname = namelist[1]
            middlename = ''
        elif len(namelist) == 3:
            middlename = namelist[1]
            lastname = namelist[2]
        elif len(namelist) == 4:
            middlename = namelist[1]
            lastname = namelist[2] + " " + namelist[3]
        if (namelist[1] == 'Del') | (namelist[1] == 'El') | (namelist[1] == 'Van'):
            middlename = ''
            lastname = namelist[1] + " " + namelist[2]
    df = pd.DataFrame({'personID':id,'lastName':lastname,'firstName':firstname,'middleName':middlename}, index=[id])

    writer = pd.ExcelWriter('output.xlsx')
    df.to_excel(writer,'Sheet1', columns=['ID','lastName','firstName','middleName'])
    writer.save()

Any ideas?

Thanks

Answer 1

Accepted answer by Sam

A couple of things. First, your code is only ever going to get you one line, because you overwrite the values every time it passes an if test. For example,

if len(namelist) == 2:
    lastname = namelist[1]

This assigns a string to the variable lastname. You are not appending to a list, you are just assigning a string. Then, when you make your dataframe with df = pd.DataFrame({'personID':id,'lastName':lastname,... you're using this value, so the dataframe will only ever hold that one string. Make sense? If you must do this using openpyxl, try something like:

lastname = [] #create an empty list
if len(namelist) == 2:
    lastname.append(namelist[1]) #add the name to the list
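
To make the accumulate-then-write idea concrete, here is a minimal sketch. It also addresses a second cause of the one-row output: in the question, the DataFrame and the ExcelWriter are created inside the loop, so each pass recreates the file. The `split_name` helper, the `PARTICLES` set, and the `source_rows` sample data are assumptions made for illustration, and the splitting rule is a simplified version of the one in the question.

```python
# Hypothetical surname particles, taken from the question's checks.
PARTICLES = {"Del", "El", "Van"}

def split_name(fullname):
    """Split a full name into (first, middle, last). Simplified rule."""
    parts = fullname.strip().split()
    first = parts[0]
    rest = parts[1:]
    if len(rest) >= 2 and rest[-2] in PARTICLES:
        # Keep a particle such as "Del" attached to the surname.
        last = " ".join(rest[-2:])
        middle = " ".join(rest[:-2])
    else:
        last = rest[-1] if rest else ""
        middle = " ".join(rest[:-1])
    return first, middle, last

# Stand-in for the (id, full name) pairs read from the worksheet.
source_rows = [(1, "Ann Lee"), (2, "John Q Public"), (3, "Juan Del Rio")]

records = []  # one dict per person, appended on every pass
for person_id, fullname in source_rows:
    first, middle, last = split_name(fullname)
    records.append({
        "personID": person_id,
        "lastName": last,
        "firstName": first,
        "middleName": middle,
    })
```

After the loop finishes, `pd.DataFrame(records)` gives one row per person, and a single `to_excel` call outside the loop writes them all at once.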

However, I think your life will ultimately be much easier if you just figure out how to do this with pandas. It is in fact quite easy. Try something like this:

import pandas as pd
#read excel
df = pd.read_excel('myInputFilename.xlsx')
#write to excel
df.to_excel('MyOutputFile.xlsx')
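
The snippet above only round-trips the file. As a rough sketch of how the split itself might look in pandas, the example below uses the `.str` accessor. The column names `personID` and `fullName` and the `split_names` helper are assumptions, and the rule is deliberately simple: first token, last token, everything in between as the middle name. It does not handle surname particles such as 'Del' or 'Van'.

```python
import pandas as pd

def split_names(df, column="fullName"):
    """Return a copy of df with firstName, middleName and lastName columns."""
    out = df.copy()
    # Split each name on whitespace into a list of tokens.
    parts = out[column].str.strip().str.split()
    out["firstName"] = parts.str[0]
    out["lastName"] = parts.str[-1]
    # Tokens between the first and last one form the middle name ("" if none).
    out["middleName"] = parts.str[1:-1].str.join(" ")
    return out

# Small in-memory frame standing in for pd.read_excel(...) output.
people = pd.DataFrame({
    "personID": [1, 2],
    "fullName": ["Ann Lee", " John Q Public "],
})
result = split_names(people)
```

Because the whole column is processed at once, there is no per-row loop to overwrite values, and `result.to_excel(...)` writes every row.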

Answer 2

Answer by Charlie Clark

FWIW openpyxl 2.4 makes it pretty easy to convert all or part of an Excel sheet to a pandas DataFrame: ws.values is an iterator over all the values in the sheet. It also has a new ws.iter_cols() method that lets you work directly with columns.

It's currently (April 2016) available as an alpha version and can be installed using pip install -U --pre openpyxl

The code would then look a bit like this:

sheet["B1"] = "firstName"
sheet["C1"] = "middleName"
sheet["D1"] = "lastName"

for row in sheet.iter_rows(min_row=2, max_col=2):
    id_cell, name = row

    fullname = name.value.strip()
    namelist = fullname.split()
    firstname = namelist[0]
    lastname = namelist[-1]
    middlename = ""
    if len(namelist) >= 3:
        middlename = namelist[1]
    if len(namelist) == 4:
        lastname = " ".join(namelist[-2:])
    if middlename in ('Del', 'El', 'Van', 'Da'):
        lastname = " ".join([middlename, lastname])
        middlename = None

    name.value = firstname
    name.offset(column=1).value = middlename
    name.offset(column=2).value = lastname

wb.save("output.xlsx")
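
For the `ws.values` route mentioned at the top of this answer, a minimal sketch of turning a worksheet into a DataFrame might look like the following. The workbook is built in memory so the example is self-contained; the header names and sample rows are assumptions, and it presumes the first row of the sheet holds the column headers.

```python
import pandas as pd
from openpyxl import Workbook

# Build a small in-memory workbook standing in for the real input file.
wb = Workbook()
ws = wb.active
ws.append(["personID", "fullName"])
ws.append([1, "Ann Lee"])
ws.append([2, "John Q Public"])

# ws.values yields one tuple of cell values per row.
rows = ws.values
header = next(rows)  # first row is treated as the header
df = pd.DataFrame(list(rows), columns=list(header))
```

From here the DataFrame can be extended with new columns and written out with `df.to_excel(...)`.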
