Python: setting a MultiIndex on an existing DataFrame in pandas

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/24041436/

Date: 2020-08-19 03:52:43  Source: igfitidea

set multi index of an existing data frame in pandas

python pandas

Asked by user3527975

I have a DataFrame that looks like this:

  Emp1    Empl2           date       Company
0    0        0     2012-05-01         apple
1    0        1     2012-05-29         apple
2    0        1     2013-05-02         apple
3    0        1     2013-11-22         apple
18   1        0     2011-09-09        google
19   1        0     2012-02-02        google
20   1        0     2012-11-26        google
21   1        0     2013-05-11        google

I want to use the company and date columns as a MultiIndex for this DataFrame. Currently it has a default index. I am using df.set_index(['Company', 'date'], inplace=True)

df = pd.DataFrame()
for c in company_list:
    row = pd.DataFrame([dict(company = '%s' %s, date = datetime.date(2012, 5, 1))])
    df = df.append(row, ignore_index = True)
    for e in emp_list:
        dataset = pd.read_sql("select company, emp_name, date(date), count(*) from company_table where  = '"+s+"' and emp_name = '"+b+"' group by company, date, name LIMIT 5 ", con)
        if len(dataset) == 0:
            row = pd.DataFrame([dict(sitename='%s' %s, name = '%s' %b, date = datetime.date(2012, 5, 1), count = np.nan)])
            dataset = dataset.append(row, ignore_index=True)
        dataset = dataset.rename(columns = {'count': '%s' %b})
        dataset = dataset.groupby(['company', 'date', 'emp_name'], as_index = False).sum()

        dataset = dataset.drop('emp_name', 1)
        df = pd.merge(df, dataset, how = '')
        df = df.sort('date', ascending = True)
        df.fillna(0, inplace = True)

df.set_index(['Company', 'date'], inplace=True)
print(df)

But when I print this DataFrame, it prints None. I found this solution on Stack Overflow itself. Is this not the correct way of doing it? Also, I want to reorder the company and date columns so that company becomes the first index level and date becomes the second level in the hierarchy. Any ideas on this?

Accepted answer by Andy Hayden

When you pass inplace=True, set_index makes the changes on the original variable. The function does not return the modified DataFrame; it returns None.

is_none = df.set_index(['Company', 'date'], inplace=True)
df  # the dataframe you want
is_none # has the value None
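
A minimal, self-contained sketch of the behaviour above. The sample rows are made up to resemble the question's table, and the single Emp1 column is kept only for illustration:

```python
import pandas as pd

# Hypothetical sample data modelled on the question's table
df = pd.DataFrame({
    "Company": ["apple", "apple", "google", "google"],
    "date": ["2012-05-01", "2012-05-29", "2011-09-09", "2012-02-02"],
    "Emp1": [0, 0, 1, 1],
})

# With inplace=True the DataFrame is modified and the call returns None
is_none = df.set_index(["Company", "date"], inplace=True)

print(is_none)               # None
print(list(df.index.names))  # ['Company', 'date']
```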

so when you have a line like:

df = df.set_index(['Company', 'date'], inplace=True)

it first modifies df... but then it sets df to None!

That is, you should just use the line:

df.set_index(['Company', 'date'], inplace=True)
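
As a hedged aside that goes beyond the answer as quoted: set_index can also be called without inplace, in which case it returns a new DataFrame that can safely be assigned. The sketch below also touches on the ordering part of the question. It assumes made-up sample data, and the names flat, indexed and swapped are illustrative only. The order of the column names in the list determines the order of the index levels, and swaplevel is one way to exchange them afterwards:

```python
import pandas as pd

# Hypothetical sample data modelled on the question's table
flat = pd.DataFrame({
    "Company": ["apple", "apple", "google", "google"],
    "date": ["2012-05-01", "2012-05-29", "2011-09-09", "2012-02-02"],
    "Emp1": [0, 0, 1, 1],
})

# Without inplace, set_index returns a new DataFrame; flat is left unchanged
indexed = flat.set_index(["Company", "date"])
print(list(indexed.index.names))  # ['Company', 'date']

# Exchange the two levels if date should come first instead
swapped = indexed.swaplevel("Company", "date")
print(list(swapped.index.names))  # ['date', 'Company']
```

The key point is to pick one style: either assign the result of a call without inplace, or call with inplace=True and do not assign.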