Original question: http://stackoverflow.com/questions/41005577/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Python Pandas, two rows as column headers?
Asked by Stephen
I have seen how to work with a double index, but I have not seen how to work with two-row column headers. Is this possible?
For example, row 1 is a repetitive series of dates: 2016, 2016, 2015, 2015
Row 2 is a repeating series of data labels: Dollar Sales, Unit Sales, Dollar Sales, Unit Sales.
So each "Dollar Sales" heading is actually tied to the date in the row above.
Subsequent rows are individual items with data.
Is there a way to do a groupby, or some other way that I can have two column headers? Ultimately, I want to line up the "Dollar Sales" values as a series by date so that I can make a nice graph. Unfortunately, there are multiple columns before the next "Dollar Sales" value (more than just the one "Unit Sales" column). Also, if I delete the date row above, nothing links each "Dollar Sales" column to its date.
Answered by squareskittles
If you are using pandas.read_csv() or pandas.read_table(), you can pass a list of row indices as the header argument to specify which rows to use as column headers. pandas will then generate a pandas.MultiIndex for you in df.columns:
import pandas
df = pandas.read_csv('DollarUnitSales.csv', header=[0, 1])
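As a rough sketch of what this produces, the snippet below reads a small in-memory CSV instead of a file. The sample numbers are made up for illustration, and the data deliberately has no item-label column, which is a simplification compared with the sheet described in the question.

```python
import io

import pandas

# Hypothetical sample data (not from the question):
# row 0 holds the years, row 1 holds the measure names.
csv_text = (
    "2016,2016,2015,2015\n"
    "Dollar Sales,Unit Sales,Dollar Sales,Unit Sales\n"
    "100,10,80,8\n"
    "200,20,150,15\n"
)

# header=[0, 1] tells pandas to build the column labels from both rows.
df = pandas.read_csv(io.StringIO(csv_text), header=[0, 1])

# Each column label is now a (year, measure) tuple in a MultiIndex.
# Header values are read as strings, so the year is '2016', not 2016.
print(df.columns.tolist())
print(df[("2016", "Dollar Sales")])
```

Because the year and the measure are stored together in each column label, the link between a "Dollar Sales" column and its date is kept.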
You can also use more than two rows, or non-consecutive rows, to specify the column headers:
df = pandas.read_table('DataSheet1.csv', header=[0,2,3])
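To connect this back to the question's goal of lining up "Dollar Sales" by date, one possible approach (not part of the original answer) is to take a cross-section of the columns with DataFrame.xs. The sketch below again uses made-up sample data without an item-label column, so treat it as an illustration under those assumptions rather than a drop-in solution.

```python
import io

import pandas

# Hypothetical sample data (not from the question).
csv_text = (
    "2016,2016,2015,2015\n"
    "Dollar Sales,Unit Sales,Dollar Sales,Unit Sales\n"
    "100,10,80,8\n"
    "200,20,150,15\n"
)
df = pandas.read_csv(io.StringIO(csv_text), header=[0, 1])

# Keep only the columns whose second-level label is "Dollar Sales".
# The second level is dropped, so the remaining labels are the years.
dollar_sales = df.xs("Dollar Sales", axis=1, level=1)

# Transpose so that each year is a row and each item is a column,
# then sort the years so they run in chronological order.
by_year = dollar_sales.T.sort_index()
print(by_year)
```

With matplotlib installed, by_year.plot() should then draw one line per item across the years.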