Original URL: http://stackoverflow.com/questions/26147180/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Convert row to column header for Pandas DataFrame
Asked by E.K.
The data I have to work with is a bit messy: it has header names inside the data itself. How can I choose a row from an existing pandas DataFrame and make it the column header (that is, rename the columns to that row's values)?
I want to do something like:
header = df[df['old_header_name1'] == 'new_header_name1']
df.columns = header
Accepted answer by unutbu
In [21]: df = pd.DataFrame([(1,2,3), ('foo','bar','baz'), (4,5,6)])
In [22]: df
Out[22]:
     0    1    2
0    1    2    3
1  foo  bar  baz
2    4    5    6
Set the column labels to equal the values in the 2nd row (index location 1):
In [23]: df.columns = df.iloc[1]
If the index has unique labels, you can drop the 2nd row using:
In [24]: df.drop(df.index[1])
Out[24]:
1  foo  bar  baz
0    1    2    3
2    4    5    6
If the index is not unique, you could use:
In [133]: df.iloc[pd.RangeIndex(len(df)).drop(1)]
Out[133]:
1  foo  bar  baz
0    1    2    3
2    4    5    6
Using df.drop(df.index[1]) removes all rows with the same label as the second row. Because non-unique indexes can lead to stumbling blocks (or potential bugs) like this, it's often better to make sure the index is unique (even though Pandas does not require it).
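The difference between dropping by label and dropping by position can be seen in a minimal sketch. The duplicate index labels "a", "b", "b" are made up for illustration, and a plain list of positions is used here in place of the RangeIndex expression above:

```python
import pandas as pd

# Hypothetical frame whose index labels are not unique.
df = pd.DataFrame(
    [(1, 2, 3), ("foo", "bar", "baz"), (4, 5, 6)],
    index=["a", "b", "b"],
)

# Promote the row at position 1 to the column labels.
df.columns = df.iloc[1]

# Dropping by label removes every row labelled "b", not just the header row.
by_label = df.drop(df.index[1])

# Selecting by position keeps everything except the row at position 1.
keep = [i for i in range(len(df)) if i != 1]
by_position = df.iloc[keep]

print(len(by_label))     # 1
print(len(by_position))  # 2
```

With a unique index, both approaches should leave the same rows.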
Answer by Zachary Wilson
This works (pandas 0.19.2):
df.rename(columns=df.iloc[0])
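Note that rename returns a new DataFrame and leaves the header row in the data, so a follow-up step is usually needed. A minimal sketch, assuming the header values sit in the first row of a small made-up frame:

```python
import pandas as pd

# Hypothetical frame whose first row holds the intended column names.
df = pd.DataFrame([("foo", "bar", "baz"), (1, 2, 3), (4, 5, 6)])

# rename maps the old labels 0, 1, 2 to the values in the first row
# and returns a new frame; df itself is left unchanged.
renamed = df.rename(columns=df.iloc[0])

# The header row is still present as data, so slice it off by position.
result = renamed.iloc[1:].reset_index(drop=True)

print(list(result.columns))
print(result.shape)
```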
Answer by ccpizza
You can specify the row index in the read_csv or read_html functions via the header parameter, which represents "Row number(s) to use as the column names, and the start of the data". This has the advantage of automatically dropping all the preceding rows, which are presumably junk.
import pandas as pd
from io import StringIO
In[1]
csv = '''junk1, junk2, junk3, junk4, junk5
junk1, junk2, junk3, junk4, junk5
pears, apples, lemons, plums, other
40, 50, 61, 72, 85
'''
df = pd.read_csv(StringIO(csv), header=2)
print(df)
Out[1]
pears apples lemons plums other
0 40 50 61 72 85
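One detail worth checking with input like this: the sample has a space after each comma, and read_csv keeps that space as part of the column name unless told otherwise. A small sketch, assuming the same sample text, that uses the skipinitialspace parameter to strip it:

```python
import pandas as pd
from io import StringIO

csv = """junk1, junk2, junk3, junk4, junk5
junk1, junk2, junk3, junk4, junk5
pears, apples, lemons, plums, other
40, 50, 61, 72, 85
"""

# header=2 takes the third line as column names and discards the lines above it.
raw = pd.read_csv(StringIO(csv), header=2)

# skipinitialspace=True also drops the space that follows each delimiter,
# so the names come out as "apples" rather than " apples".
df = pd.read_csv(StringIO(csv), header=2, skipinitialspace=True)

print(list(raw.columns))
print(list(df.columns))
```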
Answer by shahar_m
It would be easier to recreate the data frame. This would also interpret the column types from scratch.
headers = df.iloc[0]
new_df = pd.DataFrame(df.values[1:], columns=headers)
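Whether the types are actually re-inferred is worth verifying: when the original frame mixes strings and numbers, df.values is an object array, and the rebuilt columns may stay as object dtype. A hedged sketch with made-up data, using infer_objects as one possible follow-up step (it is not part of the original answer):

```python
import pandas as pd

# Hypothetical frame: header names in the first row, numbers below.
df = pd.DataFrame([("a", "b"), (1, 2.5), (3, 4.5)])

headers = df.iloc[0]
new_df = pd.DataFrame(df.values[1:], columns=headers)

# The rebuilt columns may still be object dtype; infer_objects asks pandas
# to pick a more specific dtype for each object column where it can.
typed = new_df.infer_objects()

print(new_df.dtypes.tolist())
print(typed.dtypes.tolist())
```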

