python:pandas - 如何将熊猫数据帧的前两行组合到数据帧标题？

Question

提问by kit

I'm trying to read one excel file which looks like below:

我正在尝试读取一个如下所示的 excel 文件：

I also have one script which will convert this xlsx file into the csv files with its sheet names (If three sheets available then it will create three different csv files).

我还有一个脚本，可以将这个 xlsx 文件转换为带有工作表名称的 csv 文件（如果有三张工作表，那么它将创建三个不同的 csv 文件）。

It's csv file looks like below:

它的 csv 文件如下所示：

Unnamed: 0,Gender A,Unnamed: 2,Gender B,Unnamed: 4,Gender C,Gender D
date,Male,Female,Male,Female,Male,Female
2017-01-01 00:00:00,2,3,3,2,3,3
2017-01-02 00:00:00,5,7,7,42,3,5
2017-01-03 00:00:00,4,6,6,12,2,7
2017-01-04 00:00:00,6,7,3,6,4,8
2017-01-05 00:00:00,6,8,8,3,5,3
2017-01-06 00:00:00,54,3,3,6,3,5
2017-01-07 00:00:00,3,4,6,3,6,5
2017-01-08 00:00:00,3,6,6,3,6,4
2017-01-09 00:00:00,2,2,8,7,5,2
2017-01-10 00:00:00,4,3,2,4,5,5
2017-01-11 00:00:00,12,10,10,3,1,6
2017-01-12 00:00:00,9,7,7,3,4,1

So, my first question is which is better choice to process these files - xlsx or csv?

所以，我的第一个问题是处理这些文件的更好选择是 xlsx 还是 csv？

Next, I just want to read first two rows as a column header. So that I can understand In which Gender how many males and females are available.

接下来，我只想读取前两行作为列标题。这样我就可以了解在哪个性别中有多少男性和女性可用。

Expected Output:

预期输出：

0                  date   Gender A_Male Gender A_Female   Gender B_Male  Gender B_Female   Gender C_Male  Gender D_Female 
1   2017-01-01 00:00:00        2              3           3                 2               3                  3
2   2017-01-02 00:00:00        5              7           7                42               3                  5
3   2017-01-03 00:00:00        4              6           6                12               2                  7
4   2017-01-04 00:00:00        6              7           3                 6               4                  8
5   2017-01-05 00:00:00        6              8           8                 3               5                  3
6   2017-01-06 00:00:00       54              3           3                 6               3                  5
7   2017-01-07 00:00:00        3              4           6                 3               6                  5
8   2017-01-08 00:00:00        3              6           6                 3               6                  4
9   2017-01-09 00:00:00        2              2           8                 7               5                  2
10  2017-01-10 00:00:00        4              3           2                 4               5                  5
11  2017-01-11 00:00:00       12             10          10                 3               1                  6
12  2017-01-12 00:00:00        9              7           7                 3               4                  1

Answer 1

回答by Scott Boston

Let's try:

咱们试试吧：

df = pd.read_excel('Untitled 2.xlsx', header=[0,1])
df.columns = df.columns.map('_'.join)
df.rename_axis('Date').reset_index()

Output:

输出：

         Date  Gender A_Male  Gender A_Female  Gender B_Male  Gender B_Female  \
0  2017-01-01              2                3              3                2   
1  2017-01-02              5                7              7               42   
2  2017-01-03              4                6              6               12   
3  2017-01-04              6                7              3                6   
4  2017-01-05              6                8              8                3   
5  2017-01-06             54                3              3                6   
6  2017-01-07              3                4              6                3   
7  2017-01-08              3                6              6                3   
8  2017-01-09              2                2              8                7   
9  2017-01-10              4                3              2                4   
10 2017-01-11             12               10             10                3   
11 2017-01-12              9                7              7                3   

    Gender C_Male  Gender D_Female  
0               3                3  
1               3                5  
2               2                7  
3               4                8  
4               5                3  
5               3                5  
6               6                5  
7               6                4  
8               5                2  
9               5                5  
10              1                6  
11              4                1

Answer 2

回答by hobs

I love @ScottBoston's approach. Here are some cosmetic alternatives. If you want your column headers to look pretty, especially when the second row contains the units for the quantity, you can do something like this:

我喜欢@ScottBoston 的方法。这里有一些化妆品替代品。如果您希望列标题看起来漂亮，尤其是当第二行包含数量单位时，您可以执行以下操作：

df = pd.read_excel('Untitled 2.xlsx', header=[0,1], index_col=0)
df.columns = df.columns.map(lambda h: '{}\n({})'.format(h[0], h[1]))
df.rename_axis('Date')

And if you want to make sure your column names don't contain spaces (so you can access them as properties of the DataFrame):

如果您想确保您的列名不包含空格（因此您可以将它们作为 DataFrame 的属性访问）：

df = pd.read_excel('Untitled 2.xlsx', header=[0,1], index_col=0)
df.columns = df.columns.map(lambda h: '  '.join(h).replace(' ', '_'))
df.rename_axis('Date')

Which gives:

这使：

                     Gender_A__Male  Gender_A__Female   ...         Gender_C__Male  Gender_D__Female
Date                                                    ...                                         
2017-01-01 00:00:00               2                 3   ...                      3                 3
2017-01-02 00:00:00               5                 7   ...                      3                 5
2017-01-03 00:00:00               4                 6   ...                      2                 7
2017-01-04 00:00:00               6                 7   ...                      4                 8
2017-01-05 00:00:00               6                 8   ...                      5                 3
2017-01-06 00:00:00              54                 3   ...                      3                 5
2017-01-07 00:00:00               3                 4   ...                      6                 5
2017-01-08 00:00:00               3                 6   ...                      6                 4
2017-01-09 00:00:00               2                 2   ...                      5                 2
2017-01-10 00:00:00               4                 3   ...                      5                 5
2017-01-11 00:00:00              12                10   ...                      1                 6
2017-01-12 00:00:00               9                 7   ...                      4                 1

python:pandas - 如何将熊猫数据帧的前两行组合到数据帧标题？

提问by kit

回答by Scott Boston

回答by hobs

相关推荐

最近更新

标签

python:pandas - 如何将熊猫数据帧的前两行组合到数据帧标题？

提问by kit

回答by Scott Boston

回答by hobs

相关推荐

尝试使用 Pandas 读取 csv 时出错

pandas “索引”对象在python中不可调用

pandas 迭代数据帧中的组

pandas 如何使用 Python 从数据框中的每个字符串中获取第一个单词？

相关推荐

最近更新

标签