Original source: http://stackoverflow.com/questions/17820260/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Concatenate Columns as Index in Pandas
Asked by DJElbow
I am importing a text file into pandas, and would like to concatenate 3 of the columns from the file to make the index.
I am open to doing this in 1 or more steps. I can either do the conversion at the same time I create the DataFrame, or I can create the DataFrame and restructure it with the newly created column. Knowing how to do this both ways would be the most helpful for me.
Eventually, I would like the index to be the result of concatenating the values in the first 3 columns.
Answered by joris
If your columns consist of strings, you can just use the + operator (in Python, addition on strings concatenates them, and pandas follows this):
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'year':['2012', '2012'], 'month':['01', '02']})
In [3]: df
Out[3]:
  month  year
0    01  2012
1    02  2012
In [4]: df['concatenated'] = df['year'] + df['month']
In [5]: df
Out[5]:
  month  year concatenated
0    01  2012       201201
1    02  2012       201202
Then, once this column has been created, you can just use set_index to change the index:
In [6]: df = df.set_index('concatenated')
In [7]: df
Out[7]:
             month  year
concatenated
201201          01  2012
201202          02  2012
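For the three-column case in the question, the same idea might be sketched as follows. The column names and data here are hypothetical, and the sketch assumes that some of the key columns are not strings, so they are converted with astype(str) before using +.

```python
import pandas as pd

# Hypothetical data: 'year' and 'month' are integers, 'code' is already a string.
df = pd.DataFrame({
    "year": [2012, 2012],
    "month": [1, 2],
    "code": ["A", "B"],
    "value": [10.5, 11.0],
})

# Convert the non-string columns to strings first; zero-pad the month to two digits.
key = df["year"].astype(str) + df["month"].astype(str).str.zfill(2) + df["code"]

# set_index also accepts a Series; its name becomes the index name.
df = df.set_index(key.rename("concatenated"))
print(df)
```

Passing the Series directly to set_index avoids adding a helper column, and the original three columns stay in the DataFrame.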
Note that pd.concat is not for concatenating strings but for concatenating Series/DataFrames: it combines the rows or columns of different DataFrames or Series into one DataFrame (it does not merge several rows/columns into one row/column). See http://pandas.pydata.org/pandas-docs/dev/merging.html for an extensive explanation of this.
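To illustrate the difference, here is a small sketch of what pd.concat does with two hypothetical one-row DataFrames: it stacks or aligns whole objects and never joins the strings inside them.

```python
import pandas as pd

# Two hypothetical one-row frames with the same columns.
a = pd.DataFrame({"year": ["2012"], "month": ["01"]})
b = pd.DataFrame({"year": ["2013"], "month": ["02"]})

# Default axis=0: rows of b are appended below the rows of a.
stacked = pd.concat([a, b], ignore_index=True)

# axis=1: the two Series are placed next to each other as columns.
side_by_side = pd.concat([a["year"], a["month"]], axis=1)

print(stacked)
print(side_by_side)
```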
Answered by voithos
If you're using read_csv to import your text file, there is an index_col argument to which you can pass a list of column names or numbers. This will end up creating a MultiIndex; I'm not sure if that suits your application.
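A minimal sketch of the index_col approach, using an in-memory string in place of the real file (the column names and contents are made up for illustration):

```python
import io
import pandas as pd

# Stand-in for the text file; the real file's layout is an assumption here.
text = io.StringIO(
    "year,month,day,value\n"
    "2012,01,15,10.5\n"
    "2012,02,20,11.0\n"
)

# A list of column names (or positions, e.g. [0, 1, 2]) yields a MultiIndex.
df_multi = pd.read_csv(text, index_col=["year", "month", "day"])
print(df_multi)
```

The three columns become index levels rather than one concatenated string, so each level can still be selected on individually.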
If you want to explicitly concatenate the index columns together (assuming that they are strings), it seems you can do so with the + operator. (Warning, untested code ahead)
df['concatenated'] = df['year'] + df['month']
df = df.set_index('concatenated')  # set_index returns a new DataFrame, so assign the result
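Applied to a file import with three key columns, this might look like the sketch below. The file contents and column names are hypothetical. The key columns are read as strings via the dtype argument so that leading zeros such as "01" survive and + concatenates instead of adding numbers.

```python
import io
import pandas as pd

# Stand-in for the text file; the real file's layout is an assumption here.
text = io.StringIO(
    "year,month,day,value\n"
    "2012,01,15,10.5\n"
    "2012,02,20,11.0\n"
)

# Read only the key columns as strings; 'value' keeps its inferred numeric type.
df = pd.read_csv(text, dtype={"year": str, "month": str, "day": str})

# Build the concatenated key and make it the index.
df["concatenated"] = df["year"] + df["month"] + df["day"]
df = df.set_index("concatenated")
print(df)
```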

