Concatenating columns to use as the index in pandas

Question

Asked by DJElbow

I am importing a text file into pandas, and would like to concatenate 3 of the columns from the file to make the index.

I am open to doing this in 1 or more steps. I can either do the conversion at the same time I create the DataFrame, or I can create the DataFrame and restructure it with the newly created column. Knowing how to do this both ways would be the most helpful for me.

Ultimately, I would like the index to be the result of concatenating the values in the first 3 columns.

Answer 1

Answered by joris

If your columns consist of strings, you can just use the + operator (in Python, adding strings concatenates them, and pandas follows this):

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'year':['2012', '2012'], 'month':['01', '02']})

In [3]: df
Out[3]:
  month  year
0    01  2012
1    02  2012

In [4]: df['concatenated'] = df['year'] + df['month']

In [5]: df
Out[5]:
  month  year concatenated
0    01  2012       201201
1    02  2012       201202

Then, once this column has been created, you can use set_index to change the index:

In [6]: df = df.set_index('concatenated')

In [7]: df
Out[7]:
             month  year
concatenated
201201          01  2012
201202          02  2012
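
The + approach above depends on the columns already being strings. As a minimal sketch of the three-column case from the question, assuming some columns are numeric: the column names and values below are hypothetical, and the zero-padding of the month is an illustrative choice, not something the question specifies.

```python
import pandas as pd

# Hypothetical data: two numeric columns and one string column
df = pd.DataFrame({'year': [2012, 2012], 'month': [1, 2], 'code': ['a', 'b']})

# Cast numeric columns to str before using +;
# str.zfill(2) pads the month to two digits ('1' -> '01')
key = df['year'].astype(str) + df['month'].astype(str).str.zfill(2) + df['code']

# set_index also accepts a Series; its name becomes the index name
df = df.set_index(key.rename('concatenated'))

print(df.index.tolist())
```

Passing the Series directly to set_index keeps the three source columns in the frame without adding a separate 'concatenated' column.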

Note that pd.concat is not for concatenating strings; it concatenates Series/DataFrames, that is, it combines columns or rows from different DataFrames or Series into one DataFrame (it does not combine several rows/columns into a single row/column). See http://pandas.pydata.org/pandas-docs/dev/merging.html for an extensive explanation of this.

Answer 2

Answered by voithos

If you're using read_csv to import your text file, there is an index_col argument to which you can pass a list of column names or numbers. This will end up creating a MultiIndex; I'm not sure if that suits your application.
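
A small sketch of the index_col route, assuming a comma-separated file whose first three columns should form the index. The file contents and column names here are made up, and io.StringIO stands in for the real file path.

```python
import io
import pandas as pd

# Hypothetical file contents standing in for the text file
raw = io.StringIO("year,month,day,value\n2012,1,5,10\n2012,2,7,20\n")

# A list of column positions (or names) yields a MultiIndex with one level per column
df = pd.read_csv(raw, index_col=[0, 1, 2])

print(type(df.index).__name__)
print(list(df.index.names))
```

The three columns become index levels rather than one concatenated value, so each row is addressed by a tuple of the three values rather than by a single string.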

If you want to explicitly concatenate the index columns into one (assuming that they are strings), it seems you can do so with the + operator. (Warning: untested code ahead.)

# Assumes both columns hold strings
df['concatenated'] = df['year'] + df['month']
# set_index returns a new DataFrame, so assign the result
df = df.set_index('concatenated')
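
Putting the pieces together for the case in the question (read a text file, then index by the first 3 columns joined), one possible sketch follows. The file layout is an assumption, io.StringIO stands in for the real file, and dtype=str is used so that values such as '01' keep their leading zeros and + concatenates rather than adds.

```python
import io
import pandas as pd

# Hypothetical file contents; replace with the path to the real text file
raw = io.StringIO("year,month,day,value\n2012,01,05,10\n2012,02,07,20\n")

# Read every column as a string so + concatenates and leading zeros survive
df = pd.read_csv(raw, dtype=str)

# Concatenate the first three columns by position, whatever they are called
key = (df.iloc[:, 0] + df.iloc[:, 1] + df.iloc[:, 2]).rename('concatenated')

# set_index returns a new DataFrame indexed by the concatenated key
df = df.set_index(key)

print(df.index.tolist())
```

Because dtype=str applies to every column, any numeric columns would need converting back afterwards, for example with pd.to_numeric.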
