pandas: Split multiple columns into rows in a pandas DataFrame

Question

Asked by ctan

I have a pandas dataframe as follows:

ticker    account      value         date
aa       assets       100,200       20121231, 20131231
bb       liabilities  50, 150       20141231, 20131231

I would like to split df['value'] and df['date'] so that the dataframe looks like this:

ticker    account      value         date
aa       assets       100           20121231
aa       assets       200           20131231 
bb       liabilities  50            20141231
bb       liabilities  150           20131231

Would greatly appreciate any help.

Answer 1

Answered by jezrael

You can first split the columns, create a Series from each with stack, and remove whitespace with strip:

s1 = df.value.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)
s2 = df.date.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)

Then concat both Series into df1:

df1 = pd.concat([s1,s2], axis=1, keys=['value','date'])

Remove the old value and date columns, then join:

print (df.drop(['value','date'], axis=1).join(df1).reset_index(drop=True))
  ticker      account value      date
0     aa       assets   100  20121231
1     aa       assets   200  20131231
2     bb  liabilities    50  20141231
3     bb  liabilities   150  20131231
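
The three steps above can be put together as a self-contained sketch. The input frame is rebuilt from the question's data (including the spaces after some commas), and split_column is a hypothetical helper name introduced here for illustration, not part of the original answer:

```python
import pandas as pd

# Data rebuilt from the question; note the spaces after some commas
df = pd.DataFrame({
    'ticker': ['aa', 'bb'],
    'account': ['assets', 'liabilities'],
    'value': ['100,200', '50, 150'],
    'date': ['20121231, 20131231', '20141231, 20131231'],
})

def split_column(df, col, sep=','):
    # split -> stack -> strip, then drop the inner index level so that
    # each piece keeps the label of the row it came from
    return (df[col].str.split(sep, expand=True)
                   .stack()
                   .str.strip()
                   .reset_index(level=1, drop=True))

cols = ['value', 'date']
df1 = pd.concat([split_column(df, c) for c in cols], axis=1, keys=cols)

# Join the exploded columns back onto the remaining columns
result = df.drop(cols, axis=1).join(df1).reset_index(drop=True)
print(result)
```

This relies on every row splitting into the same number of pieces in each column; rows where value and date have different counts would need extra handling.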

Answer 2

Answered by piRSquared

I see this question a lot: how do I split a column that holds lists into multiple rows? I've seen it called exploding.

So I wrote a function that will do it.

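import numpy as np
import pandas as pd

def explode(df, columns):
    # Repeat each index label once per element in that row's list
    idx = np.repeat(df.index, df[columns[0]].str.len())
    # One row of `a` per exploded column (reindex_axis was removed in pandas 1.0)
    a = df.T.reindex(columns).values
    concat = np.concatenate([np.concatenate(a[i]) for i in range(a.shape[0])])
    p = pd.DataFrame(concat.reshape(a.shape[0], -1).T, idx, columns)
    # join aligns the repeated labels in p with the unique index of df
    return df.drop(columns, axis=1).join(p).reset_index(drop=True)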

But before we can use it, we need lists (or other iterables) in the columns.

Setup

df = pd.DataFrame([['aa', 'assets',      '100,200', '20121231,20131231'],
                   ['bb', 'liabilities', '50,50',   '20141231,20131231']],
                  columns=['ticker', 'account', 'value', 'date'])

df

Split the value and date columns:

df.value = df.value.str.split(',')
df.date = df.date.str.split(',')

df

Now we could explode on either column or both, one after the other.

Solution

explode(df, ['value','date'])
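
For comparison, newer pandas releases ship a built-in DataFrame.explode method. The sketch below assumes pandas 1.3 or later, where explode accepts a list of columns; it is an alternative to the hand-written function above, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame([['aa', 'assets',      '100,200', '20121231,20131231'],
                   ['bb', 'liabilities', '50,50',   '20141231,20131231']],
                  columns=['ticker', 'account', 'value', 'date'])

# Turn the comma-separated strings into lists first
df['value'] = df['value'].str.split(',')
df['date'] = df['date'].str.split(',')

# Explode both list columns together; the lists in each row must have
# the same number of elements, otherwise pandas raises a ValueError
result = df.explode(['value', 'date'], ignore_index=True)
print(result)
```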

Timing

I removed strip from @jezrael's timing because I could not effectively add it to mine. It is a necessary step for this question, since the OP's strings have spaces after the commas. My aim was to provide a generic way to explode a column that already holds iterables, and I think I've accomplished that.
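
If the spaces after the commas do matter, one possible way to handle them before exploding is to strip each piece while building the lists. This is a minimal sketch, not part of the timed code; split_and_strip is a hypothetical helper name and the data is adapted from the question:

```python
import pandas as pd

df = pd.DataFrame([['aa', 'assets',      '100,200', '20121231, 20131231'],
                   ['bb', 'liabilities', '50, 150', '20141231, 20131231']],
                  columns=['ticker', 'account', 'value', 'date'])

def split_and_strip(series, sep=','):
    # Split each string on the separator, then strip whitespace
    # from every piece so the column ends up holding clean lists
    return series.str.split(sep).apply(lambda parts: [p.strip() for p in parts])

for col in ['value', 'date']:
    df[col] = split_and_strip(df[col])

print(df)
```

After this step both columns hold lists of stripped strings, which is the input a generic explode function expects.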

code

def get_df(n=1):
    return pd.DataFrame([['aa', 'assets',      '100,200,200', '20121231,20131231,20131231'],
                         ['bb', 'liabilities', '50,50',   '20141231,20131231']] * n,
                        columns=['ticker', 'account', 'value', 'date'])

small 2 row sample

medium 200 row sample

large 2,000,000 row sample
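
The measured timings did not survive on this page. As a rough sketch of how such a comparison could be rerun, the standard-library timeit module can time a function on frames of different sizes. Here explode_builtin is a hypothetical stand-in for whichever function is being measured, and it assumes pandas 1.3 or later:

```python
import timeit
import pandas as pd

def get_df(n=1):
    return pd.DataFrame([['aa', 'assets',      '100,200,200', '20121231,20131231,20131231'],
                         ['bb', 'liabilities', '50,50',       '20141231,20131231']] * n,
                        columns=['ticker', 'account', 'value', 'date'])

def explode_builtin(df, columns):
    # Hypothetical function under test: split, then use the built-in explode
    out = df.copy()
    for col in columns:
        out[col] = out[col].str.split(',')
    return out.explode(columns, ignore_index=True)

# n=1 gives the 2 row sample, n=100 the 200 row sample
for n in (1, 100):
    df = get_df(n)
    runs = timeit.repeat(lambda: explode_builtin(df, ['value', 'date']),
                         number=5, repeat=3)
    # Report the best of the repeats, averaged over the calls in one run
    print(f"{len(df)} rows: {min(runs) / 5:.6f} s per call")
```

The 2,000,000 row sample corresponds to n=1000000 and is left out of the loop because it takes noticeably longer to run.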

Answer 3

Answered by titipata

I wrote an explode function based on the previous answers. It might be useful for anyone who wants to grab it and use it quickly.

import pandas as pd

def explode(df, cols, split_on=','):
    """
    Explode dataframe on the given columns, split on the given delimiter
    """
    cols_sep = list(set(df.columns) - set(cols))
    df_cols = df[cols_sep]
    explode_len = df[cols[0]].str.split(split_on).map(len)
    repeat_list = []
    # as_matrix() was removed in pandas 1.0; to_numpy() replaces it
    for r, e in zip(df_cols.to_numpy(), explode_len):
        repeat_list.extend([list(r)]*e)
    df_repeat = pd.DataFrame(repeat_list, columns=cols_sep)
    df_explode = pd.concat([df[col].str.split(split_on, expand=True).stack().str.strip().reset_index(drop=True)
                            for col in cols], axis=1)
    df_explode.columns = cols
    return pd.concat((df_repeat, df_explode), axis=1)

Example taken from @piRSquared:

df = pd.DataFrame([['aa', 'assets', '100,200', '20121231,20131231'],
                   ['bb', 'liabilities', '50,50', '20141231,20131231']],
                  columns=['ticker', 'account', 'value', 'date'])
explode(df, ['value', 'date'])

output

+-----------+------+-----+--------+
|    account|ticker|value|    date|
+-----------+------+-----+--------+
|     assets|    aa|  100|20121231|
|     assets|    aa|  200|20131231|
|liabilities|    bb|   50|20141231|
|liabilities|    bb|   50|20131231|
+-----------+------+-----+--------+

Answer 4

Answered by Caro

Because I'm too new, I'm not allowed to write a comment, so I am writing an "answer" instead.

@titipata your answer worked really well, but in my opinion there is a small "mistake" in your code that I'm not able to find myself.

I used the example from this question and changed only the values.

df = pd.DataFrame([['title1', 'publisher1', '1.1,1.2', '1'],
               ['title2', 'publisher2', '2', '2.1,2.2']],
              columns=['titel', 'publisher', 'print', 'electronic'])

explode(df, ['print', 'electronic'])

    publisher   titel   print   electronic
0   publisher1  title1  1.1     1
1   publisher1  title1  1.2     2.1
2   publisher2  title2  2       2.2

As you can see, in row 1 the 'electronic' column should contain '1', not '2.1'.

Because of that, the whole dataset is changed. I hope someone can help me find a solution for this.
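
The shift appears when a row splits into a different number of pieces in each column, because the pieces of each column are stacked and renumbered independently of the row they came from. One possible way around it, sketched here under that assumption, is to keep each piece's original row and position and align the columns on both. explode_uneven and fill_forward are hypothetical names introduced for this illustration:

```python
import pandas as pd

def explode_uneven(df, cols, split_on=',', fill_forward=False):
    """Explode delimited string columns, aligning the pieces row by row."""
    pieces = {}
    for col in cols:
        # One piece per line; the original row label is repeated in the index
        s = df[col].str.split(split_on).explode().str.strip()
        # Position of each piece within its original row: 0, 1, 2, ...
        pos = s.groupby(level=0).cumcount()
        s.index = pd.MultiIndex.from_arrays([s.index, pos.to_numpy()],
                                            names=['row', 'pos'])
        pieces[col] = s
    # Align on (row, pos); a column with fewer pieces in a row gets NaN there
    exploded = pd.concat(pieces, axis=1).sort_index()
    if fill_forward:
        # Repeat the last available piece within each original row
        exploded = exploded.groupby(level='row').ffill()
    exploded = exploded.reset_index(level='pos', drop=True)
    return df.drop(columns=cols).join(exploded).reset_index(drop=True)

df = pd.DataFrame([['title1', 'publisher1', '1.1,1.2', '1'],
                   ['title2', 'publisher2', '2', '2.1,2.2']],
                  columns=['titel', 'publisher', 'print', 'electronic'])

padded = explode_uneven(df, ['print', 'electronic'])
filled = explode_uneven(df, ['print', 'electronic'], fill_forward=True)
print(padded)
print(filled)
```

With the default settings the shorter column is padded with NaN; with fill_forward=True the second title1 row repeats '1' in 'electronic', which matches the value expected above. Which of the two behaviours is wanted depends on the data, so treat both as options rather than the definitive fix.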
