pandas: duplicate rows based on the value in a different column

Question

Asked by MRA

I have a DataFrame of transactions. Each row represents a transaction of two items (think of it as a sale of 2 event tickets or something similar). I want to duplicate each row according to the quantity sold.

Here's example code:

import pandas as pd

# dictionary of transactions (note: every value is a string)

d = {
    '1': ['20',  'NYC', '2'],
    '2': ['30',  'NYC', '2'],
    '3': ['5',   'NYC', '2'],
    '4': ['300', 'LA',  '2'],
    '5': ['30',  'LA',  '2'],
    '6': ['100', 'LA',  '2']
}

columns=['Price', 'City', 'Quantity']

# create dataframe and rename columns

df = pd.DataFrame.from_dict(
    data=d, orient='index'
)
df.columns = columns

This produces a DataFrame that looks like this:

Price   City    Quantity
20       NYC         2
30       NYC         2
5        NYC         2
300      LA          2
30       LA          2
100      LA          2

So in the case above, each row should become two identical rows. If a row's 'Quantity' were 3, that row would become three identical rows.
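One detail worth checking first: because every value in d is a string, Quantity holds text such as '2' rather than the integer 2. A minimal sketch, assuming numeric Price and Quantity columns are wanted, converts them with pd.to_numeric before any duplication is done:

```python
import pandas as pd

# The question's data: every value, including Price and Quantity, is a string
d = {
    '1': ['20',  'NYC', '2'],
    '2': ['30',  'NYC', '2'],
    '3': ['5',   'NYC', '2'],
    '4': ['300', 'LA',  '2'],
    '5': ['30',  'LA',  '2'],
    '6': ['100', 'LA',  '2'],
}
df = pd.DataFrame.from_dict(data=d, orient='index')
df.columns = ['Price', 'City', 'Quantity']

# Replace the text columns with numeric versions; City stays text
for col in ['Price', 'Quantity']:
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
print(df['Quantity'].sum())  # total number of rows after duplication
```

The sum of Quantity (12 here) is the number of rows the duplicated frame should have, which makes a handy check on any of the answers below.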

Answer 1

Accepted answer by Alexander

First, I recreated your data using integers instead of text. I also varied the quantities so that the result is easier to follow.

import pandas as pd

d = {1: [20, 'NYC', 1], 2: [30, 'NYC', 2], 3: [5, 'SF', 3],
     4: [300, 'LA', 1], 5: [30, 'LA', 2],  6: [100, 'SF', 3]}

columns = ['Price', 'City', 'Quantity']
# create dataframe and rename columns

df = pd.DataFrame.from_dict(data=d, orient='index').sort_index()
df.columns = columns

>>> df
   Price City  Quantity
1     20  NYC         1
2     30  NYC         2
3      5   SF         3
4    300   LA         1
5     30   LA         2
6    100   SF         3

I then built a new DataFrame with a nested list comprehension that emits each row Quantity times.

# .ix was removed in pandas 1.0; .loc performs the same label-based lookup here
df_new = pd.DataFrame([df.loc[idx]
                       for idx in df.index
                       for _ in range(df.loc[idx, 'Quantity'])]).reset_index(drop=True)
>>> df_new
    Price City  Quantity
0      20  NYC         1
1      30  NYC         2
2      30  NYC         2
3       5   SF         3
4       5   SF         3
5       5   SF         3
6     300   LA         1
7      30   LA         2
8      30   LA         2
9     100   SF         3
10    100   SF         3
11    100   SF         3
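Since .ix no longer exists in current pandas, a quick way to confirm the comprehension still behaves as intended with .loc is to compare it against the vectorised Index.repeat approach from the next answer. A minimal check, assuming the integer data from this answer:

```python
import pandas as pd

d = {1: [20, 'NYC', 1], 2: [30, 'NYC', 2], 3: [5, 'SF', 3],
     4: [300, 'LA', 1], 5: [30, 'LA', 2], 6: [100, 'SF', 3]}
df = pd.DataFrame.from_dict(data=d, orient='index').sort_index()
df.columns = ['Price', 'City', 'Quantity']

# Row by row: emit each row Quantity times (.loc replaces the removed .ix)
df_loop = pd.DataFrame([df.loc[idx]
                        for idx in df.index
                        for _ in range(df.loc[idx, 'Quantity'])]).reset_index(drop=True)

# Vectorised: repeat each index label Quantity times, then select those labels
df_vec = df.loc[df.index.repeat(df['Quantity'])].reset_index(drop=True)

# Compare cell values only; the two frames may differ in column dtypes
same = df_loop.values.tolist() == df_vec.values.tolist()
print(len(df_loop), same)  # 12 True
```

The comprehension builds one Series per output row, so it slows down on large frames; the repeat version does the duplication in a single indexing call.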

Answer 2

Answer by YOBEN_S

Use Index.repeat to repeat each index label Quantity times, then select those labels with .loc:

# cast explicitly: in the question's data, Quantity is stored as text
df.loc[df.index.repeat(df['Quantity'].astype(int))]
Out[448]: 
  Price City Quantity
1    20  NYC        2
1    20  NYC        2
2    30  NYC        2
2    30  NYC        2
3     5  NYC        2
3     5  NYC        2
4   300   LA        2
4   300   LA        2
5    30   LA        2
5    30   LA        2
6   100   LA        2
6   100   LA        2
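Index.repeat takes one count per label, so each label (and therefore each row selected through .loc) appears that many times, and a count of 0 drops the row entirely. A small sketch on a hypothetical two-row frame, adding reset_index(drop=True) for when a fresh 0..n-1 index is preferred over the repeated labels shown above:

```python
import pandas as pd

idx = pd.Index(['a', 'b', 'c'])
# Each label is repeated by the matching count: 'a' once, 'b' twice, 'c' zero times
print(idx.repeat([1, 2, 0]).tolist())  # ['a', 'b', 'b']

# Hypothetical frame with integer quantities
df = pd.DataFrame({'Price': [20, 30], 'City': ['NYC', 'LA'], 'Quantity': [1, 3]})

# .loc with the repeated labels duplicates the rows; reset_index gives a fresh index
out = df.loc[df.index.repeat(df['Quantity'])].reset_index(drop=True)
print(out)
```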

Answer 3

Answer by Dickster

How about this approach? I changed your data slightly to include a sale of 4 tickets.

We use a helper np.ones() array of a suitable size, and the key line of code is: a[np.arange(a.shape[1])[:] > a[:,0,np.newaxis]] = 0

I picked up this technique from the question "numpy - update values using slicing given an array value".

After that, it's simply a call to .stack() and some basic filtering.

import numpy as np
import pandas as pd

d = {'1': ['20', 'NYC', '2'], '2': ['30', 'NYC', '2'], '3': ['5', 'NYC', '2'],
     '4': ['300', 'LA', '2'], '5': ['30', 'LA', '4'], '6': ['100', 'LA', '2']}

columns = ['Price', 'City', 'Quantity']
df = pd.DataFrame.from_dict(data=d, orient='index')
df.columns = columns
df['Quantity'] = df['Quantity'].astype(int)

# make a ones array: one row per transaction, one column per possible ticket
my_ones = np.ones(shape=(len(df), df['Quantity'].max()))

# turn my_ones into a DataFrame with the same index as df so it can be joined
# onto the right-hand side (there are plenty of other ways to achieve this)
df_my_ones = pd.DataFrame(data=my_ones, index=df.index)

df = df.join(df_my_ones)

which looks like:

  Price City  Quantity  0  1  2  3
1    20  NYC         2  1  1  1  1
3     5  NYC         2  1  1  1  1
2    30  NYC         2  1  1  1  1
5    30   LA         4  1  1  1  1
4   300   LA         2  1  1  1  1

Now get the Quantity column and the ones columns into a NumPy array:

a = df.iloc[:,2:].values

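This is the clever bit: it compares each column position with that row's Quantity and sets every flag beyond the first Quantity flags to zero: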

a[np.arange(a.shape[1])[:] > a[:,0,np.newaxis]] = 0
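To see why that line works, here is the same expression applied to a tiny hypothetical array whose first column is Quantity and whose remaining columns are the ones flags. np.arange(a.shape[1]) is a row of column positions and a[:, 0, np.newaxis] is a column of quantities, so NumPy broadcasting compares every position with every row's Quantity. Position 0 (the Quantity itself) is never zeroed for a non-negative Quantity, because 0 > Quantity is always false. The [:] in the original line is a no-op and is omitted here:

```python
import numpy as np

# Hypothetical input: column 0 is Quantity, columns 1..4 are the ones flags
a = np.array([[2, 1, 1, 1, 1],
              [4, 1, 1, 1, 1]])

cols = np.arange(a.shape[1])   # column positions, shape (5,)
qty = a[:, 0, np.newaxis]      # each row's Quantity as a column vector, shape (2, 1)
mask = cols > qty              # broadcasts to shape (2, 5)

a[mask] = 0                    # zero every flag beyond the Quantity count
print(a)
# [[2 1 1 0 0]
#  [4 1 1 1 1]]
```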

Then assign the result back to df:

df.iloc[:,2:] = a

Now df looks like the following; notice that in each row only the first Quantity flag columns are still 1 and the rest are zero:

  Price City  Quantity  0  1  2  3
1    20  NYC         2  1  1  0  0
3     5  NYC         2  1  1  0  0
2    30  NYC         2  1  1  0  0
5    30   LA         4  1  1  1  1
4   300   LA         2  1  1  0  0

df.set_index(['Price', 'City', 'Quantity'], inplace=True)
df = df.stack().to_frame()
df.columns = ['sale_flag']
df.reset_index(inplace=True)
# keep only the rows whose sale flag is set (Python 3 print)
print(df.loc[df['sale_flag'] != 0, ['Price', 'City', 'Quantity']])

which produces:

Price City  Quantity
0     20  NYC         2
1     20  NYC         2
4      5  NYC         2
5      5  NYC         2
8     30  NYC         2
9     30  NYC         2
12    30   LA         4
13    30   LA         4
14    30   LA         4
15    30   LA         4
16   300   LA         2
17   300   LA         2
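The same broadcasting mask can also produce the repeated rows directly, without the join and stack steps. This is a variation on the approach above rather than the author's code, sketched on an integer version of this answer's data and assuming Quantity is already an integer column: np.nonzero on the mask returns each row's position once per kept flag, and those positions can be passed straight to .iloc.

```python
import numpy as np
import pandas as pd

# Integer version of this answer's data (one sale of 4 tickets)
df = pd.DataFrame({'Price': [20, 30, 5, 300, 30, 100],
                   'City': ['NYC', 'NYC', 'NYC', 'LA', 'LA', 'LA'],
                   'Quantity': [2, 2, 2, 2, 4, 2]})

q = df['Quantity'].to_numpy()
# True where the flag index is below the row's Quantity (same broadcasting trick)
mask = np.arange(q.max()) < q[:, np.newaxis]

# np.nonzero scans the mask row by row, so each row position appears
# Quantity times, in the original row order
rows = np.nonzero(mask)[0]
out = df.iloc[rows].reset_index(drop=True)
print(out)
```

The result matches df.loc[df.index.repeat(df['Quantity'])] here; the mask form is mainly useful if the flag matrix is needed for something else as well.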
