How to extract tuple values in a Pandas DataFrame for use with matplotlib?

Question

Asked by HP Peng

I have the following dataframe:

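import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# x runs 0..9 twice, once per group
x = np.arange(10)
x = np.concatenate((x, x))
y = []
for i in range(2):
    # random integers in [0, 10]; randint's upper bound is exclusive
    # (np.random.random_integers is deprecated and removed in recent NumPy)
    y.append(np.random.randint(0, 11, 20))

# each cell holds an (x, y) tuple
d = {'A': [(x[i], y[0][i]) for i in range(20)],
     'B': [(x[i], y[1][i]) for i in range(20)]}
df = pd.DataFrame(d, index=list('aaaaaaaaaabbbbbbbbbb'))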

df

    A        B
a  (0, 2)  (0, 10)
a  (1, 0)   (1, 8)
a  (2, 3)   (2, 8)
a  (3, 7)   (3, 8)
a  (4, 8)  (4, 10)
a  (5, 2)   (5, 0)
a  (6, 1)   (6, 4)
a  (7, 3)   (7, 9)
a  (8, 4)   (8, 4)
a  (9, 4)  (9, 10)
b  (0, 0)   (0, 3)
b  (1, 2)  (1, 10)
b  (2, 8)   (2, 3)
b  (3, 1)   (3, 7)
b  (4, 6)   (4, 1)
b  (5, 8)   (5, 3)
b  (6, 1)   (6, 4)
b  (7, 1)   (7, 1)
b  (8, 2)   (8, 7)
b  (9, 9)   (9, 3)

How do I make the following plots?

Plot 1 uses column 'A' and has two lines (one for index = a, the other for index = b). The x values are the first elements of the tuples and the y values are the second elements.

Plot 2 uses column 'B'; everything else is the same as plot 1.

I cannot figure out how I can extract values from the tuples in the dataframe.

In addition, will groupby be helpful in this case?

In reality, I have about a thousand columns of data in 5 groups, each group with ~500 rows, so I'm looking for a fast way to solve this (dataframe size ~2500 x 1000).

Thanks a lot

Answer 1

Answered by Alexander

Here is how to unpack your tuples using zip. The * unpacks each column into an argument list, so zip receives one argument per tuple and returns one sequence of first elements and one of second elements.

df['A.x'], df['A.y'] = zip(*df.A)
df['B.x'], df['B.y'] = zip(*df.B)

>>> df.head()
        A       B  A.x  A.y  B.x  B.y
a  (0, 6)  (0, 0)    0    6    0    0
a  (1, 8)  (1, 4)    1    8    1    4
a  (2, 8)  (2, 5)    2    8    2    5
a  (3, 5)  (3, 2)    3    5    3    2
a  (4, 2)  (4, 4)    4    2    4    4
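
To connect this to the plots asked for in the question, here is a minimal sketch, not part of the original answer. It assumes the same layout as the question (tuple columns 'A' and 'B', index labels 'a' and 'b'), rebuilds a small random frame so it runs on its own, unpacks each column with zip, and uses groupby(level=0) to draw one line per index label. The seed, figure size, and marker style are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (values are random).
rng = np.random.default_rng(0)
x = np.tile(np.arange(10), 2)
df = pd.DataFrame(
    {
        "A": list(zip(x, rng.integers(0, 11, 20))),
        "B": list(zip(x, rng.integers(0, 11, 20))),
    },
    index=list("a" * 10 + "b" * 10),
)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, col in zip(axes, ["A", "B"]):
    # Split the tuple column into separate x and y sequences.
    xs, ys = zip(*df[col])
    points = pd.DataFrame({"x": xs, "y": ys}, index=df.index)
    # One line per index label ('a' and 'b').
    for label, group in points.groupby(level=0):
        ax.plot(group["x"], group["y"], marker="o", label=f"index = {label}")
    ax.set_title(f"Column {col}")
    ax.legend()
fig.tight_layout()
```

In an interactive session, remove the matplotlib.use("Agg") line and call plt.show() at the end to display the two panels.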

Answer 2

Answered by jezrael

I think you can use indexing with str only:

df['a1'], df['a2'] = df['A'].str[0], df['A'].str[1]
df['b1'], df['b2'] = df['B'].str[0], df['B'].str[1]

print (df)
         A       B  a1  a2  b1  b2
a   (0, 5)  (0, 1)   0   5   0   1
a   (1, 0)  (1, 5)   1   0   1   5
a   (2, 3)  (2, 9)   2   3   2   9
a   (3, 3)  (3, 8)   3   3   3   8
a   (4, 7)  (4, 9)   4   7   4   9
a   (5, 9)  (5, 4)   5   9   5   4
a   (6, 3)  (6, 3)   6   3   6   3
a   (7, 5)  (7, 0)   7   5   7   0
a   (8, 2)  (8, 3)   8   2   8   3
a   (9, 4)  (9, 5)   9   4   9   5
b   (0, 7)  (0, 0)   0   7   0   0
b   (1, 6)  (1, 2)   1   6   1   2
b   (2, 8)  (2, 3)   2   8   2   3
b   (3, 8)  (3, 8)   3   8   3   8
b  (4, 10)  (4, 1)   4  10   4   1
b   (5, 1)  (5, 3)   5   1   5   3
b   (6, 6)  (6, 3)   6   6   6   3
b   (7, 7)  (7, 3)   7   7   7   3
b   (8, 7)  (8, 7)   8   7   8   7
b   (9, 8)  (9, 0)   9   8   9   0
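
Since the question mentions roughly a thousand tuple columns, writing one assignment per column does not scale. The following is a rough sketch, not taken from either answer, of splitting every column in one pass. The helper name split_tuple_columns and the 'x'/'y' labels are hypothetical, and it assumes every cell in every column is a 2-tuple. The result has two-level column labels such as ('A', 'x') and ('A', 'y').

```python
import pandas as pd


def split_tuple_columns(df, names=("x", "y")):
    """Split every tuple-valued column into one sub-column per tuple position.

    Hypothetical helper: assumes each cell is a tuple with len(names) items.
    """
    data = {}
    for col in df.columns:
        # zip(*column) transposes the tuples into one sequence per position.
        for name, values in zip(names, zip(*df[col])):
            data[(col, name)] = list(values)
    # Tuple keys become a two-level column index: (original column, position name).
    return pd.DataFrame(data, index=df.index)


# Small stand-in for the frame in the question.
df = pd.DataFrame(
    {
        "A": [(0, 2), (1, 0), (0, 5), (1, 7)],
        "B": [(0, 10), (1, 8), (0, 3), (1, 4)],
    },
    index=list("aabb"),
)

wide = split_tuple_columns(df)
print(wide)
```

Selecting wide["A"] then gives a frame with plain 'x' and 'y' columns for that original column, which can be grouped by index label and plotted.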
