How to extract tuple values in a pandas DataFrame for use with matplotlib?
Source: http://stackoverflow.com/questions/36996785/
This page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors on Stack Overflow.
Asked by HP Peng
I have the following dataframe:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = np.arange(10)
x = np.concatenate((x, x))
y = []
for i in range(2):
    # random_integers is deprecated; randint excludes the upper bound,
    # so (0, 11) draws integers from the same inclusive range 0..10
    y.append(np.random.randint(0, 11, 20))
d = {'A': [(x[i], y[0][i]) for i in range(20)],
     'B': [(x[i], y[1][i]) for i in range(20)]}
df = pd.DataFrame(d, index=list('aaaaaaaaaabbbbbbbbbb'))
df
        A        B
a  (0, 2)  (0, 10)
a  (1, 0)   (1, 8)
a  (2, 3)   (2, 8)
a  (3, 7)   (3, 8)
a  (4, 8)  (4, 10)
a  (5, 2)   (5, 0)
a  (6, 1)   (6, 4)
a  (7, 3)   (7, 9)
a  (8, 4)   (8, 4)
a  (9, 4)  (9, 10)
b  (0, 0)   (0, 3)
b  (1, 2)  (1, 10)
b  (2, 8)   (2, 3)
b  (3, 1)   (3, 7)
b  (4, 6)   (4, 1)
b  (5, 8)   (5, 3)
b  (6, 1)   (6, 4)
b  (7, 1)   (7, 1)
b  (8, 2)   (8, 7)
b  (9, 9)   (9, 3)
How do I make the following plots?
Plot 1 is on column 'A' and has 2 lines (one for index = a, the other for index = b). The x values are the first elements of the tuples and the y values are the second elements.
Plot 2 is on column 'B'; the rest is the same as plot 1.
I cannot figure out how I can extract values from the tuples in the dataframe.
In addition, will groupby be helpful in this case?
In reality, I have about a thousand columns of data, 5 groups, each group ~500 rows. So I'm looking for a quick way to solve this (dataframe size ~2500 x 1000)
Thanks a lot
Answered by Alexander
Here is how to unpack your tuples using zip. The * unpacks the argument list of each column.
df['A.x'], df['A.y'] = zip(*df.A)
df['B.x'], df['B.y'] = zip(*df.B)
>>> df.head()
        A       B  A.x  A.y  B.x  B.y
a  (0, 6)  (0, 0)    0    6    0    0
a  (1, 8)  (1, 4)    1    8    1    4
a  (2, 8)  (2, 5)    2    8    2    5
a  (3, 5)  (3, 2)    3    5    3    2
a  (4, 2)  (4, 4)    4    2    4    4
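To see what the * does in isolation, here is a minimal sketch on a plain list of tuples. The list `pairs` is made-up sample data, not taken from the question; the same regrouping happens when a DataFrame column of tuples is passed instead.

```python
# Hypothetical sample data, not taken from the question.
pairs = [(0, 6), (1, 8), (2, 8)]

# zip(*pairs) is equivalent to zip((0, 6), (1, 8), (2, 8)):
# it groups all first elements together and all second elements together.
xs, ys = zip(*pairs)

print(xs)  # (0, 1, 2)
print(ys)  # (6, 8, 8)
```

Each result is a tuple with one entry per row, which is why it can be assigned directly to a new column.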
Answered by jezrael
I think you can use indexing with str only:
df['a1'], df['a2'] = df['A'].str[0], df['A'].str[1]
df['b1'], df['b2'] = df['B'].str[0], df['B'].str[1]
print(df)
         A       B  a1  a2  b1  b2
a   (0, 5)  (0, 1)   0   5   0   1
a   (1, 0)  (1, 5)   1   0   1   5
a   (2, 3)  (2, 9)   2   3   2   9
a   (3, 3)  (3, 8)   3   3   3   8
a   (4, 7)  (4, 9)   4   7   4   9
a   (5, 9)  (5, 4)   5   9   5   4
a   (6, 3)  (6, 3)   6   3   6   3
a   (7, 5)  (7, 0)   7   5   7   0
a   (8, 2)  (8, 3)   8   2   8   3
a   (9, 4)  (9, 5)   9   4   9   5
b   (0, 7)  (0, 0)   0   7   0   0
b   (1, 6)  (1, 2)   1   6   1   2
b   (2, 8)  (2, 3)   2   8   2   3
b   (3, 8)  (3, 8)   3   8   3   8
b  (4, 10)  (4, 1)   4  10   4   1
b   (5, 1)  (5, 3)   5   1   5   3
b   (6, 6)  (6, 3)   6   6   6   3
b   (7, 7)  (7, 3)   7   7   7   3
b   (8, 7)  (8, 7)   8   7   8   7
b   (9, 8)  (9, 0)   9   8   9   0
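Neither answer shows the plotting step, so the following is one possible sketch rather than either author's method. It assumes a frame shaped like the one in the question (rebuilt here with random values), pulls the tuple elements out with the str indexing shown above, and uses groupby(level=0) to draw one line per index label. The Agg backend line is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption); remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (values are random).
rng = np.random.default_rng(0)
x = np.tile(np.arange(10), 2)
df = pd.DataFrame(
    {col: list(zip(x, rng.integers(0, 11, 20))) for col in ["A", "B"]},
    index=list("a" * 10 + "b" * 10),
)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, col in zip(axes, ["A", "B"]):
    # groupby(level=0) splits the rows by index label ('a' and 'b').
    for label, group in df.groupby(level=0):
        xs = group[col].str[0].astype(int)  # first tuple element
        ys = group[col].str[1].astype(int)  # second tuple element
        ax.plot(xs.to_numpy(), ys.to_numpy(), label=label)
    ax.set_title(col)
    ax.legend()
# Call plt.show() or fig.savefig(...) to view the result.
```

With about a thousand columns, the same inner loop could be run once per column, one figure or axes at a time, instead of adding extracted columns to the frame first.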