pandas: Convert a list of tuples into a DataFrame

Disclaimer: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/24175369/

Time: 2020-09-13 22:09:19  Source: igfitidea

Convert a list of tuples into a dataframe in pandas

python pandas

Asked by user3720101

I have a list of tuples (y) that I wish to convert to a DataFrame x. There are five tuples in y, and each tuple has 33 elements. The first three elements of every tuple are text, and each is identical across all five tuples.

I'd like the first three elements in y to be the column names in the DataFrame. I want to convert the list of tuples into a 10 x 3 DataFrame. The tricky part is that row 1 in the DataFrame would be elements 4, 5, 6 in y[1], row 2 would be elements 7, 8, 9 in y[1], row 3 would be elements 10, 11, 12, and so on.
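
The row-to-element mapping described above is simple index arithmetic. A minimal sketch, assuming 1-based element numbers as in the question; the function name `elements_for_row` is made up for illustration:

```python
N_COLS = 3          # number of header fields, and of columns per row
TUPLE_LENGTH = 33   # elements per tuple, as stated in the question


def elements_for_row(row, n_cols=N_COLS):
    """Return the 1-based element numbers that land in the given 1-based row."""
    first = n_cols * row + 1
    return list(range(first, first + n_cols))


# Rows contributed by a single tuple once the header fields are skipped.
rows_per_tuple = (TUPLE_LENGTH - N_COLS) // N_COLS

print(elements_for_row(1))               # [4, 5, 6]
print(elements_for_row(2))               # [7, 8, 9]
print(elements_for_row(rows_per_tuple))  # [31, 32, 33]
```

Under these assumptions each tuple contributes ten rows of three values.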

y looks like this (not showing the entire list) :

List of tuples y
y[0]      y[1]      y[2]      y[3]      y[4]

Formula   Formula   Formula   Formula   Formula
Phase     Phase     Phase     Phase     Phase
Value     Value     Value     Value     Value
"a"       "a"       "a"       "a"       "a"
"nxxx"    "nxxx"    "nxxx"    "nxxx"    "nxxx"
3.2       3.7       22.4      18.2      9.7
"h45"     "h45"     "h45"     "h45"     "h45"
"cacpp"   "cacpp"   "cacpp"   "cacpp"   "cacpp"
45.2      61.76     101.2     171.89    203.7
"trx"     "trx"     "trx"     "trx"     "trx"
"v2o5p"   "v2o5p"   "v2o5p"   "v2o5p"   "v2o5p"
0.24      0.81      0.97      1.2       1.98
"blnt"    "blnt"    "blnt"    "blnt"    "blnt"
"g2o3"    "g2o3"    "g2o3"    "g2o3"    "g2o3"
807.2     905.8     10089     10345     10979

I want to convert y into DataFrame x as follows:

DataFrame x
column 1  column 2  column 3

Formula   Phase     Value
"a"       "nxxx"    3.2
"h45"     "cacpp"   45.2
"trx"     "v2o5p"   0.24
"blnt"    "g2o3"    807.2
"a"       "nxxx"    3.7
"h45"     "cacpp"   61.76
"trx"     "v2o5p"   0.81
"blnt"    "g2o3"    905.8
"a"       "nxxx"    22.4
"h45"     "cacpp"   101.2
"trx"     "v2o5p"   0.97
"blnt"    "g2o3"    10089
etc       etc       etc

I know there must be an easy way to iterate through the list of tuples, but I'm new to pandas and relatively new to Python, so I'm struggling to find a clean way to do this.

Answered by Happy001

Basically, you need to: 1) remove the first 3 elements of each tuple (you only need one copy as the column header), 2) concatenate all elements in y, and 3) reshape to 3 columns. All of these can be achieved with numpy, which you should be familiar with if you are using pandas.

#Steps 1) and 2) above (assumes `import numpy as np` and `import pandas as pd`).
In [83]: data = np.concatenate([z[3:] for z in y])

#reshape
In [84]: data = data.reshape(-1, 3)

#Now data is a numpy array that looks like what you need (every entry, including the numbers, is a string):
In [85]: data
Out[85]: 
array([['a', 'nxxx', '3.2'],
       ['h45', 'cacpp', '45.2'],
       ['trx', 'v2o5p', '0.24'],
       ['blnt', 'g2o3', '807.2'],
       ['a', 'nxxx', '3.7'],
       ['h45', 'cacpp', '61.76'],
       ['trx', 'v2o5p', '0.81'],
       ['blnt', 'g2o3', '905.8'],
       ['a', 'nxxx', '22.4'],
       ['h45', 'cacpp', '101.2'],
       ['trx', 'v2o5p', '0.97'],
       ['blnt', 'g2o3', '10089'],
       ['a', 'nxxx', '18.2'],
       ['h45', 'cacpp', '171.89'],
       ['trx', 'v2o5p', '1.2'],
       ['blnt', 'g2o3', '10345'],
       ['a', 'nxxx', '9.7'],
       ['h45', 'cacpp', '203.7'],
       ['trx', 'v2o5p', '1.98'],
       ['blnt', 'g2o3', '10979']], 
      dtype='|S6')
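
The string dtype in the output above means the numbers in the third column were converted to text. One possible way to keep the original Python objects is to build an object array before reshaping. This is only a sketch, run on a hypothetical shortened stand-in for `y` and assuming Python 3 with a recent NumPy:

```python
import numpy as np

# Hypothetical shortened stand-in for y (not the asker's full data).
header = ("Formula", "Phase", "Value")
y = [
    header + ("a", "nxxx", 3.2, "h45", "cacpp", 45.2),
    header + ("a", "nxxx", 3.7, "h45", "cacpp", 61.76),
]

# Mixed str/float tuples are coerced to a string dtype here.
as_text = np.concatenate([z[3:] for z in y]).reshape(-1, 3)

# dtype=object keeps each element as the original Python object.
as_objects = np.array([z[3:] for z in y], dtype=object).reshape(-1, 3)

print(as_text[0, 2], type(as_text[0, 2]))        # the value is now text
print(as_objects[0, 2], type(as_objects[0, 2]))  # the value is still a float
```

The object array trades NumPy's fast numeric storage for type fidelity, which is usually acceptable when the data is handed straight to a DataFrame.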

You can put data into a pandas DataFrame:

In [86]: df = pd.DataFrame(data, columns=y[0][:3])

In [87]: df
Out[87]: 
   Formula  Phase   Value
0        a   nxxx     3.2
1      h45  cacpp    45.2
2      trx  v2o5p    0.24
3     blnt   g2o3   807.2
4        a   nxxx     3.7
5      h45  cacpp   61.76
6      trx  v2o5p    0.81
7     blnt   g2o3   905.8
8        a   nxxx    22.4
9      h45  cacpp   101.2
10     trx  v2o5p    0.97
11    blnt   g2o3   10089
12       a   nxxx    18.2
13     h45  cacpp  171.89
14     trx  v2o5p     1.2
15    blnt   g2o3   10345
16       a   nxxx     9.7
17     h45  cacpp   203.7
18     trx  v2o5p    1.98
19    blnt   g2o3   10979
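
Because the values came from a string array, the Value column in a frame built this way holds text rather than numbers. A small sketch of converting it with `pd.to_numeric`, using a hypothetical two-row frame in place of the full result:

```python
import pandas as pd

# Hypothetical two-row stand-in for the DataFrame built from the string array.
df = pd.DataFrame(
    [["a", "nxxx", "3.2"], ["h45", "cacpp", "45.2"]],
    columns=["Formula", "Phase", "Value"],
)

# Convert the text column to a numeric dtype so arithmetic works on it.
df["Value"] = pd.to_numeric(df["Value"])

print(df.dtypes)
print(df["Value"].max())
```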

Answered by chrisb

Assuming some dummy data:

In [122]: y1 = ('Formula', 'Phase', 'Value', 1, 2, 3, 4, 5, 6)
In [123]: y2 = ('Formula', 'Phase', 'Value', 7, 8, 9, 10, 11, 12)
In [124]: y = [y1, y2]

And using the 'grouper' recipe from this answer to iterate by groups.

In [125]: from itertools import zip_longest  # named izip_longest on Python 2

In [126]: def grouper(iterable, n, fillvalue=None):
     ...:     args = [iter(iterable)] * n
     ...:     return zip_longest(*args, fillvalue=fillvalue)
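
The recipe works because the same iterator object is repeated n times, so each output tuple pulls n consecutive items from it, and a short final group is padded with `fillvalue`. A small sketch of that behavior, using the Python 3 spelling `zip_longest`:

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    # n references to ONE iterator: each output tuple consumes n items in order.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


even = list(grouper((1, 2, 3, 4, 5, 6), 3))
ragged = list(grouper((1, 2, 3, 4, 5, 6, 7), 3))

print(even)    # [(1, 2, 3), (4, 5, 6)]
print(ragged)  # [(1, 2, 3), (4, 5, 6), (7, None, None)]
```

When every tuple has a length that is a multiple of three after the header, as in the question, no padding occurs.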

Then you could do something like this. The call grouper(y_tuple[3:], 3) iterates over the tuple in groups of 3, excluding the first 3 elements.

In [127]: columns = y[0][:3]

In [128]: data = []
     ...: for y_tuple in y:
     ...:     for group_of_3 in grouper(y_tuple[3:], 3):
     ...:         data.append(list(group_of_3))
     ...:         

In [129]: data
Out[129]: [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

In [130]: pd.DataFrame(data=data, columns=columns)
Out[130]: 
   Formula  Phase  Value
0        1      2      3
1        4      5      6
2        7      8      9
3       10     11     12
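
For reference, the same idea can be wrapped in a small Python 3 helper. This is only a sketch: the name `tuples_to_frame` and its `n_cols` parameter are made up for illustration, and it uses plain slicing instead of `grouper`, which assumes every tuple's length is a multiple of `n_cols`.

```python
import pandas as pd


def tuples_to_frame(tuples, n_cols=3):
    """Hypothetical helper: the first n_cols items of each tuple are column names."""
    columns = list(tuples[0][:n_cols])
    rows = [
        t[i:i + n_cols]                         # one row of n_cols values
        for t in tuples
        for i in range(n_cols, len(t), n_cols)  # start after the header fields
    ]
    return pd.DataFrame(rows, columns=columns)


y1 = ('Formula', 'Phase', 'Value', 1, 2, 3, 4, 5, 6)
y2 = ('Formula', 'Phase', 'Value', 7, 8, 9, 10, 11, 12)
df = tuples_to_frame([y1, y2])
print(df)
```

Because the rows are built from the original Python objects, numeric values keep a numeric dtype in the resulting frame.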