pandas: Convert a list of tuples to a DataFrame in pandas

Question

Asked by user3720101

I have a list of tuples (y) that I want to convert to a DataFrame x. There are five tuples in y, and each tuple has 33 elements. Element 1 is text and is the same in all five tuples. Element 2 is text and is the same in all five tuples. Element 3 is also text and is the same in each tuple.

I'd like the first three elements of the tuples in y to be the column names of the DataFrame. I want to convert the list of tuples into a 10 x 3 DataFrame. The tricky part is that row 1 of the DataFrame would be elements 4, 5, 6 of y[1], row 2 would be elements 7, 8, 9 of y[1], row 3 would be elements 10, 11, 12, etc.

y looks like this (not showing the entire list) :

List of tuples y
y[0]     y[1]     y[2]     y[3]     y[4]

Formula  Formula  Formula  Formula  Formula
Phase    Phase    Phase    Phase    Phase
Value    Value    Value    Value    Value
"a"      "a"      "a"      "a"      "a"
"nxxx"   "nxxx"   "nxxx"   "nxxx"   "nxxx"
3.2      3.7      22.4     18.2     9.7
"h45"    "h45"    "h45"    "h45"    "h45"
"cacpp"  "cacpp"  "cacpp"  "cacpp"  "cacpp"
45.2     61.76    101.2    171.89   203.7
"trx"    "trx"    "trx"    "trx"    "trx"
"v2o5p"  "v2o5p"  "v2o5p"  "v2o5p"  "v2o5p"
0.24     0.81     0.97     1.2      1.98
"blnt"   "blnt"   "blnt"   "blnt"   "blnt"
"g2o3"   "g2o3"   "g2o3"   "g2o3"   "g2o3"
807.2    905.8    10089    10345    10979
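To experiment with the answers below, here is a minimal sketch that rebuilds the visible part of y in Python. Each column of the table is one tuple. Only 15 of the 33 elements per tuple are shown in the question, so this is a stand-in for the real data, and the names `header`, `labels` and `values` are hypothetical helpers.

```python
# Hypothetical reconstruction of the visible part of y (15 of 33 elements per tuple).
header = ("Formula", "Phase", "Value")
labels = [("a", "nxxx"), ("h45", "cacpp"), ("trx", "v2o5p"), ("blnt", "g2o3")]

# values[k][i] is the number paired with labels[k] in tuple y[i] (read off the table).
values = [
    [3.2, 3.7, 22.4, 18.2, 9.7],
    [45.2, 61.76, 101.2, 171.89, 203.7],
    [0.24, 0.81, 0.97, 1.2, 1.98],
    [807.2, 905.8, 10089, 10345, 10979],
]

# Tuple i = the 3 header names followed by (formula, phase, value) for each label.
y = [
    header + tuple(x for (f, p), v in zip(labels, values) for x in (f, p, v[i]))
    for i in range(5)
]

print(len(y), len(y[0]))  # 5 15
```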

I want to convert y into DataFrame x as follows:

DataFrame x
column 1  column 2  column 3

Formula   Phase     Value
"a"       "nxxx"    3.2
"h45"     "cacpp"   45.2
"trx"     "v2o5p"   0.24
"blnt"    "g2o3"    807.2
"a"       "nxxx"    3.7
"h45"     "cacpp"   61.76
"trx"     "v2o5p"   0.81
"blnt"    "g2o3"    905.8
"a"       "nxxx"    22.4
"h45"     "cacpp"   101.2
"trx"     "v2o5p"   0.97
"blnt"    "g2o3"    10089
etc       etc       etc

I know there must be an easy way to iterate through the list of tuples, but I'm new to pandas and relatively new to Python, so I'm struggling to find a clean way to do this.

Answer 1

Answered by Happy001

Basically, you need to:
1) remove the first 3 elements of each tuple (you only need one copy of them, as the column headers);
2) concatenate all the remaining elements of y;
3) reshape the result into 3 columns.
All of this can be done with numpy, which you should be familiar with if you are using pandas.

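In [81]: import numpy as np

In [82]: import pandas as pd

# Steps 1) and 2) above: drop the 3 header elements of each tuple, then concatenate.
In [83]: data = np.concatenate([z[3:] for z in y])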

# Step 3): reshape into 3 columns (-1 lets numpy infer the number of rows)
In [84]: data = data.reshape(-1, 3)

# Now data is a numpy array that looks like what you need:
In [85]: data
Out[85]: 
array([['a', 'nxxx', '3.2'],
       ['h45', 'cacpp', '45.2'],
       ['trx', 'v2o5p', '0.24'],
       ['blnt', 'g2o3', '807.2'],
       ['a', 'nxxx', '3.7'],
       ['h45', 'cacpp', '61.76'],
       ['trx', 'v2o5p', '0.81'],
       ['blnt', 'g2o3', '905.8'],
       ['a', 'nxxx', '22.4'],
       ['h45', 'cacpp', '101.2'],
       ['trx', 'v2o5p', '0.97'],
       ['blnt', 'g2o3', '10089'],
       ['a', 'nxxx', '18.2'],
       ['h45', 'cacpp', '171.89'],
       ['trx', 'v2o5p', '1.2'],
       ['blnt', 'g2o3', '10345'],
       ['a', 'nxxx', '9.7'],
       ['h45', 'cacpp', '203.7'],
       ['trx', 'v2o5p', '1.98'],
       ['blnt', 'g2o3', '10979']], 
      dtype='|S6')
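The `-1` in `reshape(-1, 3)` asks numpy to work out the number of rows from the total size. A small sketch with a flat array of 12 integers standing in for the concatenated tuple elements:

```python
import numpy as np

# Twelve values in one flat array, e.g. four (text, text, number) triples.
flat = np.arange(12)

# -1 tells numpy to infer that dimension: 12 elements / 3 columns = 4 rows.
grid = flat.reshape(-1, 3)
print(grid.shape)  # (4, 3)
print(grid[1])     # [3 4 5]
```

This only works when the total number of elements is a multiple of 3; otherwise `reshape` raises a `ValueError`. With 33 elements per tuple, dropping the 3 headers leaves 30, which divides evenly.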

You can then put data into a pandas DataFrame:

In [86]: df = pd.DataFrame(data, columns=y[0][:3])

In [87]: df
Out[87]: 
   Formula  Phase   Value
0        a   nxxx     3.2
1      h45  cacpp    45.2
2      trx  v2o5p    0.24
3     blnt   g2o3   807.2
4        a   nxxx     3.7
5      h45  cacpp   61.76
6      trx  v2o5p    0.81
7     blnt   g2o3   905.8
8        a   nxxx    22.4
9      h45  cacpp   101.2
10     trx  v2o5p    0.97
11    blnt   g2o3   10089
12       a   nxxx    18.2
13     h45  cacpp  171.89
14     trx  v2o5p     1.2
15    blnt   g2o3   10345
16       a   nxxx     9.7
17     h45  cacpp   203.7
18     trx  v2o5p    1.98
19    blnt   g2o3   10979
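Note that every column of this DataFrame, including Value, holds strings: `np.concatenate` has to coerce the mixed text/number tuples to a single string dtype (the `dtype='|S6'` in the output above). A minimal sketch of converting Value back to numbers with `pd.to_numeric`, assuming a shortened two-tuple version of y for brevity:

```python
import numpy as np
import pandas as pd

# Shortened stand-in for y: two tuples, two (text, text, number) triples each.
y = [
    ("Formula", "Phase", "Value", "a", "nxxx", 3.2, "h45", "cacpp", 45.2),
    ("Formula", "Phase", "Value", "a", "nxxx", 3.7, "h45", "cacpp", 61.76),
]

# Same three steps as above: drop headers, concatenate, reshape to 3 columns.
data = np.concatenate([t[3:] for t in y]).reshape(-1, 3)
df = pd.DataFrame(data, columns=list(y[0][:3]))

# The numbers were turned into strings by numpy; parse them back to floats.
df["Value"] = pd.to_numeric(df["Value"])
print(df.dtypes)
```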

Answer 2

Answered by chrisb

Assuming some dummy data:

In [122]: y1 = ('Formula', 'Phase', 'Value', 1, 2, 3, 4, 5, 6)
In [123]: y2 = ('Formula', 'Phase', 'Value', 7, 8, 9, 10, 11, 12)
In [124]: y = [y1, y2]

And using the 'grouper' recipe from this answer to iterate in groups:

In [125]: from itertools import zip_longest  # named izip_longest in Python 2

In [126]: def grouper(iterable, n, fillvalue=None):
     ...:     args = [iter(iterable)] * n
     ...:     return zip_longest(*args, fillvalue=fillvalue)
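The recipe works because `[iter(iterable)] * n` holds n references to the same iterator, so each step of `zip_longest` pulls n consecutive items from it. A minimal Python 3 sketch (Python 3's `itertools.zip_longest` is the renamed Python 2 `izip_longest`) showing the grouping and how an incomplete last group is padded:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # n references to the *same* iterator: each output tuple consumes n items.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

print(list(grouper((1, 2, 3, 4, 5, 6), 3)))  # [(1, 2, 3), (4, 5, 6)]

# If the length is not a multiple of n, the last group is padded with fillvalue.
print(list(grouper((1, 2, 3, 4, 5), 3)))     # [(1, 2, 3), (4, 5, None)]
```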

Then you could do something like this. grouper(y_tuple[3:], 3) iterates over the tuple in groups of 3, skipping the first 3 elements.

In [127]: columns = y[0][:3]

In [128]: data = []
     ...: for y_tuple in y:
     ...:     for group_of_3 in grouper(y_tuple[3:], 3):
     ...:         data.append(list(group_of_3))
     ...:         

In [129]: data
Out[129]: [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

In [130]: pd.DataFrame(data=data, columns=columns)
Out[130]: 
   Formula  Phase  Value
0        1      2      3
1        4      5      6
2        7      8      9
3       10     11     12
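A more compact variant of the same idea slices each tuple directly instead of using grouper. This is a sketch on the same dummy data; because the rows never pass through a numpy string array, the numbers keep their original int/float types.

```python
import pandas as pd

y1 = ("Formula", "Phase", "Value", 1, 2, 3, 4, 5, 6)
y2 = ("Formula", "Phase", "Value", 7, 8, 9, 10, 11, 12)
y = [y1, y2]

# For each tuple, take consecutive slices of 3, starting after the 3 headers.
rows = [list(t[i:i + 3]) for t in y for i in range(3, len(t), 3)]

df = pd.DataFrame(rows, columns=list(y[0][:3]))
print(df)
```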
