Python: Pandas DataFrame for tuples
Note: this page reproduces a popular Stack Overflow question and its answer, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/37092187/
Asked by Rebin
Is this a correct way to create a DataFrame of tuples? (Assume that the tuples are created inside the code fragment.)
import pandas as pd
import numpy as np
import random

row = ['a', 'b', 'c']
col = ['A', 'B', 'C', 'D']

# use numpy for creating a ZEROS matrix
st = np.zeros((len(row), len(col)))
df2 = pd.DataFrame(st, index=row, columns=col)

# CONVERT each cell to an OBJECT for inserting tuples
for c in col:
    df2[c] = df2[c].astype(object)
print(df2)

for i in row:
    for j in col:
        # .at is used here because DataFrame.set_value was removed in pandas 1.0
        df2.at[i, j] = (i + j, np.round(random.uniform(0, 1), 4))
print(df2)
As you can see, I first created a zeros((3, 4)) matrix in NumPy and then converted each column to object dtype in pandas so that I could insert tuples. Is this the correct way to do it, or is there a better way to add tuples to, and retrieve them from, a matrix?
Results are fine:
   A  B  C  D
a  0  0  0  0
b  0  0  0  0
c  0  0  0  0

              A             B             C             D
a  (aA, 0.7134)   (aB, 0.006)  (aC, 0.1948)  (aD, 0.2158)
b  (bA, 0.2937)  (bB, 0.8083)  (bC, 0.3597)   (bD, 0.324)
c  (cA, 0.9534)  (cB, 0.9666)  (cC, 0.7489)  (cD, 0.8599)
Answered by unutbu
First, to answer your literal question: You can construct DataFrames from a list of lists. The values in the list of lists can themselves be tuples:
import numpy as np
import pandas as pd
np.random.seed(2016)
row = ['a','b','c']
col = ['A','B','C','D']
data = [[(i+j, round(np.random.uniform(0, 1), 4)) for j in col] for i in row]
df = pd.DataFrame(data, index=row, columns=col)
print(df)
yields
              A             B             C             D
a  (aA, 0.8967)  (aB, 0.7302)  (aC, 0.7833)  (aD, 0.7417)
b  (bA, 0.4621)  (bB, 0.6426)  (bC, 0.2249)  (bD, 0.7085)
c  (cA, 0.7471)  (cB, 0.6251)    (cC, 0.58)  (cD, 0.2426)
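Since the question also asks how to add and retrieve tuples, here is a minimal sketch of reading and overwriting a single cell with .at. The numbers 0.5 and 0.25 are placeholders chosen for illustration, not the random values printed above.

```python
import pandas as pd

row = ['a', 'b', 'c']
col = ['A', 'B', 'C', 'D']

# Placeholder probability 0.5 in every cell (hypothetical data).
data = [[(i + j, 0.5) for j in col] for i in row]
df = pd.DataFrame(data, index=row, columns=col)

# Retrieve one tuple by row and column label, then unpack it.
cell = df.at['b', 'C']
label, value = cell

# Overwrite the same cell with a new tuple; the column is already object dtype.
df.at['b', 'C'] = (label, 0.25)
updated = df.at['b', 'C']
print(cell, updated)
```

Using .at rather than .loc for single cells avoids pandas trying to spread the tuple across several cells.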
Having said that, beware that storing tuples in DataFrames dooms you to Python-speed loops. To take advantage of fast pandas/NumPy routines, you need to use native NumPy dtypes such as np.float64 (tuples, in contrast, require the "object" dtype).
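The dtype difference can be checked directly. The sketch below uses a small hand-written 2x2 example (hypothetical values) to compare a DataFrame of tuples with a DataFrame of plain floats, and to show that a column maximum on the tuple version has to go through a Python-level function.

```python
import numpy as np
import pandas as pd

idx, cols = ['a', 'b'], ['A', 'B']

# Tuples force every column to the generic object dtype.
tuples = pd.DataFrame([[('aA', 0.1), ('aB', 0.9)],
                       [('bA', 0.7), ('bB', 0.3)]], index=idx, columns=cols)

# Plain floats are stored as native float64.
numbers = pd.DataFrame([[0.1, 0.9],
                        [0.7, 0.3]], index=idx, columns=cols)

print(tuples.dtypes)
print(numbers.dtypes)

# Vectorized column maximum on the numeric frame.
col_max = numbers.max()

# The same result from the tuple frame needs a Python loop per column.
tuple_max = tuples.apply(lambda s: max(t[1] for t in s))
```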
So perhaps a better solution for your purpose is to use two separate DataFrames, one for the strings and one for the numbers:
import numpy as np
import pandas as pd
np.random.seed(2016)
row=['a','b','c']
col=['A','B','C','D']
prevstate = pd.DataFrame([[i+j for j in col] for i in row], index=row, columns=col)
prob = pd.DataFrame(np.random.uniform(0, 1, size=(len(row), len(col))).round(4),
                    index=row, columns=col)
print(prevstate)
#     A   B   C   D
# a  aA  aB  aC  aD
# b  bA  bB  bC  bD
# c  cA  cB  cC  cD
print(prob)
#         A       B       C       D
# a  0.8967  0.7302  0.7833  0.7417
# b  0.4621  0.6426  0.2249  0.7085
# c  0.7471  0.6251  0.5800  0.2426
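If the tuples already live in a single DataFrame, one possible way to split them into two frames is to map over each column. This is only a sketch: the name tuples and the constant 0.5 are placeholders for illustration, not part of the original answer.

```python
import pandas as pd

row = ['a', 'b', 'c']
col = ['A', 'B', 'C', 'D']

# Hypothetical existing DataFrame of (label, probability) tuples.
tuples = pd.DataFrame([[(i + j, 0.5) for j in col] for i in row],
                      index=row, columns=col)

# First tuple element of every cell: the string labels.
prevstate = pd.DataFrame({c: tuples[c].map(lambda t: t[0]) for c in col})

# Second tuple element of every cell, stored as native floats.
prob = pd.DataFrame({c: tuples[c].map(lambda t: t[1]) for c in col}).astype(float)

print(prevstate)
print(prob.dtypes)
```

The split itself still runs at Python speed, but it only has to happen once; later numeric work on prob can use vectorized routines.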
To loop through the columns, find the row with the maximum probability, and retrieve the corresponding prevstate, you could use .idxmax and .loc:
for col in prob.columns:
    idx = prob[col].idxmax()
    print('{}: {}'.format(prevstate.loc[idx, col], prob.loc[idx, col]))
yields
aA: 0.8967
aB: 0.7302
aC: 0.7833
aD: 0.7417
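The same lookup can also be done without an explicit loop. The sketch below is one possible variant, not part of the original answer: it rebuilds prob from the values printed earlier, calls idxmax on the whole frame, and uses NumPy positional indexing to pick the matching cells. The names pos, best_state, best_prob and result are introduced here for illustration.

```python
import numpy as np
import pandas as pd

row = ['a', 'b', 'c']
col = ['A', 'B', 'C', 'D']
prevstate = pd.DataFrame([[i + j for j in col] for i in row], index=row, columns=col)

# Values copied from the prob table printed earlier in the answer.
prob = pd.DataFrame([[0.8967, 0.7302, 0.7833, 0.7417],
                     [0.4621, 0.6426, 0.2249, 0.7085],
                     [0.7471, 0.6251, 0.5800, 0.2426]], index=row, columns=col)

# Row label of the maximum in every column (a Series indexed by column).
best_row = prob.idxmax()

# Positional version of the same lookup, used to index both frames at once.
pos = prob.to_numpy().argmax(axis=0)
cols = np.arange(prob.shape[1])
best_state = prevstate.to_numpy()[pos, cols]
best_prob = prob.to_numpy()[pos, cols]

result = pd.DataFrame({'row': best_row, 'prevstate': best_state, 'prob': best_prob})
print(result)
```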