Python: Edit a pandas dataframe row-by-row
Note: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/20692122/
Edit pandas dataframe row-by-row
Asked by Jonas Lindeløv
pandas for Python is neat. I'm trying to replace a list-of-dictionaries with a pandas dataframe. However, I'm wondering if there's a way to change values row-by-row in a for-loop just as easily?
Here's the non-pandas dict-version:
trialList = [
    {'no':1, 'condition':2, 'response':''},
    {'no':2, 'condition':1, 'response':''},
    {'no':3, 'condition':1, 'response':''}
]  # ... and so on
for trial in trialList:
    # Do something and collect response
    trial['response'] = 'the answer!'
... and now trialList contains the updated values, because trial refers back to the dict stored in the list. Very handy! But the list-of-dicts is unwieldy, especially because I'd like to be able to compute things column-wise, which pandas excels at.
So given trialList from above, I thought I could make it even better by doing something pandas-like:
import pandas as pd
dfTrials = pd.DataFrame(trialList)  # makes a nice 3-column dataframe with 3 rows
for trial in dfTrials.iterrows():
    # do something and collect response
    trial[1]['response'] = 'the answer!'
... but trialList remains unchanged here. Is there an easy way to update values row-by-row, perhaps equivalent to the dict-version? It is important that it's row-by-row, as this is for an experiment where participants are presented with a lot of trials and various data are collected on each single trial.
Accepted answer by DSM
If you really want row-by-row ops, you could use iterrows and loc:
>>> for i, trial in dfTrials.iterrows():
...     dfTrials.loc[i, "response"] = "answer {}".format(trial["no"])
...
>>> dfTrials
   condition  no  response
0          2   1  answer 1
1          1   2  answer 2
2          1   3  answer 3
[3 rows x 3 columns]
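For readers who want to run the loop outside the REPL, here is a minimal, self-contained sketch. It makes two assumptions that are not in the answer: `collect_response` is a hypothetical placeholder for whatever happens during one trial, and the write uses `DataFrame.at` (a scalar accessor) where the answer uses `loc`. The pandas documentation warns that the row yielded by `iterrows` may be a copy, which is why the write goes through the frame rather than through `trial`.

```python
import pandas as pd

trialList = [
    {'no': 1, 'condition': 2, 'response': ''},
    {'no': 2, 'condition': 1, 'response': ''},
    {'no': 3, 'condition': 1, 'response': ''},
]
dfTrials = pd.DataFrame(trialList)


def collect_response(trial):
    # Hypothetical placeholder for running one trial of the experiment.
    return "answer {}".format(trial["no"])


for i, trial in dfTrials.iterrows():
    # `trial` is only read here; the write targets the frame by index label.
    dfTrials.at[i, "response"] = collect_response(trial)

responses = dfTrials["response"].tolist()
print(dfTrials)
```

Either `loc` or `at` should work for this single-cell write; `at` is simply the narrower tool when exactly one value is being set.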
Better, though, is to vectorize when you can:
>>> dfTrials["response 2"] = dfTrials["condition"] + dfTrials["no"]
>>> dfTrials
   condition  no  response  response 2
0          2   1  answer 1           3
1          1   2  answer 2           3
2          1   3  answer 3           4
[3 rows x 4 columns]
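As a rough illustration of what vectorizing means here, the sketch below computes whole columns at once, with no Python-level loop. The string-building line is an assumption added for illustration (the answer only shows the numeric sum); it reproduces the "answer N" values from the loop version.

```python
import pandas as pd

dfTrials = pd.DataFrame({
    'no': [1, 2, 3],
    'condition': [2, 1, 1],
    'response': ['', '', ''],
})

# Element-wise sum of two integer columns, as in the answer.
dfTrials["response 2"] = dfTrials["condition"] + dfTrials["no"]

# Illustrative extra: build the "answer N" strings column-wise.
dfTrials["response"] = "answer " + dfTrials["no"].astype(str)

sums = dfTrials["response 2"].tolist()
answers = dfTrials["response"].tolist()
print(dfTrials)
```

This only helps when every value can be computed from data already in the frame; responses collected live from a participant still need the row-by-row loop.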
And there's always apply:
>>> def f(row):
...     return "c{}n{}".format(row["condition"], row["no"])
...
>>> dfTrials["r3"] = dfTrials.apply(f, axis=1)
>>> dfTrials
   condition  no  response  response 2    r3
0          2   1  answer 1           3  c2n1
1          1   2  answer 2           3  c1n2
2          1   3  answer 3           4  c1n3
[3 rows x 5 columns]
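A standalone sketch of the `apply` approach, under the assumption that the frame has the same three columns as in the question. With `axis=1`, pandas calls the function once per row and collects the return values into a new Series; the column-wise expression at the end is an illustrative cross-check and is not part of the answer.

```python
import pandas as pd

dfTrials = pd.DataFrame({
    'no': [1, 2, 3],
    'condition': [2, 1, 1],
    'response': ['', '', ''],
})


def f(row):
    # `row` is a Series holding one row; columns are looked up by name.
    return "c{}n{}".format(row["condition"], row["no"])


# axis=1 means "apply f to each row" rather than to each column.
dfTrials["r3"] = dfTrials.apply(f, axis=1)

# Illustrative cross-check: the same labels built column-wise.
vectorized = "c" + dfTrials["condition"].astype(str) + "n" + dfTrials["no"].astype(str)

r3 = dfTrials["r3"].tolist()
print(dfTrials)
```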

