Iteration over columns and rows in Pandas Dataframe
Original question: http://stackoverflow.com/questions/48951047/
Notice: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Asked by Notna
Say I have a dataframe that looks like:
import pandas as pd
d = {'option1': ['1', '0', '1', '1'], 'option2': ['0', '0', '1', '0'], 'option3': ['1', '1', '0', '0'], 'views': ['6', '10', '5', '2']}
df = pd.DataFrame(data=d)
print(df)
  option1 option2 option3 views
0       1       0       1     6
1       0       0       1    10
2       1       1       0     5
3       1       0       0     2
I'm trying to build a for loop that iterates over each column (except the column "views") and each row. If the value of a cell is not 0, I want to replace it with the corresponding value of the column "views" from the same row.
The following output is required (should be easier to understand):
  option1 option2 option3 views
0       6       0       6     6
1       0       0      10    10
2       5       5       0     5
3       2       0       0     2
I tried something like:
df_range = len(df)
for column in df:
    for i in range(df_range):
        if column != 0:
            column = df.views[i]
But I know I'm missing something; it does not work.
Also please note that in my real dataframe, I have dozens of columns, so I need something that iterates over each column automatically. Thanks!!
I saw the thread "Update a dataframe in pandas while iterating row by row", but it doesn't exactly apply to my problem, because I'm not only going row by row, I also need to go column by column.
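A minimal sketch of why the attempt above leaves the dataframe untouched, assuming a tiny two-column stand-in for the real data (the name `labels` is only illustrative): iterating over a DataFrame yields column labels rather than cells, and assigning to the loop variable only rebinds a local name.

```python
import pandas as pd

# Small stand-in for the question's dataframe (illustrative values)
df = pd.DataFrame({'option1': [1, 0], 'views': [6, 10]})

# Iterating over a DataFrame yields its column labels, not its cells
labels = [column for column in df]
print(labels)

for column in df:
    # `column` is a string such as 'option1', so it is never equal to 0
    assert column != 0
    # Rebinding the loop variable changes only the local name, not the DataFrame
    column = df.views[0]

# The dataframe still holds its original values
print(df)
```

To change a cell, the assignment has to target the dataframe itself, for example through `df.loc[row_label, column_label]`.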
Accepted answer by Keith Dowd
You can also achieve the result you want this way:
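for col in df:
    # Skip the 'views' column; only the option columns are updated
    if col == 'views':
        continue
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for i, row_value in df[col].items():
        # Assumes numeric values (e.g. after df = df.astype(int)); .loc avoids chained assignment
        df.loc[i, col] = row_value * df.loc[i, 'views']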
Notice the following about this solution:
1) This solution operates on each value in the dataframe individually and so is less efficient than broadcasting, because it's performing two loops (one outer, one inner).
2) This solution assumes that option1...option N are binary, because essentially it multiplies each binary value in option1...option N by the corresponding value in views.
3) This solution will work for any number of option columns. The option columns may have any labels you desire.
4) This solution assumes there is a column labeled views.
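The broadcasting alternative mentioned in point 1 might look like the sketch below. It assumes the question's sample data already converted to integers; `option_cols` and `result` are illustrative names that do not appear in the original answer.

```python
import pandas as pd

# Sample data from the question, already converted to integers
df = pd.DataFrame({
    'option1': [1, 0, 1, 1],
    'option2': [0, 0, 1, 0],
    'option3': [1, 1, 0, 0],
    'views': [6, 10, 5, 2],
})

# Every column except 'views' is treated as an option column
option_cols = df.columns.drop('views')

# Multiply all option columns by 'views' in one call;
# axis=0 aligns the 'views' Series with the rows
result = df.copy()
result[option_cols] = df[option_cols].mul(df['views'], axis=0)
print(result)
```

Like the loop, this relies on the option columns being binary, but it hands the per-cell work to pandas instead of two Python loops.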
Answer by YOLO
You don't need to iterate through rows. This one should be faster. First, ensure that the column values are integers.
## convert column type to integer
for i in df:
    df[i] = df[i].astype(int)
## update columns
for col in df:
    if col != 'views':
        df[col] = df[col] * df['views']
df
   option1  option2  option3  views
0        6        0        6      6
1        0        0       10     10
2        5        5        0      5
3        2        0        0      2
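The multiplication above gives the requested result only when the option columns hold nothing but 0 and 1. If they may hold other nonzero values, one possible sketch uses `DataFrame.mask` instead. The data here is hypothetical, and `option_cols` and `options` are illustrative names.

```python
import pandas as pd

# Hypothetical data: option values are nonzero but not always 1
df = pd.DataFrame({
    'option1': [3, 0, 1, 7],
    'option2': [0, 0, 2, 0],
    'views': [6, 10, 5, 2],
})

option_cols = df.columns.drop('views')
options = df[option_cols]

# Where a cell is nonzero, take the 'views' value from the same row;
# cells equal to zero are left as they are
df[option_cols] = options.mask(options != 0, df['views'], axis=0)
print(df)
```

This follows the question's wording directly (replace every nonzero cell with the row's views value) rather than depending on multiplication by 1.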
Answer by luqman ahmad
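import pandas as pd
dataSet = pd.read_excel("dataset.xlsx")
for column in dataSet:
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for item in dataSet[column].items():
        # Print each (index, value) pair of the 'views' column only
        if column == 'views':
            print(item)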
Answer by luqman ahmad
I think this would work:
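import numpy as np
df = df.astype(int)
# Binarize the option columns (every column except the last, assumed to be 'views')
df[df.columns[:-1]] = np.where(df[df.columns[:-1]] > 0, 1, 0)
# Multiply each option column row-wise by 'views'; to_numpy() replaces as_matrix(), removed in pandas 1.0
df[df.columns[:-1]] = df[df.columns[:-1]].mul(df['views'].to_numpy(), axis=0)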