Pandas, selecting by column and row

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/30033850/

Time: 2020-09-13 23:18:27  Source: igfitidea

Pandas, selecting by column and row

python pandas

Asked by Andrew Spott

I want to sum up all values that I select based on some function of column and row.

Another way of putting it is that I want to use a function of the row index and column index to determine if a value should be included in a sum along an axis.

Is there an easy way of doing this?

Answered by Haleemur Ali

Columns can be selected using the syntax dataframe[<list of columns>]. Rows can be filtered with a boolean mask built from the dataframe.index attribute.

import pandas as pd

df = pd.DataFrame({'a': [0.1, 0.2], 'b': [0.2, 0.1]})

odd_a = df['a'][df.index % 2 == 1]
even_b = df['b'][df.index % 2 == 0]
# odd_a: 
# 1    0.2
# Name: a, dtype: float64
# even_b: 
# 0    0.2
# Name: b, dtype: float64
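
Since the question asks for a sum, the selections above can be reduced with .sum(). The following is a minimal sketch under that assumption; the names odd_rows, even_rows, and total are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [0.1, 0.2], 'b': [0.2, 0.1]})

# Boolean masks derived from the row index (illustrative names)
odd_rows = df.index % 2 == 1
even_rows = df.index % 2 == 0

# Select column 'a' on odd rows and column 'b' on even rows, then sum each
odd_a_sum = df.loc[odd_rows, 'a'].sum()
even_b_sum = df.loc[even_rows, 'b'].sum()

# Combined sum of everything that was selected
total = odd_a_sum + even_b_sum
print(odd_a_sum, even_b_sum, total)
```

Passing the mask and the column label to .loc in a single call avoids the chained indexing of df['a'][mask], although both forms select the same values here.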

Answered by fixxxer

If df is your dataframe:

In [477]: df
Out[477]: 
   A   s2  B
0  1    5  5
1  2    3  5
2  4    5  5

You can access the odd rows like this:

In [478]: df.loc[1::2]
Out[478]: 
   A   s2  B
1  2    3  5

and the even ones like this:

In [479]: df.loc[::2]
Out[479]: 
   A   s2  B
0  1    5  5
2  4    5  5

To answer your question, getting the even rows of column B would be:

In [480]: df.loc[::2,'B']
Out[480]: 
0    5
2    5
Name: B, dtype: int64

and the odd rows of column A can be selected as:

In [481]: df.loc[1::2,'A']
Out[481]: 
1    2
Name: A, dtype: int64
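
One caveat worth noting: .loc slices by index label, so the examples above pick odd and even rows only because the frame has the default 0, 1, 2 index. A hedged sketch of the positional equivalent with .iloc follows; the string index ['x', 'y', 'z'] is a hypothetical choice made here to show the difference, and the sums are added because the question asks for them.

```python
import pandas as pd

# Same values as the frame above, but with a hypothetical non-default index
df = pd.DataFrame({'A': [1, 2, 4], 's2': [5, 3, 5], 'B': [5, 5, 5]},
                  index=['x', 'y', 'z'])

# .iloc works on positions, so column labels are converted with get_loc
even_B = df.iloc[::2, df.columns.get_loc('B')]   # rows at positions 0 and 2
odd_A = df.iloc[1::2, df.columns.get_loc('A')]   # row at position 1

print(even_B.sum())
print(odd_A.sum())
```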

Answered by Matt

I think this is fairly general, if not the cleanest implementation. It applies a separate condition to each column and then to each row, depending on which group the column or row belongs to (the groups are defined here in dictionaries).

import numpy as np
import pandas as pd

ran = np.random.randint(0,10,size=(5,5))
df = pd.DataFrame(ran,columns = ["a","b","c","d","e"])

# A dictionary to define what function is passed
d_col = {"high":["a","c","e"], "low":["b","d"]}
d_row = {"high":[1,2,3], "low":[0,4]}

# Generate list of Pandas boolean Series
i_col = [df[i].apply(lambda x: x>5) if i in d_col["high"] else df[i].apply(lambda x: x<5) for i in df.columns]

# Pass the series as a matrix
df = df[pd.concat(i_col,axis=1)]

# Now do this again for rows
i_row = [df.T[i].apply(lambda x: x>5) if i in d_row["high"] else df.T[i].apply(lambda x: x<5) for i in df.T.columns]

# Return back the DataFrame in original shape
df = df.T[pd.concat(i_row,axis=1)].T

# Perform the final operation such as sum on the returned DataFrame
print(df.sum().sum())
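
For the question as literally stated (deciding inclusion from the row index and column index rather than from the values), a more compact option may be to build a boolean mask with NumPy broadcasting and pass it to DataFrame.where. This is a sketch of an alternative technique, not the method used in the answer above; the include function and its rule are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('abcd'))

def include(row_pos, col_pos):
    # Hypothetical rule: keep cells where row position + column position is even
    return (row_pos + col_pos) % 2 == 0

# Broadcast a column vector of row positions against a row vector of column positions
rows = np.arange(df.shape[0])[:, None]
cols = np.arange(df.shape[1])[None, :]
mask = include(rows, cols)  # boolean array with the same shape as df

# where() keeps the masked-in cells and sets the rest to NaN, which sum() skips
col_sums = df.where(mask).sum(axis=0)  # one sum per column
row_sums = df.where(mask).sum(axis=1)  # one sum per row
total = df.to_numpy()[mask].sum()      # sum over all selected cells

print(col_sums)
print(row_sums)
print(total)
```

The rule here uses integer positions; if the decision should depend on index labels instead, the same broadcasting idea can be applied to df.index and df.columns converted to arrays.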