Python: select non-null rows from a specific column in a DataFrame and take a sub-selection of other columns

Question

Asked by EdChum

I have a DataFrame with several columns, and I selected some of them to create a variable like this: xtrain = df[['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]. From these columns I want to drop every row where the Survive column in the main DataFrame is NaN.

Answer 1

Answered by EdChum

You can pass a boolean mask to your df, built from notnull() on the 'Survive' column, and select the columns of interest at the same time:

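In [2]:
# make some data
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 7), columns=['Survive', 'Age', 'Fare', 'Group_Size', 'deck', 'Pclass', 'Title'])
# set one 'Survive' value to NaN; a single .loc call avoids chained assignment
df.loc[2, 'Survive'] = np.nan
df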
Out[2]:
    Survive       Age      Fare  Group_Size      deck    Pclass     Title
0  1.174206 -0.056846  0.454437    0.496695  1.401509 -2.078731 -1.024832
1  0.036843  1.060134  0.770625   -0.114912  0.118991 -0.317909  0.061022
2       NaN -0.132394 -0.236904   -0.324087  0.570660  0.758084 -0.176421
3 -2.145934 -0.020003 -0.777785    0.835467  1.498284 -1.371325  0.661991
4 -0.197144 -0.089806 -0.706548    1.621260  1.754292  0.725897  0.860482

Now pass a mask to loc to take only the non-NaN rows:

In [3]:
xtrain = df.loc[df['Survive'].notnull(), ['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]
xtrain

Out[3]:
        Age      Fare  Group_Size      deck    Pclass     Title
0 -0.056846  0.454437    0.496695  1.401509 -2.078731 -1.024832
1  1.060134  0.770625   -0.114912  0.118991 -0.317909  0.061022
3 -0.020003 -0.777785    0.835467  1.498284 -1.371325  0.661991
4 -0.089806 -0.706548    1.621260  1.754292  0.725897  0.860482
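The mask-plus-loc step can also be checked on a small deterministic frame. This is only a sketch: the values and the reduced column list below are made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Small hand-made frame (illustrative values, not from the original post)
df = pd.DataFrame({
    "Survive": [1.0, 0.0, np.nan, 1.0],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

# Boolean mask: True where 'Survive' holds a value, False where it is NaN
mask = df["Survive"].notnull()

# Rows are filtered by the mask and columns by label in a single .loc call
xtrain = df.loc[mask, ["Age", "Fare"]]
print(xtrain)
```

The row labelled 2 is left out of xtrain because its 'Survive' value is NaN, and the original index labels of the remaining rows are preserved.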

Answer 2

Answered by piRSquared

Two alternatives because... well, why not?
Both drop the NaN rows before slicing the columns. That's two calls rather than EdChum's one call.

One

df.dropna(subset=['Survive'])[
    ['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]

Two

df.query('Survive == Survive')[
    ['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]
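The query version relies on the fact that NaN never compares equal to itself, so 'Survive == Survive' is False exactly on the NaN rows. A minimal sketch, using a made-up frame and a shortened column list (both are assumptions for illustration), that compares all three approaches:

```python
import numpy as np
import pandas as pd

# Illustrative data only; not taken from the original post
df = pd.DataFrame({
    "Survive": [1.0, np.nan, 0.0],
    "Age": [22.0, 38.0, 26.0],
    "Fare": [7.25, 71.28, 7.92],
})
cols = ["Age", "Fare"]

# NaN != NaN, so this element-wise comparison is False only where 'Survive' is NaN
self_equal = df["Survive"] == df["Survive"]

via_loc = df.loc[df["Survive"].notnull(), cols]    # mask + .loc (one call)
via_dropna = df.dropna(subset=["Survive"])[cols]   # drop rows, then slice columns
via_query = df.query("Survive == Survive")[cols]   # self-comparison inside query

# All three should select the same rows and columns
print(via_loc.equals(via_dropna) and via_loc.equals(via_query))
```

Note that query needs column names it can parse as identifiers (or backtick-quoted names), which holds for the columns used here.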
