Python Pandas scatter_matrix - plot categorical variables

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/28034424/

Date: 2020-08-19 02:38:23  Source: igfitidea

Tags: python, pandas, matplotlib, kaggle

Asked by Geoffrey Stoel

I am looking at the famous Titanic dataset from the Kaggle competition found here: http://www.kaggle.com/c/titanic-gettingStarted/data

I have loaded and processed the data using:

# import required libraries
import pandas as pd
import matplotlib.pyplot as plt
# scatter_matrix lives in pandas.plotting (pandas.tools.plotting in old releases)
from pandas.plotting import scatter_matrix

# IPython/Jupyter magic: render figures inline in the notebook
%matplotlib inline

# load the data from the file
df = pd.read_csv('./data/train.csv')

# colors used to plot Survived: red (= 0) or green (= 1)
colors = ['red', 'green']

# make a scatter plot matrix, coloring each point by survival
scatter_matrix(df, figsize=[20, 20], marker='x', c=df.Survived.apply(lambda x: colors[x]))

df.info()

[Figure: scatter_matrix output]

How can I add the categorical columns like Sex and Embarked to the plot?

Accepted answer by knightofni

You need to transform the categorical variables into numbers to plot them.

Example (assuming that the column 'Sex' holds the gender data, with 'M' for males and 'F' for females):

import numpy as np

# start with NaN everywhere, then fill in the known values
df['Sex_int'] = np.nan
df.loc[df['Sex'] == 'M', 'Sex_int'] = 0
df.loc[df['Sex'] == 'F', 'Sex_int'] = 1

Now all males are represented by 0 and females by 1. Unknown genders (if there are any) stay NaN and will be ignored.

The rest of your code should process the updated dataframe nicely.
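
The same idea can be applied to any categorical column without listing each value by hand. The following is only a sketch under assumptions: the `sample` frame is a made-up stand-in for a few rows of train.csv, and `encode_categorical` is a hypothetical helper name, not part of pandas. It relies on the category codes that pandas assigns, which follow the sorted order of the distinct values, so the numbers are arbitrary labels and not a meaningful scale.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of train.csv (values are made up).
sample = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male'],
    'Embarked': ['S', 'C', None, 'Q'],
})

def encode_categorical(column):
    """Return numeric codes for a column, keeping missing values as NaN."""
    codes = column.astype('category').cat.codes  # -1 marks missing values
    return codes.astype('float').where(codes >= 0)

sample['Sex_int'] = encode_categorical(sample['Sex'])
sample['Embarked_int'] = encode_categorical(sample['Embarked'])
```

Missing entries stay NaN here, which matches the behaviour of the explicit `.loc` assignments above.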

Answer by Geoffrey Stoel

After googling and remembering the .map() function, I fixed it in the following way:

import pandas as pd

colors = ['red', 'green']  # color codes for survived: 0 = red, 1 = green

# create mapping Series for gender so it can be plotted
gender = pd.Series([0, 1], index=['male', 'female'])
df['gender'] = df.Sex.map(gender)

# create mapping Series for Embarked so it can be plotted
embarked = pd.Series([0, 1, 2, 3], index=df.Embarked.unique())
df['embarked'] = df.Embarked.map(embarked)

# add survived also back to the df (target is defined outside this snippet)
df['survived'] = target

Now I can plot it again and drop the added columns afterwards.
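
A rough sketch of that last step, assuming a small made-up frame in place of the real Titanic data and a recent pandas where `scatter_matrix` is imported from `pandas.plotting`. The `Agg` backend line is only there so the sketch also runs outside a notebook. The sample data deliberately has no missing values, since the color list must have one entry per plotted point.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside a notebook
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical stand-in for the processed Titanic frame (values are made up).
df = pd.DataFrame({
    'Survived': [0, 1, 1, 0, 1, 0],
    'Age': [22.0, 38.0, 26.0, 35.0, 27.0, 54.0],
    'Sex': ['male', 'female', 'female', 'male', 'female', 'male'],
})
df['gender'] = df['Sex'].map({'male': 0, 'female': 1})

colors = ['red', 'green']  # 0 = red, 1 = green
point_colors = df['Survived'].map(lambda s: colors[s]).tolist()

# scatter_matrix only draws numeric columns, so 'Sex' is skipped
# while the encoded 'gender' column appears in the grid.
axes = scatter_matrix(df, figsize=[8, 8], marker='x', c=point_colors)

# Remove the helper column once the plot has been drawn.
df = df.drop(columns=['gender'])
```

Because the encoded column takes only two values, its panels show points stacked in two bands, which is expected for a categorical variable.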

Thanks, everyone, for responding.
