What are the Python pandas equivalents of R functions like str(), summary(), and head()?

Disclaimer: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/27637281/

Time: 2020-08-19 02:02:52  Source: igfitidea

Tags: python, r, pandas

Asked by megashigger

I'm only aware of the describe() function. Are there other functions similar to str(), summary(), and head()?

Accepted answer by omer sagy

  • summary() ~ describe()
  • head() ~ head()

I'm not sure about the str() equivalent.

Answer by Wakaru44

I don't know much about R, but here are some leads:

str =>

A difficult one. For functions you can use dir(); calling dir() on a dataset lists all of its attributes and methods, so maybe that's not what you want.

summary => describe()

See its parameters to customize the results.
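
As a rough sketch of what "customizing the results" can look like: the data frame and its column names below are made up for illustration, while include and percentiles are existing describe() parameters.

```python
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, None],
    "sex": ["male", "female", "female", "male"],
})

numeric_summary = df.describe()               # numeric columns only by default
full_summary = df.describe(include="all")     # also summarizes non-numeric columns
custom = df.describe(percentiles=[0.1, 0.9])  # choose which percentiles to report

print(full_summary)
```

With include="all", rows such as unique, top and freq are filled in for the text column and left as NaN for the numeric one.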

head => you can use head(), or use slices.

Use head() as you already do. To get the first 10 rows of a dataset called ds, use ds[:10]; for the last 10 rows, the equivalent of tail, use ds[-10:].
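
A small sketch of the head/tail and slice forms side by side, assuming a hypothetical 25-row data frame named ds. Note that ds[:-10] is not a tail: it returns everything except the last 10 rows.

```python
import pandas as pd

ds = pd.DataFrame({"x": range(25)})  # hypothetical 25-row dataset

first_rows = ds.head(10)   # first 10 rows
first_slice = ds[:10]      # same rows via slicing
last_rows = ds.tail(10)    # last 10 rows
last_slice = ds[-10:]      # same rows via slicing
all_but_last = ds[:-10]    # everything except the last 10 rows

print(first_rows.equals(first_slice), last_rows.equals(last_slice))
```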

Answer by jjurach

This provides output similar to R's str(). It presents each column's unique values instead of its first values.

def rstr(df): return df.shape, df.apply(lambda x: [x.unique()])

print(rstr(iris))

((150, 5), sepal_length    [[5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.4, 4.8, 4.3,...
sepal_width     [[3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 2.9, 3.7,...
petal_length    [[1.4, 1.3, 1.5, 1.7, 1.6, 1.1, 1.2, 1.0, 1.9,...
petal_width     [[0.2, 0.4, 0.3, 0.1, 0.5, 0.6, 1.4, 1.5, 1.3,...
class            [[Iris-setosa, Iris-versicolor, Iris-virginica]]
dtype: object)
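
If you would rather see the first values, as R's str() does, a variant along these lines might work. This is only a sketch: rstr_head and the tiny iris_like frame are hypothetical names introduced here, not part of pandas or of the answer above.

```python
import pandas as pd

def rstr_head(df, n=5):
    """Return one row per column with its dtype and its first n values."""
    first_values = pd.Series(
        [df[col].head(n).tolist() for col in df.columns],
        index=df.columns,
        dtype=object,  # keep each cell as a plain Python list
    )
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "first_values": first_values,
    })

# Hypothetical stand-in for the iris data used above.
iris_like = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "species": ["setosa", "setosa", "setosa"],
})

summary = rstr_head(iris_like, n=2)
print(iris_like.shape)
print(summary)
```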

Answer by Martin Thoma

The pandas documentation offers an extensive "Comparison with R / R libraries" page. The most obvious difference is that R favors a functional style while pandas is object-oriented, with the DataFrame as the key object. Another difference between R and Python is that Python indexes from 0, whereas R indexes from 1.

R               | Pandas
-------------------------------
summary(df)     | df.describe()
head(df)        | df.head()
dim(df)         | df.shape
slice(df, 1:10) | df.iloc[:10]
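
To make the indexing difference concrete, here is a short sketch on a hypothetical 20-row frame. Python slices start at 0 and exclude the end position, so the first ten rows are positions 0 through 9.

```python
import pandas as pd

df = pd.DataFrame({"value": range(100, 120)})  # hypothetical 20-row frame

n_rows, n_cols = df.shape   # R: dim(df)
first_ten = df.iloc[:10]    # R: slice(df, 1:10); positions 0 through 9
third_row = df.iloc[2]      # R: df[3, ]; position 2 is the third row

print(n_rows, n_cols, len(first_ten))
```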

Answer by fubar2021

For a Python equivalent of R's str() function, I use the dtypes attribute. It gives the data type of each column.

In [22]: df2.dtypes
Out[22]: 
Survived      int64
Pclass        int64
Sex          object
Age         float64
SibSp         int64
Parch         int64
Ticket       object
Fare        float64
Cabin        object
Embarked     object
dtype: object

Answer by reedcourty

In pandas, the info() method produces output very similar to R's str():

> str(train)
'data.frame':   891 obs. of  13 variables:
 $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
 $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
 $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
 $ Name       : Factor w/ 891 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
 $ Sex        : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
 $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
 $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
 $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
 $ Ticket     : Factor w/ 681 levels "110152","110413",..: 524 597 670 50 473 276 86 396 345 133 ...
 $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
 $ Cabin      : Factor w/ 148 levels "","A10","A14",..: 1 83 1 57 1 1 131 1 1 1 ...
 $ Embarked   : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...
 $ Child      : num  0 0 0 0 0 NA 0 1 0 1 ...


train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB
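
Note that info() prints its report and returns None. If you want the text as a string (for logging, say), one option is the buf parameter, sketched here on a made-up three-row frame standing in for train:

```python
import io
import pandas as pd

# Hypothetical stand-in for the Titanic training data used above.
train = pd.DataFrame({
    "Age": [22.0, None, 26.0],
    "Sex": ["male", "female", "female"],
})

buffer = io.StringIO()
train.info(buf=buffer)          # write the report into the buffer instead of stdout
info_text = buffer.getvalue()

print(info_text)
```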

Answer by neves

I still prefer str() because it lists some example values. A confusing aspect of info() is that its behavior depends on environment settings such as pandas.options.display.max_info_columns.

I think the best alternative is to call info() with parameters that force a fixed behavior:

df.info(show_counts=True, verbose=True)  # pandas >= 1.2; older versions use null_counts=True
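
The dependence on display.max_info_columns can be seen in a small sketch like the one below. The frame and the info_text helper are hypothetical; the option is temporarily set to 1 so that a three-column frame exceeds it.

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, None], "c": ["x", "y"]})

def info_text(frame, **kwargs):
    """Hypothetical helper: capture frame.info() output as a string."""
    buf = io.StringIO()
    frame.info(buf=buf, **kwargs)
    return buf.getvalue()

# With more columns than the limit, the default call falls back to a short summary.
with pd.option_context("display.max_info_columns", 1):
    short = info_text(df)                 # truncated summary
    forced = info_text(df, verbose=True)  # per-column listing regardless of the option

print(short)
print(forced)
```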

And for your other functions:

summary(df)     | df.describe()
head(df)        | df.head()
dim(df)         | df.shape