Scala: how to count the number of columns in a Spark Dataframe?

Disclaimer: this page is translated from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA terms and attribute it to the original authors (not me), citing the original source: http://stackoverflow.com/questions/51553569/


How to count the number of columns in a Spark Dataframe?

Tags: scala, apache-spark, dataframe, apache-spark-sql

Asked by Rahul Pandey

I have this dataframe in Spark and I want to count the number of available columns in it. I know how to count the number of rows, but I want to count the number of columns.

// Assumes a spark-shell / SparkSession named `spark`; toDF needs its implicits in scope.
import spark.implicits._

val df1 = Seq(
    ("spark", "scala", "2015-10-14", 10, "rahul"),
    ("spark", "scala", "2015-10-15", 11, "abhishek"),
    ("spark", "scala", "2015-10-16", 12, "Jay"),
    ("spark", "scala", null, 13, "Kiran"))
  .toDF("bu_name", "client_name", "date", "patient_id", "patient_name")
df1.show()
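
For reference, df1.show() on this data should print a table roughly like the one below (a sketch, assuming the column names above; exact padding depends on the Spark version), i.e. five columns and four rows:

+-------+-----------+----------+----------+------------+
|bu_name|client_name|      date|patient_id|patient_name|
+-------+-----------+----------+----------+------------+
|  spark|      scala|2015-10-14|        10|       rahul|
|  spark|      scala|2015-10-15|        11|    abhishek|
|  spark|      scala|2015-10-16|        12|         Jay|
|  spark|      scala|      null|        13|       Kiran|
+-------+-----------+----------+----------+------------+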

Can anybody tell me how I can count the number of columns in this dataframe? I am using the Scala language.

Answered by Shaido - Reinstate Monica

To count the number of columns, simply do:

df1.columns.size
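
Here columns returns the column names as an Array[String], so size gives their count. A minimal sketch of what a spark-shell session would show for the df1 defined in the question:

df1.columns       // Array(bu_name, client_name, date, patient_id, patient_name)
df1.columns.size  // Int = 5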

Answered by jillm_5

In Python, the following code worked for me:

print(len(df.columns))

Answered by Neville Lusimba

data.columns accesses the list of column titles. All you have to do is count the number of items in that list, so

len(df1.columns)

works. To obtain the full size (rows and columns) in a single variable, we do:

rows = df.count()
columns = len(df.columns)
size = (rows, columns)
print(size)
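
Since the question is tagged Scala, a rough Scala equivalent of this shape tuple could look like the following (a sketch, assuming the df1 built in the question):

val rows = df1.count()          // number of rows, a Long: 4
val cols = df1.columns.length   // number of columns, an Int: 5
val size = (rows, cols)
println(size)                   // (4,5)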

Answered by Kris

The length of the mutable indexed sequence also works:

df.columns.length
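
This is equivalent to the .size call from the first answer, since columns returns an Array[String] and size simply delegates to length for arrays; assuming the df1 from the question:

df1.columns.length  // 5, same value as df1.columns.size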

Answered by Saeid SOHEILY KHAH

To count the columns of a Spark DataFrame:

len(df1.columns)

and to count the number of rows of a DataFrame:

df1.count()

Answered by KeepLearning

In PySpark you can just do result.select("your column").count()

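Note that select("your column").count() returns the number of rows in that single-column selection rather than the number of columns; the analogous Scala call, assuming the df1 from the question and one of its real column names, would be:

df1.select("patient_id").count()   // 4 -- rows in the selected column, not the column count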