Spark DataFrame equivalent to Pandas DataFrame `.iloc()` method?

Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/37487170/

Date: 2020-09-14 01:18:10  Source: igfitidea

Spark DataFrame equivalent to Pandas DataFrame `.iloc()` method?

pandas · scala · apache-spark · dataframe · apache-spark-sql

Asked by conner.xyz

Is there a way to reference Spark DataFrame columns by position using an integer?

Analogous Pandas DataFrame operation:

df.iloc[:, 0]  # give me all the rows at column position 0

Answered by ShadyMBA

The equivalent of Python's `df.iloc` is `collect`.

PySpark examples:

X = df.collect()[0]['age']  # row 0, column 'age'

or

X = df.collect()[0][1]  # row 0, col 1
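
A caveat worth adding (not part of the original answer): `collect()` materializes the entire DataFrame on the driver. When only one row is needed, `first()` or `take(n)` fetch just that much; a minimal sketch, assuming the same `df` as above:

X = df.first()['age']   # row 0, column 'age', without collecting everything
X = df.take(1)[0][1]    # row 0, col 1 by position; take(1) fetches a single row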

Answered by zero323

Not really, but you can try something like this:

Python:

df = sc.parallelize([(1, "foo", 2.0)]).toDF()
df.select(*df.columns[:1])  # I assume [:1] is what you really want
## DataFrame[_1: bigint]

or

df.select(df.columns[1:3])
## DataFrame[_2: string, _3: double]
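
For completeness, a self-contained sketch of the same column-by-position pattern, assuming a modern `SparkSession` entry point named `spark` rather than the `sc` used above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame([(1, "foo", 2.0)], ["_1", "_2", "_3"])

df.select(*df.columns[:1]).show()   # first column only, by position
df.select(df.columns[1:3]).show()   # second and third columns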

Scala:

import org.apache.spark.sql.functions.col  // needed for col(_) below

val df = sc.parallelize(Seq((1, "foo", 2.0))).toDF()
df.select(df.columns.slice(0, 1).map(col(_)): _*)

Note:

Spark SQL doesn't support row indexing, and it is unlikely to ever support it, so it is not possible to index across the row dimension.
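
If positional row access is genuinely required, a common workaround (my addition, not something this answer endorses) is to attach an explicit index through the RDD API; a sketch, assuming `df` is a Spark DataFrame as above:

# Emulate row indexing by pairing each Row with a stable index.
# Note: this triggers a job over the data and an RDD round-trip.
indexed = df.rdd.zipWithIndex()              # (Row, index) pairs
row_1 = (indexed
         .filter(lambda pair: pair[1] == 1)  # keep only row index 1
         .map(lambda pair: pair[0])
         .collect())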

Answered by u6939919

You can use it like this in spark-shell:

scala> df.columns
Array[String] = Array(age, name)

scala> df.select(df.columns(0)).show()
+----+
| age|
+----+
|null|
|  30|
|  19|
+----+