Spark DataFrame equivalent to Pandas DataFrame `.iloc()` method?

Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/37487170/

Date: 2020-09-14 01:18:10  Source: igfitidea

Spark DataFrame equivalent to Pandas DataFrame `.iloc()` method?

pandas · scala · apache-spark · dataframe · apache-spark-sql

Asked by conner.xyz

Is there a way to reference Spark DataFrame columns by position using an integer?

Analogous Pandas DataFrame operation:

df.iloc[:, 0]  # give me all the rows at column position 0

Answered by ShadyMBA

The equivalent of Python's `df.iloc` is `collect`.

PySpark examples:

X = df.collect()[0]['age']  # row 0, column 'age'

or

X = df.collect()[0][1]  # row 0, col 1
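
A caveat worth adding (not part of the original answer): `collect()` materializes the entire DataFrame on the driver. When only one row is needed, `first()` or `take(n)` fetch just that much; a minimal sketch, assuming the same `df` as above:

X = df.first()['age']   # row 0, column 'age', without collecting everything
X = df.take(1)[0][1]    # row 0, col 1 by position; take(1) fetches a single row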

Answered by zero323

Not really, but you can try something like this:

Python:

df = sc.parallelize([(1, "foo", 2.0)]).toDF()
df.select(*df.columns[:1])  # I assume [:1] is what you really want
## DataFrame[_1: bigint]

or

df.select(df.columns[1:3])
## DataFrame[_2: string, _3: double]
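
For completeness, a self-contained sketch of the same column-by-position pattern, assuming a modern `SparkSession` entry point named `spark` rather than the `sc` used above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame([(1, "foo", 2.0)], ["_1", "_2", "_3"])

df.select(*df.columns[:1]).show()   # first column only, by position
df.select(df.columns[1:3]).show()   # second and third columns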

Scala:

import org.apache.spark.sql.functions.col  // needed for col(_) below

val df = sc.parallelize(Seq((1, "foo", 2.0))).toDF()
df.select(df.columns.slice(0, 1).map(col(_)): _*)

Note:

Spark SQL doesn't support row indexing, and it is unlikely to ever support it, so it is not possible to index across the row dimension.
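
If positional row access is genuinely required, a common workaround (my addition, not something this answer endorses) is to attach an explicit index through the RDD API; a sketch, assuming `df` is a Spark DataFrame as above:

# Emulate row indexing by pairing each Row with a stable index.
# Note: this triggers a job over the data and an RDD round-trip.
indexed = df.rdd.zipWithIndex()              # (Row, index) pairs
row_1 = (indexed
         .filter(lambda pair: pair[1] == 1)  # keep only row index 1
         .map(lambda pair: pair[0])
         .collect())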

Answered by u6939919

You can use it like this in spark-shell:

scala> df.columns
Array[String] = Array(age, name)

scala> df.select(df.columns(0)).show()
+----+
| age|
+----+
|null|
|  30|
|  19|
+----+