pandas / pyspark: show dataframe as table with horizontal scroll in ipython notebook

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/43427138/

Date: 2020-09-14 03:23:56  Source: igfitidea


Tags: pandas, pyspark, ipython, jupyter-notebook, pyspark-sql

Asked by muon

A pyspark.sql.DataFrame displays messily with DataFrame.show() - lines wrap instead of scrolling.

(screenshot: DataFrame.show() output with wrapped lines)

but it displays properly with pandas.DataFrame.head:

(screenshot: pandas DataFrame.head output with horizontal scroll)

I tried these options


import IPython
IPython.auto_scroll_threshold = 9999

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from IPython.display import display

but no luck. However, the scroll does work when the notebook is used within the Atom editor with the jupyter plugin:

(screenshot: horizontal scroll working in Atom with the jupyter plugin)

Answered by muon

This is a workaround:

spark_df.limit(5).toPandas().head()

although I do not know the computational burden of this query. I am thinking limit() is not expensive. Corrections welcome.
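As a side note on the trailing .head(): since limit(5) already caps the collected result at five rows, the extra head() (default n=5) is effectively a no-op. A minimal pandas-only sketch of that point (no Spark session needed; the sample frame is a stand-in for the collected result):

```python
import pandas as pd

# Stand-in for the result of spark_df.limit(5).toPandas():
# at most five rows ever reach the driver.
collected = pd.DataFrame({"x": range(5)})

# head() defaults to n=5, so on a 5-row frame it returns the frame unchanged.
assert collected.head().equals(collected)
print(len(collected.head()))  # → 5
```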

Answered by Karan Singla

Just edit the css file and you are good to go.


  1. Open the jupyter notebook ../site-packages/notebook/static/style/style.min.css file.

  2. Search for white-space: pre-wrap;, and remove it.

  3. Save the file and restart jupyter-notebook.

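If you are unsure where that file lives, a small sketch (assuming the classic notebook package layout, which may differ across versions) can locate it from Python:

```python
import os
import importlib.util

# Locate the installed `notebook` package and build the expected css path.
spec = importlib.util.find_spec("notebook")
if spec is not None and spec.origin is not None:
    base = os.path.dirname(spec.origin)
    css_path = os.path.join(base, "static", "style", "style.min.css")
    print(css_path, "exists:", os.path.exists(css_path))
else:
    css_path = None
    print("notebook package not installed")
```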

Problem fixed. :)


Answered by Vijay Jangir

I'm not sure if anyone is still facing the issue, but it can be resolved by tweaking the page's styling using the browser's developer tools.

When you run show(), the output wraps:

(screenshot: wrapped show() output)

Open developer tools (F12), then inspect the element (ctrl+shift+c), click on the output, and uncheck the white-space attribute (see snapshot below).

(screenshot: unchecking the white-space property in developer tools)

You just need to do this setting once (unless you refresh the page).

This will show you the exact data natively as is. No need to convert to pandas.


Answered by Mbhatt

I created the little function below and it works fine:

from IPython.display import HTML

def printDf(sprkDF):
    # collect the Spark DataFrame to pandas and render it as an HTML table
    newdf = sprkDF.toPandas()
    return HTML(newdf.to_html())

you can use it straight on your spark queries if you like, or on any spark data frame:


printDf(spark.sql('''
select * from employee
'''))
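The HTML route works because pandas' to_html() emits a real table element, which the notebook renders as a table with horizontal overflow rather than wrapped preformatted text. A quick pandas-only check of the markup (the sample data is illustrative):

```python
import pandas as pd

sample = pd.DataFrame({"name": ["alice", "bob"], "score": [1, 2]})
html = sample.to_html()

# to_html() produces <table> markup, not preformatted text,
# so the browser lays it out as a table instead of wrapping lines.
print(html.splitlines()[0])
```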

Answered by tallamjr

Adding to the answers given above by @karan-singla and @vijay-jangir, a handy one-liner to comment out the white-space: pre-wrap styling can be done like so:

$ awk -i inplace '/pre-wrap/ {$0="/*"$0"*/"}1' $(dirname `python -c "import notebook as nb;print(nb.__file__)"`)/static/style/style.min.css

This translates as: use awk to update, in place, lines that contain pre-wrap so they become surrounded by /* ... */, i.e. commented out, in the style.min.css file found in your working Python environment.
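To see what the awk program does to each matching line, here is the same transformation sketched in Python (the sample lines are illustrative):

```python
def comment_out_pre_wrap(line):
    # Mirror the awk rule: wrap matching lines in /* */, pass others through.
    if "pre-wrap" in line:
        return "/*" + line + "*/"
    return line

print(comment_out_pre_wrap("white-space: pre-wrap;"))  # → /*white-space: pre-wrap;*/
print(comment_out_pre_wrap("color: red;"))             # unchanged
```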

This, in theory, can then be used as an alias if one uses multiple environments, say with Anaconda.

