pandas / pyspark: show dataframe as table with horizontal scroll in ipython notebook

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/43427138/

Date: 2020-09-14 03:23:56  Source: igfitidea


Tags: pandas, pyspark, ipython, jupyter-notebook, pyspark-sql

Asked by muon

A pyspark.sql.DataFrame displays messily with DataFrame.show() - lines wrap instead of scrolling.

(screenshot: DataFrame.show() output with wrapped lines)

but it displays properly with pandas.DataFrame.head:

(screenshot: pandas DataFrame.head output with horizontal scroll)

I tried these options


import IPython
IPython.auto_scroll_threshold = 9999

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from IPython.display import display

but no luck. However, the scroll does work when the notebook is used within the Atom editor with the jupyter plugin:

(screenshot: horizontal scroll working in Atom with the jupyter plugin)

Answered by muon

This is a workaround:

spark_df.limit(5).toPandas().head()

although I do not know the computational burden of this query. I am thinking limit() is not expensive. Corrections welcome.
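As a side note on the trailing .head(): since limit(5) already caps the collected result at five rows, the extra head() (default n=5) is effectively a no-op. A minimal pandas-only sketch of that point (no Spark session needed; the sample frame is a stand-in for the collected result):

```python
import pandas as pd

# Stand-in for the result of spark_df.limit(5).toPandas():
# at most five rows ever reach the driver.
collected = pd.DataFrame({"x": range(5)})

# head() defaults to n=5, so on a 5-row frame it returns the frame unchanged.
assert collected.head().equals(collected)
print(len(collected.head()))  # → 5
```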

Answered by Karan Singla

Just edit the css file and you are good to go.


  1. Open the jupyter notebook ../site-packages/notebook/static/style/style.min.css file.

  2. Search for white-space: pre-wrap;, and remove it.

  3. Save the file and restart jupyter-notebook.

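If you are unsure where that file lives, a small sketch (assuming the classic notebook package layout, which may differ across versions) can locate it from Python:

```python
import os
import importlib.util

# Locate the installed `notebook` package and build the expected css path.
spec = importlib.util.find_spec("notebook")
if spec is not None and spec.origin is not None:
    base = os.path.dirname(spec.origin)
    css_path = os.path.join(base, "static", "style", "style.min.css")
    print(css_path, "exists:", os.path.exists(css_path))
else:
    css_path = None
    print("notebook package not installed")
```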

Problem fixed. :)


Answered by Vijay Jangir

I'm not sure if anyone is still facing the issue, but it can be resolved by tweaking the page's styling using the browser's developer tools.

When you run show(), the output wraps:

(screenshot: wrapped show() output)

Open developer tools (F12), then inspect the element (ctrl+shift+c), click on the output, and uncheck the white-space attribute (see snapshot below).

(screenshot: unchecking the white-space property in developer tools)

You just need to do this setting once (unless you refresh the page).

This will show you the exact data natively as is. No need to convert to pandas.


Answered by Mbhatt

I created the little function below and it works fine:

from IPython.display import HTML

def printDf(sprkDF):
    # collect the Spark DataFrame to pandas and render it as an HTML table
    newdf = sprkDF.toPandas()
    return HTML(newdf.to_html())

you can use it straight on your spark queries if you like, or on any spark data frame:


printDf(spark.sql('''
select * from employee
'''))
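The HTML route works because pandas' to_html() emits a real table element, which the notebook renders as a table with horizontal overflow rather than wrapped preformatted text. A quick pandas-only check of the markup (the sample data is illustrative):

```python
import pandas as pd

sample = pd.DataFrame({"name": ["alice", "bob"], "score": [1, 2]})
html = sample.to_html()

# to_html() produces <table> markup, not preformatted text,
# so the browser lays it out as a table instead of wrapping lines.
print(html.splitlines()[0])
```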

Answered by tallamjr

Adding to the answers given above by @karan-singla and @vijay-jangir, a handy one-liner to comment out the white-space: pre-wrap styling can be done like so:

$ awk -i inplace '/pre-wrap/ {$0="/*"$0"*/"}1' $(dirname `python -c "import notebook as nb;print(nb.__file__)"`)/static/style/style.min.css

This translates as: use awk to update, in place, lines that contain pre-wrap so they become surrounded by /* ... */, i.e. commented out, in the style.min.css file found in your working Python environment.
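To see what the awk program does to each matching line, here is the same transformation sketched in Python (the sample lines are illustrative):

```python
def comment_out_pre_wrap(line):
    # Mirror the awk rule: wrap matching lines in /* */, pass others through.
    if "pre-wrap" in line:
        return "/*" + line + "*/"
    return line

print(comment_out_pre_wrap("white-space: pre-wrap;"))  # → /*white-space: pre-wrap;*/
print(comment_out_pre_wrap("color: red;"))             # unchanged
```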

This, in theory, can then be used as an alias if one uses multiple environments, say with Anaconda.

