Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/43427138/
pyspark show dataframe as table with horizontal scroll in ipython notebook
Asked by muon
A pyspark.sql.DataFrame displays messily with DataFrame.show(): lines wrap instead of scrolling.
It displays fine with pandas.DataFrame.head.
I tried these options:
import IPython
IPython.auto_scroll_threshold = 9999
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from IPython.display import display
But no luck. The scroll does work, however, when used within the Atom editor with the jupyter plugin:
Answered by muon
This is a workaround:
spark_df.limit(5).toPandas().head()
although I do not know the computational cost of this query. I am thinking limit() is not expensive. Corrections welcome.
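Even after converting, pandas itself may truncate a wide frame in the notebook. A minimal sketch of widening pandas' display options (the option names are part of the public pandas API; `spark_df` is the hypothetical Spark dataframe from the question):

```python
import pandas as pd

# Let pandas render every column instead of truncating wide frames.
pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)

# Hypothetical notebook usage: limit() caps how many rows the driver
# collects, so the full dataset is never pulled locally.
# spark_df.limit(5).toPandas().head()
```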
Answered by Karan Singla
Just edit the CSS file and you are good to go.
Open the Jupyter Notebook file
../site-packages/notebook/static/style/style.min.css
search for
white-space: pre-wrap;
and remove it. Save the file and restart jupyter-notebook.
Problem fixed. :)
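Finding that file by hand can be fiddly, since the ../site-packages path varies per environment. A small sketch that resolves it programmatically (assumes the classic notebook package; returns None when it is not installed):

```python
import importlib.util
import os

def notebook_css_path():
    """Return the path to style.min.css for the classic Jupyter
    Notebook, or None when the package is not installed."""
    spec = importlib.util.find_spec("notebook")
    if spec is None or spec.origin is None:
        return None
    return os.path.join(os.path.dirname(spec.origin),
                        "static", "style", "style.min.css")

print(notebook_css_path())
```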
Answered by Vijay Jangir
I'm not sure if anyone is still facing this issue, but it can be resolved by tweaking the page's styling with the browser's developer tools.
Open the developer tools (F12), then inspect element (Ctrl+Shift+C), click on the output, and uncheck the white-space attribute (see snapshot below).
You only need to do this setting once (unless you refresh the page).
This will show you the exact data natively, as is. No need to convert to pandas.
Answered by Mbhatt
I created the little function below and it works fine:
def printDf(sprkDF):
    newdf = sprkDF.toPandas()
    from IPython.display import display, HTML
    return HTML(newdf.to_html())
You can use it straight on your Spark queries if you like, or on any Spark data frame:
printDf(spark.sql('''
select * from employee
'''))
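A variant of the same idea, shown here as a sketch rather than the author's code: wrapping the rendered table in a div with overflow-x: auto gives the notebook a true horizontal scrollbar instead of wrapped rows. The function name is my own:

```python
def scrollable_html(table_html):
    """Wrap an HTML table so it scrolls horizontally instead of
    letting the notebook wrap long rows."""
    return ('<div style="overflow-x: auto; white-space: nowrap;">'
            + table_html
            + '</div>')

# Hypothetical notebook usage, assuming a Spark dataframe `spark_df`:
#   from IPython.display import HTML
#   HTML(scrollable_html(spark_df.limit(20).toPandas().to_html()))
```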
Answered by tallamjr
Adding to the answers given above by @karan-singla and @vijay-jangir, a handy one-liner to comment out the white-space: pre-wrap styling can be done like so:
$ awk -i inplace '/pre-wrap/ {$0="/*"$0"*/"}1' $(dirname `python -c "import notebook as nb;print(nb.__file__)"`)/static/style/style.min.css
This translates as: use awk to update, in place, lines that contain pre-wrap so that they are surrounded by /* -- */, i.e. commented out, in the style.min.css file found in your working Python environment.
In theory, this can then be used as an alias if one uses multiple environments, say with Anaconda.
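For those without GNU awk (the -i inplace flag is gawk-specific), the same edit can be sketched in Python. This version is scoped to the declaration rather than the whole line, and the function name is my own:

```python
import re

def comment_out_pre_wrap(css_text):
    """Wrap every `white-space: pre-wrap;` declaration in /* */ so
    browsers ignore it. Narrower than the awk one-liner, which
    comments out the entire matching line."""
    return re.sub(r"white-space:\s*pre-wrap;",
                  lambda m: "/*" + m.group(0) + "*/",
                  css_text)

print(comment_out_pre_wrap("td { white-space: pre-wrap; }"))
# → td { /*white-space: pre-wrap;*/ }
```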