How to run a Python Jupyter notebook daily automatically
Original URL: http://stackoverflow.com/questions/48750055/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Betsy Curbelo
I have some code in a Python Jupyter notebook that I need to run every day. Is there a way to set this up? I would really appreciate any help.
Accepted answer by Icarus
For a more robust setup, it is better to combine the notebook with Airflow. I packaged them in a Docker image: https://github.com/michaelchanwahyan/datalab.
It is done by modifying an open source package, nbparameterize, and integrating passed arguments such as execution_date. Graphs can be generated on the fly. The output can be updated and saved inside the notebook.
When it is executed:
- the notebook is read and the parameters are injected
- the notebook is executed and the output overwrites the original path
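The two steps above can be sketched with the standard library alone, since a notebook file is a JSON document with a list of cells. This is only an illustration, not the actual code in the Docker image or in nbparameterize: the function names inject_parameters and inject_file are hypothetical, and placing the new cell after a cell tagged "parameters" borrows papermill's convention as an assumption. Executing the notebook afterwards would still need a tool such as nbconvert or papermill.

```python
import copy
import json


def inject_parameters(notebook: dict, params: dict) -> dict:
    """Return a copy of the notebook with a code cell that assigns params.

    The new cell goes right after the first cell tagged "parameters"
    (an assumed, papermill-style convention), or at the top if there is none.
    """
    nb = copy.deepcopy(notebook)
    # One "name = value" assignment per parameter, using Python literals.
    source = "\n".join(f"{name} = {value!r}" for name, value in params.items())
    new_cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": source,
    }
    position = 0
    for index, cell in enumerate(nb.get("cells", [])):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            position = index + 1
            break
    nb.setdefault("cells", []).insert(position, new_cell)
    return nb


def inject_file(path: str, params: dict) -> None:
    """Read a notebook file, inject params, and overwrite the same path."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(inject_parameters(notebook, params), handle, indent=1)
```

For example, inject_file("report.ipynb", {"execution_date": "2018-02-12"}) would add a cell containing execution_date = '2018-02-12' to a hypothetical report.ipynb before it is executed.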
Besides, the image also comes with common tools such as Spark, Keras, and TensorFlow installed and configured.
Answer by Zephro
Update
Recently I came across papermill, which is for executing and parameterizing notebooks.
https://github.com/nteract/papermill
papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
This seems better than nbconvert, because you can use parameters. You still have to trigger this command with a scheduler. Below is an example with cron on Ubuntu.
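If the papermill command is triggered from a Python script rather than typed by hand, the argument list can be built as in the sketch below. The helper names (build_papermill_command, dated_output_path, run_notebook) and the {date} placeholder are hypothetical; only the papermill INPUT OUTPUT -p NAME VALUE form is taken from the command above, and papermill is assumed to be installed and on the PATH.

```python
import subprocess
from datetime import date


def build_papermill_command(input_path, output_path, parameters):
    """Build the papermill argument list with one -p NAME VALUE per parameter."""
    command = ["papermill", input_path, output_path]
    for name, value in parameters.items():
        command += ["-p", name, str(value)]
    return command


def dated_output_path(template, run_date):
    """Fill a {date} placeholder so each daily run keeps its own output file."""
    return template.format(date=run_date.isoformat())


def run_notebook(input_path, output_template, parameters):
    """Run papermill once; raises CalledProcessError if the run fails."""
    output_path = dated_output_path(output_template, date.today())
    command = build_papermill_command(input_path, output_path, parameters)
    subprocess.run(command, check=True)
```

A scheduled script could then call run_notebook("local/input.ipynb", "output-{date}.ipynb", {"alpha": 0.6, "l1_ratio": 0.1}), which keeps one executed notebook per day instead of overwriting a single output.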
Old Answer
nbconvert --execute can execute a Jupyter notebook; embedded in a cron job, it will do what you want.
Example setup on Ubuntu:
Create yourscript.sh with the following content:
/opt/anaconda/envs/yourenv/bin/jupyter nbconvert \
--execute \
--to notebook /path/to/yournotebook.ipynb \
--output /path/to/yournotebook-output.ipynb
There are other output options besides --to notebook. I like this one because you get a fully executable "log" file afterwards.
I recommend using a virtual environment to run your notebook, so that future updates do not break your script. Do not forget to install nbconvert into the environment.
Now create a cron job that runs every day, e.g. at 5:10 AM, by typing crontab -e in your terminal and adding this line:
10 5 * * * /path/to/yourscript.sh
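The five fields before the command are minute, hour, day of month, month, and day of week, so 10 5 * * * means "at 05:10 every day". The sketch below checks whether a given time matches such an expression. It is a deliberately simplified illustration, not how cron itself is implemented: it only understands "*" and single integers (no ranges, lists, or steps), and it requires all five fields to match, which differs from cron when both day fields are restricted. The function names are hypothetical.

```python
from datetime import datetime


def field_matches(field, value):
    """Match one cron field that is either "*" or a single integer."""
    return field == "*" or int(field) == value


def cron_matches(expression, moment):
    """Return True if moment falls on the schedule given by the five fields."""
    minute, hour, day, month, weekday = expression.split()
    # cron numbers weekdays with Sunday as 0; isoweekday() gives Sunday as 7.
    cron_weekday = moment.isoweekday() % 7
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        and field_matches(weekday, cron_weekday)
    )
```

With this, cron_matches("10 5 * * *", datetime(2018, 2, 12, 5, 10)) is true, while the same expression does not match 05:11 on that day.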
Answer by Thabo
Try the SeekWell Chrome Extension. It lets you schedule notebooks to run weekly, daily, hourly, or every 5 minutes, right from Jupyter Notebooks. You can also send DataFrames directly to Sheets or Slack if you like.
Here's a demo video, and there is more info in the Chrome Web Store link above as well.
Disclosure: I'm a SeekWell co-founder.
Answer by Marc Wouts
Executing Jupyter notebooks with parameters is conveniently done with Papermill. I also find it convenient to share and version-control the notebook as either a Markdown file or a Python script with Jupytext. Then I convert the notebook to an HTML file with nbconvert. Typically my workflow looks like this:
cat world_facts.md \
| jupytext --from md --to ipynb --set-kernel - \
| papermill -p year 2017 \
| jupyter nbconvert --no-input --stdin --output world_facts_2017_report.html
To learn more about the above, including how to specify the Python environment in which the notebook is expected to run and how to use continuous integration on notebooks, have a look at my article "Automated reports with Jupyter Notebooks (using Jupytext and Papermill)", which you can read on Medium, GitHub, or Binder. Use the Binder link if you want to interactively test the outcome of the commands in the article.
Answer by Kanak Shrivastava
You can add the Jupyter notebook to a cron job:
0 * * * * /home/ec2-user/anaconda3/bin/python /home/ec2-user/anaconda3/bin/jupyter-notebook
You have to replace /home/ec2-user/anaconda3 with your Anaconda install location, and you can adjust the schedule in cron to suit your requirements.
Answer by Edu
As others have mentioned, papermill is the way to go. Papermill is essentially nbconvert with a few extra features.
If you want to handle a workflow of multiple notebooks that depend on one another, you can try Airflow's integration with papermill. If you are looking for something simpler that does not need a scheduler to run, you can try ploomber, which also integrates with papermill (Disclaimer: I'm the author).
Answer by gogasca
You may want to use the Google AI Platform Notebooks Scheduler service, currently in EAP.