Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/34338897/

Time: 2020-08-19 14:50:19  Source: igfitidea

python selenium, find out when a download has completed?

python selenium

Asked by applecider

I've used Selenium to initiate a download. After the download is complete, certain actions need to be taken. Is there a simple way to find out when a download has completed? (I am using the Firefox driver.)

Accepted answer by alecxe

Selenium has no built-in way to wait for a download to complete.

The general idea is to wait until the file appears in your "Downloads" directory.

This can be achieved by polling in a loop, repeatedly checking whether the file exists.
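The code that accompanied this answer is not shown on this page, so the following is only a minimal sketch of the polling idea, not the answerer's original code. The function name `wait_for_file` and the `timeout` and `poll_interval` parameters are made up for illustration. Note that a file existing does not by itself prove the browser has finished writing it.

```python
import os
import time


def wait_for_file(file_path, timeout=30.0, poll_interval=0.5):
    """Return True once file_path exists as a regular file, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.isfile(file_path):
            return True
        time.sleep(poll_interval)
    # One last check so a file that appeared during the final sleep still counts.
    return os.path.isfile(file_path)
```

The deadline keeps the loop from running forever if the download never starts.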

Alternatively, use something like watchdog to monitor the directory.

Answer by kd88

With Chrome, files that have not finished downloading have the extension .crdownload. If you set your download directory properly, you can wait until the file you want no longer has this extension. In principle, this is not much different from waiting for the file to exist (as suggested by alecxe), but at least you can monitor progress this way.
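As a rough sketch of this idea, the loop below waits for one expected file and prints the size of any .crdownload file it sees along the way. The function name, parameters and progress message are hypothetical. It assumes the download directory is used for this download only, so any .crdownload file found there belongs to it.

```python
import time
from pathlib import Path


def wait_for_chrome_download(directory, filename, timeout=60.0, poll_interval=1.0):
    """Return True once `filename` exists and no .crdownload files remain."""
    target = Path(directory) / filename
    deadline = time.monotonic() + timeout
    while True:
        partials = list(Path(directory).glob("*.crdownload"))
        if not partials and target.is_file():
            return True
        for partial in partials:
            try:
                # The growing size of the partial file is a crude progress signal.
                print(f"{partial.name}: {partial.stat().st_size} bytes so far")
            except FileNotFoundError:
                pass  # renamed or removed between glob() and stat()
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```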

Answer by Prashanth Sridhar

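import os
import time

# Poll until no partial Chrome downloads (.crdownload) remain in the directory.
downloading = True
while downloading:
    names = os.listdir("directorypath")
    downloading = any(name.endswith(".crdownload") for name in names)
    if downloading:
        time.sleep(1)  # pause between checks instead of busy-looping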

This works if you are trying to check whether a set of files (more than one) has finished downloading.

Answer by Austin Mackillop

I came across this problem recently. I was downloading multiple files at once and had to build in a timeout in case the downloads failed.

The code checks the filenames in some download directory every second and exits once they are complete or if it takes longer than 20 seconds to finish. The returned download time was used to check if the downloads were successful or if it timed out.

import time
import os

def download_wait(path_to_downloads):
    seconds = 0
    dl_wait = True
    while dl_wait and seconds < 20:
        time.sleep(1)
        dl_wait = False
        for fname in os.listdir(path_to_downloads):
            if fname.endswith('.crdownload'):
                dl_wait = True
        seconds += 1
    return seconds

I believe this only works with Chrome, because its unfinished downloads end with the .crdownload extension. There may be a similar way to check in other browsers.

Edit: I recently changed the way I use this function to handle cases where .crdownload does not appear as the extension. Essentially, it now also waits for the correct number of files.

def download_wait(directory, timeout, nfiles=None):
    """
    Wait for downloads to finish with a specified timeout.

    Args
    ----
    directory : str
        The path to the folder where the files will be downloaded.
    timeout : int
        How many seconds to wait until timing out.
    nfiles : int, defaults to None
        If provided, also wait for the expected number of files.

    """
    seconds = 0
    dl_wait = True
    while dl_wait and seconds < timeout:
        time.sleep(1)
        dl_wait = False
        files = os.listdir(directory)
        if nfiles and len(files) != nfiles:
            dl_wait = True

        for fname in files:
            if fname.endswith('.crdownload'):
                dl_wait = True

        seconds += 1
    return seconds

Answer by C. Feng

As answered before, there is no native way to check whether a download has finished. So here is a helper function that does the job for Firefox and Chrome. One trick is to clear the temp download folder before starting a new download. Also, use the standard-library pathlib module for cross-platform paths.

from pathlib import Path

def is_download_finished(temp_folder):
    firefox_temp_file = sorted(Path(temp_folder).glob('*.part'))
    chrome_temp_file = sorted(Path(temp_folder).glob('*.crdownload'))
    downloaded_files = sorted(Path(temp_folder).glob('*.*'))
    if (len(firefox_temp_file) == 0) and \
       (len(chrome_temp_file) == 0) and \
       (len(downloaded_files) >= 1):
        return True
    else:
        return False
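The "clear the temp folder first" trick and the waiting loop around the check are not spelled out above. One possible sketch follows. `clear_folder` and `wait_until` are hypothetical helper names, and the timeout values are arbitrary assumptions.

```python
import time
from pathlib import Path


def clear_folder(folder):
    """Delete regular files directly inside folder (subfolders are left alone)."""
    for entry in Path(folder).iterdir():
        if entry.is_file():
            entry.unlink()


def wait_until(predicate, timeout=30.0, poll_interval=0.5):
    """Poll predicate(); return True once it is truthy, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Intended use with the helper above (not executed here):
#   clear_folder(temp_folder)
#   ... trigger the download with Selenium ...
#   finished = wait_until(lambda: is_download_finished(temp_folder), timeout=60)
```

Starting from an empty folder matters because `is_download_finished` treats any existing file as a completed download.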

Answer by DHS

I know it's late for an answer, but I would like to share a hack for future readers.

You can create a thread, say thread1, from the main thread and initiate your download there. Then create another thread, say thread2, and have it wait until thread1 completes by calling the join() method. From there, you can continue your flow of execution once the download has completed.

Still, make sure you don't initiate the download using Selenium; instead, extract the link using Selenium and download it with the requests module.

For example:

import threading

def downloadit():
    # download code here
    pass

def after_dwn():
    dwn_thread.join()  # waits until the download thread has finished executing
    # next chunk of code, to run after the download, goes here

dwn_thread = threading.Thread(target=downloadit)
dwn_thread.start()

metadata_thread = threading.Thread(target=after_dwn)
metadata_thread.start()

Answer by greencode

This worked for me:

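import os
from time import sleep

# some_path is the download directory
fileends = "crdownload"
while "crdownload" in fileends:
    sleep(1)
    fileends = "None"  # reset, then flag again if any partial file remains
    for fname in os.listdir(some_path):
        print(fname)
        if "crdownload" in fname:
            fileends = "crdownload"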

Answer by Jawad Ahmad Khan

Check for the "Unconfirmed" keyword in the file names in the download directory:

import os
import time

# wait for the download to complete
wait = True
while wait:
    names = os.listdir(r'\path\to\download directory')
    # keep waiting while any file name still contains 'Unconfirmed'
    wait = any('Unconfirmed' in fname for fname in names)
    if wait:
        print('downloading files ...')
        time.sleep(10)
print('finished downloading all files ...')

As soon as the file download is complete, it exits the while loop.

Answer by Martijn Witteveen

I got a better one though:

Pass in the function that starts the download, e.g. download_function = lambda: element.click()

Then check the number of files in the directory and wait for a new file that doesn't have the download extension. After that, rename it. (This can be changed to move the file instead of renaming it in the same directory.)

import os
from time import sleep

def save_download(self, directory, download_function, new_name, timeout=30):
    """
    Download a file and rename it
    :param directory: download location that is set
    :param download_function: function to start download
    :param new_name: the name that the new download gets
    :param timeout: number of seconds to wait for download
    :return: path to downloaded file
    """
    self.logger.info("Downloading " + new_name)
    files_start = os.listdir(directory)
    download_function()
    wait = True
    i = 0
    while (wait or len(os.listdir(directory)) == len(files_start)) and i < timeout * 2:
        sleep(0.5)
        i += 1  # count half-second polls so the timeout can actually trigger
        wait = False
        for file_name in os.listdir(directory):
            if file_name.endswith('.crdownload'):
                wait = True
    new_files = [name for name in os.listdir(directory) if name not in files_start]
    if wait or not new_files:
        # still downloading, or nothing new appeared before the timeout
        self.logger.warning("Documents not downloaded")
        raise TimeoutError("File not downloaded")
    self.logger.info("Downloading done")
    new_file = new_files[0]
    old_path = os.path.join(directory, new_file)
    new_path = os.path.join(directory, new_name)
    self.logger.info("New file found renaming " + new_file + " to " + new_name)
    while not os.access(old_path, os.W_OK):
        sleep(0.5)
        self.logger.info("Waiting for write permission")
    os.rename(old_path, new_path)
    return new_path  # the file now lives under its new name