Python Selenium: giving a file name when downloading

Question

I am working on a Selenium script in which I am trying to download an Excel file and give it a specific name.

Is there any way I can give the file being downloaded a specific name?

Code:

#!/usr/bin/python
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile

profile = FirefoxProfile()
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream")
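profile.set_preference("browser.download.folderList", 2)  # 2 = use the custom directory set below
profile.set_preference("browser.download.dir", r"C:\Downloads")  # raw string keeps the backslash literal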
browser = webdriver.Firefox(firefox_profile=profile)

browser.get('https://test.com/')
browser.find_element_by_partial_link_text("Excel").click() # Download file

Answer 1

Accepted answer by parishodak

You cannot specify the name of the downloaded file through Selenium. However, you can download the file, find the most recent file in the download folder, and rename it as you want.

Note: the methods below were borrowed from Google searches and may contain errors, but you get the idea.

import os
import shutil
# Initial_path is the download folder; pick the most recently created file in it
filename = max([os.path.join(Initial_path, f) for f in os.listdir(Initial_path)], key=os.path.getctime)
shutil.move(filename, os.path.join(Initial_path, r"newfilename.ext"))
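
The same idea can be sketched with pathlib. This is only a minimal illustration, not the answer author's code: the function name `rename_latest_file` is made up for this example, it picks the newest file by modification time (`st_mtime`) rather than `getctime`, and it assumes the download has already finished.

```python
import os
from pathlib import Path


def rename_latest_file(folder, new_name):
    """Rename the most recently modified file in `folder` to `new_name`.

    Hypothetical helper for illustration; assumes the download is complete.
    """
    folder = Path(folder)
    # Only consider regular files and skip hidden entries (names starting with a dot).
    candidates = [p for p in folder.iterdir() if p.is_file() and not p.name.startswith(".")]
    if not candidates:
        raise FileNotFoundError(f"no files found in {folder}")
    # Newest file by modification time (a swap for os.path.getctime).
    latest = max(candidates, key=lambda p: p.stat().st_mtime)
    target = folder / new_name
    # os.replace overwrites an existing file with the same name.
    os.replace(latest, target)
    return target
```

Because full paths are used throughout, the result does not depend on the current working directory.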

Answer 2

Answer by James Lemieux

You can download the file and name it at the same time using urlretrieve:

import urllib.request  # Python 3; on Python 2 use urllib.urlretrieve instead

url = browser.find_element_by_partial_link_text("Excel").get_attribute('href')
urllib.request.urlretrieve(url, "/choose/your/file_name.xlsx")

Answer 3

Answer by toshiro92

There is something I would correct in @parishodak's answer:

The filename there is only a relative path (just the name of the file), not the absolute path.

That is why @FreshRamen got the following error:

File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/genericpath.py", line 72, in getctime
    return os.stat(filename).st_ctime
OSError: [Errno 2] No such file or directory: '.localized'

Here is the corrected code:

import os
import shutil

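filepath = r'c:\downloads'
newfilename = 'newfilename.ext'  # placeholder: the name you want to give the file
# build full paths so that os.path.getctime can find each file
filename = max([os.path.join(filepath, f) for f in os.listdir(filepath)], key=os.path.getctime)
shutil.move(filename, os.path.join(filepath, newfilename))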

Answer 4

Answer by dmb

I hope this snippet is not too confusing. It took me a while to write and it is really useful, because there has been no clear answer to this problem using just this library.

import os
import time
def tiny_file_rename(newname, folder_of_download):
    filename = max([f for f in os.listdir(folder_of_download)], key=lambda xa :   os.path.getctime(os.path.join(folder_of_download,xa)))
    if '.part' in filename:
        time.sleep(1)
        os.rename(os.path.join(folder_of_download, filename), os.path.join(folder_of_download, newname))
    else:
        os.rename(os.path.join(folder_of_download, filename),os.path.join(folder_of_download,newname))

Hope this saves someone's day, cheers.

EDIT: Thanks to @Om Prakash for editing my code; it reminded me that I had not explained the code thoroughly.

Using the max([]) function on its own can lead to a race condition, leaving you with an empty or corrupted file (I know this from experience). You want to check that the file has been completely downloaded first. Selenium does not wait for the download to complete, so when you look for the most recently created file, an incomplete file can show up in the generated list and the code will try to move it. Even then, you are better off waiting a little for Firefox to release the file.
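
One way to make that check explicit is to poll the folder until no in-progress file is left. The sketch below is an illustration under stated assumptions, not part of the original answer: the name `wait_for_download` is made up, the download folder is assumed to hold no unrelated temporary files, and in-progress downloads are assumed to end in `.part` (Firefox) or `.crdownload` (Chrome).

```python
import time
from pathlib import Path

# Suffixes used for in-progress downloads (assumption: Firefox '.part', Chrome '.crdownload').
TEMP_SUFFIXES = (".part", ".crdownload")


def wait_for_download(folder, timeout=60.0, poll=0.5):
    """Return the newest finished file in `folder`, or raise TimeoutError.

    Hypothetical helper for illustration only.
    """
    folder = Path(folder)
    deadline = time.monotonic() + timeout
    while True:
        files = [p for p in folder.iterdir() if p.is_file()]
        in_progress = [p for p in files if p.name.endswith(TEMP_SUFFIXES)]
        # Done once at least one file exists and none of them is a temporary file.
        if files and not in_progress:
            return max(files, key=lambda p: p.stat().st_mtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"download in {folder} did not finish within {timeout}s")
        time.sleep(poll)
```

The returned path can then be renamed with os.rename or shutil.move as in the snippets above.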

EDIT 2: More Code

I was asked whether 1 second is enough time. Mostly it is, but if you need to wait longer than that, you could change the above code to this:

import os
import time
def tiny_file_rename(newname, folder_of_download, time_to_wait=60):
    time_counter = 0
    filename = max([f for f in os.listdir(folder_of_download)], key=lambda xa: os.path.getctime(os.path.join(folder_of_download, xa)))
    while '.part' in filename:
        time.sleep(1)
        time_counter += 1
        if time_counter > time_to_wait:
            raise Exception('Waited too long for file to download')
        # look up the newest file again; the .part file disappears once the download finishes
        filename = max([f for f in os.listdir(folder_of_download)], key=lambda xa: os.path.getctime(os.path.join(folder_of_download, xa)))
    os.rename(os.path.join(folder_of_download, filename), os.path.join(folder_of_download, newname))

Answer 5

Answer by supputuri

Here is another simple solution: wait until the download has completed, then get the downloaded file name from Chrome's downloads page.

Chrome:

# method to get the downloaded file name (assumes `import time` and an existing WebDriver named `driver`)
def getDownLoadedFileName(waitTime):
    driver.execute_script("window.open()")
    # switch to new tab
    driver.switch_to.window(driver.window_handles[-1])
    # navigate to chrome downloads
    driver.get('chrome://downloads')
    # define the endTime
    endTime = time.time()+waitTime
    while True:
        try:
            # get downloaded percentage
            downloadPercentage = driver.execute_script(
                "return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value")
            # check if downloadPercentage is 100 (otherwise the script will keep waiting)
            if downloadPercentage == 100:
                # return the file name once the download is completed
                return driver.execute_script("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content  #file-link').text")
        except:
            pass
        time.sleep(1)
        if time.time() > endTime:
            break

Firefox:

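# assumes: import time
#          from selenium.webdriver.support.ui import WebDriverWait
#          from selenium.webdriver.support import expected_conditions as EC
def getDownLoadedFileName(waitTime):
    handles = driver.window_handles  # handles that exist before the new tab is opened
    driver.execute_script("window.open()")
    # new_window_is_opened takes the previous handles and returns the condition to wait on
    WebDriverWait(driver, 10).until(EC.new_window_is_opened(handles))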
    driver.switch_to.window(driver.window_handles[-1])
    driver.get("about:downloads")

    endTime = time.time()+waitTime
    while True:
        try:
            fileName = driver.execute_script("return document.querySelector('#contentAreaDownloadsView .downloadMainArea .downloadContainer description:nth-of-type(1)').value")
            if fileName:
                return fileName
        except:
            pass
        time.sleep(1)
        if time.time() > endTime:
            break

Once you click on the download link/button, just call the above method.

# click on download link
browser.find_element_by_partial_link_text("Excel").click()
# get the downloaded file name
latestDownloadedFileName = getDownLoadedFileName(180)  # wait up to 3 minutes for the download to complete
print(latestDownloadedFileName)

Java + Chrome:

Here is the method in Java.

public String waitUntilDonwloadCompleted(WebDriver driver) throws InterruptedException {
    // store the current window handle
    String mainWindow = driver.getWindowHandle();

    // open a new tab
    JavascriptExecutor js = (JavascriptExecutor) driver;
    js.executeScript("window.open()");
    // switch to the newly opened tab
    for (String winHandle : driver.getWindowHandles()) {
        driver.switchTo().window(winHandle);
    }
    // navigate to chrome downloads
    driver.get("chrome://downloads");

    JavascriptExecutor js1 = (JavascriptExecutor) driver;
    // wait until the file is downloaded
    Long percentage = (long) 0;
    while (percentage != 100) {
        try {
            percentage = (Long) js1.executeScript("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value");
            //System.out.println(percentage);
        } catch (Exception e) {
            // nothing to do, just wait
        }
        Thread.sleep(1000);
    }
    // get the latest downloaded file name
    String fileName = (String) js1.executeScript("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content #file-link').text");
    // get the latest downloaded file url
    String sourceURL = (String) js1.executeScript("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content #file-link').href");
    // file downloaded location
    String donwloadedAt = (String) js1.executeScript("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div.is-active.focus-row-active #file-icon-wrapper img').src");
    System.out.println("Download deatils");
    System.out.println("File Name :-" + fileName);
    System.out.println("Donwloaded path :- " + donwloadedAt);
    System.out.println("Downloaded from url :- " + sourceURL);
    // print the details
    System.out.println(fileName);
    System.out.println(sourceURL);
    // close the downloads tab
    driver.close();
    // switch back to main window
    driver.switchTo().window(mainWindow);
    return fileName;
}

This is how to call it from your Java code.

// download triggering step 
downloadExe.click();
// now wait until the download finishes and then get the file name
System.out.println(waitUntilDonwloadCompleted(driver));

Output:

Download deatils
File Name :-RubyMine-2019.1.2 (7).exe
Donwloaded path :- chrome://fileicon/C%3A%5CUsers%5Csupputuri%5CDownloads%5CRubyMine-2019.1.2%20(7).exe?scale=1.25x
Downloaded from url :- https://download-cf.jetbrains.com/ruby/RubyMine-2019.1.2.exe
RubyMine-2019.1.2 (7).exe

Answer 6

Answer by Negrali Selest

Using @dmb's trick, I've made just one correction: after the .part check, below time.sleep(1), we must request the filename again. Otherwise, the line below will try to rename a .part file that no longer exists.

Answer 7

Answer by ePandit

Here is the code sample I used to download a PDF with a specific file name. First, configure the Chrome webdriver with the required options. Then, after clicking the button (which opens the PDF popup window), call a function that waits for the download to finish and renames the downloaded file.

import os
import time
import shutil

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

# function to wait for download to finish and then rename the latest downloaded file
def wait_for_download_and_rename(newFilename):
    # function to wait for all chrome downloads to finish
    def chrome_downloads(drv):
        if not "chrome://downloads" in drv.current_url: # if 'chrome downloads' is not current tab
            drv.execute_script("window.open('');") # open a new tab
            drv.switch_to.window(drv.window_handles[-1]) # switch to the new tab
            drv.get("chrome://downloads/") # navigate to chrome downloads
        return drv.execute_script("""
            return document.querySelector('downloads-manager')
            .shadowRoot.querySelector('#downloadsList')
            .items.filter(e => e.state === 'COMPLETE')
            .map(e => e.filePath || e.file_path || e.fileUrl || e.file_url);
            """)
    # wait for all the downloads to be completed
    dld_file_paths = WebDriverWait(driver, 120, 1).until(chrome_downloads) # returns list of downloaded file paths
    # Close the current tab (chrome downloads)
    if "chrome://downloads" in driver.current_url:
        driver.close()
    # Switch back to original tab
    driver.switch_to.window(driver.window_handles[0]) 
    # get latest downloaded file name and path
    dlFilename = dld_file_paths[0] # latest downloaded file from the list
    # wait till downloaded file appears in download directory
    time_to_wait = 20 # adjust timeout as per your needs
    time_counter = 0
    while not os.path.isfile(dlFilename):
        time.sleep(1)
        time_counter += 1
        if time_counter > time_to_wait:
            break
    # rename the downloaded file
    shutil.move(dlFilename, os.path.join(download_dir,newFilename))
    return

# specify custom download directory
download_dir = r'c:\Downloads\pdf_reports'

# for configuring chrome pdf viewer for downloading pdf popup reports
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', {
    "download.default_directory": download_dir, # Set own Download path
    "download.prompt_for_download": False, # Do not ask for download at runtime
    "download.directory_upgrade": True, # Also needed to suppress download prompt
    "plugins.plugins_disabled": ["Chrome PDF Viewer"], # Disable this plugin
    "plugins.always_open_pdf_externally": True, # Download PDFs instead of opening them in Chrome's viewer
    })

# get webdriver with options for configuring chrome pdf viewer
driver = webdriver.Chrome(options = chrome_options)

# open desired webpage
driver.get('https://mywebsite.com/mywebpage')

# click the button to open pdf popup
driver.find_element_by_id('someid').click()

# call the function to wait for download to finish and rename the downloaded file
wait_for_download_and_rename('My file.pdf')

# close the browser windows
driver.quit()

Adjust the timeout (120 seconds) passed to WebDriverWait to suit your needs.
