Python Selenium 下载时给出文件名
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/34548041/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Selenium give file name when downloading
提问by
I am working with a selenium script where I am trying to download a Excel file and give it a specific name. This is my code:
我正在使用 selenium 脚本,我试图在其中下载 Excel 文件并为其指定特定名称。这是我的代码:
Is there anyway that I can give the file being downloaded a specific name ?
无论如何,我可以给正在下载的文件一个特定的名称吗?
Code:
代码:
#!/usr/bin/python
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
profile = FirefoxProfile()
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream")
profile.set_preference("browser.download.dir", "C:\Downloads" )
browser = webdriver.Firefox(firefox_profile=profile)
browser.get('https://test.com/')
browser.find_element_by_partial_link_text("Excel").click() # Download file
采纳答案by parishodak
You cannot specify name of download file through selenium. However, you can download the file, find the latest file in the downloaded folder, and rename as you want.
您不能通过 selenium 指定下载文件的名称。但是,您可以下载文件,在下载的文件夹中找到最新的文件,然后根据需要重命名。
Note: borrowed methods from google searches may have errors. but you get the idea.
注意:从谷歌搜索中借用的方法可能有错误。但你明白了。
import os
import shutil
filename = max([Initial_path + "\" + f for f in os.listdir(Initial_path)],key=os.path.getctime)
shutil.move(filename,os.path.join(Initial_path,r"newfilename.ext"))
回答by James Lemieux
You can download the file and name it at the same time using urlretrieve
:
您可以使用以下命令下载文件并同时命名它urlretrieve
:
import urllib
url = browser.find_element_by_partial_link_text("Excel").get_attribute('href')
urllib.urlretrieve(url, "/choose/your/file_name.xlsx")
回答by toshiro92
There is something i would correct for @parishodak answer:
对于@parishodak 的回答,我要纠正一些事情:
the filename here will only return the relative path (here the name of the file) not the absolute path.
这里的文件名只会返回相对路径(这里是文件名)而不是绝对路径。
That is why @FreshRamen got the following error after:
这就是为什么@FreshRamen 之后出现以下错误的原因:
File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/??python2.7/genericpath.py",
line 72, in getctime return os.stat(filename).st_ctime OSError:
[Errno 2] No such file or directory: '.localized'
There is the correct code:
有正确的代码:
import os
import shutil
filepath = 'c:\downloads'
filename = max([filepath +"\"+ f for f in os.listdir(filepath)], key=os.path.getctime)
shutil.move(os.path.join(dirpath,filename),newfilename)
回答by dmb
Hope this snippet is not that confusing. It took me a while to create this and is really useful, because there has not been a clear answer to this problem, with just this library.
希望这个片段不会那么令人困惑。我花了一段时间来创建这个,它真的很有用,因为这个问题没有明确的答案,只有这个库。
import os
import time
def tiny_file_rename(newname, folder_of_download):
filename = max([f for f in os.listdir(folder_of_download)], key=lambda xa : os.path.getctime(os.path.join(folder_of_download,xa)))
if '.part' in filename:
time.sleep(1)
os.rename(os.path.join(folder_of_download, filename), os.path.join(folder_of_download, newname))
else:
os.rename(os.path.join(folder_of_download, filename),os.path.join(folder_of_download,newname))
Hope this saves someone's day, cheers.
希望这可以挽救某人的一天,欢呼。
EDIT: Thanks to @Om Prakash editing my code, it made me remember that I didn't explain the code thoughly.
编辑:感谢@Om Prakash 编辑了我的代码,这让我想起了我没有详细解释代码。
Using the max([])
function could lead to a race condition, leaving you with empty or corrupted file(I know it from experience). You want to check if the file is completely downloaded in the first place. This is due to the fact that selenium don't wait for the file download to complete, so when you check for the last created file, an incomplete file will show up on your generated list and it will try to move that file. And even then, you are better off waiting a little bit for the file to be free from Firefox.
使用该max([])
函数可能会导致竞争条件,使您的文件为空或损坏(我从经验中知道)。您首先要检查文件是否已完全下载。这是因为 selenium 不会等待文件下载完成,因此当您检查上次创建的文件时,生成的列表中将显示一个不完整的文件,它会尝试移动该文件。即便如此,您最好稍等片刻,以便文件从 Firefox 中释放出来。
EDIT 2: More Code
编辑 2:更多代码
I was asked if 1 second was enough time and mostly it is, but in case you need to wait more than that you could change the above code to this:
有人问我 1 秒是否足够,大部分时间是足够的,但如果您需要等待更多时间,您可以将上面的代码更改为:
import os
import time
def tiny_file_rename(newname, folder_of_download, time_to_wait=60):
time_counter = 0
filename = max([f for f in os.listdir(folder_of_download)], key=lambda xa : os.path.getctime(os.path.join(folder_of_download,xa)))
while '.part' in filename:
time.sleep(1)
time_counter += 1
if time_counter > time_to_wait:
raise Exception('Waited too long for file to download')
filename = max([f for f in os.listdir(folder_of_download)], key=lambda xa : os.path.getctime(os.path.join(folder_of_download,xa)))
os.rename(os.path.join(folder_of_download, filename), os.path.join(folder_of_download, newname))
回答by supputuri
Here is another simple solution, where you can wait until the download completed and then get the downloaded file name from chrome downloads.
这是另一个简单的解决方案,您可以等待下载完成,然后从 chrome 下载中获取下载的文件名。
Chrome:
铬合金:
# method to get the downloaded file name
def getDownLoadedFileName(waitTime):
driver.execute_script("window.open()")
# switch to new tab
driver.switch_to.window(driver.window_handles[-1])
# navigate to chrome downloads
driver.get('chrome://downloads')
# define the endTime
endTime = time.time()+waitTime
while True:
try:
# get downloaded percentage
downloadPercentage = driver.execute_script(
"return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value")
# check if downloadPercentage is 100 (otherwise the script will keep waiting)
if downloadPercentage == 100:
# return the file name once the download is completed
return driver.execute_script("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content #file-link').text")
except:
pass
time.sleep(1)
if time.time() > endTime:
break
Firefox:
火狐:
def getDownLoadedFileName(waitTime):
driver.execute_script("window.open()")
WebDriverWait(driver,10).until(EC.new_window_is_opened)
driver.switch_to.window(driver.window_handles[-1])
driver.get("about:downloads")
endTime = time.time()+waitTime
while True:
try:
fileName = driver.execute_script("return document.querySelector('#contentAreaDownloadsView .downloadMainArea .downloadContainer description:nth-of-type(1)').value")
if fileName:
return fileName
except:
pass
time.sleep(1)
if time.time() > endTime:
break
Once you click on the download link/button, just call the above method.
单击下载链接/按钮后,只需调用上述方法即可。
# click on download link
browser.find_element_by_partial_link_text("Excel").click()
# get the downloaded file name
latestDownloadedFileName = getDownLoadedFileName(180) #waiting 3 minutes to complete the download
print(latestDownloadedFileName)
JAVA + Chrome:
JAVA + 铬:
Here is the method in java.
这是java中的方法。
public String waitUntilDonwloadCompleted(WebDriver driver) throws InterruptedException {
// Store the current window handle
String mainWindow = driver.getWindowHandle();
// open a new tab
JavascriptExecutor js = (JavascriptExecutor)driver;
js.executeScript("window.open()");
// switch to new tab
// Switch to new window opened
for(String winHandle : driver.getWindowHandles()){
driver.switchTo().window(winHandle);
}
// navigate to chrome downloads
driver.get("chrome://downloads");
JavascriptExecutor js1 = (JavascriptExecutor)driver;
// wait until the file is downloaded
Long percentage = (long) 0;
while ( percentage!= 100) {
try {
percentage = (Long) js1.executeScript("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value");
//System.out.println(percentage);
}catch (Exception e) {
// Nothing to do just wait
}
Thread.sleep(1000);
}
// get the latest downloaded file name
String fileName = (String) js1.executeScript("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content #file-link').text");
// get the latest downloaded file url
String sourceURL = (String) js1.executeScript("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content #file-link').href");
// file downloaded location
String donwloadedAt = (String) js1.executeScript("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div.is-active.focus-row-active #file-icon-wrapper img').src");
System.out.println("Download deatils");
System.out.println("File Name :-" + fileName);
System.out.println("Donwloaded path :- " + donwloadedAt);
System.out.println("Downloaded from url :- " + sourceURL);
// print the details
System.out.println(fileName);
System.out.println(sourceURL);
// close the downloads tab2
driver.close();
// switch back to main window
driver.switchTo().window(mainWindow);
return fileName;
}
This is how to call this in your java script.
这是如何在您的 Java 脚本中调用它。
// download triggering step
downloadExe.click();
// now waituntil download finish and then get file name
System.out.println(waitUntilDonwloadCompleted(driver));
Output:
输出:
Download deatils
File Name :-RubyMine-2019.1.2 (7).exe
Donwloaded path :- chrome://fileicon/C%3A%5CUsers%5Csupputuri%5CDownloads%5CRubyMine-2019.1.2%20(7).exe?scale=1.25x
Downloaded from url :- https://download-cf.jetbrains.com/ruby/RubyMine-2019.1.2.exe
RubyMine-2019.1.2 (7).exe
下载详情
文件名:-RubyMine-2019.1.2 (7).exe
下载路径:- chrome://fileicon/C%3A%5CUsers%5Csupputuri%5CDownloads%5CRubyMine-2019.1.2%20(7).exe?scale=1.25x
从网址下载:- https://download-cf.jetbrains.com/ruby/RubyMine-2019.1.2.exe
RubyMine-2019.1.2 (7).exe
回答by Negrali Selest
Using @dmb 's trick. Ive just made one correction: after .part
control, below time.sleep(1)
we must request filename again. Otherwise, the line below will try to rename a .part
file, which no more exists.
使用@dmb 的技巧。我刚刚做了一个更正:.part
控制后,下面time.sleep(1)
我们必须再次请求文件名。否则,下面的行将尝试重命名.part
不再存在的文件。
回答by ePandit
Here is the code sample I used to download pdf with a specific file name. First you need to configure chrome webdriver with required options. Then after clicking the button (to open pdf popup window), call a function to wait for download to finish and rename the downloaded file.
这是我用来下载具有特定文件名的 pdf 的代码示例。首先,您需要使用所需的选项配置 chrome webdriver。然后点击按钮(打开pdf弹出窗口)后,调用一个函数等待下载完成并重命名下载的文件。
import os
import time
import shutil
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
# function to wait for download to finish and then rename the latest downloaded file
def wait_for_download_and_rename(newFilename):
# function to wait for all chrome downloads to finish
def chrome_downloads(drv):
if not "chrome://downloads" in drv.current_url: # if 'chrome downloads' is not current tab
drv.execute_script("window.open('');") # open a new tab
drv.switch_to.window(driver.window_handles[1]) # switch to the new tab
drv.get("chrome://downloads/") # navigate to chrome downloads
return drv.execute_script("""
return document.querySelector('downloads-manager')
.shadowRoot.querySelector('#downloadsList')
.items.filter(e => e.state === 'COMPLETE')
.map(e => e.filePath || e.file_path || e.fileUrl || e.file_url);
""")
# wait for all the downloads to be completed
dld_file_paths = WebDriverWait(driver, 120, 1).until(chrome_downloads) # returns list of downloaded file paths
# Close the current tab (chrome downloads)
if "chrome://downloads" in driver.current_url:
driver.close()
# Switch back to original tab
driver.switch_to.window(driver.window_handles[0])
# get latest downloaded file name and path
dlFilename = dld_file_paths[0] # latest downloaded file from the list
# wait till downloaded file appears in download directory
time_to_wait = 20 # adjust timeout as per your needs
time_counter = 0
while not os.path.isfile(dlFilename):
time.sleep(1)
time_counter += 1
if time_counter > time_to_wait:
break
# rename the downloaded file
shutil.move(dlFilename, os.path.join(download_dir,newFilename))
return
# specify custom download directory
download_dir = r'c:\Downloads\pdf_reports'
# for configuring chrome pdf viewer for downloading pdf popup reports
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', {
"download.default_directory": download_dir, # Set own Download path
"download.prompt_for_download": False, # Do not ask for download at runtime
"download.directory_upgrade": True, # Also needed to suppress download prompt
"plugins.plugins_disabled": ["Chrome PDF Viewer"], # Disable this plugin
"plugins.always_open_pdf_externally": True, # Enable this plugin
})
# get webdriver with options for configuring chrome pdf viewer
driver = webdriver.Chrome(options = chrome_options)
# open desired webpage
driver.get('https://mywebsite.com/mywebpage')
# click the button to open pdf popup
driver.find_element_by_id('someid').click()
# call the function to wait for download to finish and rename the downloaded file
wait_for_download_and_rename('My file.pdf')
# close the browser windows
driver.quit()
Set timeout (120) to the wait time as per your needs.
根据您的需要将超时 (120) 设置为等待时间。