Logging into a SAML/Shibboleth-authenticated server using Python

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/16512965/


Logging into SAML/Shibboleth authenticated server using python

Tags: python, login, saml, saml-2.0, shibboleth

Asked by David Perlaza

I'm trying to log in to my university's server via Python, but I'm entirely unsure of how to go about generating the appropriate HTTP POSTs, creating the keys and certificates, and other parts of the process I may be unfamiliar with that are required to comply with the SAML spec. I can log in with my browser just fine, but I'd like to be able to log in and access other content within the server using Python.

For reference, here is the site

I've tried logging in by using mechanize (selecting the form, populating the fields, clicking the submit button control via mechanize.Browser.submit(), etc.) to no avail; the login page gets spat back each time.

At this point, I'm open to implementing a solution in whichever language is most suitable to the task. Basically, I want to programmatically log in to a SAML-authenticated server.

Accepted answer by Gianluca

Basically what you have to understand is the workflow behind a SAML authentication process. Unfortunately, there is no PDF out there that really helps in finding out what kind of things the browser does when accessing a SAML-protected website.

Maybe you should take a look at something like this: http://www.docstoc.com/docs/33849977/Workflow-to-Use-Shibboleth-Authentication-to-Sign and, obviously, at this: http://en.wikipedia.org/wiki/Security_Assertion_Markup_Language. In particular, focus your attention on this diagram:

[Diagram: SAML web browser SSO authentication flow]

What I did when I was trying to understand the SAML way of working, since the documentation was so poor, was write down (yes! writing, on paper) all the steps the browser was doing, from the first to the last. I used Opera, setting it to not allow automatic redirects (300, 301, 302 response codes, and so on), and also disabling Javascript. Then I wrote down all the cookies the server was sending me, what was doing what, and for what reason.

Maybe it was way too much effort, but in this way I was able to write a library, in Java, which is suited for the job, and incredibly fast and efficient too. Maybe someday I will release it publicly...

What you should understand is that, in a SAML login, there are two actors in play: the IDP (identity provider) and the SP (service provider).

A. FIRST STEP: the user agent requests the resource from the SP

I'm quite sure that you reached the link you reference in your question from another page, by clicking on something like "Access to the protected website". If you pay closer attention, you'll notice that the link you followed is not the one at which the authentication form is displayed. That's because clicking the link from the IDP to the SP is a step of the SAML flow. The first step, actually. It allows the IDP to determine who you are, and why you are trying to access its resource. So, basically what you'll need to do is make a request to the link you followed in order to reach the web form, and get the cookies it'll set. What you won't see is a SAMLRequest string, encoded into the 302 redirect you will find behind the link, sent to the IDP when making the connection.

I think that's the reason why you can't mechanize the whole process. You simply connected to the form, with no identification of your identity done first!

B. SECOND STEP: filling in the form and submitting it

This one is easy. Please be careful! The cookies that are now set are not the same as the cookies above. You're now connecting to an utterly different website. That's the reason why SAML is used: different website, same credentials. So you may want to store these authentication cookies, provided by a successful login, in a different variable. The IDP is now going to send you back a response (after the SAMLRequest): the SAMLResponse. You have to detect it by getting the source code of the webpage at which the login ends. In fact, this page is a big form containing the response, with some JS code that automatically submits it when the page loads. You have to get the source code of the page, parse it to get rid of all the useless HTML stuff, and get the SAMLResponse (encrypted).

C. THIRD STEP: sending back the response to the SP

Now you're ready to end the procedure. You have to send (via POST, since you're emulating a form) the SAMLResponse obtained in the previous step to the SP. In this way, it will provide the cookies needed to access the protected stuff you want to access.

Aaaaand, you're done!
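
[Editor's note] The answer above describes the flow in general terms (the author's own implementation was in Java and was never published), so here is only a minimal Python sketch of the three steps, using requests and BeautifulSoup. The URLs and the form field names (j_username, j_password, _eventId_proceed) are hypothetical placeholders, and the assumption that the login form posts back to the same URL is a simplification; inspect your own IdP's login page for the real values, and be aware that many IdPs add CSRF tokens or JavaScript hops that this sketch does not handle.

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

session = requests.Session()  # keeps the SP and IdP cookies between steps

# Step A: request the protected SP resource. requests follows the 302
# (which carries the SAMLRequest) to the IdP, which returns the login form.
login_page = session.get("https://sp.example.edu/protected/resource")

# Step B: post the credentials back to the IdP login form. The field names
# below are placeholders; copy the real input names (and the form's action
# URL, if it differs) from the form's HTML.
form_response = session.post(
    login_page.url,
    data={"j_username": "myuser", "j_password": "mypassword",
          "_eventId_proceed": ""},
)

# The IdP replies with an auto-submitting form containing the SAMLResponse
# (and usually a RelayState); collect every named input from that form.
soup = BeautifulSoup(form_response.text, "html.parser")
form = soup.find("form")
payload = {tag["name"]: tag.get("value", "")
           for tag in form.find_all("input") if tag.get("name")}

# Step C: post the SAMLResponse to the SP's assertion consumer service
# (the form's action URL). After this, the session holds the SP cookies.
acs_url = urljoin(form_response.url, form["action"])
session.post(acs_url, data=payload)

print(session.get("https://sp.example.edu/protected/resource").status_code)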

Again, I think that the most valuable thing you can do is use Opera and analyze ALL the redirects SAML does. Then, replicate them in your code. It's not that difficult, just keep in mind that the IDP is utterly different from the SP.

Answer by andrebask

You can find here a more detailed description of the Shibboleth authentication process.

Answer by chladni

I wrote a simple Python script capable of logging into a Shibbolized page.

First, I used Live HTTP Headers in Firefox to watch the redirects for the particular Shibbolized page I was targeting.

Then I wrote a simple script using urllib.request (in Python 3.4, but urllib2 in Python 2.x seems to have the same functionality). I found that the default redirect following of urllib.request worked for my purposes; however, I found it nice to subclass urllib.request.HTTPRedirectHandler and, in this subclass (class ShibRedirectHandler), add a handler for all the http_error_302 events.

In this subclass I just printed out the values of the parameters (for debugging purposes); please note that in order to utilize the default redirect following, you need to end the handler with return HTTPRedirectHandler.http_error_302(self, args...) (i.e. a call to the base class http_error_302 handler).

The most important component to make urllib work with Shibbolized authentication is to create an OpenerDirector which has cookie handling added. You build the OpenerDirector with the following:

cookieprocessor = urllib.request.HTTPCookieProcessor()
opener = urllib.request.build_opener(ShibRedirectHandler, cookieprocessor)
response = opener.open("https://shib.page.org")

Here is a full script that may get you started (you will need to change a few mock URLs I provided and also enter a valid username and password). This uses Python 3 classes; to make this work in Python 2, replace urllib.request with urllib2 and urllib.parse with urlparse:

import urllib.request
import urllib.parse

#Subclass of HTTPRedirectHandler. Does not do much, but is very
#verbose. Prints out all the redirects. Compare with what you see
#from looking at your browser's redirects (using Live HTTP Headers or similar)
class ShibRedirectHandler (urllib.request.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        print (req)
        print (fp.geturl())
        print (code)
        print (msg)
        print (headers)
        #without this return (passing parameters onto baseclass) 
        #redirect following will not happen automatically for you.
        return urllib.request.HTTPRedirectHandler.http_error_302(self,
                                                          req,
                                                          fp,
                                                          code,
                                                          msg,
                                                          headers)

cookieprocessor = urllib.request.HTTPCookieProcessor()
opener = urllib.request.build_opener(ShibRedirectHandler, cookieprocessor)

#Edit: should be the URL of the site/page you want to load that is protected with Shibboleth
(opener.open("https://shibbolized.site.example").read())

#Inspect the page source of the Shibboleth login form; find the input names for the username
#and password, and edit according to the dictionary keys here to match your input names
loginData = urllib.parse.urlencode({'username':'<your-username>', 'password':'<your-password>'})
bLoginData = loginData.encode('ascii')

#By looking at the source of your Shib login form, find the URL the form action posts back to
#hard code this URL in the mock URL presented below.
#Make sure you include the URL, port number and path
response = opener.open("https://test-idp.server.example", bLoginData)
#See what you got.
print (response.read())

Answer by Stéphane Bruckert

Selenium with the headless PhantomJS webkit will be your best bet to log into Shibboleth, because it handles cookies and even Javascript for you.

Installation:

$ pip install selenium
$ brew install phantomjs


from selenium import webdriver
from selenium.webdriver.support.ui import Select # for <SELECT> HTML form

driver = webdriver.PhantomJS()
# On Windows, use: webdriver.PhantomJS('C:\phantomjs-1.9.7-windows\phantomjs.exe')

# Service selection
# Here I had to select my school among others 
driver.get("http://ent.unr-runn.fr/uPortal/")
select = Select(driver.find_element_by_name('user_idp'))
select.select_by_visible_text('ENSICAEN')
driver.find_element_by_id('IdPList').submit()

# Login page (https://cas.ensicaen.fr/cas/login?service=https%3A%2F%2Fshibboleth.ensicaen.fr%2Fidp%2FAuthn%2FRemoteUser)
# Fill the login form and submit it
driver.find_element_by_id('username').send_keys("myusername")
driver.find_element_by_id('password').send_keys("mypassword")
driver.find_element_by_id('fm1').submit()

# Now connected to the home page
# Click on 3 links in order to reach the page I want to scrape
driver.find_element_by_id('tabLink_u1240l1s214').click()
driver.find_element_by_id('formMenu:linknotes1').click()
driver.find_element_by_id('_id137Pluto_108_u1240l1n228_50520_:tabledip:0:_id158Pluto_108_u1240l1n228_50520_').click()

# Select and print an interesting element by its ID
page = driver.find_element_by_id('_id111Pluto_108_u1240l1n228_50520_:tableel:tbody_element')
print page.text


Note:

  • during development, use Firefox to preview what you are doing: driver = webdriver.Firefox()
  • this script is provided as-is and with the corresponding links, so you can compare each line of code with the actual source code of the pages (until login at least).

Answer by Stéphane Bruckert

Mechanize can do the work as well, except it doesn't handle Javascript. Authentication worked successfully, but once on the homepage, I couldn't load a link such as this:

<a href="#" id="formMenu:linknotes1"
   onclick="return oamSubmitForm('formMenu','formMenu:linknotes1');">

In case you need Javascript, better use Selenium with PhantomJS. Otherwise, I hope you will find inspiration in this script:

#!/usr/bin/env python
#coding: utf8
import sys, logging
import mechanize
import cookielib
from BeautifulSoup import BeautifulSoup
import html2text

br = mechanize.Browser() # Browser
cj = cookielib.LWPCookieJar() # Cookie Jar
br.set_cookiejar(cj) 

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follows refresh 0 but not hangs on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# User-Agent
br.addheaders = [('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36')]

br.open('https://ent.unr-runn.fr/uPortal/')
br.select_form(nr=0)
br.submit()

br.select_form(nr=0)
br.form['username'] = 'myusername'
br.form['password'] = 'mypassword'
br.submit()

br.select_form(nr=0)
br.submit()

rs = br.open('https://ent.unr-runn.fr/uPortal/f/u1240l1s214/p/esup-mondossierweb.u1240l1n228/max/render.uP?pP_org.apache.myfaces.portlet.MyFacesGenericPortlet.VIEW_ID=%2Fstylesheets%2Fetu%2Fdetailnotes.xhtml')

# Eventually comparing the cookies with those on Live HTTP Header: 
print "Cookies:"
for cookie in cj:
    print cookie

# Displaying page information
print rs.read()
print rs.geturl()
print rs.info();

# And that last line didn't work
rs = br.follow_link(id="formMenu:linknotes1", nr=0)

Answer by bjw

Extending the answer from Stéphane Bruckert above, once you have used Selenium to get the auth cookies, you can still switch to requests if you want to:

扩展上面 Stéphane Bruckert 的回答,一旦您使用 Selenium 获取 auth cookie,如果您想,您仍然可以切换到请求:

import requests
cook = {i['name']: i['value'] for i in driver.get_cookies()}
driver.quit()
r = requests.get("https://protected.ac.uk", cookies=cook)

Answer by Arthur.V

I faced a similar problem with my university page SAML authentication as well.

The basic idea is to use a requests.Session object to automatically handle most of the HTTP redirects and cookie storing. However, there were also many redirects done via javascript, and this caused multiple problems with the simple requests solution.
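
[Editor's note] As a rough illustration of that idea (the URL below is hypothetical), a requests.Session carries cookies across the whole HTTP redirect chain, and the response history makes it easy to compare what requests actually did against what the browser (or Fiddler) shows, so the javascript-only hops that have to be replayed by hand stand out:

import requests

session = requests.Session()  # stores cookies and follows plain HTTP redirects

# Hypothetical protected page: the HTTP-level redirects towards the IdP are
# followed automatically and every cookie set along the way is kept.
resp = session.get("https://university.example.edu/protected")

# Print the redirect chain; any hop the browser made that is missing here
# was probably done in javascript and must be reproduced manually.
for hop in resp.history:
    print(hop.status_code, hop.url)
print(resp.status_code, resp.url)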

I ended up using Fiddler to keep track of every request my browser made to the university server, to fill in the redirects I had missed. It really made the process easier.

My solution is far from ideal, but seems to work.

Answer by TheBroda

Though already answered, hopefully this helps someone. I had a task of downloading files from a SAML website and got help from Stéphane Bruckert's answer.

If headless mode is used, the wait time would need to be specified at the points where the login redirects happen (a sketch of such a wait follows the code below). Once the browser logged in, I used the cookies from that with the requests module to download the file - got help from this.

This is what my code looks like:

import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options  #imports

things_to_download= [a,b,c,d,e,f]     #The values changing in the url
options = Options()
options.headless = False
driver = webdriver.Chrome('D:/chromedriver.exe', options=options)
driver.get('https://website.to.downloadfrom.com/')
driver.find_element_by_id('username').send_keys("Your_username") #the ID would be different for different website/forms
driver.find_element_by_id('password').send_keys("Your_password")
driver.find_element_by_id('logOnForm').submit()
session = requests.Session()
cookies = driver.get_cookies()
for things in things_to_download:    
    for cookie in cookies: 
        session.cookies.set(cookie['name'], cookie['value'])
    response = session.get('https://website.to.downloadfrom.com/bla/blabla/' + str(things))
    with open('Downloaded_stuff/'+str(things)+'.pdf', 'wb') as f:
        f.write(response.content)            # saving the file
driver.close()
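
[Editor's note] As noted above, in headless mode you may need to wait for the login redirects to finish before reading the cookies. A minimal sketch of such a wait (the URL fragment used as the condition is hypothetical), which could be dropped in right after the driver.find_element_by_id('logOnForm').submit() line in the script above:

from selenium.webdriver.support.ui import WebDriverWait

# Wait up to 30 seconds until the redirects have landed back on the
# service provider's site before calling driver.get_cookies().
WebDriverWait(driver, 30).until(
    lambda d: "website.to.downloadfrom.com" in d.current_url
)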

Answer by draysams

I wrote this code following the accepted answer. It worked for me in two separate projects.

import mechanize
from bs4 import BeautifulSoup
import urllib2
import cookielib


cj = cookielib.CookieJar()
br = mechanize.Browser()
br.set_handle_robots(False)
br.set_cookiejar(cj)

br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_refresh(False)
br.set_handle_referer(True)
br.set_handle_robots(False)

br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]


br.open("The URL goes here")

br.select_form(nr=0)

br.form['username'] = 'Login Username'
br.form['password'] = 'Login Password'
br.submit()

br.select_form(nr=0)
br.submit()

response = br.response().read()
print response