Requests-HTML: HTML Parsing for Humans (writing Python 3)!
This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible.
When using this library you automatically get:
- Full JavaScript support!
- CSS Selectors (a.k.a. jQuery-style, thanks to PyQuery).
- XPath Selectors, for the faint of heart.
- Mocked user-agent (like a real web browser).
- Automatic following of redirects.
- Connection pooling and cookie persistence.
- The Requests experience you know and love, with magical parsing abilities.
- Async support.
Tutorial & Usage
Make a GET request to python.org, using Requests:
>>> from requests_html import HTMLSession
>>> session = HTMLSession()
>>> r = session.get('https://python.org/')
Or, if you want to try the async session:
>>> from requests_html import AsyncHTMLSession
>>> asession = AsyncHTMLSession()
>>> r = await asession.get('https://python.org/')
Async really pays off when fetching several sites at the same time:
>>> from requests_html import AsyncHTMLSession
>>> asession = AsyncHTMLSession()
>>> async def get_pythonorg():
...     r = await asession.get('https://python.org/')
...     return r
...
>>> async def get_reddit():
...     r = await asession.get('https://reddit.com/')
...     return r
...
>>> async def get_google():
...     r = await asession.get('https://google.com/')
...     return r
...
>>> results = asession.run(get_pythonorg, get_reddit, get_google)
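The concurrency pattern above can be sketched with the standard library alone. This is not how AsyncHTMLSession is implemented; it only illustrates running several coroutines at once and getting their results back in the order they were passed in. The `fake_fetch` helper and its delays are hypothetical stand-ins for real network requests.

```python
import asyncio


async def fake_fetch(name: str, delay: float) -> str:
    # Hypothetical stand-in for a network request: just sleeps, then returns.
    await asyncio.sleep(delay)
    return f"fetched {name}"


async def main() -> list:
    # gather() runs the coroutines concurrently and returns their results
    # in the order the coroutines were passed in, not in completion order.
    return await asyncio.gather(
        fake_fetch("python.org", 0.03),
        fake_fetch("reddit.com", 0.01),
        fake_fetch("google.com", 0.02),
    )


results = asyncio.run(main())
print(results)
```

Even though the second coroutine finishes first, its result stays in second position.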
Grab a list of all links on the page, as–is (anchors excluded):
>>> r.html.links
{'//docs.python.org/3/tutorial/', '/about/apps/', 'https://github.com/python/pythondotorg/issues', '/accounts/login/', '/dev/peps/', '/about/legal/', '//docs.python.org/3/tutorial/introduction.html#lists', '/download/alternatives', 'http://feedproxy.google.com/~r/PythonInsider/~3/kihd2DW98YY/python-370a4-is-available-for-testing.html', '/download/other/', '/downloads/windows/', 'https://mail.python.org/mailman/listinfo/python-dev', '/doc/av', 'https://devguide.python.org/', '/about/success/#engineering', 'https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event', 'https://www.openstack.org', '/about/gettingstarted/', 'http://feedproxy.google.com/~r/PythonInsider/~3/AMoBel8b8Mc/python-3.html', '/success-stories/industrial-light-magic-runs-python/', 'http://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator', '/', 'http://pyfound.blogspot.com/', '/events/python-events/past/', '/downloads/release/python-2714/', 'https://wiki.python.org/moin/PythonBooks', 'http://plus.google.com/+Python', 'https://wiki.python.org/moin/', 'https://status.python.org/', '/community/workshops/', '/community/lists/', 'http://buildbot.net/', '/community/awards', 'http://twitter.com/ThePSF', 'https://docs.python.org/3/license.html', '/psf/donations/', 'http://wiki.python.org/moin/Languages', '/dev/', '/events/python-user-group/', 'https://wiki.qt.io/PySide', '/community/sigs/', 'https://wiki.gnome.org/Projects/PyGObject', 'http://www.ansible.com', 'http://www.saltstack.com', 'http://planetpython.org/', '/events/python-events', '/about/help/', '/events/python-user-group/past/', '/about/success/', '/psf-landing/', '/about/apps', '/about/', 'http://www.wxpython.org/', '/events/python-user-group/665/', 'https://www.python.org/psf/codeofconduct/', '/dev/peps/peps.rss', '/downloads/source/', '/psf/sponsorship/sponsors/', 'http://bottlepy.org', 'http://roundup.sourceforge.net/', 'http://pandas.pydata.org/', 'http://brochure.getpython.info/', 
'https://bugs.python.org/', '/community/merchandise/', 'http://tornadoweb.org', '/events/python-user-group/650/', 'http://flask.pocoo.org/', '/downloads/release/python-364/', '/events/python-user-group/660/', '/events/python-user-group/638/', '/psf/', '/doc/', 'http://blog.python.org', '/events/python-events/604/', '/about/success/#government', 'http://python.org/dev/peps/', 'https://docs.python.org', 'http://feedproxy.google.com/~r/PythonInsider/~3/zVC80sq9s00/python-364-is-now-available.html', '/users/membership/', '/about/success/#arts', 'https://wiki.python.org/moin/Python2orPython3', '/downloads/', '/jobs/', 'http://trac.edgewall.org/', 'http://feedproxy.google.com/~r/PythonInsider/~3/wh73_1A-N7Q/python-355rc1-and-python-348rc1-are-now.html', '/privacy/', 'https://pypi.python.org/', 'http://www.riverbankcomputing.co.uk/software/pyqt/intro', 'http://www.scipy.org', '/community/forums/', '/about/success/#scientific', '/about/success/#software-development', '/shell/', '/accounts/signup/', 'http://www.facebook.com/pythonlang?fref=ts', '/community/', 'https://kivy.org/', '/about/quotes/', 'http://www.web2py.com/', '/community/logos/', '/community/diversity/', '/events/calendars/', 'https://wiki.python.org/moin/BeginnersGuide', '/success-stories/', '/doc/essays/', '/dev/core-mentorship/', 'http://ipython.org', '/events/', '//docs.python.org/3/tutorial/controlflow.html', '/about/success/#education', '/blogs/', '/community/irc/', 'http://pycon.blogspot.com/', '//jobs.python.org', 'http://www.pylonsproject.org/', 'http://www.djangoproject.com/', '/downloads/mac-osx/', '/about/success/#business', 'http://feedproxy.google.com/~r/PythonInsider/~3/x_c9D0S-4C4/python-370b1-is-now-available-for.html', 'http://wiki.python.org/moin/TkInter', 'https://docs.python.org/faq/', '//docs.python.org/3/tutorial/controlflow.html#defining-functions'}
Grab a list of all links on the page, in absolute form (anchors excluded):
>>> r.html.absolute_links
{'https://github.com/python/pythondotorg/issues', 'https://docs.python.org/3/tutorial/', 'https://www.python.org/about/success/', 'http://feedproxy.google.com/~r/PythonInsider/~3/kihd2DW98YY/python-370a4-is-available-for-testing.html', 'https://www.python.org/dev/peps/', 'https://mail.python.org/mailman/listinfo/python-dev', 'https://www.python.org/doc/', 'https://www.python.org/', 'https://www.python.org/about/', 'https://www.python.org/events/python-events/past/', 'https://devguide.python.org/', 'https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event', 'https://www.openstack.org', 'http://feedproxy.google.com/~r/PythonInsider/~3/AMoBel8b8Mc/python-3.html', 'https://docs.python.org/3/tutorial/introduction.html#lists', 'http://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator', 'http://pyfound.blogspot.com/', 'https://wiki.python.org/moin/PythonBooks', 'http://plus.google.com/+Python', 'https://wiki.python.org/moin/', 'https://www.python.org/events/python-events', 'https://status.python.org/', 'https://www.python.org/about/apps', 'https://www.python.org/downloads/release/python-2714/', 'https://www.python.org/psf/donations/', 'http://buildbot.net/', 'http://twitter.com/ThePSF', 'https://docs.python.org/3/license.html', 'http://wiki.python.org/moin/Languages', 'https://docs.python.org/faq/', 'https://jobs.python.org', 'https://www.python.org/about/success/#software-development', 'https://www.python.org/about/success/#education', 'https://www.python.org/community/logos/', 'https://www.python.org/doc/av', 'https://wiki.qt.io/PySide', 'https://www.python.org/events/python-user-group/660/', 'https://wiki.gnome.org/Projects/PyGObject', 'http://www.ansible.com', 'http://www.saltstack.com', 'https://www.python.org/dev/peps/peps.rss', 'http://planetpython.org/', 'https://www.python.org/events/python-user-group/past/', 'https://docs.python.org/3/tutorial/controlflow.html#defining-functions', 
'https://www.python.org/community/diversity/', 'https://docs.python.org/3/tutorial/controlflow.html', 'https://www.python.org/community/awards', 'https://www.python.org/events/python-user-group/638/', 'https://www.python.org/about/legal/', 'https://www.python.org/dev/', 'https://www.python.org/download/alternatives', 'https://www.python.org/downloads/', 'https://www.python.org/community/lists/', 'http://www.wxpython.org/', 'https://www.python.org/about/success/#government', 'https://www.python.org/psf/', 'https://www.python.org/psf/codeofconduct/', 'http://bottlepy.org', 'http://roundup.sourceforge.net/', 'http://pandas.pydata.org/', 'http://brochure.getpython.info/', 'https://www.python.org/downloads/source/', 'https://bugs.python.org/', 'https://www.python.org/downloads/mac-osx/', 'https://www.python.org/about/help/', 'http://tornadoweb.org', 'http://flask.pocoo.org/', 'https://www.python.org/users/membership/', 'http://blog.python.org', 'https://www.python.org/privacy/', 'https://www.python.org/about/gettingstarted/', 'http://python.org/dev/peps/', 'https://www.python.org/about/apps/', 'https://docs.python.org', 'https://www.python.org/success-stories/', 'https://www.python.org/community/forums/', 'http://feedproxy.google.com/~r/PythonInsider/~3/zVC80sq9s00/python-364-is-now-available.html', 'https://www.python.org/community/merchandise/', 'https://www.python.org/about/success/#arts', 'https://wiki.python.org/moin/Python2orPython3', 'http://trac.edgewall.org/', 'http://feedproxy.google.com/~r/PythonInsider/~3/wh73_1A-N7Q/python-355rc1-and-python-348rc1-are-now.html', 'https://pypi.python.org/', 'https://www.python.org/events/python-user-group/650/', 'http://www.riverbankcomputing.co.uk/software/pyqt/intro', 'https://www.python.org/about/quotes/', 'https://www.python.org/downloads/windows/', 'https://www.python.org/events/calendars/', 'http://www.scipy.org', 'https://www.python.org/community/workshops/', 'https://www.python.org/blogs/', 
'https://www.python.org/accounts/signup/', 'https://www.python.org/events/', 'https://kivy.org/', 'http://www.facebook.com/pythonlang?fref=ts', 'http://www.web2py.com/', 'https://www.python.org/psf/sponsorship/sponsors/', 'https://www.python.org/community/', 'https://www.python.org/download/other/', 'https://www.python.org/psf-landing/', 'https://www.python.org/events/python-user-group/665/', 'https://wiki.python.org/moin/BeginnersGuide', 'https://www.python.org/accounts/login/', 'https://www.python.org/downloads/release/python-364/', 'https://www.python.org/dev/core-mentorship/', 'https://www.python.org/about/success/#business', 'https://www.python.org/community/sigs/', 'https://www.python.org/events/python-user-group/', 'http://ipython.org', 'https://www.python.org/shell/', 'https://www.python.org/community/irc/', 'https://www.python.org/about/success/#engineering', 'http://www.pylonsproject.org/', 'http://pycon.blogspot.com/', 'https://www.python.org/about/success/#scientific', 'https://www.python.org/doc/essays/', 'http://www.djangoproject.com/', 'https://www.python.org/success-stories/industrial-light-magic-runs-python/', 'http://feedproxy.google.com/~r/PythonInsider/~3/x_c9D0S-4C4/python-370b1-is-now-available-for.html', 'http://wiki.python.org/moin/TkInter', 'https://www.python.org/jobs/', 'https://www.python.org/events/python-events/604/'}
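To see how as-is links relate to their absolute form, here is a minimal sketch using the standard library's `urljoin`. It is an illustration of the idea, not the library's own code; the base URL and the three sample links are taken from the output above.

```python
from urllib.parse import urljoin

# Base URL of the page the links were found on.
base_url = "https://www.python.org/"

# A root-relative link, a scheme-relative link, and an already absolute link.
raw_links = [
    "/about/apps/",
    "//docs.python.org/3/tutorial/",
    "https://devguide.python.org/",
]

# Resolve every link against the base URL.
absolute = {urljoin(base_url, link) for link in raw_links}

for link in sorted(absolute):
    print(link)
```

Root-relative links pick up the scheme and host of the page, scheme-relative links pick up only the scheme, and absolute links pass through unchanged.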
Select an Element with a CSS Selector:
>>> about = r.html.find('#about', first=True)
Grab an Element’s text contents:
>>> print(about.text)
About
Applications
Quotes
Getting Started
Help
Python Brochure
Introspect an Element’s attributes:
>>> about.attrs
{'id': 'about', 'class': ('tier-1', 'element-1'), 'aria-haspopup': 'true'}
Render out an Element’s HTML:
>>> about.html
'<li aria-haspopup="true" class="tier-1 element-1 " id="about">\n<a class="" href="/about/" title="">About</a>\n<ul aria-hidden="true" class="subnav menu" role="menu">\n<li class="tier-2 element-1" role="treeitem"><a href="/about/apps/" title="">Applications</a></li>\n<li class="tier-2 element-2" role="treeitem"><a href="/about/quotes/" title="">Quotes</a></li>\n<li class="tier-2 element-3" role="treeitem"><a href="/about/gettingstarted/" title="">Getting Started</a></li>\n<li class="tier-2 element-4" role="treeitem"><a href="/about/help/" title="">Help</a></li>\n<li class="tier-2 element-5" role="treeitem"><a href="http://brochure.getpython.info/" title="">Python Brochure</a></li>\n</ul>\n</li>'
Grab an Element’s root tag name:
>>> about.tag
'li'
Show the line number on which an Element’s root tag is located:
>>> about.lineno
249
Select a list of Elements within an Element:
>>> about.find('a')
[<Element 'a' href='/about/' title='' class=''>, <Element 'a' href='/about/apps/' title=''>, <Element 'a' href='/about/quotes/' title=''>, <Element 'a' href='/about/gettingstarted/' title=''>, <Element 'a' href='/about/help/' title=''>, <Element 'a' href='http://brochure.getpython.info/' title=''>]
Search for links within an element:
>>> about.absolute_links
{'http://brochure.getpython.info/', 'https://www.python.org/about/gettingstarted/', 'https://www.python.org/about/', 'https://www.python.org/about/quotes/', 'https://www.python.org/about/help/', 'https://www.python.org/about/apps/'}
Search for text on the page:
>>> r.html.search('Python is a {} language')[0]
'programming'
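The template search above relies on Parse templates. As a rough sketch of the idea only (the real parse library supports far more, such as named and typed fields), an anonymous `{}` placeholder can be approximated with a non-greedy regular-expression group. The helper names `template_to_pattern` and `search_sketch` are hypothetical.

```python
import re


def template_to_pattern(template: str) -> str:
    # Escape the literal text and turn each anonymous {} into a
    # non-greedy capture group.
    literal_parts = template.split("{}")
    return "(.+?)".join(re.escape(part) for part in literal_parts)


def search_sketch(template: str, text: str):
    # Return the captured fields as a tuple, or None when nothing matches.
    match = re.search(template_to_pattern(template), text)
    return match.groups() if match else None


page_text = "Python is a programming language that lets you work quickly"
found = search_sketch("Python is a {} language", page_text)
print(found[0])
```

Because the group is non-greedy, a placeholder at the very end of a template would capture only a single character in this sketch.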
More complex CSS Selector example (copied from Chrome dev tools):
>>> r = session.get('https://github.com/')
>>> sel = 'body > div.application-main > div.jumbotron.jumbotron-codelines > div > div > div.col-md-7.text-center.text-md-left > p'
>>> print(r.html.find(sel, first=True).text)
GitHub is a development platform inspired by the way you work. From open source to business, you can host and review code, manage projects, and build software alongside millions of other developers.
XPath is also supported:
>>> r.html.xpath('a')
[<Element 'a' class='btn' href='https://help.github.com/articles/supported-browsers'>]
You can also select only elements containing certain text:
>>> r = session.get('http://python-requests.org/')
>>> r.html.find('a', containing='kenneth')
[<Element 'a' href='http://kennethreitz.com/pages/open-projects.html'>, <Element 'a' href='http://kennethreitz.org/'>, <Element 'a' href='https://twitter.com/kennethreitz' class=('twitter-follow-button',) data-show-count='false'>, <Element 'a' class=('reference', 'internal') href='dev/contributing/#kenneth-reitz-s-code-style'>]
JavaScript Support
Let’s grab some text that’s rendered by JavaScript:
>>> r = session.get('http://python-requests.org/')
>>> r.html.render()
>>> r.html.search('Python 2 will retire in only {months} months!')['months']
'<time>25</time>'
You can also do this asynchronously:
>>> r = await asession.get('http://python-requests.org/')
>>> await r.html.arender()
>>> r.html.search('Python 2 will retire in only {months} months!')['months']
'<time>25</time>'
Note: the first time you ever run the render() method, it will download Chromium into your home directory (e.g. ~/.pyppeteer/). This only happens once. You may also need to install a few Linux packages to get pyppeteer working.
Pagination
There’s also intelligent pagination support (always improving):
>>> r = session.get('https://reddit.com')
>>> for html in r.html:
...     print(html)
<HTML url='https://www.reddit.com/'>
<HTML url='https://www.reddit.com/?count=25&after=t3_81puu5'>
<HTML url='https://www.reddit.com/?count=50&after=t3_81nevg'>
<HTML url='https://www.reddit.com/?count=75&after=t3_81lqtp'>
<HTML url='https://www.reddit.com/?count=100&after=t3_81k1c8'>
<HTML url='https://www.reddit.com/?count=125&after=t3_81p438'>
<HTML url='https://www.reddit.com/?count=150&after=t3_81nrcd'>
…
For async pagination use the new async for:
>>> r = await asession.get('https://reddit.com')
>>> async for html in r.html:
...     print(html)
<HTML url='https://www.reddit.com/'>
<HTML url='https://www.reddit.com/?count=25&after=t3_81puu5'>
…
You can also just request the next URL easily:
>>> r = session.get('https://reddit.com')
>>> r.html.next()
'https://www.reddit.com/?count=25&after=t3_81pm82'
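One way to picture the "next page" lookup is as a scan for links whose text contains one of the default next_symbol words ('next', 'more', 'older'). The following is a simplified standard-library sketch under that assumption, not the library's actual heuristic; `NextLinkFinder`, `next_url`, and the sample markup are hypothetical, and picking the last matching link is an arbitrary choice made for this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

NEXT_SYMBOLS = ["next", "more", "older"]


class NextLinkFinder(HTMLParser):
    """Collects hrefs of <a> tags whose text contains a next-page word."""

    def __init__(self):
        super().__init__()
        self._href = None      # href of the <a> currently being read
        self._text = []        # text fragments seen inside that <a>
        self.candidates = []   # hrefs that look like next-page links

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).lower()
            if any(symbol in text for symbol in NEXT_SYMBOLS):
                self.candidates.append(self._href)
            self._href = None


def next_url(html: str, base_url: str):
    # Return the absolute URL of the last candidate link, or None.
    finder = NextLinkFinder()
    finder.feed(html)
    if not finder.candidates:
        return None
    return urljoin(base_url, finder.candidates[-1])


page = '<a href="/about/">About</a> <a href="/page/2">Next page</a> <a>older</a>'
print(next_url(page, "https://example.org/"))
```

Links without an href are ignored, so the trailing `<a>older</a>` does not count as a candidate.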
Using without Requests
You can also use this library without Requests:
>>> from requests_html import HTML
>>> doc = """<a href='https://httpbin.org'>"""
>>> html = HTML(html=doc)
>>> html.links
{'https://httpbin.org'}
You can also render JavaScript pages without Requests:
# ^^ proceeding from above ^^
>>> script = """
...     () => {
...         return {
...             width: document.documentElement.clientWidth,
...             height: document.documentElement.clientHeight,
...             deviceScaleFactor: window.devicePixelRatio,
...         }
...     }
... """
>>> val = html.render(script=script, reload=False)
>>> print(val)
{'width': 800, 'height': 600, 'deviceScaleFactor': 1}
>>> print(html.html)
<html><head></head><body><a href="https://httpbin.org"></a></body></html>
To use arender, just pass async_=True to HTML:
# ^^ using above script ^^
>>> html = HTML(html=doc, async_=True)
>>> val = await html.arender(script=script, reload=False)
>>> print(val)
{'width': 800, 'height': 600, 'deviceScaleFactor': 1}
API Documentation
Main Classes
These classes are the main interface to requests-html:
class requests_html.HTML(*, session: Union[HTMLSession, AsyncHTMLSession] = None, url: str = 'https://example.org/', html: Union[str, bytes], default_encoding: str = 'utf-8', async_: bool = False)
An HTML document, ready for parsing.
Parameters:
- url – The URL from which the HTML originated, used for absolute_links.
- html – HTML from which to base the parsing upon (optional).
- default_encoding – Which encoding to default to.
- absolute_links
  All found links on page, in absolute form.
- arender(retries: int = 8, script: str = None, wait: float = 0.2, scrolldown=False, sleep: int = 0, reload: bool = True, timeout: Union[float, int] = 8.0, keep_page: bool = False, cookies: list = [{}], send_cookies_session: bool = False)
  Async version of render. Takes the same parameters.
- base_url
  The base URL for the page. Supports the <base> tag.
- encoding
  The encoding string to be used, extracted from the HTML and HTMLResponse headers.
- find(selector: str = '*', *, containing: Union[str, List[str]] = None, clean: bool = False, first: bool = False, _encoding: str = None) → Union[List[requests_html.Element], requests_html.Element]
  Given a CSS Selector, returns a list of Element objects or a single one.
  Parameters:
  - selector – CSS Selector to use.
  - clean – Whether or not to sanitize the found HTML of <script> and <style> tags.
  - containing – If specified, only return elements that contain the provided text.
  - first – Whether or not to return just the first result.
  - _encoding – The encoding format.
  Example CSS Selectors:
  - a
  - a.someClass
  - a#someID
  - a[target=_blank]
  See W3Schools’ CSS Selectors Reference for more details.
  If first is True, only returns the first Element found.
- html
  Unicode representation of the HTML content.
- links
  All found links on page, in as–is form.
- next(fetch: bool = False, next_symbol: List[str] = ['next', 'more', 'older']) → Union[requests_html.HTML, List[str]]
  Attempts to find the next page, if there is one. If fetch is True, returns the HTML object of the next page. If fetch is False (the default, per the signature), simply returns the next URL.
- raw_html
  Bytes representation of the HTML content.
- render(retries: int = 8, script: str = None, wait: float = 0.2, scrolldown=False, sleep: int = 0, reload: bool = True, timeout: Union[float, int] = 8.0, keep_page: bool = False, cookies: list = [{}], send_cookies_session: bool = False)
  Reloads the response in Chromium, and replaces HTML content with an updated version, with JavaScript executed.
  Parameters:
  - retries – The number of times to retry loading the page in Chromium.
  - script – JavaScript to execute upon page load (optional).
  - wait – The number of seconds to wait before loading the page, preventing timeouts (optional).
  - scrolldown – Integer, if provided, of how many times to page down.
  - sleep – Integer, if provided, of how many seconds to sleep after initial render.
  - reload – If False, content will not be loaded from the browser, but will be provided from memory.
  - keep_page – If True, allows you to interact with the browser page through r.html.page.
  - send_cookies_session – If True, converts HTMLSession.cookies and sends them.
  - cookies – If not empty, sends these cookies.
  If scrolldown is specified, the page will scroll down the specified number of times, after sleeping the specified amount of time (e.g. scrolldown=10, sleep=1).
  If just sleep is provided, the rendering will wait n seconds before returning.
  If script is specified, it will execute the provided JavaScript at runtime. Example:
  script = """
      () => {
          return {
              width: document.documentElement.clientWidth,
              height: document.documentElement.clientHeight,
              deviceScaleFactor: window.devicePixelRatio,
          }
      }
  """
  Returns the return value of the executed script, if any is provided:
  >>> r.html.render(script=script)
  {'width': 800, 'height': 600, 'deviceScaleFactor': 1}
  Warning: the first time you run this method, it will download Chromium into your home directory (~/.pyppeteer).
- search(template: str) → parse.Result
  Search the Element for the given Parse template.
  Parameters: template – The Parse template to use.
- search_all(template: str) → Union[List[parse.Result], parse.Result]
  Search the Element (multiple times) for the given Parse template.
  Parameters: template – The Parse template to use.
- xpath(selector: str, *, clean: bool = False, first: bool = False, _encoding: str = None) → Union[List[str], List[requests_html.Element], str, requests_html.Element]
  Given an XPath selector, returns a list of Element objects or a single one.
  Parameters:
  - selector – XPath Selector to use.
  - clean – Whether or not to sanitize the found HTML of <script> and <style> tags.
  - first – Whether or not to return just the first result.
  - _encoding – The encoding format.
  If a sub-selector is specified (e.g. //a/@href), a simple list of results is returned.
  See W3Schools’ XPath Examples for more details.
  If first is True, only returns the first Element found.
class requests_html.Element(*, element, url: str, default_encoding: str = None)
An element of HTML.
Parameters:
- element – The element from which to base the parsing upon.
- url – The URL from which the HTML originated, used for absolute_links.
- default_encoding – Which encoding to default to.
- absolute_links
  All found links on page, in absolute form.
- attrs
  Returns a dictionary of the attributes of the Element.
- base_url
  The base URL for the page. Supports the <base> tag.
- encoding
  The encoding string to be used, extracted from the HTML and HTMLResponse headers.
- find(selector: str = '*', *, containing: Union[str, List[str]] = None, clean: bool = False, first: bool = False, _encoding: str = None) → Union[List[requests_html.Element], requests_html.Element]
  Given a CSS Selector, returns a list of Element objects or a single one.
  Parameters:
  - selector – CSS Selector to use.
  - clean – Whether or not to sanitize the found HTML of <script> and <style> tags.
  - containing – If specified, only return elements that contain the provided text.
  - first – Whether or not to return just the first result.
  - _encoding – The encoding format.
  Example CSS Selectors:
  - a
  - a.someClass
  - a#someID
  - a[target=_blank]
  See W3Schools’ CSS Selectors Reference for more details.
  If first is True, only returns the first Element found.
- html
  Unicode representation of the HTML content.
- links
  All found links on page, in as–is form.
- raw_html
  Bytes representation of the HTML content.
- search(template: str) → parse.Result
  Search the Element for the given Parse template.
  Parameters: template – The Parse template to use.
- search_all(template: str) → Union[List[parse.Result], parse.Result]
  Search the Element (multiple times) for the given Parse template.
  Parameters: template – The Parse template to use.
- xpath(selector: str, *, clean: bool = False, first: bool = False, _encoding: str = None) → Union[List[str], List[requests_html.Element], str, requests_html.Element]
  Given an XPath selector, returns a list of Element objects or a single one.
  Parameters:
  - selector – XPath Selector to use.
  - clean – Whether or not to sanitize the found HTML of <script> and <style> tags.
  - first – Whether or not to return just the first result.
  - _encoding – The encoding format.
  If a sub-selector is specified (e.g. //a/@href), a simple list of results is returned.
  See W3Schools’ XPath Examples for more details.
  If first is True, only returns the first Element found.
Utility Functions
HTML Sessions
These sessions are for making HTTP requests:
class requests_html.HTMLSession(**kwargs)
- delete(url, **kwargs)
  Sends a DELETE request. Returns Response object.
  Parameters:
  - url – URL for the new Request object.
  - **kwargs – Optional arguments that request takes.
  Return type: requests.Response
- get(url, **kwargs)
  Sends a GET request. Returns Response object.
  Parameters:
  - url – URL for the new Request object.
  - **kwargs – Optional arguments that request takes.
  Return type: requests.Response
- get_adapter(url)
  Returns the appropriate connection adapter for the given URL.
  Return type: requests.adapters.BaseAdapter
- get_redirect_target(resp)
  Receives a Response. Returns a redirect URI or None.
- head(url, **kwargs)
  Sends a HEAD request. Returns Response object.
  Parameters:
  - url – URL for the new Request object.
  - **kwargs – Optional arguments that request takes.
  Return type: requests.Response
- merge_environment_settings(url, proxies, stream, verify, cert)
  Check the environment and merge it with some settings.
  Return type: dict
- mount(prefix, adapter)
  Registers a connection adapter to a prefix.
  Adapters are sorted in descending order by prefix length.
- options(url, **kwargs)
  Sends an OPTIONS request. Returns Response object.
  Parameters:
  - url – URL for the new Request object.
  - **kwargs – Optional arguments that request takes.
  Return type: requests.Response
- patch(url, data=None, **kwargs)
  Sends a PATCH request. Returns Response object.
  Parameters:
  - url – URL for the new Request object.
  - data – (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the Request.
  - **kwargs – Optional arguments that request takes.
  Return type: requests.Response
- post(url, data=None, json=None, **kwargs)
  Sends a POST request. Returns Response object.
  Parameters:
  - url – URL for the new Request object.
  - data – (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the Request.
  - json – (optional) json to send in the body of the Request.
  - **kwargs – Optional arguments that request takes.
  Return type: requests.Response
- prepare_request(request)
  Constructs a PreparedRequest for transmission and returns it. The PreparedRequest has settings merged from the Request instance and those of the Session.
  Parameters: request – Request instance to prepare with this session’s settings.
  Return type: requests.PreparedRequest
- put(url, data=None, **kwargs)
  Sends a PUT request. Returns Response object.
  Parameters:
  - url – URL for the new Request object.
  - data – (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the Request.
  - **kwargs – Optional arguments that request takes.
  Return type: requests.Response
- rebuild_auth(prepared_request, response)
  When being redirected we may want to strip authentication from the request to avoid leaking credentials. This method intelligently removes and reapplies authentication where possible to avoid credential loss.
- rebuild_method(prepared_request, response)
  When being redirected we may want to change the method of the request based on certain specs or browser behavior.
- rebuild_proxies(prepared_request, proxies)
  This method re-evaluates the proxy configuration by considering the environment variables. If we are redirected to a URL covered by NO_PROXY, we strip the proxy configuration. Otherwise, we set missing proxy keys for this URL (in case they were stripped by a previous redirect).
  This method also replaces the Proxy-Authorization header where necessary.
  Return type: dict
- request(method, url, params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=None, allow_redirects=True, proxies=None, hooks=None, stream=None, verify=None, cert=None, json=None)
  Constructs a Request, prepares it and sends it. Returns Response object.
  Parameters:
  - method – method for the new Request object.
  - url – URL for the new Request object.
  - params – (optional) Dictionary or bytes to be sent in the query string for the Request.
  - data – (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the Request.
  - json – (optional) json to send in the body of the Request.
  - headers – (optional) Dictionary of HTTP Headers to send with the Request.
  - cookies – (optional) Dict or CookieJar object to send with the Request.
  - files – (optional) Dictionary of 'filename': file-like-objects for multipart encoding upload.
  - auth – (optional) Auth tuple or callable to enable Basic/Digest/Custom HTTP Auth.
  - timeout (float or tuple) – (optional) How long to wait for the server to send data before giving up, as a float, or a (connect timeout, read timeout) tuple.
  - allow_redirects (bool) – (optional) Set to True by default.
  - proxies – (optional) Dictionary mapping protocol or protocol and hostname to the URL of the proxy.
  - stream – (optional) whether to immediately download the response content. Defaults to False.
  - verify – (optional) Either a boolean, in which case it controls whether we verify the server’s TLS certificate, or a string, in which case it must be a path to a CA bundle to use. Defaults to True.
  - cert – (optional) if String, path to ssl client cert file (.pem). If Tuple, (‘cert’, ‘key’) pair.
  Return type: requests.Response
- resolve_redirects(resp, req, stream=False, timeout=None, verify=True, cert=None, proxies=None, yield_requests=False, **adapter_kwargs)
  Receives a Response. Returns a generator of Responses or Requests.
- response_hook(response, **kwargs) → requests_html.HTMLResponse
  Change the response encoding and replace the response with an HTMLResponse.
- send(request, **kwargs)
  Send a given PreparedRequest.
  Return type: requests.Response
- should_strip_auth(old_url, new_url)
  Decide whether the Authorization header should be removed when redirecting.
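The note on mount ("adapters are sorted in descending order by prefix length") means the most specific prefix wins when an adapter is looked up for a URL. Here is a toy sketch of that lookup rule, assuming plain strings in place of real adapter objects; `AdapterRegistry` is a hypothetical class written for illustration, not part of Requests or requests-html.

```python
class AdapterRegistry:
    """Toy registry illustrating longest-prefix adapter lookup."""

    def __init__(self):
        self.adapters = {}

    def mount(self, prefix, adapter):
        self.adapters[prefix] = adapter
        # Keep entries ordered so that longer prefixes are tried first.
        ordered = sorted(
            self.adapters.items(),
            key=lambda item: len(item[0]),
            reverse=True,
        )
        self.adapters = dict(ordered)

    def get_adapter(self, url):
        # The first prefix that matches is the longest one that matches.
        for prefix, adapter in self.adapters.items():
            if url.lower().startswith(prefix.lower()):
                return adapter
        raise ValueError(f"No adapter registered for {url!r}")


registry = AdapterRegistry()
registry.mount("https://", "default-https")
registry.mount("http://", "default-http")
registry.mount("https://api.example.com/", "api-specific")

print(registry.get_adapter("https://api.example.com/v1/items"))
print(registry.get_adapter("https://example.com/"))
```

The first lookup resolves to the API-specific entry even though "https://" also matches, because the longer prefix is checked first.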
class requests_html.AsyncHTMLSession(loop=None, workers=None, mock_browser: bool = True, *args, **kwargs)
An async consumable session.
- delete(url, **kwargs)
  Sends a DELETE request. Returns Response object.
  Parameters:
  - url – URL for the new Request object.
  - **kwargs – Optional arguments that request takes.
  Return type: requests.Response
- get(url, **kwargs)
  Sends a GET request. Returns Response object.
  Parameters:
  - url – URL for the new Request object.
  - **kwargs – Optional arguments that request takes.
  Return type: requests.Response
- get_adapter(url)
  Returns the appropriate connection adapter for the given URL.
  Return type: requests.adapters.BaseAdapter
- get_redirect_target(resp)
  Receives a Response. Returns a redirect URI or None.
- head(url, **kwargs)
  Sends a HEAD request. Returns Response object.
  Parameters:
  - url – URL for the new Request object.
  - **kwargs – Optional arguments that request takes.
  Return type: requests.Response
- merge_environment_settings(url, proxies, stream, verify, cert)
  Check the environment and merge it with some settings.
  Return type: dict
- mount(prefix, adapter)
  Registers a connection adapter to a prefix.
  Adapters are sorted in descending order by prefix length.
- options(url, **kwargs)
  Sends an OPTIONS request. Returns Response object.
  Parameters:
  - url – URL for the new Request object.
  - **kwargs – Optional arguments that request takes.
  Return type: requests.Response
- patch(url, data=None, **kwargs)
  Sends a PATCH request. Returns Response object.
  Parameters:
  - url – URL for the new Request object.
  - data – (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the Request.
  - **kwargs – Optional arguments that request takes.
  Return type: requests.Response
- post(url, data=None, json=None, **kwargs)
  Sends a POST request. Returns Response object.
  Parameters:
  - url – URL for the new Request object.
  - data – (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the Request.
  - json – (optional) json to send in the body of the Request.
  - **kwargs – Optional arguments that request takes.
  Return type: requests.Response
- prepare_request(request)
  Constructs a PreparedRequest for transmission and returns it. The PreparedRequest has settings merged from the Request instance and those of the Session.
  Parameters: request – Request instance to prepare with this session’s settings.
  Return type: requests.PreparedRequest
- put(url, data=None, **kwargs)
  Sends a PUT request. Returns Response object.
  Parameters:
  - url – URL for the new Request object.
  - data – (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the Request.
  - **kwargs – Optional arguments that request takes.
  Return type: requests.Response
- rebuild_auth(prepared_request, response)
  When being redirected we may want to strip authentication from the request to avoid leaking credentials. This method intelligently removes and reapplies authentication where possible to avoid credential loss.
- rebuild_method(prepared_request, response)
  When being redirected we may want to change the method of the request based on certain specs or browser behavior.
- rebuild_proxies(prepared_request, proxies)
  This method re-evaluates the proxy configuration by considering the environment variables. If we are redirected to a URL covered by NO_PROXY, we strip the proxy configuration. Otherwise, we set missing proxy keys for this URL (in case they were stripped by a previous redirect).
  This method also replaces the Proxy-Authorization header where necessary.
  Return type: dict
- resolve_redirects(resp, req, stream=False, timeout=None, verify=True, cert=None, proxies=None, yield_requests=False, **adapter_kwargs)
  Receives a Response. Returns a generator of Responses or Requests.
- response_hook(response, **kwargs) → requests_html.HTMLResponse
  Change the response encoding and replace the response with an HTMLResponse.
- run(*coros)
  Pass in all the coroutines you want to run; each one is wrapped in a task, run, and awaited. Returns a list with all the results, in the same order the coroutines were passed in.
- send(request, **kwargs)
  Send a given PreparedRequest.
  Return type: requests.Response
- should_strip_auth(old_url, new_url)
  Decide whether the Authorization header should be removed when redirecting.
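As a rough illustration of the credential-stripping decision described for rebuild_auth and should_strip_auth, the sketch below applies a deliberately strict, simplified rule: drop the Authorization header whenever the scheme, host, or port changes across a redirect. This rule is an assumption made for the sketch; the real method in Requests handles additional cases. The function names `origin`, `should_strip_auth_sketch`, and `headers_for_redirect` are hypothetical.

```python
from urllib.parse import urlparse

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    # Reduce a URL to (scheme, host, port), filling in the default port.
    parts = urlparse(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def should_strip_auth_sketch(old_url, new_url):
    # Simplified rule (assumption): any change of origin strips credentials.
    return origin(old_url) != origin(new_url)


def headers_for_redirect(headers, old_url, new_url):
    # Copy the headers and drop Authorization when the origin changed.
    new_headers = dict(headers)
    if should_strip_auth_sketch(old_url, new_url):
        new_headers.pop("Authorization", None)
    return new_headers


headers = {"Authorization": "Bearer secret-token", "Accept": "text/html"}

same_site = headers_for_redirect(
    headers, "https://example.org/a", "https://example.org:443/b"
)
other_site = headers_for_redirect(
    headers, "https://example.org/a", "https://other.example.net/b"
)

print("Authorization" in same_site)
print("Authorization" in other_site)
```

The redirect that stays on the same origin keeps the header, while the redirect to a different host loses it.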