jopen 6年前


  • Automatic cookies (session) support
  • HTTP and SOCKS proxy with and without authorization
  • Keep-Alive support
  • IDN support
  • Tools to work with web forms
  • Easy multipart file uploading
  • Flexible customization of HTTP requests
  • Automatic charset detection
  • Powerful API of extracting info from HTML documents with XPATH queries
  • Asynchronous API to make thousands of simultaneous queries. This part of library called Spider and it is too big to even list its features in this README.
  • Python 3 ready

Grab Example

from grab import Grab  import logging    logging.basicConfig(level=logging.DEBUG)  g = Grab()  g.go('')  g.set_input('login', '***')  g.set_input('password', '***')  g.submit()'/tmp/x.html')    g.doc('//span[contains(@class, "octicon-sign-out")]').assert_exists()  home_url = g.doc('//a[contains(@class, "header-nav-link name")]/@href').text()  repo_url = home_url + '?tab=repositories'    g.go(repo_url)  for elem in'//h3[@class="repo-list-name"]/a'):      print('%s: %s' % (elem.text(),                        g.make_url_absolute(elem.attr('href'))))