When learning Python crawlers, you will often run into websites that use anti-crawling measures. High-intensity, high-frequency crawling puts enormous pressure on a web server, so if the same IP repeatedly requests the same pages, it is very likely to be banned. Here is a crawler technique to deal with that: setting a proxy IP.

  • Install the requests library
  • Install the bs4 library
  • Install the lxml library
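
Assuming a standard pip setup, the three dependencies above can be installed in one step (`beautifulsoup4` is the package name that provides the `bs4` module):

```shell
pip install requests beautifulsoup4 lxml
```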


The following code pulls IP addresses from a domestic anonymous proxy site, http://www.xicidaili.com/nn/. The IPs crawled from just the first page are generally enough to use.

    from bs4 import BeautifulSoup
    import requests
    import random

    def get_ip_list(url, headers):
        web_data = requests.get(url, headers=headers)
        soup = BeautifulSoup(web_data.text, 'lxml')
        ips = soup.find_all('tr')
        ip_list = []
        for i in range(1, len(ips)):
            ip_info = ips[i]
            tds = ip_info.find_all('td')
            ip_list.append(tds[1].text + ':' + tds[2].text)
        return ip_list

    def get_random_ip(ip_list):
        proxy_list = []
        for ip in ip_list:
            proxy_list.append('http://' + ip)
        proxy_ip = random.choice(proxy_list)
        proxies = {'http': proxy_ip}
        return proxies

    if __name__ == '__main__':
        url = 'http://www.xicidaili.com/nn/'
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'}
        ip_list = get_ip_list(url, headers=headers)
        proxies = get_random_ip(ip_list)
        print(proxies)

The function get_ip_list(url, headers) takes a URL and headers and returns a list of IPs; each element of the list has the format 'IP:port'. The list covers all the IP addresses and ports on that page of the domestic anonymous proxy site.
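
To see the row-parsing logic of get_ip_list in isolation, here is a sketch of the same tr/td extraction run against a tiny hand-written table. The markup below is an illustrative stand-in for the real page, and 'html.parser' is used in place of 'lxml' so the sketch has no extra dependency:

```python
from bs4 import BeautifulSoup

# a minimal stand-in for the proxy site's table markup (illustration only)
html = """
<table>
  <tr><th>country</th><th>ip</th><th>port</th></tr>
  <tr><td>CN</td><td>1.2.3.4</td><td>8080</td></tr>
  <tr><td>CN</td><td>5.6.7.8</td><td>3128</td></tr>
</table>
"""

def parse_ip_list(html_text):
    soup = BeautifulSoup(html_text, 'html.parser')
    ips = soup.find_all('tr')
    ip_list = []
    for i in range(1, len(ips)):      # start at 1 to skip the header row
        tds = ips[i].find_all('td')   # columns: country, ip, port
        ip_list.append(tds[1].text + ':' + tds[2].text)
    return ip_list

print(parse_ip_list(html))  # → ['1.2.3.4:8080', '5.6.7.8:3128']
```

The only structural assumption is that the IP sits in the second column and the port in the third, which is what the indices tds[1] and tds[2] in the article's code rely on.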

The function get_random_ip(ip_list) takes the list returned by the first function and returns a random proxies dictionary. This proxies dict can be passed to the get method of requests, so that each run accesses the crawled site through a different IP, effectively avoiding the risk of the real IP being banned. The format of proxies is a dictionary: {'http': 'http://IP:port'}.
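
That dictionary format can be checked with a small, deterministic variant of get_random_ip. The rng parameter here is a hypothetical addition, not part of the article's code; it only makes the random choice injectable for testing:

```python
import random

def build_proxies(ip_list, rng=random):
    # wrap one randomly chosen 'IP:port' entry in the dict format requests expects
    proxy_ip = 'http://' + rng.choice(ip_list)
    return {'http': proxy_ip}

print(build_proxies(['1.2.3.4:8080']))  # → {'http': 'http://1.2.3.4:8080'}
```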

(three) Using the proxy IP

Running the above code yields a random proxies dict, and it can be passed directly into the get method of requests:

 web_data = requests.get(url, headers=headers, proxies=proxies)
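
Free proxies are frequently dead or slow, so in practice it helps to retry with a fresh proxy when a request fails. A minimal sketch, assuming the ip_list produced by get_ip_list above; fetch_via_random_proxy and its getter parameter are hypothetical names introduced here, with getter existing only so the function can be exercised without network access:

```python
import random
import requests

def fetch_via_random_proxy(url, headers, ip_list, getter=requests.get, retries=3):
    # hypothetical wrapper: on a connection error, retry with another random proxy
    last_error = None
    for _ in range(retries):
        proxies = {'http': 'http://' + random.choice(ip_list)}
        try:
            # timeout keeps a dead proxy from hanging the crawler indefinitely
            return getter(url, headers=headers, proxies=proxies, timeout=5)
        except requests.exceptions.RequestException as exc:
            last_error = exc  # proxy dead or too slow; pick a different one
    raise last_error
```

All real failures raised by requests (timeouts, refused connections) are subclasses of requests.exceptions.RequestException, so catching that one type covers the usual ways a free proxy can let you down.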

The above is the method for setting a proxy IP in a Python crawler (a crawler technique). I hope it helps you; if you have any questions, please leave me a message and I will reply promptly!

Permalink: http://www.script-home.com/python-crawler-set-up-proxy-ip-method-crawler-technique.html | Script Home
