You can get all of the source code by cloning the GitHub repository.

GitHub: https://github.com/williamzxl/Scrapy_CrawlMeiziTu

Scrapy official documentation: http://scrapy-chs.readthedocs.io/zh_CN/latest/index.html

The steps below basically follow the workflow described in the documentation.

Step1:

Before starting to crawl, you must create a new Scrapy project. Enter the directory where you want to store the code and run the following command:

 scrapy startproject CrawlMeiziTu

This command creates a CrawlMeiziTu directory with the following contents:

 CrawlMeiziTu/
     scrapy.cfg
     CrawlMeiziTu/
         __init__.py
         items.py
         pipelines.py
         settings.py
         middlewares.py
         spiders/
             __init__.py

Then enter the project directory and generate the spider:

 cd CrawlMeiziTu
 scrapy genspider Meizitu http://www.meizitu.com/a/list_1_1.html

After running genspider, the directory contains the following contents:

 CrawlMeiziTu/
     scrapy.cfg
     CrawlMeiziTu/
         __init__.py
         items.py
         pipelines.py
         settings.py
         middlewares.py
         spiders/
             Meizitu.py
             __init__.py

The files we mainly edit are shown below. First, a small entry script to launch the spider:

 from scrapy import cmdline
 cmdline.execute("scrapy crawl Meizitu".split())

This script exists mainly for convenience: running it starts the spider without having to type scrapy crawl Meizitu in the terminal each time.

Step2: Edit settings.py, as shown below:

 BOT_NAME = 'CrawlMeiziTu'

 SPIDER_MODULES = ['CrawlMeiziTu.spiders']
 NEWSPIDER_MODULE = 'CrawlMeiziTu.spiders'

 ITEM_PIPELINES = {
     'CrawlMeiziTu.pipelines.CrawlmeizituPipeline': 300,
 }

 IMAGES_STORE = 'D://pic2'
 DOWNLOAD_DELAY = 0.3

 USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
 ROBOTSTXT_OBEY = True

The main things to set are USER_AGENT, the download path (IMAGES_STORE), the download delay (DOWNLOAD_DELAY), and the item pipeline.


Step3: Edit items.py.

Items are mainly used to hold the information grabbed by the spider program. Since we are crawling a picture site, we grab the name of each picture, the link to the picture, and so on.

 # -*- coding: utf-8 -*-

 # Define here the models for your scraped items
 #
 # See documentation in:
 # http://doc.scrapy.org/en/latest/topics/items.html

 import scrapy


 class CrawlmeizituItem(scrapy.Item):
     # define the fields for your item here like:
     # name = scrapy.Field()
     title = scrapy.Field()  # used as the folder name
     url = scrapy.Field()
     tags = scrapy.Field()
     src = scrapy.Field()    # links to the pictures
     alt = scrapy.Field()    # used as the picture name

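Step4: Edit pipelines.py.

The article does not list this file, but settings.py activates CrawlMeiziTu.pipelines.CrawlmeizituPipeline, so a pipeline is needed to actually save the images. The following is only a minimal sketch of such a pipeline, assuming one folder per gallery named after the item's title and files named after the alt text; the folder layout, the '.jpg' suffix, and the use of urllib are assumptions, not the original code.

 # pipelines.py -- a minimal sketch, not the article's original pipeline
 import os
 import urllib.request


 class CrawlmeizituPipeline(object):
     def process_item(self, item, spider):
         # download root comes from IMAGES_STORE in settings.py
         store = spider.settings.get('IMAGES_STORE', 'D://pic2')
         # one folder per gallery, named after the page title
         folder = os.path.join(store, item['title'][0].strip())
         os.makedirs(folder, exist_ok=True)
         # save each image under its alt text (assumed naming scheme)
         for name, src in zip(item['alt'], item['src']):
             path = os.path.join(folder, name.strip() + '.jpg')
             if not os.path.exists(path):
                 urllib.request.urlretrieve(src, path)  # no error handling in this sketch
         return item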

Step5: Edit Meizitu.py.

This is the most important part, the main program:

 # -*- coding: utf-8 -*-
 import scrapy
 from CrawlMeiziTu.items import CrawlmeizituItem
 #from CrawlMeiziTu.items import CrawlmeizituItemPage
 import time


 class MeizituSpider(scrapy.Spider):
     name = "Meizitu"
     #allowed_domains = ["meizitu.com/"]
     start_urls = []
     last_url = []
     with open('..//url.txt', 'r') as fp:
         crawl_urls = fp.readlines()
         for start_url in crawl_urls:
             last_url.append(start_url.strip('\n'))
     start_urls.append("".join(last_url[-1]))

     def parse(self, response):
         selector = scrapy.Selector(response)
         #item = CrawlmeizituItemPage()
         next_pages = selector.xpath('//*[@id="wp_page_numbers"]/ul/li/a/@href').extract()
         next_pages_text = selector.xpath('//*[@id="wp_page_numbers"]/ul/li/a/text()').extract()
         all_urls = []
         if '下一页' in next_pages_text:  # the "next page" link
             next_url = "http://www.meizitu.com/a/{}".format(next_pages[-2])
             with open('..//url.txt', 'a+') as fp:
                 fp.write('\n')
                 fp.write(next_url)
                 fp.write('\n')
             request = scrapy.http.Request(next_url, callback=self.parse)
             time.sleep(2)
             yield request
         all_info = selector.xpath('//h3[@class="tit"]/a')
         # read the link of each picture folder
         for info in all_info:
             links = info.xpath('//h3[@class="tit"]/a/@href').extract()
             for link in links:
                 request = scrapy.http.Request(link, callback=self.parse_item)
                 time.sleep(1)
                 yield request
         # next_link = selector.xpath('//*[@id="wp_page_numbers"]/ul/li/a/@href').extract()
         # next_link_text = selector.xpath('//*[@id="wp_page_numbers"]/ul/li/a/text()').extract()
         # if '下一页' in next_link_text:
         #     nextPage = "http://www.meizitu.com/a/{}".format(next_link[-2])
         #     item['page_url'] = nextPage
         #     yield item

     # grab the information of each folder
     def parse_item(self, response):
         item = CrawlmeizituItem()
         selector = scrapy.Selector(response)
         image_title = selector.xpath('//h2/a/text()').extract()
         image_url = selector.xpath('//h2/a/@href').extract()
         image_tags = selector.xpath('//div[@class="metaRight"]/p/text()').extract()
         if selector.xpath('//*[@id="picture"]/p/img/@src').extract():
             image_src = selector.xpath('//*[@id="picture"]/p/img/@src').extract()
         else:
             image_src = selector.xpath('//*[@id="maincontent"]/div/p/img/@src').extract()
         if selector.xpath('//*[@id="picture"]/p/img/@alt').extract():
             pic_name = selector.xpath('//*[@id="picture"]/p/img/@alt').extract()
         else:
             pic_name = selector.xpath('//*[@id="maincontent"]/div/p/img/@alt').extract()
         #//*[@id="maincontent"]/div/p/img/@alt
         item['title'] = image_title
         item['url'] = image_url
         item['tags'] = image_tags
         item['src'] = image_src
         item['alt'] = pic_name
         print(item)
         time.sleep(1)
         yield item
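Note that the spider builds start_urls from url.txt, so that file must already exist with at least one list page URL before the first run. A minimal way to seed it (an assumption: the article never shows this step; the relative path mirrors the '..//url.txt' used in Meizitu.py, so run it from the same working directory you start the spider from):

 # seed url.txt with the first list page before the first crawl
 with open('..//url.txt', 'w') as fp:
     fp.write('http://www.meizitu.com/a/list_1_1.html')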

"

The above is the editor's introduction to the implementation code for crawling pictures with the Python Scrapy crawler framework and saving them locally. We hope it helps you. If you have any questions, please leave a message, and the editor will reply to you in time.

Permanent link to this article: http://www.script-home.com/python-uses-the-scrapy-crawler-frame-to-crawl-the-picture-and-save-the-local-implementation-code.html | Script Home

When reprinting this article, please credit: Python uses the Scrapy crawler frame to crawl the picture and save the local implementation code | Script Home

You may also be interested in these articles!