1, A brief introduction to Puppeteer

Puppeteer is a Node library that provides a set of APIs for controlling Chrome. Put simply, it is a headless browser (you can also configure it to show a UI; running without one is the default). Because it is a real browser, anything we can do by hand in a browser Puppeteer can do too. The name comes from "puppet", so as it suggests, Puppeteer is easy to manipulate, and you can use it to:

1) Generate screenshots or PDFs of web pages
2) Build advanced crawlers that can scrape pages with large amounts of asynchronously rendered content
3) Simulate keyboard input, automatic form submission, and page login, and run automated UI tests
4) Capture a timeline trace of a site to help track down and analyze its performance problems

If you have used PhantomJS, you will find the two somewhat similar, but Puppeteer is maintained by the official Chrome team.

2, Runtime environment

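A look at Puppeteer's official API shows async and await everywhere. These belong to a recent ECMAScript specification, so you need:

  1. a Node.js version that supports async and await (Node.js 7.6 and later support them natively)
  2. an up-to-date Chrome driver, which is downloaded automatically when you install Puppeteer through npm

 npm install puppeteer --save

The following example opens a page and takes a screenshot of it:

 const puppeteer = require('puppeteer');

 (async () => {
   const browser = await puppeteer.launch();
   const page = await browser.newPage();
   // Placeholder URL; the address used in the original example is not preserved
   await page.goto('https://example.com');
   await page.screenshot({ path: 'example.png' });
   await browser.close();
 })();

The code above does the following:

  1. creates a Browser instance through puppeteer.launch()
  2. creates a Page object through the Browser object
  3. calls page.goto() to jump to the specified page
  4. takes a screenshot of the page
  5. closes the browser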

Isn't that easy? Personally, I find it simpler than PhantomJS, to say nothing of selenium-webdriver. Below are some of Puppeteer's commonly used APIs.

3.1 puppeteer.launch(options)

Calling puppeteer.launch() starts Puppeteer and returns a promise, and you can use its then method to get the browser instance. Newer versions of Node.js support async and await, so the example above uses the await keyword instead. It is worth stressing that almost all Puppeteer operations are asynchronous. To avoid long then chains that hurt the readability of the code, all demo code in this article is written with async and await, which is also what Puppeteer officially recommends.
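To make the difference concrete, here is a minimal sketch of the same three steps (launch, read the browser version, close) written in both styles. The function names are hypothetical, and `puppeteer` is passed in as a parameter so that the sketch works with the real library (`require('puppeteer')`) or with a stand-in object.

```javascript
// Promise style: every step is chained with then().
function versionWithThen(puppeteer) {
  return puppeteer.launch().then(browser =>
    browser.version().then(version =>
      browser.close().then(() => version)
    )
  );
}

// async/await style: the same steps read from top to bottom.
async function versionWithAwait(puppeteer) {
  const browser = await puppeteer.launch();
  const version = await browser.version();
  await browser.close();
  return version;
}

// Possible usage with the real library:
//   versionWithAwait(require('puppeteer')).then(console.log);
```

Both functions return a promise for the version string; the second one simply avoids the nesting.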

Options in detail

Name (type): description

ignoreHTTPSErrors (boolean): whether to ignore HTTPS error information during requests. The default is false.
headless (boolean): whether to run Chrome in "headless" mode, that is, without a UI. The default is true.
executablePath (string): path of the executable file to run. If you want to specify a webdriver path of your own, you can set it with this option.
slowMo (number): slows down Puppeteer operations by the given number of milliseconds. If you want to watch the whole working process of Puppeteer, this option is very useful.
args (Array(String)): other arguments passed to the Chrome instance. For example, you can use "--ash-host-window-bounds=1024x768" to set the browser window size. For more arguments, see the list of Chromium command-line switches.
handleSIGINT (boolean): whether the browser process is closed on Ctrl+C.
timeout (number): the longest time to wait for the Chrome instance to start. The default is 30000 (30 seconds). If 0 is passed in, the time is not limited.
dumpio (boolean): whether to pipe the browser process stdout and stderr into process.stdout and process.stderr. The default is false.
userDataDir (string): sets the user data directory. On Linux the default is under ~/.config, and on Windows the default is C:\Users\{USER}\AppData\Local\Google\Chrome\User Data, where {USER} is the name of the currently logged-in user.
env (Object): specifies the environment variables that are visible to Chromium. The default is process.env.
devtools (boolean): whether to automatically open the DevTools panel for each tab. This option is valid only when headless is set to false.

Connecting to a Chrome instance creates a Browser object, which can happen in two ways: puppeteer.launch and puppeteer.connect.
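As a rough sketch of how these two entry points and a few of the launch options fit together, the helper below connects to an existing browser when a WebSocket endpoint is known and otherwise launches a new, visible one. The function names and the concrete option values are illustrative assumptions; `puppeteer` is passed in so the code can be exercised without a real browser.

```javascript
// Options for a visible, slowed-down browser that is easy to watch.
// The values are illustrative, not recommendations.
function debugLaunchOptions() {
  return {
    headless: false, // show the browser UI
    slowMo: 250,     // slow each operation down by 250 ms
    devtools: true,  // only takes effect when headless is false
    timeout: 30000,  // wait up to 30 s for Chrome to start
  };
}

// Reuse a running browser when an endpoint is known, otherwise start one.
async function getBrowser(puppeteer, browserWSEndpoint) {
  if (browserWSEndpoint) {
    return puppeteer.connect({ browserWSEndpoint });
  }
  return puppeteer.launch(debugLaunchOptions());
}

// Possible usage with the real library:
//   const browser = await getBrowser(require('puppeteer'));
```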

The demo below disconnects from a browser instance and then reconnects to it:

 const puppeteer = require('puppeteer');

 puppeteer.launch().then(async browser => {
   // Save the endpoint so that we can reconnect to Chromium
   const browserWSEndpoint = browser.wsEndpoint();
   // Disconnect from Chromium
   browser.disconnect();
   // Use the endpoint to re-establish the connection with Chromium
   const browser2 = await puppeteer.connect({ browserWSEndpoint });
   // Close Chromium
   await browser2.close();
 });

Browser object API

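Method: description

browser.newPage(): returns a Promise; creates a Page instance
browser.close(): closes the browser
browser.disconnect(): disconnects from the browser
browser.pages(): gets all open Page instances
browser.targets(): gets all active targets
browser.version(): gets the version of the browser
browser.wsEndpoint(): returns the socket connection URL of the browser instance; the Chrome instance can be reconnected to through this URL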
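A small sketch of these calls working together is shown below. `describeBrowser` is a hypothetical helper, not part of Puppeteer; it opens a tab, gathers a few facts about the browser, and closes the tab again.

```javascript
// Collect basic information about a Browser instance.
async function describeBrowser(browser) {
  const page = await browser.newPage();   // open a new tab
  const pages = await browser.pages();    // all open Page instances
  const info = {
    version: await browser.version(),     // browser version string
    endpoint: browser.wsEndpoint(),       // URL usable with puppeteer.connect
    openPages: pages.length,
    targets: browser.targets().length,    // number of active targets
  };
  await page.close();                     // close the tab opened above
  return info;
}

// Possible usage with the real library:
//   const browser = await require('puppeteer').launch();
//   console.log(await describeBrowser(browser));
//   await browser.close();
```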

4, Puppeteer in practice

Now that we know the API, we can move on to some hands-on practice. Before that, let's look at Puppeteer's design principle. The biggest difference between Puppeteer and webdriver or PhantomJS is that Puppeteer is designed from the perspective of a user browsing a page, whereas webdriver and PhantomJS were originally designed for automated testing, that is, from the perspective of a machine browsing a page. The two follow different design philosophies. For example, suppose I need to open the Jingdong (JD.com) home page and search for a product. Compare how Puppeteer and webdriver each go about it.

Puppeteer implementation process:

  1. open the Jingdong home page
  2. move the cursor to the search box and click it
  3. type the text with keyboard input
  4. click the search button

webdriver implementation process:

  1. open the Jingdong home page
  2. find the input element of the search box

Puppeteer's design philosophy is closer to the way people naturally operate a browser.
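The four Puppeteer steps above can be sketched as follows. This is only an illustration: `searchLikeAUser` is a hypothetical helper, and the URL and selectors are supplied by the caller because the real ones depend on the page being automated.

```javascript
// Perform a search the way a user would: click, type, click.
async function searchLikeAUser(page, { url, inputSelector, buttonSelector, keyword }) {
  await page.goto(url);               // 1. open the home page
  await page.click(inputSelector);    // 2. click the search box to focus it
  await page.keyboard.type(keyword);  // 3. type the text as keyboard input
  await page.click(buttonSelector);   // 4. click the search button
}

// Possible usage (the selectors here are assumptions, not verified values):
//   await searchLikeAUser(page, {
//     url: 'https://www.jd.com',
//     inputSelector: '#key',
//     buttonSelector: '.button',
//     keyword: 'mobile phone',
//   });
```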

Below we use a simple requirement as an introduction to Puppeteer. The requirement is:

Grab 10 mobile phone products from the Jingdong mall and take a screenshot of each product's detail page.

First, let's lay out the steps:

  1. open the Jingdong home page
  2. type the keyword "mobile phone" and search
  3. get the A tags of the first 10 items and read their href attribute values to obtain the product detail links
  4. open the detail page of each of the 10 items and take a screenshot of it

Implementing this requires finding elements, getting attributes, handling keyboard events, and so on. Let's go through them one by one.
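Before looking at the pieces individually, steps 3 and 4 of the plan can be sketched roughly like this. The helper names, the link selector, and the output file naming are assumptions made for illustration; `page.$$eval`, `page.goto`, and `page.screenshot` are the Puppeteer calls doing the work.

```javascript
// Step 3: read the href of the first `limit` links matching `linkSelector`.
async function collectLinks(page, linkSelector, limit) {
  const hrefs = await page.$$eval(linkSelector, anchors => anchors.map(a => a.href));
  return hrefs.slice(0, limit);
}

// Step 4: open each link in its own tab and save a full-page screenshot.
async function screenshotAll(browser, links, dir) {
  const files = [];
  for (const [i, link] of links.entries()) {
    const page = await browser.newPage();
    await page.goto(link);
    const file = `${dir}/item-${i + 1}.png`;
    await page.screenshot({ path: file, fullPage: true });
    await page.close();
    files.push(file);
  }
  return files;
}

// Possible usage, once the search results page is loaded in `page`
// (the selector is a placeholder):
//   const links = await collectLinks(page, '.item a', 10);
//   await screenshotAll(browser, links, './shots');
```

Opening the pages one after another keeps the sketch simple; running them in parallel is possible but uses more memory.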

4.1 Getting elements

The Page object provides two APIs for getting page elements.

(1) page.$(selector) gets a single element. Under the hood it calls document.querySelector(), so the selector follows CSS selector syntax.
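As a minimal sketch of page.$ in use, the hypothetical helper below looks up one element and reads an attribute from it. page.$ resolves to null when nothing matches, so that case is handled explicitly.

```javascript
// Return the value of attribute `name` on the first element matching
// `selector`, or null when no element matches.
async function getAttribute(page, selector, name) {
  const handle = await page.$(selector); // element handle, or null
  if (handle === null) {
    return null;
  }
  // The callback runs in the page context against the matched element.
  return page.evaluate((el, attr) => el.getAttribute(attr), handle, name);
}

// Possible usage (the selector is a placeholder):
//   const href = await getAttribute(page, 'a.detail-link', 'href');
```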


Permalink: http://www.script-home.com/detailed-explanation-of-the-puppeteer-introductory-tutorial.html | Script Home
