Headless Chrome and the Puppeteer Library for Scraping and Testing the Web
Written by Nikos Vaggalis
Wednesday, 29 November 2017
With the advent of Single Page Applications, scraping pages for information and running automated user-interaction tests have become much harder, because such pages are highly dynamic: much of their content is rendered by JavaScript after the initial load. The solution? Headless Chrome and the Puppeteer library.
Selenium, PhantomJS and others have long been available, and although headless Chrome and Puppeteer arrived late to the party, they are valuable additions to the roster of web automation tools that let developers simulate real users interacting with a website or application.
Headless Chrome can run without Puppeteer. It can be controlled programmatically through the Chrome DevTools Protocol (CDP), typically by attaching to a running Chrome instance through its remote debugging port.
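At the wire level, CDP is a stream of JSON messages. Each command carries a numeric `id`, a `method` such as `Page.navigate` and optional `params`. Chrome replies with a message bearing the same `id` plus either a `result` or an `error`, and messages without an `id` are events such as `Page.loadEventFired`. The sketch below is a minimal, hypothetical client that does this bookkeeping. `CdpClient` and `Transport` are names made up for illustration, not part of any library. It assumes a WebSocket-like transport that you would connect to the `webSocketDebuggerUrl` Chrome lists at `http://localhost:9222/json` after being started with `--headless --remote-debugging-port=9222`. The connection itself is left out so that the sketch has no dependencies.

```typescript
// Minimal sketch of a Chrome DevTools Protocol client.
// `Transport` is a hypothetical adapter over a WebSocket connected to
// Chrome's webSocketDebuggerUrl; it is not a real library type.
interface Transport {
  send(data: string): void;
  onMessage(handler: (data: string) => void): void;
}

type Pending = {
  resolve: (result: unknown) => void;
  reject: (error: Error) => void;
};

class CdpClient {
  private transport: Transport;
  private nextId = 1;
  private pending = new Map<number, Pending>();
  private listeners = new Map<string, Array<(params: unknown) => void>>();

  constructor(transport: Transport) {
    this.transport = transport;
    transport.onMessage((data) => this.dispatch(data));
  }

  // Send a command such as "Page.navigate"; resolves with the reply's `result`.
  send(method: string, params: Record<string, unknown> = {}): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.transport.send(JSON.stringify({ id, method, params }));
    });
  }

  // Subscribe to a browser event such as "Page.loadEventFired".
  on(event: string, listener: (params: unknown) => void): void {
    const list = this.listeners.get(event) ?? [];
    list.push(listener);
    this.listeners.set(event, list);
  }

  private dispatch(data: string): void {
    const msg = JSON.parse(data);
    if (typeof msg.id === "number") {
      // A reply to one of our commands: settle the matching promise.
      const entry = this.pending.get(msg.id);
      if (!entry) return;
      this.pending.delete(msg.id);
      if (msg.error) entry.reject(new Error(msg.error.message));
      else entry.resolve(msg.result);
    } else if (typeof msg.method === "string") {
      // No id: an event pushed by the browser.
      for (const listener of this.listeners.get(msg.method) ?? []) {
        listener(msg.params);
      }
    }
  }
}
```

For instance, `await client.send("Page.navigate", { url: "https://example.com" })` followed by `await client.send("Page.captureScreenshot")` would return the image as base64 in the result's `data` field. Puppeteer performs this id-and-reply bookkeeping for you, and it also launches Chrome.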
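Puppeteer, a Node.js library maintained by the Chrome DevTools team, wraps that protocol in a high-level API. The official documentation introduces it with an example that navigates to https://example.com and saves a screenshot as example.png.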
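The official example launches the browser, opens a page, navigates, takes a screenshot and closes the browser. The sketch below follows those same steps in TypeScript. It is not the documentation's code verbatim. So that it can run without a browser, it is written against a small hypothetical `BrowserLike` interface that mirrors only the Puppeteer methods it calls (`newPage`, `goto`, `screenshot`, `close`). Puppeteer's exact type signatures vary between versions, so passing a real `Browser` may need a cast.

```typescript
// The subset of Puppeteer's Browser and Page used here. These interfaces
// are illustrative assumptions, not Puppeteer's own type declarations.
interface PageLike {
  goto(url: string): Promise<unknown>;
  screenshot(options: { path: string }): Promise<unknown>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

// Navigate to `url` and save a screenshot to `path`, closing the browser
// even if navigation or the screenshot fails.
async function saveScreenshot(
  browser: BrowserLike,
  url: string,
  path: string,
): Promise<void> {
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({ path });
  } finally {
    await browser.close();
  }
}
```

With Puppeteer installed, the call would be `saveScreenshot(await puppeteer.launch(), "https://example.com", "example.png")`. The `try`/`finally` goes slightly beyond the bare example. It ensures the Chrome process is shut down even when `goto` throws, for example on a navigation timeout.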
As a further example, let's go to www.smadeseek.com and load the list of all available smartphones, then programmatically click on the img element of the second displayed device to bring up its detailed specifications page. From there we can access the innerHTML of the first table element.
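Those steps can be sketched as follows, again against a hypothetical minimal interface (`ScrapePage`, `ElementLike`) modelled on Puppeteer's `page.$$`, `elementHandle.click`, `page.waitForNavigation` and `page.$eval`. The article gives neither the exact listing URL nor the page markup, so `listUrl` and `imageSelector` are placeholder parameters you would read off the live site. The default `index = 1` picks the second match. The sketch assumes the click triggers a navigation.

```typescript
// Illustrative subset of Puppeteer's ElementHandle and Page, not its typings.
interface ElementLike {
  click(): Promise<void>;
}

interface ScrapePage {
  goto(url: string): Promise<unknown>;
  $$(selector: string): Promise<ElementLike[]>;
  waitForNavigation(): Promise<unknown>;
  $eval(selector: string, fn: (el: { innerHTML: string }) => string): Promise<string>;
}

// Open the listing, click the device image at `index` (0-based) and return
// the innerHTML of the first <table> on the page that loads.
async function scrapeDeviceSpecs(
  page: ScrapePage,
  listUrl: string,
  imageSelector: string,
  index = 1,
): Promise<string> {
  await page.goto(listUrl);
  const images = await page.$$(imageSelector);
  const target = images[index];
  if (!target) {
    throw new Error(`only ${images.length} elements match ${imageSelector}`);
  }
  // Start waiting for the navigation before clicking, so a fast page load
  // is not missed between the click and the wait.
  await Promise.all([page.waitForNavigation(), target.click()]);
  // In Puppeteer, $eval runs the callback inside the page on the first match.
  return page.$eval("table", (table) => table.innerHTML);
}
```

If the site renders the specifications in place without any navigation, `page.waitForSelector("table")` would be the more reliable thing to wait for.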
There's just one caveat: CDP only works with Chromium, Chrome and other Blink-based browsers, and therefore so does Puppeteer. If you need to target other browsers, Selenium and its WebDriver API remain the best option.