Sample workflows

The following example workflows show how to use the Selenium Nodes. You can download the workflows and import them into KNIME using File → Import KNIME Workflow…

Did you know? You can also find this documentation on NodePit — the world’s first search engine that allows you to easily search, find and install KNIME nodes and workflows.

Search results scraping

This workflow shows how to scrape Google search results for a given search query. It simulates submitting the query and loops through the result pages to collect more results than the first result page displays. It also demonstrates how to take screenshots and extract the rendered DOM source of web pages. Download here.
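The result-page loop boils down to "collect the results on the current page, then follow the 'Next' link until there is none or a page limit is reached". The Java sketch below shows only that control flow, under stated assumptions: `ResultPage`, `PaginatedScraper` and `inMemory` are hypothetical names (not part of Selenium or the Selenium Nodes). In a real browser session, `resultLinks()` would roughly correspond to a Find Elements step with a CSS selector and `nextPage()` to clicking the page's "Next" link.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical view of one rendered result page (not a Selenium type). */
interface ResultPage {
    /** Result links found on this page, e.g. via a CSS selector. */
    List<String> resultLinks();

    /** The following page, or empty if there is no "Next" link. */
    Optional<ResultPage> nextPage();
}

final class PaginatedScraper {
    /** Collects results from at most maxPages pages, following "Next" links. */
    static List<String> collect(ResultPage first, int maxPages) {
        List<String> all = new ArrayList<>();
        ResultPage page = first;
        for (int i = 0; i < maxPages; i++) {
            all.addAll(page.resultLinks());
            if (i + 1 == maxPages) {
                break; // limit reached: do not navigate any further
            }
            Optional<ResultPage> next = page.nextPage();
            if (!next.isPresent()) {
                break; // no "Next" link: this was the last page
            }
            page = next.get();
        }
        return all;
    }

    /** In-memory stand-in so the loop can be tried without a browser. */
    static ResultPage inMemory(List<List<String>> pages, int index) {
        return new ResultPage() {
            @Override
            public List<String> resultLinks() {
                return pages.get(index);
            }

            @Override
            public Optional<ResultPage> nextPage() {
                if (index + 1 < pages.size()) {
                    return Optional.of(inMemory(pages, index + 1));
                }
                return Optional.empty();
            }
        };
    }

    public static void main(String[] args) {
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("r1", "r2"), Arrays.asList("r3"), Arrays.asList("r4"));
        System.out.println(collect(inMemory(pages, 0), 2)); // [r1, r2, r3]
    }
}
```

The page limit is checked before navigating, so the loop never clicks "Next" for a page it would not read anyway.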

Facebook scraping

This workflow logs into your Facebook account and extracts a list of all your Facebook friends and their profile URLs. Because the friends list is rendered through “infinite scrolling”, the workflow repeatedly scrolls down the page in a loop to load all entries before extracting the information. Before running the workflow, provide your credentials as workflow variables: right-click the imported workflow in the “KNIME Explorer” view and choose “Workflow Variables…”. Download here.
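A common way to handle infinite scrolling is to scroll to the bottom, wait for lazily loaded content, and stop once the page height no longer grows, with a maximum number of iterations so the loop cannot run forever. The Java sketch below illustrates only that stopping rule. `ScrollablePage` and `InfiniteScroll` are hypothetical names; with Selenium, the two page operations would typically be JavaScript run through `JavascriptExecutor.executeScript`, as noted in the comments, and the wait time is an assumption to tune per site.

```java
/**
 * Hypothetical page abstraction (not a Selenium type). With Selenium, the methods
 * would typically wrap JavascriptExecutor.executeScript calls as noted below.
 */
interface ScrollablePage {
    /** e.g. executeScript("return document.body.scrollHeight") */
    long scrollHeight();

    /** e.g. executeScript("window.scrollTo(0, document.body.scrollHeight)") */
    void scrollToBottom();
}

final class InfiniteScroll {
    /**
     * Scrolls until the page height stops growing or maxScrolls is reached.
     * Returns the number of scroll steps performed.
     */
    static int scrollUntilStable(ScrollablePage page, int maxScrolls, long waitMillis) {
        long lastHeight = page.scrollHeight();
        int scrolls = 0;
        while (scrolls < maxScrolls) {
            page.scrollToBottom();
            scrolls++;
            try {
                Thread.sleep(waitMillis); // give lazily loaded entries time to render
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop
                break;
            }
            long newHeight = page.scrollHeight();
            if (newHeight <= lastHeight) {
                break; // nothing new was appended: assume the end of the list
            }
            lastHeight = newHeight;
        }
        return scrolls;
    }

    public static void main(String[] args) {
        // Fake page that grows for three scroll steps and then stops growing.
        final int[] loads = {0};
        ScrollablePage fake = new ScrollablePage() {
            @Override
            public long scrollHeight() {
                return 1000L + 500L * Math.min(loads[0], 3);
            }

            @Override
            public void scrollToBottom() {
                loads[0]++;
            }
        };
        System.out.println(scrollUntilStable(fake, 50, 0)); // 4
    }
}
```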

Saving files

This workflow demonstrates how to download a binary file within the browser using an XMLHttpRequest, encode the download as a Base64 string, and transfer the content back to KNIME, where it is written to the file system. This is handy when you need to download files from an authenticated environment. See the FAQ for more details. Download here.
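On the KNIME side, the remaining step is to turn the Base64 string back into bytes and write them to disk. The Java sketch below shows one way this could look, for example inside a Java Snippet. `Base64FileWriter` and `writeBase64` are hypothetical names, and stripping an optional `data:<type>;base64,` prefix assumes the browser side may have produced a data URL (e.g. via `FileReader.readAsDataURL`). The in-browser XMLHttpRequest part is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

final class Base64FileWriter {
    /**
     * Decodes a Base64 payload (optionally a "data:<type>;base64," URL) and writes
     * the bytes to the target file, creating parent directories if needed.
     */
    static Path writeBase64(String payload, Path target) throws IOException {
        String base64 = payload.trim();
        if (base64.startsWith("data:")) {
            // Assumption: the browser side may return a data URL; keep only the part after the comma.
            base64 = base64.substring(base64.indexOf(',') + 1);
        }
        byte[] bytes = Base64.getDecoder().decode(base64);
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return Files.write(target, bytes); // creates or overwrites the file
    }

    public static void main(String[] args) throws IOException {
        // "JVBERi0=" is the Base64 encoding of the five bytes "%PDF-".
        Path out = Files.createTempFile("download", ".bin");
        writeBase64("JVBERi0=", out);
        System.out.println(Files.size(out)); // 5
    }
}
```

`Base64.getDecoder()` rejects line breaks; if the encoded string may contain them, `Base64.getMimeDecoder()` is the lenient alternative.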

Get Cookies

This example shows how to get the cookies for the currently displayed page. The cookies are extracted in a “Java Snippet” node, returned as a JSON object, and then split into a KNIME table. Note: The Selenium API only gives access to cookies that are accessible via JavaScript in the browser. Cookies marked as “httpOnly” cannot be seen by client-side JavaScript or by Selenium. Download here.
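A Java Snippet could build such a JSON object along the following lines; this is a sketch, not the workflow's actual snippet. With Selenium, `driver.manage().getCookies()` returns a `Set<Cookie>` whose entries offer `getName()`, `getValue()`, `getDomain()`, `getPath()` and `isSecure()`. To keep the example runnable without Selenium on the classpath, those values are copied into a hypothetical `CookieInfo` holder, and the JSON layout (a `cookies` array with one object per cookie) is an assumption.

```java
import java.util.Arrays;
import java.util.List;

final class CookieJson {
    /** Hypothetical holder mirroring a few getters of Selenium's Cookie class. */
    static final class CookieInfo {
        final String name;
        final String value;
        final String domain;
        final String path;
        final boolean secure;

        CookieInfo(String name, String value, String domain, String path, boolean secure) {
            this.name = name;
            this.value = value;
            this.domain = domain;
            this.path = path;
            this.secure = secure;
        }
    }

    /** Encodes a Java string as a JSON string literal, or null. */
    static String quote(String s) {
        if (s == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Other control characters must be \\u-escaped in JSON.
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes the cookies as {"cookies":[{...}, ...]}. */
    static String toJson(List<CookieInfo> cookies) {
        StringBuilder sb = new StringBuilder("{\"cookies\":[");
        for (int i = 0; i < cookies.size(); i++) {
            CookieInfo c = cookies.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"name\":").append(quote(c.name))
              .append(",\"value\":").append(quote(c.value))
              .append(",\"domain\":").append(quote(c.domain))
              .append(",\"path\":").append(quote(c.path))
              .append(",\"secure\":").append(c.secure)
              .append('}');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        List<CookieInfo> cookies = Arrays.asList(
                new CookieInfo("session", "abc123", "example.com", "/", true),
                new CookieInfo("lang", "en", "example.com", "/", false));
        System.out.println(toJson(cookies));
    }
}
```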

Get Current URL

This example shows how to access the URL currently shown in the browser, using the WebDriver’s getCurrentUrl method in a Java Snippet node. Download here.

Right click and context menu

This workflow demonstrates how to perform a “right click” to open a website’s custom context menu and select a menu item with the Selenium Actions API. Download here.

Yelp review scraping

This workflow extracts restaurant reviews from Yelp pages and is based on this post in the KNIME forums. Besides the general concepts of data extraction using CSS selectors with the Find Elements nodes, it shows how to ensure sequential execution by using flow variables instead of Synchronize nodes. Download here.