Chromedp

Author: Galuh Pradipta

Chromedp is an open-source Go library that provides a high-level API for controlling Chrome or Chromium over the DevTools Protocol. Because it drives the browser directly, without an external driver binary, chromedp makes web scraping efficient and user-friendly. In this article, we will discuss how to use the chromedp library to scrape websites.

Installation

Before we dive into the usage of chromedp, we need to install both the chromedp library and Chromium. The chromedp library can be installed with the following command:

$ go get -u github.com/chromedp/chromedp

To install Chromium on Ubuntu or Debian, run the following command:

$ sudo apt-get install chromium-browser

To install Chromium on macOS using Homebrew, run the following command:

$ brew install --cask chromium

For other operating systems, please refer to the Chromium download page: https://www.chromium.org/getting-involved/download-chromium

Usage

After installing the chromedp library and Chromium, we import the library in our project. Here is an example that scrapes a website:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/chromedp/chromedp"
)

func main() {
	// create a browser context
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()

	// navigate to the website
	if err := chromedp.Run(ctx, chromedp.Navigate("https://www.example.com")); err != nil {
		log.Fatal(err)
	}

	// get the title of the page
	var title string
	if err := chromedp.Run(ctx, chromedp.Title(&title)); err != nil {
		log.Fatal(err)
	}
	fmt.Println(title)

	// get the text of a specific element, located by CSS selector
	var text string
	if err := chromedp.Run(ctx, chromedp.Text("#element-id", &text)); err != nil {
		log.Fatal(err)
	}
	fmt.Println(text)
}

In the above code, we first create a context using chromedp.NewContext, then navigate to the website using chromedp.Navigate. After that, we read the page title with chromedp.Title and the text of a specific element with chromedp.Text, printing both. Note that each failure is fatal: continuing after a failed navigation would only produce misleading results from the later steps.
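Real pages can hang or load slowly, so in practice it is worth bounding how long a scrape may run. A minimal sketch of the same flow with a deadline (the 30-second limit, URL, and selector are illustrative placeholders): chromedp contexts compose with the standard context package, so wrapping the parent in context.WithTimeout bounds the whole session.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/chromedp/chromedp"
)

func main() {
	// bound the entire scrape to 30 seconds; when the deadline
	// expires, chromedp.Run returns an error and the browser
	// is torn down by the deferred cancels
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	ctx, cancel = chromedp.NewContext(ctx)
	defer cancel()

	var title string
	err := chromedp.Run(ctx,
		chromedp.Navigate("https://www.example.com"), // placeholder URL
		chromedp.Title(&title),
	)
	if err != nil {
		// on timeout this reports context.DeadlineExceeded
		log.Fatal(err)
	}
	fmt.Println(title)
}
```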

Another use case for chromedp is scraping data from websites that require authentication. Because chromedp drives a real browser, you can automate the login form itself, or inject existing session cookies before navigating. Here is an example that logs in through a form and then scrapes a protected page:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/chromedp/chromedp"
)

func main() {
	// configure the browser process; these flags tune how Chromium
	// runs — the actual authentication happens further below
	opts := append(chromedp.DefaultExecAllocatorOptions[:],
		chromedp.Flag("headless", false),
		chromedp.Flag("disable-gpu", true),
		chromedp.UserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"),
		chromedp.Flag("ignore-certificate-errors", true),
		chromedp.Flag("allow-insecure-localhost", true),
		chromedp.Flag("disable-web-security", true),
		chromedp.Flag("disable-extensions", true),
		chromedp.Flag("disable-popup-blocking", true),
		chromedp.Flag("no-first-run", true),
		chromedp.Flag("no-default-browser-check", true),
		chromedp.Flag("no-sandbox", true),
	)
	// NewExecAllocator returns both a context and a cancel func
	allocCtx, allocCancel := chromedp.NewExecAllocator(context.Background(), opts...)
	defer allocCancel()

	ctx, cancel := chromedp.NewContext(allocCtx)
	defer cancel()

	// authenticate by filling in and submitting the login form
	err := chromedp.Run(ctx,
		chromedp.Navigate("https://www.example.com/login"),
		chromedp.WaitVisible("#username"),
		chromedp.SendKeys("#username", "myusername"),
		chromedp.SendKeys("#password", "mypassword"),
		chromedp.Click("#login-button"),
		chromedp.WaitVisible("#dashboard"),
	)
	if err != nil {
		log.Fatal(err)
	}

	// scrape data from a page that requires the logged-in session
	var data string
	err = chromedp.Run(ctx,
		chromedp.Navigate("https://www.example.com/data"),
		chromedp.WaitVisible("#data"),
		chromedp.Text("#data", &data),
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(data)
}

In the above code, we first configure the browser process using chromedp.NewExecAllocator and derive a browser context from it. We then authenticate by navigating to the login page, filling in the username and password fields with chromedp.SendKeys, clicking the login button with chromedp.Click, and waiting for the dashboard to appear with chromedp.WaitVisible. Because the session lives in that same browser context, the subsequent chromedp.Navigate and chromedp.Text calls scrape the data page as the logged-in user.
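When you already hold a session cookie (for example, captured from an earlier login), you can inject it before the first navigation instead of replaying the login form. A sketch using the network domain from the companion cdproto package; the cookie name, value, domain, and selector are placeholders, and it assumes a cdproto version in which SetCookieParams.Do returns just an error:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/chromedp/cdproto/network"
	"github.com/chromedp/chromedp"
)

func main() {
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()

	var data string
	err := chromedp.Run(ctx,
		// inject the session cookie before the first navigation
		chromedp.ActionFunc(func(ctx context.Context) error {
			return network.SetCookie("session_id", "my-session-value"). // placeholder cookie
				WithDomain("www.example.com").
				WithPath("/").
				WithHTTPOnly(true).
				Do(ctx)
		}),
		chromedp.Navigate("https://www.example.com/data"),
		chromedp.WaitVisible("#data"),
		chromedp.Text("#data", &data),
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(data)
}
```

This skips the login page entirely, which is useful when repeated form logins would trigger rate limiting or CAPTCHAs.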

Conclusion

Chromedp is a powerful tool for web scraping in Go. Because it controls a real Chrome or Chromium instance over the DevTools Protocol, it can render JavaScript-heavy pages that plain HTTP clients cannot, and its high-level API keeps the scraping code simple. Chromedp can also handle authentication flows, making it a great option for scraping data from websites that require a login.