Simple Node.js scraper using jsdom

Jonas K
2 min read · Nov 24, 2021

Scraping content from websites that don't require a login doesn't have to be complicated.

In this example, we'll scrape the titles of all the articles on my personal website. Let's get right to it.

mkdir my-webscraper
cd my-webscraper
npm init
npm install express got@11 jsdom
touch index.js

Run these commands and your project is ready. We use Express only so we can view the output as JSON, and it also gives you a starting point if you later want to turn the scraper into a REST API. If you want to do something else with the scraped data, for example send it by email, you don't need Express. Note that got is pinned to version 11, the last release that supports require(); from version 12 onwards got is published as a pure ES module, which doesn't fit a CommonJS file like index.js.
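If you skip Express, Node's built-in http module can still return the scraped data as JSON. The sketch below is an illustration, not part of the original project: the createArticlesHandler and startServer names, the default port 3000 and the idea of passing a getter function are assumptions.

```javascript
const http = require("node:http");

// Hypothetical helper: builds a request handler that answers every request
// with the current list of articles serialized as JSON.
function createArticlesHandler(getArticles) {
  return (req, res) => {
    const body = JSON.stringify(getArticles());
    res.writeHead(200, {
      "Content-Type": "application/json; charset=utf-8",
      "Content-Length": Buffer.byteLength(body),
    });
    res.end(body);
  };
}

// Hypothetical helper: starts a plain Node server (no Express) on the given port.
function startServer(getArticles, port = 3000) {
  const server = http.createServer(createArticlesHandler(getArticles));
  server.listen(port, () => {
    console.log(`Listening on http://localhost:${port}`);
  });
  return server;
}
```

With the scraper's articles array in scope, startServer(() => articles) would serve it on port 3000. Passing a function instead of the array itself means each response reflects the latest scrape, even if the array is reassigned after the page has been fetched.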

Remember to add a start script to the "scripts" section of package.json:

"scripts": {
  "start": "node index.js"
}

Nice! Let’s have a look at the spider.

const express = require("express");
const got = require("got");
const jsdom = require("jsdom");
const { JSDOM } = jsdom;

const app = express();
const url = "https://www.jonas-k.me/articles/";
let articles = [];

// Download the article list page, then parse the HTML with jsdom
got(url).then((response) => {
  const dom = new JSDOM(response.body);
  // Select every link inside the element with the "list" class
  const selector = dom.window.document.querySelectorAll(".list a");

  [...selector].forEach((article) => {…
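
The listing above breaks off inside the forEach, so the rest of the original extraction code isn't shown here. One possible way to finish that step is sketched below; it is an assumption, not the author's code. It reduces each matched link to a title and an absolute URL. By default jsdom treats the document's URL as about:blank, so an element's href property won't give you an absolute link for a relative path unless you pass the url option to the JSDOM constructor; the sketch therefore resolves the raw href attribute against the page URL itself. The extractArticles name and the { title, url } shape are hypothetical.

```javascript
// Hypothetical helper (not from the original article): turns the <a> elements
// matched by ".list a" into { title, url } objects.
function extractArticles(links, pageUrl) {
  const articles = [];
  const seen = new Set();

  for (const link of links) {
    // textContent includes whitespace from the markup, so trim it.
    const title = link.textContent.trim();
    const href = link.getAttribute("href");
    if (!title || !href) continue; // skip empty or unlinked entries

    // Resolve relative links against the page that was scraped.
    const url = new URL(href, pageUrl).href;
    if (seen.has(url)) continue; // ignore duplicate links to the same article
    seen.add(url);

    articles.push({ title, url });
  }
  return articles;
}
```

Inside the then callback this could be called as articles = extractArticles(selector, url). Because for...of works on any iterable, the NodeList returned by querySelectorAll can be passed directly without spreading it into an array first.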
