nodejs|April 08, 2020|2 min read

How to check whether a web page contains a backlink to your website - Node.js implementation

TL;DR

Read a list of URLs from a text file and use Node.js to fetch each page, parse the HTML, and check if it contains your website's backlink.


Introduction

I had my SEO backlink work done by a freelancer. It came to about 3000 links, and the links that freelancers provide are often broken. So I wanted to test every single one of them, to check whether each URL is actually live and contains my URL or backlink.

Node.js automation

I wrote a simple Node.js script that reads a list of URLs from a text file and, one by one, checks whether each URL is reachable and whether the page contains my backlink.

Input

  1. A text file (urls.txt) containing the list of URLs, one per line
  2. My website name: xyz.com
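A file with thousands of lines tends to pick up blank lines, stray spaces, or Windows line endings, and each of those would otherwise be treated as a URL and reported as failed. The following is a minimal sketch of cleaning the list before checking it. The helper name parseUrlList and the sample URLs are made up for illustration and are not part of the original script.

```javascript
// Hypothetical helper (not in the original script): turn the raw contents
// of the URL file into a clean list of URLs.
function parseUrlList(text) {
    return text
        .split(/\r?\n/)                    // accept Unix and Windows line endings
        .map(line => line.trim())          // drop stray spaces around each URL
        .filter(line => line.length > 0);  // skip blank lines, e.g. a trailing newline
}

// Sample input with a Windows line ending, a blank line and padded spaces.
const sample = 'https://example.com/page-1\r\n\n  https://example.org/page-2  \n';
console.log(parseUrlList(sample));
```

In app.js, the result of fs.readFileSync('urls.txt').toString() could be passed through such a helper before calling the checker.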

Code

Following is the directory structure:

project
    - app.js
    - urls.txt
    - src/http/url_checker.js
    - package.json

package.json

{
  "name": "check_links_seo",
  "version": "1.0.0",
  "description": "For checking link validity work given by freelancers",
  "main": "app.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "Gorav Singal",
  "license": "ISC",
  "dependencies": {
    "async": "^3.2.0",
    "cheerio": "^1.0.0-rc.3",
    "request": "^2.88.2",
    "request-promise": "^4.2.5"
  }
}

app.js

const urlChecker = require('./src/http/url_checker');
const fs = require('fs');

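// Read the URL list (one URL per line), trimming whitespace and skipping blank lines
const urls = fs.readFileSync('urls.txt').toString()
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.length > 0);

// Remember to put your website here. The check is a case-sensitive substring
// match, so write the name the way it appears in links (usually lowercase).
const myWeb = 'xyz.com';

urlChecker.checkYourLinkInUrls(urls, myWeb)
    .then(() => {
        console.log('Successfully finished...');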
    })
    .catch(err => {
        console.error(err);
    });

url_checker.js

const rp = require('request-promise');
const cheerio = require('cheerio');
const async = require('async');

class UrlChecker {
    checkYourLinkInUrls(urls, desiredWebsite) {
        return new Promise((resolve, reject) => {
            async.eachLimit(urls, 1, (url, callback) => {
                return this.__checkYourLinkInUrl(url, desiredWebsite)
                    .then(function (res) {
                        if (!res) {
                            console.log('failed', url);
                        }
                        else {
                            console.log('success', url);
                        }
                        callback();
                    }).catch(function (err) {
                        callback(err);
                    });
            }, function (err) {
                if (err) {
                    reject(err);
                } else {
                    resolve();
                }
            });
        });
    }

    __checkYourLinkInUrl(url, desiredWebsite) {
        // Fetch the page; any request error is reported as "no backlink" below
        return rp(url)
            .then(html => {
                // Cheap check: is the website name mentioned anywhere in the raw HTML?
                return html.indexOf(desiredWebsite) > -1;

                // Stricter alternative: parse the HTML and inspect every <a href>.
                // const $ = cheerio.load(html);
                // let found = false;
                // $('a').each(function (i, link) {
                //     const web = $(link).attr('href');
                //     // href can be undefined for anchors without the attribute
                //     if (web && web.includes(desiredWebsite)) {
                //         found = true;
                //         return false; // returning false stops the .each() loop
                //     }
                // });
                // return found;
            })
            .catch(err => {
                // console.error('Error in url', url, err);
                return false;
            });
    }
}

module.exports = new UrlChecker();

Note: The code above only checks whether the page's HTML mentions my website anywhere. The commented-out code checks the actual links instead, but it is a bit more expensive in both computation and memory, because it parses the whole page.
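The plain substring check can also report a success when the website name appears only in text or in a query string rather than in a real link. Below is a rough sketch of a stricter per-link test that compares hostnames using Node's built-in URL class. The function names hrefPointsTo and hasBacklink, and the example URLs, are assumptions for illustration. Collecting the href values from the page (for example with cheerio, as in the commented-out code) is left out here.

```javascript
// Hypothetical helper: does a single href point at the desired website?
// pageUrl is the page the link was found on; it is used to resolve relative hrefs.
function hrefPointsTo(href, pageUrl, desiredWebsite) {
    if (!href) {
        return false; // anchors without an href attribute give undefined
    }
    let host;
    try {
        host = new URL(href, pageUrl).hostname.toLowerCase();
    } catch (err) {
        return false; // href (or pageUrl) could not be parsed as a URL
    }
    const wanted = desiredWebsite.toLowerCase();
    // Accept the bare domain and any subdomain such as www.xyz.com
    return host === wanted || host.endsWith('.' + wanted);
}

// Hypothetical helper: true if at least one href on the page points at the website.
function hasBacklink(hrefs, pageUrl, desiredWebsite) {
    return hrefs.some(href => hrefPointsTo(href, pageUrl, desiredWebsite));
}

const page = 'https://blog.example.com/some-post';
console.log(hasBacklink(['/about', 'https://www.xyz.com/'], page, 'xyz.com'));
console.log(hasBacklink(['https://example.com/?ref=xyz.com'], page, 'xyz.com'));
```

Comparing hostnames means a link to notxyz.com, or a URL that merely carries xyz.com as a parameter, is not counted as a backlink. The trade-off mentioned in the note still applies, since the page has to be parsed to extract the links first.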

Run code

node app.js

Thanks for reading…
