How To Automate Broken URLs Check with Google Sheets and App Script: A Case Study

| | 3 min read

Introduction

We are working with Victoria University (VU), one of Australia's renowned educational institutions, to support their team in testing their website after a Drupal 7 to Drupal 9 migration and regression on every new release.

The website already housed an extensive collection of content, with over 60,000 pages, meticulously curated by administrators. However, while developing a new module, we encountered a challenge; some pages throw errors. To proactively tackle this issue and prevent potential mishaps in the future, the client requested a solution to verify the status of all 60,000+ pages, ensuring each one had an HTTP 200 OK success status response code before deployment to the live environment.

The Business Challenge: Testing More than 60K Pages

While working on a module, our team noticed that certain pages were flagged as broken. Given the vast number of pages, detecting these isolated issues before someone reported them seemed improbable. The client, recognizing the significance of delivering a flawless user experience, urgently required a method to automatically check the status of all pages to ensure that they were error-free.

 

 

The Solution: Automation with Google Sheets

URL HTTP Status

To address the challenge, we decided to leverage the power of automation by implementing Google Apps Script with Google Sheets.

We can check whether a page is broken by accessing it, but manually checking all pages is practically impossible. So we decided to check the HTTP status code for all the pages in bulk, and if we get any status other than 200, we can investigate further on those pages.

By creating a seamless integration between the two platforms, we could efficiently manage and monitor the page status for any errors. The steps we followed were as follows:

  1. Importing Page URLs to a Google Sheet:  We began by importing all the URLs into a Google sheet.
  2. Adding an Extra Column for Status Code:  To facilitate the fetching of status codes, we added an extra column in the sheet.
  3. Google Apps Script:  We created an App Script that loops through all the URLs in the sheet and updates the HTTP Status against each URL.

The following script was used to fetch the response code for each page:

// Google Apps Script to fetch HTTP response codes for URLs with HTTP Authentication

function getResponseCodeWithAuth(url, username, password) {
  try {
    var options = {
      "method": "get"
    };
    if (username && password) {
      var headers = {
        "Authorization": "Basic " + Utilities.base64Encode(username + ":" + password)
      };
      options.headers = headers;
    }
    var response = UrlFetchApp.fetch(url, options);
    return response.getResponseCode();
  } catch (error) {
    // Return 0 for any errors
    return 0;
  }
}

function updateStatusCodesWithAuth() {
  // Get the "Data" sheet which contains the URLs
  var dataSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Data");
  var urlData = dataSheet.getDataRange().getValues();

  // Get the common username and password from the "config" sheet
  var configSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Config");
  var configData = configSheet.getDataRange().getValues();
  var username = configData[0][0]; // Assuming the username is in the first row (row 1) of "config" sheet
  var password = configData[0][1]; // Assuming the password is in the second row (row 2) of "config" sheet

  for (var i = 0; i < urlData.length; i++) {
    var url = urlData[i][0];
    var statusCode = getResponseCodeWithAuth(url, username, password);
    dataSheet.getRange(i + 1, 2).setValue(statusCode);
  }
}

The URLs were behind HTTP auth; in the Google Sheet, we add a config sheet to input the HTTP credentials, and the script takes that data from it.

The Result

The solution proved highly effective thanks to the automation capabilities of Google Sheets with Apps Script. We successfully obtained the HTTP response codes for all 60,000+ pages in the Google sheet by running the script. Consequently, we could promptly identify and fix the missing pages, eliminating potential issues before the website's deployment.

Conclusion

When harnessed intelligently, automation can significantly streamline and enhance website testing processes. In the case of Victoria University, Google Apps Script and Google Sheets proved to be a dynamic duo, enabling us to efficiently verify the status of all pages and ensure a seamless user experience. By proactively detecting and resolving issues before they impact end-users, VU can confidently deploy new releases without fear of unexpected errors. As businesses increasingly embrace automation, it becomes evident that these innovative solutions save time and resources and contribute to overall customer satisfaction and success.

We have experience in automation testing frameworks like Playwright, Selenium and Cypress. Contact us if you are looking for a partner for website testing.