When I finished part 2 of this "making this blog" series, I initially concluded that I was happy with the technical outcome. My program fetched data from a Google Docs file and parsed it into HTML content, which was then displayed on the website.
However, further testing revealed a critical issue with the blog listing function: it was becoming very slow. I isolated the specific problem: individual fetch commands take Google Cloud Services around 200 to 250 ms to complete. For individual blog articles, which pull information from a single Docs file, this is acceptable, with the bulk of the processing time spent parsing the data into JSON. But the blog listing function looks at multiple files, so the process becomes extremely slow and starts to cause timeouts once more than six blog articles are being fetched. Logs suggest that over 10 seconds are spent waiting for documents to load.
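As a rough illustration (not the original listing code), here is a sketch of how that per-document wait can be logged in Apps Script; the folder ID below is a placeholder:

// Hypothetical timing sketch: each DocumentApp.openById call blocks until the
// previous one has finished, so the ~200-250 ms per document adds up linearly.
function timeSequentialOpens() {
  const folder = DriveApp.getFolderById('YOUR_BLOG_FOLDER_ID'); // placeholder ID
  const files = folder.getFiles();
  const start = Date.now();
  while (files.hasNext()) {
    const file = files.next();
    const docStart = Date.now();
    DocumentApp.openById(file.getId());
    Logger.log(file.getName() + ': ' + (Date.now() - docStart) + ' ms');
  }
  Logger.log('Total: ' + (Date.now() - start) + ' ms');
}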
The reason is the procedural nature of the program: one request isn't started until the previous one has finished. This led me to the conclusion that I would need to apply some basic database management techniques to my back end in order to deliver content in a timely fashion. So I developed a database index using Google Drive, Google Sheets and Google Apps Script.
Indexing as a concept predates computers, but in database management an index is typically a table that stores key information to allow easy and rapid data access. Here I will use an index to store all the data required for the front page in a single Google Sheet, and then create a function that builds the front page from that index file, hopefully vastly improving load times.
While not always possible, starting with a list of needs is a great way to program, and luckily I can use the old blogFetcher program to create a list of the items an index would need.
{"title": "Making this blog (part 2 front end)","imageDescription": "NextJs","imageCuri": "https://drive.google.com/uc?id=1noUNbDb0epQTGY-WhJQEmcrPerIg7N2Y","introText": "The first part of this blog development diary left with the back end being completed, using a combination of Google Drive As the database comma Google docs as the word processing unit and Google app script as the back end server providing restful API performance. This part we'll focus on the front end and fetching the data via the rest of API and then parsing it into content for displaying on the website.","fileId": "1JKw2SnIotJtoLHnI-XFQWm9ZXlQic4_giwctsfTJXFQ","fileDate": "2023-07-25T20:58:07.565Z"},{"title": "Making This Blog (part 1 backend)","imageDescription": "Googles suite is easy to use and powerful","imageCuri": "https://drive.google.com/uc?id=1uS9UfRmsSB96tf1Ml7BfNSLQSArkNj_n","introText": "So one thing that a lot of coders do is write a blog. While I'm not normally a trend chaser, I do see the benefit of doing something like a blog to show your inner thoughts on coding and the like.","fileId": "1VBdL5h6a8hiY7C3S3xBNrvElvBCpGQEwZQ8kEZusGhY","fileDate": "2023-07-25T15:17:38.949Z"},
Looking at the API call, we can see that the index needs:
title: A string representing the title of the document.
imageDescription: A string describing the image.
imageCuri: A string representing the image URI (Uniform Resource Identifier).
introText: A string containing introductory text.
fileId: A unique identifier for the document.
fileDate: A date value representing the document's date.
So, knowing what data is needed to populate the JSON, I structured the Google Sheet into columns, labelling the first row with title, imageDescription, imageCuri, introText, fileId and fileDate. While not strictly needed, the header row does help with clarity.
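I entered the header row by hand, but as a minimal sketch it could just as easily be written from Apps Script (the sheet ID below is a placeholder for my blogIndex sheet; the tab is named 'index'):

// One-off sketch: write the header row to the 'index' tab of the blogIndex sheet.
// The sheet ID is a placeholder; the tab name matches the setup described below.
function writeHeaderRow() {
  const ss = SpreadsheetApp.openById('YOUR_BLOG_INDEX_SHEET_ID');
  const sheet = ss.getSheetByName('index');
  const headers = ['title', 'imageDescription', 'imageCuri', 'introText', 'fileId', 'fileDate'];
  sheet.getRange(1, 1, 1, headers.length).setValues([headers]);
}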
As I mentioned in the first article of this series, one of the design choices for the initial blog was to pull images directly from the Docs file via the imageCuri, but the imageCuri is a temporary link with a lifespan of 30 minutes. This won't work for an indexing file, as the link needs to be more permanent. Therefore I concluded that I would have to copy the images out of the blog documents and into a Google Drive folder for linking to work.
I envision the system being structured like the following image:
Structure of my project with the blogIndex and supporting file structure
The blogFetch can remain the same, but the blogIndexer and formIndexFetch will need to be completely new.
The original database was a single Drive folder that docs were deposited into. In addition to this folder, I added a folder for indexing, and within it a Google Sheet called blogIndex and a subfolder for the images to be posted.
Folder structure for the blogs database
Subfolder Index
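I created these folders and the sheet by hand in Drive, but for completeness, a rough sketch of the same one-off setup in Apps Script might look like this (the parent folder ID is a placeholder):

// One-off setup sketch: create the Index subfolder, the blogIndex sheet and the
// images subfolder inside the existing blogs folder. The ID below is a placeholder.
function createIndexStructure() {
  const blogsFolder = DriveApp.getFolderById('YOUR_BLOGS_FOLDER_ID');
  const indexFolder = blogsFolder.createFolder('Index');
  const imagesFolder = indexFolder.createFolder('images');
  const blogIndex = SpreadsheetApp.create('blogIndex');
  // Move the newly created spreadsheet file into the Index folder
  DriveApp.getFileById(blogIndex.getId()).moveTo(indexFolder);
  Logger.log('Index folder: ' + indexFolder.getId() + ', images folder: ' + imagesFolder.getId());
}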
The blogIndexer serves a similar function to the old Blog Listing API, but instead of interacting with the front end via a RESTful JSON response, it updates the blogIndex file. As part of this update, it also needs to extract an image from the blog and place it in a specific folder that will be referenced by the formIndexFetch command.
The blogIndexer first opens the Drive folder that contains the blogs to be indexed. This uses the getFolderById command, as the folder ID is static; getFiles then returns a FileIterator that can be iterated through, updating the blogIndex file with the updateIndex function.
function doGet() {
  // Folder containing the blog documents to be indexed
  const id = '1fs11nkZ5cZNWQ73YC-Zttk2OTvC4k0Vn';
  const folder = DriveApp.getFolderById(id);
  const files = folder.getFiles();
  // Iterate over every document in the folder and update its index entry
  while (files.hasNext()) {
    const file = files.next();
    const fileDate = file.getLastUpdated();
    const fileId = file.getId();
    updateIndex(fileId, fileDate);
  }
}
The updateIndex function takes the fileId and fileDate from the FileIterator. For each document, the title, imageCuri, imageDescription and introText are gathered and then added to the blogIndex by a custom addToSheet function.
The image details are obtained from the Docs file via the checkHeroImage function.
function updateIndex(fileId, fileDate) {
  const gdoc = DocumentApp.openById(fileId);
  const title = gdoc.getName();
  // checkHeroImage returns [imageFileId, imageDescription]
  const image = checkHeroImage(gdoc, fileId);
  const imageCuri = `https://drive.google.com/uc?id=${image[0]}`; // permanent Drive link
  const imageDescription = image[1];
  const introText = getHeroText(gdoc);
  addToSheet([title, imageDescription, imageCuri, introText, fileId, fileDate]);
}
The checkHeroImage function takes two parameters: gdoc, the Google Docs document the image will be drawn from, and fileId, which is used to name the stored copy. The function locates the first (hero) image in the document and uploads it to the pre-established Google Drive folder for permanent linking. If the image already exists, the upload is skipped. The function then returns the stored image's ID along with its alt-text description.
function checkHeroImage(gdoc, fileId) {
  // Folder holding the permanent copies of the hero images
  const imageFolderId = '1CHWJd6qhHu2ZQaMGrtJ3b4r_Zdvy6yzE';
  const images = gdoc.getBody().getImages();
  const firstImage = images[0];
  const imageDescription = firstImage.getAltDescription();
  const imageBlob = firstImage.getBlob();
  // Name the copy after the source document so it can be found again later
  const imageName = fileId + "." + imageBlob.getContentType().split('/')[1];
  const imageFolder = DriveApp.getFolderById(imageFolderId);
  // Only upload the image if a copy doesn't already exist
  if (!imageFolder.getFilesByName(imageName).hasNext()) {
    imageFolder.createFile(imageBlob.setName(imageName));
  }
  const newImage = imageFolder.getFilesByName(imageName).next();
  const newImageId = newImage.getId();
  return [newImageId, imageDescription];
}
The getHeroText function is based on the original function designed in part 1. It looks for the first paragraph of normal text and returns the whole paragraph.
function getHeroText(gdoc) {
  const paragraphs = gdoc.getBody().getParagraphs();
  // Return the first non-empty paragraph (starting from index 1 to skip the opening paragraph)
  for (let i = 1; i < paragraphs.length; i++) {
    if (paragraphs[i].getText().length !== 0) {
      return paragraphs[i].getText();
    }
  }
}
The addToSheet function opens the blogIndex spreadsheet and assigns it to the ss variable. It then gets the sheet named 'index' from the spreadsheet and assigns it to the sheet variable. All data from the index sheet is retrieved, creating an array of values that can be iterated through.
This array is looped through, checking whether the fileId is already present. If it is, that row is updated with the new values using setValues(), and the sheet is then sorted by fileDate in reverse order, via the range.sort() function, so the newest blog article is the first entry.
If no matching fileId is found during the loop, the function appends a new row to the 'index' sheet using sheet.appendRow() with the provided data.
function addToSheet([title, imageDescription, imageCuri, introText, fileId, fileDate]) {
  // Checks the index sheet to see if the document has already been added;
  // if it has, the existing row is updated instead of appending a new one
  const sheetID = '1FOlvDtvjHWS0Moko2LHHZSO9CuqLl2l072T0Z8E338U';
  const ss = SpreadsheetApp.openById(sheetID);
  const sheet = ss.getSheetByName('index');
  const values = sheet.getDataRange().getValues();
  for (let row = 0; row < values.length; row++) {
    if (fileId === values[row][4]) {
      // Overwrite the existing entry for this document
      const rowToReplace = sheet.getRange(row + 1, 1, 1, 6);
      rowToReplace.setValues([[title, imageDescription, imageCuri, introText, fileId, fileDate]]);
      // Re-sort by fileDate (column 6), newest first
      const range = sheet.getRange(2, 1, sheet.getLastRow() - 1, sheet.getLastColumn());
      range.sort([{column: 6, ascending: false}]);
      return;
    }
  }
  // No existing entry found, so append a new row
  sheet.appendRow([title, imageDescription, imageCuri, introText, fileId, fileDate]);
  // Re-sort by fileDate (column 6), newest first
  const range = sheet.getRange(2, 1, sheet.getLastRow() - 1, sheet.getLastColumn());
  range.sort([{column: 6, ascending: false}]);
}
The blogIndexer maintains the blogIndex Google Sheet by scanning the relevant Drive folder for documents and creating the appropriate metadata for the blogIndex, including extracting an image from each blog and making a copy of it in the index folder. This process is automated using Google Apps Script's trigger feature, which runs the blogIndexer once a day between 12am and 1am.
Trigger function, set for timed activation.
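I set the trigger up through the Apps Script UI as pictured, but an equivalent time-driven trigger could also be created in code; this sketch assumes the indexer's entry point is the doGet function shown earlier:

// Sketch: create a time-driven trigger that runs the indexer once a day
// in the midnight-to-1am window.
function createNightlyTrigger() {
  ScriptApp.newTrigger('doGet')
    .timeBased()
    .everyDays(1)
    .atHour(0)
    .create();
}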
The formIndexFetch is a relatively simple function compared to the previous ones, as it only requires a single sheet. Because the blogIndex is the only sheet required, this Apps Script is bound directly to the blogIndex sheet, which has the advantage that there is no need to look up sheet IDs.
function doGet() {
  // Get all values from the active (bound) sheet
  const sheet = SpreadsheetApp.getActiveSheet();
  const data = sheet.getDataRange().getValues();
  // The first row holds the column headers, which become the JSON keys
  const keys = data[0];
  const blogs = [];
  // Each subsequent row becomes one blog object, keyed by the headers
  for (let i = 1; i < data.length; i++) {
    const blog = {};
    for (let j = 0; j < data[i].length; j++) {
      blog[keys[j]] = data[i][j];
    }
    blogs.push(blog);
  }
  const returnValue = JSON.stringify(blogs);
  return ContentService.createTextOutput(returnValue).setMimeType(ContentService.MimeType.JSON);
}
The function retrieves all the data from the active sheet using the chained calls getDataRange().getValues(), which return a 2D array of values for the entire sheet.
Following this, keys are generated from the first row of the spreadsheet. While this isn't strictly necessary, as the column positions are static and could be hard-coded, it does make the code easier to follow.
A simple for loop is then used to generate the JSON data, which is formatted to have an identical layout to the previous blogFetcher output. Thus no changes to the front end are required, bar changing the address of the fetch request.
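As a sketch of what that front-end change amounts to, the listing fetch simply points at the deployed formIndexFetch web app (the URL below is a placeholder):

// Sketch of the front-end side: fetch the pre-built index instead of building
// the listing from individual Docs. The deployment URL is a placeholder.
const INDEX_URL = 'https://script.google.com/macros/s/DEPLOYMENT_ID/exec';

async function getBlogList() {
  const response = await fetch(INDEX_URL);
  const blogs = await response.json(); // same shape as the old blogFetcher output
  return blogs;
}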
Indexing is a commonly used method for speeding up processes. In this project an index was created that stored all the information required for the blog landing page. This index is populated by a blogIndexer that runs every night, and is accessed by a simple fetch.
The resulting fetch takes around 1.7 seconds, which gives a page load substantially faster than the previous fetches. The system is also automated thanks to Google Apps Script: dropping a doc into the blog folder will automatically update the index overnight, and I can run the indexer manually from the Apps Script page if I want an instant update.
Individual blog loads are still relatively slow because parsing a Docs article is slow. Doing something similar with the indexer, creating index files for individual blog posts, could speed up the website for two reasons.
First, the individual fetch could potentially be sped up, reducing load times; second, the permanent image links could be used to generate static pages, which has the potential to greatly speed up page loads.
But that’s for a future blog…