Creating a Semi-Static Website with Content from AWS S3

For the past three weeks, I have developed and refined a web scraping script that fetches content from a certain blog and archives it, including the images, in my AWS S3 bucket. I’m able to serve this content using a simple Angular website. Using AWS S3 for this kind of content has proved very useful.

Scraping

The scraping script is just a simple Ruby script with a couple of supporting files. I have a JSON file containing the list of URLs to crawl. These URLs are organized the way blog posts are organized, that is, by year and by month. I’m sure my script will break if the URL structure is different.
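As a rough sketch (not the actual script), loading such a URL list and checking the year/month structure could look like the following. The JSON layout, the `blog.example.com` host, and the helper names `archive_period` and `load_crawl_targets` are all assumptions made for illustration.

```ruby
require "json"
require "uri"

# Hypothetical contents of the URL list file; the real layout is not shown here.
URL_LIST_JSON = <<~JSON
  [
    "https://blog.example.com/2018/08/",
    "https://blog.example.com/2018/09/"
  ]
JSON

# Matches paths that start with /YYYY/MM, e.g. /2018/09/ or /2018/09/some-post/.
ARCHIVE_PATH = %r{\A/(\d{4})/(\d{2})(?:/|\z)}

# Returns { year:, month: } for a URL, or raises if the URL does not
# follow the assumed year/month layout.
def archive_period(url)
  match = ARCHIVE_PATH.match(URI.parse(url).path)
  raise ArgumentError, "unexpected URL structure: #{url}" unless match

  { year: match[1].to_i, month: match[2].to_i }
end

# Parses the JSON list and attaches the year and month to each URL.
def load_crawl_targets(json_text)
  JSON.parse(json_text).map { |url| { url: url }.merge(archive_period(url)) }
end

load_crawl_targets(URL_LIST_JSON).each do |target|
  puts format("%04d-%02d %s", target[:year], target[:month], target[:url])
end
```

Failing loudly on an unexpected path makes the "script will break" case explicit instead of silently archiving pages under the wrong period.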

For each page, my script fetches the target content via a CSS selector, since I don’t want the sidebar or the comments. Images are copied, and all image references are rewritten to point to my copy. When a thumbnail links to its original image, the script also fetches the image behind the parent link, so I’m able to serve the full-size image from my copy too.
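The rewriting decision for a single image can be sketched as a pure function. This is an illustration under assumptions, not the actual script: the `images` prefix, the extension list, and the names `local_image_path`, `image_url?`, and `plan_image_rewrite` are hypothetical, and in a real run the `src` and `parent_href` values would come from the image nodes found on the fetched page.

```ruby
require "uri"

# Assumed set of extensions that mark a link target as an image.
IMAGE_EXTENSIONS = %w[.jpg .jpeg .png .gif .webp].freeze

# Local path for a remote image. Keeping the host and the original path
# avoids name collisions between images from different folders.
def local_image_path(image_url, prefix: "images")
  uri = URI.parse(image_url)
  File.join(prefix, uri.host, uri.path)
end

# True when the URL path ends in one of the assumed image extensions.
def image_url?(url)
  IMAGE_EXTENSIONS.include?(File.extname(URI.parse(url).path).downcase)
end

# Given a thumbnail's src and the href of its enclosing link (or nil),
# returns the remote URLs to download and the rewritten src and href.
def plan_image_rewrite(src, parent_href)
  plan = { download: [src], src: local_image_path(src), href: parent_href }
  if parent_href && image_url?(parent_href)
    # The thumbnail links to its full-size original: archive that too.
    plan[:download] << parent_href
    plan[:href] = local_image_path(parent_href)
  end
  plan
end
```

A link that points to another page instead of an image is left untouched, so only thumbnail-to-original links are redirected to the archived copy.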

All these scraped pages are then indexed in a JSON file that acts like a table of contents. This JSON file, the HTML pages, and the images are then uploaded to a specific AWS S3 bucket, which is publicly readable and ready to serve anytime.
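One possible shape for that table of contents is sketched below. The field names (`period`, `posts`, `title`, `path`), the `Page` struct, and the grouping by year and month are assumptions for illustration; the real file may be organized differently.

```ruby
require "json"

# Hypothetical record for one archived post.
Page = Struct.new(:title, :year, :month, :path, keyword_init: true)

# Groups pages by "YYYY-MM" in chronological order and returns plain
# hashes and arrays that can be serialized straight to JSON.
def build_table_of_contents(pages)
  pages
    .sort_by { |page| [page.year, page.month, page.path] }
    .group_by { |page| format("%04d-%02d", page.year, page.month) }
    .map do |period, group|
      {
        "period" => period,
        "posts" => group.map { |page| { "title" => page.title, "path" => page.path } }
      }
    end
end
```

Calling `JSON.pretty_generate(build_table_of_contents(pages))` then gives the text to write to the JSON file before uploading. Storing the relative `path` of each HTML page keeps the index independent of the bucket name.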

Tools and libraries used:

  • Mechanize and HTTParty Ruby gems
  • s3cmd tool for uploading to S3
  • A couple of Ruby scripts to complete the package

Updating the content is done simply by re-running the script and re-uploading the content and assets.

Serving Content

For the website that serves the content, I thought about using an AWS S3 bucket too, but since I want to use Angular for this, I decided to just create a simple Angular Universal website that serves content from AWS S3. The Angular app is very simple.

  • Have a route/page to show the table of contents
  • Table of contents is taken straight from the JSON file in my S3 bucket
  • Have a route/page to show the article content
  • Since the URL is organized like a blog, it is easy to match the URL with the URL in my S3 bucket
  • Just fetch the HTML file then load it into the content page
  • Caching can be a problem though

With this setup, if I’m going to add a new blog source, I just repeat the process and probably add 2 more routes.

I can’t share the website, as it is for personal use only.

Posted in Angular, Ruby

My Nahaz Stats Don’t Lie as a Trader – Sixth Month

So, after a paradigm shift in my trading strategy (from no system to a system with a very limited set of setups, from BOSO to BOBO), I’m very eager to track my progress. However, I haven’t really tracked my …

Posted in Investments, Stock Trading

Stock Trading for almost 6 months, what has changed?

So, it has been almost 6 months since I started stock trading. I have witnessed several disasters in the stock market like the bear market, the ghost month, the inflation rate, the US-China trade wars, the US market’s Dow Jones …

Posted in Investments, Stock Trading

Introducing Price Stalker – PSE stock price alerting

So I have this pet project that allows you to track stock price movements on the PSE. Price Stalker’s primary focus is to deliver notifications/alerts about stock price movement of your favorite stocks at the Philippine Stock Exchange through email or …

Posted in Investments

Finally found a task scheduler/cron for Rails

I have worked with a cron-like task scheduler in the past, and among the very few options, we chose crono. It works because it allows us to run tasks at 10-second intervals as well as the regular daily/hourly/weekly …

Posted in Cron, Rails, Ruby, Web Development

Docker Compose down does not remove volumes

I was trying to fix a deployment issue on one of my Docker setups, where source code is pulled periodically from the master branch into a volume. However, I ended up with the same outdated branch over and over again. …

Posted in Web Development

Angular 6 – Cannot resolve crypto, fs, net, path, stream when building Angular

My current setup is Angular 6.1.0 with Angular Universal, but the project is based on an old (dead) project that was built on Angular 5. Upon building the server-side assets, the builder throws a lot of errors saying …

Posted in Angular

Angular 6.x – 404 page with correct header using Angular Universal

Based on my previous post about adding a 404 page in Angular 4.x, I have added a tweak to return the correct 404 status code header. This process, however, requires the Angular Universal integration. This post assumes that you already …

Posted in Angular, Web Development

Angular 6 – Add scroll to top when route changes

Since Angular apps are SPAs, the page does not reload when navigating through the application/website. If you happen to scroll to the bottom of the page and click a link, the next page will show the content near the bottom …

Posted in Angular, Web Development

Setting Title Tag for Angular Applications

Due to the nature of Angular apps being SPAs (single-page applications), historically, changing the title tag or meta tags was not supported by default. However, for SEO reasons, these features were added and work best when used in Angular …

Posted in Angular, SEO, Web Development