The press is bad

I began making a small utility that should meet the following requirements:

  • periodically, maybe twice a day, it should visit a couple of the most popular Serbian news sites and just take a snapshot of the front page, and maybe of the politics section
  • the generated images (in PNG format) should be uploaded to Cloudinary and thus receive a unique URL
  • a simple MongoDB collection should be used to gather the basic data: the name of the site, the time the screenshot was taken and the URL of the Cloudinary image
  • a very bare-bones Express app should query the MongoDB collection and retrieve the links, served from a free Heroku dyno
  • everything should be covered by tests (Mocha, Chai etc.)
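
To show how these requirements fit together, here is a minimal sketch of a single run. It is an illustration under assumptions, not the actual TraShot code: the `TARGETS` list, the `runOnce` function and the field names `site`, `takenAt` and `imageUrl` are all hypothetical, and the three steps (screenshot, upload, save) are passed in as functions so the flow can be tested without a browser, Cloudinary or MongoDB.

```javascript
// Hypothetical hardcoded target list; the URLs are placeholders, not real news sites.
const TARGETS = [
  { site: 'Example News', url: 'https://example.com/' },
  { site: 'Example News (politics)', url: 'https://example.com/politics' },
];

// One pass over all targets: screenshot, upload, store.
// takeScreenshot(url) -> Promise<Buffer>
// uploadImage(buffer) -> Promise<string> (the image URL)
// saveRecord(record)  -> Promise<void>
async function runOnce({ targets, takeScreenshot, uploadImage, saveRecord, now = () => new Date() }) {
  const records = [];
  for (const { site, url } of targets) {
    const png = await takeScreenshot(url);
    const imageUrl = await uploadImage(png);
    // The three pieces of data the collection is supposed to hold.
    const record = { site, takenAt: now(), imageUrl };
    await saveRecord(record);
    records.push(record);
  }
  return records;
}
```

Passing the steps in as functions also makes the Mocha/Chai requirement easier to meet, since each step can be replaced by a stub in tests.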

But, why?

It is quite simple, actually. I want to build a simple visual collection representing the horror of the Serbian daily newspapers and tabloids and, at the same time, get my feet wet with the Node ecosystem.

I started with a horrid old-school setup where I would actually save the screenshot to a PNG file on the server, then upload it and finally delete the image manually. After a bit of searching the Cloudinary docs and googling, I found this interesting setup. Ire Aderinokun basically keeps the screenshot in a buffer to avoid writing and deleting image files, so I modified my initial approach.
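
The buffer approach can be sketched roughly as follows. This is my own minimal reconstruction, not the code from the linked post: the helper names `uploadBuffer` and `shootAndUpload` and the `trashot` folder are hypothetical. It assumes `page` is a Puppeteer page and `uploader` is `require('cloudinary').v2.uploader`, both passed in as arguments so the functions can be exercised with stubs.

```javascript
// Wrap Cloudinary's callback-style upload_stream in a Promise.
// `uploader` is assumed to be require('cloudinary').v2.uploader.
function uploadBuffer(uploader, buffer, options = {}) {
  return new Promise((resolve, reject) => {
    const stream = uploader.upload_stream(options, (error, result) => {
      if (error) reject(error);
      else resolve(result);
    });
    // Send the in-memory PNG straight to the upload stream; no temp file involved.
    stream.end(buffer);
  });
}

// Take a screenshot of `url` and return the URL of the uploaded image.
async function shootAndUpload(page, uploader, url) {
  await page.goto(url);
  // Without a `path` option, screenshot() only returns the image data
  // and writes nothing to disk.
  const png = await page.screenshot({ type: 'png', fullPage: true });
  const result = await uploadBuffer(uploader, png, { folder: 'trashot' }); // hypothetical folder name
  return result.secure_url;
}
```

Since nothing touches the filesystem, there is no cleanup step left to forget.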

I am still getting used to this async/await world of promises, rejections and sad puns. JS is a different beast compared to Python and, hard as it seems, I really feel like it is broadening my horizons. Hooking the system up to a MongoDB instance was pretty easy with Mongoose, although I have to check why my script (it’s still just a standalone script, no server!) hangs when it finishes. Anyway, the screenshot URLs are being stacked in Mongo, and building a simple Express app around them shouldn’t be difficult. I plan to deploy it on Heroku, make a small CSS wrapper with Bulma and… that would be it for the time being. The list of pages/sites will remain hardcoded, although it would be trivial to add a list of links to the Mongo instance.
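
A common reason for a standalone Node script to hang at the end is a database connection that is still open, since Node keeps running while sockets are alive. Whether that is the cause here is an assumption, not something verified against the actual script. A minimal sketch that always closes the connection, with a hypothetical `Shot` model, hypothetical field names and `mongoose` passed in as an argument:

```javascript
// Store a batch of screenshot records, then close the connection so the
// process can exit. `mongoose` is assumed to be require('mongoose').
async function storeShots(mongoose, mongoUri, shots) {
  // Reuse the model if it was already registered, otherwise define it.
  const Shot = mongoose.models.Shot || mongoose.model('Shot', new mongoose.Schema({
    site: String,     // name of the news site
    takenAt: Date,    // when the screenshot was taken
    imageUrl: String, // Cloudinary URL of the PNG
  }));

  await mongoose.connect(mongoUri);
  try {
    await Shot.insertMany(shots);
  } finally {
    // Runs on success and on failure; without it the open connection
    // keeps the script alive.
    await mongoose.disconnect();
  }
}
```

If the script also launches a Puppeteer browser, that needs closing too (`await browser.close()`), for the same reason.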

Btw, I’m calling it TraShot - because, y’know, it takes shots at trash… Like, real trash. Example image

Part two

OK, the timeout: Puppeteer’s waiting time has to be longer, as per this site.
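
Puppeteer’s navigation calls give up after 30 seconds by default, which heavy front pages can exceed. A minimal sketch of raising that limit, assuming `page` is a Puppeteer page; the helper name `openWithLongerTimeout` and the 90-second value are arbitrary choices for illustration, not figures from the linked site:

```javascript
// Navigate with a more generous timeout than Puppeteer's 30 s default.
async function openWithLongerTimeout(page, url, timeoutMs = 90000) {
  // Applies to later navigations on this page as well.
  page.setDefaultNavigationTimeout(timeoutMs);
  // 'networkidle2' waits until there are at most two open network
  // connections for a short while, which suits ad-heavy pages better
  // than waiting for complete silence.
  await page.goto(url, { waitUntil: 'networkidle2', timeout: timeoutMs });
}
```

Passing `timeout: 0` would disable the limit entirely, at the price of a script that can wait forever on a broken page.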