Our client was looking for a solution that would let them collect large amounts of publicly available data about the energy sector in the US. That data then had to be run through an R model to produce a predictive analysis, and the resulting output fed into an API.
There were two major components to this project: scraping the data with NodeJS scrapers, and building the R model on top of that data to create the predictive analysis.
Crafting the Ideal Solution
The backend involved a full queuing system based on the Kue library. It was written in ES2015 NodeJS and used MongoDB to persist the data. The scraper was configured differently for each source of information: some sources required scraping only once an hour, while others required scraping every five minutes.
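A minimal sketch of how such per-source scheduling can be expressed (the source names and intervals here are illustrative, not the client's actual configuration):

```javascript
// Per-source scrape intervals (illustrative values only).
const sources = [
  { name: 'hourly-energy-feed', intervalMs: 60 * 60 * 1000 }, // once an hour
  { name: 'live-grid-data',     intervalMs: 5 * 60 * 1000 },  // every 5 minutes
];

// Decide whether a given source is due for another scrape run.
function isDue(source, lastRunMs, nowMs) {
  return nowMs - lastRunMs >= source.intervalMs;
}

// In a Kue-based setup, each due source would enqueue a job for a worker
// to process, roughly along these lines:
//   const queue = require('kue').createQueue();
//   queue.create('scrape', { source: source.name }).save();
//   queue.process('scrape', (job, done) => { /* run scraper, then done() */ });
```

Keeping the interval alongside each source definition means adding a new data source is a one-line configuration change rather than a code change to the scheduler itself.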