The Cht Mod Program Schedule is a sample application that shows how to run a web crawler periodically. The project uses the Express framework and Bootstrap to build a simple app that is deployed to AWS Elastic Beanstalk. It is derived from an AWS Sample project. I recommend following Getting Started with Node.js on Elastic Beanstalk to set up the system.
- express
- cheerio
- node-cron
Execute this command to install the project's dependencies:
npm install
Execute this command to run the project:
node app
Live Demo on AWS Elastic Beanstalk: Sorry, the live demo is no longer available :( (updated on 2018/12/02)
We use the request module to make HTTP calls.
const request = require('request');

// url is the address of the page to crawl
request(url, (err, res, body) => {
  // process the response body here
});
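Before the body is handed to the parser, the callback usually needs to rule out failed requests. A minimal sketch, assuming the callback receives the (err, res, body) arguments shown above and that res.statusCode carries the HTTP status; checkResponse is a hypothetical helper name, not part of the request module:

```javascript
// Hypothetical helper: returns the HTML body, or throws if the request failed.
function checkResponse(err, res, body) {
  if (err) {
    throw err; // network-level failure (DNS, timeout, refused connection)
  }
  if (!res || res.statusCode !== 200) {
    const status = res ? res.statusCode : 'unknown';
    throw new Error('Unexpected HTTP status: ' + status);
  }
  if (typeof body !== 'string' || body.length === 0) {
    throw new Error('Empty response body');
  }
  return body;
}
```

Inside the request callback, the result of checkResponse(err, res, body) could be wrapped in a try/catch so that one failed crawl does not stop the app.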
Load the crawled response body into cheerio:
const cheerio = require('cheerio');

const $ = cheerio.load(body);
Finally, we analyze and break down the DOM to fetch the data. The program information is wrapped in an element with class wrapper, so the first level to select is class rowat. We also need to fetch class rowat_gray, which holds the rows highlighted in grey.
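// Accumulates one array of text lines per program row
const tvshows = [];

$('.wrapper .rowat, .rowat_gray').each(function (i, elem) {
  // Split the row's text into its individual lines
  tvshows.push($(this).text().split('\n'));
});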
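Splitting the row text on newlines typically leaves blank entries and padding whitespace that come from the page's indentation. A sketch of one way to tidy each row before storing it, assuming only that the text comes from $(this).text(); cleanRow is a hypothetical helper and the sample string is invented for illustration:

```javascript
// Hypothetical helper: turn a row's raw text into a list of non-empty, trimmed lines.
function cleanRow(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Invented sample resembling the text of one scraped row
const sample = '\n    20:00\n    Example Movie\n  ';
console.log(cleanRow(sample)); // [ '20:00', 'Example Movie' ]
```

Inside the .each callback, this could stand in for the bare split, for example tvshows.push(cleanRow($(this).text())).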
Update the latest program schedule every hour:
const cron = require('node-cron');

cron.schedule('0 0 */1 * * *', function () {
  // Runs at second 0, minute 0 of every hour
  // (*/1 in the hour field means every hour from 0 to 23)
});
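The schedule string uses the six-field form that node-cron accepts, in which an optional seconds field comes first. A small sketch that labels each field of such an expression; describeCron is a hypothetical helper for illustration only and does not call node-cron or validate the field values:

```javascript
const FIELD_NAMES = ['second', 'minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek'];

// Hypothetical helper: map each field of a cron expression to its name.
function describeCron(expression) {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5 && fields.length !== 6) {
    throw new Error('Expected 5 or 6 fields, got ' + fields.length);
  }
  // Five-field expressions omit the leading seconds field
  const names = fields.length === 6 ? FIELD_NAMES : FIELD_NAMES.slice(1);
  const result = {};
  fields.forEach((value, index) => {
    result[names[index]] = value;
  });
  return result;
}

const parts = describeCron('0 0 */1 * * *');
console.log(parts.second, parts.minute, parts.hour);
```

Read this way, the expression fires when the second and minute are both 0, for every value of the hour field.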
I plan to turn this project into a Line bot that provides a program query service. Users can enter keywords to look up the most recent programs that contain those keywords.
- Line bot
- Carousel
- Buttons
- TODO: add entertainment and news
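
The keyword query planned above could be sketched as a simple filter over the crawled rows. This assumes tvshows keeps the shape built by the crawler (an array of rows, each an array of text lines); searchPrograms and the sample data are hypothetical, and the Line bot messaging itself is not shown:

```javascript
// Hypothetical helper: return the rows in which any line contains the keyword.
function searchPrograms(tvshows, keyword) {
  const needle = keyword.trim().toLowerCase();
  if (needle === '') {
    return []; // an empty query matches nothing rather than everything
  }
  return tvshows.filter((row) =>
    row.some((line) => line.toLowerCase().includes(needle))
  );
}

// Invented sample data in the same shape as tvshows
const sampleShows = [
  ['20:00', 'Example Movie'],
  ['22:00', 'Another Film'],
];
console.log(searchPrograms(sampleShows, 'movie')); // [ [ '20:00', 'Example Movie' ] ]
```

Matching is case-insensitive here; whether that suits the real program titles is a design choice left open.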