This repository has been archived by the owner on Oct 7, 2021. It is now read-only.

Installation

Zeyuan Shang edited this page Nov 16, 2015 · 19 revisions

db-webcrawler

Database Application Web Crawler. All commands should be run in the project root directory. Only Linux is supported.

Install Requirements

Software

Please install the following software:

  • virtualbox
  • python
  • vagrant
  • unzip
  • pip

Python Packages

  • Use pip to install the required Python packages:
sudo pip install -r requirements.txt
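
Before running the remaining steps, it can help to confirm that the tools listed under Software are actually on PATH. The following is a minimal sketch, not part of the project: it assumes Python 3.3 or later (for shutil.which), and the executable names, in particular VBoxManage for VirtualBox, are assumptions that may differ on your distribution.

```python
import shutil

# Executable names are assumptions; adjust them to match your distribution.
REQUIRED_TOOLS = ["VBoxManage", "python", "vagrant", "unzip", "pip"]


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The lookup function is passed in as a parameter so the check can be exercised without touching the real PATH.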

Environments

  • Set a temporary pip user path:
export PYTHONUSERBASE="/tmp/pip"
  • Start the VirtualBox virtual machine with Vagrant:
vagrant up

If the host machine is behind a proxy, edit the script bootstrap.sh and set http_proxy to the proxy that should be used.

Setup Database

  • Create a database with the utf8 character set. In particular, these columns need utf8: repository.description, repository.homepage, attempt.log.

  • Rename cmudbal/settings_example.py to cmudbal/settings.py, and set the DATABASE configuration in that file to match your database.

  • Migrate database and load initial data:

python manage.py migrate
python manage.py loaddata library/fixtures/*.json
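
The database settings described above might look roughly like the fragment below. This is only a sketch: it assumes a MySQL backend, and the database name, user, password, host, and port are placeholders rather than values taken from settings_example.py. In Django the setting is named DATABASES, and the charset option asks the MySQL backend for a utf8 connection.

```python
# Hypothetical fragment for cmudbal/settings.py; every value is a placeholder.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",  # assumption: MySQL is the database in use
        "NAME": "db_webcrawler",               # placeholder database name
        "USER": "crawler",                     # placeholder user
        "PASSWORD": "change-me",               # placeholder password
        "HOST": "127.0.0.1",
        "PORT": "3306",
        "OPTIONS": {"charset": "utf8"},        # request a utf8 connection
    }
}
```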

Crawler and Deployer

Repository Crawler

Start the repository crawler (python run_repo_crawler.py)

Repository Deployer

Start the repository deployer (python run_repo_deployer.py)
Try deploying a single repository (python run_repo_deployer.py <repository_name>)
Start the repository deployer for a type (python run_repo_deployer.py )

Deploy a repository using a previous attempt id (python deploy.py <attempt_id>)

Package Crawler

Start the package crawler (python run_package_crawler.py)

Package Deployer

Start the package deployer (python run_package_deployer.py)

Website

python manage.py runserver 0.0.0.0:8001

Warning

The attempts_count field in the repository table can currently only increase. Do not delete entries from the attempt table; otherwise that counter will no longer match the number of attempts actually stored.
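
The drift described in this warning can be reproduced with a small in-memory SQLite sketch. The schema here is hypothetical and heavily simplified: only the table names and attempts_count come from this page, while repo_id and the helper functions are assumptions made for illustration.

```python
import sqlite3

# Hypothetical, simplified schema; the real tables have more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE repository (
        id INTEGER PRIMARY KEY,
        attempts_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE attempt (
        id INTEGER PRIMARY KEY,
        repo_id INTEGER NOT NULL
    );
    INSERT INTO repository (id) VALUES (1);
""")


def record_attempt(conn, repo_id):
    """Insert an attempt row and bump the counter, which only ever increases."""
    conn.execute("INSERT INTO attempt (repo_id) VALUES (?)", (repo_id,))
    conn.execute(
        "UPDATE repository SET attempts_count = attempts_count + 1 WHERE id = ?",
        (repo_id,),
    )


def drifted_repositories(conn):
    """Return (repository id, stored counter, actual row count) for each mismatch."""
    return conn.execute("""
        SELECT r.id, r.attempts_count, COUNT(a.id)
        FROM repository r
        LEFT JOIN attempt a ON a.repo_id = r.id
        GROUP BY r.id, r.attempts_count
        HAVING r.attempts_count != COUNT(a.id)
    """).fetchall()


record_attempt(conn, 1)
record_attempt(conn, 1)
before = drifted_repositories(conn)   # counter and rows agree

conn.execute("DELETE FROM attempt WHERE id = 1")
after = drifted_repositories(conn)    # counter is still 2, one row remains
```

A query along the lines of drifted_repositories could be used to spot repositories whose counter has already gone out of sync.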

Todo

Fill in the placeholders for the block_ports and unblock_ports functions in utils.py for security. Virtual machine ports in use:
port 22: for ssh
port 3000: for running ruby on rails apps
port 8000: for running django apps
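
One possible direction for these placeholders is sketched below. This page does not say how block_ports and unblock_ports are meant to behave, so everything here is an assumption: that iptables is the firewall, that blocking means dropping inbound TCP traffic, and that port 22 should stay open so SSH access is not cut off. The function names are hypothetical, and the sketch only builds command argument lists without executing anything.

```python
# Application ports listed in this page; port 22 is deliberately left out
# (an assumption) so that SSH access to the virtual machine is not cut off.
APP_PORTS = (3000, 8000)


def _rule(action, port):
    """Build one iptables command that adds (-A) or deletes (-D) a DROP rule."""
    return ["iptables", action, "INPUT", "-p", "tcp",
            "--dport", str(port), "-j", "DROP"]


def build_block_commands(ports=APP_PORTS):
    """Commands that would append a DROP rule for each port."""
    return [_rule("-A", port) for port in ports]


def build_unblock_commands(ports=APP_PORTS):
    """Commands that would delete the matching DROP rule for each port."""
    return [_rule("-D", port) for port in ports]
```

Returning argument lists keeps the sketch free of side effects; a real implementation would still need to run the commands with root privileges and handle failures.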

The log system needs improvement

Note:

  1. Add SQL General Log (my.cnf, chown sql.log file)