Fix Spark params and assets
brnaguiar committed Sep 19, 2023
1 parent 6026d26 commit cd925a9
Showing 3 changed files with 26 additions and 10 deletions.
2 changes: 2 additions & 0 deletions Makefile
@@ -26,6 +26,7 @@ dependencies: test_environment
$(PYTHON_INTERPRETER) -m pip install -r requirements.minimal
sudo curl -o ./assets/hadoop-aws-3.3.4.jar https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.4/hadoop-aws-3.3.4.jar
sudo curl -o ./assets/aws-java-sdk-bundle-1.12.506.jar https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.506/aws-java-sdk-bundle-1.12.506.jar
+ sudo curl -o ./assets/aws-java-sdk-core-1.12.506.jar https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-core/1.12.506/aws-java-sdk-core-1.12.506.jar
sudo curl -o ./assets/hadoop-common-3.3.4.jar https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/3.3.4/hadoop-common-3.3.4.jar
sudo curl -o ./assets/postgresql-42.6.0.jar https://jdbc.postgresql.org/download/postgresql-42.6.0.jar
#cp ./assets/hadoop-aws-3.3.4.jar ~/$(CONDA_FOLDER_NAME)/envs/next-watch/lib/python3.10/site-packages/pyspark/jars/
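
The jars fetched into `./assets` above are the classpath pieces Spark needs to reach S3-compatible storage (hadoop-aws plus the AWS SDK, with aws-java-sdk-core newly added by this commit) and Postgres (the JDBC driver). A minimal sketch of how they might be handed to a session — the MinIO endpoint and credentials here are illustrative placeholders, not values taken from this repo:

```python
from pyspark.sql import SparkSession

# Sketch only: the jar list mirrors the Makefile downloads above; the
# MinIO endpoint and credentials are illustrative placeholders.
jars = ",".join(
    [
        "assets/hadoop-aws-3.3.4.jar",
        "assets/aws-java-sdk-bundle-1.12.506.jar",
        "assets/aws-java-sdk-core-1.12.506.jar",
        "assets/hadoop-common-3.3.4.jar",
        "assets/postgresql-42.6.0.jar",
    ]
)

spark = (
    SparkSession.builder.appName("next-watch")
    .config("spark.jars", jars)
    # Point Hadoop's S3A filesystem at MinIO instead of AWS S3.
    .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:9000")  # assumed MinIO API port
    .config("spark.hadoop.fs.s3a.access.key", "placeholder-access-key")
    .config("spark.hadoop.fs.s3a.secret.key", "placeholder-secret-key")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

ratings = spark.read.parquet("s3a://datasets/ratings/")  # hypothetical bucket/path
```

Pinning aws-java-sdk-core to the same 1.12.506 as the bundle is presumably deliberate: mixing AWS SDK versions on one classpath is a common source of `NoSuchMethodError` with hadoop-aws 3.3.x.
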
@@ -56,6 +57,7 @@ run:

## Populate Database with Users from production datasets
users:
+ docker compose exec dev-spark bash -c "python3.9 src/main.py -p 'de'"
docker compose exec dev-spark bash -c "python3.9 src/scripts/populate_db_with_users.py"

## Run DE pipelines
24 changes: 19 additions & 5 deletions README.md
@@ -28,31 +28,45 @@ git clone https://github.com/brnaguiar/mlops-next-watch.git
make env
```

- 3. Install requirements / dependencies and assets
+ 3. Activate conda env
+ ```sh
+ source activate nwenv
+ ```

+ 4. Install requirements / dependencies and assets
```sh
make dependencies
```

- 4. Pull the datasets
+ 5. Pull the datasets
```sh
make datasets
```

- 5. Configure containers and secrets
+ 6. Configure containers and secrets
```sh
make init
```

- 6. Run Docker Compose
+ 7. Run Docker Compose
```sh
make run
```

- 7. Populate production Database with users
+ 8. Populate production Database with users
```sh
make users
```

+ ## Useful Service Endpoints
+ ```
+ - Jupyter `http://localhost:8888`
+ - Minio `http://localhost:9001`
+ - MLFlow `http://localhost:5000`
+ - FastAPI `http://localhost:8086/`
+ - Streamlit UI `http://localhost:8501`
+ - Grafana Dashboard `http://localhost:3000`
+ ```
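
After `make run` and `make users`, a throwaway probe of the ports listed above can confirm the stack is up. A convenience sketch, not part of the repo — it assumes only the URLs in this README:

```python
import requests  # pip install requests

SERVICES = {
    "Jupyter": "http://localhost:8888",
    "Minio": "http://localhost:9001",
    "MLFlow": "http://localhost:5000",
    "FastAPI": "http://localhost:8086/",
    "Streamlit UI": "http://localhost:8501",
    "Grafana Dashboard": "http://localhost:3000",
}

for name, url in SERVICES.items():
    try:
        # Any HTTP response (even 401/404) means the service is listening.
        code = requests.get(url, timeout=3).status_code
        print(f"{name:<20} {url:<30} HTTP {code}")
    except requests.RequestException:
        print(f"{name:<20} {url:<30} unreachable")
```
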
## Architecture
<img src="./images/project_diagram.jpg">
<!-- #4. Create a `.env` file (`.env` sample below)#
10 changes: 5 additions & 5 deletions src/conf/params.py
@@ -83,11 +83,11 @@ class Airflow:
postgres_conn_config: dict = {
"conn_id": "postgres_connection",
"conn_type": "postgres",
"host": f"{os.getenv('postgres_ip')}",
"login": f"{os.getenv('pguser')}",
"password": f"{os.getenv('pgpassword')}",
"port": f"{os.getenv('postgres_port')}",
"schema": f"{os.getenv('postgres_app_database')}",
"host": f"{os.getenv('POSTGRES_IP')}",
"login": f"{os.getenv('PGUSER')}",
"password": f"{os.getenv('PGPASSWORD')}",
"port": f"{os.getenv('POSTGRES_PORT')}",
"schema": f"{os.getenv('POSTGRES_APP_DATABASE')}",
}
aws_conn_config: dict = {
"conn_id": "aws_connection",
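
The src/conf/params.py hunk is evidently the "params" half of the commit title: `os.getenv` is case-sensitive, so the old lowercase lookups returned `None` whenever the containers exported the uppercase names (`PGUSER`, `POSTGRES_IP`, ...), and the `f"{...}"` wrappers silently coerced that `None` into the literal string "None" in the Airflow connection config. A sketch of one way to make such lookups fail fast instead — `require_env` is a hypothetical helper, not code from this repo:

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable or fail loudly.

    Guards against the class of bug fixed here: os.getenv("pguser")
    quietly yields None when only PGUSER is set, and an f-string then
    turns that None into the string "None".
    """
    value = os.getenv(name)
    if value is None:
        raise KeyError(f"required environment variable {name} is not set")
    return value


postgres_conn_config: dict = {
    "conn_id": "postgres_connection",
    "conn_type": "postgres",
    "host": require_env("POSTGRES_IP"),
    "login": require_env("PGUSER"),
    "password": require_env("PGPASSWORD"),
    "port": require_env("POSTGRES_PORT"),
    "schema": require_env("POSTGRES_APP_DATABASE"),
}
```
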
