Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor fixes to xgboost example #2832

Merged
merged 3 commits into from
Aug 23, 2024
Merged

Conversation

hwpang
Copy link
Contributor

@hwpang hwpang commented Aug 23, 2024

Description

Two minor fixes:

  • Change the DATASET_ROOT environment variable in the notebook to $HOME/dataset. This is to be consistent with the path used in other scripts in this example, such as NVFlare/examples/advanced/xgboost/prepare_data.sh and NVFlare/examples/advanced/xgboost/histogram-based/run_experiment_centralized.sh.
  • The training_mode argument should be training_algo:
    parser.add_argument(
        "--training_algo", type=str, default="bagging", choices=list(ALGO_DIR_MAP.keys()), help="Training algorithm"
    )

A few sentences describing the changes proposed in this pull request.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

@chesterxgchen
Copy link
Collaborator

@hwpang question: why prefer to use HOME directory to /data directory ? HOME directory is usually in the source code directory which something mixing the data and code together. /data at least refer the data in a different location, more close to the real case.

@hwpang
Copy link
Contributor Author

hwpang commented Aug 23, 2024

@hwpang question: why prefer to use HOME directory to /data directory ? HOME directory is usually in the source code directory which something mixing the data and code together. /data at least refer the data in a different location, more close to the real case.

Thanks for the question. I was trying to make the path to the dataset consistent throughout this example. In the other scripts used in this example, the $HOME/dataset is assumed to be the directory of the dataset. For example, in NVFlare/examples/advanced/xgboost/prepare_data.sh, it looks for the dataset in $HOME/dataset:

DATASET_PATH="$HOME/dataset/HIGGS.csv"
OUTPUT_PATH="/tmp/nvflare/xgboost_higgs_dataset"
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

if [ ! -f "${DATASET_PATH}" ]
then
    echo "Please check if you saved HIGGS dataset in ${DATASET_PATH}"
fi

I don't have a strong preference of $HOME/dataset vs. /data, as long as the path is consistent with other scripts used in this example.

@ZiyueXu77
Copy link
Collaborator

@hwpang question: why prefer to use HOME directory to /data directory ? HOME directory is usually in the source code directory which something mixing the data and code together. /data at least refer the data in a different location, more close to the real case.

Thanks for the question. I was trying to make the path to the dataset consistent throughout this example. In the other scripts used in this example, the $HOME/dataset is assumed to be the directory of the dataset. For example, in NVFlare/examples/advanced/xgboost/prepare_data.sh, it looks for the dataset in $HOME/dataset:

DATASET_PATH="$HOME/dataset/HIGGS.csv"
OUTPUT_PATH="/tmp/nvflare/xgboost_higgs_dataset"
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

if [ ! -f "${DATASET_PATH}" ]
then
    echo "Please check if you saved HIGGS dataset in ${DATASET_PATH}"
fi

I don't have a strong preference of $HOME/dataset vs. /data, as long as the path is consistent with other scripts used in this example.

Thanks for noticing this! @chesterxgchen at the time we wanted to avoid creating new folders within examples, and avoid putting HIGGS to "/tmp" since it is big, so arbitrarily chose '$HOME'

@YuanTingHsieh
Copy link
Collaborator

I think consistency throughout notebook and the example folder is good, so this PR LGTM

@ZiyueXu77
Copy link
Collaborator

/build

Copy link
Collaborator

@ZiyueXu77 ZiyueXu77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ZiyueXu77 ZiyueXu77 enabled auto-merge (squash) August 23, 2024 17:21
@ZiyueXu77 ZiyueXu77 merged commit 141c8bd into NVIDIA:main Aug 23, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants