
Tried evaluate the model on a local network only machine #57

Open
zwsjink opened this issue Sep 12, 2023 · 4 comments


zwsjink commented Sep 12, 2023

Well, I first ran

python download_evalsets.py $download_dir

to download all the necessary datasets on an internet-accessible machine, and then migrated the data to my machine with limited internet access.
All the other evaluations went fine except the retrieval datasets, which use the hf_cache/ directory instead.

The error goes like this :

>>> datasets.load_dataset("nlphuji/flickr_1k_test_image_text_retrieval", split="test", cache_dir=os.path.join("/mnt/data/datacom2023/evaluate_datasets", "hf_cache"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/anaconda3/envs/datacomp/lib/python3.10/site-packages/datasets/load.py", line 2129, in load_dataset
    builder_instance = load_dataset_builder(
  File "/root/anaconda3/envs/datacomp/lib/python3.10/site-packages/datasets/load.py", line 1815, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/root/anaconda3/envs/datacomp/lib/python3.10/site-packages/datasets/load.py", line 1512, in dataset_module_factory
    raise e1 from None
  File "/root/anaconda3/envs/datacomp/lib/python3.10/site-packages/datasets/load.py", line 1468, in dataset_module_factory
    raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({type(e).__name__})")
ConnectionError: Couldn't reach 'nlphuji/flickr_1k_test_image_text_retrieval' on the Hub (ConnectTimeout)

It seems the huggingface datasets module is still trying to connect to the internet. Is there any trick I can use to skip the connection to Hugging Face? The evaluation command:

python evaluate.py --train_output_dir /mnt/data/datacomp2023/train_output/basic_train --data_dir /mnt/data/datacomp2023/evaluate_datasets

gabrielilharco (Contributor) commented:

@djghosh13

djghosh13 (Contributor) commented:

Hi, thanks for bringing this up! I assumed the HF datasets would work without an internet connection, since the download_evalsets.py script already loads them once to put them in the cache. I'll look into potential solutions to this issue.

djghosh13 (Contributor) commented:

Can you try setting the environment variable HF_DATASETS_OFFLINE to 1? (See https://huggingface.co/docs/datasets/v2.14.5/en/loading#offline.)
It seems that even if the dataset is cached, HF checks the online version by default, so hopefully this fixes things.

If that doesn't work, could you check to make sure the files are indeed in the hf_cache folder?
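A minimal sketch of the suggestion above (HF_DATASETS_OFFLINE is a documented `datasets` environment variable; HF_HUB_OFFLINE is an additional hub-level switch on newer versions that may also help). The variables must be set before `datasets` is imported:

```python
import os

# Force the `datasets` library to use only the local cache and never
# reach out to the Hugging Face Hub. These must be set *before*
# `import datasets` runs, e.g. at the very top of evaluate.py or in the shell.
os.environ["HF_DATASETS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"  # hub-level equivalent on newer versions
```

Equivalently, from the shell: `HF_DATASETS_OFFLINE=1 python evaluate.py ...`.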

zwsjink (Author) commented Oct 11, 2023:

Sorry for the late reply, but I was able to bypass this issue by modifying the datacomp source code as follows:

diff --git a/eval_utils/retr_eval.py b/eval_utils/retr_eval.py
index 3c19917..647edf7 100644
--- a/eval_utils/retr_eval.py
+++ b/eval_utils/retr_eval.py
@@ -37,7 +37,7 @@ def evaluate_retrieval_dataset(
 
     dataset = RetrievalDataset(
         datasets.load_dataset(
-            f"nlphuji/{task.replace('retrieval/', '')}",
+            f"/mnt/data/datacomp2023/evaluate_datasets/{task.replace('retrieval/', '')}.py",
             split="test",
             cache_dir=os.path.join(data_root, "hf_cache")
             if data_root is not None

which forces HF to use my local dataset repository instead of checking for any online updates.
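The path rewrite in the diff above can be sketched as a small helper (hypothetical function and parameter names; the hard-coded /mnt/data/... root from the diff is shown here as a parameter):

```python
import os

def local_builder_path(task: str, local_root: str) -> str:
    """Map a task name like 'retrieval/flickr_1k_test_image_text_retrieval'
    to a local dataset builder script, mirroring the diff above.

    Hypothetical helper: `local_root` stands in for the hard-coded
    machine-specific path, and the result is passed to
    datasets.load_dataset(...) in place of the hub id "nlphuji/...".
    """
    name = task.replace("retrieval/", "")
    return os.path.join(local_root, name + ".py")
```

With a local copy of the builder script in place, `datasets.load_dataset(local_builder_path(task, root), split="test", ...)` never needs to contact the Hub.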
