KeyError when training streetsurf on seg100613 #46

Open
amoghskanda opened this issue Mar 13, 2024 · 22 comments

@amoghskanda

Firstly, great work and thanks for making it open-source. I set up everything following the README for both streetsurf and nr3d. I wanted to use the withmask_nolidar.240219.yaml config file, and made the path and sequence changes to use seg100613 (quick-download from the StreetSurf repo). The scenario.pt file is incomplete: waymo_dataset.py accesses frame_timestamps (line 406), which is not a valid key in the scenario dictionary. There's another KeyError at line 506 of waymo_dataset.py: no global_timestamps key in the scenario['observers']['ego_car']['data'] dictionary. Can you share the complete scenario.pt file, or the zip file for the segment-13476374534576730229_240_000_260_000_with_camera_labels sequence?
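A quick way to check which keys the generated scenario.pt actually contains is to load it directly. A minimal sketch; the path is a placeholder, and the nested layout is assumed from the error messages above:

    import torch

    # Load the preprocessed scenario dict on the CPU (path is an example placeholder)
    scenario = torch.load("data/waymo/seg100613/scenario.pt", map_location="cpu")

    # 'frame_timestamps' is expected under scenario['metas'] by waymo_dataset.py
    print(scenario.keys())
    print(scenario["metas"].keys())

    # 'global_timestamps' is expected under the ego_car observer's data
    print(scenario["observers"]["ego_car"]["data"].keys())

If the keys are missing here, the generated scenario.pt itself is out of date (see the regeneration fix below).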

@zzzxxxttt

zzzxxxttt commented Mar 16, 2024

I encountered the same issue; it was solved after checking out the latest commit (faba099) and re-generating the data.

@zzzxxxttt

zzzxxxttt commented Mar 16, 2024

By the way, if anyone encounters TypeError: __init__() takes 1 positional argument but 2 were given, just replace @torch.no_grad with with torch.no_grad(): in nr3d_lib/models/fields/nerf/lotd_nerf.py (the bare @torch.no_grad decorator passes the method into no_grad's constructor on some older PyTorch versions, hence the TypeError):

    # @torch.no_grad
    def query_density(self, x: torch.Tensor):
        with torch.no_grad():
            # NOTE: x must be in range [-1,1]
            ...

@amoghskanda
Author

@zzzxxxttt thank you for the reply. The KeyError persists: the problem is with the scenario.pt file, as scenario['metas'] has no key named 'frame_timestamps'. Can you upload your scenario.pt file? This is for seg100613.

@zzzxxxttt

zzzxxxttt commented Mar 20, 2024

@amoghskanda sure, here it is
scenario.zip

[screenshot]

@amoghskanda
Author

Thank you for the scenario.pt file.
@zzzxxxttt did you face the below error?

TypeError: __init__() got an unexpected keyword argument 'fn_type'
At line 183 of train.py, MonoDepthLoss is constructed with parameters that are missing from the class's __init__, defined in app/loss/mono.py (class MonoDepthLoss).
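A workaround sketch, assuming the config simply forwards keyword arguments (such as fn_type) that MonoDepthLoss does not declare: accept them in __init__ or swallow them with **kwargs. The signature below is illustrative only, not the actual one from app/loss/mono.py:

    class MonoDepthLoss:
        # Illustrative signature: accepting 'fn_type' (and absorbing any other
        # config-driven kwargs) avoids the "unexpected keyword argument" TypeError
        # raised from the construction at train.py line 183.
        def __init__(self, w=1.0, fn_type=None, **kwargs):
            self.w = w
            self.fn_type = fn_type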

@amoghskanda
Author

amoghskanda commented Mar 21, 2024

I made some changes to mono.py, used MonoSDFDepthLoss instead, and that somewhat fixed it. Now I'm getting
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu). This happens because the cache is loaded on the CPU and everything else on the GPU (cuda:0). Is there a fix for this? I preloaded the cache onto the GPU (RTX 3090), but then it runs out of memory. After reducing n_frames in withmask_nolidar.240219.yaml for segment-100613 from 163 to 30, I was able to load the camera cache onto the GPU, but then I run into RuntimeError: The size of tensor a (65536) must match the size of tensor b (256) at non-singleton dimension 1. What was the batch size when you trained?
@ventusff @zzzxxxttt
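For the first RuntimeError, a minimal self-contained sketch of the usual fix (requires a CUDA device; the shapes and names are illustrative, not the actual dataloader variables): index the CPU cache with CPU indices, and only move the gathered slice to the GPU.

    import torch

    cache = torch.zeros(30, 64, 64, 3)                   # CPU-resident image cache (illustrative shape)
    frame_ind = torch.tensor([0, 1, 2], device="cuda")   # indices created on the GPU

    # cache[frame_ind] would raise:
    #   RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
    # Fix: move the indices to the cache's device, then move only the result to the GPU.
    pixels = cache[frame_ind.to(cache.device)].to("cuda")
    print(pixels.shape, pixels.device)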

@zzzxxxttt

> Thank you for the scenario.pt file. @zzzxxxttt did you face the below error?
>
> init() got an unexpected keyword argument 'fn_type' Line 183, train.py, MonoDepthLoss takes different parameters which are missing in the init of the class, defined in app/loss/mono.py class MonoDepthLoss

No, I didn't meet this error.

@zzzxxxttt

> I made some changes to mono.py and used MonoSDFDepthLoss and somewhat fixed it. I'm getting a RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu). This is because the cache is loaded on the cpu and everything else on gpu(cuda:0). Is there a fix to this? I preloaded cache onto gpu(RTX3090) but then it runs out of memory. I reduced n_frames in withmask_nolidar.240219.yaml for segment-100613 from 163 to 30, able to load cache camera onto gpu, I run into RuntimeError: The size of tensor a (65536) must match the size of tensor b (256) at non-singleton dimension 1. What was the batchsize when you trained? @ventusff @zzzxxxttt

I also use withmask_nolidar.240219.yaml and only modified the data location; I can train it on my 12 GB RTX 3060 without error.

@amoghskanda
Author

amoghskanda commented Mar 21, 2024

So your data is loaded into the cache, right? And you did not make any changes to which device the data and model are loaded onto? I have an RTX 3090 and the data is loaded onto the CPU, and I run into
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
preload_on_gpu is false in the withmask yaml (by default), and I did not make any changes to the device placement.

@sonnefred

> I made some changes to mono.py and used MonoSDFDepthLoss and somewhat fixed it. I'm getting a RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu). This is because the cache is loaded on the cpu and everything else on gpu(cuda:0). Is there a fix to this? I preloaded cache onto gpu(RTX3090) but then it runs out of memory. I reduced n_frames in withmask_nolidar.240219.yaml for segment-100613 from 163 to 30, able to load cache camera onto gpu, I run into RuntimeError: The size of tensor a (65536) must match the size of tensor b (256) at non-singleton dimension 1. What was the batchsize when you trained? @ventusff @zzzxxxttt
>
> I also use withmask_nolidar.240219.yaml and only modified the data location, I can train it on my 12G memory RTX3060 without error.

Hi, I also tried to use withmask_nolidar.240219.yaml, but got an error when loading the images to build the ImagePatchDataset. Have you met this error, and how did you solve it? Thanks!
[screenshot]

@amoghskanda
Author

Yes, I removed **kwargs from the call to get_frame_weights_uniform() at line 66 of dataloader/sampler.py, because that function, defined later in the file, takes only two arguments:

    frame_weights = get_frame_weights_uniform(scene_loader, scene_weights)
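An alternative with the same effect, if you would rather not touch the call site: make the function itself tolerant of extra keyword arguments. A minimal sketch, assuming the real get_frame_weights_uniform takes only the two positional arguments shown above (its body is elided here):

    def get_frame_weights_uniform(scene_loader, scene_weights, **kwargs):
        # **kwargs absorbs any extra options forwarded by the sampler, so a call like
        # get_frame_weights_uniform(scene_loader, scene_weights, **kwargs) no longer
        # raises a TypeError about unexpected arguments.
        ...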

@sonnefred

> yes, I removed **kwargs as an argument when calling get_frame_weights_uniform(), Line 66 dataloader/sampler.py because that function, defined later, takes only 2 arguments.
>
> frame_weights = get_frame_weights_uniform(scene_loader, scene_weights)

Thank you for the reply. Now I'm getting a new error like this, have you met this before?
[screenshot]

@amoghskanda
Author

amoghskanda commented Mar 21, 2024

Yes. I tried caching on the GPU instead of the CPU and changed n_frames in the config file from 163 to 30 for seg100613, and encountered the above error. When I reverted to the default settings (cache on CPU, 163 frames), I ran into #51.

@amoghskanda
Author

> I made some changes to mono.py and used MonoSDFDepthLoss and somewhat fixed it. I'm getting a RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu). This is because the cache is loaded on the cpu and everything else on gpu(cuda:0). Is there a fix to this? I preloaded cache onto gpu(RTX3090) but then it runs out of memory. I reduced n_frames in withmask_nolidar.240219.yaml for segment-100613 from 163 to 30, able to load cache camera onto gpu, I run into RuntimeError: The size of tensor a (65536) must match the size of tensor b (256) at non-singleton dimension 1. What was the batchsize when you trained? @ventusff @zzzxxxttt
>
> I also use withmask_nolidar.240219.yaml and only modified the data location, I can train it on my 12G memory RTX3060 without error.

The cache is on the CPU, right? The tensors frame_ind, h, w are on the CPU as well, and _ret_image_raw is on the CPU too. Not sure why I'm facing #51.
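A small debugging sketch for narrowing this down: drop it right before the failing indexing line and print where each tensor actually lives (the variable names follow the ones mentioned above; that they are all in scope at that point, and that torch is already imported, are assumptions):

    # Debug aid: report device/dtype/shape of every tensor involved in the indexing.
    for name, t in [("frame_ind", frame_ind), ("h", h), ("w", w),
                    ("_ret_image_raw", _ret_image_raw)]:
        if torch.is_tensor(t):
            print(f"{name}: device={t.device}, dtype={t.dtype}, shape={tuple(t.shape)}")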

@sonnefred

> yes. I tried caching on gpu instead of cpu and changed the value of n_frames in the configs file from 163 to 30, for seg-10061, encountered the above error. When I reverted it to default settings(cache on cpu and 163), ran into #51

Ok, have you solved the problem?

@amoghskanda
Author

Not yet, working on it. Try training without changing n_frames in the config file, and let me know if you run into the same issue as me.

@sonnefred

> not yet, on it. Try training without changing the size of n_frames from the config file. Lmk if you run into the same issue as me

Sorry, I'm trying to run code_multi but got an error like this, have you met this before?
[screenshot]

@amoghskanda
Author

@sonnefred I used another config (with mask, with lidar) and was able to train and render as well.

@amoghskanda
Author

@zzzxxxttt did you try rendering nvs with different nvs paths like spherical_spiral or small_circle?

@sonnefred

> @sonnefred , I used another config(with mask with lidar) and was able to train and render as well

OK, thank you, but I'd like to use monodepth supervision, so I'm still working on it...

@sonnefred

> I made some changes to mono.py and used MonoSDFDepthLoss and somewhat fixed it. I'm getting a RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu). This is because the cache is loaded on the cpu and everything else on gpu(cuda:0). Is there a fix to this? I preloaded cache onto gpu(RTX3090) but then it runs out of memory. I reduced n_frames in withmask_nolidar.240219.yaml for segment-100613 from 163 to 30, able to load cache camera onto gpu, I run into RuntimeError: The size of tensor a (65536) must match the size of tensor b (256) at non-singleton dimension 1. What was the batchsize when you trained? @ventusff @zzzxxxttt
>
> I also use withmask_nolidar.240219.yaml and only modified the data location, I can train it on my 12G memory RTX3060 without error.

@zzzxxxttt Hi, how did you run this experiment successfully? I still get a CUDA error when using this yaml ... Could you give any help? Thanks.

@lhp121

lhp121 commented Jun 11, 2024

    2024-06-11 19:16:01,146-rk0-train.py#959:=> Start loading data, for experiment: logs/streetsurf/seg100613.nomask_withlidar_exp1
    2024-06-11 19:16:01,146-rk0-base.py#88:=> Caching data to device=cpu...
    2024-06-11 19:16:01,146-rk0-base.py#95:=> Caching camera data...
    Caching cameras...: 0%| | 0/3 [00:00<?, ?it/s]
    Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

Has anyone encountered this error before, and how can I adjust the parameters to make it run on my GTX 1660 Ti graphics card?
[screenshot]
