Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement: Consider adding support to load data from RIT/Maya/etc. data #64

Open
duetosymmetry opened this issue Jul 15, 2022 · 6 comments

Comments

@duetosymmetry
Copy link
Member

duetosymmetry commented Jul 15, 2022

For example, if somebody calls

sxs.load('RIT:BBH:0123/rPsi4')

then based on the string before the colon, look in the RIT catalog instead of the SXS catalog. Is this an appropriate interface? Should this be within the sxs package, even though it is others' data? Or should it instead be in a different package? Ping @moble request for comment.

@moble
Copy link
Member

moble commented Jul 15, 2022

I think this would be entirely appropriate. Since sxs already has so much code written around capabilities that would probably be common to all groups, I personally would find it exhausting to suggest a different interface. But it would require a little work.

First, we would need to get their catalog data into a compatible format — at least some subset of the metadata included in our catalog.json — so that sxs would know where to find their data. (There may also be some other assumptions built into sxs about what sort of metadata is contained in the catalog entries, which could require rewriting bits of sxs, but I wouldn't think there would be too much.)

Second, we would need load (and optionally save) functions written to handle their formats, so they would result in WaveformModes objects. I think this for RIT and something in here for GT should just about work — though they'd probably need tweaks for their respective flavors of those formats.

Third, we would need logic in sxs.load that could figure out which load function to call for a given file. At least to start with, sxs.load could probably just parse the strings to try to figure out the format.

So, not too much work, but not trivial.

@duetosymmetry
Copy link
Member Author

Pinging @akhadse who has expressed interest in working on this.

Ideally the caching mechanism should work for others' waveforms, not just for SXS waveforms. I have not looked in detail at how the caching works, so I don't know how difficult that is!

Since others do not provide catalog.json files, I suggested the following idea to Akshay. The BeautifulSoup code Mike wrote for parsing the RIT catalog would provide a good current snapshot of the web page. I don't think we should attempt to re-scrape the page every time somebody tries to load an RIT waveform. Instead, we include a static rit_catalog.json file within this package itself, along with the code for generating it from scraping. Hopefully this encourages them to provide a json catalog in the future, and then we can remove the scraping code and static catalog file.

Re: logic to figure out which load function to call: Given the current sxs_loader and sxs_handler functions, do you imagine adding more cases to them, or adding additional rit_loader and rit_handler etc. functions, and having logic higher up to decide if it should call the RIT or SXS functions?

@akhadse
Copy link

akhadse commented Jul 21, 2022

Thank you. I am working on the initial steps and will keep you updated.
Re. logic for load function: I think having the additional rit_loader and rit_handler functions might be easier but I will know better when I get there.

@duetosymmetry
Copy link
Member Author

I'd like to point out that @prayush, @md-arif-shaikh, and @adivijaykumar have done the legwork and written prayush/nr-catalog-tools. @moble any opinions on if you'd still like this included in sxs?

@prayush
Copy link

prayush commented Apr 24, 2023

Yeah, the context is that I had some old code to integrate RIT / GT data into gw data analysis frameworks since the time we were working on this analysis of gw150914, but it had gotten quite outdated. So, when we recently wanted to look at those catalogs again, @md-arif-shaikh and @adivijaykumar wrote a fresh set of utilities to work with all catalogs. Then we thought it would be good idea to write/maintain a unified interface for all 3 catalogs, and be done once and for all!

It wasn't immediately obvious to me if the sxs package or the 'sxs-collaboration' org would be the most intuitive place to put this code though, since it would be for non-sxs data specifically (and I hadn't read this issue). As I was familiar with the careful work already put into sxs, I thought the next best thing would be to have a separate wrapper package that is designed to inherit maximally from the sxs's classes for handling catalogs and waveform modes, but with the capability to uniformly handle other catalogs, as well as have some useful extras to allow for easy downstream interfacing with data analysis toolkits like pycbc , lalsuite and bilby.

There is a basic POC here now. It, for eg., calls into sxs for all mode/polarizations operations. For catalog handler classes, inheriting from sxs.Catalog helps a lot with the GT catalog, the RIT catalog needs to be scraped from their web directory and cached. We can port nr-catalog-tools under the sxs-collaboration org anytime (I was thinking of doing it myself), but why do we think the "sxs" package would be better for this code as opposed to it being a closely related thin wrapper with identical interface to sxs (say, sxs-collaboration/catalogs), containing only the non-"sxs" pieces essentially. I guess my issue is mostly that a package with name "sxs" implies tools-to-work-with-sxs-data to users, but I'm open to change. @moble and others, what do you think?

@moble
Copy link
Member

moble commented Apr 24, 2023

That looks great, Prayush! I'm happy to see any improvement in the situation — whether it lives in sxs or elsewhere.

IMO, this sort of capability would fit very nicely into sxs, since we don't just produce data, but also tools to work with data. I don't see any reason that shouldn't involve other people's data.

Obviously, as you guys are the ones who've done all the work, I'll defer to your judgment about where it would best fit, but I'd certainly be happy to help it fit into this package. For example, it would be easy to add methods like get_mode, get_td_waveform, to_pycbc, etc., directly to WaveformModes. Even if you don't want to fully incorporate your efforts into sxs, this would probably make things easier on everyone — and avoid standards proliferation. :)

(On a related note, I've long had a stub for a WaveformSignal class that I meant to be what you call the td_waveform. That's the sort of thing that could certainly be expanded.)

@duetosymmetry duetosymmetry changed the title Enhancement: Consider adding support to load data from RIT/GaTech/etc. data Enhancement: Consider adding support to load data from RIT/Maya/etc. data Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants