Feature/sdo ml v2 loader pt2 #21

Merged 47 commits on Jun 11, 2022
Commits
8d7a0ab adds sdo ml v2 data loader (mariusgiger, May 10, 2022)
fcbfa1c adds notebook to show how to use SDO ML v2 data loader, WIP temporal … (mariusgiger, May 24, 2022)
5bb1764 fixes (mariusgiger, May 24, 2022)
33d2bd4 preparations for training (mariusgiger, May 24, 2022)
40b423b fixes SDOMLDatasetV2 not allowed (mariusgiger, May 24, 2022)
554187e fixes invalid option (mariusgiger, May 24, 2022)
9004c8c fixes invalid option (mariusgiger, May 24, 2022)
9246749 try disabling log graph (mariusgiger, May 24, 2022)
31e0ec3 readds transforms (mariusgiger, May 24, 2022)
54ccaf2 adds temporal splitting for train/validation sets (mariusgiger, May 27, 2022)
01d8aee several improvements and option to filter by date without temporal do… (mariusgiger, May 27, 2022)
2ae87ac removes code used for debugging (mariusgiger, May 27, 2022)
5ff897a adds chunk sampler to improve data loading efficiency (mariusgiger, May 27, 2022)
6907fed Merge pull request #23 from i4Ds/feature/chunk-sampler (mariusgiger, May 27, 2022)
c7abfd7 prepares training (mariusgiger, May 27, 2022)
5845526 adds prefetch factor (mariusgiger, May 27, 2022)
b8ddf04 increases patience for early stopping (mariusgiger, May 27, 2022)
37ea73f refactors config handling to be file-based (mariusgiger, May 29, 2022)
8e8bc1a implements caching, adds run config for FHNW infrastructure (mariusgiger, May 29, 2022)
4b04c22 adjusts fhnw run config (mariusgiger, May 29, 2022)
3d576fc expands home directory, fixes issue with lr (mariusgiger, May 29, 2022)
bd19570 enables profiling (mariusgiger, May 29, 2022)
520c434 use munchify to convert dict to object because of issues with seriali… (mariusgiger, May 29, 2022)
8c10a86 fix issues with device code, logs config (mariusgiger, May 29, 2022)
73f634d log profile as file (mariusgiger, May 29, 2022)
9122c45 adds log (mariusgiger, May 29, 2022)
d0ba673 adds log (mariusgiger, May 29, 2022)
578a3af adds log (mariusgiger, May 29, 2022)
fcfc3f4 adds log (mariusgiger, May 29, 2022)
2e85082 adds more params (mariusgiger, May 29, 2022)
762841a adds dirpath (mariusgiger, May 29, 2022)
7256de1 removes stack trace again (mariusgiger, May 29, 2022)
1e0f612 adds pinned memory (mariusgiger, May 29, 2022)
911197d adds utility for downloading goes fluxes (mariusgiger, May 31, 2022)
40a7fce Merge pull request #22 from i4Ds/feature/temporal-split (mariusgiger, May 31, 2022)
f054eac adds note about leap seconds (mariusgiger, Jun 3, 2022)
de1149b converts GOES handling to dask for more efficiency (mariusgiger, Jun 4, 2022)
768b54a allows to filter the SDO ML v2 dataset by irradiance (mariusgiger, Jun 4, 2022)
fafa42e adds GOES plotting (mariusgiger, Jun 7, 2022)
ab3a0bc adds illustrative example for working with GOES timeseries, adds filt… (mariusgiger, Jun 11, 2022)
0e3ccaf Merge pull request #27 from i4Ds/feature/irradiance-filtering (mariusgiger, Jun 11, 2022)
ce20668 Merge pull request #26 from i4Ds/feature/goes-as-dask (mariusgiger, Jun 11, 2022)
e17e8b2 Merge pull request #24 from i4Ds/feature/goes-ts (mariusgiger, Jun 11, 2022)
81f03eb Merge pull request #25 from i4Ds/feature/prepare-training (mariusgiger, Jun 11, 2022)
e2fcc3c switches data dir (mariusgiger, Jun 11, 2022)
155a913 Merge pull request #28 from i4Ds/bugfix/data-dir (mariusgiger, Jun 11, 2022)
fa52573 Merge remote-tracking branch 'origin/main' into feature/sdo-ml-v2-loa… (mariusgiger, Jun 11, 2022)
62 changes: 58 additions & 4 deletions .vscode/launch.json
@@ -5,7 +5,15 @@
"version": "0.2.0",
"configurations": [
{
"name": "Python: SDO CLI",
"name": "Python: Current File",
"type": "python",
"request": "launch",
"program": "${file}",
"console": "integratedTerminal",
"justMyCode": true
},
{
"name": "Python: SDO CLI get events",
"type": "python",
"request": "launch",
"stopOnEntry": false,
@@ -15,9 +23,55 @@
"args": [
"events",
"get",
"--start=2011-08-01T00:00:00",
"--end=2011-08-02T23:59:59",
"--event-type=AR"
"--start=2011-01-01T00:00:00",
"--end=2020-12-31T23:59:59",
"--event-type=FL"
]
},
{
"name": "Python: SDO CLI Predict",
"type": "python",
"request": "launch",
"stopOnEntry": false,
"justMyCode": false,
"program": "${workspaceRoot}/src/sdo/cli.py",
"console": "integratedTerminal",
"args": [
"sood",
"ce_vae",
"predict",
"--config-file=./config/ce-vae/run-1.yaml"
]
},
{
"name": "Python: SDO CLI GOES Download",
"type": "python",
"request": "launch",
"stopOnEntry": false,
"justMyCode": false,
"program": "${workspaceRoot}/src/sdo/cli.py",
"console": "integratedTerminal",
"args": [
"goes",
"download",
"--start=2010-01-01T00:00:00",
"--end=2020-12-31T23:59:59",
"--output=./tmp/new"
]
},
{
"name": "Python: SDO CLI GOES Get",
"type": "python",
"request": "launch",
"stopOnEntry": false,
"justMyCode": false,
"program": "${workspaceRoot}/src/sdo/cli.py",
"console": "integratedTerminal",
"args": [
"goes",
"get",
"--timestamp=2015-06-01T02:20:00",
"--cache-dir=./tmp"
]
}
]
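
Each launch configuration pairs a `program` with an `args` list, so it corresponds to an ordinary command line. The sketch below shows that mapping for the "Python: SDO CLI GOES Get" entry. It assumes that `${workspaceRoot}` resolves to the current directory and that the interpreter is called `python`; the helper `to_command` is hypothetical and not part of sdo-cli.

```python
import shlex

# "Python: SDO CLI GOES Get" entry from launch.json, with ${workspaceRoot}
# replaced by "." (assumption: the command is run from the repository root).
launch_entry = {
    "program": "./src/sdo/cli.py",
    "args": [
        "goes",
        "get",
        "--timestamp=2015-06-01T02:20:00",
        "--cache-dir=./tmp",
    ],
}


def to_command(entry, interpreter="python"):
    """Build the shell command a launch entry would run (hypothetical helper)."""
    return shlex.join([interpreter, entry["program"], *entry["args"]])


print(to_command(launch_entry))
```

The same helper applies to the other entries, for example the "GOES Download" configuration with its `--start`, `--end` and `--output` arguments.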
12 changes: 12 additions & 0 deletions README.md
@@ -61,6 +61,7 @@ sdo-cli data patch --path='./data/aia_171_2012_256' --targetpath='./data/aia_171
Loading Events from HEK:

```
pip install psycopg2-binary
docker-compose up
sdo-cli events get --start="2012-01-01T00:00:00" --end="2012-01-02T23:59:59" --event-type="AR"
```
@@ -86,6 +87,14 @@ make setup
make install
```

### Publishing

Add your pypi credentials to `~/.pypirc`, increase the version number in `setup.py` and run:

```
make publish
```

### Troubleshooting

Tensorflow only works with Python versions < 3.9.
@@ -103,3 +112,6 @@ Also refer to this [link](https://www.chrisjmendez.com/2017/08/03/installing-mul

- [1] Ahmadzadeh, Azim, Dustin J. Kempton, and Rafal A. Angryk. "A Curated Image Parameter Data Set from the Solar Dynamics Observatory Mission." The Astrophysical Journal Supplement Series 243.1 (2019): 18.
- [2] Zimmerer, David, et al. "Context-encoding variational autoencoder for unsupervised anomaly detection." arXiv preprint arXiv:1812.05941 (2018).


sdo-cli events get --start="2012-01-01T00:00:00" --end="2012-01-02T23:59:59" --event-type="FL"
95 changes: 95 additions & 0 deletions config/ce-vae/defaults.yaml
@@ -0,0 +1,95 @@
model:
target_size:
value: 256
desc: "Target size of the reconstructed output"
z_dim:
value: 128
desc: "Dimension of the latent space"
fmap_sizes:
value: [16, 64, 256, 1024]
desc: "Feature map sizes for the CNN"
ce_factor:
value: 0.5
desc: "Amount to which the context-encoder contributes to the model (between 0 only VAE and 1 only CE)"
load_path:
value: null
desc: "Path to a pretrained model"
data:
batch_size:
value: 16
desc: "How many samples per batch to load"
channel:
value: "171A"
desc: "Channel name that should be used. If None all available channels will be used."
data_dir:
value: ./data
desc: "Path to the root directory of the dataset"
dataset:
value: SDOMLDatasetV2
desc: "Which dataset to use (CuratedImageParameterDataset, SDOMLDatasetV1 or SDOMLDatasetV2)"
num_data_loader_workers:
value: 0
desc: "How many subprocesses to use for data loading. 0 means that the data will be loaded in the main process."
prefetch_factor:
value: 8
desc: "Number of samples loaded in advance by each worker. 2 means there will be a total of 2 * num_workers samples prefetched across all workers."
pin_memory:
value: true
desc: "If true, the data loader will copy Tensors into CUDA pinned memory before returning them"
sdo_ml_v2:
storage_driver:
value: fs
desc: "Storage driver used to load the data. Either 'gcs' (Google Storage Bucket) or 'fs' (local file system)"
year:
value: 2010
desc: "Allows to prefilter the dataset by year. If None all available years will be used."
train_start_date:
value: "2010-08-30 00:00:00"
desc: "Allows to restrict the dataset temporally"
train_end_date:
value: "2010-08-30 23:59:59"
desc: "Allows to restrict the dataset temporally"
test_start_date:
value: "2010-08-29 00:00:00"
desc: "Allows to restrict the dataset temporally"
test_end_date:
value: "2010-08-29 23:59:59"
desc: "Allows to restrict the dataset temporally"
freq:
value: null
desc: "Allows to downsample the dataset temporally, should be bigger than the min interval for the observed channel. When using freq, start and end should also be specified for train and test"
train_val_split_ratio:
value: 0.7
desc: "Split-ratio for the train-validation split"
train_val_split_temporal_chunk_size:
value: 14d
desc: "Temporal chunk size for the train-validation splits."
predict:
mode:
value: pixel
desc: "Mode for anomaly scoring (pixel or sample)"
pred_dir:
value: "./output/predictions"
desc: "Output directory for predictions."
score_mode:
value: combi
desc: "Score mode used for anomaly scoring ('rec', 'grad' or 'combi')"
train:
n_epochs:
value: 10
desc: "Stop training once this number of epochs is reached."
lr:
value: 0.0001
desc: "Learning rate"
use_geco:
value: false
desc: "Whether to use Generalized ELBO with Constrained Optimization update step."
beta:
value: 0.01
desc: "Weighting factor for KL loss influence on loss."
print_every_iter:
value: 100
desc: ""
log_dir:
value: "./output/logs"
desc: "Output directory for log."
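
The defaults file stores every option as a `value`/`desc` pair, while the run files added in this PR list only the `value` entries they change. The following is a minimal sketch of how such files could be combined, assuming a recursive merge in which the run file takes precedence. The functions `strip_desc` and `deep_merge` are hypothetical, the dictionaries stand in for parsed YAML, and the actual config loader in sdo-cli may work differently.

```python
def strip_desc(node):
    """Collapse {"value": x, "desc": ...} leaves to x, recursing through groups."""
    if isinstance(node, dict):
        if "value" in node:
            return node["value"]
        return {key: strip_desc(child) for key, child in node.items()}
    return node


def deep_merge(base, override):
    """Return a copy of base in which entries from override take precedence."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Excerpts of defaults.yaml and of a run file, written as parsed dictionaries.
defaults = {
    "data": {
        "batch_size": {"value": 16, "desc": "How many samples per batch to load"},
        "data_dir": {"value": "./data", "desc": "Path to the root directory of the dataset"},
    },
    "train": {"lr": {"value": 0.0001, "desc": "Learning rate"}},
}
run = {"data": {"batch_size": {"value": 32}}}

config = deep_merge(strip_desc(defaults), strip_desc(run))
print(config)
```

Stripping the descriptions before merging keeps the merge itself a plain dictionary operation, and options that the run file does not mention fall back to their defaults.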
32 changes: 32 additions & 0 deletions config/ce-vae/run-1.yaml
@@ -0,0 +1,32 @@
model:
load_path:
value: /Users/mariusgiger/Downloads/model-36tfoo4q.ckpt
data:
data_dir:
value: fdl-sdoml-v2/sdomlv2_small.zarr/
dataset:
value: SDOMLDatasetV2
num_data_loader_workers:
value: 0
prefetch_factor:
value: 2
batch_size:
value: 1
sdo_ml_v2:
storage_driver:
value: gcs
train_start_date:
value: "2010-08-30 00:00:00"
train_end_date:
value: "2010-08-30 23:59:59"
test_start_date:
value: "2010-08-29 00:00:00"
test_end_date:
value: "2010-08-29 23:59:59"
train_val_split_temporal_chunk_size:
value: 1h
predict:
mode:
value: sample
pred_dir:
value: "./output/predictions"
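
The options `train_val_split_ratio` and `train_val_split_temporal_chunk_size` suggest that the training period is cut into fixed-length time chunks, which are then divided between training and validation. The sketch below shows one way to do this. It assumes that chunks are assigned in chronological order and that the end of the range is exclusive; the function `split_chunks` is hypothetical, and the implementation in this PR may assign chunks differently (for example randomly or interleaved).

```python
from datetime import datetime, timedelta


def split_chunks(start, end, chunk_size, train_ratio):
    """Cut [start, end) into chunks and split them into train and validation lists."""
    chunks = []
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + chunk_size, end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end
    n_train = round(len(chunks) * train_ratio)
    return chunks[:n_train], chunks[n_train:]


# Training day and 1h chunk size from run-1.yaml, split ratio 0.7 from defaults.yaml.
# Simplification: the end date 23:59:59 is expressed as an exclusive midnight.
train, val = split_chunks(
    start=datetime(2010, 8, 30, 0, 0, 0),
    end=datetime(2010, 8, 31, 0, 0, 0),
    chunk_size=timedelta(hours=1),
    train_ratio=0.7,
)
print(len(train), len(val))
```

Splitting by time chunk rather than by individual image keeps nearly identical consecutive frames from ending up on both sides of the split.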
34 changes: 34 additions & 0 deletions config/ce-vae/run-fhnw-1.yaml
@@ -0,0 +1,34 @@
model:
load_path:
value: null
data:
data_dir:
value: /mnt/nas05/astrodata01/astroml_data/sdomlv2_small/sdomlv2_small.zarr
dataset:
value: SDOMLDatasetV2
num_data_loader_workers:
value: 16
prefetch_factor:
value: 8
batch_size:
value: 32
sdo_ml_v2:
train_start_date:
value: "2010-08-01 00:00:00"
train_end_date:
value: "2010-08-20 23:59:59"
test_start_date:
value: "2010-08-21 00:00:00"
test_end_date:
value: "2010-08-31 23:59:59"
train_val_split_ratio:
value: 0.8
train_val_split_temporal_chunk_size:
value: 3d
predict:
mode:
value: sample
pred_dir:
value: "./output/predictions"
log_dir:
value: "./output/train-sdo-ml"
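
The `data` options `batch_size`, `num_data_loader_workers`, `prefetch_factor` and `pin_memory` map naturally onto keyword arguments of PyTorch's `torch.utils.data.DataLoader`. One caveat: as far as the PyTorch documentation indicates, `DataLoader` rejects a non-default `prefetch_factor` when `num_workers` is 0, which is the combination found in `defaults.yaml`. The sketch below shows one way to guard against that. The helper `loader_kwargs` is hypothetical and does not claim to reflect how this PR builds its data loaders.

```python
def loader_kwargs(data_cfg):
    """Translate resolved data options into DataLoader keyword arguments (hypothetical helper)."""
    num_workers = data_cfg["num_data_loader_workers"]
    kwargs = {
        "batch_size": data_cfg["batch_size"],
        "num_workers": num_workers,
        "pin_memory": data_cfg["pin_memory"],
    }
    # prefetch_factor only applies to worker processes, so it is passed on
    # only when multiprocessing is enabled.
    if num_workers > 0:
        kwargs["prefetch_factor"] = data_cfg["prefetch_factor"]
    return kwargs


# Values from run-fhnw-1.yaml (pin_memory taken from defaults.yaml).
fhnw_cfg = {"batch_size": 32, "num_data_loader_workers": 16, "prefetch_factor": 8, "pin_memory": True}
# Values from defaults.yaml: data is loaded in the main process.
default_cfg = {"batch_size": 16, "num_data_loader_workers": 0, "prefetch_factor": 8, "pin_memory": True}

print(loader_kwargs(fhnw_cfg))
print(loader_kwargs(default_cfg))
```

The result can then be unpacked into the constructor, for example `DataLoader(dataset, **loader_kwargs(cfg))`.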