Skip to content
This repository has been archived by the owner on Aug 30, 2022. It is now read-only.

XP-227 - Merge v0.3.0 changes into master #233

Merged
merged 37 commits into from
Jan 23, 2020
Merged

XP-227 - Merge v0.3.0 changes into master #233

merged 37 commits into from
Jan 23, 2020

Conversation

rsaffi
Copy link
Contributor

@rsaffi rsaffi commented Jan 23, 2020

References

XP-227 and XP-356

Summary

For keeping master consistent with what was released on v0.3.0, we're merging into master from the commit on development where the release was done from.

Are there any open tasks/blockers for the ticket?

No


Reviewer checklist

Reviewer agreement:

  • Reviewers assign themselves at the start of the review.
  • Reviewers do not commit or merge the merge request.
  • Reviewers have to check and mark items in the checklist.

Merge request checklist

  • Conforms to the merge request title naming XP-XXX <a description in imperative form>.
  • Each commit conforms to the naming convention XP-XXX <a description in imperative form>.
  • Linked the ticket in the merge request title or the references section.
  • Added an informative merge request summary.

Code checklist

  • Conforms to the branch naming XP-XXX-<a_small_stub>.
  • Passed scope checks.
  • Added or updated tests if needed.
  • Added or updated code documentation if needed.
  • Conforms to Google docstring style.
  • Conforms to XAIN structlog style.

atymoshchuk and others added 30 commits December 18, 2019 14:45
* XP-255 change setup author to XAIN AG & remove codeowners for now

* XP-255 black reformat
* XP-265 remove benchmark directory

* XP-265 remove datasets directory

* XP-265 remove references to either examples or benchmark directory

* XP-265 remove benchmark participant

* XP-265 remove benchmark config relations

* XP-265 remove benchmark coordinator, related types and helpers

* XP-265 review comments + changed type import

* XP-265 commenting out benchmarks docs

* XP-243 remove flags from xain-fl codebase

* XP-255 update codeowners and authors in setup (#195)

* XP-255 change setup author to XAIN AG & remove codeowners for now

* XP-255 black reformat

* XP-265 remove benchmark directory

* XP-265 remove datasets directory

* XP-265 remove references to either examples or benchmark directory

* XP-265 remove benchmark participant

* XP-265 remove benchmark config relations

* XP-265 remove benchmark coordinator, related types and helpers

* XP-265 review comments + changed type import

* XP-265 commenting out benchmarks docs

* XP-243 remove flags from xain-fl codebase

* XP-265 add ticket number for deciding about EvoAgg
* XP-261 move tests to tests dir

* XP-261 add todo's to pytest.ini

* XP-261 fix pytest.ini after rebase
* XP-354 Remove proto files

XP-354 add xain proto package

* XP-354 Fix isort errors

* XP-354 Remove cproto dependencies

* XP-354 Update git workflow

* XP-354 Remove unused import

* XP-354 Remove _pb2 glob

* XP-354 Remove protobuf in dockerfiles
* XP-273: remove scripts/test_slow.sh

This script is basically a one-liner (`pytest --run-slow`), and is
effectively a noop since there is no test marked as slow.

* XP-273: remove scripts/setup{_pyenv}.sh

setup_pyenv.sh suffers multiple issues that justify its removal:

- it is very intrusive: it downloads software in the user home
  directory, edits their zshrc/bashrc and installs software using the
  system package manager which requires root permissions
- it can break the user configuration if they already have a pyenv
  installation
- it only works in very specific cases: if the user uses a Debian
  based distro with bash as a default shell or MacOS with zsh

setup.sh is not particularly useful because most python developers
will do `pip install -e .[dev]` by themself.

* XP-273: script/format.sh: print the commands

The commands in this script can make changes to the repo, so users may
appreciate seeing the actual commands being run.

* XP-273: make scripts/rm_cache.sh less aggressive

- print the `rm` commands being run so that users know what's being deleted
- do not delete the `__pycache` directories so aggressively. For
  instance I usually create a virtualenv in each projects I'm
  working on and don't want to have a script touching it.

* XP-273: remove scripts/ls_running_instances.sh

This script requires private credentials to run and is specific to our
infrastructure so it does not make sense to keep it public.
* XP-357 make controller parametrisable

* XP-357 add TODO on valid attribute values

* XP-357 minor spelling corrections

* XP-357 simplify Controller.get_num_ids_to_select()
* XP-384 remove unused files

* XP-384 update CI jobs
* XP-374 Clean up docs

* XP-374 Add version number
* XP-271 fix pylint issues

* XP-271 fix pylint issues

* XP-271 fix isort

* XP-271 disable fixme warning

* XP-271 add pytest protobuf env

* XP-271 align scripts with ci settings

* XP-271 add docstring todos

* XP-271 update sphinx config
* XP-424 Remove unused packages

* XP-424 Move typing-extensions to dev
* XP-373 reactivate tests by removing comments

* XP-373 fix test_end_training

* XP-373 move dependency to tests_require
* XP-433 Fix docker headings

* XP-433 Replace bold with h4
* XP-119 Close channel after each test

* XP-119 Remove unused import

* XP-119 Remove TODO
* XP-422 update coordinator metric handling

* XP-422 update xain-proto import

* XP-422 update sdk import

* XP-422 correct test mocking destinations
* XP-308: store aggregated weights in S3 buckets

Refs:

https://xainag.atlassian.net/browse/XP-40
https://xainag.atlassian.net/browse/XP-308
https://xainag.atlassian.net/browse/XP-309
https://xainag.atlassian.net/browse/XP-311

Summary:

This commits adds a new module `xain-fl.coordinator.store`, which
contains a `Store()` class that can read from and write to S3
buckets. At the end of each round, the aggregated weights are uploaded
to an S3 bucket. Note that as a first step, the aggregated weights are
still sent to the participants via gRPC.

The versioning is straightforward: the key for storing each set of
weights is the round number corresponding to the weights.

Implementation notes:

We're trying to not have IO related code in the Coordinator class, so
the Store instance is held by the CoordinatorGrpc instead of
Coordinator. However, since the Coordinator is currently generating
the grpc messages, which are inherently coupled to the IO logic. This
means that if at some point the messages need to contain information
related to storage, we'll end up with storage logic in the
Coordinator, and things might become pretty messy. Thus, we should
already start thinking about refactoring xain-fl.coordinator such that
the state machine is completely decoupled for the IO logic.

* XP-308: run isort, black and mypy on the tests package

This is not directly related to task but was done as part of writing
the tests for the storage logic.

Refs:

https://xainag.atlassian.net/browse/XP-40
https://xainag.atlassian.net/browse/XP-308
https://xainag.atlassian.net/browse/XP-309
https://xainag.atlassian.net/browse/XP-311

* XP-308: Add unit test doing an entire round

Refs:

https://xainag.atlassian.net/browse/XP-40
https://xainag.atlassian.net/browse/XP-308
https://xainag.atlassian.net/browse/XP-309
https://xainag.atlassian.net/browse/XP-311

Summary:

This is the first test where multiple participants connect and do a
round of training.

Implementation notes:

- Because we use the peer address to identify a participant, we have to
  use a port forwarding trick to make distinct connections to the
  coordinator. This is implemented in the `tests.port_forwarding` module

- To test the store implementation, we use a custom TestStore class that
  writes data in memory instead of using an actual S3 bucket. That way
  our tests don't depend on an external service. This is implemented in
  the `tests.store` module.

* XP-308: docker: install git and make pip verbose

Git is needed to fetch package from git repositories (xain-proto for
instance).

This commit also makes `pip install` verbose because the command can
take a very long time to execute when gRPC and protobuf are compiled,
and without any output, it may seem like the command hangs.

* XP-308: don't install dev dependencies in docker

The docutils is a transitive dev dependency that is causing problem.

Specifically, when installing dev dependencies in docker,
`docutils==0.16` gets installed, but `boto3` is incompatible with it
and `xain-fl` crashes at runtime:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 583, in _build_master
    ws.require(__requires__)
  File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 900, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (docutils 0.16 (/usr/local/lib/python3.7/site-packages), Requirement.parse('docutils<0.16,>=0.10'), {'botocore'})
```

I don't know why pip pulls this version of docutils though, since
we don't have any dependency that requires such a high version number:

```
$ pipdeptree -r -p docutils
docutils==0.16
  - botocore==1.13.50 [requires: docutils>=0.10,<0.16]
    - boto3==1.10.48 [requires: botocore>=1.13.48,<1.14.0]
      - xain-fl==0.2.0 [requires: boto3==1.10.48]
    - s3transfer==0.2.1 [requires: botocore>=1.12.36,<2.0.0]
      - boto3==1.10.48 [requires: s3transfer>=0.2.0,<0.3.0]
        - xain-fl==0.2.0 [requires: boto3==1.10.48]
  - m2r==0.2.1 [requires: docutils]
  - readme-renderer==24.0 [requires: docutils>=0.13.1]
    - twine==2.0.0 [requires: readme-renderer>=21.0]
  - Sphinx==2.2.1 [requires: docutils>=0.12]
```

* XP-308: print debugging information for spurious failure
* XP-208 model sum aggregator as a mock for testing

* XP-208 add test integrating P and C

* XP-208 add identity controller as a mock for test

* XP-208 small typo in id-controller docstring

* XP-208 add coordinator test fixture with mocked components

* XP-208 test for top-level function start_participant

* XP-436 fix bug: coordinator not advertising FINISHED state

* XP-436 revert logging level

* XP-308: store aggregated weights in S3 buckets (#215)

* XP-308: store aggregated weights in S3 buckets

Refs:

https://xainag.atlassian.net/browse/XP-40
https://xainag.atlassian.net/browse/XP-308
https://xainag.atlassian.net/browse/XP-309
https://xainag.atlassian.net/browse/XP-311

Summary:

This commits adds a new module `xain-fl.coordinator.store`, which
contains a `Store()` class that can read from and write to S3
buckets. At the end of each round, the aggregated weights are uploaded
to an S3 bucket. Note that as a first step, the aggregated weights are
still sent to the participants via gRPC.

The versioning is straightforward: the key for storing each set of
weights is the round number corresponding to the weights.

Implementation notes:

We're trying to not have IO related code in the Coordinator class, so
the Store instance is held by the CoordinatorGrpc instead of
Coordinator. However, since the Coordinator is currently generating
the grpc messages, which are inherently coupled to the IO logic. This
means that if at some point the messages need to contain information
related to storage, we'll end up with storage logic in the
Coordinator, and things might become pretty messy. Thus, we should
already start thinking about refactoring xain-fl.coordinator such that
the state machine is completely decoupled for the IO logic.

* XP-308: run isort, black and mypy on the tests package

This is not directly related to task but was done as part of writing
the tests for the storage logic.

Refs:

https://xainag.atlassian.net/browse/XP-40
https://xainag.atlassian.net/browse/XP-308
https://xainag.atlassian.net/browse/XP-309
https://xainag.atlassian.net/browse/XP-311

* XP-308: Add unit test doing an entire round

Refs:

https://xainag.atlassian.net/browse/XP-40
https://xainag.atlassian.net/browse/XP-308
https://xainag.atlassian.net/browse/XP-309
https://xainag.atlassian.net/browse/XP-311

Summary:

This is the first test where multiple participants connect and do a
round of training.

Implementation notes:

- Because we use the peer address to identify a participant, we have to
  use a port forwarding trick to make distinct connections to the
  coordinator. This is implemented in the `tests.port_forwarding` module

- To test the store implementation, we use a custom TestStore class that
  writes data in memory instead of using an actual S3 bucket. That way
  our tests don't depend on an external service. This is implemented in
  the `tests.store` module.

* XP-308: docker: install git and make pip verbose

Git is needed to fetch package from git repositories (xain-proto for
instance).

This commit also makes `pip install` verbose because the command can
take a very long time to execute when gRPC and protobuf are compiled,
and without any output, it may seem like the command hangs.

* XP-308: don't install dev dependencies in docker

The docutils is a transitive dev dependency that is causing problem.

Specifically, when installing dev dependencies in docker,
`docutils==0.16` gets installed, but `boto3` is incompatible with it
and `xain-fl` crashes at runtime:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 583, in _build_master
    ws.require(__requires__)
  File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 900, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (docutils 0.16 (/usr/local/lib/python3.7/site-packages), Requirement.parse('docutils<0.16,>=0.10'), {'botocore'})
```

I don't know why pip pulls this version of docutils though, since
we don't have any dependency that requires such a high version number:

```
$ pipdeptree -r -p docutils
docutils==0.16
  - botocore==1.13.50 [requires: docutils>=0.10,<0.16]
    - boto3==1.10.48 [requires: botocore>=1.13.48,<1.14.0]
      - xain-fl==0.2.0 [requires: boto3==1.10.48]
    - s3transfer==0.2.1 [requires: botocore>=1.12.36,<2.0.0]
      - boto3==1.10.48 [requires: s3transfer>=0.2.0,<0.3.0]
        - xain-fl==0.2.0 [requires: boto3==1.10.48]
  - m2r==0.2.1 [requires: docutils]
  - readme-renderer==24.0 [requires: docutils>=0.13.1]
    - twine==2.0.0 [requires: readme-renderer>=21.0]
  - Sphinx==2.2.1 [requires: docutils>=0.12]
```

* XP-308: print debugging information for spurious failure

* XP-208 join monitor_thread to confirm response to terminate_event

* XP-208 change mock patch string to object called rather than defined

* XP-208 model sum aggregator as a mock for testing

* XP-208 add test integrating P and C

* XP-208 add identity controller as a mock for test

* XP-208 small typo in id-controller docstring

* XP-208 add coordinator test fixture with mocked components

* XP-208 test for top-level function start_participant

* XP-436 fix bug: coordinator not advertising FINISHED state

* XP-436 revert logging level

* XP-208 join monitor_thread to confirm response to terminate_event

* XP-208 change mock patch string to object called rather than defined

* XP-436 suggested documentation fixes from review

* XP-208 mock the store (following rebase)

* XP-208 fix import

* XP-208 test_start_participant: mark as slow, comment on aggregation

Co-authored-by: Corentin Henry <little-dude@users.noreply.github.com>
* XP-480 adjust variable names and imports

* XP-480 temporarly fix isort

* XP-480 update install req

* XP-480 enable isort

* XP-408 update isort config

* XP-480 update sequence diagram

* XP-480 update proto import
* XP-333 Replace numproto with xain-proto

* XP-333 Format setup.py

* XP-333 Update xain-proto

* XP-333 Update docs
little-dude and others added 7 commits January 20, 2020 10:09
* XP-505: use sphinx-autodoc-typehints

Ref: https://xainag.atlassian.net/browse/XP-505

This sphinx extension will allow us to simplify docstrings by not
having to explicitely declaire each type, since types can be infered
from the type hints in the function signatures. It also automatically
generate working links, many of which are currently broken.

* XP-505: clean up xain_fl.coordinator.coordinator doc

Ref: https://xainag.atlassian.net/browse/XP-505
* XP-508 Replace circleci badge with github workflow

* XP-508 Add link

* XP-508 Fix link

* XP-508 Fix styles

* XP-508 Escape parentheses
On some platforms, /bin/bash does not exist. Using /usr/bin/env let
users choose which `bash` should be used to run the scripts.

References:

- https://xainag.atlassian.net/browse/XP-498
- https://unix.stackexchange.com/questions/29608/why-is-it-better-to-use-usr-bin-env-name-instead-of-path-to-name-as-my
* XP-505: cleanup docstrings in xain_fl.coordinator

This is a follow-up of #224

Reference: https://xainag.atlassian.net/browse/XP-505

* XP-505: fix links to grpc documentation

Cc: Robert Steiner <robertt.debug@gmail.com>
…-proto`) (#230)

* XP-227 Prepare release of v0.3.0 (matching version with `xain-fl` and `xain-sdk`)
* XP-227 Update setup.py to point to released xain-proto=0.3.0
@rsaffi rsaffi added the devops label Jan 23, 2020
@rsaffi rsaffi self-assigned this Jan 23, 2020
@rsaffi rsaffi changed the title XP-277 - Merge v0.3.0 changes into master XP-227 - Merge v0.3.0 changes into master Jan 23, 2020
@rsaffi rsaffi merged commit fed62b7 into master Jan 23, 2020
@rsaffi rsaffi deleted the v0.3.0_pr_master branch January 23, 2020 10:20
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants