Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

openstack backend #170

Open
wants to merge 62 commits into
base: master
Choose a base branch
from

Conversation

sergiocazzolato
Copy link
Contributor

This is the new openstack backend. Through this backend it is possible to run tests/tasks in openstack infrastructure.

The documenatation is also added explaning how to setup spread to use it.

For the openstack backend implementation the lib goose was used. This lib provides the clients needed to interact with the different openstack modules (nova. neutron, glance and, keystone).

}
server, err := p.computeClient.RunServer(opts)
if err != nil {
return nil, &FatalError{fmt.Errorf("Could not create instance", err)}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the format string include a format specifier here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the FatalError means it is not needed to retry and it is built from an error, I already updated the error in the backend to make sure we retry only when it makes sense

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh okay, I see know thanks for explaining.

@corytodd
Copy link

corytodd commented Jun 7, 2023

Awesome, we are able to allocate and discard on our openstack tenant. Do you think it's feasible to support adding security groups automatically? For example, we do not use a default allow-ssh policy on our network so in order for these tests to work I had to manually attach the policy after allocation but before the test ran. We have other ports that may need to be open depending on the test so having a way to set these dynamically would be useful for us.

@sergiocazzolato
Copy link
Contributor Author

Awesome, we are able to allocate and discard on our openstack tenant. Do you think it's feasible to support adding security groups automatically? For example, we do not use a default allow-ssh policy on our network so in order for these tests to work I had to manually attach the policy after allocation but before the test ran. We have other ports that may need to be open depending on the test so having a way to set these dynamically would be useful for us.

Thanks for taking a look. I'll add few extra features:
add extra storage
specify the network
specify the security group

@corytodd
Copy link

corytodd commented Jun 9, 2023

Specifying the network and security group work great! This passes our testing, I would support a +1 on getting this merged. Thanks!

@sergiocazzolato sergiocazzolato changed the title tests: new openstack backend openstack backend Jun 14, 2023
Copy link
Contributor

@mvo5 mvo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this Sergio! I did a first review and have some suggestion inline. I will also sync with Gustavo to ask how he wants to see this moving forward.

README.md Show resolved Hide resolved
spread/openstack.go Outdated Show resolved Hide resolved
spread/openstack.go Outdated Show resolved Hide resolved
spread/openstack.go Outdated Show resolved Hide resolved
spread/openstack.go Outdated Show resolved Hide resolved
spread/openstack.go Outdated Show resolved Hide resolved
return sameImage, fmt.Errorf("failed to retrieve images list: %s", errorTitle(err.Error()))
}

for _, i := range images {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this code should be slightly more elaborate and follow googleProvider:image or linode:tempate(). linode is simpler and just does a prefix search but afaict all do more than just check for "contains"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mvo5 could you please elaborate a bit more this? In openstack the images dont have family or project associated as in gce, so because of that I used the contain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was mostly wondering what contraints there are about image names, I created a unit test for the code now so that we can explore various test cases and examples :)

spread/openstack.go Outdated Show resolved Hide resolved
spread/openstack.go Show resolved Hide resolved
spread/openstack.go Outdated Show resolved Hide resolved
@mvo5 mvo5 force-pushed the migrate-openstack-backend branch from 04aa498 to b08c9ca Compare July 5, 2023 12:41
@mvo5

This comment was marked as resolved.

spread/project.go Outdated Show resolved Hide resolved
spread/project.go Outdated Show resolved Hide resolved
mvo5

This comment was marked as outdated.

Copy link
Contributor

@mvo5 mvo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gustavo also asked that backends that do not (yet) support the options network/groups should error when they are specified.

A smoke spread test against a real system should be included and unit tests as far as possible without modifying non-openstack code.

The way images are selected/filtered also needs a review.

spread/project.go Outdated Show resolved Hide resolved
spread/project.go Outdated Show resolved Hide resolved
@sergiocazzolato
Copy link
Contributor Author

@mvo5 about the network list associated to a machine, the consideration here is that where there are more than 1 network associated to a machine, the ip used by spread to connect has to be provided by the first network. I'll include that in the README.

Copy link
Contributor

@mvo5 mvo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I looekd a bit more and I really like the updated spread test! I also added a few comments and suggestions and pushed some small tweaks.

README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
spread/openstack.go Show resolved Hide resolved
spread/openstack.go Outdated Show resolved Hide resolved
return sameImage, fmt.Errorf("failed to retrieve images list: %s", errorTitle(err.Error()))
}

for _, i := range images {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was mostly wondering what contraints there are about image names, I created a unit test for the code now so that we can explore various test cases and examples :)

@mvo5 mvo5 force-pushed the migrate-openstack-backend branch from 9053022 to 8fbae16 Compare August 17, 2023 16:28
Signed-off-by: Zeyad Gouda <zeyad.gouda@canonical.com>
Copy link
Collaborator

@ZeyadYasser ZeyadYasser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I have some small questions.

spread/openstack.go Show resolved Hide resolved
spread/openstack.go Show resolved Hide resolved
spread/openstack.go Outdated Show resolved Hide resolved
Signed-off-by: Zeyad Gouda <zeyad.gouda@canonical.com>
Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to everyone involved in getting this backend cooked. Let's see if we can get it merged in the near future.

A while ago I had already done an initial high-level pass on the logic with Michael, and still need to review it in more detail, but I'm not expecting major surprises there as my understanding is it's heavily based on the GCE backend. So in this pass I reviewed mainly the bits surrounding the actual backend logic. Once we get to some agreements on those I'll go in and do a more complete review on the backend details.

Please let me know how you'd like to proceed from here, otherwise.

@@ -17,7 +17,7 @@ jobs:

- name: Run tests
run: |
spread google:
spread google:ubuntu-20.04-64: google:ubuntu-22.04-64-devstack:tests/openstack
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened here? If this makes sense (unclear for now), it should be documented so it's more obvious what was disabled and what was enabled here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for the review,

In order to test openstack backend against a real openstack interface, we pre-configured a new image with devstack already installed. The main raeson was to speed up and simplify the test (so that image is only used to test openstack). Should I add this explanation in the workflow?

README.md Outdated
```

The Openstack backend gets all the information to authenticate from the
environment variables. The following variables have to be set:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look like a great practice, as it's sending real authentication data to every single test run. It also disagrees with what we do with the Google backend, and every other backend maybe?

I'm almost certainly not the first one to point this out, so what is the actual alternative practice inside the OpenStack community?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most common mechanism to connect a client through the Openstack API is by sourcing a file with teh environment variables (like the one we get to use openstack client in canonistack). For example in PS5, in the environment I also see the openstack env vars in my environment (they are managed by vault tool), so I presume juju is using those to connect to openstack.

I agree with you this is not a good practice because this means we need to have an env var with the user and password. I'll research which workaround we could use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the documentation to explain better which env vars need to be defined. It is also supported to authenticate by using a key (similar to what we have in google). The key has to be stored in an env var to be loaded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An improvement could be to load the vars from a file (as we hav in google) instead of the env. @niemeyer What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally I added a similar approach that the used in google.

OS_PASSWORD
OS_REGION_NAME
OS_INTERFACE
OS_IDENTITY_API_VERSION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the above, which of those variables have a direct equivalent in the Google backend setup?

Copy link
Contributor Author

@sergiocazzolato sergiocazzolato Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In google we use the env var SPREAD_GOOGLE_KEY which has a link to the file with the following data:
"type"
"project_id"
"private_key_id"
"private_key"
"client_email"
"client_id"
"auth_uri"
"token_uri"
"auth_provider_x509_cert_url"
"client_x509_cert_url"

I updated the list of environment variables in the docs and I understand that the equivalent are:
project_id <-> OS_PROJECT_ID
auth_uri <-> OS_AUTH_URL
private_key_id <-> OS_ACCESS_KEY
private_key <-> OS_SECRET_KEY
client_id <-> ( OS_PROJECT_DOMAIN_NAME | OS_USER_DOMAIN_NAME )

spread.yaml Outdated
- .spread-reuse.yaml
- tests/.spread-reuse.yaml
- $CACHE_DISABLED
- "*.snap"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the quotes only on this one? Also, where do the snap files come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

"context"
"time"

gooseClient "github.com/go-goose/goose/v5/client"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/gooseClient/gooseclient/

Package names in Go are not typically cammel-cased.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

# Check the error in case the network does not exist
spread openstack:cirros-64-wrong-network: -v -reuse -resend &> task.out || true
grep 'cannot find valid network with name "noexist"' task.out
test -z "$(openstack server list)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a system with very nice abstraction for composing test scenarios. Is there a good reason for us to choose to cook all of them as a shell script inside a single task instead of using that composition to at least bundle closely related ideas together?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I though this but the problem I found is the time that devstack consumes to start (about 10 minutes), so if I create variants it will require more machines to get results. If makes sense I could move the scenarios to different variants and run them in parallel using more workers.

fi
sleep 1
done
test -z "$(openstack server list)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a prior check that ensures that the list started empty in the first place? Also, please consider the comment above in this context.

Also, won't OpenStack show the any used servers as terminated instead of just showing an empty list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the initial check to verify there is not any server running.

The command openstack server list doesn't show terminated servers, it just includes active ones.

# The instance was created and the status has to be active to
# fail trying to access through ssh
spread openstack:cirros-64: -v -reuse -resend &> task.out || true
grep 'cannot find ready marker in console output for .*: timeout reached' task.out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a great method to verify the contents, because on failure we'll get nothing, and won't know why we got nothing. I believe we have common practices for this in the Spread world. How do they look like?


# trigger 1 instance and check it can be listed and the garbage collect works
test "0" = "$(spread -gc | grep -c "Checking openstack instance")"
spread openstack:cirros-64: &>/dev/null &
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is this stopped/checked for correctness?

fi
sleep 1
done
openstack server show "$SERVER_ID" -f shell | MATCH 'status="ACTIVE"'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So all this test does is check that some server has shown up? Nothing else at all?

sergiocazzolato and others added 5 commits February 28, 2024 11:35
Updated order of openstack in list of backends
Updated name of gooseClient -> gooseclient
Fixed issues, it was not trying ssh connection when serial console is
not available
Updated spread.yaml noexist -> invalid
Signed-off-by: Zeyad Gouda <zeyad.gouda@canonical.com>
Signed-off-by: Zeyad Gouda <zeyad.gouda@canonical.com>
@niemeyer
Copy link
Contributor

Folks, please schedule a call for us next week so we can have a general conversation about the approach for authentication and testing before we all spend too many cycles on different avenues.

Also include test to validate the key/secret authentication and tne env
file
@ZeyadYasser ZeyadYasser removed the Ready label Mar 20, 2024
Copy link
Collaborator

@ZeyadYasser ZeyadYasser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, small comments

README.md Outdated Show resolved Hide resolved
spread.yaml Outdated Show resolved Hide resolved
spread.yaml Outdated Show resolved Hide resolved
spread/openstack.go Outdated Show resolved Hide resolved
spread/openstack.go Show resolved Hide resolved
spread/openstack.go Outdated Show resolved Hide resolved
Signed-off-by: Zeyad Gouda <zeyad.gouda@canonical.com>
Signed-off-by: Zeyad Gouda <zeyad.gouda@canonical.com>
Signed-off-by: Zeyad Gouda <zeyad.gouda@canonical.com>
devstack has many issues and cannot fully replicate a normal
openstack cluster for testing.

Signed-off-by: Zeyad Gouda <zeyad.gouda@canonical.com>
@ZeyadYasser
Copy link
Collaborator

I dropped the spread test for openstack due to issues and inconsistencies faced with devstack where it cannot replicate a normal openstack cluster.

A better alternative is to do something similar to google. After the openstack backend is merged, we add it as a backend in spread.yaml.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants