How to enable Partial Publishing in a CI/CD workflow? #821

Closed
trieloff opened this issue May 3, 2019 · 18 comments · Fixed by #834
Labels: question, released

Comments

@trieloff
Contributor

trieloff commented May 3, 2019

@filmaj [Yesterday at 9:05 PM]

i asked a few weeks back if a partial deploy / publish was possible, such that i could deploy+publish staging strains to one domain/host, run a battery of tests against that, and if that passes, then deploy+publish production strains, but that didn't seem possible

@trieloff [10 hours ago]

The problem is that the publish operation is atomic, i.e. you cannot partially update only a few strains, because the non-updated strains effectively disappear.

@trieloff [10 hours ago]

What you can do is to create staging strains that only work on a non-production domain, so no regular visitor can ever see them.

@trieloff [10 hours ago]

This still leaves you with a required edit operation once you promote it to master.

@trieloff [10 hours ago]

Maybe one way of building this (and I’d like to hear @dpfister’s take, too) is to add a CI or GitHub branch status as a possible condition to a strain. Here is what this could look like in a helix-config.yaml:

  - name: subsite-staging
    condition:
      and:
        - url: https://stage.example.com/support/
  - name: subsite-production
    condition:
      and:
        - url: https://www.example.com/support/
        - ci: https://circleci.com/gh/adobe/helix-publish/tree/fresh-config

The subsite-staging strain can be entered simply by going to stage.example.com, but the subsite-production strain is only enabled when CircleCI reports tests as passing on https://circleci.com/gh/adobe/helix-publish/tree/fresh-config. Whether the test is passing or not should be evaluated at publish time, so that the strain does not flip-flop randomly.
You could then have a development flow where the developer who’s working on a new feature (@hireshah) creates his branch with the two new strains and submits a PR. The maintainer (@maj) reviews it and merges it even when the branch is still not completely ready, because he has confidence that no production traffic will meet the strain until the CI build is green.
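
To make "evaluated at publish time" concrete, here is a minimal sketch, not an existing Helix feature. It assumes strains are already parsed into plain objects, that `ci` is the proposed (hypothetical) condition key, and that `ciPassing` is a caller-supplied lookup. The `enabled` field is likewise made up for illustration.

```javascript
// Resolve every proposed `ci` condition once, at publish time, so the
// published configuration is static and cannot flip-flop afterwards.
// `ciPassing(url)` is a hypothetical lookup returning true or false.
function freezeCiConditions(strains, ciPassing) {
  return strains.map((strain) => {
    const terms = (strain.condition && strain.condition.and) || [];
    // A strain is enabled only if all of its `ci` terms are passing.
    const enabled = terms
      .filter((term) => 'ci' in term)
      .every((term) => ciPassing(term.ci));
    // Keep the remaining terms (e.g. `url`) as the runtime condition.
    const and = terms.filter((term) => !('ci' in term));
    return { ...strain, condition: { and }, enabled };
  });
}

const strains = [
  {
    name: 'subsite-staging',
    condition: { and: [{ url: 'https://stage.example.com/support/' }] },
  },
  {
    name: 'subsite-production',
    condition: {
      and: [
        { url: 'https://www.example.com/support/' },
        { ci: 'https://circleci.com/gh/adobe/helix-publish/tree/fresh-config' },
      ],
    },
  },
];

// Stub lookup: pretend the build is currently red.
console.log(freezeCiConditions(strains, () => false));
```

With the stub above, the staging strain stays enabled (it has no `ci` term) and the production strain is disabled until a later publish sees a green build.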

@dominique-pfister [8 hours ago]

Hm, interesting 🙂 But then the decision must be somehow persisted, right?

@trieloff [8 hours ago]

We could store this in an edge dictionary, simply using the CircleCI or GitHub URL as a key, or we could just store it in the plain VCL file.

@dominique-pfister [8 hours ago]

If the VCL is generated on publish anyway, that would be straightforward, of course.

@filmaj [1 hour ago]

It’s an interesting idea but tests-passing is sometimes a false positive as well (“i forgot to add tests to my new feature”).
what about having two helix configs, one for staging and one for production? then continuous deployment would look like:
hlx deploy --config helix-config-staging.yaml &&
  npm run runtime-tests-staging &&
  hlx publish --config helix-config-staging.yaml &&
  npm run end-to-end-tests-staging &&
  hlx deploy --config helix-config-production.yaml &&
  npm run runtime-tests-production &&
  hlx publish --config helix-config-production.yaml &&
  npm run end-to-end-tests-production &&
  npm run tag-known-good

@filmaj [38 minutes ago]

is there a better venue for this discussion than a slack thread? this is quite urgent from our end. we’ve hit a few issues now in the transition from local hlx up dev to deployed-to-fastly, so we need to nail down a way to publish to, verify at and promote from a publicly accessible staging env to production

@filmaj [11 minutes ago]

i appreciate your suggestion’s use of automation, and it's a cool idea, but i would also like the ability to promote and demote manually, especially as we build out the full test suite and figure it all out

@trieloff [6 minutes ago]

With the new conditions language, you will be able to easily promote and demote strains, because you can simply mix in a *prod or *stage into a condition.

@trieloff [4 minutes ago]

I don’t really like the idea of separate prod and stage configurations because it discourages incremental, feature-flag driven development and feels too much like having environments, which aren’t really a thing in serverless. (or shouldn’t be)

trieloff added the question label May 3, 2019
@trieloff
Contributor Author

trieloff commented May 3, 2019

Thinking about this, it's even more complicated:

  • we want to enable publish from all branches
  • we do not want all branches to have the ability to break the live site
  • we can only publish one atomic configuration

@filmaj
Contributor

filmaj commented May 3, 2019

Ultimately, what I want to do with this issue is figure out how to catch errors when moving from running quality assurance on an hlx up instance to running quality assurance on something that was hlx deploy && hlx publish'ed. There is a big difference in behaviour between the helix simulator and fastly, and we've been burned a few times by this different behaviour. Typically how this is handled for old-fashioned websites is by deploying to a staging environment, running tests against it, and ensuring those pass before promoting to the production environment. The difference between these environments usually boils down to the domain.

Right now I feel like there is no good answer for the above. For the devsite, we have both staging and production in our helix-config. So when our continuous deployment kicks in, it's all or nothing. While rolling back is easy, it still exposes us to pushing bugs to production. That's what I want to avoid.

My old-fashioned-website-building mind therefore fell back to what was familiar: deploy to a publicly accessible URL that is considered a staging environment and run tests against it before deploying to production. So my off-the-top-of-my-head suggestion was: partial deploys? That is, having the ability to deploy staging strains separately from production strains. That would solve my problem.

In the end I lean on y'all to tell me how to use helix 😄

@filmaj
Contributor

filmaj commented May 3, 2019

"Partial publish" was just my suggestion on a solution, but doesn't capture the underlying problem. The problem is: how to deploy/publish a helix site, have it be powered by Fastly (and not the simulator), give me the ability to verify that version works properly and finally let me deploy/publish to my production domain.

Bonus points for a solution to the above problem that works with a pull request system! That is, a pull request that passes hlx-up-based tests gets deployed somewhere (a specific staging environment that keeps getting overwritten? a different staging environment for each pull request?), tests are run against the deployed environment, results are reported back. This would also allow people to preview their PR on a live staging site.

@trieloff
Contributor Author

trieloff commented May 3, 2019

> That is, having the ability to publish [was: deploy] staging strains separately from production strains. That would solve my problem.

Maybe a quick solution would be to have a flag for hlx publish --only "*-stage" or hlx publish --exclude "*-prod" which would load the current branch's helix-config.yaml and filter it according to the expression above.

It would then merge the filtered list of strains with the list of strains of the current master branch and publish the whole thing. That way you can restrict effective changes to --only the strains you want, or --exclude the strains that see live traffic.
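
The filter-then-merge idea can be sketched as follows. This is an illustration under stated assumptions, not the helix-cli implementation: `globToRegExp` and `mergeStrains` are hypothetical helpers, strains are assumed to be plain objects keyed by `name`, and only `*` is supported as a wildcard.

```javascript
// Convert a simple glob such as "*-stage" into an anchored RegExp.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`^${escaped.replace(/\*/g, '.*')}$`);
}

// Start from the strains on master, then let the branch override or add
// only those strains that pass the --only / --exclude filters.
function mergeStrains(masterStrains, branchStrains, { only, exclude } = {}) {
  const onlyRe = only ? globToRegExp(only) : null;
  const excludeRe = exclude ? globToRegExp(exclude) : null;
  const selected = (name) =>
    (!onlyRe || onlyRe.test(name)) && !(excludeRe && excludeRe.test(name));

  const merged = new Map(masterStrains.map((s) => [s.name, s]));
  for (const strain of branchStrains) {
    if (selected(strain.name)) {
      merged.set(strain.name, strain);
    }
  }
  return [...merged.values()];
}

const master = [
  { name: 'site-prod', url: 'https://www.example.com/' },
  { name: 'site-stage', url: 'https://stage.example.com/' },
];
const branch = [
  { name: 'site-prod', url: 'https://www.example.com/new/' },
  { name: 'site-stage', url: 'https://stage.example.com/new/' },
];

// Only the staging strain is taken from the branch; production stays as on master.
console.log(mergeStrains(master, branch, { only: '*-stage' }));
```

The sketch does not handle strains that were deleted on the branch; whether those should disappear from the published configuration is a separate design decision.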

@trieloff
Contributor Author

trieloff commented May 3, 2019

As long as you are disciplined about your strain names and don't have developers who take liberty with the conditions, that might work.

@filmaj
Contributor

filmaj commented May 3, 2019

That sounds good to me. I was planning on adding static analysis tests for the helix config to ensure that our *-staging strains only define URLs for our staging domain (and do the same for *-production strains), so those could be expanded to ensure strain names follow a particular pattern too.
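
A rough sketch of what such a static check could look like. The rule table, host names, and the `checkStrains` helper are all hypothetical, and strains are assumed to be already parsed into objects with a `name` and a single `url`.

```javascript
// Hypothetical rules: strain-name suffix -> the only host its URL may use.
const RULES = [
  { suffix: '-staging', host: 'stage.example.com' },
  { suffix: '-production', host: 'www.example.com' },
];

// Return a list of human-readable violations (empty when the config is fine).
function checkStrains(strains) {
  const problems = [];
  for (const strain of strains) {
    const rule = RULES.find((r) => strain.name.endsWith(r.suffix));
    if (!rule) {
      problems.push(`${strain.name}: name matches no known pattern`);
    } else if (new URL(strain.url).hostname !== rule.host) {
      problems.push(`${strain.name}: URL must be on ${rule.host}`);
    }
  }
  return problems;
}

console.log(checkStrains([
  { name: 'docs-staging', url: 'https://stage.example.com/docs/' },
  { name: 'docs-production', url: 'https://stage.example.com/docs/' },
  { name: 'experiment', url: 'https://www.example.com/x/' },
]));
```

Running a check like this in CI before hlx publish would reject a branch whose strain names or URLs drift from the agreed pattern.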

@filmaj
Contributor

filmaj commented May 6, 2019

Do y'all have a plan to implement anything for this? Suffering on the devsite end due to lack of support for this 😞

@tripodsan
Contributor

why not have a distinct staging branch with a different helix-config.yaml that uses a different fastly and runtime namespace?

@trieloff
Contributor Author

trieloff commented May 7, 2019

@filmaj yes, I'll start on this today.

@tripodsan I don't think it is a good idea to replicate the old pattern of prod, stage, dev environments in a serverless stack where you don't have the resource limitations of old: https://medium.com/adobetech/the-newfinalfinal-v2-psd-of-serverless-computing-5d9b9965d9c1

Having a singular staging branch would make it harder to gradually release new features, because you'd have to cherry-pick which commits from the staging branch you want to promote.

By having one single config, albeit with flight control to determine what can get published from which branch, you have everything in one place, and can easily assess the impact of a single branch.

trieloff self-assigned this May 7, 2019
trieloff added a commit to adobe/helix-shared that referenced this issue May 7, 2019
@filmaj
Contributor

filmaj commented May 7, 2019

I would prefer not to have the oldschool gitflow-style of environment management via git branches, as that makes the overall configuration across domains/environments less visible. However, if that is the recommended approach with helix in the end, we would adopt it on the devsite side.

@trieloff
Contributor Author

trieloff commented May 7, 2019

@filmaj can you elaborate?

@filmaj
Contributor

filmaj commented May 7, 2019

If we can ensure a CD flow across multiple environments such that all the environment information sits in one place (i.e. one config file), and not spread that info across separate branches, that would be preferable. If we use different branches for different environments, then a developer needs to check out different branches to see details around different environments. To me that is not ideal.

@kptdobe
Contributor

kptdobe commented May 8, 2019

I think the monolithic huge helix-config.yaml is not sustainable in the long run, and neither is having multiple Fastly services in an env file: if a developer runs hlx deploy / publish with the wrong Fastly service loaded, he may kill an entire domain (experience: already happened to me ;) ).

@tripodsan
Contributor

tripodsan commented May 8, 2019

> if a developer runs hlx deploy / publish with the wrong Fastly service loaded, he may kill an entire domain (experience: already happened to me ;) ).

maybe we can safeguard against this by adding a domains section to the config, e.g.:

version: 1.0
domains:
  - "*.project-helix.dev"
strains:
...

by default, hlx publish will fetch the domains of the service first and match them against the config. if they differ, it aborts the publish.
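
A minimal sketch of that check, assuming the service's domains have already been fetched into an array. The function names are hypothetical, and only a leading `*.` wildcard is supported.

```javascript
// Does a concrete domain match a pattern such as "*.project-helix.dev"?
function domainMatches(pattern, domain) {
  if (pattern.startsWith('*.')) {
    // "*.project-helix.dev" matches any name ending in ".project-helix.dev".
    return domain.endsWith(pattern.slice(1));
  }
  return pattern === domain;
}

// Return the service domains that no pattern in the config covers.
function uncoveredDomains(configDomains, serviceDomains) {
  return serviceDomains.filter(
    (domain) => !configDomains.some((pattern) => domainMatches(pattern, domain)),
  );
}

// Abort (by throwing) when the loaded service serves unexpected domains.
function assertSafeToPublish(configDomains, serviceDomains) {
  const unexpected = uncoveredDomains(configDomains, serviceDomains);
  if (unexpected.length > 0) {
    throw new Error(`Aborting publish, unexpected domains: ${unexpected.join(', ')}`);
  }
}

assertSafeToPublish(['*.project-helix.dev'], ['www.project-helix.dev']);
console.log('domains match, safe to publish');
```

If the wrong Fastly service were loaded, its domains would not match the patterns in the config and the publish would stop before touching that service.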

@kptdobe
Contributor

kptdobe commented May 8, 2019

We could, yes. That's maybe the point where we need to rethink the whole process: we are starting to have too many dimensions (fastly service id, fastly domain, strain name, regex on strain names to classify the strains, strain domains, strain url, strain condition...) and we still do not have a good and simple way to describe a dev, stage, prod setup.

@trieloff
Contributor Author

trieloff commented May 8, 2019

Wouldn't that mean that the config file sits somehow outside of the version control system (if it can't be branched)? I think with the mechanism in #834 we have a way to restrict what can take effect in non-master branches.

In addition, you could use the strain pinning feature (with the X-Strain cookie) to run tests against strains that have been introduced in branches.
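
As a sketch of how a test could pin itself to a branch strain: the cookie name comes from the comment above, but the `X-Strain=<strain name>` value format, the helper names, and the "status 200 means healthy" rule are assumptions made for illustration.

```javascript
// Build request headers that pin a request to one strain.
// Assumption: the cookie is sent as "X-Strain=<strain name>".
function pinnedHeaders(strainName) {
  return { Cookie: `X-Strain=${encodeURIComponent(strainName)}` };
}

// Request a page while pinned to a strain and report whether it answered 200.
// `fetchImpl` defaults to the global fetch (Node 18+); pass a stub in tests.
async function checkStrain(url, strainName, fetchImpl = fetch) {
  const response = await fetchImpl(url, { headers: pinnedHeaders(strainName) });
  return response.status === 200;
}

console.log(pinnedHeaders('subsite-staging'));
```

A test suite could call `checkStrain` for each strain introduced on a branch before that branch is merged.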

@filmaj
Contributor

filmaj commented May 8, 2019

> maybe we can safeguard against this, by adding a domains section to the config

Great idea; we ended up implementing a set of basic static analysis tests for the devsite's helix-config.yaml to avoid exactly this kind of problem; see https://github.com/adobe/developer.adobe.com/blob/master/test/helix-config/test.config.js

On a separate but related topic, this came up in a discussion with @simonwex yesterday. I want to plant a seed with y'all on the Helix team. One use case we want to cover is the following:

As a Content Author, I want to be able to edit a page and see what it will look like on the site so that I’m not surprised by changes I make

For content authors managing content repos, I think it would be good to study this use case from the content author perspective. What if content authors are working on multiple changes simultaneously? Imagine a content repo with several working branches; what would the preview flow through the devsite look like for them? My naive wish/suggestion would be something like: each content repo branch is available on a different (perhaps randomly-generated) subdomain in a stage environment.

This also applies to me-as-a-devsite-developer: I want to make a change (send a PR) to the devsite and preview the site on a publicly-available URL before accepting the change (that's what this issue is discussing). What if I have multiple PRs happening at the same time?

While the suggestions for this issue unblock me in the short term (I can manage a single staging environment and juggle different PRs between that single environment myself through automation), does that enable a longer-term vision of multiple content repos, possibly each containing multiple versions, that each need to be previewable, as well as the devsite itself needing multiple versions to be previewable?

I wanted to plant this seed, especially with the upcoming hackathon.

trieloff pushed a commit that referenced this issue May 8, 2019
# [2.1.0](v2.0.5...v2.1.0) (2019-05-08)

### Bug Fixes

* **package:** update snyk to version 1.163.0 ([3e96915](3e96915)), closes [#841](#841)
* **publish:** fix loggers ([1db7669](1db7669))
* **publish:** update description of `--exclude` param ([adf486f](adf486f))

### Features

* **git:** helper functions for getting the contents of a file at a ref ([0b349d8](0b349d8))
* **publish:** add new --only and --exclude flags for publish command line ([8739ef4](8739ef4))
* **publish:** implement filtering with `--only` and `--exclude` ([e94ea52](e94ea52)), closes [#821](#821)
@adobe-bot
Collaborator

🎉 This issue has been resolved in version 2.1.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
