
Minimum Cypress 12.15.0 for new Google Chrome 117 needed #5479

Closed
MikeMcC399 opened this issue Sep 13, 2023 · 20 comments

@MikeMcC399
Contributor

MikeMcC399 commented Sep 13, 2023

Subject

Guides > Launching Browsers > Chrome Browsers

Description

According to discussion in https://discord.com/channels/755913899261296641/1151449941449912420

Google Chrome 117 requires at least Cypress 12.15.0 to work out of the box.
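The version requirement can be made concrete with a minimal sketch of a check that flags the combination reported here. The helper names (`parseVersion`, `compareVersions`, `needsHeadlessWorkaround`) are hypothetical. The cut-off comes only from this report (Chrome 117 with Cypress below 12.15.0) and is not an official compatibility table.

```javascript
// Hypothetical helpers: flag the Chrome 117 + Cypress < 12.15.0 combination
// reported in this issue. The cut-off comes from this thread, not from an
// official Cypress compatibility table.
const MIN_CYPRESS_FOR_CHROME_117 = [12, 15, 0]

// Parse "12.9.0" into [major, minor, patch]. Leading non-digits are stripped,
// so for a range like "^9.7.0" this reads only the lower bound, not the
// version that is actually installed. Missing or non-numeric parts become 0.
function parseVersion(version) {
  const parts = String(version).replace(/^[^\d]*/, '').split('.')
  return [0, 1, 2].map((i) => parseInt(parts[i] ?? '0', 10) || 0)
}

// Negative if a < b, 0 if equal, positive if a > b.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i]
  }
  return 0
}

// chromeMajorVersion may be a number or a string such as '117'.
function needsHeadlessWorkaround(cypressVersion, chromeMajorVersion) {
  const isChrome117 = Number(chromeMajorVersion) === 117
  const cypressTooOld =
    compareVersions(parseVersion(cypressVersion), MIN_CYPRESS_FOR_CHROME_117) < 0
  return isChrome117 && cypressTooOld
}
```

For example, the Discord report (Cypress 12.9.0 with Chrome 117) is flagged, while Cypress 12.15.0 with Chrome 117 is not.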

I'm uncertain if this needs to be documented, since https://docs.cypress.io/guides/guides/launching-browsers#Download-specific-Chrome-version already says:

The Chrome browser is evergreen - meaning it will automatically update itself,
sometimes causing a breaking change in your automated tests. We host
chromium.cypress.io with links to download a
specific released version of Chrome (dev, Canary and stable) for every platform.

So this is just a "heads-up".

Edit: See #5479 (comment) for the workaround.

@mjhenkes
Member

I can't log into Discord for whatever reason, but if this is related to --headless=new, you should be able to enable it in older versions of Cypress by adapting the example in this guide: https://docs.cypress.io/api/plugins/browser-launch-api#Disable---headlessnew-for-Chrome

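// Register inside setupNodeEvents (Cypress 10+) or in cypress/plugins/index.js (Cypress 9 and earlier)
on('before:browser:launch', (browser = {}, launchOptions) => {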
  if (browser.name === 'chrome' && browser.isHeadless) {
    launchOptions.args = launchOptions.args.map((arg) => {
      if (arg === '--headless') {
        return '--headless=new'
      }

      return arg
    })
  }

  return launchOptions
})
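The argument rewrite in this hook can also be pulled out into a pure function, so it can be checked without launching a browser. This is a sketch: `toNewHeadless` and `registerHeadlessNew` are hypothetical names, and `on` is assumed to be the event-registration function that Cypress passes to `setupNodeEvents` (or to the plugins file in Cypress 9 and earlier).

```javascript
// Hypothetical helper: the workaround's argument rewrite as a pure function.
// Only the exact legacy flag '--headless' is replaced; everything else,
// including an existing '--headless=new', passes through unchanged.
function toNewHeadless(args) {
  return args.map((arg) => (arg === '--headless' ? '--headless=new' : arg))
}

// Registers the same hook as above, using the extracted helper.
function registerHeadlessNew(on) {
  on('before:browser:launch', (browser = {}, launchOptions) => {
    if (browser.name === 'chrome' && browser.isHeadless) {
      launchOptions.args = toNewHeadless(launchOptions.args)
    }
    return launchOptions
  })
}
```

Because only the exact string `--headless` is matched, applying the rewrite twice is harmless.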

@MikeMcC399

This comment was marked as outdated.

@mjhenkes
Member

@MikeMcC399, could you describe the root issue you're seeing, apart from the link to Discord? I'm just guessing that it might be related to the new headless flag, and I was demonstrating how you could change that flag in Cypress versions prior to 12.15.0.

@MikeMcC399
Contributor Author

MikeMcC399 commented Sep 14, 2023

@mjhenkes

Sorry for the updates on the fly, which may have been a bit confusing!

So the root issue reported in Discord is that using Cypress 12.9.0 in headless mode resulted in a failure to connect to Chrome 117.

Still waiting to connect to Chrome, retrying in 1 second (attempt 62/62)
All promises were rejected
AggregateError: All promises were rejected

Edit:

@MikeMcC399

This comment was marked as outdated.

@PlaxoGhouse

Cypress works fine with Chrome in headed mode. However, when I run it on Chrome (Version 117.0.5938.63 (Official Build) (64-bit)) in headless mode, it fails:

Still waiting to connect to Chrome, retrying in 1 second (attempt 62/62)
Cypress failed to make a connection to the Chrome DevTools Protocol after retrying for 50 seconds.
This usually indicates there was a problem opening the Chrome browser.
The CDP port requested was 54291.
Error: connect ECONNREFUSED 127.0.0.1:54291 at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1161:16)

@MikeMcC399
Contributor Author

@PlaxoGhouse

Which version of Cypress, which operating system and which version of Node.js are you using? The error message is very similar to what I saw, but it is not identical.

@PlaxoGhouse

For one of my projects, I am using "cypress": "^9.7.0" and Node.js v18.16.0, running on Windows 10 and Linux machines.

  "cypress": "^9.6.0",
  "cypress-cucumber-attach-screenshots-to-failed-steps": "^1.0.0",
  "cypress-cucumber-preprocessor": "^4.3.0",
  "cypress-failed-log": "^2.9.5",
  "cypress-file-upload": "^4.1.1",
  "cypress-terminal-report": "^3.5.2",
  "cypress-wait-until": "^1.7.1",
  "cypress-xpath": "^1.8.0",
  "multiple-cucumber-html-reporter": "^3.1.0"

It was working fine in headless mode with the previous Chrome version (116).

@MikeMcC399
Contributor Author

MikeMcC399 commented Sep 14, 2023

I can't see anything in the Chrome 117 blog that could explain the sudden compatibility issue, where older versions of Cypress that worked with Chrome 116 have stopped working with Chrome 117.

@drc-nloftsgard

@MikeMcC399, is there any removal in this list (chrome-117-beta) that would explain the issue?

@MikeMcC399
Contributor Author

@drc-nloftsgard

Thanks for the link chrome-117-beta! The one in the blog (https://developer.chrome.com/blog/deps-rems-117/) just gave a 404.

I'll have to defer to the browser experts to judge what has gone wrong. I'm not deeply enough into this to be able to sort it out myself. Perhaps it is simply a bug / unintended regression in Chrome 117?

@mschile
Contributor

mschile commented Sep 14, 2023

@MikeMcC399, using the workaround provided by @mjhenkes above, I was able to successfully run the tests on Cypress 12.9.0.

@MikeMcC399

This comment was marked as outdated.

@mschile
Contributor

mschile commented Sep 14, 2023

@MikeMcC399, this was on macOS 13.5. I'm going to try my Windows and Linux VMs as well.

@MikeMcC399
Contributor Author

@mschile

I am so sorry. I completely misread @mjhenkes's advice and used the workaround from the docs, not the one he posted.

The docs say:

if (arg === '--headless=new') {
  return '--headless'
}

and his advice was exactly the opposite way around:

if (arg === '--headless') {
  return '--headless=new'
}

I will go back and re-test.
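The mix-up is easy to make because the two snippets are mirror images. Here is a sketch of a single hypothetical helper, `setHeadlessMode`, that makes the direction explicit: `'new'` matches @mjhenkes's workaround, and `'legacy'` matches the documented way of disabling new headless. The mode names are illustrative, not Cypress or Chrome options.

```javascript
// Hypothetical helper: rewrite Chrome's headless flag in an explicit direction.
//   'new'    -> '--headless=new' (the workaround suggested in this thread)
//   'legacy' -> '--headless'     (the docs' "disable --headless=new" example)
function setHeadlessMode(args, mode) {
  if (mode !== 'new' && mode !== 'legacy') {
    throw new Error(`unknown headless mode: ${mode}`)
  }
  const target = mode === 'new' ? '--headless=new' : '--headless'
  // Only the two headless spellings are rewritten; other args are untouched.
  return args.map((arg) =>
    arg === '--headless' || arg === '--headless=new' ? target : arg
  )
}
```

Naming the direction at the call site, for example `setHeadlessMode(launchOptions.args, 'new')`, makes it harder to copy the opposite snippet by accident.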

@MikeMcC399
Copy link
Contributor Author

I verified the workaround from above:

using the workaround provided by @mjhenkes above, I was able to successfully run the tests on Cypress 12.9.0.

With Cypress 12.9.0 and Node.js 18.17.1:

  • Ubuntu 22.04.3 LTS with Chrome 117.0.5938.62

  • Windows 11 Pro with Chrome 117.0.5938.63

(I have hidden the posts where I used the wrong workaround to avoid confusion.)

@MikeMcC399
Contributor Author

Should this workaround be published in the docs now?

@MikeMcC399
Contributor Author

This is not a documentation issue, since it is only a temporary condition caused by Chromium bug 1483163, which causes browsers in the Chromium 117 version family (Chromium, Chrome and Edge) to crash under certain conditions when run in headless mode.

The issue is resolved in Chrome Canary 119.
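Because the bug affects the whole Chromium 117 family (Chromium, Chrome and Edge), a guard could be narrowed to exactly those browsers and that major version. This sketch rests on several assumptions. It relies on the `name`, `family`, `majorVersion` and `isHeadless` fields shown in the browser object logged later in this thread. It assumes Cypress reports these browsers as `chrome`, `chromium` and `edge`. It treats only major version 117 as affected, because that is the only version the thread reports. The names `shouldUseNewHeadless` and `beforeBrowserLaunch` are hypothetical.

```javascript
// Assumed Cypress browser names for Chromium, Chrome and Edge.
const AFFECTED_BROWSERS = new Set(['chrome', 'chromium', 'edge'])
// Only 117 is reported as affected in this thread; adjust if needed.
const AFFECTED_MAJOR_VERSIONS = new Set([117])

// majorVersion is logged as a string (e.g. '119'), so compare numerically.
function shouldUseNewHeadless(browser = {}) {
  return (
    browser.family === 'chromium' &&
    AFFECTED_BROWSERS.has(browser.name) &&
    AFFECTED_MAJOR_VERSIONS.has(Number(browser.majorVersion)) &&
    browser.isHeadless === true
  )
}

// Hook body: switch to new headless only for the affected browsers.
function beforeBrowserLaunch(browser = {}, launchOptions) {
  if (shouldUseNewHeadless(browser)) {
    launchOptions.args = launchOptions.args.map((arg) =>
      arg === '--headless' ? '--headless=new' : arg
    )
  }
  return launchOptions
}
```

It would be registered with `on('before:browser:launch', beforeBrowserLaunch)`. Once the affected version is no longer in use, the guard simply stops matching and the default launch arguments are left alone.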

Using Chrome Canary build on Windows win64 downloaded from
https://www.google.com/chrome/canary/?platform=win64
Version 119.0.6017.1 (Official Build) canary-dcheck (64-bit)

git clone --branch test/chrome-stable-cypress-12-14-0 https://github.com/MikeMcC399/github-action
cd github-action/examples/browser
npm ci
npx cypress info
npx cypress run --browser chrome:canary

runs successfully with Cypress 12.14.0 and Chrome 119.0.6017.1

  (Run Starting)

  ┌────────────────────────────────────────────────────────────────────────────────────────────────┐
  │ Cypress:        12.14.0                                                                        │
  │ Browser:        Canary 119 (headless)                                                          │
  │ Node Version:   v20.7.0 (C:\Program Files\nodejs\node.exe)                                     │
  │ Specs:          1 found (spec.cy.js)                                                           │
  │ Searched:       cypress/e2e/**/*.cy.{js,jsx,ts,tsx}                                            │
  └────────────────────────────────────────────────────────────────────────────────────────────────┘


────────────────────────────────────────────────────────────────────────────────────────────────────

  Running:  spec.cy.js                                                                      (1 of 1)
before launching browser
{
  name: 'chrome',
  family: 'chromium',
  channel: 'canary',
  displayName: 'Canary',
  version: '119.0.6017.1',
  path: 'C:\\Users\\mikem\\AppData\\Local\\Google\\Chrome SxS\\Application\\chrome.exe',
  minSupportedVersion: 64,
  majorVersion: '119',
  isHeadless: true,
  isHeaded: false
}
chrome launch args:
--test-type
--ignore-certificate-errors
--start-maximized
--silent-debugger-extension-api
--no-default-browser-check
--no-first-run
--noerrdialogs
--enable-fixed-layout
--disable-popup-blocking
--disable-password-generation
--disable-single-click-autofill
--disable-prompt-on-repos
--disable-background-timer-throttling
--disable-renderer-backgrounding
--disable-renderer-throttling
--disable-backgrounding-occluded-windows
--disable-restore-session-state
--disable-new-profile-management
--disable-new-avatar-menu
--allow-insecure-localhost
--reduce-security-for-testing
--enable-automation
--disable-print-preview
--disable-device-discovery-notifications
--autoplay-policy=no-user-gesture-required
--disable-site-isolation-trials
--metrics-recording-only
--disable-prompt-on-repost
--disable-hang-monitor
--disable-sync
--disable-web-resources
--safebrowsing-disable-download-protection
--disable-client-side-phishing-detection
--disable-component-update
--simulate-outdated-no-au='Tue, 31 Dec 2099 23:59:59 GMT'
--disable-default-apps
--use-fake-ui-for-media-stream
--use-fake-device-for-media-stream
--disable-ipc-flooding-protection
--disable-backgrounding-occluded-window
--disable-breakpad
--password-store=basic
--use-mock-keychain
--disable-dev-shm-usage
--enable-precise-memory-info
--proxy-server=http://localhost:55529
--proxy-bypass-list=<-loopback>
--headless
--window-size=1280,720
--force-device-scale-factor=1
--remote-debugging-port=55533
--remote-debugging-address=127.0.0.1
--window-size=1920,1080


  √ works (468ms)

  1 passing (3s)


  (Results)

  ┌────────────────────────────────────────────────────────────────────────────────────────────────┐
  │ Tests:        1                                                                                │
  │ Passing:      1                                                                                │
  │ Failing:      0                                                                                │
  │ Pending:      0                                                                                │
  │ Skipped:      0                                                                                │
  │ Screenshots:  0                                                                                │
  │ Video:        false                                                                            │
  │ Duration:     2 seconds                                                                        │
  │ Spec Ran:     spec.cy.js                                                                       │
  └────────────────────────────────────────────────────────────────────────────────────────────────┘


====================================================================================================

  (Run Finished)


       Spec                                              Tests  Passing  Failing  Pending  Skipped
  ┌────────────────────────────────────────────────────────────────────────────────────────────────┐
  │ ✔  spec.cy.js                               00:02        1        1        -        -        - │
  └────────────────────────────────────────────────────────────────────────────────────────────────┘
    ✔  All specs passed!                        00:02        1        1        -        -        -
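In the launch args above, `--window-size` appears twice (`1280,720` and then `1920,1080`), which suggests that the example project's hook appends its own size. The following is a minimal sketch, not the project's actual code. The hypothetical helper `setSwitch` replaces an existing switch instead of appending a duplicate, so the effective value does not depend on how Chrome resolves repeated switches. The hook body `applyWindowSize` and the `1920,1080` value are illustrative.

```javascript
// Hypothetical helper: set a `--name=value` Chrome switch, removing any
// earlier `--name=...` (or bare `--name`) entries instead of adding a duplicate.
function setSwitch(args, name, value) {
  const flag = `--${name}`
  const rest = args.filter((arg) => arg !== flag && !arg.startsWith(`${flag}=`))
  return [...rest, `${flag}=${value}`]
}

// Example hook body (assumed, not the example project's code): only
// Chromium-family browsers other than Electron get the window-size switch.
function applyWindowSize(browser = {}, launchOptions) {
  if (browser.family === 'chromium' && browser.name !== 'electron') {
    launchOptions.args = setSwitch(launchOptions.args, 'window-size', '1920,1080')
  }
  return launchOptions
}
```

Filtering before appending keeps exactly one occurrence, so the size in the args list is the size that was requested.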

MikeMcC399 closed this as not planned on Sep 19, 2023
bkimminich added a commit to juice-shop/juice-shop that referenced this issue Sep 21, 2023
@giantryansaul

@MikeMcC399, I've been trying to implement the 'before:browser:launch' workaround from this thread. We are on Cypress 9.7.0 (someone is currently looking into a Cypress upgrade).

Locally, the fix works for me on my M1 MacBook. But when I try to run this in GitHub Actions using their ubuntu-22 runner, it breaks.

Interestingly, it works fine for the first test, but the second test always runs into the same "Still waiting to connect to Chrome, retrying in 1 second" issue.

One thing I tried was shuffling the tests around to see whether a particular test was causing the issue, but any combination of tests does this.

We are also using v5 of the Cypress GitHub Action. I haven't tried v6 yet since it needs a Node.js upgrade.

@MikeMcC399
Contributor Author

@giantryansaul

You may like to try the workaround of running headed temporarily until a fixed version of Chrome is rolled out.

        with:
          browser: chrome
          headed: true

I wouldn't expect any change moving from cypress-io/github-action@v5 to cypress-io/github-action@v6 as far as this issue is concerned.

If you need additional help, you could also use the Cypress technical community on Discord.


ciyer added a commit to SwissDataScienceCenter/renku-ui that referenced this issue Sep 22, 2023
nizamial09 added a commit to rhcs-dashboard/ceph that referenced this issue Sep 25, 2023
Looks like chrome 117 will need cypress >=12.15.0
cypress-io/cypress-documentation#5479

Fixes: https://tracker.ceph.com/issues/62971
Signed-off-by: Nizamudeen A <nia@redhat.com>
nizamial09 added a commit to rhcs-dashboard/ceph that referenced this issue Sep 25, 2023
Looks like chrome 117 will need cypress >=12.15.0
cypress-io/cypress-documentation#5479

Signed-off-by: Nizamudeen A <nia@redhat.com>
nizamial09 added a commit to rhcs-dashboard/ceph that referenced this issue Sep 25, 2023
Looks like chrome 117 will need cypress >=12.15.0
cypress-io/cypress-documentation#5479

Signed-off-by: Nizamudeen A <nia@redhat.com>
nizamial09 added a commit to rhcs-dashboard/ceph that referenced this issue Sep 26, 2023
Looks like chrome 117 will need cypress >=12.15.0
cypress-io/cypress-documentation#5479

Signed-off-by: Nizamudeen A <nia@redhat.com>
nizamial09 added a commit to rhcs-dashboard/ceph that referenced this issue Sep 26, 2023
Looks like chrome 117 will need cypress >=12.15.0
cypress-io/cypress-documentation#5479

Signed-off-by: Nizamudeen A <nia@redhat.com>
nizamial09 added a commit to rhcs-dashboard/ceph that referenced this issue Sep 27, 2023
Looks like chrome 117 will need cypress >=12.15.0
cypress-io/cypress-documentation#5479

Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 630ba3a)
nizamial09 added a commit to rhcs-dashboard/ceph that referenced this issue Oct 3, 2023
Looks like chrome 117 will need cypress >=12.15.0
cypress-io/cypress-documentation#5479

Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 630ba3a)
shrutiparvekar added a commit to shrutiparvekar/ceph that referenced this issue Oct 8, 2023
author sparvekar <shruti.sp07@gmail.com> 1695407418 -0400
committer sparvekar <shruti.sp07@gmail.com> 1696791578 -0400

docs/cephadm: fix broken links in cephadm docs

In the documentation https://docs.ceph.com/en/quincy/cephadm/services/osd/, there is a broken link in the "List Devices" chapter. This change fixes the documentation to point to the correct link https://docs.ceph.com/en/quincy/rados/operations/devices

Fixes: https://tracker.ceph.com/issues/55763

Signed-off-by: sparvekar <shruti.sp07@gmail.com>

rgw: remove Bucket::update_container_stats()

callers use Bucket::read_stats() to load bucket stats

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw/admin: 'buckets list' takes --marker

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw/sal: list_buckets() returns RGWBucketEnts

`sal::User::list_buckets()` no longer returns a map of `sal::Bucket`
handles. it now uses `std::span<RGWBucketEnt>` for input and output.
`RGWBucketEnt` contains all of the information we need to satisfy
ListBuckets requests, and also stores the `rgw_bucket` key for use with
`Driver::get_bucket()` where a `sal::Bucket` handle is necessary

`sal::BucketList` contains the span of results and the `next_marker`.
the `is_truncated` flag was removed in favor of `!next_marker.empty()`

the checks for `user->get_max_buckets()` on bucket creation now use a
paginated `check_user_max_buckets()` helper function that limits the
number of allocated entries to `rgw_list_buckets_max_chunk`

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw/sal: StoreBucket no longer wraps RGWBucketEnt

`sal::Bucket` no longer needs to wrap `RGWBucketEnt` to support user
bucket listings, so can be represented by `RGWBucketInfo` alone. the
bucket stats interfaces that relied on RGWBucketEnt internally now
return their result as either `RGWBucketEnt` or `RGWStorageStats`

Signed-off-by: Casey Bodley <cbodley@redhat.com>

debian: Build-Depend on g++ 11 or greater

Rely on the packaging system to provide a suitable g++ of version 11
or greater, and remove the corresponding hard-coding from
debian/rules, since cmake will then find a suitable version. This
seems better than trying to hard-code a particular version in
debian/rules, and Debian package building tools like e.g. sbuild will
then do the right thing.

This enables Reef (v18.2.0) to build on Debian bookworm in a clean
chroot.

Fixes: https://tracker.ceph.com/issues/61845

Signed-off-by: Matthew Vernon <mvernon@wikimedia.org>

debian: specify interpreters for ceph-mon and ceph-osd postinsts

These were previously missing. The requirement for interpreters is in
Debian policy section 10.4:
https://www.debian.org/doc/debian-policy/ch-files.html#s-scripts

Debian's packaging already adds the #! to these two postinsts. In
practice, a text executable without a #! line will likely be executed
by the calling shell, so a lot of the time we'd get away with it
unless the administrator is using an incompatible shell like tcsh.

This behaviour of shells is documented in POSIX section 1(e)(i)(b)
here:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_01_01

Signed-off-by: Matthew Vernon <mvernon@wikimedia.org>

debian: remove {Build-,}Depends on essential packages

Unless there's a version requirement (which there isn't here),
packages should not declare a Build-Depends: or Depends: relationship
on essential packages. Policy link:

https://www.debian.org/doc/debian-policy/ch-binary.html#dependencies

Signed-off-by: Matthew Vernon <mvernon@wikimedia.org>

debian: add missing item separators in debian/control

Signed-off-by: Matthew Vernon <mvernon@wikimedia.org>

debian/copyright: update syntax, maintainer, add license stanzas

Update the header paragraph to link to the canonical URL for the
format, and point to dev@ceph.io as the Contact.

Also add License: stanzas to reflect the licences in use (and refer to
fuller versions in /usr/share/common-licenses/ as appropriate).

This means that packages containing this copyright file are better in
compliance with the licences concerned.

Signed-off-by: Matthew Vernon <mvernon@wikimedia.org>

debian: dh compat to 12, necessary init/systemd adjustments

Bring the dh compat level to 12, the most recent supported by the
oldest supported Ubuntu LTS release, 20.04. This necessitates changes
to how initscripts & systemd packaging are done.

Signed-off-by: Matthew Vernon <mvernon@wikimedia.org>

debian: correct maintainer address

This means that debian/control matches changelog entries, and that the
Maintainer address is up to date.

Signed-off-by: Matthew Vernon <mvernon@wikimedia.org>

debian: remove obsolete ceph-base.docs, restore dh_installdocs

debian/ceph-base.docs only referred to a README that doesn't exist, so
remove it. Because dpkg-source doesn't reflect deletions from debian/
cf the orig.tar.gz, also remove the file in dh_auto_clean.

Then do away with the removal of the empty override of dh_installdocs;
the main benefit of which here is that debian/copyright gets installed
in all of the built packages, which otherwise lack a copyright
file.

Signed-off-by: Matthew Vernon <mvernon@wikimedia.org>

debian: specify a dependency on python3 for cephadm

cephadm is a compressed zipapp, and dh_python3 doesn't understand
this sort of binary file, so fails to produce the required python3
dependency. So specify this explicitly in debian/control

Signed-off-by: Matthew Vernon <mvernon@wikimedia.org>

debian: radosgw.init to installinit, remove auto_build override

Installation of init scripts properly belongs with dh_installinit, so
move the installation there.

That means we no longer need the override of dh_auto_build, which
simplifies the rules file.

Signed-off-by: Matthew Vernon <mvernon@wikimedia.org>

debian: call dh_python3 for ceph-{base,common,fuse,volume}

In the cases of ceph-base, ceph-common, and ceph-fuse, this picks up
that these packages contain python scripts and adds a necessary
python3 dependency. In the case of ceph-volume it additionally parses
the requirements.txt file.

Signed-off-by: Matthew Vernon <mvernon@wikimedia.org>

test: unmount the mountpoint just before exiting

Without this the qa test may fail by evicting the unresponsive
client after 300 seconds.

Fixes: https://tracker.ceph.com/issues/61394
Signed-off-by: Xiubo Li <xiubli@redhat.com>

run-make-check.sh: use clang-17 if available

now that clang-17 has been released, let's use it if available.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>

valgrind: UninitCondition under __run_exit_handlers suppression

Required in CentOS / RHEL 9 & Ubuntu 22.04.1 LTS

Fixes: https://tracker.ceph.com/issues/62141

Signed-off-by: Mark Kogan <mkogan@redhat.com>

doc/architecture: "Edit HA Auth"

Rewrite the explanation of how a client authenticates against a monitor.
This is a rewrite of a single paragraph, and has been set apart in its
own PR so that it can receive the maximum amount of scrutiny that the
upstream Ceph community can muster.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

cephadm: fix unit tests executing FileLock type

The FileLock type doesn't play much of a role when running tests so
to prevent issues, always mock it out when using with_cephadm_ctx.

In particular, a future patch revealed a problem with the FileLock code
that I cannot understand how it was not hit before, or why this simple
refactoring - not directly related to file locking - triggered it. But
in short, the FakeFilesystem mocking utility only covers some syscalls.
In fact, the fake filesystem was returning an fd that was then passed to
real calls (fcntl and os.close).  The latter then triggered issues when
pytest was trying to clean up after it applied its magic to stdio
objects in sys. The fix is easy - understanding why it happens and how
was hard. I still don't understand why it popped up when it did, only
that this is necessary to implement the following patches.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: move a pair of systemd unit status funcs to systemd.py

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: move CephContainer/similar to new container_types.py

Part of general cephadm split-up refactoring. I am not happy with the
name 'container_types' but none of the alternatives I could think
of were much better.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: black format systemd.py

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: black format container_types.py

Signed-off-by: John Mulligan <jmulligan@redhat.com>

mgr/cephadm: removing double quotes from the generated nvmeof config
Fixes: https://tracker.ceph.com/issues/62838

Signed-off-by: Redouane Kachach <rkachach@redhat.com>

doc/architecture: edit "HA Authentication"

Edit "High Availability Authentication" in doc/architecture.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

mgr/dashboard: fix prometheus queries subscriptions

Fixes: https://tracker.ceph.com/issues/62868
Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>

mgr/dashboard: remove empty popover when there are no health warns

Fixes: https://tracker.ceph.com/issues/62846
Signed-off-by: Nizamudeen A <nia@redhat.com>

cephadm: start decorators.py in cephadmlib

Originally, I wanted to move all the decorators into
their own files. Unfortunately, that isn't possible
at this time as most of them depend on things that
are still within cephadm.py. This includes

list_daemons
_rm_cluster
is_fsid
termcolor
ContainerInfo
Ceph

and I'm sure I'm missing some others. We'll have to
revisit this again later when more of these things
have moved, or they can be slowly moved as their
dependencies are.

Signed-off-by: Adam King <adking@redhat.com>

cephadm: black format initial decorators.py

Signed-off-by: Adam King <adking@redhat.com>

cephadm: create host_facts.py in cephadmlib

For storing classes/functions related to gathering
information about the hosts such as disk enclosures
and networks

Signed-off-by: Adam King <adking@redhat.com>

cephadm: format black host_facts.py

Signed-off-by: Adam King <adking@redhat.com>

mgr/cephadm: add ability to zap OSDs' devices while draining host

Currently, when cephadm drains a host, it will remove all OSDs on
the host, but provides no option to zap the OSD's devices afterwards.
Given that users are likely draining the host to remove it from the cluster,
it makes sense that some users would want to clean up the devices on the
host that were being used for OSDs. Cephadm already supports zapping
devices outside of host draining, so it shouldn't take much to
add that functionality to the host drain as well.

Fixes: https://tracker.ceph.com/issues/61593

Signed-off-by: Adam King <adking@redhat.com>

pybind/mgr/pg_autoscaler: noautoscale flag retains individual pool configs

Problem:

The pg_autoscaler `noautoscale flag` doesn't retain individual pool states of
`autoscale mode`. For example, if you turn the flag `ON` and then `OFF` again,
all the pools will have `autoscale mode on`, which is inconvenient for the user,
because sometimes the user just wants to temporarily disable the autoscaler on
all pools and enable it again after a period of time while retaining
individual pool states of `autoscale mode`.

Solution:

We store noautoscale flag in the OSDMAP such that it is
persistent. We then get rid of noautoscale MODULE OPTION
in the pg_autoscaler module since we do not need it anymore.
Every time we set, unset or get the flag, we rely on looking up
the OSDMAP; we did this because we want to avoid inconsistency
between the `noautoscale flag`. This is because `noautoscale flag`
can easily be set by doing `ceph osd set noautoscale`.

Fixes: https://tracker.ceph.com/issues/61922

Signed-off-by: Kamoltat <ksirivad@redhat.com>

qa/workunits: modified tests for noautoscale flag change

modified:

`qa/workunits/mon/test_noautoscale_flag.sh`
`qa/workunits/cephtool/test.sh`

adding test coverage to files mentioned above

Fixes: https://tracker.ceph.com/issues/61922

Signed-off-by: Kamoltat <ksirivad@redhat.com>

doc/architecture: edit "SDEH"

Edit the front matter of the "Smart Daemons Enable Hyperscale" section
of doc/architecture.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

crimson/os/seastore/cache: don't add EXIST_CLEAN extents to lru

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

crimson/os/seastore/cache: replace is_clean by is_stable_clean wherever
possible

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

crimson/os/seastore/transaction_manager: move intermediate_key by
"remap_offset" when remapping the "back" half of the original pin

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

os/bluestore: Only capture time of oldest operation

Used to capture entire oldest operation.
Now only captures oldest operation time.
Changes parsing to quickly exit when op is not "osd_op".

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

os/bluestore: Fix setting osd_op_history_size

To make the OSD react immediately to a new config, the value must be set
directly instead of changing the configuration.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

os/bluestore: scraper, fix sleep time

A typo with +/- caused a problem with the calculation of osd.ready_time,
causing the target OSD to be excluded from the processing cycle.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

os/bluestore: scraper: make fixed history duration

Set a fixed 2s duration for the OSD ops history.
The OSD behaves weirdly if the ops history is full.
Keep the duration small to let ops leave the capture window quickly.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

os/bluestore: scraper: Make window size calculation dumb

The OSD handles ops weirdly when the window is full.
Make the window size adaptation really, really simple.
Now window is set to 1.5 * captured ops.
Window is never shortened.
Deleted unused code that related to periodic window size update.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

fixup

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

pybind/mgr/pg_autoscaler: fix warn when not too few pgs

Problem:

when `pg_num_final` is equal to `pg_num_target`,
we get "too many PGs" warnings in ceph health while
the autoscaler is in `warn` mode.

Solution:

Get rid of `else` condition and add an
`elif p['pg_num_final'] < p['pg_num_target']`
instead

Fixes: https://tracker.ceph.com/issues/61570

Signed-off-by: Kamoltat <ksirivad@redhat.com>

osd: fix manifest object not to be promoted when references_chunk called

When a cls_cas_references_chunk() is called on a chunked metadata object,
it makes the object's chunks be promoted in maybe_handle_manifest_detail().
It happens, for instance, while doing a chunk scrub job.

However, this operation doesn't need to get evicted data.
It only needs metadata information that already exists in metadata object.
To prevent this object promotion, this commit adds exception handling for this case.

Signed-off-by: Sungmin Lee <sung_min.lee@samsung.com>

client: Add multi_target_id to handle libcephfs mds_spec=* command

To fix multi target MDSs command result overwritten issue, add multi_target_id feature in CommandTable.
Also, add multi_target_id in CommandOp to track all the multi target commands end to make formatted result

Signed-off-by: Jimyeong Lee <jinmyeong.lee@linecorp.com>

client: Apply multi target MDS result handling

To fix the issue that multi target MDSs result is overwritten

Signed-off-by: Jimyeong Lee <jinmyeong.lee@linecorp.com>

client: Fix 1 active, multi standby mds condition timing issue

When there is 1 active MDS and several standby MDSs, and the result of a standby MDS comes after the active MDS's,
the closing square bracket cannot be added.
To fix this issue, add one more condition.

Signed-off-by: Jimyeong Lee <jinmyeong.lee@linecorp.com>

client: Adjust multi_targets map assert condition

When the map is accessed, the set in the map is initialized even if no element is added to it,
so make the assert condition reasonable.

Signed-off-by: Jimyeong Lee <jinmyeong.lee@linecorp.com>

client: Add trailing closing square bracket once in handle_command_reply

Delete unnecessary method, Add multi_id to logs

Signed-off-by: Jimyeong Lee <jinmyeong.lee@linecorp.com>

test: client: Add multi target MDSs command tests

Signed-off-by: Jinmyeong Lee <jinmyeong.lee@linecorp.com>

mon/ConfigMonitor: Show localized name in "config dump --format json" output

The "ceph config dump" command without the json formatted output shows
the localized option names and their values. An example of a normalized
vs localized option is shown below:

Normalized: mgr/dashboard/ssl_server_port (maintained within Option struct)
Localized: mgr/dashboard/x/ssl_server_port (maintained in mon store)

But the "ceph config dump --format json*" output showed the normalized
option names, which was not consistent with the "config dump" output.
The output of the command, including its pretty-printing variations,
must show the same content.

This commit introduces a new member within the ConfigMap's MaskedOption
struct called "localized_name". This is initialized to the localized name
as part of ConfigMonitor::load_config() method.

The MaskedOption::dump() used for the json formatting is modified to
display the localized_name instead of the normalized name.

Fixes: https://tracker.ceph.com/issues/62379
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>

PendingReleaseNotes: Note change to 'ceph config dump' pretty-print output.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>

msg/AsyncMessenger: re-evaluate the stop condition when woken up in 'wait()'

Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Fixes: https://tracker.ceph.com/issues/62395

os/bluestore: add some slow counters for bluestore

Add the following slow counters:
- l_bluestore_slow_aio_wait_count
- l_bluestore_slow_committed_kv_count
- l_bluestore_slow_read_onode_meta_count
- l_bluestore_slow_read_wait_aio_count

These counters are incremented whenever a BlueStore operation is slow;
in some cases this is more useful than average latency.
They are exported to Prometheus, so they can be viewed in the dashboard.
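
As an illustration of why a count can reveal more than an average, here is a
small Python sketch. `SlowOpCounters` and the fixed latency threshold are
assumptions made for the example, not BlueStore's actual mechanism: one slow
operation among many fast ones barely moves the mean latency, but it does
increment the counter.

```python
from collections import Counter


class SlowOpCounters:
    """Hypothetical: count operations whose latency exceeds a threshold."""

    def __init__(self, threshold_s: float):
        self.threshold_s = threshold_s
        self.counts = Counter()  # op name -> number of slow occurrences

    def observe(self, name: str, latency_s: float) -> None:
        # Only operations slower than the threshold are counted.
        if latency_s > self.threshold_s:
            self.counts[name] += 1
```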

Signed-off-by: Yite Gu <yitegu0@gmail.com>

tools: add std:: qualifiers to 'move'

to silence compiler warnings.
e.g. ceph_dedup_tool.cc:1104:32: warning: unqualified call to
'std::move' [-Wunqualified-std-cast-call]
     estimate_threads.push_back(move(ptr));
                               ^
                               std::

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

rgw/test: add std:: qualifiers to 'move'

to silence compiler warnings.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

doc/architecture: edit "OSDs service clients directly"

Edit "OSDs service clients directly" in the list in
"Smart Daemons Enable Hyperscale" in doc/architecture.rst.

Signed-off-by: Zac Dover <zac.dover@proton.me>

cephfs-shell: drop LooseVersion for version.parse

Fixes: https://tracker.ceph.com/issues/62739
Signed-off-by: Jos Collin <jcollin@redhat.com>

doc: update colorama, packaging

Fixes: https://tracker.ceph.com/issues/62739
Signed-off-by: Jos Collin <jcollin@redhat.com>

mgr/dashboard: fix cephfs forms validations

1. The CephFS Edit form didn't have any validation for the name, even
   though the Create form did. So reuse the Create form to display the
   Edit form as well.

2. Add name validations to the Subvolume and Subvolume group forms

3. Removed the datePipe from the cephfs list template since we are using
   the relativeDate.

Fixes: https://tracker.ceph.com/issues/62939
Signed-off-by: Nizamudeen A <nia@redhat.com>

mgr/dashboard: update to angular v15

- The scss import was broken because of the ~ symbol. It looks like it's
not needed.

- The login username/password label was broken because of the
placeholder class and color. Instead of applying the color through a
class, I applied the color directly to the attribute, and it worked.

- TypeScript 4.9 uses ES2022, and it complains about using some items
  before their initialization. Other TypeScript fixes needed to
be delivered because of this change.

- Revert the badge back to a rectangular shape (because I feel like the
  round shape leaves some empty space)

Fixes: https://tracker.ceph.com/issues/62844
Signed-off-by: Nizamudeen A <nia@redhat.com>

mgr/dashboard: update nodejs to 18.17.0

The latest npm doesn't support setting python as a config (like `npm
config set python3`); instead, it needs to be set either explicitly in
node-gyp using the node-gyp command or through an environment
variable.
Since we call node-gyp through npm, we need to set the
environment variable, which is documented here: https://github.com/nodejs/node-gyp?tab=readme-ov-file#configuring-python-dependency

Accordingly, the CMakeLists.txt for the dashboard is adapted.

Fixes: https://tracker.ceph.com/issues/62844
Signed-off-by: Nizamudeen A <nia@redhat.com>

mgr/dashboard: adapt and refactor jest test files

Use `configureTestBed` as the placeholder for adding the
declarations, imports, etc. that are required for the unit tests to run.

Fixes: https://tracker.ceph.com/issues/62844
Signed-off-by: Nizamudeen A <nia@redhat.com>

mgr/dashboard: upgrade to cypress 12

It looks like Chrome 117 needs Cypress >= 12.15.0:
https://github.com/cypress-io/cypress-documentation/issues/5479

Signed-off-by: Nizamudeen A <nia@redhat.com>

exporter: add ceph_daemon labels to labeled counters as well

The exporter did not add the `ceph_daemon` label (or the `instance_id`
label, in the case of RGW metrics) to the new labeled performance counters.

Fixes: https://tracker.ceph.com/issues/62874
Signed-off-by: avanthakkar <avanjohn@gmail.com>

common/tracer: remove is_enabled check in add_span methods

When tracing is disabled globally, new spans were not added
to existing traces because of that `if` condition.
This can also happen if a specific trace was enabled by a Lua script.

So, when tracing is disabled, the tracer will now create new spans
if their parent span is not a noop span, regardless of the tracer state.

Signed-off-by: Omri Zeneva <ozeneva@redhat.com>

rgw: add test case to reproduce bucket check stats bug for versioned bucket

Reproduces a regression where radosgw-admin bucket check incorrectly counts
objects that started as unversioned and later transitioned to versioned.

Signed-off-by: Cory Snyder <csnyder@1111systems.com>

rgw: fix radosgw-admin bucket check stat calculation bug

Fixes a regression with radosgw-admin bucket check stat
calculation and bucket reshard stat calculation when
there are objects that have transitioned from unversioned
to versioned. The bug was introduced in
152aadb71b61c53a4832a1c8cf82fce3d64b68d1.

Signed-off-by: Cory Snyder <csnyder@1111systems.com>

rgw: fix output formatting of bucket index check admin api

The bucket index check admin API was previously returning invalid
JSON.

Signed-off-by: Cory Snyder <csnyder@1111systems.com>

crimson/vstart: add --seastore-device-size option in vstart.sh command line

The default seastore_device_size runs out of space for smp > 28.

Signed-off-by: chunmei <chunmei.liu@intel.com>

qa/rgw/sts: keycloak task installs java manually

Before CentOS 9, Java had already been installed automatically. Add an
override to install the jdk-17 packages manually.

Fixes: https://tracker.ceph.com/issues/62536

Signed-off-by: Casey Bodley <cbodley@redhat.com>

doc/architecture: edit "OSD Membership and Status"

Edit "OSD Membership and Status" in the "Smart Daemons Enable
Hyperscale" section of doc/architecture.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

qa: fix "no orch backend set" in nfs suite

Fixes: https://tracker.ceph.com/issues/62870
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>

doc/architecture: edit "Data Scrubbing"

Edit the "Data Scrubbing" listitem in the list of benefits conferred by
the use by OSDs of the aggregate power of the cluster, in the section
"Smart Daemons Enable Hyperscale" in doc/architecture.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

script/backport-resolve-issue: Update script with latest versions

Signed-off-by: Sayantani Saha <ii.sayantani.ii@gmail.com>

doc/architecture: edit "Replication"

Edit "Replication" in the "Smart Daemons Enable Hyperscale" section of
doc/architecture.rst.

Signed-off-by: Zac Dover <zac.dover@proton.me>

cephadm: move a logging line closer to where the data is used

Move a logging line closer to where the data being logged is
used. This avoids having a dependency on logging in a fairly
simple function and should make moving the function in a future
commit easier.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: move context based getters to context_getters.py

Move functions that exist mainly to pull information out of the
CephadmContext in various ways to a new context_getters.py module.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: rename fetch_tcp_ports to fetch_endpoints

Rename fetch_tcp_ports to fetch_endpoints to more closely match what
the function is doing.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: black format context_getters.py

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: remove (doc)string

Remove a now-irrelevant (IMO) docstring that might have been
associated with the recently moved `cached_stdin` global. It's not
really clear how helpful it is in light of the new "compiled"
cephadm, so I am opting to remove it rather than move it.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: move pathify & get_file_timestamp to file_utils

Signed-off-by: John Mulligan <jmulligan@redhat.com>

mgr/cephadm: fix REFRESHED column of orch ps being unpopulated

The way the daemon ls data was processed was changed in
https://github.com/ceph/ceph/commit/1fd4132c7c03602719f29230732b12c8afa04779
and it seems that commit removed a line that set the
last_refresh field. This commit just adds it back
in the new location after the change.

Without this in "ceph orch ps" the REFRESHED column
for every daemon just reports "-"

Fixes: https://tracker.ceph.com/issues/62954

Signed-off-by: Adam King <adking@redhat.com>

mgr/cephadm: add unit test for _process_ls_output

This is a weird function to make a unit test for
since it's essentially just moving data from a
list of dicts into a list of DaemonDescriptions,
but wanted to have some coverage to lower the
chance of breaking something again.

Signed-off-by: Adam King <adking@redhat.com>

cephadm: move more funcs into data_utils.py

Signed-off-by: Adam King <adking@redhat.com>

cephadm: re-format black data_utils.py

Signed-off-by: Adam King <adking@redhat.com>

cephadm: move logging from registry_login to command_registry_login

So that registry_login can be moved to container_engines.py
without creating a dependency on logging there

Signed-off-by: Adam King <adking@redhat.com>

cephadm: move registry_login to container_engines.py

Signed-off-by: Adam King <adking@redhat.com>

cephadm: re-format black container_engines.py

Signed-off-by: Adam King <adking@redhat.com>

cephadm: move more funcs into net_utils.py

Signed-off-by: Adam King <adking@redhat.com>

cephadm: add unit test for get_ipv6_address

I wanted to modify this function slightly
to try to make both black and flake8 happy
with it, so adding a unit test to make sure
I don't break it.

Signed-off-by: Adam King <adking@redhat.com>

cephadm: re-format black net_utils.py

There was a conflict here between what black
and flake8 were okay with. After running
format-black, flake8 would report

cephadmlib/net_utils.py:211:29: E203 whitespace before ':'
cephadmlib/net_utils.py:259:25: E203 whitespace before ':'
cephadmlib/net_utils.py:272:27: E203 whitespace before ':'

but removing the whitespace before the ":" would
cause black to complain. For parse_mon_ip and
parse_mon_addrv, it was doing array slicing with
a start of "0" so I believe we can just remove the
start point without affecting anything (since "0" is
just the beginning of the string anyway). For
get_ipv6_address it had to actually be altered in
a way that had the potential to be done incorrectly,
so I added a unit test for it in a previous commit
in order to make sure we maintain the behavior.
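
The equivalence relied on for parse_mon_ip and parse_mon_addrv can be
checked in a few lines of Python: a slice with an explicit start of 0
matches the same slice with the start omitted, for every end index,
including negative ones. `slices_match` and the sample strings are made
up for this check and are not part of cephadm.

```python
def slices_match(s: str) -> bool:
    """Check that s[0:end] and s[:end] agree for every possible end index."""
    for end in range(-len(s) - 1, len(s) + 2):
        if s[0:end] != s[:end]:
            return False
    return True


# Hypothetical sample inputs resembling mon addresses.
samples = ["10.0.0.1:6789", "[v2:10.0.0.1:3300,v1:10.0.0.1:6789]", ""]
print(all(slices_match(s) for s in samples))  # True
```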

Signed-off-by: Adam King <adking@redhat.com>

doc/architecture: edit several sections

Edit the following sections in doc/architecture.rst:

 1. Dynamic Cluster Management
 2. About Pools
 3. Mapping PGs to OSDs

The tone of "Dynamic Cluster Management" remains a bit too close to the
tone of marketing material, in my opinion, but I will return to firm it
up when I have finished a once-over of architecture.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

mgr: add throttle policy for DaemonServer

This commit fixes the OSD throttle parameters not taking effect for the mgr.
Fixes: https://tracker.ceph.com/issues/61942

Signed-off-by: ericqzhao <125110732+ericqzhao@users.noreply.github.com>

osd: Report health error if OSD public address is not within subnet

In a containerized environment, after an OSD node reboot, due to
a race condition in systemd, some OSDs registered their
v1/v2 public addresses on the cluster network instead of the
defined public_network. Report this inconsistency as a health
error, since RADOS clients fail to connect to the cluster.

Fixes: https://tracker.ceph.com/issues/56057

Signed-off-by: Prashant D <pdhange@redhat.com>

RGW | Bucket Notification: migrating old entries to support persistency control

Signed-off-by: Ali Masarwa <ali.saed.masarwa@gmail.com>

test: fix "control reaches end of non-void function" by adding a return

Signed-off-by: Patty8122 <divyapattisapu@uchicago.edu>

doc/architecture: edit "Calculating PG IDs"

Edit the section "Calculating PG IDs" in doc/architecture.rst.

Signed-off-by: Zac Dover <zac.dover@proton.me>

cephadm: fix haproxy version with certain containers

Some builds of haproxy containers' output
from "haproxy -v" start with

HAProxy version

rather than

HA-Proxy version

There's no reason on our end not to accept both.
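
One way to accept both spellings is an optional dash in the version
regex. This is a sketch only; `parse_haproxy_version` is a hypothetical
helper and the sample strings in the tests are invented, not captured
from real containers.

```python
import re
from typing import Optional

# Accept both "HA-Proxy version X" and "HAProxy version X" (the dash is optional).
_HAPROXY_VERSION_RE = re.compile(r"^HA-?Proxy version (\S+)")


def parse_haproxy_version(output: str) -> Optional[str]:
    """Return the version string from `haproxy -v` output, or None if absent."""
    for line in output.splitlines():
        m = _HAPROXY_VERSION_RE.match(line.strip())
        if m:
            return m.group(1)
    return None
```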

Signed-off-by: Adam King <adking@redhat.com>

cephadm: remove get_unit_name_by_instance func

As it is one line, quite simple, and only
had a single caller, it was decided we'd remove
this function as part of the cephadm refactor.

Signed-off-by: Adam King <adking@redhat.com>

rgw: Fix bucket validation against POST policies

It's possible that a user could provide a form part as part of a POST
object upload that uses 'bucket' as a key; in this case, it
overrode what was being set in the validation env (which is the real
bucket being modified). As a result, a user could actually
upload to any bucket accessible by the specified access key by matching
the bucket in the POST policy to said POST form part.

Fix this simply by setting the bucket to the correct value after the
POST form parts are processed, ignoring the form part above if
specified.
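
A simplified Python model of the fix, assuming the validation env is a
plain dict and using the hypothetical helper `build_validation_env`: the
key point is that the real bucket is written after the form parts, so a
client-supplied 'bucket' field can no longer override it.

```python
def build_validation_env(real_bucket, form_parts):
    """Hypothetical: build the POST policy validation env from form parts."""
    env = {}
    for key, value in form_parts.items():
        env[key.lower()] = value
    # Set the bucket last so a 'bucket' form part cannot override the
    # bucket that is actually being written to.
    env["bucket"] = real_bucket
    return env
```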

Fixes: https://tracker.ceph.com/issues/63004

Signed-off-by: Joshua Baergen <jbaergen@digitalocean.com>

rgw: fix unwatch crash at radosgw startup

During radosgw initialization, if an exception in init_watch causes the watcher registration to fail,
a crash occurs when finalize_watch is executed, because it tries to unregister a watch that was never registered.

Fixes: https://tracker.ceph.com/issues/60094

Signed-off-by: lichaochao <lichaochao2_yewu@cmss.chinamobile.com>

rgw/async: use optional_yield for keystone and kms requests

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw/keystone: EC2Engine uses reject() for ERR_SIGNATURE_NO_MATCH

ERR_SIGNATURE_NO_MATCH means that we found the given access key in
keystone, so we should use reject() instead of deny() to prevent
other engines like LocalEngine from looking up the access key again
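
A toy Python model of the difference between deny() and reject() in an
engine chain. The names `Result` and `authenticate` are assumptions for
illustration, not RGW's C++ strategy classes: a denied result lets the
next engine try, while a rejected result ends the chain with that
engine's error.

```python
from enum import Enum


class Result(Enum):
    GRANTED = "granted"
    DENIED = "denied"      # this engine cannot decide; try the next one
    REJECTED = "rejected"  # final failure; stop the chain


def authenticate(engines, request):
    """Run (name, engine) pairs in order until one grants or rejects."""
    tried = []
    for name, engine in engines:
        tried.append(name)
        result, error = engine(request)
        if result is Result.GRANTED:
            return "ok", tried
        if result is Result.REJECTED:
            return error, tried
        # DENIED: fall through to the next engine
    return "AccessDenied", tried
```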

this change causes us to return the SignatureDoesNotMatch error expected
by s3test case test_list_buckets_bad_auth()

Fixes: https://tracker.ceph.com/issues/62989

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw/multisite: call drain before flushing markers in incremental sync

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>

rgw: fix rgw rate limiting RGWRateLimitInfo class decode_json max_read_bytes and max_write_bytes field mismatch

Fixes: https://tracker.ceph.com/issues/62955
Signed-off-by: xiangrui meng <mengxr@chinatelecom.cn>

rgw: s3website doesn't prefetch for web_dir() check

this function only needs to check for existence of the given path.
the sal::Object is destroyed before the function returns, so it's
wasteful to prefetch its data

Fixes: https://tracker.ceph.com/issues/62938

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: fix SignatureDoesNotMatch when extra headers

Headers that start with 'x-amz' but not 'x-amz-' should not be in the list of CanonicalHeaders.
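
The prefix check can be sketched as below. `amz_headers_to_sign` is a
hypothetical helper; it only illustrates the 'x-amz-' rule and ignores
other headers (such as Host) that may legitimately be signed.

```python
def amz_headers_to_sign(headers):
    """Keep only header names whose lowercase form starts with 'x-amz-'."""
    return sorted(
        name.lower() for name in headers if name.lower().startswith("x-amz-")
    )
```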

Signed-off-by: rui ma <marui1@chinatelecom.cn>

rgw: improve the efficiency of buffer list utilization of chunk upload

Reduce the waste of buffer::ptr by receiving multiple chunks and filling them into the buffer.

Previously, AWSv4ComplMulti::recv_body() received just one chunk and filled it into the buffer.
Each 4MB buffer was actually utilizing only 64KB, leading to frequent buffer allocations.
Virtual memory consumption of ~800GB has been observed.

Signed-off-by: liubingrun <liubr1@chinatelecom.cn>

rgw/lc: remove_bucket_config() doesn't update xattrs on bucket delete

we're deleting the bucket instance metadata anyway, so there's no reason
to send an additional write to remove the RGW_ATTR_LC xattr first. this
write bumps the cls_version and can cause the actual delete op to fail
with ECANCELED

Fixes: https://tracker.ceph.com/issues/62411

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw/lc: bucket delete only calls remove_bucket_config() if RGW_ATTR_LC

if there's no RGW_ATTR_LC, don't try to do any lifecycle-related cleanup

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw/file: make setattr(...) a no-op on buckets

Shallow fix for apparent unstable behavior after nfs "chown" on
an RGW bucket via RGW NFS.  While we allow buckets to be created
(and subject to ordinary rules, deleted), chown against a bucket
hasn't been tested and potentially is not valid.  Prevent it
altogether for now; if permissions would allow it, chown will
succeed but won't have any effect.

Fixes: https://tracker.ceph.com/issues/61689

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

rgw/ops-log: explicitly specify object name in the log entry

Pseudo-directories can be used when naming an object (key): e.g.,
"my/weird/object.txt". This would lead to cases where the name of the object
cannot be derived deterministically. For example, if an object name uses
pseudo-directories and starts with "<bucket_name>/" and virtual-host style is
used when accessing the object, we cannot figure out the name of the object. Note
that in DNS (virtual-host) style bucket naming, since the bucket name is specified as
part of the host name, the URI doesn't contain any reference to the bucket.

import boto3
from botocore.config import Config

boto_client = boto3.client(..., config=Config(s3={'addressing_style': 'virtual'}))

boto_client.put_object(
        Body=b"this is the data",
        Key="my-bucket/my-object",
        Bucket="my-bucket"
)

The corresponding log entry is

{   ...,
    "bucket":"my-bucket", ...,
    "uri":"PUT /my-bucket/my-object HTTP/1.1",
    ...
}

We can falsely conclude that the name of the object is "my-object".

By having the name of the object listed in the log-entry, we address
this ambiguity.

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>

qa/suites/krbd: rename singleton to singleton-msgr-failures

A "singleton without msgr-failures" is wanted in the next commit.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

qa/suites/krbd: stress test for recovering from watch errors

Fixes: https://tracker.ceph.com/issues/63010
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rgw/test: fix compiler warning

Fixing a compiler warning regarding ambiguity of
the overloaded operator '==' (as it allows a one-sided
const operand)

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

cls_lock: expired lock before unlock and start check

If the lock has expired, the stat check shouldn't return -ENOENT.
Change the lock duration to prevent the lock from expiring before the
stat check.

Fixes: https://tracker.ceph.com/issues/56575
Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>

osd: correct unsigned/signed compiler wrn

    /home/pdonnell/ceph/src/osd/OSD.cc: In member function ‘void OSD::ShardedOpWQ::stop_for_fast_shutdown()’:
    /home/pdonnell/ceph/src/osd/OSD.cc:11143:41: warning: comparison of integer expressions of different signedness: ‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Wsign-compare]
    11143 |   for (int shard_index = 0; shard_index < osd->num_shards; shard_index++) {

Fixes: https://tracker.ceph.com/issues/62851
Fixes: 210dbd4ff19ea66fd2f0109cc15aad53349be52f
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>

mgr/dashboard: fix cephfs form validator

A number is not allowed as the first character of the MDS service name.

Fixes: https://tracker.ceph.com/issues/63005
Signed-off-by: Nizamudeen A <nia@redhat.com>

mgr/dashboard: allow tls 1.2 with a config option

Provide the option to allow tls1.2

`ceph dashboard set-enable-unsafe-tls-v1-2 True` followed by a mgr
restart will enable TLS 1.2.

With tls1.2 enabled
```
╰─$ nmap -sV --script ssl-enum-ciphers -p 11000 127.0.0.1
Starting Nmap 7.93 ( https://nmap.org ) at 2023-09-27 16:56 IST
Nmap scan report for localhost (127.0.0.1)
Host is up (0.00018s latency).

PORT      STATE SERVICE  VERSION
11000/tcp open  ssl/http CherryPy wsgiserver
|_http-server-header: Ceph-Dashboard
| ssl-enum-ciphers:
|   TLSv1.2:
|     ciphers:
|       TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (ecdh_x25519) - A
|       TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (ecdh_x25519) - A
|       TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (ecdh_x25519) - A
|       TLS_RSA_WITH_AES_256_GCM_SHA384 (rsa 2048) - A
|       TLS_RSA_WITH_AES_256_CCM (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_GCM_SHA256 (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_CCM (rsa 2048) - A
|       TLS_RSA_WITH_AES_256_CBC_SHA256 (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_CBC_SHA256 (rsa 2048) - A
|       TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
|     compressors:
|       NULL
|     cipher preference: server
|   TLSv1.3:
|     ciphers:
|       TLS_AKE_WITH_AES_256_GCM_SHA384 (ecdh_x25519) - A
|       TLS_AKE_WITH_CHACHA20_POLY1305_SHA256 (ecdh_x25519) - A
|       TLS_AKE_WITH_AES_128_GCM_SHA256 (ecdh_x25519) - A
|       TLS_AKE_WITH_AES_128_CCM_SHA256 (ecdh_x25519) - A
|     cipher preference: server
|_  least strength: A

Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 16.55 seconds
```

Without TLS 1.2 enabled (the default, which allows only TLS 1.3)
```
╰─$ nmap -sV --script ssl-enum-ciphers -p 11000 127.0.0.1
Starting Nmap 7.93 ( https://nmap.org ) at 2023-09-27 16:54 IST
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000075s latency).

PORT      STATE SERVICE  VERSION
11000/tcp open  ssl/http CherryPy wsgiserver
| ssl-enum-ciphers:
|   TLSv1.3:
|     ciphers:
|       TLS_AKE_WITH_AES_256_GCM_SHA384 (ecdh_x25519) - A
|       TLS_AKE_WITH_CHACHA20_POLY1305_SHA256 (ecdh_x25519) - A
|       TLS_AKE_WITH_AES_128_GCM_SHA256 (ecdh_x25519) - A
|       TLS_AKE_WITH_AES_128_CCM_SHA256 (ecdh_x25519) - A
|     cipher preference: server
|_  least strength: A
|_http-server-header: Ceph-Dashboard
```
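
The effect of the option can be sketched with Python's standard `ssl`
module. `make_dashboard_ssl_context` is a hypothetical helper; the
dashboard's actual CherryPy setup may configure TLS differently.

```python
import ssl


def make_dashboard_ssl_context(enable_unsafe_tls_v1_2: bool) -> ssl.SSLContext:
    """Build a server-side context that allows TLS 1.2 only when opted in."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if enable_unsafe_tls_v1_2:
        ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # TLS 1.2 and 1.3
    else:
        ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # TLS 1.3 only
    return ctx
```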

Fixes: https://tracker.ceph.com/issues/62940
Signed-off-by: Nizamudeen A <nia@redhat.com>

mgr/dashboard: fix the landing page layout issues

We were following a row-col grid layout for the landing page.
First row includes Details, Status and Capacity
Second row for Inventory and Cluster Utilization

So if one of the items in the first row grows, it pushes the entire
second row downwards.

To fix this, I made a col-row grid.

The first col has Details and Inventory in two rows.
The second col has Status and Capacity as a col and Cluster Utilization
as a single row.

Fixes: https://tracker.ceph.com/issues/62961

Signed-off-by: Nizamudeen A <nia@redhat.com>
Co-authored-by: cloudbehl <cloudbehl@gmail.com>

mds/FSMap: allow upgrades if no up mds

This is to support the fail_fs scenario for cephadm where max_mds >= 1
and all MDS are down.

Fixes: https://tracker.ceph.com/issues/62682
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>

cephadm: start ssh.py in cephadmlib

As part of the cephadm refactoring process
to split cephadm into multiple python files,
start "ssh.py" that includes some functions used
for setting up and testing ssh connections,
primarily as part of bootstrap.

Signed-off-by: Adam King <adking@redhat.com>

cephadm: format black cephadmlib/ssh.py

Signed-off-by: Adam King <adking@redhat.com>

mgr/dashboard: show a message to restart the rgw daemons after moving from single-site to multi-site

Fixes: https://tracker.ceph.com/issues/62984

Signed-off-by: Aashish Sharma <aasharma@redhat.com>

mgr/dashboard: enable protect option if layering enabled

Fixes: https://tracker.ceph.com/issues/63076
Signed-off-by: avanthakkar <avanjohn@gmail.com>

osd: fix read balancer logic to avoid redundant primary assignment

Fixes: https://tracker.ceph.com/issues/62833
Signed-off-by: Laura Flores <lflores@ibm.com>

osd/OSDMonitor: check svc is writeable before changing pending

Fixes: https://tracker.ceph.com/issues/59813
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>

mon: refactor loop variable names

To make it easier to read.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>

mgr/dashboard: Rgw Multi-site naming improvements

Fixes: https://tracker.ceph.com/issues/62721

Signed-off-by: Aashish Sharma <aasharma@redhat.com>

mgr/dashboard: rbd image hide usage bar when disk usage is not provided

Fixes: https://tracker.ceph.com/issues/63037
Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>

mds: add option mds_bal_overload_epochs

Add an option to configure the number of epochs that an overload must last before migrating.
Setting it to a higher value can avoid frequent migrations caused by load fluctuations.
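
A minimal sketch of the intended behavior, assuming the overload must
persist for consecutive epochs (whether the MDS resets the count on a
quiet epoch is an assumption here); `OverloadTracker` is a hypothetical
name.

```python
class OverloadTracker:
    """Hypothetical: migrate only after overload persists for N consecutive epochs."""

    def __init__(self, overload_epochs: int):
        self.overload_epochs = overload_epochs
        self.count = 0  # consecutive overloaded epochs seen so far

    def should_migrate(self, overloaded: bool) -> bool:
        # Reset on any non-overloaded epoch so brief spikes do not trigger migration.
        self.count = self.count + 1 if overloaded else 0
        return self.count >= self.overload_epochs
```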

Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>

mds: fix stray CInodes' use-after-free bug when submit ELid entry

When a journal log entry is submitted, it could start a new segment,
and that could advance the stray CInodes, which have been released
just before. Skip advancing the stray dentries when the MDS is
shutting down.

Reported-by: Patrick Donnelly <pdonnell@redhat.com>
Fixes: commit 5a537476544("mds: introduce ELid event to create/close log")
Fixes: https://tracker.ceph.com/issues/62861
Signed-off-by: Xiubo Li <xiubli@redhat.com>

doc/rados: edit ops/control.rst (2 of x)

Edit doc/rados/operations/control.rst (2 of x).

Co-authored-by: Cole Mitchell <cole.mitchell.ceph@gmail.com>
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

mgr/dashboard: Fix user/bucket count in rgw overview dashboard

Do not consider bucket/user counts from daemons that have a similar realm
name

Fixes: https://tracker.ceph.com/issues/62964

Signed-off-by: Aashish Sharma <aasharma@redhat.com>

test/allocator_replay_test: add assess_free command.

This makes it possible to estimate the amount of free space for a given
allocation unit.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>

test/hybrid_allocator_test: a couple broken cases

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>

os/bluestore: fix edge case for bitmap alloc's claim_free_to_left(0) call

This improperly marked the first 64 chunks as allocated.
Apparently not critical for production since offset(0) is never
released.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>

os/bluestore: Hybrid Allocator might unexpectedly return ENOSPC

This happened when the secondary allocator returned no additional
extents while the primary one still provided a few, but fewer than needed.
This is a rather non-critical issue, but it violated our informal
agreement that allocators can return less space than requested.
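
A simplified Python model of the corrected behavior, where
`hybrid_allocate` and the callable allocators are hypothetical: a short
allocation from the primary is returned as-is when the secondary has
nothing more, and ENOSPC is raised only when nothing at all was
allocated.

```python
import errno


def hybrid_allocate(want, primary, secondary):
    """Hypothetical model: combine extents from a primary and a secondary allocator.

    Each allocator is a callable that takes a byte count and returns a list of
    extent lengths, possibly covering less than requested.
    """
    extents = list(primary(want))
    got = sum(extents)
    if got < want:
        # Ask the secondary allocator only for the remainder.
        extents += secondary(want - got)
        got = sum(extents)
    if got == 0:
        # Fail only when nothing at all could be allocated.
        raise OSError(errno.ENOSPC, "no space left")
    return extents  # may cover less than `want`; callers accept partial results
```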

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>

test/allocator_replay_test: proper command line options setup

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>

mgr/dashboard: fix bootstrap script for cephadm installation

Fixes: https://tracker.ceph.com/issues/62827
Signed-off-by: avanthakkar <avanjohn@gmail.com>

dashboard: regression, make install fails w/dashboard disabled

https://tracker.ceph.com/issues/63100

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

mgr/dashboard: fix rgw inventory card and broken shadows

Fix a mess-up introduced by the dashboard landing page layout fixes PR

Fixes: http://tracker.ceph.com/issues/62961
Signed-off-by: Nizamudeen A <nia@redhat.com>

RGW: add the missing help print for command 'topic stats'

Signed-off-by: Ali Masarwa <ali.saed.masarwa@gmail.com>

rgw: Add coverity annotation for warning about tautological comparison

Signed-off-by: Vedansh Bhartia <vedanshbhartia@gmail.com>

doc/rados: edit troubleshooting.rst

Edit doc/rados/troubleshooting.rst to remove some language that sounds
quite close to marketing language.

Signed-off-by: Zac Dover <zac.dover@proton.me>

vstart: exclude default route during cluster setup

"ip route list" may list the default route, which needs to be excluded
during cluster setup.
Typical output of ip route list:
$ ip route list
default via 10.8.159.254 dev eno1 proto dhcp src 10.8.152.13 metric 100
10.8.152.0/21 dev eno1 proto kernel scope link src 10.8.152.13 metric 100
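
vstart.sh is a shell script, so the following Python snippet only
illustrates the filtering rule on the sample output above;
`non_default_routes` is a hypothetical helper.

```python
def non_default_routes(ip_route_output: str):
    """Drop the 'default via ...' entries from `ip route list` output."""
    routes = []
    for line in ip_route_output.splitlines():
        line = line.strip()
        if line and not line.startswith("default"):
            routes.append(line)
    return routes
```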

Signed-off-by: Sachin Punadikar <sachin.punadikar@ibm.com>

mgr/dashboard: fixed cephfs snapshot & Quota list

fixes: https://tracker.ceph.com/issues/63007

Signed-off-by: cloudbehl <cloudbehl@gmail.com>

mds: disable delegating inode ranges to clients

Fixes: http://tracker.ceph.com/issues/63103
Signed-off-by: Venky Shankar <vshankar@redhat.com>

qa: start testing mds_client_delegate_inos_pct config

Signed-off-by: Venky Shankar <vshankar@redhat.com>

PendingReleaseNotes: add a note about disallowing delegating inodes

Signed-off-by: Venky Shankar <vshankar@redhat.com>

cephadm: add some unit test coverage for deploying nfs, snmp

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: add daemon_form.py: bases and funcs for daemon forms

Create daemon_form.py containing the DaemonForm class and a few
subclasses and utility functions for working with DaemonForms.
In a future commit, DaemonForm will become the base class for
the current assortment of classes named after the daemon or
family of daemon they help manage.

A daemon form, think "form" as in "template" or "mold", assists
in setting up, creating, and managing daemons controlled with
cephadm. Because cephadm supports a variety of services, the
DaemonForm is an abstract base class and the module also supports
additional ABCs that may be used by DaemonForms to implement
optional features.

The daemon forms that are expected to be used directly must be
registered using the provided decorator. This is an explicit extra
step so that common bases that inherit from DaemonForm can be
implemented. Plus explicit is better than implicit. :-)
All DaemonForm subclasses are expected to provide a small set
of standard methods so that the types can be chosen, instantiated,
and used in a common manner.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: introduce daemon forms to cephadm.py

Introduce the DaemonForm base class to cephadm.py and make various
daemon-type classes into fully fledged daemon form classes.

Some classes already had a semi-standard `init` classmethod for
instantiation. In these cases the new `create` classmethod is a thin
wrapper over the existing method. In cases where the class was not
already being instantiated, a minimal set of methods is added.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: add test_daemon_form.py

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: remove direct daemon-type deps from sysctl

Using the appropriate daemon form we can break the direct dependency
that the sysctl setup function has on particular classes and use
a generic interface.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: move sysctl specific functions to sysctl.py

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: remove direct daemon-class deps from firewall

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: move firewalld related items to firewalld.py

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: move DeploymentType to deploy.py

The DeploymentType is used by a number of other classes and functions
and has no dependencies beyond enum and is safe to move.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: add ContainerDaemonForm

Add a supplemental DaemonForm subclass that helps deploy container
based daemons in a standard fashion. Most of these methods are
optional and should have sensible defaults.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: add func to deploy any generic ContainerDaemonForm

While there are no ContainerDaemonForms implemented yet, add a function
that uses the ContainerDaemonForm methods to construct a deployment
for the container based daemons.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
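The two messages above (optional hooks with sensible defaults, plus one function that deploys any container-based form) can be illustrated with a small sketch. It is written in JavaScript for consistency with this page; the method names `image`, `containerArgs`, `mounts`, and the helper `buildRunArgs` are hypothetical, and the argument list it produces is illustrative, not the command line cephadm actually builds.

```javascript
// Hypothetical sketch: one required hook, optional hooks with defaults,
// and a deploy helper that relies only on the hooks. Names and the
// argument layout are illustrative, not cephadm's real command line.
class ContainerDaemonForm {
  // Required: every container-based daemon must name its image.
  image() {
    throw new Error("image must be implemented by a subclass");
  }

  // Optional: extra arguments for the container runtime (default: none).
  containerArgs() {
    return [];
  }

  // Optional: host-path -> container-path bind mounts (default: none).
  mounts() {
    return {};
  }
}

// Generic: works for any ContainerDaemonForm without knowing its type.
function buildRunArgs(form) {
  const args = ["run"];
  for (const [hostPath, containerPath] of Object.entries(form.mounts())) {
    args.push("-v", `${hostPath}:${containerPath}`);
  }
  args.push(...form.containerArgs());
  args.push(form.image());
  return args;
}

// A form that only implements the required hook and keeps every default.
class ExampleGateway extends ContainerDaemonForm {
  image() {
    return "registry.example/gateway:latest"; // placeholder image name
  }
}
```

Because the generic helper only calls the hooks, converting a daemon class (as the following NFSGanesha, CustomContainer, and SNMPGateway commits do) means overriding the hooks it needs and inheriting the defaults for the rest.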

cephadm: convert NFSGanesha to a ContainerDaemonForm

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: convert CustomContainer to a ContainerDaemonForm

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: convert SNMPGateway to a ContainerDaemonForm

Signed-off-by: John Mulligan <jmulligan@redhat.com>

cephadm: convert cephadm agent to a daemon form

The cephadm agent is a bit special in that it will not be converted
to a ContainerDaemonForm (it is not containerized), but we still want
to have it registered as a DaemonForm so that the daemon_type can be
passed to create and have it resolve correctly.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

ceph orch add fails when an IPv6 address is surrounded by square brackets.

fixes: https://tracker.ceph.com/issues/61885
fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2153448

Signed-off-by: Teoman ONAY <tonay@ibm.com>
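The failure described here (an IPv6 address written as `[2001:db8::1]` being rejected) comes down to normalizing the bracketed form before validation. The sketch below is a hypothetical helper, not the code from the actual fix; it uses Node's `net.isIPv6` to check that the bracketed content really is an IPv6 address and passes other input through unchanged.

```javascript
const net = require("net");

// Hypothetical helper: accept "[2001:db8::1]" as well as "2001:db8::1".
// Not the code from the actual fix; it only illustrates the normalization.
function normalizeHostAddr(addr) {
  const trimmed = addr.trim();
  if (trimmed.startsWith("[") && trimmed.endsWith("]")) {
    const inner = trimmed.slice(1, -1);
    // Only strip brackets when they actually wrap an IPv6 literal.
    if (net.isIPv6(inner)) {
      return inner;
    }
    throw new Error(`not a valid IPv6 address: ${addr}`);
  }
  return trimmed;
}
```

Rejecting bracketed content that is not IPv6 (for example `[foo]`) keeps the helper from quietly accepting a malformed host string.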

doc: remove egg fragment from dev/developer_guide/running-tests-locally

DEPRECATION: git+https://github.com/ceph/teuthology#egg=teuthology[test]
contains an egg fragment with a non-PEP 508 name pip 25.0 will enforce
this behaviour change. A possible replacement is to use the req @ url syntax,
and remove the egg fragment. Discussion can be found at
https://github.com/pypa/pip/issues/11617

Signed-off-by: Dhairya Parmar <dparmar@redhat.com>

rgw: Add coverity annotations for missing mutex locks

Signed-off-by: Vedansh Bhartia <vedanshbhartia@gmail.com>

osd: fix: slow scheduling when item_cost is large

We used the IOPS and bandwidth measured by
`ceph tell osd.0 bench 10737418240 204800 204800 100`
to verify the QoS function: IOPS was 400 and bandwidth was 80 MiB/s.
When osd_mclock_scheduler_client_lim is set to 1,
the sequential write bandwidth is only half of the capacity.
Therefore, we believe that the scheduler should not unconditionally add
osd_bandwidth_cost_per_io to the cost of each IO, but should take the
maximum of the item cost and osd_bandwidth_cost_per_io.

Fixes: https://tracker.ceph.com/issues/62812
co-author: yanghonggang <yanghonggang_yewu@cmss.chinamobile.com>
co-author: zhangjianwei <zhangjianwei2_yewu@cmss.chinamobile.com>
Signed-off-by: Jrchyang Yu <yuzhiqiang_yewu@cmss.chinamobile.com>
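A rough worked example of why adding the per-IO cost halves throughput for large IOs. This is a simplified, hypothetical model, not Ceph's C++ scheduler code: it assumes osd_bandwidth_cost_per_io is approximately bandwidth capacity divided by IOPS capacity (80 MiB/s / 400 ≈ 209715.2 bytes), and uses the 204800-byte block size from the bench command above as the item cost.

```javascript
// Simplified, hypothetical model of the cost change; not Ceph's C++
// implementation, and the function names are made up for illustration.

// Old behaviour: the per-IO bandwidth cost is always added to the item cost.
function scaledCostBefore(itemCost, bandwidthCostPerIo) {
  return Math.max(1, itemCost) + bandwidthCostPerIo;
}

// New behaviour: charge whichever of the two is larger.
function scaledCostAfter(itemCost, bandwidthCostPerIo) {
  return Math.max(Math.max(1, itemCost), bandwidthCostPerIo);
}

// Assumption: per-IO cost ~ bandwidth capacity / IOPS capacity.
const bandwidthCostPerIo = (80 * 1024 * 1024) / 400; // ~209715.2 bytes
const ioSize = 204800; // block size passed to `ceph tell osd.0 bench`

const before = scaledCostBefore(ioSize, bandwidthCostPerIo); // ~414515.2
const after = scaledCostAfter(ioSize, bandwidthCostPerIo); // ~209715.2
```

Under these assumptions each 200 KiB write is charged almost twice its real cost before the change (about 1.98x), which is consistent with a client limit that yields only half of the measured sequential write bandwidth.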

script: add option for debug build

See: https://github.com/ceph/ceph-build/pull/2167

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>

doc/architecture: edit "Peering and Sets"

Edit the English in the section "Peering and Sets" in the file
doc/architecture.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

doc/architecture: repair RBD sentence

Improve an ambiguous sentence in doc/architecture.rst.

The problem presented by the original sentence is that the phrasal verb
"to provide with" is implicated in one of its possible readings.
Interpreted in that way, the sentence seems to express the incorrect
idea that RBD furnishes block devices with snapshotting and cloning, as
though snapshotting and cloning are being delivered to the block
devices. In fact, snapshotting and cloning are just features of RBD, and
are features that are described on this page:
https://docs.ceph.com/en/quincy/rbd/rbd-snapshot/.

Signed-off-by: Zac Dover <zac.dover@proton.me>

doc/rados: edit troubleshooting-mon.rst (3 of x)

Edit doc/rados/troubleshooting/troubleshooting-mon.rst.

Follows https://github.com/ceph/ceph/pull/52827

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

doc/rados: edit troubleshooting/community.rst

Edit doc/rados/troubleshooting/community.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

docs/cephadm: fix broken links in cephadm docs

In the documentation at https://docs.ceph.com/en/quincy/cephadm/services/osd/,
there is a broken link in the "List Devices" chapter. This change fixes
the documentation to point to the correct internal link under
ceph/doc/rados/operations/devices.

Fixes: https://tracker.ceph.com/issues/55763

Signed-off-by: sparvekar <shruti.sp07@gmail.com>
smanjara pushed a commit to smanjara/ceph that referenced this issue Oct 10, 2023
Looks like chrome 117 will need cypress >=12.15.0
cypress-io/cypress-documentation#5479

Resolves:rhbz2242116

Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 630ba3a)
(cherry picked from commit de4b500)
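A project pinned to an older Cypress could guard against this combination with a simple version check before running against Chrome 117. The sketch below is hypothetical and not part of Cypress; it only compares a version string (for example, the one printed by `npx cypress --version`) against the 12.15.0 minimum reported in this issue.

```javascript
// Hypothetical pre-flight check, not part of Cypress: does a version string
// meet the 12.15.0 minimum reported for Chrome 117?
const MIN_CYPRESS_FOR_CHROME_117 = [12, 15, 0];

// "12.17.4" -> [12, 17, 4]; pre-release/build suffixes are dropped,
// so "12.15.0-beta.1" is treated as 12.15.0.
function parseVersion(version) {
  return version
    .split(/[-+]/)[0]
    .split(".")
    .map((part) => Number.parseInt(part, 10) || 0);
}

// Compare component by component; missing components count as 0.
function meetsMinimum(version, minimum = MIN_CYPRESS_FOR_CHROME_117) {
  const parts = parseVersion(version);
  for (let i = 0; i < minimum.length; i++) {
    const value = parts[i] ?? 0;
    if (value !== minimum[i]) {
      return value > minimum[i];
    }
  }
  return true;
}
```

Dropping pre-release suffixes keeps the comparison simple; a stricter check could use a full semver library instead.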
aaSharma14 pushed a commit to rhcs-dashboard/ceph that referenced this issue Nov 7, 2023
Looks like chrome 117 will need cypress >=12.15.0
cypress-io/cypress-documentation#5479

Signed-off-by: Nizamudeen A <nia@redhat.com>
aaSharma14 pushed a commit to rhcs-dashboard/ceph that referenced this issue Dec 6, 2023
Looks like chrome 117 will need cypress >=12.15.0
cypress-io/cypress-documentation#5479

Signed-off-by: Nizamudeen A <nia@redhat.com>
aaSharma14 pushed a commit to rhcs-dashboard/ceph that referenced this issue Dec 26, 2023
Looks like chrome 117 will need cypress >=12.15.0
cypress-io/cypress-documentation#5479

Signed-off-by: Nizamudeen A <nia@redhat.com>
aaSharma14 pushed a commit to rhcs-dashboard/ceph that referenced this issue Jan 2, 2024
Looks like chrome 117 will need cypress >=12.15.0
cypress-io/cypress-documentation#5479

Signed-off-by: Nizamudeen A <nia@redhat.com>
VallariAg pushed a commit to VallariAg/ceph that referenced this issue Jan 9, 2024
Looks like chrome 117 will need cypress >=12.15.0
cypress-io/cypress-documentation#5479

Signed-off-by: Nizamudeen A <nia@redhat.com>