
[BUG] Latest image crashes on startup #248

Closed
tommyalatalo opened this issue May 13, 2022 · 30 comments
Labels
bug Something isn't working

Comments

@tommyalatalo

I just switched over to the docker images ghcr.io/analogj/scrutiny:master-web and ghcr.io/analogj/scrutiny:master-collector since the docker hub ones have been taken down. Now my web instance is crashing on startup with this error message:

goroutine 1 [running]:
github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware.RepositoryMiddleware(0x129f920, 0xc00038a070, 0x12a4b00, 0xc0003faa80, 0x129f9a0)
	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/repository.go:14 +0xe6
github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup(0xc000385610, 0x12a4b00, 0xc0003faa80, 0x1)
	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:26 +0xd8
github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Start(0xc000385610, 0x0, 0x0)
	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:97 +0x234
main.main.func2(0xc000387340, 0x4, 0x6)
	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:112 +0x198
github.com/urfave/cli/v2.(*Command).Run(0xc0003ef200, 0xc0003871c0, 0x0, 0x0)
	/go/pkg/mod/github.com/urfave/cli/v2@v2.2.0/command.go:164 +0x4e0
github.com/urfave/cli/v2.(*App).RunContext(0xc0003fe000, 0x128e820, 0xc0000c8010, 0xc0000be020, 0x2, 0x2, 0x0, 0x0)
	/go/pkg/mod/github.com/urfave/cli/v2@v2.2.0/app.go:306 +0x814
github.com/urfave/cli/v2.(*App).Run(...)
	/go/pkg/mod/github.com/urfave/cli/v2@v2.2.0/app.go:215
main.main()
	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:137 +0x65a
2022/05/13 14:38:05 Loading configuration file: /opt/scrutiny/config/scrutiny.yaml
time="2022-05-13T14:38:05Z" level=info msg="Trying to connect to scrutiny sqlite db: \n"
time="2022-05-13T14:38:05Z" level=info msg="Successfully connected to scrutiny sqlite db: \n"
panic: a username and password is required for a setup

There is no mention in the readme or the example configs of a username/password, so what are the credentials that the application is missing and crashing over? Also, this feels like something the application should handle gracefully, presenting an informative error to the user.

@tommyalatalo tommyalatalo added the bug Something isn't working label May 13, 2022
@AnalogJ
Owner

AnalogJ commented May 13, 2022

Hey,
We've seen this before for users migrating from the LSIO image to the official omnibus image.

Can you paste your docker run command or docker-compose file?

@tommyalatalo
Author

Hey, We've seen this before for users migrating from the LSIO image to the official omnibus image.

Can you paste your docker run command or docker-compose file?

I don't use the docker CLI or docker-compose; I'm running it on Nomad, but here is the file anyway:

variable "web_image" {
  type    = string
  default = "ghcr.io/analogj/scrutiny:master-web"
}

variable "collector_image" {
  type    = string
  default = "ghcr.io/analogj/scrutiny:master-collector"
}

job "scrutiny" {
  datacenters = ["main"]
  type        = "service"

  vault {
    policies    = ["scrutiny"]
    change_mode = "restart"
  }

  group "api" {
    constraint {
      attribute = "${node.unique.name}"
      value     = "nas"
    }

    network {
      mode = "bridge"
      port "http" {
        to = 8080
      }
    }

    task "scrutiny" {
      driver = "docker"


      config {
        image   = var.web_image
        cap_add = ["sys_admin", "sys_rawio"]
        ports   = ["http"]

        volumes = [
          "/run/udev:/run/udev:ro",
          "secrets/scrutiny.yaml:/opt/scrutiny/config/scrutiny.yaml",
          "/zpool/services/scrutiny/scrutiny.db:/scrutiny/config/scrutiny.db",
        ]
      }

      service {
        name = "${NOMAD_TASK_NAME}"
        port = "http"

        tags = [
          "api",
          "http",
          "traefik.enable=true",
          "traefik.http.routers.https-${NOMAD_TASK_NAME}.entrypoints=https",
          "traefik.http.routers.https-${NOMAD_TASK_NAME}.rule=Host(`${NOMAD_TASK_NAME}.tox.sh`)",
          "traefik.http.routers.https-${NOMAD_TASK_NAME}.middlewares=chain-internal-no-auth@file",
          "traefik.http.routers.https-${NOMAD_TASK_NAME}.tls=true",
        ]

        check {
          port     = "http"
          name     = "${NOMAD_TASK_NAME} health"
          type     = "http"
          path     = "/api/health"
          interval = "30s"
          timeout  = "1s"
        }
      }

      env {
        GIN_MODE              = "release"
        SCRUTINY_API_ENDPOINT = "http://localhost:8080"
        SCRUTINY_COLLECTOR    = "false"
        SCRUTINY_WEB          = "true"
      }


      template {
        destination = "secrets/scrutiny.yaml"
        change_mode = "restart"
        data        = file("./config/scrutiny/scrutiny.yaml")
      }
    }
  }

  group "collector-arch" {
    constraint {
      attribute = "${node.unique.name}"
      value     = "arch"
    }

    network {
      mode = "bridge"
    }

    task "wait-for-api" {
      driver = "docker"

      lifecycle {
        hook    = "prestart"
        sidecar = false
      }

      config {
        image   = "praqma/network-multitool:alpine-extra"
        command = "/bin/bash"

        args = [
          "-c",
          "while ! dig +short api.scrutiny.service.consul srv | grep -ve '^$'; do sleep 1; done",
        ]
      }
    }

    task "collector" {
      driver = "docker"

      config {
        image   = var.collector_image
        cap_add = ["sys_admin", "sys_rawio"]

        volumes = [
          "/run/udev:/run/udev:ro",
          "secrets/collector.yaml:/opt/scrutiny/config/collector.yaml",
        ]

        devices = [
          {
            host_path      = "/dev/sda"
            container_path = "/dev/sda"
          },
          {
            host_path      = "/dev/sdb"
            container_path = "/dev/sdb"
          },
          {
            host_path      = "/dev/sdc"
            container_path = "/dev/sdc"
          },
          {
            host_path      = "/dev/sdd"
            container_path = "/dev/sdd"
          },
          {
            host_path      = "/dev/sde"
            container_path = "/dev/sde"
          },
          {
            host_path      = "/dev/sdf"
            container_path = "/dev/sdf"
          },
        ]
      }

      template {
        destination = "secrets/collector.yaml"
        change_mode = "restart"
        data        = file("./config/scrutiny/collector.yaml")
      }
    }
  }

  group "collector-backup" {
    constraint {
      attribute = "${node.unique.name}"
      value     = "backup"
    }

    network {
      mode = "bridge"
    }

    task "wait-for-api" {
      driver = "docker"

      lifecycle {
        hook    = "prestart"
        sidecar = false
      }

      config {
        image   = "praqma/network-multitool:alpine-extra"
        command = "/bin/bash"

        args = [
          "-c",
          "while ! dig +short api.scrutiny.service.consul srv | grep -ve '^$'; do sleep 1; done",
        ]
      }
    }

    task "collector" {
      driver = "docker"

      config {
        image   = var.collector_image
        cap_add = ["sys_admin", "sys_rawio"]

        volumes = [
          "/run/udev:/run/udev:ro",
          "secrets/collector.yaml:/opt/scrutiny/config/collector.yaml",
        ]

        devices = [
          {
            host_path      = "/dev/nvme0"
            container_path = "/dev/nvme0"
          },
          {
            host_path      = "/dev/sda"
            container_path = "/dev/sda"
          },
          {
            host_path      = "/dev/sdb"
            container_path = "/dev/sdb"
          },
        ]
      }

      template {
        destination = "secrets/collector.yaml"
        change_mode = "restart"
        data        = file("./config/scrutiny/collector.yaml")
      }
    }
  }
  group "collector-nas" {
    constraint {
      attribute = "${node.unique.name}"
      value     = "nas"
    }

    network {
      mode = "bridge"
    }

    task "wait-for-api" {
      driver = "docker"

      lifecycle {
        hook    = "prestart"
        sidecar = false
      }

      config {
        image   = "praqma/network-multitool:alpine-extra"
        command = "/bin/bash"

        args = [
          "-c",
          "while ! dig +short api.scrutiny.service.consul srv | grep -ve '^$'; do sleep 1; done",
        ]
      }
    }

    task "collector" {
      driver = "docker"

      config {
        image   = var.collector_image
        cap_add = ["sys_admin", "sys_rawio"]

        volumes = [
          "/run/udev:/run/udev:ro",
          "secrets/collector.yaml:/opt/scrutiny/config/collector.yaml",
        ]

        devices = [
          {
            host_path      = "/dev/nvme0"
            container_path = "/dev/nvme0"
          },
          {
            host_path      = "/dev/sda"
            container_path = "/dev/sda"
          },
          {
            host_path      = "/dev/sdb"
            container_path = "/dev/sdb"
          },
          {
            host_path      = "/dev/sdc"
            container_path = "/dev/sdc"
          },
          {
            host_path      = "/dev/sdd"
            container_path = "/dev/sdd"
          },
          {
            host_path      = "/dev/sde"
            container_path = "/dev/sde"
          },
          {
            host_path      = "/dev/sdf"
            container_path = "/dev/sdf"
          },
        ]
      }

      template {
        destination = "secrets/collector.yaml"
        change_mode = "restart"
        data        = file("./config/scrutiny/collector.yaml")
      }
    }
  }
}

@evulhotdog

@AnalogJ here's mine as well.

---
version: "2.1"
services:
  scrutiny:
    image: linuxserver/scrutiny
    container_name: scrutiny
    privileged: true
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/New_York
      - SCRUTINY_API_ENDPOINT=http://localhost:8080
      - SCRUTINY_WEB=true
      - SCRUTINY_COLLECTOR=true
    volumes:
      - ./config:/config
      - /dev/sda:/dev/sda
      - /dev/sdb:/dev/sdb
      - /dev/sdc:/dev/sdc
      - /run/udev:/run/udev:ro
    ports:
      - 8081:8080
    restart: unless-stopped
    networks:
      - internet-facing

networks:
  internet-facing:
    external:
      name: internet-facing

@AnalogJ
Owner

AnalogJ commented May 13, 2022

@tommyalatalo you'll need to remove the SCRUTINY_WEB=true and SCRUTINY_COLLECTOR=true environment variables. They were used by the LSIO image, but cause issues with Scrutiny for some reason.

@evulhotdog you'll need to remove the same environment variables I mentioned to @tommyalatalo , but you'll also need to update your image to ghcr.io/analogj/scrutiny:master-omnibus. The LSIO image is missing a new dependency that we introduced in v0.4.0+ (InfluxDB), and that causes issues. You can revert to an earlier version of the LSIO image (lscr.io/linuxserver/scrutiny:060ac7b8-ls34), or just change to the official Scrutiny image (ghcr.io/analogj/scrutiny:master-omnibus).
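For reference, a minimal sketch of the adjusted compose service, assuming the omnibus image's default paths (/opt/scrutiny/config for the config and sqlite db, /opt/scrutiny/influxdb for the embedded InfluxDB data, both discussed further down in this thread); the device mappings, timezone, and ports are placeholders to adapt to your host:

version: "2.1"
services:
  scrutiny:
    # official omnibus image: web UI + collector + embedded InfluxDB
    image: ghcr.io/analogj/scrutiny:master-omnibus
    container_name: scrutiny
    cap_add:
      - SYS_RAWIO
      - SYS_ADMIN # reportedly only needed for NVMe drives
    environment:
      - TZ=America/New_York
      # note: no SCRUTINY_WEB / SCRUTINY_COLLECTOR variables here
    volumes:
      - ./config:/opt/scrutiny/config       # scrutiny.yaml + scrutiny.db
      - ./influxdb:/opt/scrutiny/influxdb   # persists InfluxDB onboarding state
      - /run/udev:/run/udev:ro
    devices:
      - /dev/sda:/dev/sda
      - /dev/sdb:/dev/sdb
    ports:
      - 8081:8080
    restart: unless-stopped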

I'm going to close this issue for now; feel free to comment/reopen it if you run into any other issues.

@AnalogJ AnalogJ closed this as completed May 13, 2022
@tommyalatalo
Author

Removing SCRUTINY_WEB and SCRUTINY_COLLECTOR doesn't fix the problem; it now seems like the scrutiny app in the api container can't connect to the internal influxdb instance:

goroutine 1 [running]:
github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware.RepositoryMiddleware(0x129f920, 0xc0000b0098, 0x12a4b00, 0xc000414bd0, 0x129f9a0)
	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/repository.go:14 +0xe6
github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup(0xc0000ad6c0, 0x12a4b00, 0xc000414bd0, 0x10)
	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:26 +0xd8
github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Start(0xc0000ad6c0, 0x0, 0x0)
	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:97 +0x234
main.main.func2(0xc0000b7340, 0x4, 0x6)
	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:112 +0x198
github.com/urfave/cli/v2.(*Command).Run(0xc0004170e0, 0xc0000b71c0, 0x0, 0x0)
	/go/pkg/mod/github.com/urfave/cli/v2@v2.2.0/command.go:164 +0x4e0
github.com/urfave/cli/v2.(*App).RunContext(0xc00009a300, 0x128e820, 0xc000038038, 0xc00000e080, 0x2, 0x2, 0x0, 0x0)
	/go/pkg/mod/github.com/urfave/cli/v2@v2.2.0/app.go:306 +0x814
github.com/urfave/cli/v2.(*App).Run(...)
	/go/pkg/mod/github.com/urfave/cli/v2@v2.2.0/app.go:215
main.main()
	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:137 +0x65a
2022/05/14 22:23:57 Loading configuration file: /opt/scrutiny/config/scrutiny.yaml
time="2022-05-14T22:23:57Z" level=info msg="Trying to connect to scrutiny sqlite db: /scrutiny/config/scrutiny.db\n"
time="2022-05-14T22:23:57Z" level=info msg="Successfully connected to scrutiny sqlite db: /scrutiny/config/scrutiny.db\n"
panic: Post "http://0.0.0.0:8086/api/v2/setup": dial tcp 0.0.0.0:8086: connect: connection refused

@tommyalatalo
Author

tommyalatalo commented May 14, 2022

So now I switched to the omnibus image on my api node, but it also fails...
The container just loops with the message below over and over, even though I can see that influxdb is running and listening on 8086 inside, and I've set SCRUTINY_WEB_INFLUXDB_HOST="http://localhost:8086".

 ___   ___  ____  __  __  ____  ____  _  _  _  _
/ __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
\__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
github.com/AnalogJ/scrutiny                             dev-0.4.4

Start the scrutiny server
waiting for influxdb
starting scrutiny

@AnalogJ
Owner

AnalogJ commented May 15, 2022

Hey @tommyalatalo can you unset the SCRUTINY_WEB_INFLUXDB_HOST variable? It's unnecessary, as the defaults should work: SCRUTINY_WEB_INFLUXDB_HOST=0.0.0.0

@AnalogJ
Owner

AnalogJ commented May 15, 2022

Actually, can you try setting it to SCRUTINY_WEB_INFLUXDB_HOST=localhost if unsetting the env var doesn't work.

@AnalogJ AnalogJ reopened this May 15, 2022
@raulfg3

raulfg3 commented May 15, 2022

Same problem here with the latest Docker image:

 ___   ___  ____  __  __  ____  ____  _  _  _  _
/ __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
\__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
github.com/AnalogJ/scrutiny                             dev-0.4.4

Start the scrutiny server
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)
2022/05/14 17:25:48 No configuration file found at /opt/scrutiny/config/scrutiny.yaml. Using Defaults.
time="2022-05-14T17:25:48+02:00" level=info msg="Trying to connect to scrutiny sqlite db: \n"
time="2022-05-14T17:25:48+02:00" level=info msg="Successfully connected to scrutiny sqlite db: \n"
panic: a username and password is required for a setup

goroutine 1 [running]:
github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware.RepositoryMiddleware({0xfeaac0, 0xc0000c4cd8}, {0xff3d50, 0xc000373f10})
	/app/scrutiny/webapp/backend/pkg/web/middleware/repository.go:14 +0xa5
github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup(0xc000370610, {0xff3d50, 0xc000373f10})
	/app/scrutiny/webapp/backend/pkg/web/server.go:26 +0xb4
github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Start(0xc000370610)
	/app/scrutiny/webapp/backend/pkg/web/server.go:97 +0x3ab
main.main.func2(0xc00034fb80)
	/app/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:112 +0x1f7
github.com/urfave/cli/v2.(*Command).Run(0xc000359e60, 0xc00034fa00)
	/root/go/pkg/mod/github.com/urfave/cli/v2@v2.2.0/command.go:164 +0x64a
github.com/urfave/cli/v2.(*App).RunContext(0xc000240480, {0xfd4dd0, 0xc0000cc000}, {0xc0000c8000, 0x2, 0x2})
	/root/go/pkg/mod/github.com/urfave/cli/v2@v2.2.0/app.go:306 +0x926
github.com/urfave/cli/v2.(*App).Run(...)
	/root/go/pkg/mod/github.com/urfave/cli/v2@v2.2.0/app.go:215
main.main()
	/app/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:137 +0x679

It worked fine until the latest Docker update.

@raulfg3

raulfg3 commented May 15, 2022

version: "2.1"
networks:
  default:
    external:
      name: my-net
services:
  scrutiny:
    image: ghcr.io/linuxserver/scrutiny
    container_name: scrutiny
    cap_add:
      - SYS_RAWIO
      - SYS_ADMIN #optional
    environment:
      - PUID=1001
      - PGID=1000
      - TZ=Europe/Madrid
      - SCRUTINY_API_ENDPOINT=http://localhost:8080
      - SCRUTINY_WEB=true
      - SCRUTINY_COLLECTOR=true
    volumes:
      - /srv/dev-disk-by-uuid-6f38f974-7aec-452a-815d-9101878af2e1/Data/dockers/scrutiny:/config
      - /run/udev:/run/udev:ro
    ports:
      - 89:8080
    devices:
      - /dev/sda:/dev/sda
      - /dev/sdb:/dev/sdb
      - /dev/sdc:/dev/sdc
      - /dev/sdd:/dev/sdd
      - /dev/sde:/dev/sde
      - /dev/sdf:/dev/sdf
    restart: unless-stopped

@raulfg3

raulfg3 commented May 15, 2022

I switched to the latest AnalogJ image and it works fine; my new YAML file is:

version: "3.5"
networks:
  default:
    external:
      name: my-net
services:
  scrutiny:
    container_name: scrutiny
    image: ghcr.io/analogj/scrutiny:master-omnibus
    cap_add:
      - SYS_RAWIO
    volumes:
      - /srv/dev-disk-by-uuid-6f38f974-7aec-452a-815d-9101878af2e1/Data/dockers/scrutiny:/opt/scrutiny/config
      - /srv/dev-disk-by-uuid-6f38f974-7aec-452a-815d-9101878af2e1/Data/dockers/scrutiny/influxdb:/opt/scrutiny/influxdb
      - /run/udev:/run/udev:ro
    ports:
      - 89:8080 # webapp
      - 86:8086 # influxDB admin
    devices:
      - /dev/sda:/dev/sda
      - /dev/sdb:/dev/sdb
      - /dev/sdc:/dev/sdc
      - /dev/sdd:/dev/sdd
      - /dev/sde:/dev/sde
      - /dev/sdf:/dev/sdf
    restart: unless-stopped

@tommyalatalo
Author

Actually, can you try setting it to SCRUTINY_WEB_INFLUXDB_HOST=localhost if unsetting the env var doesn't work.

I've tried both unsetting it and setting it to localhost; now the error I'm getting is this:

 ___   ___  ____  __  __  ____  ____  _  _  _  _
/ __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
\__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
github.com/AnalogJ/scrutiny                             dev-0.4.5

Start the scrutiny server
ts=2022-05-15T11:13:36.619100Z lvl=error msg="failed to onboard user admin" log_id=0aTrat3G000 handler=onboard error="onboarding has already been completed" took=0.057ms
ts=2022-05-15T11:13:36.619120Z lvl=error msg="api error encountered" log_id=0aTrat3G000 error="onboarding has already been completed" 

@AnalogJ
Owner

AnalogJ commented May 15, 2022

Hey @tommyalatalo
Thanks for confirming that the INFLUX_HOST should be localhost; I'll fix that up in the defaults.

That new error related to onboarding is because the influxdb data directory is not currently mounted/persisted outside the container.

If you're using the official omnibus image, you can add a volume mount like: ./influxdb:/opt/scrutiny/influxdb.
Also, I noticed that you still have references to /scrutiny in your Nomad config; those should all be renamed to /opt/scrutiny (that's the new consistent path).

Please confirm that these steps fixed the issue and I can close this :)

@tommyalatalo
Author

I don't have any references left to anything outside /opt/scrutiny, and the container still loops with this message:

 ___   ___  ____  __  __  ____  ____  _  _  _  _
/ __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
\__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
github.com/AnalogJ/scrutiny                             dev-0.4.5

Start the scrutiny server
waiting for influxdb
starting scrutiny

This is the container that should be starting up:

    task "web" {
      driver = "docker"

      config {
        image   = var.web_image
        cap_add = ["sys_admin", "sys_rawio"]
        ports   = ["scrutiny"]

        volumes = [
          "/run/udev:/run/udev:ro",
          "secrets/scrutiny.yaml:/opt/scrutiny/config/scrutiny.yaml",
          "/zpool/services/scrutiny/scrutiny.db:/opt/scrutiny/config/scrutiny.db",
          "/zpool/services/scrutiny/influxdb:/opt/scrutiny/influxdb",
        ]
      }

      service {
        name = "${NOMAD_TASK_NAME}"
        port = "scrutiny"

        tags = [
          "web",
          "http",
          "traefik.enable=true",
          "traefik.http.routers.https-${NOMAD_TASK_NAME}.entrypoints=https",
          "traefik.http.routers.https-${NOMAD_TASK_NAME}.rule=Host(`${NOMAD_TASK_NAME}.tox.sh`)",
          "traefik.http.routers.https-${NOMAD_TASK_NAME}.middlewares=chain-internal-no-auth@file",
          "traefik.http.routers.https-${NOMAD_TASK_NAME}.tls=true",
        ]

        check {
          port     = "scrutiny"
          name     = "${NOMAD_TASK_NAME} health"
          type     = "http"
          path     = "/api/health"
          interval = "30s"
          timeout  = "1s"
        }
      }

      env {
        GIN_MODE              = "release"
        SCRUTINY_WEB_INFLUXDB_HOST = "localhost"
      }

      template {
        destination = "secrets/scrutiny.yaml"
        change_mode = "restart"
        data        = file("./config/scrutiny/scrutiny.yaml")
      }
    }

@AnalogJ
Owner

AnalogJ commented May 17, 2022

@tommyalatalo no other error messages? Can you enable debug mode by setting the DEBUG environment variable to true?

@tommyalatalo
Author

Okay, I found that the database location was wrong in my scrutiny.yaml file; after fixing that, I now get this error:

goroutine 1 [running]:
github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware.RepositoryMiddleware(0x129f920, 0xc00040a068, 0x12a4b00, 0xc000470bd0, 0x129f9a0)
	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/repository.go:14 +0xe6
github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup(0xc000405620, 0x12a4b00, 0xc000470bd0, 0x14)
	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:26 +0xd8
github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Start(0xc000405620, 0x0, 0x0)
	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:97 +0x234
main.main.func2(0xc000411240, 0x4, 0x6)
	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:112 +0x198
github.com/urfave/cli/v2.(*Command).Run(0xc0004730e0, 0xc0004110c0, 0x0, 0x0)
	/go/pkg/mod/github.com/urfave/cli/v2@v2.2.0/command.go:164 +0x4e0
github.com/urfave/cli/v2.(*App).RunContext(0xc00047e000, 0x128e820, 0xc000130010, 0xc000126020, 0x2, 0x2, 0x0, 0x0)
	/go/pkg/mod/github.com/urfave/cli/v2@v2.2.0/app.go:306 +0x814
github.com/urfave/cli/v2.(*App).Run(...)
	/go/pkg/mod/github.com/urfave/cli/v2@v2.2.0/app.go:215
main.main()
	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:137 +0x65a
2022/05/17 17:12:13 Loading configuration file: /opt/scrutiny/config/scrutiny.yaml
time="2022-05-17T17:12:13Z" level=info msg="Trying to connect to scrutiny sqlite db: /opt/scrutiny/config/scrutiny.db\n"
time="2022-05-17T17:12:13Z" level=info msg="Successfully connected to scrutiny sqlite db: /opt/scrutiny/config/scrutiny.db\n"
time="2022-05-17T17:12:13Z" level=debug msg="InfluxDB url: http://localhost:8086"
time="2022-05-17T17:12:13Z" level=debug msg="No influxdb token found, running first-time setup..."
panic: conflict: onboarding has already been completed

@AnalogJ
Owner

AnalogJ commented May 17, 2022 via email

@tommyalatalo
Author

Yeah, that does get scrutiny to start, but if I restart the web container it errors out again with the same message.

waiting for influxdb
starting scrutiny

 ___   ___  ____  __  __  ____  ____  _  _  _  _
/ __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
\__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
github.com/AnalogJ/scrutiny                             dev-0.4.5

Start the scrutiny server
ts=2022-05-17T18:28:01.974547Z lvl=error msg="failed to onboard user admin" log_id=0aWpGhZW000 handler=onboard error="onboarding has already been completed" took=0.053ms
ts=2022-05-17T18:28:01.974573Z lvl=error msg="api error encountered" log_id=0aWpGhZW000 error="onboarding has already been completed"

@AnalogJ
Owner

AnalogJ commented May 17, 2022

That means your scrutiny.yaml config file is not being persisted correctly.

"secrets/scrutiny.yaml:/opt/scrutiny/config/scrutiny.yaml"

Is that secrets/scrutiny.yaml path writable? During setup, Scrutiny will attempt to configure your InfluxDB instance, then store the API token in the config file.

I'm guessing your secrets folder is similar to a Kubernetes secret mount, in which case it's not writable?

In that case, you can authenticate to the InfluxDB web UI, retrieve the API token, and store it in your scrutiny.yaml file.

https://github.com/AnalogJ/scrutiny/blob/master/docs/TROUBLESHOOTING_INFLUXDB.md#first-start
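A minimal sketch of where that token would live in scrutiny.yaml, assuming the web.influxdb keys shown in the project's example config (the thread below also refers to this key as web.influx.token, so verify the exact key path against the linked troubleshooting doc for your version):

web:
  influxdb:
    host: localhost
    port: 8086
    # paste the API token retrieved from the InfluxDB UI here (placeholder value)
    token: 'REPLACE_WITH_INFLUXDB_API_TOKEN'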

@tommyalatalo
Author

You're right, secrets/scrutiny.yaml is not persisted; it's held in memory because it's templated with my Pushover token from my secrets storage. I think I saw the document you linked before. A few questions and comments arise from this behavior:

  • Writing a value back to a user's config file is bad practice; you should never edit a config file that a user has deployed in order to run a service.
  • Why is the API token needed at all when I'm using the omnibus image? I figure scrutiny could just as well use the admin credentials it starts out with, since the influxdb database is embedded in the same container and essentially dedicated to scrutiny. This would eliminate the complexity of having to fetch the API token.
  • Why does scrutiny need a database in general? From my use case I'm only interested in monitoring my disks and getting notified if errors are detected, and for that purpose I don't see the need for scrutiny to be a stateful service. Is scrutiny storing anything vital in influxdb in addition to the temperature history, which I assume is stored there?

@AnalogJ
Owner

AnalogJ commented May 17, 2022

  • I mostly agree that updating the config file isn't ideal, but an "uncommon practice" != "bad practice". This is only necessary because the InfluxDB SDK does not support username/password auth, only API token authentication: https://github.com/influxdata/influxdb-client-go/blob/ab68e236009fd2a1b12edbd0a328f2103c4053d7/client.go#L95
    • Basically, I'm in a weird position where I need to configure a new InfluxDB instance (which adds buckets, admin users and api tokens), and I need a way to persist this auth data so scrutiny will continue to work if/when the container is restarted.
    • If you bring your own influxdb instance (in hub/spoke deployments), then you can pre-populate the token in your config file.
    • My thought process is that the config file/directory is intended to store scrutiny-specific data, so it makes sense to persist it there. If there's a lot of concern, I guess I could write the auth token to a different file in the config directory.
  • See above.
  • One of Scrutiny's primary features is S.M.A.R.T metric tracking for historical trends. It's intended to be used as a way to determine if your SMART data is changing over time, which is obviously stateful. If you only want visibility into the SMART output at a point in time, you could use smartctl directly.

Hope that answers all your questions?

@tommyalatalo
Author

tommyalatalo commented May 18, 2022

Thanks for answering all the questions, I appreciate it.

I definitely agree that uncommon practice is not equal to bad practice, though in this case I'm pretty confident that it's also the latter. I work in devops/sysops and have deployed hundreds of applications, and I can honestly say that not a single one so far has ever overwritten its own config file on startup. There are two main reasons for this, and both are present in my issue above.

The first is that the application cannot assume the config file is writable, and shouldn't expect it to be, since it's a user-supplied config that can contain credentials like the API token or Pushover token. That means a user like me can keep the file on an in-memory filesystem, like Nomad's secrets folder or a Kubernetes secret, and never persist it to disk so the credentials aren't leaked. The file could also simply be mounted read-only into Docker with :ro, which is an entirely reasonable way to improve container security by preventing a malicious process inside the container from modifying the config.

The second reason is that an application writing parameters back into its config file makes its behavior non-deterministic. You can see this in my case: I first start up scrutiny with no database and it works, and then a restart after the database has been initialized fails instead, while I'm still using the same config. The application's behavior changes, but from a user's standpoint the config is exactly the same, so the results should be expected to be the same as well.

Since I called it bad practice, I should of course give an example of what is considered good practice. One of the best examples I know of is the 12-factor app methodology, where configuration should live primarily in env vars: https://12factor.net/config. Considering this, you quickly realize that if all the config were set as env vars there would be nowhere to write the API token back to in order to persist it, which is a strong indication that the current API token handling is suboptimal.

But that's all fine; scrutiny is a work in progress, and I hope you appreciate that this is a tangent on what is, as a whole, a great project that's very useful to me as well. So let's discuss possibilities for handling the API token differently. I've built an application myself that uses influxdb heavily, but that was v1.8, so the SDK is clearly a bit different now that the API token is necessary. I'm currently thinking of two approaches:

  1. It seems that you can set an admin token when influx setup is initially run:
❯ ./influx setup --help
NAME:
    setup - Setup instance with initial user, org, bucket

USAGE:
...
COMMON OPTIONS:
...
OPTIONS:
   --token value, -t value      Auth token to set on the initial user [$INFLUX_TOKEN]
...

The official influxdb docker image does exactly this in its entrypoint: https://github.com/influxdata/influxdata-docker/blob/master/influxdb/2.2/entrypoint.sh#L224-L226

This would be by far the best way to handle this, since you can have the token supplied immediately as an env var (or in the scrutiny.yaml) so that it's used when initializing influxdb, and scrutiny can load the same value to use for its requests. This would make the whole process automated and consistent, with no restarts or tokens created in the UI needed. Just have scrutiny stop with an error if this influxdb admin token isn't set properly in the omnibus image. I actually wonder if this would work without any changes if I set INFLUX_TOKEN in the container and set web.api.token to the same value? I haven't checked the scrutiny source code for how influxdb is initialized. (A sketch of this pattern follows at the end of this comment.)

  2. Halt the program (don't shut down, since scrutiny presumably runs as pid 1) at startup if the database has been initialized and the token isn't set. The error message should be clearer and say that influxdb has been initialized but web.api.token is not set. Option 1 is far better, though.

Well that was a whole lot of discussion, I hope you didn't fall asleep halfway through :D
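A minimal sketch of the pattern described in option 1 above, using the official influxdb image's DOCKER_INFLUXDB_INIT_* variables (these are what the linked entrypoint.sh consumes); the username, password, org, bucket, and token values are placeholders, and whether Scrutiny itself can be handed the same predetermined token depends on the change discussed below:

services:
  influxdb:
    image: influxdb:2.2
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=changeme12345
      - DOCKER_INFLUXDB_INIT_ORG=scrutiny
      - DOCKER_INFLUXDB_INIT_BUCKET=metrics
      # predetermined admin token; Scrutiny would be given this same value
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=REPLACE_WITH_PREDETERMINED_TOKEN
    volumes:
      - ./influxdb:/var/lib/influxdb2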

@AnalogJ AnalogJ reopened this May 18, 2022
@AnalogJ
Owner

AnalogJ commented May 18, 2022

I ended up forking the InfluxDB SDK and just adding support for a SetupWithToken() method. This means I can leverage the same functionality as the influx setup CLI command, without needing to pull in the CLI tooling and its dependencies.

The docker image is building right now, but everything is working correctly in my local testing.
You may need to delete your influxdb folder, or retrieve your existing token and store it as web.influx.token in your existing config file.


Appreciate your detailed response above. I had already considered most of your options before going the "write a config file" route, but I thought it would add additional complexity and maintenance burden on my side. Regarding your 12-factor app comment, even with a config file Scrutiny supports overrides using env variables. The Viper config library that we use merges CLI -> Env -> config file values before providing them to the application. Secrets could/should always be provided as env variables.
As Scrutiny is primarily a "read-only" application with little-to-no need for secrets or dynamic configuration, I had thought the trade-off was worth it, but I'm happy to make this change to support this style of deployment.

@tommyalatalo
Author

tommyalatalo commented May 18, 2022

That sounds like a good solution. I thought you already bootstrapped the database with the influx CLI tool, but forking the SDK is probably a good way forward in this case; you should open a PR against the SDK and see if they'll take the code so you don't have to maintain it further.

Ah yes, I've used Viper myself; it's an excellent library, especially when paired with Cobra. And there's nothing wrong with supporting both config files and env vars, that gives a lot of flexibility for templating etc.

I'm looking forward to trying the new image when you have it ready! I'm using scrutiny across 3 hosts, one of which is a remote backup server, all in a hub/spoke setup. Overall I really like being able to monitor all my disks and have Pushover notifications set up so that I can immediately move on replacing a disk if I start to get errors. It's great to be able to do this across a cluster and manage it from one central place, rather than setting it up per host as with smartctl.

@ThisIsTheOnlyUsernameAvailable

Is it possible to use an external influxdb instance? If so, can anyone suggest what the Docker environment variables (or config options) should be?

@AnalogJ
Owner

AnalogJ commented May 20, 2022

@ThisIsTheOnlyUsernameAvailable please take a look at this example docker-compose file if you'd like to use your own influxdb docker container:

https://github.com/AnalogJ/scrutiny/blob/master/docker/example.hubspoke.docker-compose.yml

If you're using a pre-existing InfluxDB instance (already configured with users & buckets), you'll need to specify the following variables in your config file:

https://github.com/AnalogJ/scrutiny/blob/master/example.scrutiny.yaml#L42-L45
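For a pre-existing instance, the relevant scrutiny.yaml section looks roughly like the sketch below; the key names follow the linked example.scrutiny.yaml, and the values are placeholders to swap for your own instance:

web:
  influxdb:
    host: influxdb.example.internal   # hostname of your existing InfluxDB 2.x instance
    port: 8086
    org: my-org
    bucket: my-bucket
    token: 'my-preexisting-api-token'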

@AnalogJ
Owner

AnalogJ commented May 20, 2022

Closing this issue now that v0.4.6 has been released (and uses an updated version of the InfluxDB SDK with predetermined-token support).

If you're using Docker, you'll need to wait until https://github.com/AnalogJ/scrutiny/actions/runs/2359630681 completes to use the fixed image.

Thanks for all your help and feedback @tommyalatalo ! 🎉

@AnalogJ AnalogJ closed this as completed May 20, 2022
@tommyalatalo
Author

tommyalatalo commented May 21, 2022

I looked at the pipeline; are you not tagging the image with the version number in addition to just :master-omnibus?

I force-pulled the new image tagged :master-omnibus, and it works nicely! I would like to lock it to version 0.4.6 now though, so that I don't risk breakage with future updates.

@AnalogJ
Owner

AnalogJ commented May 21, 2022

Not sure what you looked at:

The docker-build.yaml GH Action will automatically build version-locked Docker images and tag them with ghcr.io/analogj/scrutiny:${VERSION}-omnibus

It seems that the v0.4.6 image failed for some reason. I'll take a look at that.

@AnalogJ
Owner

AnalogJ commented May 21, 2022

Looks like the failure was due to a timeout; retrying fixed the issue:

https://github.com/AnalogJ/scrutiny/pkgs/container/scrutiny/22808074?tag=v0.4.6-omnibus
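With that package published, pinning to the release rather than the floating master tag is just a matter of referencing the versioned tag shown above, e.g. in a compose file (the same image string works for a Nomad image variable):

services:
  scrutiny:
    # version-locked tag published by the docker-build workflow
    image: ghcr.io/analogj/scrutiny:v0.4.6-omnibus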
