-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Dockerfile to individual components #157
Conversation
This PR adds the following
The Travis changes were made kind of in a blind mode, as I'm not sure how I can test it. The bash parts were tested as much as I could. I'm also not quite sure how I can remove the submodule updates to this commit, as I never really dealt with submodules. Any hints there is highly appreciated :) |
plugin/storage/cassandra/schema.sh
Outdated
@@ -0,0 +1,31 @@ | |||
#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure if this is the most appropriate for this script. It has to be either on the same directory as the Dockerfile or in a sub directory, but not on a parent directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks ok here, given the other scripts. I would just rename it to create-schema.sh
@@ -0,0 +1,6 @@ | |||
FROM centos:7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The images are inheriting the CentOS 7 images, as those provide some tools which are useful for debugging. Once the images mature a bit on the community, we can change the base image to Alpine or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The smallest base image that still allows basic debug abilities would be best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CentOS image isn't that big (192.5 MB), which is shared by all images. But if you have a specific image to suggest, I could try that out and see if it works.
@@ -16,6 +16,9 @@ matrix: | |||
- go: 1.7 | |||
env: | |||
- CROSSDOCK=true | |||
- go: 1.7 | |||
env: | |||
- DOCKER=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to admit that I don't know for sure what this does. I guess this will trigger different executors on Travis, but I just did a copy/paste of the sections above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a travis test matrix; for each one of the entries, travis will kick off a new build with the environment variables set in the entry. We did this to speed up the travis build process by parallelizing the subcomponents. The copy pasta is good
@@ -46,7 +47,7 @@ md-to-godoc-gen: | |||
|
|||
.PHONY: clean | |||
clean: | |||
rm -rf cover.out cover.html lint.log fmt.log | |||
rm -rf cover.out cover.html lint.log fmt.log jaeger-ui-build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really belong to this commit. I hope it's OK to leave this change here.
rm -rf cmd/query/jaeger-ui-build | ||
|
||
.PHONY: docker-push | ||
docker-push: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This target is only useful for development purposes and it's not being used for the actual builds. Should this be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we should never be manually pushing and always have travis push the repos on a successful build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think we'd ever have need for manually pushing anything; should always wait for travis to push on a successful build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did quite a few times under my own namespace, to test the images on OpenShift/Kubernetes. If I add a comment to this target, saying that this is intended only for local development purposes, would that be OK to leave it there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
@@ -0,0 +1,7 @@ | |||
FROM jpkroehling/cassandra |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was discussed on the mailing list. I'll either change this to use the "official" Dockerhub image for Cassandra or change to use an image we are building for OpenShift. I don't expect to change this before this PR is merged, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add TODO here and a task once this lands
@@ -16,6 +16,9 @@ matrix: | |||
- go: 1.7 | |||
env: | |||
- CROSSDOCK=true | |||
- go: 1.7 | |||
env: | |||
- DOCKER=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a travis test matrix; for each one of the entries, travis will kick off a new build with the environment variables set in the entry. We did this to speed up the travis build process by parallelizing the subcomponents. The copy pasta is good
rm -rf cmd/query/jaeger-ui-build | ||
|
||
.PHONY: docker-push | ||
docker-push: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we should never be manually pushing and always have travis push the repos on a successful build.
rm -rf cmd/query/jaeger-ui-build | ||
|
||
.PHONY: docker-push | ||
docker-push: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think we'd ever have need for manually pushing anything; should always wait for travis to push on a successful build.
@@ -0,0 +1,6 @@ | |||
FROM centos:7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The smallest base image that still allows basic debug abilities would be best.
@@ -0,0 +1,7 @@ | |||
FROM jpkroehling/cassandra |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add TODO here and a task once this lands
@@ -43,3 +46,4 @@ script: | |||
- if [ "$COVERAGE" == true ]; then travis_retry goveralls -coverprofile=cover.out -service=travis-ci || true ; else echo 'skipping coverage'; fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on L41 above, you probably have to change it to:
- if [[ "$ALL_IN_ONE" == true || "$DOCKER" == true ]]; then bash ./travis/install-ui-deps.sh ; fi
because we need yarn to build the ui which is needed in build-docker-images.sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or we can add a new environment variable called INSTALL_YARN and add it to the matrix for ALL_IN_ONE and DOCKER
|
||
source ~/.nvm/nvm.sh | ||
nvm use 6 | ||
DOCKER_NAMESPACE=jaegertracing make docker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stupid question incoming: shouldn't there be a newline before make docker
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. In bash, this has the effect of setting this var for this command, so, make
will see DOCKER_NAMESPACE as jaegertracing
without the parent script needing to export
it first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL
.PHONY: docker | ||
docker: build_ui build-agent-linux build-collector-linux build-query-linux | ||
cp -r jaeger-ui-build/build/ cmd/query/jaeger-ui-build | ||
docker build -t $(DOCKER_NAMESPACE)/jaeger-cassandra-schema plugin/storage/cassandra/ ; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this jaeger-cassandra instead? The schema gets applied on startup and after that it's essentially just a cassandra box. The name might throw off some people.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no "after" for this image :) It's used as a Kubernetes "Job": it will just run the shell script that generates the keyspace and will gracefully die afterwards. It's derived from the Cassandra image because it needs the cqlsh
tools, so, I thought it would be easier (and save space) to just consume a image that I know existed there, but Cassandra itself will never start on this image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahhh gotcha. Maybe jaeger-cassandra-bootstrap? fine with jaeger-cassandra-schema too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to use the same mechanism across all artifacts.
@black-adder It sounds like this is doing something different from what e2e test is doing to init Cassandra.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we do for e2e is pretty ridiculous though, we keep bringing up collector and wait until it doesn't fatal out due to cassandra not being ready. I prefer this new approach.
^ above answer is an answer to a different c* question below
For this question, we have the e2e handler actively probe the cassandra cluster to see if it's ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I talked to @jsanda last week about something related to this and he made me realize that this might also not be appropriate.
In any case: I think we could add a separate task for having a more reliable bootstrap method, perhaps consistent among all deployment architectures. In the meantime, I'd keep it like this, so that we can move forward.
@@ -111,6 +112,21 @@ build-query-linux: | |||
build-collector-linux: | |||
CGO_ENABLED=0 GOOS=linux installsuffix=cgo go build -o ./cmd/collector/collector-linux ./cmd/collector/main.go | |||
|
|||
.PHONY: docker | |||
docker: build_ui build-agent-linux build-collector-linux build-query-linux | |||
cp -r jaeger-ui-build/build/ cmd/query/jaeger-ui-build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not clear why this is needed. We already build all-in-one without copying this dir into query source, instead we just mount it to the Docker image from ./jaeger-ui-build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because the Dockerfile is within the cmd/query , and Docker can only build stuff that is on its own directory or any sub directory, but not in any parent directory (for security reasons I suppose). So, we need to have a copy there.
.PHONY: docker | ||
docker: build_ui build-agent-linux build-collector-linux build-query-linux | ||
cp -r jaeger-ui-build/build/ cmd/query/jaeger-ui-build | ||
docker build -t $(DOCKER_NAMESPACE)/jaeger-cassandra-schema plugin/storage/cassandra/ ; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to use the same mechanism across all artifacts.
@black-adder It sounds like this is doing something different from what e2e test is doing to init Cassandra.
@@ -0,0 +1,6 @@ | |||
FROM centos:7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we'd put all Docker files in one dir. But no strong feeling either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, as long as the contents for the Docker images are in sub directories from where the Dockerfiles are located. I'd rather keep it alongside with the image's contents.
|
||
COPY collector-linux /go/bin/ | ||
|
||
CMD ["/go/bin/collector-linux"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment applies to all images - don't we need to be translating env vars into command line switches, to make the images follow https://12factor.net/ ?
Ideally our binaries could be reading env vars directly, there are frameworks like spf13/Cobra that support that, but we haven't implemented it yet. So the only way currently to configure each binary is to translate env vars into command line switches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily: I'm not quite sure how it would work for Docker Compose, but for Docker at the command line, Kubernetes and OpenShift, you can override the command line and set extra env vars.
So, the best approach is to specify only what's really required to get the binary running, so that you don't need to maintain two places with default values.
This is an example of how to set an extra env var:
$ docker run -e MY_ENV_VAR=test -it jpkroehling/jaeger-agent bash
[root@86fc09d20bfc /]# echo $MY_ENV_VAR
test
And here is an example of how to override the command to execute when running an image: https://github.com/jpkrohling/jaeger-openshift/blob/JPK-OpenShift-IndividualComponents/template.yml#L147-L149
plugin/storage/cassandra/schema.sh
Outdated
echo "Generating the schema for the keyspace ${KEYSPACE} and datacenter ${DATACENTER}" | ||
export KEYSPACE | ||
|
||
# the `test` parameter is to force the script to use a SimpleStrategy instead of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? simple strategy only makes sense in test env with a single Cassandra host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually using the prod
, but got into a Cassandra issue. Something like Error reading service_names from storage: Cannot achieve consistency level LOCAL_ONE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prod
configuration defines replication factor 2, so you need at least 2 Cassandra nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that's it then. I'm not that familiar with Cassandra and noticed that I could change the script to test
and it would work :-) My deployment script creates 2 Cassandra nodes, but it seems it was still not enough to avoid this error.
Should we move all these Cassandra-related tasks to a new task?
plugin/storage/cassandra/schema.sh
Outdated
@@ -0,0 +1,31 @@ | |||
#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks ok here, given the other scripts. I would just rename it to create-schema.sh
plugin/storage/cassandra/schema.sh
Outdated
|
||
CQLSH_HOST=${CQLSH_HOST:-"cassandra"} | ||
CASSANDRA_WAIT_TIMEOUT=${CASSANDRA_WAIT_TIMEOUT:-"60"} | ||
DATACENTER=${DATACENTER:-"openshift"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't look like a good default. I am actually not sure what datacenter the official Cassandra image mentions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The datacenter is actually required by the existing schema generator: https://github.com/uber/jaeger/blob/master/plugin/storage/cassandra/cassandra3v001-schema.sh#L23
So, it's either prod
or test
, and when passing prod
, a datacenter has to be provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's required for prod
because it uses the name when defining replication strategy. That datacenter must match what Cassandra uses in its cassandra-rackdc.properties
.
plugin/storage/cassandra/schema.sh
Outdated
total_wait=0 | ||
while true | ||
do | ||
ping -c 1 ${CQLSH_HOST} > /dev/null 2>&1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the typical way of checking that the container app is running? Seems line it might succeed as soon as the container / IP is available, but the Cassandra server may still be starting.
@black-adder what did we use for e2e test? Didn't we wait for the actual response from Cassandra?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kubernetes/OpenShift will establish the connection only once the target service has a container passing the "readiness" check. We don't have such a check right now, but this is more like a hack and might eventually fail on the situation you just mentioned.
I think @jsanda can help us here in coming up with a reliable solution, but I'd prefer to have it as part of another task.
@yurishkuro , @black-adder , I just updated this PR, rebasing it on the latest master and adding a couple of fixes based on the comment reviews. If this looks good, I'll squash it and it would be ready for merging. There are still a couple of items open for Cassandra, mostly related to tuning it to be more resilient and reliable (replication factor, schema creation, ...) , but I'd like to have that as another follow up task, instead of blocking this PR. I'll then ask @jsanda for help on that task, as he's far more experienced with Cassandra than I am :-) |
|
You will certainly run into problems by just pinging the the pod. Cassandra start up times will vary, particularly when you take commit log replay into consideration. That can slow down start up significantly. I suggest using To avoid overly long start up times and to avoid commit log corruption, I strongly recommend running |
I updated the PR to use another command for checking Cassandra's state. I decided not to use
It works now. I had to add a dedicated conditional on
They are created on demand on the first build, provided the account has push permissions on the organization.
It's actually done on the
I tried every git trick I know, but the only thing that helped was to do a diff against master, create a new branch off of master and apply the new diff onto this new branch :/ I also opened #175 to track the points to improve in our usage of Cassandra. |
Rebased with the latest master. This is ready for a re-review or eventual merge. |
No description provided.