Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If dbt deps can't git clone, it doesn't give the error message encountered by git #3167

Closed
1 of 5 tasks
tpilewicz opened this issue Mar 15, 2021 · 6 comments · Fixed by #4124
Closed
1 of 5 tasks

If dbt deps can't git clone, it doesn't give the error message encountered by git #3167

tpilewicz opened this issue Mar 15, 2021 · 6 comments · Fixed by #4124
Labels
cli enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors!

Comments

@tpilewicz
Copy link
Contributor

Describe the bug

When running dbt deps, if the git clone encounters an issue, the error message is:

Got a non-zero returncode running: ['git', 'clone', '--depth', '1', 'https://github.com/fivetran/dbt_fivetran_utils.git', '2570ae56bf9cb34e49fe01aa3bc99195']

In my case the situation was that I had updated OSX, which had messed up my XCode command line tools (and thus my install of git). To get the real error message encountered by git and understand my issue, I had to manually run the git command. I suggest giving the error message returned by git to the user, so that they can more easily know what happened. Also users that are not used to the terminal won't think of copying the git clone command and trying it out to see the message.

Steps To Reproduce

Make your git clone command fail during a dbt deps. For example how I did it was I added in my development version of dbt, in file /core/dbt/clients/git.py, right before the run_cmd of the clone function:

clone_cmd.append('zerezrezrze')

Normally the output you'll get is just:

Got a non-zero returncode running: ['git', 'clone', '--depth', '1', 'https://github.com/fivetran/dbt_fivetran_utils.git', '2570ae56bf9cb34e49fe01aa3bc99195', 'zerezrezrze']

Expected behavior

I expected the output of dbt deps to forward the error message from git. For example if you added something to the clone_cmd like I did you should see:

Got a non-zero returncode running: ['git', 'clone', '--depth', '1', 'https://github.com/fivetran/dbt_fivetran_utils.git', '2570ae56bf9cb34e49fe01aa3bc99195', 'zerezrezrze']
fatal: Too many arguments.

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.19.0
   latest version: 0.19.0

Up to date!

Plugins:
  - postgres: 0.19.0
  - snowflake: 0.19.0
  - redshift: 0.19.0
  - bigquery: 0.19.0

The operating system you're using: Mac OSX

The output of python --version: Python 3.6.8

Additional context

Happy to contribute if this is recognized as an issue!

@tpilewicz tpilewicz added bug Something isn't working triage labels Mar 15, 2021
@jtcohen6 jtcohen6 added cli good_first_issue Straightforward + self-contained changes, good for new contributors! enhancement New feature or request and removed triage bug Something isn't working labels Mar 16, 2021
@jtcohen6
Copy link
Contributor

Thanks for the detailed write-up @tpilewicz! I'm all for clearer error handling. I'm going to mark this a good first issue, and I'd welcome a contribution. I will say two things in advance:

  • dbt (almost) always includes the full error message in the debug-level logs
  • We should use stdout logging as a way to flag, wrap, or otherwise communicate those errors in a way that helps the user figure out what to do or where to look next

To get the real error message encountered by git and understand my issue, I had to manually run the git command. I suggest giving the error message returned by git to the user, so that they can more easily know what happened. Also users that are not used to the terminal won't think of copying the git clone command and trying it out to see the message.

I think you've hit the nail on the head: The message should make it clear-as-can-be that the error was encountered during a git clone process, as well as passing along the error and and encourage users to try running (and debugging) the git clone step on their own.

Again, it's worth saying that the full error message should be available in the debug logs. In this example, I'm trying to clone a non-existing package, https://github.com/fishtown-analytics/does-not-exist.git. The stdout log told me only about the non-zero return code, but I can pop into logs/dbt.log to see:

2021-03-16 09:22:29.052957 (MainThread): Executing "git clone --depth 1 https://github.com/fishtown-analytics/does-not-exist.git de8543fbe946ef5e90808e1f971f595a"
2021-03-16 09:22:29.593614 (MainThread): STDOUT: "b''"
2021-03-16 09:22:29.594022 (MainThread): STDERR: "b"Cloning into 'de8543fbe946ef5e90808e1f971f595a'...\nremote: Repository not found.\nfatal: repository 'https://github.com/fishtown-analytics/does-not-exist.git/' not found\n""
2021-03-16 09:22:29.594208 (MainThread): command return code=128
2021-03-16 09:22:29.595039 (MainThread): Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e8b7fa0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10eb8b070>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10eb8b370>]}
2021-03-16 09:22:29.595401 (MainThread): Flushing usage events
2021-03-16 09:22:29.999091 (MainThread): Encountered an error:
2021-03-16 09:22:29.999337 (MainThread): Got a non-zero returncode running: ['git', 'clone', '--depth', '1', 'https://github.com/fishtown-analytics/does-not-exist.git', 'de8543fbe946ef5e90808e1f971f595a']

So the question here is not whether to make that information available at all, but when to surface it more directly to the user because it matches an expected error pattern—or to include a message to the effect of, we have no idea what this error is, please check the debug logs to find out.

dbt already does a decent job of returning the git-provided error message, if the error occurs during the remote/fetch steps (e.g. because of a fake revision). There's a special exception for this, raised by the checkout step:

https://github.com/fishtown-analytics/dbt/blob/6a5ed4f41877bdcb3bd9b561ac0af2d197d234d1/core/dbt/exceptions.py#L645-L648

We could call that same exception, or a similar one, in the clone/clone_and_checkout methods as well. One of the things to watch out for is that, because clone is the first git operation dbt may be performing, it could fail because of a bad package spec, or it could fail because of a bad git installation (as in your case). So we may need conditional exception handling based on what's gone wrong. I'm hopeful that a bad symlink or non-executable git would raise a different set of error messages, via OSError raised by run_cmd.

As an example, here's some pseudo-code for simple handling and a clearer exception in the case where the repository does not exist:

$ git diff
diff --git a/core/dbt/clients/git.py b/core/dbt/clients/git.py
index 0bbf937d..414ec5f5 100644
--- a/core/dbt/clients/git.py
+++ b/core/dbt/clients/git.py
@@ -79,6 +79,8 @@ def clone_and_checkout(repo, cwd, dirname=None, remove_git_dir=False,
     except dbt.exceptions.CommandResultError as exc:
         err = exc.stderr.decode('utf-8')
         exists = re.match("fatal: destination path '(.+)' already exists", err)
+        if 'not found' in err:
+            dbt.exceptions.bad_package_spec(repo, branch, err.strip())
         if not exists:  # something else is wrong, raise it
             raise
packages:
  - git: https://github.com/fishtown-analytics/does-not-exist.git
    revision: fake-branch

Before:

$ dbt deps
Running with dbt=0.19.0
Encountered an error:
Got a non-zero returncode running: ['git', 'clone', '--depth', '1', 'https://github.com/fishtown-analytics/does-not-exist.git', 'de8543fbe946ef5e90808e1f971f595a']

After:

$ dbt deps
Running with dbt=0.19.0
Encountered an error:
Error checking out spec='fake-branch' for repo https://github.com/fishtown-analytics/does-not-exist.git
Cloning into 'de8543fbe946ef5e90808e1f971f595a'...
remote: Repository not found.
fatal: repository 'https://github.com/fishtown-analytics/does-not-exist.git/' not found

@tpilewicz
Copy link
Contributor Author

Thanks for your feedback @jtcohen6 :)

Alright then I'll try to do both these things you suggested:

  • distinguish the errors and raise them with exceptions coming from exceptions.py, adding some exception classes if needed
  • print something to the user when we don't know what the error is to encourage them to look at debug logs

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Oct 14, 2021
@Gitznik
Copy link
Contributor

Gitznik commented Oct 19, 2021

@jtcohen6 if this is still an issue and @tpilewicz is no longer working on it, I'm happy to give it a look.

@jtcohen6
Copy link
Contributor

@Gitznik I think it's fair game!

@github-actions github-actions bot removed the stale Issues that have gone stale label Oct 20, 2021
@tpilewicz
Copy link
Contributor Author

@Gitznik go for it! I don't have time for working on this right now 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cli enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants