Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Debugger Plugin] Fix Python unit tests for TF2 #2583

Merged
merged 7 commits into from
Aug 21, 2019
Merged

[Debugger Plugin] Fix Python unit tests for TF2 #2583

merged 7 commits into from
Aug 21, 2019

Conversation

caisq
Copy link
Contributor

@caisq caisq commented Aug 20, 2019

  • Motivation for features / changes

  • Technical description of changes

    • In session_debug_test.py and interactive_debugger_plugin_test.py: use tf.compat.v1 and disable_v2_behavior(), as tfdbg (V1) is indeed wedded to the Session::Run() paradigm.
    • In debug_graphs_helper_test.py: make conditional assertions to fit v2-specific behaviors

@caisq caisq changed the title Debugger tf2 fixes [Debugger Plugin] Fix Python unit tests for TF2 Aug 20, 2019
@caisq caisq requested a review from wchargin August 20, 2019 20:49
Copy link
Contributor

@wchargin wchargin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -37,6 +37,7 @@

import tensorflow as tf
from tensorflow.python import debug as tf_debug
from tensorflow.python import tf2 as tensorflow_python_tf2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add

# See discussion on issue #1996 for private module import justification.
from tensorflow.python import tf2 as tensorflow_python_tf2

(as in test_util.py)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

self.assertIn(('d', 0, 'DebugIdentity'), gated_debug_ops)
self.assertIn(('d/read', 0, 'DebugIdentity'), gated_debug_ops)

if not tensorflow_python_tf2.enabled():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We’ll shortly stop running tests on TF 1.x, so these will become dead
code. Can we invert the guard and test against the TF 2.x names instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this chunk of assertions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that works, if you’re satisfied with the coverage.

@@ -33,7 +33,7 @@
import numpy as np
import portpicker # pylint: disable=import-error
from six.moves import urllib # pylint: disable=wrong-import-order
import tensorflow as tf # pylint: disable=wrong-import-order
import tensorflow # pylint: disable=wrong-import-order
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be import tensorflow.compat.v1 as tf rather than leaving a
lurking (unused) tensorflow identifier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried that this will lead to copybara issues, as I've run into in tensorflow (see CL/264270116). The identifier tensorflow is not unused. It's used on line 47.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use import tensorflow.compat.v1 as tf extensively in TensorBoard
(you can find existing uses with git grep), and it doesn’t cause any
Copybara issues.

The identifier tensorflow is not unused. It's used on line 47.

Of course, but if you change this to import tensorflow.compat.v1 as tf
then line 47 goes away, and there are no other uses of tensorflow. The
identifier is “lurking” in that tensorflow and tf are both in-scope
identifiers but have different values, which is confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I didn't know import tensorflow.compat.v1 as tf was used extensively in TensorBoard. Done.

@@ -44,11 +44,13 @@
from tensorboard.plugins.debugger import interactive_debugger_plugin
from tensorboard.util import test_util

tf = tensorflow.compat.v1
tf.disable_v2_behavior()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting: simply adding tf.compat.v1.disable_v2_behavior() without
the rest of the patch doesn’t suffice to fix the timeouts on my end, so
some of the API differences must(?) be causing these unbounded
timeouts—do we know which ones?

In the short term, this patch is great because it lets us continue to
run the tests, but ideally we’d like to get rid of tf.compat.v1
wherever possible, too (#1718).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment here. The Debugger Plugin is currently tied to v1 behavior (tf.Sessions), which is the reason why it is necessary to pin V1 behavior here. When tfdbg V2 comes along, this will of course no longer be necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that makes sense. But if session and node behavior were the only
salient changes then we would expect adding disable_v2_behavior() to
fix the tests, right? And yet if we make only that change then the
testMultipleSessionRunsTensorValueFullHistory test case hangs at
various points—sometimes in the listen call in setUp, sometimes in
the first call to _serverGet in the test body. I was wondering why
that is, and why the remaining changes, which just change tf.foo to
tf.compat.v1.foo, do fix the tests.

(This isn’t necessarily a blocking question, but I would be interested
to learn how this PR actually works.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed what you said. It seems that making only the change of

  1. Using import tensorflow.compat.v1 as tf
  2. Using tf.disable_v2_behavior()

leads to hanging tests. This looks like some unexpected behavior in how tf.compat.v1.compat.v1.... behaves.

As you said, this doesn't have to be blocking. I can make a TODO item for myself to investigate it and potentially file bug for tensorflow while merging this PR. LMK WYT.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using tf.compat.v1.compat.v1 will simply not work, in either TF 1.x or
TF 2.x. But making only the disable_v2_behavior() change seems like
it should work, because the rest of the code already uses
tf.compat.v1.Session() explicitly.

This is good to merge; feel free to investigate iff you feel like it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disable_v2_behavior() exists only in tf from TF1 or tf.compat.v1 from TF2. So to use it, we'd have to do import tf.compat.v1 as tf regardless (TF1 tests are passing, so they are out of concern). But if you use that and don't change anything below, you'll just end up using tf.compat.v1.compat.v1.*, which I think is what causes the hanging.

I'll merge the PR. Thanks for the review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me be concrete: the patch that I’m talking about is simply

diff --git a/tensorboard/plugins/debugger/interactive_debugger_plugin_test.py b/tensorboard/plugins/debugger/interactive_debugger_plugin_test.py
index 1b127456..653af030 100644
--- a/tensorboard/plugins/debugger/interactive_debugger_plugin_test.py
+++ b/tensorboard/plugins/debugger/interactive_debugger_plugin_test.py
@@ -45,6 +45,9 @@ from tensorboard.plugins.debugger import interactive_debugger_plugin
 from tensorboard.util import test_util
 
 
+tf.compat.v1.disable_v2_behavior()
+
+
 _SERVER_URL_PREFIX = '/data/plugin/debugger/'
 
 

against master prior to this PR (i.e., 2b96c2a).

This properly disables V2 behavior, does not use compat.v1.compat.v1,
and uses tf.compat.v1.Summary, but still hangs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And tf.compat.v1.compat.v1 is a hard error, not a loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. Thanks for the concrete info.

Copy link
Contributor Author

@caisq caisq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, @wchargin

@@ -37,6 +37,7 @@

import tensorflow as tf
from tensorflow.python import debug as tf_debug
from tensorflow.python import tf2 as tensorflow_python_tf2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

self.assertIn(('d', 0, 'DebugIdentity'), gated_debug_ops)
self.assertIn(('d/read', 0, 'DebugIdentity'), gated_debug_ops)

if not tensorflow_python_tf2.enabled():
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this chunk of assertions.

@@ -33,7 +33,7 @@
import numpy as np
import portpicker # pylint: disable=import-error
from six.moves import urllib # pylint: disable=wrong-import-order
import tensorflow as tf # pylint: disable=wrong-import-order
import tensorflow # pylint: disable=wrong-import-order
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried that this will lead to copybara issues, as I've run into in tensorflow (see CL/264270116). The identifier tensorflow is not unused. It's used on line 47.

@@ -44,11 +44,13 @@
from tensorboard.plugins.debugger import interactive_debugger_plugin
from tensorboard.util import test_util

tf = tensorflow.compat.v1
tf.disable_v2_behavior()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment here. The Debugger Plugin is currently tied to v1 behavior (tf.Sessions), which is the reason why it is necessary to pin V1 behavior here. When tfdbg V2 comes along, this will of course no longer be necessary.

@caisq caisq requested a review from wchargin August 21, 2019 02:11
@wchargin
Copy link
Contributor

Re-reviewed.

@@ -44,11 +44,13 @@
from tensorboard.plugins.debugger import interactive_debugger_plugin
from tensorboard.util import test_util

tf = tensorflow.compat.v1
tf.disable_v2_behavior()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using tf.compat.v1.compat.v1 will simply not work, in either TF 1.x or
TF 2.x. But making only the disable_v2_behavior() change seems like
it should work, because the rest of the code already uses
tf.compat.v1.Session() explicitly.

This is good to merge; feel free to investigate iff you feel like it.

@caisq caisq merged commit 87d2927 into tensorflow:master Aug 21, 2019
@caisq caisq deleted the debugger-tf2-fixes branch August 21, 2019 15:40
@wchargin
Copy link
Contributor

Thanks for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants