-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Debugger Plugin] Fix Python unit tests for TF2 #2583
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@@ -37,6 +37,7 @@ | |||
|
|||
import tensorflow as tf | |||
from tensorflow.python import debug as tf_debug | |||
from tensorflow.python import tf2 as tensorflow_python_tf2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add
# See discussion on issue #1996 for private module import justification.
from tensorflow.python import tf2 as tensorflow_python_tf2
(as in test_util.py
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
self.assertIn(('d', 0, 'DebugIdentity'), gated_debug_ops) | ||
self.assertIn(('d/read', 0, 'DebugIdentity'), gated_debug_ops) | ||
|
||
if not tensorflow_python_tf2.enabled(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We’ll shortly stop running tests on TF 1.x, so these will become dead
code. Can we invert the guard and test against the TF 2.x names instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this chunk of assertions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that works, if you’re satisfied with the coverage.
@@ -33,7 +33,7 @@ | |||
import numpy as np | |||
import portpicker # pylint: disable=import-error | |||
from six.moves import urllib # pylint: disable=wrong-import-order | |||
import tensorflow as tf # pylint: disable=wrong-import-order | |||
import tensorflow # pylint: disable=wrong-import-order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be import tensorflow.compat.v1 as tf
rather than leaving a
lurking (unused) tensorflow
identifier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried that this will lead to copybara issues, as I've run into in tensorflow (see CL/264270116). The identifier tensorflow
is not unused. It's used on line 47.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use import tensorflow.compat.v1 as tf
extensively in TensorBoard
(you can find existing uses with git grep
), and it doesn’t cause any
Copybara issues.
The identifier
tensorflow
is not unused. It's used on line 47.
Of course, but if you change this to import tensorflow.compat.v1 as tf
then line 47 goes away, and there are no other uses of tensorflow
. The
identifier is “lurking” in that tensorflow
and tf
are both in-scope
identifiers but have different values, which is confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I didn't know import tensorflow.compat.v1 as tf
was used extensively in TensorBoard. Done.
@@ -44,11 +44,13 @@ | |||
from tensorboard.plugins.debugger import interactive_debugger_plugin | |||
from tensorboard.util import test_util | |||
|
|||
tf = tensorflow.compat.v1 | |||
tf.disable_v2_behavior() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting: simply adding tf.compat.v1.disable_v2_behavior()
without
the rest of the patch doesn’t suffice to fix the timeouts on my end, so
some of the API differences must(?) be causing these unbounded
timeouts—do we know which ones?
In the short term, this patch is great because it lets us continue to
run the tests, but ideally we’d like to get rid of tf.compat.v1
wherever possible, too (#1718).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment here. The Debugger Plugin is currently tied to v1 behavior (tf.Session
s), which is the reason why it is necessary to pin V1 behavior here. When tfdbg V2 comes along, this will of course no longer be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that makes sense. But if session and node behavior were the only
salient changes then we would expect adding disable_v2_behavior()
to
fix the tests, right? And yet if we make only that change then the
testMultipleSessionRunsTensorValueFullHistory
test case hangs at
various points—sometimes in the listen
call in setUp
, sometimes in
the first call to _serverGet
in the test body. I was wondering why
that is, and why the remaining changes, which just change tf.foo
to
tf.compat.v1.foo
, do fix the tests.
(This isn’t necessarily a blocking question, but I would be interested
to learn how this PR actually works.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirmed what you said. It seems that making only the change of
- Using
import tensorflow.compat.v1 as tf
- Using
tf.disable_v2_behavior()
leads to hanging tests. This looks like some unexpected behavior in how tf.compat.v1.compat.v1....
behaves.
As you said, this doesn't have to be blocking. I can make a TODO item for myself to investigate it and potentially file bug for tensorflow while merging this PR. LMK WYT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using tf.compat.v1.compat.v1
will simply not work, in either TF 1.x or
TF 2.x. But making only the disable_v2_behavior()
change seems like
it should work, because the rest of the code already uses
tf.compat.v1.Session()
explicitly.
This is good to merge; feel free to investigate iff you feel like it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
disable_v2_behavior()
exists only in tf
from TF1 or tf.compat.v1
from TF2. So to use it, we'd have to do import tf.compat.v1 as tf
regardless (TF1 tests are passing, so they are out of concern). But if you use that and don't change anything below, you'll just end up using tf.compat.v1.compat.v1.*
, which I think is what causes the hanging.
I'll merge the PR. Thanks for the review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me be concrete: the patch that I’m talking about is simply
diff --git a/tensorboard/plugins/debugger/interactive_debugger_plugin_test.py b/tensorboard/plugins/debugger/interactive_debugger_plugin_test.py
index 1b127456..653af030 100644
--- a/tensorboard/plugins/debugger/interactive_debugger_plugin_test.py
+++ b/tensorboard/plugins/debugger/interactive_debugger_plugin_test.py
@@ -45,6 +45,9 @@ from tensorboard.plugins.debugger import interactive_debugger_plugin
from tensorboard.util import test_util
+tf.compat.v1.disable_v2_behavior()
+
+
_SERVER_URL_PREFIX = '/data/plugin/debugger/'
against master
prior to this PR (i.e., 2b96c2a).
This properly disables V2 behavior, does not use compat.v1.compat.v1
,
and uses tf.compat.v1.Summary
, but still hangs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And tf.compat.v1.compat.v1
is a hard error, not a loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acknowledged. Thanks for the concrete info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, @wchargin
@@ -37,6 +37,7 @@ | |||
|
|||
import tensorflow as tf | |||
from tensorflow.python import debug as tf_debug | |||
from tensorflow.python import tf2 as tensorflow_python_tf2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
self.assertIn(('d', 0, 'DebugIdentity'), gated_debug_ops) | ||
self.assertIn(('d/read', 0, 'DebugIdentity'), gated_debug_ops) | ||
|
||
if not tensorflow_python_tf2.enabled(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this chunk of assertions.
@@ -33,7 +33,7 @@ | |||
import numpy as np | |||
import portpicker # pylint: disable=import-error | |||
from six.moves import urllib # pylint: disable=wrong-import-order | |||
import tensorflow as tf # pylint: disable=wrong-import-order | |||
import tensorflow # pylint: disable=wrong-import-order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried that this will lead to copybara issues, as I've run into in tensorflow (see CL/264270116). The identifier tensorflow
is not unused. It's used on line 47.
@@ -44,11 +44,13 @@ | |||
from tensorboard.plugins.debugger import interactive_debugger_plugin | |||
from tensorboard.util import test_util | |||
|
|||
tf = tensorflow.compat.v1 | |||
tf.disable_v2_behavior() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment here. The Debugger Plugin is currently tied to v1 behavior (tf.Session
s), which is the reason why it is necessary to pin V1 behavior here. When tfdbg V2 comes along, this will of course no longer be necessary.
Re-reviewed. |
@@ -44,11 +44,13 @@ | |||
from tensorboard.plugins.debugger import interactive_debugger_plugin | |||
from tensorboard.util import test_util | |||
|
|||
tf = tensorflow.compat.v1 | |||
tf.disable_v2_behavior() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using tf.compat.v1.compat.v1
will simply not work, in either TF 1.x or
TF 2.x. But making only the disable_v2_behavior()
change seems like
it should work, because the rest of the code already uses
tf.compat.v1.Session()
explicitly.
This is good to merge; feel free to investigate iff you feel like it.
Thanks for the fix! |
Motivation for features / changes
Technical description of changes