Fix fromjson() to support reading from stdin #667

yaniv-aknin · 2024-04-25T05:42:53Z

Trivial PR to support reading from stdin (source=None) in fromjson().

Two things you may want me to do before merging:

1. Add tests. I didn't add any because I couldn't find tests for the use of source=None in any io module (e.g., not in csv either). More generally, perhaps I missed them, but I didn't see any tests that fork/exec the bin/petl executable at all.
2. Add source=None to other modules. I ran git grep 'def from' | grep source | grep -v source=None and found a few more that lack it; let me know if you'd like me to add it to any or all of them.
   Modules which appear to be missing source=None: avro, bcolz, pytables (maybe), xml
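For readers unfamiliar with the convention, here is a minimal sketch of what "source=None means stdin" can look like. The helper name load_json is hypothetical and is not petl's API; petl's own source handling is not reproduced here.

```python
import io
import json
import sys


def load_json(source=None, **kwargs):
    """Hypothetical helper (not petl's API): parse JSON from a path,
    a file-like object, or stdin when source is None."""
    if source is None:
        # No source given: fall back to standard input.
        return json.load(sys.stdin, **kwargs)
    if hasattr(source, "read"):
        # Already an open file-like object.
        return json.load(source, **kwargs)
    # Otherwise treat source as a filesystem path.
    with io.open(source, "r", encoding="utf-8") as f:
        return json.load(f, **kwargs)
```

With a default of None, a shell pipeline can call the function with no source argument at all, which is the behaviour this PR is after for fromjson().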

Checklist

Use this checklist to ensure the quality of pull requests that include new code and/or make changes to existing code.

coveralls · 2024-04-25T05:46:45Z

Pull Request Test Coverage Report for Build 9050397434

Details

11 of 11 (100.0%) changed or added relevant lines in 2 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.03%) to 91.105%

Totals
Change from base Build 8715866607:	0.03%
Covered Lines:	13366
Relevant Lines:	14671

💛 - Coveralls

juarezr · 2024-04-25T20:01:44Z

petl/io/json.py

@@ -16,7 +16,7 @@
 from petl.util.base import data, Table, dicts as _dicts, iterpeek


-def fromjson(source, *args, **kwargs):
+def fromjson(source=None, *args, **kwargs):


@yaniv-aknin

Code scanning / Prospector (reported by Codacy)

Keyword argument before variable positional arguments list in the definition of fromjson function (keyword-arg-before-vararg)

Wouldn't this break existing code?

Maybe read_source_from_arg can be taught to read even when stdin is passed as an argument, as in:

table1 = etl.fromjson(sys.stdin, header=["foo", "bar"])
print(table1)

yaniv-aknin · 2024-04-25

I might be wrong, but I don't think it should break existing code.

All these assertions pass -

>>> def f(s, *a, **kw):
...     return (s, a, kw)
...
>>> def fn(s=None, *a, **kw):
...     return (s, a, kw)
...
>>> assert f("hello") == fn("hello")
>>> assert f(None) == fn(None)
>>> assert f("hello", 1, 2) == fn("hello", 1, 2)
>>> assert f("hello", k=1) == fn("hello", k=1)
>>> assert f(s="hello") == fn(s="hello")
>>> assert f("hello", 1, 2, k=1, j=2) == fn("hello", 1, 2, k=1, j=2)

re. read_source_from_arg() - my interest is in the petl executable, i.e., I would love for syntax like this to work in shell: echo '{"a":2, "b":4}' | petl 'fromjson(lines=True)'.

What do you think?
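As a complement to the assertions above, this short sketch shows where the two signatures do differ: only calls that omit the first argument behave differently, and those calls fail under the original signature. It also illustrates what the keyword-arg-before-vararg warning is hinting at. The names f and fn are the same toy functions used above.

```python
def f(s, *a, **kw):
    return (s, a, kw)


def fn(s=None, *a, **kw):
    return (s, a, kw)


# With a default, calling with no positional argument now succeeds ...
assert fn() == (None, (), {})
assert fn(k=1) == (None, (), {"k": 1})

# ... whereas the original signature rejects such a call.
try:
    f()
except TypeError:
    no_arg_call_fails = True
else:
    no_arg_call_fails = False
assert no_arg_call_fails

# What the linter warns about: the first positional argument always
# binds to s, so extra positionals cannot be passed while s keeps
# its default value.
assert fn(1, 2) == (1, (2,), {})
```

So every call that worked before keeps working, and the only new behaviour is that previously invalid calls become valid.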

juarezr · 2024-04-28

I think that:

1. Using petl as a command line tool is an interesting use case and we should give it some love by:
   a. Adding more examples in the documentation.
   b. Mentioning this usage pattern in the repository README/front page.

2. Currently the fromjson() source argument is not consistent with other functions like:
   a. tojson() and tojsonarrays()
   b. fromcsv()

3. We value not breaking existing code.
   a. But I haven't had the time to check if this is the case yet.
   b. If not, it will be a good change for API consistency.
   c. I would love to hear more opinions about this change.


juarezr · 2024-04-28T20:43:21Z

Adding to the investigation:

The fromjson() parameters args and kwargs are forwarded to the json library call:

def fromjson(source, *args, **kwargs):
    """..."""
   ...

...
dicts = json.load(f, *self.args, **self.kwargs)
...

juarezr · 2024-04-28T21:06:53Z

But the json library calls used take only one positional argument, followed by keyword-only arguments:

def json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    ...

def json.loads(s, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    ...

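Given these signatures, it looks like the args parameter serves no purpose for the json.load and json.loads calls: they accept no positional arguments beyond the first, so anything passed through args would be rejected rather than used.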

Now I'm wondering if the function syntax shouldn't be:

def fromjson(source=None, *, **kwargs):
    """..."""
   ...
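A quick way to check how json.load treats forwarded arguments, given the keyword-only signatures quoted above. This is a sketch that assumes Python 3.6 or later, where the optional parameters of json.load are keyword-only; older interpreters may behave differently.

```python
import io
import json

payload = '{"a": 2, "b": 4}'

# Keyword arguments are accepted and take effect as expected.
parsed = json.load(io.StringIO(payload), parse_int=float)
assert parsed == {"a": 2.0, "b": 4.0}

# An extra positional argument is rejected under the keyword-only
# signature instead of being silently dropped.
try:
    json.load(io.StringIO(payload), dict)
except TypeError as exc:
    extra_positional_error = str(exc)
else:
    extra_positional_error = None

assert extra_positional_error is not None
```

In other words, forwarding a non-empty args tuple would fail at the json.load call, which supports the idea that the parameter has no practical use there.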

juarezr · 2024-05-04T16:48:02Z

@yaniv-aknin,

After looking at the source code, I've concluded that your proposal looks good.

Also, some considerations:

1. It seems like some functions are not ready to use stdin, or can't work with stdin at all. For example:
   - pytables function fromhdf5
   - bcolz function frombcolz
2. As fromjson receives but doesn't use the args parameter, it may be worth changing the function signature as described above.
3. This would be a good case for adding some test cases to avoid future regressions, as the test suite is the main guarantee of stability for petl. The test cases will ensure that the petl executable doesn't stop working after new features are merged.

So, what are your plans for this PR?

yaniv-aknin · 2024-05-04T17:24:58Z

Thanks @juarezr , this sounds good.

1. I'd be happy to add source=None to more cases where it would work easily. Where it would take more substantial refactoring (or is simply infeasible), I'd leave it unchanged.
2. I'm mildly indifferent about changing the signature, but don't mind doing it of course. If (source=None, *, **kwargs) looks good to you, great, I'll do that.
3. I'm also happy to add some tests that fork/exec petl, and in particular will test the "pipe from stdin" case. I propose to:
   - Create a new test module petl/test/test_script.py
   - Have it find bin/petl in the package directory and fork/exec it using the subprocess module, testing results
   - Add a few simple test cases around that
   - Make the above pass in petl's supported environments (2.7, 3.6-3.12)

Does that sound reasonable? If so, I can update the PR.

juarezr · 2024-05-06T16:49:05Z

After checking it, I've found that the change in signature won't work:

>>> def fromjson(source=None, *, **kwargs):
...    print("source:", source, "kwargs:", kwargs) 
... 
  File "<stdin>", line 1
SyntaxError: named arguments must follow bare *

So we should keep the args argument as it is currently, and proceed.

Eventually, we could remove the self.args property and the forwarding to json.load functions as it is pointless in:

dicts = json.load(f, *self.args, **self.kwargs)

But it shouldn't be mandatory.
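The syntax rule behind the error above can be checked without a REPL. The sketch below uses the built-in compile() to test candidate signatures; the compiles helper is hypothetical, and the third signature is only an illustration of a form that parses, not something this thread adopted.

```python
def compiles(src):
    """Return True if src is syntactically valid Python."""
    try:
        compile(src, "<signature>", "exec")
    except SyntaxError:
        return False
    return True


# A bare * must be followed by at least one named parameter.
assert not compiles("def fromjson(source=None, *, **kwargs): pass")

# The form with *args kept in place is valid ...
assert compiles("def fromjson(source=None, *args, **kwargs): pass")

# ... and so is dropping *args entirely (hypothetical alternative).
assert compiles("def fromjson(source=None, **kwargs): pass")
```

Keeping *args, as decided above, leaves every existing call valid while still allowing fromjson() to be called with no source.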


petl/test/test_executable.py

@@ -0,0 +1,22 @@
+from __future__ import print_function, division, absolute_import
+
+import subprocess
+
+def test_executable():
+    result = subprocess.check_output("""
+        (echo foo,bar ; echo a,b; echo c,d) |
+        petl 'fromcsv().cut("foo").head(1).tocsv()'
+    """, shell=True)
+    assert result == b'foo\r\na\r\n'
+
+def test_json_stdin():
+    result = subprocess.check_output("""
+        echo '[{"foo": "a", "bar": "b"}]' |
+        petl 'fromjson().tocsv()'
+    """, shell=True)
+    assert result == b'foo,bar\r\na,b\r\n'
+    result = subprocess.check_output("""
+        ( echo '{"foo": "a", "bar": "b"}' ; echo '{"foo": "c", "bar": "d"}' ) |
+        petl 'fromjson(lines=True).tocsv()'
+    """, shell=True)
+    assert result == b'foo,bar\r\na,b\r\nc,d\r\n'

yaniv-aknin · 2024-05-12T09:36:03Z

I've made some progress, but not all checks pass yet.

- Python 2.7: probably my fault, I need to look further.
- Windows tests: not sure; if I can sort out a Windows test environment I will do it.
- macOS tests: seem broken for reasons unrelated to my change.
- Security check complaining about shell=True: I disagree with the check, since I am literally testing shell behaviour. I could do this differently, but this is short, readable and (in a test context) perfectly safe.
- No docstrings: I don't think they're needed for something like this...?

I'll try to look into this again next week, but admit I'm beginning to question how much time I can put into this; small fix, lots of overhead 😅
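For reference, the "differently" mentioned for the shell=True check could look roughly like the sketch below: the command is passed as an argument list and the piped data goes through the input parameter, so no shell is involved. This is only an illustration under assumptions. STAND_IN is a hypothetical stand-in so the snippet runs without petl installed, and run_with_stdin is a hypothetical helper. The input parameter of check_output exists only on Python 3, so a 2.7-compatible test would need another approach.

```python
import subprocess
import sys

# Stand-in for the petl executable so the sketch runs anywhere:
# a tiny script that upper-cases whatever bytes arrive on stdin.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())",
]


def run_with_stdin(argv, data):
    """Run argv without a shell, feeding data (bytes) to its stdin."""
    return subprocess.check_output(argv, input=data)


result = run_with_stdin(STAND_IN, b"foo,bar\n")
assert result == b"FOO,BAR\n"
```

In a real test, argv would be something like ["petl", "fromjson(lines=True).tocsv()"], assuming the petl executable is on PATH. The trade-off is that this no longer exercises the shell pipeline itself, which is the point made above.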

github-advanced-security bot found potential problems Apr 25, 2024

petl/io/json.py Dismissed

github-advanced-security bot found potential problems Apr 25, 2024

petl/io/json.py Dismissed

juarezr requested review from arturponinski and a team April 28, 2024 20:05

yaniv-aknin force-pushed the master branch from 5091beb to 0f22627 Compare May 12, 2024 08:56

ci: Test the petl executable

9fe7939

- adds a simple test invoking the petl executable
- installs the package in CI so the executable is available.

yaniv-aknin force-pushed the master branch from 0f22627 to c3f650c Compare May 12, 2024 09:01

github-advanced-security bot found potential problems May 12, 2024

petl/test/test_executable.py Fixed

petl/test/test_executable.py Fixed

petl/test/test_executable.py Fixed

github-advanced-security bot found potential problems May 12, 2024


Fix fromjson() to support reading from stdin

60818d5

yaniv-aknin force-pushed the master branch from c3f650c to 60818d5 Compare May 12, 2024 09:27

github-advanced-security bot found potential problems May 12, 2024
