Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Enhance Parser to Support # %% with Additional Comments #127

Merged
merged 6 commits into from
Jul 16, 2024

Conversation

ESSO0428
Copy link
Contributor

@ESSO0428 ESSO0428 commented Jul 12, 2024

Summary

This pull request enhances the parser to correctly recognize lines that start with # %%, even when followed by additional comments or markdown annotations. This improvement is achieved by using regular expressions.

Detailed Description

  • Enhancement: Refactored the parser to use regex for identifying cells that start with # %% and may include additional comments or markdown annotations.

Changes

  • Added regex-based identification of cells to the parser.
  • Imported the re module to facilitate regex matching.

Motivation

The existing parser did not handle lines starting with # %% when followed by additional comments or markdown annotations. By using regular expressions, the parser can now accurately identify such lines, improving its robustness and flexibility.

Testing

  • Verified that the parser correctly identifies cells starting with # %% followed by additional comments or markdown annotations.
  • Ensured no existing functionality is broken by these changes.

Thank you for considering this pull request. I look forward to your feedback.

Ensure the parser can correctly recognize lines that start with # %%,
even when followed by additional comments or markdown annotations.
This change includes using regular expressions to match the lines.

Refactor the parser to use regex for identifying cells.
Ensure the parser can correctly recognize lines that start with # %%,
even when followed by additional comments or markdown annotations.
This change includes importing the re module to use regular expressions
for matching the lines.

Import the re module and use regex for identifying cells.
@kiyoon
Copy link
Owner

kiyoon commented Jul 13, 2024

Thanks for the PR!

  1. What is the problem you actually want to solve? What are markdown annotations? According to the jupytext document, something like key="value"? I've never seen this being used so far so please let me know how this is useful.
  2. I don't really see why you would add a comment to the line separator. It's not even a standard Jupytext format, right?
  3. That function is called very often. Changing it to regex will likely cause a huge performance drop in some machines. Since Jupynium is different from Jupytext because it requires parsing in real-time changes, I want to keep it lightweight. The changes you made, can be replaced by str.startswith.
  4. There is an equivalent lua implementation that you need to change (maybe in cells.lua).

@ESSO0428
Copy link
Contributor Author

ESSO0428 commented Jul 13, 2024

Thanks for the PR!

  1. What is the problem you actually want to solve? What are markdown annotations? According to the jupytext document, something like key="value"? I've never seen this being used so far so please let me know how this is useful.
  2. I don't really see why you would add a comment to the line separator. It's not even a standard Jupytext format, right?
  3. That function is called very often. Changing it to regex will likely cause a huge performance drop in some machines. Since Jupynium is different from Jupytext because it requires parsing in real-time changes, I want to keep it lightweight. The changes you made, can be replaced by str.startswith.
  4. There is an equivalent lua implementation that you need to change (maybe in cells.lua).

Thank you for your response!

1. What problem are you trying to solve? What are markdown annotations?

Markdown annotation was a miscommunication on my part. What I meant is that # %% should allow adding comments.

I want to enable the code cell separator # %% to have comments following it, similar to the behavior in VS Code. This way, comments can describe the function of the code block, and when the code block is sent to the Python Interactive window, it is still recognized as a code cell. This helps users maintain Python scripts more clearly by understanding the main function of each code cell.

2. Why add comments after the line separator? Isn't this not even standard Jupytext format?

While this is not a standard Jupytext format, users, including myself, often add comments after # %% to improve readability and organization of the code. This helps when reviewing the original Python script to understand each code block better, making it easier to convert the developed features into formal Python scripts.

Here are a few examples from my work:

# %% [Train Pretrained Yolov8 Model]
from ultralytics import YOLO
import os

model = YOLO("yolov8s-seg.pt")  # load a pretrained model (recommended for training)

# Set training parameters
current_path = os.path.dirname(os.path.abspath("__file__"))
params = { ... }
model.train(**params)

# %% [Load Model]
from ultralytics import YOLO

model = YOLO("runs/segment/train/weights/best.pt")

# %% [Prediction]
results = model.predict( ... )

3. This function is called very often. Changing it to a regular expression might cause a significant performance drop on some machines. Since Jupynium is different from Jupytext because it requires parsing real-time changes, I want to keep it lightweight. The changes you made can be replaced by str.startswith.

Thank you for pointing that out. Using regular expressions can indeed affect performance. I have reverted the changes to the original version except for the line.strip() == "# %%" part, which I have replaced with line.startswith("# %%"), discarding the use of regular expressions.

4. The equivalent Lua implementation needs to be changed (possibly in cells.lua).

Since the # %% block conversion is written in buffer.py, implementing this through cells.lua might be challenging. However, I will further investigate whether it is possible to achieve this functionality by modifying cells.lua.

Ensure the parser can correctly recognize lines that start with # %%,
even when followed by additional comments. This change includes modifying 
the line separator check to use `line.startswith("# %%")` instead of `line.strip() == "# %%"`.

Revert other changes and only modify the line separator check for code cell recognition and conversion.
This update also removes the use of regular expressions to prevent potential performance drops.
@kiyoon
Copy link
Owner

kiyoon commented Jul 13, 2024

Okay, I don't mind supporting this feature.

However, I'd like to point out that your current implementation is broken.

  1. # %%timeit is a magic command, and should not be treated as a cell separator. You need to check whether there is a space between the comment and the percent.
  2. The line_type in cells.lua also needs a change accordingly.

https://github.com/kiyoon/jupynium.nvim/blob/1e572965813719c9ef123f777d79814e91d1a2cf/lua/jupynium/cells.lua#L6C12-L23

@ESSO0428
Copy link
Contributor Author

ESSO0428 commented Jul 13, 2024

Okay, I don't mind supporting this feature.

However, I'd like to point out that your current implementation is broken.

  1. # %%timeit is a magic command, and should not be treated as a cell separator. You need to check whether there is a space between the comment and the percent.
  2. The line_type in cells.lua also needs a change accordingly.

https://github.com/kiyoon/jupynium.nvim/blob/1e572965813719c9ef123f777d79814e91d1a2cf/lua/jupynium/cells.lua#L6C12-L23

Thank you for pointing out the issue.

I understand that the current implementation mistakenly treats magic commands like # %%timeit as cell separators. To fix this, I have made the following changes:

Changes

Python (buffer.py)

I have modified the cell separator check from:

elif line.startswith("# %%"):

to

elif line[:5].strip() == "# %%":

This ensures that only lines starting with # %% followed by a space or no additional string are treated as cell separators when syncing to Jupyter.

Lua (cells.lua)

I have modified the line type detection from:

elseif vim.fn.trim(line) == "# %%" then

to

elseif vim.fn.trim(string.sub(line, 1, 5)) == "# %%" then

This ensures that only lines starting with # %% followed by a space or no additional string are treated as cell separators, avoiding the misidentification of magic commands when syncing to Jupyter.

These changes should correctly distinguish between cell separators and magic commands, maintaining the functionality while preventing performance issues by not using regular expressions.

Thank you for your guidance and support for this feature.

This commit corrects the cell separator detection that was broken
in the previous commit. The previous commit allowed lines starting
with "# %%" followed by additional comments to be treated as cell
separators. However, this broke the functionality for magic commands
like "# %%timeit", which should not be treated as cell separators.

The fix involves checking whether there is a space after "# %%" before
treating it as a cell separator. This is done by changing the line
separator check in Lua files to `vim.fn.trim(string.sub(line, 1, 5)) == "# %%"`.

Additionally, it was discovered that `line.startswith("# %%")` in Python
files also misidentified magic commands as cell separators when syncing
to Jupyter. Therefore, this has been corrected to `line[:5].strip() == "# %%"`.

This change ensures that magic commands are correctly recognized and
not treated as cell separators.
@kiyoon
Copy link
Owner

kiyoon commented Jul 15, 2024

Okay, thank you for your work!

To finalise the PR, I think one last step remains.

Add comments

Since the implementation is a little awkward and hard to understand, I'd like a comment linking to this PR and a brief description.

# Treat '# %% Cell Title' as a code cell
# But not magic commands such as '# %%timeit'.
# See: https://github.com/kiyoon/jupynium.nvim/pull/127

To both lua and the python files.

Add tests

In tests/test_buffer.py, I'd like you to add a new test. You can copy an existing test, but do not modify the existing ones.

For example,

# below test_buffer_1()
def test_buffer_with_cell_title_1():
    buffer = JupyniumBuffer(["a", "b", "c", "# %% Cell Title", "d", "e", "f"])
    assert buffer.num_rows_per_cell == [3, 4]
    assert buffer.cell_types == ["header", "code"]

# below test_magic_command_1()
def test_double_magic_command_1():
    """
    Everything else except magic commands should be preserved after __init__() or fully_analysed_buf().
    """
    lines = ["a", "b", "c", "# %% Cell Title", "# %%timeit", "e", "f"]
    buffer = JupyniumBuffer(lines)
    code_cell_content = buffer.get_cell_text(1)
    assert code_cell_content == "%%timeit\ne\nf"

I think this should be sufficient.

@ESSO0428
Copy link
Contributor Author

ESSO0428 commented Jul 15, 2024

Okay, thank you for your work!

To finalise the PR, I think one last step remains.

Add comments

Since the implementation is a little awkward and hard to understand, I'd like a comment linking to this PR and a brief description.

# Treat '# %% Cell Title' as a code cell
# But not magic commands such as '# %%timeit'.
# See: https://github.com/kiyoon/jupynium.nvim/pull/127

To both lua and the python files.

Add tests

In tests/test_buffer.py, I'd like you to add a new test. You can copy an existing test, but do not modify the existing ones.

For example,

# below test_buffer_1()
def test_buffer_with_cell_title_1():
    buffer = JupyniumBuffer(["a", "b", "c", "# %% Cell Title", "d", "e", "f"])
    assert buffer.num_rows_per_cell == [3, 4]
    assert buffer.cell_types == ["header", "code"]

# below test_magic_command_1()
def test_double_magic_command_1():
    """
    Everything else except magic commands should be preserved after __init__() or fully_analysed_buf().
    """
    lines = ["a", "b", "c", "# %% Cell Title", "# %%timeit", "e", "f"]
    buffer = JupyniumBuffer(lines)
    code_cell_content = buffer.get_cell_text(1)
    assert code_cell_content == "%%timeit\ne\nf"

I think this should be sufficient.

Thank you for your feedback.

I have added the requested comments in cells.lua and buffer.py to explain the new
feature that treats lines starting with # %% followed by any text (except magic commands) as code cells. This is done by ensuring there is a space after # %% to avoid misinterpreting magic commands.

# Treat '# %% Cell Title' as a code cell
# But not magic commands such as # '%%timeit'.
# See: https://github.com/kiyoon/jupynium.nvim/pull/127

Additionally, I have added the following unit tests in tests/test_buffer.py:

  1. test_buffer_with_cell_title_1 : This test ensures that the buffer correctly handles lines starting with # %% Cell Title as code cells.
  2. test_double_magic_command_1 : This test ensures that magic commands like # %%timeit are preserved correctly and not treated as cell separators.

I have run the tests with pytest -m . to ensure the changes work as expected and do not break existing functionality.

Please review the changes and let me know if there are any further adjustments needed.

Thank you for your guidance and support.

This commit adds comments in `cells.lua` and `buffer.py` to explain
the new feature that treats lines starting with '# %%' followed by
any text (except magic commands) as code cells.

See: kiyoon#127
This commit adds unit tests in `test_buffer.py` for the new feature in
`buffer.py` and `cells.lua` that treats lines starting with '# %%' followed
by any text (except magic commands) as code cells. The tests, named
`test_buffer_with_cell_title_1` and `test_double_magic_command_1`, ensure
that the buffer correctly identifies and handles these cells. The tests
were run with `pytest -m .` to ensure the changes work as expected and
do not break existing functionality.

See: kiyoon#127
@kiyoon kiyoon merged commit d93164f into kiyoon:master Jul 16, 2024
10 checks passed
@kiyoon kiyoon changed the title Enhance Parser to Support # %% with Additional Comments feat: Enhance Parser to Support # %% with Additional Comments Jul 16, 2024
kiyoon pushed a commit that referenced this pull request Jul 16, 2024
* refactor(parser): support # %% with additional comments
* fix(parser): import re for regex matching
* refactor(parser): support # %% with additional comments
* fix: correct cell separator detection for magic commands
* docs: add comments for '# %% Cell Title' feature
* test: add tests for '# %% Cell Title' in test_buffer.py
@kiyoon
Copy link
Owner

kiyoon commented Jul 16, 2024

Thank you for this feature!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants