Parse environments #56

Open
wants to merge 10 commits into
base: master

@Kolaru (Owner) commented Mar 13, 2022

Supersedes #50 (I could not find an easy way to update it directly) and is a first step toward implementing #48.

This is still missing tests.

Currently the parsing results in the following:

  • env expressions have the name of the environment as their first argument and the rows of the env (separated by \\) as the following arguments
  • Each row is an env_row expr, with its cells (separated by &) as arguments
  • Cells, represented by env_cell exprs, can contain arbitrary LaTeX constructs, just like a group expr can

The parser performs no check on the name of the environment nor on the structure of the content (e.g. making sure rows have a consistent number of columns). I am not sure whether these checks are better done at the parser level or at the layout level...
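
As a rough illustration of what a check done after parsing could look like, here is a minimal sketch. `Node` is a hypothetical stand-in for `TeXExpr` (only a `head` and an `args` field are assumed), and `node`, `row_lengths` and `has_consistent_columns` are illustrative names that are not part of MathTeXEngine. Cell contents are simplified compared to the real parse tree.

```julia
# Hypothetical stand-in for TeXExpr: a head symbol plus a list of arguments.
struct Node
    head::Symbol
    args::Vector{Any}
end

# Convenience builder so that trees can be written inline.
node(head::Symbol, args...) = Node(head, Any[args...])

# Simplified tree for \begin{matrix} x & x_2 \\ y + 2 \end{matrix}
env = node(:env, "matrix",
    node(:env_row,
        node(:env_cell, node(:char, 'x')),
        node(:env_cell, node(:char, 'x'))),
    node(:env_row,
        node(:env_cell, node(:char, 'y'))))

# Number of cells in each row; the first argument of :env is the environment name.
row_lengths(env::Node) = [length(row.args) for row in env.args[2:end]]

# True when every row has the same number of cells.
has_consistent_columns(env::Node) = length(unique(row_lengths(env))) <= 1
```

With this tree, `row_lengths(env)` gives one count per row, so a layout step could either reject the input or decide how to pad the shorter row.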

@TheCedarPrince I'd love your opinion on that. If the result of the parsing seems reasonable to you, I will add tests and merge it. Laying out the content of an env can be done in a subsequent PR.

Example:

julia> texparse(L"\begin{matrix} x & x_2 \\ y + 2 \end{matrix}")
TeXExpr :expr
└─ TeXExpr :env
   ├─ "matrix"
   ├─ TeXExpr :env_row
   │  ├─ TeXExpr :env_cell
   │  │  └─ TeXExpr :char
   │  │     └─ 'x'
   │  └─ TeXExpr :env_cell
   │     └─ TeXExpr :decorated
   │        ├─ TeXExpr :char
   │        │  └─ 'x'
   │        ├─ TeXExpr :digit
   │        │  └─ '2'
   │        └─ nothing
   └─ TeXExpr :env_row
      └─ TeXExpr :env_cell
         ├─ TeXExpr :char
         │  └─ 'y'
         ├─ TeXExpr :spaced
         │  └─ TeXExpr :symbol
         │     └─ '+'
         └─ TeXExpr :digit
            └─ '2'

@codecov-commenter commented Mar 13, 2022

Codecov Report

Merging #56 (6a088c2) into master (2dd00d8) will decrease coverage by 2.13%.
The diff coverage is 20.51%.

@@            Coverage Diff             @@
##           master      #56      +/-   ##
==========================================
- Coverage   75.72%   73.59%   -2.14%     
==========================================
  Files          10        9       -1     
  Lines         548      568      +20     
==========================================
+ Hits          415      418       +3     
- Misses        133      150      +17     
Impacted Files                        Coverage Δ
src/parser/commands_registration.jl   83.33% <0.00%> (-1.18%) ⬇️
src/parser/texexpr.jl                 46.66% <0.00%> (-33.34%) ⬇️
src/parser/parser.jl                  60.32% <32.00%> (-12.47%) ⬇️
src/engine/texelements.jl             77.21% <0.00%> (-2.01%) ⬇️
src/engine/fonts.jl                   78.94% <0.00%> (-1.06%) ⬇️
src/prototype.jl

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2dd00d8...6a088c2.

@TheCedarPrince

Hey @Kolaru,

Thank you for the update here!
I have been meaning to respond but have been immensely busy in the last month or so with travel and grad school applications!
That said, I did some tinkering with this PR after I checked it out on my fork!

Here is a very quick proof of concept I was exploring to see whether a valid Julia matrix can be recovered from the parsed expression:

using MathTeXEngine

expr = texparse(L"""\begin{matrix}1 & ϕ \\3 & 4\end{matrix}""")

rows = []
if expr.args[1].head == :env
    env_type = expr.args[1].args[1]    # environment name, e.g. "matrix"
    body = expr.args[1].args[2:end]    # the env_row expressions
    if env_type == "matrix"
        for row in body
            cells = []
            for cell in row.args
                for expression in cell.args
                    if expression.head == :digit
                        # digits are stored as Char, so convert them to Int
                        push!(cells, parse(Int64, expression.args[1]))
                    else
                        push!(cells, expression.args[1])
                    end
                end
            end
            push!(rows, cells)
        end
    end
end

# Stack the per-row cell vectors into a single matrix
rows = mapreduce(permutedims, vcat, rows)

For this little example, I can recover the original matrix in Julia notation:

> 2×2 Matrix{Any}:
 1   'ϕ'
 3  4   
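
The final step relies on `mapreduce(permutedims, vcat, rows)`: `permutedims` turns each cell vector into a 1×n row and `vcat` stacks those rows. A minimal standalone sketch of just that step, with hand-written rows standing in for the parsed cells:

```julia
# Hand-written stand-in for the `rows` collected from the parsed expression.
rows = [Any[1, 'ϕ'], Any[3, 4]]

# permutedims maps each length-n vector to a 1×n matrix; vcat stacks them vertically.
m = mapreduce(permutedims, vcat, rows)

size(m)    # (2, 2)
m[1, 2]    # 'ϕ'
```

Note that this only works when all rows have the same length; with ragged rows `vcat` throws an error, which is one reason a column-count check (or padding) is needed somewhere.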

After having explored and experimented with the PR some, here are my thoughts:

  1. I really appreciate the generality of the environment parsing mechanism at work here in the parser.
    I would imagine it makes adding new environments to the engine much easier for a developer.
  2. Regarding checks, I would say it may make sense to have two simple checks in the parser.
    Ideally, the first check would look up what sort of environment the LaTeX string represents and its associated rules of construction (e.g. a valid matrix must have the same number of cells in every row).
    Then, a follow-up check could be done against those rules while parsing is occurring, so that time is not wasted parsing everything before finding out that the input is invalid.
    With the example you gave, which is an invalid matrix, I could imagine a cell_counter for matrix environments: counting the cells of the first row sets cell_counter, and the cells of each subsequent row are then counted and compared against cell_counter to ensure that every row has the same number of cells.
    To me, simple checks like these make sense at the parser level.
  3. Excited to start talking about layout!
    Eager to see how we could potentially handle a cell that contains an expression like 2x + Z - I imagine we could store that as a Julia Expr like, :(2x + Z).
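
The cell_counter idea from point 2 could be sketched roughly as follows. This is not how the PR's parser works: `check_cell_counts` is a hypothetical helper, and splitting the raw body on `\\` and `&` is a naive stand-in for real tokenization (it ignores escaped `\&` and nested groups, for instance). The point is only the early exit on the first mismatching row.

```julia
# Naive stand-in for the parser: split the environment body on `\\` and `&`
# and stop at the first row whose cell count differs from the first row's.
function check_cell_counts(body::AbstractString)
    cell_counter = nothing    # set when the first row is counted
    for (i, row) in enumerate(split(body, "\\\\"))
        ncells = length(split(row, '&'))
        if cell_counter === nothing
            cell_counter = ncells
        elseif ncells != cell_counter
            # Fail early instead of processing the remaining rows.
            error("row $i has $ncells cells, expected $cell_counter")
        end
    end
    return cell_counter
end

check_cell_counts(raw"1 & ϕ \\ 3 & 4")        # returns 2
# check_cell_counts(raw"x & x_2 \\ y + 2")    # would throw an error for row 2
```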

Otherwise, AMAZING stuff!
Thanks for the work and for tagging me!
Life has settled back down so should have more time to comment here and there. :)

@Kolaru (Owner, Author) commented Mar 28, 2022

I have added several changes to take your comments into account.

  1. Check that the env has a known name. Currently I just accept matrix, pmatrix and bmatrix.
  2. Put everything in a matrix after parsing. It seems that LaTeX accepts rows with different numbers of columns, so I just pad the shorter rows with empty cells (spaces with 0 width) to match the longest row. Also, I put full TeXExpr in the matrix; I don't think there is a need to turn them back into strings.
  3. Add a compact show method for TeXExpr so that the matrix of TeXExpr is readable. The layout is a bit broken, and I have no idea why.
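
To make points 1 and 2 concrete, here is a minimal sketch of a name check followed by padding. It is not the PR's code: `KNOWN_ENVS`, `rows_to_matrix` and `build_env` are hypothetical names, plain strings stand in for `TeXExpr` cells, and an empty string stands in for the zero-width space used as filler.

```julia
# Environment names accepted so far, per point 1 above.
const KNOWN_ENVS = ("matrix", "pmatrix", "bmatrix")

# Hypothetical helper: pad every row with `filler` up to the longest row,
# then assemble the rows into a Matrix.
function rows_to_matrix(rows, filler)
    ncols = maximum(length, rows)
    mat = Matrix{Any}(undef, length(rows), ncols)
    fill!(mat, filler)
    for (i, row) in enumerate(rows), (j, cell) in enumerate(row)
        mat[i, j] = cell
    end
    return mat
end

# Reject unknown environment names before building the matrix.
function build_env(name, rows; filler = "")
    name in KNOWN_ENVS || throw(ArgumentError("unknown environment: $name"))
    return rows_to_matrix(rows, filler)
end

m = build_env("matrix", [["x", "x_2"], ["y + 2"]])
size(m)    # (2, 2)
m[2, 2]    # "" (the filler)
```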

For example, it gives

julia> expr = texparse(L"""\begin{matrix}a + \lim_i^j & \alpha \\ u_3 & \sqrt{2}\end{matrix}""")
TeXExpr :expr
└─ TeXExpr :env
   ├─ "matrix"
   └─ Matrix{TeXExpr}
      ├─ TeXExpr :group
      │  ├─ TeXExpr :char
      │  │  └─ 'a'
      │  ├─ TeXExpr :spaced
      │  │  └─ TeXExpr :symbol
      │  │     └─ '+'
      │  └─ TeXExpr :underover
      │     ├─ TeXExpr :function
      │     │  └─ "lim"
      │     ├─ TeXExpr :char
      │     │  └─ 'i'
      │     └─ TeXExpr :char
      │        └─ 'j'
      ├─ TeXExpr :decorated
      │  ├─ TeXExpr :char
      │  │  └─ 'u'
      │  ├─ TeXExpr :digit
      │  │  └─ '3'
      │  └─ nothing
      ├─ TeXExpr :symbol
      │  └─ 'α'
      └─ TeXExpr :sqrt
         └─ TeXExpr :digit
            └─ '2'

# Just grab the matrix of arguments
julia> expr.args[1].args[2]
2×2 Matrix{TeXExpr}:
 TeX"a + \lim_i^j"    TeX"α"
 TeX"u_3"
                                        TeX"\sqrt{2}"

@musoke commented Jul 21, 2023

I was able to rebase this on the current master with minimal fuss. It looks like it still needs some tests and the layout implementation.
