Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segfault from readline(); avoid unnecessary string copy #1665

Merged
merged 2 commits into from
Sep 9, 2024

Conversation

t-kalinowski
Copy link
Member

I encountered a segmentation fault from reticulate's internal readline() function while evaluating interleaved R and Python chunks in RStudio within a Quarto document.

gdb backtrace
#0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:394
#1  0x00005adcfdb76186 in  ()
#2  0x00005adcfdb8006b in  ()
#3  0x00005adcfe1c229d in  ()
#4  0x00005adcfe1c2599 in  ()
#5  0x00005adcfe1c5218 in  ()
#6  0x00005adcfe1f36dc in  ()
#7  0x00005adcfe1f4f79 in  ()
#8  0x00005adcfdb49749 in  ()
#9  0x00005adcfdb311b2 in  ()
#10 0x00005adcfdb3427b in  ()
#11 0x00005adcfe6731f1 in  ()
#12 0x00007a0ecf92a29f in readline(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (prompt=">>> ") at readline.cpp:14
#13 0x00007a0ecf8f9d43 in _reticulate_readline(SEXP) (promptSEXP=0x5add0ae0ddf8) at /home/tomasz/R/x86_64-pc-linux-gnu-library/4.4/Rcpp/include/Rcpp/InputParameter.h:74
#14 0x00007a0ed5d03a0e in R_doDotCall (fun=fun@entry=0x7a0ecf8f9cc0 <_reticulate_readline(SEXP)>, nargs=nargs@entry=1, cargs=cargs@entry=0x7ffd53ead6d0, call=call@entry=0x5add0a2ab838) at dotcode.c:754
#15 0x00007a0ed5d46d88 in bcEval_loop (ploc=<optimized out>) at eval.c:8691
#16 0x00007a0ed5d5a89d in bcEval (rho=0x5add1c392538, body=0x5add09f75990) at eval.c:7524
#17 bcEval (body=0x5add09f75990, rho=0x5add1c392538) at eval.c:7509
#18 0x00007a0ed5d5ac0b in Rf_eval (e=e@entry=0x5add09f75990, rho=rho@entry=0x5add1c392538) at eval.c:1167
#19 0x00007a0ed5d5cddf in R_execClosure
    (call=call@entry=0x5add1c392340, newrho=newrho@entry=0x5add1c392538, sysparent=<optimized out>, rho=rho@entry=0x5adcffc8e728, arglist=arglist@entry=0x5adcffc9d890, op=op@entry=0x5add09f715d8) at eval.c:2398
#20 0x00007a0ed5d5dbc7 in applyClosure_core (call=call@entry=0x5add1c392340, op=op@entry=0x5add09f715d8, arglist=0x5adcffc9d890, rho=rho@entry=0x5adcffc8e728, suppliedvars=<optimized out>, unpromise=unpromise@entry=TRUE)
    at eval.c:2311
#21 0x00007a0ed5d5ad3c in Rf_applyClosure (unpromise=TRUE, suppliedvars=<optimized out>, rho=0x5adcffc8e728, arglist=<optimized out>, op=0x5add09f715d8, call=0x5add1c392340) at eval.c:2333
#22 Rf_eval (e=e@entry=0x5add1c392340, rho=rho@entry=0x5adcffc8e728) at eval.c:1285
#23 0x00007a0ed5d913ea in Rf_ReplIteration (rho=rho@entry=0x5adcffc8e728, savestack=savestack@entry=0, browselevel=browselevel@entry=0, state=state@entry=0x7ffd53eae1d0) at main.c:262
#24 0x00007a0ed5d91780 in R_ReplConsole (rho=0x5adcffc8e728, savestack=savestack@entry=0, browselevel=browselevel@entry=0) at main.c:314
#25 0x00007a0ed5d91840 in run_Rmainloop () at main.c:1216
#26 0x00005adcfe698ec1 in  ()
#27 0x00005adcfe66c1fa in  ()
#28 0x00005adcfdac25b4 in  ()
#29 0x00007a0ed4c29d90 in __libc_start_call_main (main=main@entry=0x5adcfdabe7a0, argc=argc@entry=11, argv=argv@entry=0x7ffd53eb06d8) at ../sysdeps/nptl/libc_start_call_main.h:58
#30 0x00007a0ed4c29e40 in __libc_start_main_impl (main=0x5adcfdabe7a0, argc=11, argv=0x7ffd53eb06d8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd53eb06c8)
    at ../csu/libc-start.c:392
#31 0x00005adcfdb02445 in  ()

While I'm not entirely sure this change completely resolves the segfault (as I didn’t have a reliable reproducible example beforehand), this PR represents an improvement, and I can no longer reproduce the segfault.

This PR updates reticulate's internal readline() to eliminate an unnecessary allocation and copy of the input in std::string().

As far as I can tell, the reason we maintain our own readline() function, rather than using base::readline(), is due to the limited input size in base R, which truncates input at 256 characters. In contrast, our implementation accepts up to 8192 characters (and this limit can be easily adjusted).

@t-kalinowski t-kalinowski merged commit 74627cd into main Sep 9, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant