-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug] Illegal instruction despite building from source #2867
Comments
Hi @chrish935 ! We've seen this before on #2580 -- although I am surprised you're seeing this even on a build from source. I'll take a look |
@chrish935 Thanks for the detailed logs here. Your In the tiledb-r package we accomodate this with a test: ## on linux, we need to consider AVX2 vs non-AVX2 capabilities on the build machine
avx2 <- if (osarg == "linux" && any(grepl("avx2", readLines("/proc/cpuinfo")))) "" else "-noavx2"
## downloads are from GitHub releases
baseurl <- "https://github.com/TileDB-Inc/TileDB/releases/download"
## now switch based on macOS or Linux, using version, architecture, avx2 if needed, version and sha
dlurl <- switch(osarg,
linux = file.path(baseurl,sprintf("%s/tiledb-linux-%s%s-%s-%s.tar.gz", ver, arch, avx2, ver, sha)),
macos = file.path(baseurl,sprintf("%s/tiledb-macos-%s-%s-%s.tar.gz", ver, arch, ver, sha)),
url = urlarg)
cat("downloading", dlurl, "\n") I fear we may never have added that for the tiledbsoma packages, either Python or R, and it could well be that the Python builds you got as wheels also assume you have a newer CPU. |
The puzzle is that this is intending to be a build-from-source situation for TileDB-SOMA, and, cellxgene-census wheel is Python source-only (no binaries): https://pypi.org/project/cellxgene-census/#files That said, it's not clear that this is a from-source build just yet. I'm seeing above:
I believe you would need
Can you try this out? |
Sure thing.
Running pytest
|
|
BTW my apologies about
You did indeed already show
which shows:
As a dev I usually build a different way which produces a lot more output :) -- anyway, what you're doing here looks fine. 🤔 |
Is there a |
Doesn't appear to be, no.
|
OK let me ask around internally for a third pair of 👀 -- thanks @chrish935 🙏 |
Maybe this bit would need a 'have avx2 or not' generalization like the one I show above as this will pull the 'assume AVX2' blob when we also have one for '-noavx2' in each release build: TileDB-SOMA/libtiledbsoma/cmake/Modules/FindTileDB_EP.cmake Lines 92 to 95 in 6b74aa5
|
@chrish935 can you We can do this as an experiment with your help, but, look into automating in the future ... |
Still no dice.
Running pytest
|
@johnkerl Doesn't it need an updated sha1 for the changed file too? |
@eddelbuettel correct -- @chrish935 here
-- although I'm surprised your build didn't fail with a file-not-found / curl error -- which may itself be a clue ... 🤔 I was composing the following message when @eddelbuettel 's message came in -- just now -- please disregard if the filename update helps: @chrish935 if it's not too much trouble to ask you to do another re-run on your hardware -- in
-- if you can change this to
then uninstall and re-install as you've done -- and please save and upload all output to |
That's just the link for the linux
Pytest
I don't see any Lines 11 - 26 of
Lines 118 - 121 of
|
Yes our thinking is that you may need this '-noavx' variant though -- from our 'releases' page -- ue the second tar.gz herecd That being said, and while nobody is watching, you could also play a dirty trick on your machine and replace the libtiledb.so.2.25 you have with this $ tar tvzf tiledb-linux-x86_64-noavx2-2.25.0-bbcbd3f.tar.gz | grep ".so"
-rw-r--r-- root/root 4948 2024-07-24 11:08 include/tiledb/consolidation_plan_experimental.h
-rwxr-xr-x root/root 42252696 2024-07-24 11:37 lib/libtiledb.so.2.25
lrwxrwxrwx root/root 0 2024-07-24 11:42 lib/libtiledb.so -> libtiledb.so.2.25
$ as it carries the 'no-avx2' object code that we hope avoids the 'illegal instruction' on your machine. If you replaced the library and then restarted your script or session to get this one it may succeed. Then we'd know it was the newer instruction set. |
I downloaded I replaced all of the
This is all with the change to the Original Changed Should I change this back at all when attempting the other suggestions? |
@chrish935 thanks for your time on this. What you've done so far should have worked, so in the words of Sherlock Holmes, when it's not any of thing things it can be, it has to be one of the things it can't be. Something is not what it seems:
In any event, we at TileDB need to get a system set up similar to this and debug it. I appreciate your generous time involvement but I think we're at a next step of rolling up the sleeves internally. |
Here's the output from
|
Yes, I did. If it's not too much to ask for you to do "one more thing" which is to re-run with that set -- I would appreciate it. This does generate a lot of output, so if you can gist it and share the link here, that would be wonderful ... |
... just saw your I'm curious about this
and will look into that. What I really want is to get the opcode for the instruction which is illegal and see if it decodes as |
I think the TL;DR there is |
Making this change didn't change what was written to |
Here's the output of
|
No, it should be on the screen ...
I'm happy to take that as a gist, and you can share the link here! :)
This is what I was looking for ... |
For context, illegal instruction can be three things:
We've established it's the 1st, which is good to know |
Glad to hear that we're making some progress! Is there anything else that you'd like me to do, or are y'all kind of taking things from here? |
I was going to try to take it from here until your messages this morning came in which are awesome, thanks! We've established it's genuinely an AVX2 issue which is 💫 . What I'd need to do is get access to a non-AVX2-capable substrate and docker-run from there ... I'll ask around internally & see if anyone has such hardware.
For context, I spun up a Docker image (on an AVX2-equipped substrate, admittedly) and here is a gist: Notes:
|
I can see from your stack trace that the crash is happening at a place where tiledbsoma -- which you're building from source -- is calling directly into the core library, which you've got a |
Here's the gist from building with a more verbose Turns out that I forgot the Note: This is with grabbing a fresh clone of the repo, so we're doing this without changing the download URL in |
Thanks @chrish935 !
Good point-out -- I usually take this for granted ...
Yes -- we definitely want to run this with (manual -- for now) edit in place so we can try to retrieve the (As @eddelbuettel mentioned above, for the tiledb-r package there is auto-discovery of avx2 capability and it would be great for us to do the same in the TileDB-SOMA package, on both the Python and R sides. That said, that would be effectively the same result as you doing a manual edit of the
And there it is -- this may be that "something else". It looks like this is the copy of the core library which has AVX2 instructions in it -- and this one is getting found when you do a from-source install. I'll discuss internally & see what's best to do next, although I suspect an Thanks for the time and the details @chrish935 !! |
Confirming your
... and can you please share the |
So we do want the edit in place for the |
Yeah, I cloned into the following:
Here's the output that you asked for:
|
Hmm
|
Yes, these are the asks -- thank you 🙏 :
|
Here's the gist. After making the change to the URL, build time again shortened to almost instantaneous & did not produce as much output even with the increased verbosity. |
OK this is somehow not being a build from source then ... if it were it would be producing more output, and taking more time I'll need to investigate this on a noavx2 system -- thank you very much for the experiments @chrish935 ... |
Describe the bug
Calling
cellxgene_census.open_soma()
and/orpython3 -m pytest TileDB-SOMA/apis/python/tests
causes anIllegal instruction
error. I've read through the issues here & see that building from source usually seems to be the resolution, but that hasn't worked for me (see below). I've also tried building from release 1.12 & release 1.11, but got the same error at the same steps in the workflow.To Reproduce
Creating environment
Building TileDB-SOMA
Running pytest
Attempting
open_soma()
Versions (please complete the following information):
Python 3.10.12
The text was updated successfully, but these errors were encountered: