linalg test fails with Fedora 19 system LAPACK #5050

Closed
nalimilan opened this issue Dec 7, 2013 · 13 comments
Labels
domain:building Build system, or building Julia or its dependencies kind:bug Indicates an unexpected problem or unintended behavior

@nalimilan
Member

Another failing test with the system Fedora 19 libraries. LAPACK is the classic Netlib one, version 3.4.2 (the same version Julia would normally download). The error message is not very informative, but I can of course run more tests if needed.

$ make sparse
    JULIA test/sparse
     * sparse
exception on 1: ERROR: test error during :((maximum(abs(-(\($(Expr(:', :a)),b),\(dense($(Expr(:', :a))),b))))<*(1000,eps())))
LAPACKException(-9223371753386934272)
 in getrf! at linalg/lapack.jl:376
 in LU at linalg/factorization.jl:144
 in \ at linalg/dense.jl:518
 in \ at linalg/dense.jl:504
 in anonymous at test.jl:53
 in do_test at test.jl:37
 in anonymous at no file:82
 in runtests at /home/makerpm/rpmbuild/BUILD/julia-0.2.0/test/testdefs.jl:5
 in anonymous at multi.jl:613
 in run_work_thunk at multi.jl:575
 in remotecall_fetch at multi.jl:647
 in remotecall_fetch at multi.jl:662
 in anonymous at multi.jl:1382
at sparse.jl:89
ERROR: test error during :((maximum(abs(-(\($(Expr(:', :a)),b),\(dense($(Expr(:', :a))),b))))<*(1000,eps())))
LAPACKException(-9223371753386934272)
 in getrf! at linalg/lapack.jl:376
 in LU at linalg/factorization.jl:144
 in \ at linalg/dense.jl:518
 in \ at linalg/dense.jl:504
 in anonymous at test.jl:53
 in do_test at test.jl:37
 in anonymous at no file:82
 in runtests at /home/makerpm/rpmbuild/BUILD/julia-0.2.0/test/testdefs.jl:5
 in anonymous at multi.jl:613
 in run_work_thunk at multi.jl:575
 in remotecall_fetch at multi.jl:647
 in remotecall_fetch at multi.jl:662
 in anonymous at multi.jl:1382
at sparse.jl:89
at /home/makerpm/rpmbuild/BUILD/julia-0.2.0/test/runtests.jl:21

make: *** [sparse] Error 1
@andreasnoack
Member

This one looks like a 32/64-bit library mismatch. The system LAPACK and BLAS are probably 32-bit, so you'll have to recompile Julia with USE_BLAS64=0.

@jiahao
Member

jiahao commented Dec 7, 2013

Perhaps we should emit a more helpful error message here, by putting something like

try
    ... #the whole LinAlg module
catch e
    e.info & (2^32-1) == 0 && error("It looks like BLAS/LAPACK is returning 32-bit integers but Julia is expecting 64-bit integers.")
end

somewhere in LinAlg? I'm not sure how to register a module-level error handler though.
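
To see what the mask in the snippet above would report for the error code in the original trace, the arithmetic can be checked directly. The following is a minimal sketch in Python (used only for illustration; the snippet above is Julia), and the helper names `low32` and `high32` are hypothetical:

```python
# Error code copied from the LAPACKException in the original report.
INFO = -9223371753386934272

MASK32 = 2**32 - 1  # same mask as in the proposed check: 2^32 - 1


def low32(value: int) -> int:
    """Return the low 32 bits of a 64-bit two's-complement value."""
    return value & MASK32


def high32(value: int) -> int:
    """Return the high 32 bits of a 64-bit two's-complement value."""
    return ((value & (2**64 - 1)) >> 32) & MASK32


print(low32(INFO) == 0)   # True: the condition tested by the proposed check holds
print(hex(high32(INFO)))  # 0x80000042: the contents of the upper half
```

The low half of the reported code is zero, so the proposed mask would match this particular failure.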

@ViralBShah
Member

That seems a bit excessive, even if it were possible. We do need a way to detect this though. Is there a way we can call a generic LAPACK routine to detect this (perhaps a trick?) during julia initialisation?

That said, it is preferable to use the BLAS and LAPACK libraries that Julia provides, as we track them much more carefully for bugs and apply patches, unlike the versions that ship in Linux distros.

@ViralBShah
Member

How about we intentionally pass a large argument to a LAPACK routine during initialisation, and check the LAPACKException?

@jiahao
Member

jiahao commented Dec 8, 2013

Actually that's a good point. It's better to check during startup. We can do something very simple to trigger an error, like asking for the Cholesky factorization of [1 0; 0 -1], which should fail with error code +2, and then check whether the error code is bit-shifted 32 places. You don't even need to catch an exception; simply check the info code.
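
The startup probe described above can be sketched as a small decision function. This is a sketch in Python rather than Julia, it does not call LAPACK, and the function name `classify_probe_info` and the returned labels are hypothetical. It only shows how a returned info word might be compared against the expected code of 2 and against that code shifted left by 32 bits:

```python
EXPECTED_INFO = 2  # code the probe is expected to return for the failed Cholesky factorization


def classify_probe_info(info: int, expected: int = EXPECTED_INFO) -> str:
    """Classify the info word returned by a deliberately failing probe call.

    Assumption: the caller reads `info` as a 64-bit integer.
    """
    if info == expected:
        return "ok"
    if info == expected << 32:
        # The expected code shows up 32 bits too high.
        return "integer width mismatch suspected"
    return "unexpected info code"


print(classify_probe_info(2))
print(classify_probe_info(2 << 32))
```

Any value other than the expected code is treated as a failure; the shifted case is singled out only so that the error message can point at the integer width.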

@nalimilan
Member Author

Ah, I had missed the possibility of a 32/64-bit discrepancy. Indeed, with USE_BLAS64=0, this works.

This is weird because I do not have any 32-bit BLAS or LAPACK installed:

$ locate libblas.so
/usr/lib64/libblas.so
/usr/lib64/libblas.so.3
/usr/lib64/libblas.so.3.4
/usr/lib64/libblas.so.3.4.2

$ locate liblapack.so
/usr/lib64/liblapack.so
/usr/lib64/liblapack.so.3
/usr/lib64/liblapack.so.3.4
/usr/lib64/liblapack.so.3.4.2
/usr/lib64/R/lib/liblapack.so.3
/usr/lib64/atlas/liblapack.so
/usr/lib64/atlas/liblapack.so.3
/usr/lib64/atlas/liblapack.so.3.0

$ file /usr/lib64/libblas.so.3.4.2
/usr/lib64/libblas.so.3.4.2: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=0x41bc1619c2debac61b835e4e9620549e239484c0, stripped

$ file /usr/lib64/liblapack.so.3.4.2
/usr/lib64/liblapack.so.3.4.2: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=0x89e14e96211a3817c8016ff3d375ed74a4c66c3e, stripped

$ file /usr/lib64/atlas/liblapack.so.3.0
/usr/lib64/atlas/liblapack.so.3.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=0x7bb5c8a40e85957c76dacbb0465e184f717bd40c, stripped

See https://gist.github.com/nalimilan/7858024 for the output of Julia with LD_DEBUG=files.

@ViralBShah
Member

The terminology is confusing. When we talk of a 32-bit BLAS, we mean a BLAS compiled so that integers are 32 bits wide inside the library. This is usually the default on all platforms. In Julia, we compile our BLAS and LAPACK to support 64-bit integer values for array sizes and the like on 64-bit architectures.

Thus, even though your libraries are built on a 64-bit architecture, they are still using 32-bit integer values internally.
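
One way to picture the mismatch: the caller reserves a 64-bit slot for an integer result, while the library stores only 32 bits into it. The sketch below imitates that with Python's `ctypes`. It is an illustration, not what Julia or LAPACK literally does; the function name `store_int32_in_int64_slot` is hypothetical, and the stale upper bits are an assumed starting value chosen to reproduce the code from the original report.

```python
import ctypes
import sys


def store_int32_in_int64_slot(initial: int, value32: int) -> int:
    """Write a 32-bit integer at the start of a 64-bit slot and read back all 64 bits."""
    slot = ctypes.c_int64(initial)
    # View the same memory as a 32-bit integer, as a library built with
    # 32-bit integers would when handed the slot's address.
    as_int32 = ctypes.cast(ctypes.pointer(slot), ctypes.POINTER(ctypes.c_int32))
    as_int32[0] = value32
    return slot.value


# Assumed stale contents: the upper half matches the reported code,
# the lower half is arbitrary.
stale = -9223371753386934272 | 0xDEADBEEF
result = store_int32_in_int64_slot(stale, 0)  # the library stores info = 0

if sys.byteorder == "little":
    # On little-endian machines the store lands in the low half only,
    # so the stale upper half survives.
    print(result)
```

On a little-endian machine this leaves the upper 32 bits untouched, which is one plausible reading of why the reported code has a zero low half and a large, meaningless magnitude.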

@nalimilan
Member Author

Indeed, this was confusing. ;-) Again, a short note in README.md would greatly help.

So I'm going to keep using USE_BLAS64=0 for now; this will have to be discussed with the Fedora people working on the different BLAS implementations. At the moment, ATLAS also takes precedence over OpenBLAS because of a file in /etc/ld.so.conf.d, so we would have to make sure ATLAS uses 64-bit integers too. For this reason, detecting the mismatch at startup would indeed be very useful: there will always be distributions where the BLAS used at runtime does not match the Julia options chosen at build time.

@staticfloat
Sponsor Member

@nalimilan this isn't really the documentation you want, but the list of makevars I pass into Ubuntu builds is here, which includes things like USE_BLAS64=0, overriding BLAS and LAPACK names to match the default names from Ubuntu's packages, asking for a Multiarch install to get JL_LIBDIR and friends in the right places, etc...

@ViralBShah
Member

I have added some explanation of USE_BLAS64 in DISTRIBUTING.md. We should also put in the check that @jiahao suggested, as that will ensure that a clear error message is issued on startup. That should then suffice to close this issue.

jiahao added a commit that referenced this issue Dec 9, 2013
Returns helpful error message as discussed in #5050, #4744
@jiahao
Member

jiahao commented Dec 9, 2013

I think we can close this issue now given the build-time check and documentation.

@jiahao jiahao closed this as completed Dec 9, 2013
ViralBShah added a commit that referenced this issue Dec 9, 2013
@nalimilan
Member Author

@staticfloat But even in your list, USE_SYSTEM_LIGHTTPD, USE_SYSTEM_ZLIB and USE_QUIET do not seem to be used anywhere in the current code base. Thanks for the pointer, though, I've taken LIBBLASNAME so that openblas is used.

Nice fix!

@staticfloat
Sponsor Member

@nalimilan Aha, yes indeed. Those used to be used, but don't do anything anymore. We don't ship with lighttpd anymore (that was a long time ago!) or zlib, and USE_QUIET=0 has now been changed to VERBOSE=1.
