
stockcorr benchmark is slow #445

Closed
dgleich opened this issue Feb 22, 2012 · 15 comments
Labels: performance (Must go faster)

@dgleich

dgleich commented Feb 22, 2012

Julia looks really interesting. My impression was that it'd be a great language to compare in the studies done here: http://www.walkingrandomly.com/?p=3604.

The idea is to generate about 500MB of synthetic time series data for financial applications.

The unoptimized Matlab code takes about 10 seconds on my machine.

I expected Julia to do much better on that same unoptimized code, which relies heavily on for loops.

However, when I tried the following Julia port:

function original_corr()
#ORIGINAL_CORR - The original, unoptimised code that simulates two correlated assets

## Correlated asset information
CurrentPrice = [78. 102.];       #Initial Prices of the two stocks
Corr = [1. 0.4; 0.4 1.];         #Correlation Matrix
T = 500;                       #Number of days to simulate = 2years = 500days
n = 100000;                    #Number of simulations
dt = 1./250.;                    #Time step (1year = 250days)
Div=[0.01 0.01];               #Dividend
Vol=[0.2 0.3];                 #Volatility

##Market Information
r = 0.03;                      #Risk-free rate

## Define storages
SimulPriceA=zeros(T,n);    #Simulated Price of Asset A
SimulPriceA[1,:]=CurrentPrice[1];
SimulPriceB=zeros(T,n);    #Simulated Price of Asset B
SimulPriceB[1,:]=CurrentPrice[2];

## Generating the paths of stock prices by Geometric Brownian Motion
UpperTriangle=chol(Corr);    #UpperTriangle Matrix by Cholesky decomposition

for i=1:n
   Wiener=randn(T-1,2);
   CorrWiener=Wiener*UpperTriangle;
   for j=2:T
      SimulPriceA[j,i]=SimulPriceA[j-1,i]*exp((r-Div[1]-Vol[1]^2/2)*dt+Vol[1]*sqrt(dt)*CorrWiener[j-1,1]);
      SimulPriceB[j,i]=SimulPriceB[j-1,i]*exp((r-Div[2]-Vol[2]^2/2)*dt+Vol[2]*sqrt(dt)*CorrWiener[j-1,2]);
   end
end

return (SimulPriceA, SimulPriceB)

end

I was surprised to see that it took 60 seconds... 6 times longer than Matlab.

If I take out the matrix multiplication:

Wiener=randn(T-1,2);
#CorrWiener=Wiener*UpperTriangle;
CorrWiener=Wiener;

Then the function takes only 6 seconds instead, which is still a bit slower than I would have expected. Vectorized Matlab code takes 3 seconds.

Is there something else in that code that is "slow" in julia?

And does the MKL in Matlab really make those 500×2 by 2×2 matrix multiplications that much faster?
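For reference, here is a sketch of the same kernel in present-day Julia syntax with the loop-invariant drift and volatility terms hoisted out of the inner loop (an assumption on my part that this is one place time can be saved; `cholesky` replaces the old `chol`, and the name `stockcorr_hoisted` is made up for illustration):

```julia
using LinearAlgebra, Random

# Sketch only: a modern-Julia port of original_corr with the per-asset
# constants precomputed outside the loops. Not the code timed above.
function stockcorr_hoisted(T::Int=500, n::Int=100_000)
    CurrentPrice = [78.0, 102.0]
    Corr = [1.0 0.4; 0.4 1.0]
    dt = 1 / 250
    Div = [0.01, 0.01]
    Vol = [0.2, 0.3]
    r = 0.03

    SimulPriceA = zeros(T, n); SimulPriceA[1, :] .= CurrentPrice[1]
    SimulPriceB = zeros(T, n); SimulPriceB[1, :] .= CurrentPrice[2]

    U = cholesky(Corr).U          # upper-triangular Cholesky factor

    # Loop-invariant terms, computed once instead of inside the loops
    driftA = (r - Div[1] - Vol[1]^2 / 2) * dt; sigA = Vol[1] * sqrt(dt)
    driftB = (r - Div[2] - Vol[2]^2 / 2) * dt; sigB = Vol[2] * sqrt(dt)

    for i in 1:n
        CorrWiener = randn(T - 1, 2) * U
        for j in 2:T
            SimulPriceA[j, i] = SimulPriceA[j-1, i] * exp(driftA + sigA * CorrWiener[j-1, 1])
            SimulPriceB[j, i] = SimulPriceB[j-1, i] * exp(driftB + sigB * CorrWiener[j-1, 2])
        end
    end
    return SimulPriceA, SimulPriceB
end
```

Calling `stockcorr_hoisted(500, 100_000)` reproduces the workload described above with smaller per-iteration arithmetic.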

@ViralBShah (Member)

That's interesting; I will certainly try this out. Multiplication of skinny matrices, though, is something that differs greatly across BLAS implementations.

@ghost ghost assigned ViralBShah Feb 22, 2012
@JeffBezanson (Member)

We will be looking at this. For now, I'll just point out that you don't need the semicolons :)

@JeffBezanson (Member)

Yep, on my machine:

julia> a=rand(500,2)
julia> b=rand(2,2)
julia> @time for i=1:10000; a*b; end
elapsed time: 0.08245491981506348 seconds

In matlab:

>> tic();for i=1:10000,a*b;end;toc()
Elapsed time is 0.027304 seconds.

So the BLAS is at least part of the difference.

@ViralBShah (Member)

We should file an issue with OpenBLAS.


@JeffBezanson (Member)

Already done.

@nolta (Member)

nolta commented Feb 25, 2012

On my box,

julia> a=rand(500,2); b=rand(2,2); @time for i=1:100000; a*b; end
elapsed time: 1.1467909812927246 seconds

julia> a=rand(500,2); b=rand(2,2); @time for i=1:100000; a*b; end
elapsed time: 0.9408280849456787 seconds

And if I replace dlopen("liblapack") with dlopen("libmkl_rt") in j/linalg_blas.j & j/start_image.j, then I get:

julia> a=rand(500,2); b=rand(2,2); @time for i=1:100000; a*b; end
elapsed time: 0.9358298778533936 seconds

julia> a=rand(500,2); b=rand(2,2); @time for i=1:100000; a*b; end
elapsed time: 0.555898904800415 seconds
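As an aside for anyone reading this later: in modern Julia (1.7+) you can inspect which BLAS backend is actually loaded without editing any source files, via `BLAS.get_config` (this API did not exist in 2012; the `dlopen` swap above was the way then):

```julia
using LinearAlgebra

# Print the BLAS libraries the running session has loaded
# (e.g. the bundled OpenBLAS, or MKL if MKL.jl has been loaded).
cfg = BLAS.get_config()
for lib in cfg.loaded_libs
    println(lib.libname)
end
```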

@StefanKarpinski (Member)

Is it that easy to swap in MKL? I tried this at some point and thought it was harder than this. If so, we should make it even easier — perhaps to the point of automatically using MKL if we can find one installed (say under the Matlab install directory };-).

@ViralBShah (Member)

Yes, swapping BLAS and LAPACK should be that easy. I don't think we should do anything automatically, but we should have a config file.

Also, we need a bit of reworking, because other libraries that depend on BLAS/LAPACK will continue using the old statically linked version. We need to do that -rpath thing.


@ViralBShah (Member)

I wonder if it would be faster to implement this DGEMM in julia rather than calling OpenBLAS. We have matmul2x2, which could be extended to handle this case, and it just may be faster. Worth trying.
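Along those lines, a minimal sketch of a hand-rolled kernel for the (m×2)·(2×2) shape, assuming an extension of matmul2x2 roughly like this (the name `mul_skinny2` is hypothetical):

```julia
# Sketch: avoid the BLAS call for tall-skinny A (m x 2) times 2x2 B
# by unrolling the length-2 inner dimension.
function mul_skinny2(A::AbstractMatrix{T}, B::AbstractMatrix{T}) where {T}
    @assert size(A, 2) == 2 && size(B) == (2, 2)
    C = similar(A)
    b11, b21, b12, b22 = B[1, 1], B[2, 1], B[1, 2], B[2, 2]
    @inbounds for i in axes(A, 1)
        a1, a2 = A[i, 1], A[i, 2]
        C[i, 1] = a1 * b11 + a2 * b21
        C[i, 2] = a1 * b12 + a2 * b22
    end
    return C
end
```

For matrices this skinny, the fixed overhead of dispatching into a general GEMM can dominate, which is why a specialized loop may win.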

@ViralBShah (Member)

I am now running @time stockcorr(), which is checked into examples. With the multiplication in there, the code takes 15 seconds, and without the multiplication, it now takes 13 seconds for me.

@dgleich Can you also post both the unvectorized Matlab code and the vectorized one?

@JeffBezanson It doesn't seem that matrix multiplication is taking much of the time now, even though it is still slower than Matlab. For me the same matmul is only about twice as slow:

>> tic();for i=1:10000,a*b;end;toc()
Elapsed time is 0.038390 seconds.

julia> @time for i=1:10000; a*b; end
elapsed time: 0.06738996505737305 seconds

I updated my OpenBLAS to 0.1.0, which was supposed to address the matrix-multiplication issue, but the update makes no difference here.

@StefanKarpinski (Member)

I don't think this is a v1.0 issue. It would be awesome to make this faster and I'm sure we will, but making something faster won't break any existing code. Since there's no breaking change involved in fixing this and it isn't a "can't possibly announce 1.0 without fixing this embarrassment" sort of thing, I'm removing the v1.0 milestone.

@ghost ghost assigned JeffBezanson Apr 29, 2012
@ViralBShah (Member)

I have checked stockcorr.m into examples. Julia is taking 40 seconds for me, whereas Matlab is taking 10 seconds, using only one computational thread.

@ViralBShah (Member)

@JeffBezanson How do you suggest we sort this one out?

@JeffBezanson (Member)

We just need to do a detailed performance analysis. Delete bits of it until the performance problem goes away, try running without GC, etc. until we have a clear idea what the problem is.
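One concrete form of that analysis, sketched in present-day Julia (`GC.enable` is the modern spelling of the API, and `time_without_gc` is a made-up helper): time the kernel with the collector switched off to see whether allocation is the bottleneck.

```julia
# Run f once to compile, then time it with the GC disabled,
# re-enabling the collector afterwards.
function time_without_gc(f)
    f()                     # warm-up / compilation run
    GC.enable(false)
    t = @elapsed f()
    GC.enable(true)
    GC.gc()                 # collect anything that piled up
    return t
end
```

If the GC-disabled time is close to the normal time, garbage collection can be ruled out and attention moves to the remaining pieces (randn allocation, the matmul, exp in the inner loop).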

@ViralBShah (Member)

The Julia code now takes 6.4 seconds, whereas the Matlab code takes 9 seconds. I suppose we can close this issue.

StefanKarpinski pushed a commit that referenced this issue Feb 8, 2018
Add invpermute! and a pairs method for replace
cmcaine pushed a commit to cmcaine/julia that referenced this issue Nov 11, 2022
Keno pushed a commit that referenced this issue Oct 9, 2023