
Taking vector transposes seriously #4774

Closed
jiahao opened this issue Nov 10, 2013 · 417 comments
Labels
design · domain:arrays · domain:linear algebra · kind:breaking (This change will break code)
Comments

@jiahao
Member

jiahao commented Nov 10, 2013

from @alanedelman:

We really should think carefully about how the transpose of a vector should dispatch the various A_*op*_B* methods. It must be possible to avoid new types and ugly mathematics. For example, vector'vector yielding a vector (#2472, #2936), vector' yielding a matrix, and vector'' yielding a matrix (#2686) are all bad mathematics.

What works for me mathematically (which avoids introducing a new type) is that for a 1-dimensional Vector v:

  • v' is a no-op (i.e. just returns v),
  • v'v or v'*v is a scalar,
  • v*v' is a matrix, and
  • v'A or v'*A (where A is an AbstractMatrix) is a vector

A general N-dimensional transpose reverses the order of indices. A vector, having one index, should be invariant under transposition.

In practice v' is rarely used in isolation, and is usually encountered in matrix-vector products and matrix-matrix products. A common example would be to construct bilinear forms v'A*w and quadratic forms v'A*v which are used in conjugate gradients, Rayleigh quotients, etc.

The only reason to introduce a new Transpose{Vector} type would be to represent the difference between contravariant and covariant vectors, and I don't find this compelling enough.
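For concreteness, here is a minimal sketch of these four rules as ordinary functions (the names are hypothetical, dot comes from today's LinearAlgebra, and this illustrates the proposal rather than actual Base behavior):

using LinearAlgebra

vtranspose(v::AbstractVector) = v                         # v' would be a no-op
vinner(v::AbstractVector, w::AbstractVector) = dot(v, w)  # v'v and v'*v -> scalar
vouter(v::AbstractVector, w::AbstractVector) = [vi * conj(wj) for vi in v, wj in w]  # v*v' -> matrix
vleftmul(v::AbstractVector, A::AbstractMatrix) = transpose(A) * conj(v)              # v'A and v'*A -> vector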

@StefanKarpinski
Member

For example, vector'vector yielding a vector (#2472, #2936), vector' yielding a matrix, and vector'' yielding a matrix (#2686) are all bad mathematics.

The double-dual of a finite-dimensional vector space is isomorphic to it, not identical. So I'm not clear on how this is bad mathematics. It's more that we tend to gloss over the distinction between things that are isomorphic in mathematics, because human brains are good at handling that sort of slippery ambiguity and just doing the right thing. That said, I agree that this should be improved, not because it's mathematically incorrect but because it's annoying.

@JeffBezanson
Member

How can v' == v, but v'*v != v*v? Does it make more sense than we thought for x' * y to be its own operator?

@jiahao
Member Author

jiahao commented Nov 11, 2013

The double-dual of a finite dimensional vector space is isomorphic to it, not identical.

(Speaking as myself now) It is not just isomorphic, it is naturally isomorphic, i.e. the isomorphism is independent of the choice of basis. I can’t think of a practical application for which it would be worth distinguishing between this kind of isomorphism and an identity. IMO the annoyance factor comes from making this kind of distinction.

Does it make more sense than we thought for x' * y to be its own operator?

That was the impression I got out of this afternoon’s discussion with @alanedelman.

@alanedelman
Contributor

I think what Jeff is asking is right on the money ... it is starting to look like x'*y and x*y' are making more sense than ever.

@alanedelman
Contributor

I'm in violent agreement with @Stefan. Bad mathematics was not meant to mean wrong mathematics; it was meant to mean annoying mathematics. There are lots of things that are technically right, but not very nice ....

@alanedelman
Contributor

If we go with this logic, here are two choices we have:

  • x*x remains an error ..... perhaps with a suggestion "maybe you want to use dot"
  • x*x is the dot product (I don't love that choice)

@StefanKarpinski
Member

If x and x' are the same thing, then if you want (x')*y to mean dot(x,y) that implies that x*y is also dot(x,y). There's no way out of that. We could make x'y and x'*y a different operation, but I'm not sure that's a great idea. People want to be able to parenthesize this in the obvious way and have it still work.

I would point out that if we allow x*x to mean the dot product, there's basically no going back. That is going to get put into people's code everywhere and eradicating it will be a nightmare. So, natural isomorphism or not, this ain't pure mathematics and we've got to deal with the fact that different things in a computer are different.

@JeffBezanson
Member

Here is a practical discussion of distinguishing "up tuples" and "down tuples" that I like:

http://mitpress.mit.edu/sites/default/files/titles/content/sicm/book-Z-H-79.html#%_idx_3310

It carefully avoids words like "vector" and "dual", perhaps to avoid annoying people. I find the application to partial derivatives compelling though:

http://mitpress.mit.edu/sites/default/files/titles/content/sicm/book-Z-H-79.html#%_sec_Temp_453

@StefanKarpinski
Member

Another reason to distinguish M[1,:] and M[:,1] is that currently our broadcasting behavior allows this very convenient behavior: M./sum(M,1) is column-stochastic and M./sum(M,2) is row-stochastic. The same could be done for normalization if we "fixed" the norm function to allow application over rows and columns easily. Of course, we could still have sum(M,1) and sum(M,2) return matrices instead of up and down vectors, but that seems a bit off.
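For reference, the pattern in question, written in the dims-as-positional-argument syntax of the time (in current Julia this is sum(M; dims=1)):

M = rand(3, 3)
M ./ sum(M, 1)   # every column sums to 1 (column-stochastic)
M ./ sum(M, 2)   # every row sums to 1 (row-stochastic)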

@StefanKarpinski
Member

I like the idea of up and down vectors. The trouble is generalizing it to higher dimensions in a way that's not completely insane. Or you can just make vectors a special case. But that feels wrong too.

@JeffBezanson
Member

It's true that up/down may be a separate theory. The approach to generalizing them seems to be nested structure, which takes things in a different direction. Very likely there is a reason they don't call them vectors.

@toivoh
Contributor

toivoh commented Nov 11, 2013

Also, x*y = dot(x,y) would make * non-associative, as in x*(y*z) vs. (x*y)*z. I really hope that we can avoid that.
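To spell out the failure: if vector*vector meant dot, the two groupings would give different vectors. Written with explicit dot calls, since the rule is hypothetical:

using LinearAlgebra

x = [1.0, 0.0]; y = [1.0, 1.0]; z = [0.0, 2.0]
x * dot(y, z)   # what x*(y*z) would mean: [2.0, 0.0]
dot(x, y) * z   # what (x*y)*z would mean: [0.0, 2.0]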

@StefanKarpinski
Member

Yes. To me, that's completely unacceptable. I mean technically, floating-point * is non-associative, but it's nearly associative, whereas this would just be flagrantly non-associative.

@alanedelman
Contributor

We all agree that x*x should not be the dot product.

The question remains whether we can think of v'w and v'*w as the dot product --
I really really like this working that way.

@JeffBezanson and I were chatting

A proposal is the following:

v' is an error for vectors (This is what mathematica does)
v'w and v'*w is a dot product (result = scalar)
v*w is an outer product matrix (result = matrix)

There is no distinction between rows and column vectors. I liked this anyway
and was happy to see mathematica's precedent
From mathematica: http://reference.wolfram.com/mathematica/tutorial/VectorsAndMatrices.html
Because of the way Mathematica uses lists to represent vectors and matrices, you never have to distinguish between "row" and "column" vectors

Users have to be aware that there are no row vectors .... period.

Thus if M is a matrix

M[1,:]*v is an error ..... (assuming we go with M[1,:] is a scalar)
The warning could suggest trying dot or '* or M[i:i,:]

M[[1],:]*v or M[1:1,:]*v is a vector of length 1 (This is julia's current behavior anyway)

Regarding the closely related issue in https://groups.google.com/forum/#!topic/julia-users/L3vPeZ7kews

Mathematica compresses scalar-like array sections:

m = Array[a, {2, 2, 2}] 


Out[49]= {{{a[1, 1, 1], a[1, 1, 2]}, {a[1, 2, 1], 
   a[1, 2, 2]}}, {{a[2, 1, 1], a[2, 1, 2]}, {a[2, 2, 1], a[2, 2, 2]}}}

In[123]:= Dimensions[m]
Dimensions[m[[All, 1, All]]]
Dimensions[m[[2, 1, All]]]
Dimensions[m[[2, 1 ;; 1, All]]]

Out[123]= {2, 2, 2}

Out[124]= {2, 2}

Out[125]= {2}

Out[126]= {1, 2}

[Edit: code formatting – @StefanKarpinski]

@blakejohnson
Contributor

@alanedelman

assuming we go with M[1,:] is a scalar

do you mean M[1,:] is just a vector?

@alanedelman
Contributor

Yes, sorry. My mind meant M[1,:] was processing the scalar 1 :-)

Mathematica uses the period . rather than the asterisk *
and then goes the whole 9 yards and makes (vector . vector) into a scalar, exactly what we are avoiding
with the asterisk.

There are no doubt many problems with the period, one of which is that it just doesn't look like the "dot" in a dot product, and another of which is that it clashes with the "pointwise op" reading of dot.

Unicode does provide very nicely a character named "the dot operator"
(char(8901)) which we can imagine offering

so we could have (v ⋅ w) becoming synonymous with (v'*w)
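A sketch of that idea: U+22C5 already parses as an infix operator in Julia, so binding it to the inner product is a one-liner (today's LinearAlgebra in fact exports ⋅ as an alias for dot):

⋅(v::AbstractVector, w::AbstractVector) = sum(conj(vi) * wi for (vi, wi) in zip(v, w))

[1, 2] ⋅ [3, 4]   # 11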

In summary, a current proposal subject to debate is:

  1. Scalar indexing kills the dimension, thus
    A[i,:] is a vector, as is A[:,i,j] (see the sketch after this list)

  2. Vector indexing is thick
    A[ i:i , : ] or A[ [i], : ] returns a matrix with one row

  3. v'w or v'*w is the dot product for vectors (similarly v*w' for outer product)

  4. v' is undefined for vectors (point the user to permutedims(v,1)????)

  5. v*A returns a vector if A is a matrix

  6. v⋅w also returns the dot product (but does not go as far as mathematica's . by working on matrices)

  7. v*w is undefined for vectors but a warning might tip the user off with good suggestions including

    Consequences are that

a. if you stick to all vectors being column vectors, everything works
b. if you make everything a matrix, everything certainly works, and it's easy to make everything a matrix
c. if your mind can't distinguish a row vector from a one row matrix, chances are you'll be educated
politely and gracefully
d. This dot notation is kind of pleasant to the eye
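Points 1 and 2 are essentially the APL-style slicing rules that Julia eventually adopted (in 0.5); a sketch of the behavior:

A = reshape(1:12, 3, 4)
A[1, :]     # scalar index drops the dimension: a 4-element vector
A[1:1, :]   # range index is "thick": a 1×4 matrix
A[[1], :]   # likewise a 1×4 matrix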

@blakejohnson
Contributor

Suggestion 5) looks very odd to me. I prefer v'*A so that it is explicit that you are using the dual vector. This is particularly important in complex vector spaces where the dual isn't just a "shape" transformation.

I want to echo @StefanKarpinski that it would be rather unfortunate to lose our concise broadcasting behavior in all this. After this change, what is the concise syntax for taking a vector v and normalizing the columns of matrix A by those values? Currently, one can use A ./ v'. This is extremely nice for data manipulation.
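Concretely, the pattern at stake (one scale factor per column of A):

A = rand(3, 4); v = rand(4)
A ./ v'   # v' broadcasts as a row: column j of A is divided by v[j]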

@alanedelman
Contributor

Good questions

My scheme does not preclude v'*A taking the complex conjugate of v and multiplying by A
and all the various other cases that I didn't yet explicitly mention, but readily could

we could eliminate 5
perhaps that's desirable
it doesn't conform to my column vector rule

This approach to broadcasting is cute and kludgy
One solution now is A ./ v[:,[1]]

It has the benefit of documenting which dimension is being broadcast on
and generalizes to higher dimensional arrays

Oh and the v[:,[1]] solution has the virtue of NOT taking the complex conjugate
which is probably what the user intends .....

I LIKE THESE TWO EXAMPLES because the first is a LINEAR ALGEBRA example
where a complex conjugate is desired very frequently, but the second example is
a MULTIDIMENSIONAL DATA EXAMPLE where we want things to work in all dimensions
not just for matrices, and we very likely don't want a complex conjugate

@jiahao
Member Author

jiahao commented Nov 12, 2013

requires #552. This is the third time it's shown up in the past two weeks.

@BobPortmann
Contributor

Another reason to distinguish M[1,:] and M[:,1] is that currently our broadcasting behavior allows this very convenient behavior: M./sum(M,1) is column-stochastic and M./sum(M,2) is row-stochastic. The same could be done for normalization if we "fixed" the norm function to allow application over rows and columns easily. Of course, we could still have sum(M,1) and sum(M,2) return matrices instead of up and down vectors, but that seems a bit off.

It seems to me that while having the broadcasting behavior is nice some of the time, you end up having to squeeze those extra unit dims out just as often. Thus, having to do the opposite some of the time is OK if the rest of the system is nicer (and I do think having scalar dimensions dropped will make the system nicer). So you will need a function like

julia> widen(A::AbstractArray,dim::Int) = reshape(A,insert!([size(A)...],dim,1)...)
# methods for generic function widen
widen(A::AbstractArray{T,N},dim::Int64) at none:1

which will allow code like M ./ widen(sum(M,2),2) or A ./ widen(v,1) (see @blakejohnson example above)

@alanedelman
Contributor

M[:,0,:] and v[:,0] ?????

@timholy
Member

timholy commented Nov 13, 2013

I'm more with @blakejohnson on the reduction issue; I personally think it's clearer to squeeze dimensions than to widen them. I suspect I'd be perpetually looking at the docs to figure out whether widen inserts the dimension at or after the indicated index, and the numbering gets a little more complex if you want to widen over multiple dimensions at once. (What does widen(v, (1, 2)) for a vector v do?) None of these are issues for squeeze.

@toivoh
Contributor

toivoh commented Nov 13, 2013

Regardless of whether we would widen or squeeze per default, I think that Julia should follow the lead from numpy when it comes to widening and allow something like v[:, newaxis]. But I do believe that I prefer to keep dimensions instead of discarding them, it's harder to catch a bug where you accidentally widened the wrong way than when you squeezed the wrong way (which will usually give an error).
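numpy's v[:, newaxis] inserts a singleton dimension; the closest spelling in current Julia is a reshape (shown here only to illustrate the widening idea):

v = [1, 2, 3]
reshape(v, :, 1)   # 3×1 matrix, the analogue of numpy's v[:, newaxis]
reshape(v, 1, :)   # 1×3 matrix, the analogue of v[newaxis, :]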

@wenxgwen

In the list of @alanedelman
I feel that

v*A returns a vector if A is a matrix

is not good.

v*A should be an error if A is not 1x1 (mismatch of the range of index)
v'*A should be the proper way to do it.

One way to handle this issue is to automatically convert vector v to nx1 matrix (when needed)
and always treat v' as 1xn matrix (never convert that to a vector or nx1 matrix)
Also we allow automatic conversion of a 1x1 matrix to a scalar number (when needed).

I feel that this represents a consistent and uniform way to think about linear algebra. (good mathematics)

A uniform way to handle all those issues is to allow automatic (type?) conversion (when needed)
between arrays of size (n), (n,1), (n,1,1), (n,1,1,1), etc. (but not between arrays of size (n,1) and (1,n))
(Just like we automatically convert real number to complex number when needed)

In this case an array of size (1,1) can be converted to a number (when needed) (See #4797 )

Xiao-Gang (a physicist)

@alanedelman
Contributor

This leaves v'*A however .... I really want v'*A*w to work

@toivoh
Contributor

toivoh commented Nov 13, 2013

My impression of linear algebra in Julia is that it is very much organized like matrix algebra, although scalars and true vectors exist (which I think is good!)

Let us consider how to handle a product like x*y*z*w, where each factor may be a scalar, vector, or matrix, possibly with a transpose on it. Matrix algebra defines the product of matrices, where a matrix has size n x m. One approach would be to extend this definition so that n or m might be replaced by absent, which would act like a value of one as far as computing the product is concerned, but is used to distinguish scalar and vectors from matrices:

  • a scalar would be absent x absent
  • a (column) vector would be n x absent
  • a row vector would be absent x n

Ideally, we would like to arrange things so that we never need to represent row vectors, but it would be enough to implement operations like x'*y and x*y'. I get the feeling that this is the flavor of scheme that many of us are searching for.

But I'm beginning to suspect that banning row vectors in this kind of scheme will come at a high cost. Example: Consider how we would need to parenthesize a product to avoid forming a row vector at any intermediate step: (a is a scalar, u and v are vectors)

a*u'*v = a*(u'*v) // a*u' is forbidden
v*u'*a = (v*u')*a // u'*a is forbidden

To evaluate a product x*y'*z while avoiding producing row vectors, we would need to know the types of the factors before picking the multiplication order! If the user has to do it himself, it seems like an obstacle to generic programming. And I'm not sure how Julia could do it automatically in a sane way.

Another reason not to fix the multiplication order in advance: I seem to remember that there was an idea to use dynamic programming to choose the optimal evaluation order of *(x,y,z,w) to minimize the number of operations needed. Anything that we do to avoid forming row vectors will likely interfere with this.

So right now, introducing a transposed vector type seems like the most sane alternative to me. That, or doing everything as now but dropping trailing singleton dimensions when keeping them would result in an error.

@lsorber

lsorber commented Nov 14, 2013

Transposition is just a particular way to permute modes. If you allow v.' where v is a vector, then permutedims(v,[2 1]) should return the exact same thing. Either both return a special row vector type, or they introduce a new dimension.

Having a special type for row vectors doesn't look like a good solution to me, because what will you do about other types of mode-n vectors, e.g., permutedims([1:4],[3 2 1])? I urge you to also take the multilinear algebra into account before taking a decision.

@wenxgwen

@toivoh mentioned that

"One approach would be to extend this definition so that n or m might be replaced by absent, which would act like a value of one as far as computing the product is concerned, but is used to distinguish scalar and vectors from matrices:

  1. a scalar would be absent x absent
  2. a (column) vector would be n x absent
  3. a row vector would be absent x n "

In multilinear algebra (or for high-rank tensors) the above proposal corresponds to using absent to represent many indices of range 1, i.e. size (m,n,absent) may correspond to (m,n), (m,n,1), (m,n,1,1), etc.

If we use this interpretation of absent, then 1. and 2. are OK and nice to have, but 3. may not be OK.
We do not want to mix arrays of size (1,n) and (1,1,n).

andyferris pushed a commit to andyferris/julia that referenced this issue Jan 13, 2017
`RowVector` is now defined as the `transpose` of any `AbstractVector`. If
`v` is an `AbstractVector`, then it obeys the identity that `(v.').'
=== v` and the matrix multiplication rules follow that `(A * v).' == (v.' *
A.')`. `RowVector` is a "view" and maintains the recursive nature of
`transpose`. It is a subtype of `AbstractMatrix` and maintains the
current shape and broadcast behavior for `v.'`.

Consequences include:

 * v'' is a vector, not a matrix
 * v'*v is a scalar, not a vector
 * v*v' is the outer product (returns a matrix)
 * v*A (for A::AbstractMatrix) is removed, since its purpose was to
   provide a way of doing the outer product above, and is no longer
   necessary.

Closes JuliaLang#4774
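The resulting behavior, as it still works in released Julia (RowVector in 0.6, later the Adjoint/Transpose wrappers in 1.x):

v = [1, 2, 3]
v'        # 1×3 transposed view of v, not a Matrix copy
(v')'     # the plain vector [1, 2, 3] again
v' * v    # 14, a scalar rather than a 1-element array
v * v'    # 3×3 outer-product matrix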
@StefanKarpinski
Member

BAM

@andyferris
Member

Woot.

Was this our longest issue thread?

@Keno
Member

Keno commented Jan 13, 2017

No, #11004 has more

@andyferris
Member

Sorry. You're right, I should have specified open issue thread.

jessebett referenced this issue in google-research/dex-lang Nov 20, 2019