
Can I feed different graph for each data? #5

Open · hello072 opened this issue Nov 8, 2016 · 10 comments

hello072 commented Nov 8, 2016

Thank you for sharing this implementation.

Suppose each data sample has a different graph structure (same number of nodes, but different edges and edge weights).

In this case, can I feed a different graph for each sample? This implementation looks like the graph structure is fixed (grid, random, etc.) for the whole dataset.

Thanks.

regards, Sangyun Lee

mdeff (Owner) commented Nov 16, 2016

You're welcome! Thanks for considering it. :)

Yep, the current implementation assumes the graph is fixed, i.e. the same graph Laplacian is used across all samples / signals. You can however modify the implementation to feed a different graph (i.e. a different Laplacian computed from your edge weights) for each sample. In that case you make the assumption that the learned filters are transferable from one graph to another, which only makes sense if your various graphs share similar properties.
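As a rough illustration, a per-sample Laplacian could be fed through a placeholder and used in the Chebyshev recurrence with a batched matrix product. This is a minimal sketch, not the repo's `chebyshev5` (which uses a single, fixed, sparse Laplacian); all names and shapes are illustrative, and the Laplacians are assumed rescaled beforehand.

```python
import tensorflow as tf  # TF 1.x, as used by this repo

M, Fin, Fout, K = 10, 1, 32, 5  # nodes, in/out feature maps, Chebyshev order

# Hypothetical placeholders: one signal and one rescaled Laplacian per sample.
x = tf.placeholder(tf.float32, [None, M, Fin], name='signals')
L = tf.placeholder(tf.float32, [None, M, M], name='laplacians')

def chebyshev_per_sample(x, L):
    """Chebyshev filtering where every sample carries its own Laplacian.
    T_0 x = x, T_1 x = L x, T_k x = 2 L T_{k-1} x - T_{k-2} x.
    Assumes K >= 2 and L rescaled so its spectrum lies in [-1, 1]."""
    Tx = [x, tf.matmul(L, x)]                 # batched matmul: (B,M,M) @ (B,M,Fin)
    for _ in range(2, K):
        Tx.append(2 * tf.matmul(L, Tx[-1]) - Tx[-2])
    Tx = tf.concat(Tx, axis=2)                # B x M x (Fin*K)
    W = tf.get_variable('W', [Fin * K, Fout])
    y = tf.matmul(tf.reshape(Tx, [-1, Fin * K]), W)
    return tf.reshape(y, [-1, M, Fout])       # B x M x Fout

y = chebyshev_per_sample(x, L)
```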

Cheers, Michaël

rushilanirudh commented Jan 20, 2017

Hi, I am also trying to achieve something similar to what @hello072 described. I want to classify/regress arbitrary graphs, where the signal may or may not be relevant (so the prediction would be based purely on their adjacency matrices).

I could be wrong, but it does not look like a trivial generalization, since your models are defined with a single Laplacian. Is that right? What would you suggest as the most efficient way to achieve this?

Thank you for the Python notebooks explaining how to use your code, they really help!
Rushil

mdeff (Owner) commented Feb 9, 2017

Thanks for your interest. To take the structure into account, you can add a constant signal on the graph (while keeping the feature signals or not). Then:

  • If the graphs are all of the same size, you only need to feed a different Laplacian for each signal. The inferred class of the signal is then the predicted class of the graph.
  • If the graphs are of different sizes, you need to somehow represent them as fixed-size vectors before the fully connected layers (the same principle as varying-length sentences being represented as fixed-size vectors by RNNs). One way is to use an attention mechanism such as equation 7 in https://arxiv.org/abs/1511.05493 (see the sketch below).
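For concreteness, here is a minimal sketch of such an attention readout, in the spirit of equation 7 of the paper above: a gating network scores each node, and the gated node features are summed into a vector whose size does not depend on the number of nodes. The layer names and shapes are illustrative, not part of this repo.

```python
import tensorflow as tf  # TF 1.x

def attention_readout(h, d_out):
    """h: B x N x F node features (N may vary across graphs).
    Returns a B x d_out graph representation, independent of N."""
    gate = tf.layers.dense(h, 1, activation=tf.sigmoid)    # B x N x 1, per-node gate
    feat = tf.layers.dense(h, d_out, activation=tf.tanh)   # B x N x d_out
    return tf.reduce_sum(gate * feat, axis=1)              # sum over nodes: B x d_out
```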

wangchu1 commented Mar 9, 2017

Hi mdeff,

I am also thinking of doing graph classification, like rushil and hello. I see you mentioned:

"If the graphs are all of the same size, you only need to feed a different Laplacian for each signal. The inferred class of the signal is then the predicted class of the graph."

How do you do this in your current code framework? Is it straightforward? In your usage notebook, you actually mention this task: "Another problem of interest is whole graph classification, with or without signals on top. We'll call that third regime graph classification / regression."

Do you have an example of how this works? Another question: my graph signal on each node is not a single scalar. Suppose all my graphs have 10 nodes, but each node has a signal vector of length 100; then my data matrix will be (#Samples, 10 (nodes), 100), as opposed to your current input format of (#Samples, #Nodes). Does your current code support vector inputs? You implied this could be done in the "whole graph classification" statement.

Many thanks for your work!
-fate.

mdeff (Owner) commented Mar 11, 2017

Hi, thanks for your interest. The current code was not developed with this application in mind, so I don't have any example ready. It might exist in the future, but for now you'd have to adapt the code in the base_model and cgcnn classes (in lib/models.py). The Laplacians would be fed to the model through a tf.placeholder.

Having multiple features per node is akin to having multiple feature maps and is thus supported by the filtering function chebyshev5. This only requires some minor changes, mainly to accept 3D inputs and to correctly connect the fully connected layers.
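To make the feature-map analogy concrete, here is a minimal NumPy sketch of Chebyshev filtering applied to a multi-channel signal (e.g. 10 nodes with 100 features each, as in the question above). It mirrors what chebyshev5 computes, though the repo's internal weight ordering and sparse operations differ; assume K >= 2 and a Laplacian rescaled to [-1, 1].

```python
import numpy as np

def chebyshev_filter(X, L, W, K):
    """X: N x Fin node features, L: N x N rescaled Laplacian,
    W: (Fin*K) x Fout filter weights. Returns N x Fout."""
    Tx = [X, L @ X]                       # T_0 X and T_1 X
    for _ in range(2, K):
        Tx.append(2 * (L @ Tx[-1]) - Tx[-2])
    return np.concatenate(Tx, axis=1) @ W  # N x (Fin*K) times (Fin*K) x Fout

N, Fin, Fout, K = 10, 100, 32, 5
X = np.random.randn(N, Fin)               # one sample: 10 nodes, 100 features each
L = np.eye(N)                             # stand-in for a real rescaled Laplacian
W = np.random.randn(Fin * K, Fout)
print(chebyshev_filter(X, L, W, K).shape)  # (10, 32)
```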

halwai commented Mar 20, 2017

Hello mdeff,
Continuing the discussion of hello (Sangyun Lee), rushil and fate on graph classification, I have the following doubt:
For each data point (assuming the same number of nodes) we have a unique Laplacian and unique graph signal(s) on top of it. For each Laplacian we will get a different graph coarsening, so we will need to rearrange (permute) the dimensions of the graph signals for each data point separately. Since the (Chebyshev, Lanczos) parameters are learned on differently coarsened versions, the coarsened versions of the graph signals at higher levels (later stages of the network) may not have a one-to-one information correspondence, so what will the learned parameters represent? To give an example of what I mean, suppose we have 2 data points (4 nodes each):
Data point 1: coarsening at level 1 clubs nodes 1,2 and 3,4
Data point 2: coarsening at level 1 clubs nodes 1,4 and 2,3
Now, does (1,2) in data point 1 correspond to data point 2's (1,4) or (2,3)? The learned parameters will see a different kind of information for each data point, which is not expected in neural nets, where a neuron is expected to take similar information as input.
Am I right, or have I missed something trivial? If so, can you please point it out? If not, do you have some idea on how to coarsen/pool in such cases?

PS. Both the paper and the code were a great read
Regards,
Abhijeet

mdeff (Owner) commented Mar 21, 2017

Hi halwai, thanks for your interest. Your reasoning seems right. At the input and hidden layers you should not need to care which node in a graph corresponds to which node in another, as feature maps are convolved with filters whose result only depends on the K-neighborhood of a node. The only time you care is when passing the features (which reside on the most coarsened graph) to the fully connected layers. I see a few solutions:

  1. Your application gives you some ordering, e.g. if nodes have different roles. Then you can just stack the features in a vector; each position would have the same meaning across graphs.
  2. Otherwise you need to somehow summarize the graph signal as a fixed-length vector. You could use the attention mechanism I pointed out above.
  3. Another idea for an attention mechanism would be to learn filters which compute the attention. If you want a d-dimensional output vector, you'd need to learn d filters, each of which assigns an attention score to all of your nodes. Then normalize (see the sketch below).
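A rough sketch of that third idea, with all names and shapes illustrative: each of d score maps is softmax-normalized over the nodes and used to pool a learned node feature. For brevity the scores here come from a per-node dense layer; in a full version they would come from graph filters (e.g. Chebyshev).

```python
import tensorflow as tf  # TF 1.x

def filter_attention_pool(h, d):
    """h: B x N x F features on the most coarsened graph.
    Returns a B x d vector, fixed size regardless of N."""
    scores = tf.layers.dense(h, d)                # B x N x d, one score map per 'filter'
    alpha = tf.nn.softmax(scores, axis=1)         # normalize over the N nodes
    values = tf.layers.dense(h, d)                # B x N x d, node features to pool
    return tf.reduce_sum(alpha * values, axis=1)  # B x d
```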

Hope it helps. Cheers, Michaël

anthony123 commented

Hi, mdeff
Many thanks for your work. I ran into a situation where 1) the graphs are of different sizes (different numbers of nodes); and 2) within each graph, every node carries a feature vector, and these vectors are also of different sizes.

Can you give me some advice on how to change the code (which functions) to meet these requirements?

mdeff (Owner) commented Jul 11, 2017

Hi @anthony123, thanks for your interest. In such a case you'll have to normalize the size of your data at some point.

Graphs of different sizes are not a problem for graph convolutions, but they will be for the fully connected layer (I assume you're interested in classifying whole graphs). In that case you probably want an attention mechanism, as discussed above.

Having features of different dimensionality per node is more problematic. For graph convolutions you'll have to normalize them to a fixed number of feature maps as a pre-processing step (it's like having an image with a different number of colour channels per pixel). How to do this depends on your data. You may e.g. pad the features or summarize them with some statistics.
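Two illustrative pre-processing options, purely as a sketch (none of this is in the repo): zero-pad or truncate every node's vector to a common length, or replace each vector with fixed-size summary statistics.

```python
import numpy as np

def pad_node_features(feats, d):
    """feats: list of 1-D arrays of varying lengths (one per node).
    Zero-pad or truncate each to length d. Returns N x d."""
    out = np.zeros((len(feats), d))
    for i, f in enumerate(feats):
        n = min(len(f), d)
        out[i, :n] = f[:n]
    return out

def summarize_node_features(feats):
    """Alternative: fixed-size summary statistics per node. Returns N x 4."""
    return np.array([[f.mean(), f.std(), f.min(), f.max()] for f in feats])
```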

pinkfloyd06 commented

Hello @mdeff , @anthony123 , @fate3439 .

To carry on the discussion about nodes carrying vectors of length 100 rather than a scalar:

Given a dataset of 1000 samples (n_samples), each sample has 10 nodes and each node has a vector of 100 values, so dataset_dimension = (1000, 10, 100).

For each sample I compute its Laplacian, which is of dimension (100, 100), and then feed it to a convolutional layer.

Let x[0] be the first sample, of dimension [10, 100], and L[0] its Laplacian. At the convolutional layer I need to do 10 convolutions (with respect to the number of nodes):

x[0,0]*L[0]
x[0,1]*L[0]
...
x[0,9]*L[0]

Hence 10 convolutions per sample. So, if I have a graph of n nodes (with n sufficiently large, say n=1500), will I do n convolutions per sample?

Thank you
