Function to convert Program Graph to PyTorch Geometric Graph #174

zehanort · 2021-07-23T18:41:50Z

🚀 Feature

It would be nice to have a programl.to_pyg function to convert one or more Program Graphs to torch_geometric.data.Data, i.e. to PyTorch Geometric graphs.

Motivation

This would be extremely helpful for setting up ML/DL pipelines with custom GNNs using PyTorch Geometric. The library offers many utilities for machine/deep learning tasks on graphs, and it seems to be gaining a lot of popularity lately, especially in research.

Pitch

My idea is a one-to-one mapping of the nodes, edges and node features of the Program Graph onto the PyG graph, with the edge type of the Program Graph (i.e., the CONTROL / DATA / CALL enum values) becoming a single edge feature of the PyG graph. Unfortunately, PyTorch Geometric does not (yet) explicitly support graph-level features. It seems to support only node-level features, node-level targets and graph-level targets for the time being. Therefore, a reasonable thing to do is to extend the torch_geometric.data.Data object with an additional attribute, as proposed in the documentation. Extending the first introductory example from the docs:

>>> import torch
>>> from torch_geometric.data import Data
>>> 
>>> edge_index = torch.tensor([[0, 1, 1, 2],
...                            [1, 0, 2, 1]], dtype=torch.long)
>>> x = torch.tensor([[-1], [0], [1]], dtype=torch.float)
>>> 
>>> data = Data(x=x, edge_index=edge_index)
>>> data
Data(edge_index=[2, 4], x=[3, 1])
>>>
>>> data.graph_y = torch.tensor([42]) # adding a graph-level target
>>> data
Data(edge_index=[2, 4], graph_y=[1], x=[3, 1])
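To make the proposed mapping concrete, here is a minimal, dependency-free sketch of how the fields could be laid out before wrapping them in tensors. The `Node`, `Edge` and `Graph` classes are hypothetical stand-ins for the ProgramGraph protobuf messages, the `feature` field and the `to_data_fields` helper are invented for illustration, and the integer values used for CONTROL / DATA / CALL are an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical stand-ins for the ProgramGraph messages; only the fields
# used below are modelled.
@dataclass
class Edge:
    source: int
    target: int
    flow: int  # assumed encoding: 0 = CONTROL, 1 = DATA, 2 = CALL


@dataclass
class Node:
    feature: float  # placeholder for a single node feature


@dataclass
class Graph:
    node: List[Node] = field(default_factory=list)
    edge: List[Edge] = field(default_factory=list)


def to_data_fields(graph: Graph, graph_y: int) -> Dict[str, list]:
    """Return the fields of a homogeneous PyG-style graph as nested lists."""
    return {
        # One row per node, mirroring `x` with shape [num_nodes, 1].
        "x": [[n.feature] for n in graph.node],
        # COO layout: first row holds sources, second row holds targets.
        "edge_index": [
            [e.source for e in graph.edge],
            [e.target for e in graph.edge],
        ],
        # One feature per edge: its flow type as an integer.
        "edge_attr": [[e.flow] for e in graph.edge],
        # Graph-level target, kept as an extra attribute.
        "graph_y": [graph_y],
    }


example = Graph(
    node=[Node(-1.0), Node(0.0), Node(1.0)],
    edge=[Edge(0, 1, 0), Edge(1, 2, 1), Edge(2, 0, 2)],
)
fields = to_data_fields(example, graph_y=42)
```

Each list could then be passed through torch.tensor and handed to Data(x=..., edge_index=..., edge_attr=...), with graph_y attached as in the example above.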

I believe I am not forgetting anything (feel free to remind me if I am!).

If you don't have something like that in the works and you are interested, I would love to work on it and send a PR eventually. I intend to write such a tool anyway (i.e. Program Graph -> PyG Graph), so I would love to contribute it to the project as well.


ChrisCummins · 2021-07-23T19:03:20Z

Hi @zehanort, a PyTorch Geometric converter would be great! I would very happily review a patch for that, thanks a lot :)

Just thinking ahead - I'm a little wary of adding large dependencies like pytorch-geometric. Perhaps stick the converter in its own module like programl.torch_geometric_converter.to_pytorch_geometric() to simplify things? I'm thinking of doing something like that for the to_dgl() converter as that pulls in a lot of extras.
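One common way to keep a heavy dependency optional is to import it only when the converter is actually called. The sketch below illustrates that pattern; the helper name `require_optional` and the install hint are hypothetical and not part of ProGraML.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency, or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is required for this converter. "
            f"Install it with: {install_hint}"
        ) from err


# A module that is always available imports normally.
json_module = require_optional("json", "n/a (standard library)")

# A missing module produces the friendlier error instead.
try:
    require_optional("surely_not_an_installed_module", "pip install <package>")
    missing_error = None
except ImportError as err:
    missing_error = str(err)
```

A converter living in its own module could call such a helper at the top of the function body, so that importing programl itself never pulls in the extra package.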

CC'ing @Zacharias030 as I believe he has some experience working with ProGraML using pytorch geometric

Cheers,
Chris

Zacharias030 · 2021-07-24T07:03:08Z

Hi @zehanort, I think this is a great pitch and I would welcome such an addition to the codebase very much!
Especially in light of the fact that #107 is incomplete, it would be great to interface to pytorch geometric such that training a range of models becomes very easy!

Note that in #107 we were also willing to introduce a dependency on pytorch geometric's Data and Batch classes.

igabirondo16 · 2024-05-16T08:09:16Z

Hi @ChrisCummins, @Zacharias030 and @zehanort,

For my Master's thesis I have been working with ProGraML and Pytorch-Geometric, and I have an implementation of the to_pyg() method that you are mentioning.

My approach has been to use the torch_geometric.data.HeteroData data structure, which provides more flexibility for heterogeneous graphs than torch_geometric.data.Data.

The method I have been using takes as input a ProgramGraph and optionally the dictionary of the ProGraML vocabulary and makes the following conversion:

import torch
from typing import Dict, Optional

from programl.proto import ProgramGraph
from torch_geometric.data import HeteroData


def to_pyg(graph: ProgramGraph, vocabulary: Optional[Dict[str, int]] = None) -> HeteroData:
    # 4 lists, one per edge type (control, data, call and type edges),
    # indexed by the integer value of edge.flow
    adjacencies = [[], [], [], []]
    edge_positions = [[], [], [], []]

    # Create the adjacency lists
    for edge in graph.edge:
        adjacencies[edge.flow].append([edge.source, edge.target])
        edge_positions[edge.flow].append(edge.position)

    node_text = [node.text for node in graph.node]

    # Map each node text to its vocabulary index; texts missing from the
    # vocabulary share the out-of-vocabulary index len(vocabulary)
    vocab_ids = None
    if vocabulary is not None:
        vocab_ids = torch.tensor(
            [vocabulary.get(node.text, len(vocabulary)) for node in graph.node],
            dtype=torch.long,
        )

    # Pass from list to tensor (integer dtype, even when a list is empty)
    adjacencies = [torch.tensor(adj, dtype=torch.long) for adj in adjacencies]
    edge_positions = [torch.tensor(pos, dtype=torch.long) for pos in edge_positions]

    # Create the graph structure
    hetero_graph = HeteroData()

    # Text and (if a vocabulary was given) vocabulary index of each node
    hetero_graph['nodes']['text'] = node_text
    if vocab_ids is not None:
        hetero_graph['nodes'].x = vocab_ids

    # Add the adjacency lists and edge positions, one relation per edge type
    for flow, name in enumerate(('control', 'data', 'call', 'type')):
        relation = hetero_graph['nodes', name, 'nodes']
        # edge_index must have shape [2, num_edges], also when there are no edges
        relation.edge_index = (
            adjacencies[flow].t().contiguous()
            if adjacencies[flow].nelement() > 0
            else torch.empty((2, 0), dtype=torch.long)
        )
        relation.edge_attr = edge_positions[flow]

    return hetero_graph

It first gathers the adjacency lists of the graph, the position attribute of the edges and the text of the nodes. If the vocabulary is given, it converts the text tokens to their respective vocabulary indices. After that, the lists are transformed into tensors and stored in their respective attributes. As you can see, using the HeteroData class provides more flexibility than Data, as it allows adding as many different types of nodes and edges as required.
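The grouping and lookup steps described above can be tried without torch or programl installed. The following is a rough sketch on plain tuples and lists; the helper names `group_edges_by_flow` and `encode_nodes`, the `(source, target, flow, position)` tuple layout and the example vocabulary are all made up for illustration.

```python
from typing import Dict, List, Tuple

# Relation names in the order of the assumed edge.flow integer values.
FLOW_NAMES = ("control", "data", "call", "type")


def group_edges_by_flow(
    edges: List[Tuple[int, int, int, int]]
) -> Dict[str, Dict[str, list]]:
    """Split (source, target, flow, position) tuples into per-type COO lists."""
    grouped = {
        name: {"edge_index": [[], []], "edge_attr": []} for name in FLOW_NAMES
    }
    for source, target, flow, position in edges:
        entry = grouped[FLOW_NAMES[flow]]
        entry["edge_index"][0].append(source)  # row 0: source nodes
        entry["edge_index"][1].append(target)  # row 1: target nodes
        entry["edge_attr"].append(position)
    return grouped


def encode_nodes(texts: List[str], vocabulary: Dict[str, int]) -> List[int]:
    """Map node texts to vocabulary indices, with one shared OOV index."""
    out_of_vocab = len(vocabulary)
    return [vocabulary.get(text, out_of_vocab) for text in texts]


edges = [(0, 1, 0, 0), (1, 2, 1, 0), (1, 3, 1, 1)]
grouped = group_edges_by_flow(edges)
node_ids = encode_nodes(["add", "unknown_token"], {"add": 0, "ret": 1})
```

Building each edge list directly in the two-row layout means a relation with no edges is simply two empty rows, so the empty case needs no special handling before conversion to a tensor.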

I will create a pull request in the coming days so that you can do further testing.

ChrisCummins · 2024-05-16T17:30:32Z

That's great, thank you @igabirondo16! Looking forward to your PR.

zehanort added the Enhancement New feature or request label Jul 23, 2021

igabirondo16 mentioned this issue May 17, 2024

Feature/pytorch geometric #216

Merged
