Query & annotation representation

Name

Each feature file is named [File ID].pkl, where [File ID] is the image's file ID in Flickr30K Entities.
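A minimal sketch of loading one feature file in Python, assuming the files are standard pickles. The helper `load_feature_file` and the directory argument are hypothetical. Passing `encoding="latin1"` is only a precaution in case the files were written with Python 2 and contain NumPy arrays; it does not change how Python 3 pickles are read.

```python
import os
import pickle


def load_feature_file(feature_dir, file_id):
    """Load the feature dictionary stored as <feature_dir>/<file_id>.pkl.

    feature_dir is a hypothetical local directory holding the .pkl files;
    file_id is the image's Flickr30K Entities file ID.
    """
    path = os.path.join(feature_dir, f"{file_id}.pkl")
    with open(path, "rb") as f:
        # latin1 decoding lets Python 3 read Python 2 pickles of NumPy arrays.
        return pickle.load(f, encoding="latin1")
```

For example, `load_feature_file("features", "<File ID>")` would return the dictionary whose keys are described in the next section.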

Feature organization

Each file stores a dictionary containing the query features and annotations for one image. The name and content of each key are listed below:
height: Height of the image.
width: Width of the image.
gt_pos_all: A list of length N, where N is the number of queries for the image. The i-th element is itself a list holding the IDs of the i-th query's positive proposals among the 100 proposals generated by Selective Search or Edge Box. A proposal is positive if its Intersection over Union (IoU) with the ground-truth bounding box of the i-th query is larger than 0.5.
pos_id: An N-dimensional vector. The i-th element is the ID of the proposal that overlaps most (has the highest IoU) with the ground-truth bounding box of the i-th query. If that proposal's IoU is less than 0.5, the ID is set to -1.
ss_box: A 100 x 4 matrix. Each row holds the coordinates of one proposal generated by Selective Search (for the Referit Game dataset, the proposals are generated by Edge Box), in the format [xmin, ymin, xmax, ymax].
sens: A list of length N. The i-th element is itself a list holding the word ID sequence of the i-th query.
gt_box: An N x 4 matrix. The i-th row is the ground-truth bounding box annotation for the i-th query, in the format [xmin, ymin, xmax, ymax].
q_dist_soft_coco: Soft KBP values computed with the knowledge of a VGG network pre-trained on MSCOCO. It is an N x 100 matrix whose i-th row holds the i-th query's similarity scores against the predicted category of each proposal. Each score is the cosine similarity between the word2vec encodings of the two words; negative scores are set to 0. More details can be found in the paper.
q_dist_hard_coco: Hard KBP values computed with the knowledge of a VGG network pre-trained on MSCOCO.
q_dist_soft_pas: Soft KBP values computed with the knowledge of a VGG network pre-trained on PASCAL VOC 2012.
q_dist_hard_pas: Hard KBP values computed with the knowledge of a VGG network pre-trained on PASCAL VOC 2012.
bbx_vlv: A 100 x 4 matrix. The i-th row is the regression ground-truth value for the i-th proposal. More details can be found in Equation 5 of the paper.
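As a quick consistency check, the shapes stated above can be verified on a loaded dictionary. This is a minimal sketch; the helper name `check_feature_dict` is hypothetical, and it only checks keys whose shapes this README states (the hard KBP and PASCAL VOC keys are left out because their shapes are not given).

```python
import numpy as np


def check_feature_dict(feats, num_proposals=100):
    """Check a loaded feature dictionary against the documented shapes.

    Returns N, the number of queries, inferred from the length of "sens".
    Raises ValueError on the first mismatch.
    """
    n = len(feats["sens"])  # one word-ID sequence per query
    if len(feats["gt_pos_all"]) != n:
        raise ValueError("gt_pos_all should have one entry per query")
    expected = {
        "pos_id": (n,),
        "gt_box": (n, 4),
        "ss_box": (num_proposals, 4),
        "q_dist_soft_coco": (n, num_proposals),
        "bbx_vlv": (num_proposals, 4),
    }
    for key, shape in expected.items():
        got = np.shape(feats[key])
        if got != shape:
            raise ValueError(f"{key}: expected shape {shape}, got {got}")
    return n
```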
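The IoU-based labels in gt_pos_all and pos_id can be reproduced from ss_box and gt_box roughly as follows. This is a sketch under assumptions, not the authors' code: the helper names are hypothetical, proposal IDs are assumed to be 0-based row indices into ss_box, and boxes are treated as continuous coordinates (no +1 pixel convention). The thresholds follow the text: strictly larger than 0.5 for gt_pos_all, and -1 in pos_id when the best IoU is below 0.5.

```python
import numpy as np


def iou(boxes, gt):
    """IoU between each row of `boxes` (K x 4) and a single box `gt` (4,).

    All boxes are [xmin, ymin, xmax, ymax] in continuous coordinates.
    """
    boxes = np.asarray(boxes, dtype=float)
    gt = np.asarray(gt, dtype=float)
    ix1 = np.maximum(boxes[:, 0], gt[0])
    iy1 = np.maximum(boxes[:, 1], gt[1])
    ix2 = np.minimum(boxes[:, 2], gt[2])
    iy2 = np.minimum(boxes[:, 3], gt[3])
    # Intersection area is zero when the boxes do not overlap.
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_boxes + area_gt - inter)


def positive_labels(ss_box, gt_box, thresh=0.5):
    """Build gt_pos_all-style and pos_id-style labels for N queries."""
    gt_pos_all, pos_id = [], []
    for gt in np.asarray(gt_box, dtype=float):
        overlaps = iou(ss_box, gt)
        # Every proposal whose IoU is strictly above the threshold.
        gt_pos_all.append(np.flatnonzero(overlaps > thresh).tolist())
        # Best-overlapping proposal, or -1 if its IoU is below the threshold.
        best = int(np.argmax(overlaps))
        pos_id.append(best if overlaps[best] >= thresh else -1)
    return gt_pos_all, np.array(pos_id)
```

For a query whose ground-truth box overlaps no proposal, this sketch yields an empty positive list and a pos_id of -1.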
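The soft KBP score described for q_dist_soft_coco can be sketched as a cosine similarity between word2vec vectors, with negative values clipped to 0. This is an illustration under stated assumptions rather than the authors' procedure: the function name `soft_kbp` is hypothetical, and how each query is reduced to a single word vector and how each proposal's predicted category label is embedded are left to the caller; see the paper for the exact definition.

```python
import numpy as np


def soft_kbp(query_vecs, category_vecs):
    """Clipped cosine similarity between query and category word vectors.

    query_vecs:    N x d array, one word2vec vector per query (assumed input).
    category_vecs: P x d array, one word2vec vector for the predicted
                   category of each proposal (P = 100 in these files).
    Returns an N x P matrix with negative similarities set to 0.
    """
    q = np.asarray(query_vecs, dtype=float)
    c = np.asarray(category_vecs, dtype=float)
    # L2-normalize rows so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    return np.maximum(q @ c.T, 0.0)
```

Clipping at 0 means a category that is semantically opposed to the query contributes the same score as an unrelated one.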