Amrut Prabhu edited this page Mar 6, 2020 · 1 revision

Dataset

A recipe video consists of multiple segments, each labeled with one sub-action. A segment consists of multiple frames, all of which share the segment's sub-action label. Each frame is represented by a 400-dimensional I3D feature vector.

Problem Statement

From the project description:

For the submission, you are required to process the test videos and predict a label for each action segment.

segment.txt: We provide the frame locations of each action segment. Each row corresponds to a test video in the order given in 'splits/test.split1.bundle'. For example, the first row includes '30 150 428 575 705'. As we are ignoring the SIL action, the first segment starts at frame 30 and ends at frame 150; similarly, the second segment starts at 151 and ends at 428. There are four segments in this example, and you are asked to predict a class label for each segment as follows: 4, 1, 2, 3. You are asked to create a CSV file and fill it with your prediction results in the order you make the predictions. You will submit this file. See the 'Evaluation' tab for the submission format.
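
The boundary convention in the example row can be sketched in a few lines of Python. The function name parse_segment_row is hypothetical. The third and fourth segments are extrapolated from the pattern the description gives for the first two.

```python
def parse_segment_row(row):
    """Turn one row of boundary frames into inclusive (start, end) pairs.

    Assumes the convention from the project description: the first number
    is the start of the first segment, and every later number is the last
    frame of a segment, with the next segment starting one frame after it.
    """
    bounds = [int(token) for token in row.split()]
    segments = []
    start = bounds[0]
    for end in bounds[1:]:
        segments.append((start, end))
        start = end + 1
    return segments


print(parse_segment_row("30 150 428 575 705"))
# [(30, 150), (151, 428), (429, 575), (576, 705)]
```

A row with K numbers therefore yields K - 1 segments, which matches the four predictions requested for the five numbers in the example.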

Tasks

Frame to Label (Harder Problem)

Idea: First, use the whole video to predict which recipe it shows. Then restrict the candidate sub-actions to those that can occur in that recipe.

Caveats:

  1. The set of candidate sub-actions for a recipe should be as inclusive as possible, because a sub-action can appear in multiple recipes.
  2. The model architecture for the next step is still undecided. Is a multi-class classifier the right choice?
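
One reading of caveat 1 is that the candidate set for a recipe is the union of all sub-actions seen in that recipe's training videos. Below is a minimal sketch under that assumption. The function name, recipe names, and labels are made up for illustration and do not come from the dataset.

```python
from collections import defaultdict


def build_recipe_actions(videos):
    """Map each recipe to the union of sub-action labels seen in its videos.

    `videos` is an iterable of (recipe, frame_labels) pairs. This input
    layout is an assumption, not the project's actual data format.
    """
    recipe_actions = defaultdict(set)
    for recipe, frame_labels in videos:
        recipe_actions[recipe].update(frame_labels)
    return dict(recipe_actions)


# Hypothetical training data: the same sub-action occurs in two recipes.
training = [
    ("coffee", ["take_cup", "take_cup", "pour_coffee"]),
    ("coffee", ["take_cup", "pour_coffee", "add_milk"]),
    ("tea", ["take_cup", "add_teabag"]),
]
recipe_actions = build_recipe_actions(training)
print(sorted(recipe_actions["coffee"]))  # ['add_milk', 'pour_coffee', 'take_cup']
print(sorted(recipe_actions["tea"]))     # ['add_teabag', 'take_cup']
```

Once a recipe is predicted for a video, a second-stage classifier could be restricted to the labels in that recipe's set.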

Segment to Label (Easier Problem)

  1. Get the segments of the training set (like segments.txt)
    • Derive each file's segments from its groundTruth.txt by taking the start and end indices of each contiguous action block
    • Each line of the resulting segments file corresponds to one entire video
    • SIL blocks must be ignored
  2. Model architecture
    • Input: N × 400, where N is the number of frames in the segment. N varies between segments, so the model architecture must handle variable-length input.
    • Output: Sub-action label
    • DNN
    • RNN
    • LSTM/GRU
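
The two steps above can be sketched as follows. The sketch assumes groundTruth.txt has already been read into a list with one label per frame, and it uses 0-based inclusive frame indices. Both are assumptions that may not match the real files. Mean pooling is shown only as one simple way to obtain a fixed-size input for a DNN. The page does not commit to it, and an RNN or LSTM/GRU could consume the variable-length sequence directly. The function names are hypothetical.

```python
from itertools import groupby


def segments_from_labels(frame_labels, ignore="SIL"):
    """Return (start, end, label) for each contiguous block of equal labels.

    Indices are 0-based and inclusive. Blocks whose label equals `ignore`
    are skipped but still advance the frame index.
    """
    segments = []
    index = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        if label != ignore:
            segments.append((index, index + length - 1, label))
        index += length
    return segments


def mean_pool(frames):
    """Average an N x D list of feature vectors into a single D-vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


# Hypothetical per-frame labels for one short video.
labels = ["SIL", "SIL", "pour", "pour", "pour", "stir", "stir", "SIL"]
print(segments_from_labels(labels))
# [(2, 4, 'pour'), (5, 6, 'stir')]

# Two frames with 2-dimensional features stand in for N x 400 here.
print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))
# [2.0, 3.0]
```

Mean pooling discards the order of frames within a segment. If temporal order matters for a sub-action, a recurrent model over the unpooled frames is the more natural fit.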