Learning to Understand Video



Multimodal Decomposable Models for Human Pose Estimation

This page provides code for the paper

Multimodal Decomposable Models for Human Pose Estimation
Ben Sapp and Ben Taskar, CVPR 2013

The code includes an end-to-end implementation of the MODEC model, trained for upper-body human pose estimation, from pixels to joints.



Please cite as


  title={Multimodal Decomposable Models for Human Pose Estimation},
  author={Sapp, Benjamin and Taskar, Ben},
  booktitle={In Proc. CVPR},



  • Run demo_MODEC.m to load an example image, detect a torso, predict joint locations, and display the output. The result should match the example output shown.
  • Run eval_dataset.m to reproduce the curves in Figure 4 of the paper (.fig version here). You will need to download the FLIC dataset first for this to work.
  • Run train_modec.m to train a new model from images. By default, the script runs a 'speed-run' demonstration on 1% of the data without parallelization, which should take about an hour. To scale up to full-size training, follow the script and parallelize constraint generation. For reference, we used 48 cores across 10 machines, with 64GB of RAM on the 'master' machine. For convenience, we include our compiled LIBLINEAR mexa64 binaries. If they don't work on your system, download LIBLINEAR yourself and swap in your own build.
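
Before training, it may help to check whether the bundled binaries are usable on your machine. The following is a minimal sketch, not part of the released code. It assumes the LIBLINEAR MEX functions are named train and predict, as in LIBLINEAR's standard MATLAB interface, and that the folder containing them is already on the MATLAB path. The variable names are hypothetical.

```matlab
% Sketch: check whether precompiled mexa64 binaries can be loaded here.
% Assumption: the LIBLINEAR MEX functions are called 'train' and 'predict'
% (the names used by LIBLINEAR's standard MATLAB interface).

% mexext reports the MEX extension for this platform; 'mexa64' means 64-bit Linux.
binaries_match_platform = strcmp(mexext, 'mexa64');

mex_names = {'train', 'predict'};
mex_found = false(size(mex_names));
for k = 1:numel(mex_names)
    % exist(name, 'file') returns 3 when the name resolves to a MEX-file.
    mex_found(k) = (exist(mex_names{k}, 'file') == 3);
end

if ~binaries_match_platform
    fprintf('This platform expects .%s files; build LIBLINEAR locally.\n', mexext);
elseif ~all(mex_found)
    fprintf('LIBLINEAR MEX files were not found on the path.\n');
else
    fprintf('Bundled LIBLINEAR binaries look usable.\n');
end
```

If another toolbox also defines a function named train, it may shadow the LIBLINEAR one; running `which train` shows which file MATLAB actually resolves.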