To achieve human-like thinking and performance, machine learning systems need to understand the autonomous agents in their environment. Inspired by work in developmental psychology, we present a set of challenges that test a machine's ability to make theory-of-mind inferences. As in developmental studies, we use a violation-of-expectation paradigm, in which the machine must predict the plausibility of a video sequence. We also present a few baselines to illustrate the difficulties a model may face in achieving a good score on this benchmark.