Humans are an inherently social species, with multiple focal brain regions sensitive to various visual social cues such as faces, bodies, and biological motion. More recently, research has begun to investigate how the brain responds to more complex, naturalistic social scenes, identifying a region in the posterior superior temporal sulcus (SI-pSTS; i.e., social interaction pSTS), amongst others, as an important region for processing social interaction. This research, however, has presented images or videos, and thus the contribution of motion to social interaction perception in these brain regions is not yet understood. In the current study, 22 participants viewed videos, image sequences, scrambled image sequences and static images of either social interactions or non-social independent actions. Combining univariate and multivariate analyses, we confirm that bilateral SI-pSTS plays a central role in dynamic social interaction perception but is much less involved when 'interactiveness' is conveyed solely with static cues. Regions in the social brain, including SI-pSTS and extrastriate body area (EBA), showed sensitivity to both motion and interactive content. While SI-pSTS is somewhat more tuned to video interactions than is EBA, both bilateral SI-pSTS and EBA showed a greater response to social interactions compared to non-interactions and both regions responded more strongly to videos than static images. Indeed, both regions showed higher responses to interactions than independent actions in videos and intact sequences, but not in other conditions. Exploratory multivariate regression analyses suggest that selectivity for simple visual motion does not in itself drive interactive sensitivity in either SI-pSTS or EBA. Rather, selectivity for interactions expressed in point-light animations, and selectivity for static images of bodies, make positive and independent contributions to this effect across the LOTC region. Our results strongly suggest that EBA and SI-pSTS work together during dynamic interaction perception, at least when interactive information is conveyed primarily via body information. As such, our results are also in line with proposals of a third visual stream supporting dynamic social scene perception.