Retrieving human actions from video databases is an important but challenging task in computer vision. In this work, we develop a framework for robustly recognizing human actions in video sequences. The contribution of the paper is twofold. First, a reliable neural model, the Multi-level Sigmoidal Neural Network (MSNN), is presented as a classifier for action recognition. Second, we show how temporal shape variations can be accurately captured using both temporal self-similarities and fuzzy log-polar histograms. When evaluated on the popular KTH dataset, the method achieves an average recognition rate of 94.3%. This result compares favorably with those published in the literature by other investigators. Furthermore, the approach is amenable to real-time applications owing to its low computational requirements.