
Different Minds Collaborative Virtual Spring Conference
April 10th, 2024

Trainee Presenters

Please join us for an exciting series of talks featuring the trainees of the Different Minds Collaborative.

Filip Rybansky
Newcastle University
PI: Dr. Quoc Vuong

Semantic consistency in identifying human actions

People quickly recognise human actions carried out in everyday activities. There is evidence that Minimal Recognisable Configurations (MIRCs) contain a combination of spatial and temporal visual features critical for reliable recognition. For complex activities, however, observers may give different descriptions that vary in their semantic similarity (e.g., washing dishes vs cleaning dishes). This makes it difficult to determine the role of MIRCs in action recognition. We therefore measured the semantic consistency of responses to 128 short videos of complex actions from the Epic-Kitchens-100 dataset. These videos were selected because our state-of-the-art computer vision network, MOFO, classified them poorly. In an online experiment, participants viewed each video and identified the action performed by typing a 2-3 word description (capturing action and object). Each video was classified by at least 30 participants (N=76 total). Semantic consistency of the responses was determined using a custom pipeline built around the sentence-BERT language model, which generated embedding vectors representing the semantic properties of each response. We then used adjusted pair-wise cosine similarities between response vectors to compute a ground truth description for each video. This ground truth was defined as the response with the greatest semantic neighbourhood density (e.g., pouring oil, closing shelf). The greater the semantic neighbourhood density of a ground truth candidate, the more semantically consistent the responses to the associated video. We identified 87 videos whose semantic consistency confirmed that they are reliably recognisable. In these videos, the cosine similarity between the ground truth candidate and at least 70% of responses exceeded a threshold of 0.65. We will use a subsample of these videos to investigate the role of MIRCs in human action recognition, for example by gradually degrading the spatial and temporal information in the videos and measuring the impact on recognition. The derived semantic space and MIRCs will be used to revise our computer vision network into a more biologically consistent and better-performing model.
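The consistency criterion described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenters' pipeline. The sketch takes precomputed response embeddings (such as those a sentence-BERT model would produce) as plain lists of floats. The abstract does not specify its "adjusted" similarity, so the sketch uses the mean raw cosine similarity to all other responses as the "semantic neighbourhood density". It also applies the 0.65 threshold and 70% criterion to the remaining responses. All function names and the toy vectors are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def neighbourhood_density(embeddings: list[Vector], i: int) -> float:
    """Mean cosine similarity of response i to every other response.

    Assumption: a plain mean stands in for the unspecified 'adjusted' similarity.
    """
    sims = [cosine(embeddings[i], e) for j, e in enumerate(embeddings) if j != i]
    return sum(sims) / len(sims)


def ground_truth_index(embeddings: list[Vector]) -> int:
    """Index of the response with the greatest neighbourhood density (first wins ties)."""
    return max(range(len(embeddings)), key=lambda i: neighbourhood_density(embeddings, i))


def is_reliably_recognised(embeddings: list[Vector],
                           threshold: float = 0.65,
                           min_share: float = 0.70) -> bool:
    """True if the ground truth exceeds `threshold` similarity with >= `min_share` of other responses."""
    gt = ground_truth_index(embeddings)
    sims = [cosine(embeddings[gt], e) for j, e in enumerate(embeddings) if j != gt]
    share_above = sum(s > threshold for s in sims) / len(sims)
    return share_above >= min_share


# Toy 2-D "embeddings" (hypothetical; real sentence-BERT vectors have hundreds of dimensions).
responses = {
    "pouring oil": [1.0, 0.0],
    "pour oil": [1.0, 0.0],
    "pouring the oil": [1.0, 0.0],
    "pours oil": [1.0, 0.0],
    "closing shelf": [0.0, 1.0],
}
labels = list(responses)
vectors = list(responses.values())
print(labels[ground_truth_index(vectors)])  # pouring oil
print(is_reliably_recognised(vectors))      # True: 3 of 4 other responses exceed 0.65
```

In this toy case, one off-topic response out of four still leaves 75% of responses above the threshold. Two off-topic responses would drop the share below 70% and the video would not count as reliably recognisable.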

Ben Steward
ANU Emotions and Faces Lab
The Australian National University
PI: Dr. Amy Dawel


Meta-analysis of face and visual context interactions in emotion perception

Long-standing theories, such as basic emotions theory, argue that our perception of others’ emotions is driven by facial expressions which signal core emotions (e.g., anger, disgust, happiness). However, there is now compelling evidence that visual details (“visual context”), such as body posture, affect perceived emotions of faces. The literature also shows that faces affect perceived emotions of visual contexts. We used meta-analytic techniques to quantify these effects, and potential moderating factors, for the first time. Data were drawn from 37 studies and analysed using three-level mixed-effects models. We found large effects for visual contexts influencing the perceived emotions of faces, and for faces influencing the perceived emotions of visual contexts. Both effects were larger for incongruent pairs (e.g., sad face on an angry body) than congruent pairs (e.g., sad face on a sad body). Importantly, our analyses highlighted that these effects are moderated by how clearly stimuli signal their intended emotions. Together, we show that the integration of visual signals during emotion perception is affected by several factors, including source (faces or context), congruency, and signal clarity. More sophisticated models are now needed to examine how these factors interact, providing important direction for technologies that seek to simulate human perception.

Brent Pitchford
University of Iceland
PI: Dr. Heida Sigurdardottir

Contributions of Visual and Semantic Information in Object Discrimination

Abstract: Object similarity may not be an abstract construct that can be defined independently of the task context in which it is operationalised. We asked people to assess the similarity of objects by rating their semantic relatedness, overall shape, and internal features. Shape similarity was assessed by rating object silhouettes with no internal features. Featural similarity was assessed by rating grayscale objects whose global shape was distorted. Object pairs differed either at the basic level (e.g., hairbrush, pipe) or at the subordinate level (e.g., two different bowties). Semantic similarity of objects differing at the basic level was measured by rating the similarity in meaning of word pairs. We then assessed to what degree semantics, shape, and features predicted three measures:

a) explicit judgments of the visual similarity of objects;
b) implicit measures of object similarity, as assessed by object foraging;
c) similarity in an object space derived from the activations of a deep layer of a convolutional neural network trained on object classification.

Explicit judgments of visual similarity were predicted by both features and shape, but not by semantics. Unlike explicit judgments, implicit object similarity depended on whether people searched for target objects among distractors of the same or a different category. If targets and distractors differed at the basic level, both shape and semantic similarity predicted unique variability in foraging not accounted for by features. If objects belonged to the same category, neither shape nor featural similarity predicted unique variability. Previous work has suggested that neural networks are primarily feature-based. Contrary to this, shape uniquely explained variability in object-space distance not accounted for by features when objects differed at the basic level. Different information therefore contributes to people's explicit vs. implicit judgments of object qualities. These judgments can also be distinguished from similarity measures extracted from artificial neural networks trained on object classification.

Paris Ash
Newcastle University
PI: Dr. Quoc Vuong

The development of a validated video database to investigate multi-sensory processing in misophonia

Abstract: Coming soon!
