68
Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent
Hang Xu, Kai Li, Bingyun Liu, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
12 min. talk | August 6th at 15:00 | Session: ML: Machine Learning (2/6)
Counterfactual regret minimization (CFR) is a family of algorithms for effectively solving imperfect-information games. It decomposes the total regret into counterfactual regrets and minimizes them with local regret minimization algorithms such as Regret Matching (RM) or RM+. Recent research establishes a connection between Online Mirror Descent (OMD) and RM+, paving the way for the optimistic variant PRM+ and its extension PCFR+. However, PCFR+ assigns uniform weights to each iteration when determining regrets, leading to substantial regrets when facing dominated actions. This work explores minimizing weighted counterfactual regret with optimistic OMD, resulting in a novel CFR variant, PDCFR+. It integrates PCFR+ and Discounted CFR (DCFR) in a principled manner, swiftly mitigating the negative effects of dominated actions and consistently leveraging predictions to accelerate convergence. Theoretical analyses prove that PDCFR+ converges to a Nash equilibrium, in particular under distinct weighting schemes for regrets and average strategies. Experimental results demonstrate PDCFR+’s fast convergence in common imperfect-information games. The code is available at https://github.com/rpSebastian/PDCFRPlus.
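As a rough illustration of the ingredients PDCFR+ combines, the sketch below pairs a PRM+-style predictive update with DCFR-style discounting at a single decision point. Using the last instantaneous regret as the prediction and the discount exponent `alpha` are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def predictive_discounted_rm_step(Q, last_regret, utilities, t, alpha=1.5):
    # Predictive step: bias the strategy with the last observed regret
    # (the usual PRM+ prediction; the paper's exact choice may differ).
    hinted = np.maximum(Q + last_regret, 0.0)
    s = hinted.sum()
    strategy = hinted / s if s > 0 else np.full(len(Q), 1.0 / len(Q))
    # Instantaneous regret of each action against the realized value.
    regret = utilities - strategy @ utilities
    Q = np.maximum(Q + regret, 0.0)
    # DCFR-style discount: down-weighting earlier iterations quickly
    # washes out regret accumulated on dominated actions.
    Q *= t**alpha / (t**alpha + 1.0)
    return Q, regret, strategy
```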
List of keywords
Machine Learning -> ML: Game Theory
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
70
Structure-Preserving Physics-Informed Neural Networks with Energy or Lyapunov Structure
Haoyu Chu, Yuto Miyatake, Wenjun Cui, Shikui Wei, Daisuke Furihata
6 min. talk | August 7th at 11:30 | Session: ML: Deep learning architectures
Recently, there has been growing interest in using physics-informed neural networks (PINNs) to solve differential equations. However, the preservation of structure, such as energy and stability, in a suitable manner has yet to be established. This limitation could be a potential reason why the learning process for PINNs is not always efficient and the numerical results may suggest nonphysical behavior. Moreover, there has been little research on their application to downstream tasks. To address these issues, we propose structure-preserving PINNs to improve their performance and broaden their applications to downstream tasks. Firstly, by leveraging prior knowledge about the physical system, a structure-preserving loss function is designed to assist the PINN in learning the underlying structure. Secondly, a framework that utilizes structure-preserving PINNs for robust image recognition is proposed. Here, preserving the Lyapunov structure of the underlying system ensures its stability. Experimental results demonstrate that the proposed method improves the numerical accuracy of PINNs for partial differential equations (PDEs). Furthermore, the robustness of the model against adversarial perturbations in image data is enhanced.
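For concreteness, a structure-preserving penalty of the kind described might look like the following for a conservative system; `energy_fn` and the penalty form are hypothetical illustrations, not the paper's loss.

```python
import torch

def energy_preserving_loss(u_pred, t, energy_fn):
    # For a conservative system, the energy along the predicted trajectory
    # should stay constant, so we penalize deviation from its initial value.
    E = energy_fn(u_pred, t)          # energy at each time step, shape (T,)
    return ((E - E[0]) ** 2).mean()
```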
List of keywords
Machine Learning -> ML: Deep learning architectures
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Supervised Learning
97
Automatic De-Biased Temporal-Relational Modeling for Stock Investment Recommendation
Weijun Chen, Shun Li, Xipu Yu, Heyuan Wang, Wei Chen, Tengjiao Wang
6 min. talk | August 7th at 11:30 | Session: DM: Applications
Stock investment recommendation is crucial for guiding investment decisions and managing portfolios. Recent studies have demonstrated the potential of temporal-relational models (TRM) to yield excess investment returns. However, in the complicated finance ecosystem, current TRMs suffer from both the intrinsic temporal bias arising from the low signal-to-noise ratio (SNR) and the relational bias caused by utilizing inappropriate relational topologies and propagation mechanisms. Moreover, the distribution shifts behind macro-market scenarios invalidate the underlying i.i.d. assumption and limit the generalization ability of TRMs. In this paper, we pioneer the study of how the above issues impair the effective learning of temporal-relational patterns and propose an Automatic De-Biased Temporal-Relational Model (ADB-TRM) for stock recommendation. Specifically, ADB-TRM consists of three main components: (i) a meta-learned architecture that forms a dual-stage training process, with the inner part ameliorating temporal-relational bias and the outer meta-learner counteracting distribution shifts; (ii) automatic adversarial sample generation, which adaptively guides the model to alleviate bias and enhance its profiling ability through adversarial training; and (iii) global-local interaction, which seeks relatively invariant stock embeddings from local and global distribution perspectives to mitigate distribution shifts. Experiments on three datasets from distinct stock markets show that ADB-TRM outperforms state-of-the-art methods by 28.41% and 9.53% in terms of cumulative and risk-adjusted returns, respectively.
List of keywords
Data Mining -> DM: Applications
Data Mining -> DM: Mining spatial and/or temporal data
Machine Learning -> ML: Time series and data streams
Multidisciplinary Topics and Applications -> MTA: Finance
103
TaD: A Plug-and-Play Task-Aware Decoding Method to Better Adapt LLMs on Downstream Tasks
Xinhao Xu, Hui Chen, Zijia Lin, Jungong Han, Lixing Gong, Guoxin Wang, Yongjun Bao, Guiguang Ding
6 min. talk | August 7th at 15:00 | Session: NLP: Natural Language Processing (2/3)
Fine-tuning pre-trained models on downstream tasks is a common practice in leveraging large language models (LLMs) today. A critical issue is how to better adapt pre-trained models to downstream tasks, thereby enhancing their performance. This paper introduces Task-aware Decoding (TaD), a plug-and-play method that exploits the difference in probability distributions before and after fine-tuning to boost the performance of LLMs on downstream tasks. TaD posits that the difference between the pre-finetuning probability distribution and the post-finetuning one represents the direction from common knowledge towards task-specific downstream knowledge. Aligning the final output probability distribution with that direction can likely yield superior downstream task performance compared to the original fine-tuned model. Experiments on various datasets across four task categories demonstrate TaD’s effectiveness on different LLMs, i.e., GPT, BLOOM, and LLaMA, with different fine-tuning methods. Moreover, further experiments reveal that TaD delivers larger gains in data-scarce scenarios.
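A minimal sketch of the general idea, assuming a contrastive-decoding-style extrapolation; the scalar `beta` and the exact combination rule are our assumptions, not the paper's formula.

```python
import torch

def task_aware_logits(logits_ft, logits_pre, beta=0.5):
    # Work in log-probability space so the shift is distribution-aware.
    logp_ft = torch.log_softmax(logits_ft, dim=-1)
    logp_pre = torch.log_softmax(logits_pre, dim=-1)
    # Extrapolate along the "common -> task-specific" knowledge direction;
    # renormalize (softmax) before sampling from the result.
    return logp_ft + beta * (logp_ft - logp_pre)
```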
List of keywords
Natural Language Processing -> NLP: Language generation
Natural Language Processing -> NLP: Applications
Natural Language Processing -> NLP: Language models
128
Physics-Informed Trajectory Prediction for Autonomous Driving under Missing Observation
Haicheng Liao, Chengyue Wang, Zhenning Li, Yongkang Li, Bonan Wang, Guofa Li, Chengzhong Xu
6 min. talk | August 6th at 11:30 | Session: ROB: Robotics (1/2)
This paper introduces a novel trajectory prediction approach for autonomous vehicles (AVs) that adeptly addresses the challenges of missing observations and the need for adherence to physical laws in real-world driving environments. The approach is a hierarchical two-stage prediction model. In the first stage, we propose the Wavelet Reconstruction Network, an innovative tool crafted for reconstructing missing observations, which can optionally be integrated with state-of-the-art models to enhance their robustness. The second stage features the Wave Fusion Encoder, a quantum mechanics-inspired innovation for sophisticated vehicle interaction modeling. By incorporating the Kinematic Bicycle Model, we ensure that our predictions align with realistic vehicular kinematics. Complementing our methodological advancements, we introduce MoCAD-missing, a comprehensive real-world traffic dataset, alongside enhanced versions of the NGSIM and HighD datasets, designed to facilitate rigorous testing in environments with missing observations. Extensive evaluations demonstrate that our approach markedly outperforms existing methods, achieving high accuracy even in scenarios with up to 75% missing observations.
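The Kinematic Bicycle Model the abstract refers to is a standard physical prior; a minimal Euler-integration sketch follows (the wheelbase parameters and integration scheme are illustrative choices, not the paper's).

```python
import math

def kinematic_bicycle_step(x, y, psi, v, a, delta, dt, lf=1.2, lr=1.6):
    """One Euler step of the standard kinematic bicycle model.
    (x, y): position, psi: heading, v: speed,
    a: acceleration command, delta: front steering angle."""
    beta = math.atan(lr / (lf + lr) * math.tan(delta))  # slip angle at the CoG
    x += v * math.cos(psi + beta) * dt
    y += v * math.sin(psi + beta) * dt
    psi += v / lr * math.sin(beta) * dt
    v += a * dt
    return x, y, psi, v
```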
List of keywords
Robotics -> ROB: Other
Planning and Scheduling -> PS: Planning under uncertainty
129
MFTraj: Map-Free, Behavior-Driven Trajectory Prediction for Autonomous Driving
Haicheng Liao, Zhenning Li, Chengyue Wang, Huanming Shen, Dongping Liao, Bonan Wang, Guofa Li, Chengzhong Xu
6 min. talk | August 6th at 15:00 | Session: MTA: Transportation
This paper introduces a trajectory prediction model tailored for autonomous driving, focusing on capturing complex interactions in dynamic traffic scenarios without reliance on high-definition maps. The model, termed MFTraj, harnesses historical trajectory data combined with a novel dynamic geometric graph-based behavior-aware module. At its core, an adaptive structure-aware interactive graph convolutional network captures both positional and behavioral features of road users, preserving spatial-temporal intricacies. Enhanced by a linear attention mechanism, the model achieves computational efficiency and reduced parameter overhead. Evaluations on the Argoverse, NGSIM, HighD, and MoCAD datasets underscore MFTraj’s robustness and adaptability, outperforming numerous benchmarks even in data-challenged scenarios without the need for additional information such as HD maps or vectorized maps. Importantly, it maintains competitive performance even in scenarios with substantial missing data (12.5%-50%), outperforming most existing state-of-the-art models. The results and methodology suggest a significant advancement in autonomous driving trajectory prediction, paving the way for safer and more efficient autonomous systems.
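The linear attention mechanism invoked for efficiency is a generic technique; a common kernelized formulation (with an ELU+1 feature map, our assumption for illustration) reduces the cost from O(n²d) to O(nd²):

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized linear attention sketch.
    q, k: (batch, n, d) queries/keys; v: (batch, n, e) values."""
    q = F.elu(q) + 1  # positive feature map phi(.)
    k = F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", k, v)              # sum_n phi(k_n) v_n^T
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)     # normalized output
```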
List of keywords
Multidisciplinary Topics and Applications -> MTA: Transportation
Agent-based and Multi-agent Systems -> MAS: Applications
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
Robotics -> ROB: Other
130
A Cognitive-Driven Trajectory Prediction Model for Autonomous Driving in Mixed Autonomy Environments
Haicheng Liao, Zhenning Li, Chengyue Wang, Bonan Wang, Hanlin Kong, Yanchen Guan, Guofa Li, Zhiyong Cui
6 min. talk | August 6th at 15:00 | Session: MTA: Transportation
As autonomous driving technology progresses, the need for precise trajectory prediction models becomes paramount. This paper introduces an innovative model that infuses cognitive insights into trajectory prediction, focusing on perceived safety and dynamic decision-making. Distinct from traditional approaches, our model excels in analyzing interactions and behavior patterns in mixed autonomy traffic scenarios. As part of our contributions, we introduce the Macao Connected Autonomous Driving (MoCAD) dataset, valuable for its complex urban driving scenarios. Our model represents a significant leap forward, achieving marked performance improvements on several key datasets. Specifically, it surpasses existing benchmarks with gains of 16.2% on the Next Generation Simulation (NGSIM), 27.4% on the Highway Drone (HighD), and 19.8% on the MoCAD dataset. Our proposed model shows exceptional proficiency in handling corner cases, essential for real-world applications. Moreover, its robustness is evident in scenarios with missing or limited data, outperforming most of the state-of-the-art baselines. This adaptability and resilience position our model as a viable tool for real-world autonomous driving systems, heralding a new standard in vehicle trajectory prediction for enhanced safety and efficiency.
List of keywords
Multidisciplinary Topics and Applications -> MTA: Transportation
Agent-based and Multi-agent Systems -> MAS: Human-agent interaction
Planning and Scheduling -> PS: Applications
Robotics -> ROB: Motion and path planning
139
Hyperparameter Optimization Can Even Be Harmful in Off-Policy Learning and How to Deal with It
Yuta Saito, Masahiro Nomura
12 min. talk | August 7th at 15:00 | Session: ML: Machine Learning (5/6)
There has been growing interest in off-policy evaluation in application areas such as recommender systems and personalized medicine. We have so far seen significant progress in developing estimators aimed at accurately estimating the effectiveness of counterfactual policies from biased logged data. However, in many cases those estimators are used not only to evaluate the value of decision-making policies but also to search for the best hyperparameters from a large candidate space. This work explores the latter hyperparameter optimization (HPO) task for off-policy learning. We empirically show that naively using an unbiased estimator of the generalization performance as a surrogate objective in HPO can cause an unexpected failure: it merely pursues hyperparameters whose generalization performance is greatly overestimated. We then propose simple and computationally efficient corrections to the typical HPO procedure to deal with these issues simultaneously. Empirical investigations demonstrate the effectiveness of our proposed HPO algorithm in situations where the typical procedure fails severely.
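To see the failure mode concretely: a per-candidate unbiased value estimate, such as inverse propensity scoring (IPS), becomes optimistically biased once a maximum is taken over many hyperparameter candidates (the winner's curse). A toy IPS estimator, our illustration rather than the authors' corrected procedure:

```python
import numpy as np

def ips_value(rewards, logging_probs, target_probs):
    """Unbiased IPS estimate of a target policy's value from logged data.
    Unbiased per candidate, but argmax over many noisy estimates favors
    candidates whose value happens to be overestimated."""
    weights = target_probs / logging_probs   # importance weights
    return np.mean(weights * rewards)
```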
List of keywords
Machine Learning -> ML: Causality
Machine Learning -> ML: Hyperparameter optimization
Machine Learning -> ML: Multi-armed bandits
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
158
AutoAgents: A Framework for Automatic Agent Generation
Guangyao Chen, Siwei Dong, Yu Shu, Ge Zhang, Jaward Sesay, Börje Karlsson, Jie Fu, Yemin Shi
6 min. talk | August 9th at 10:00 | Session: MAS: Applications
Large language models (LLMs) have enabled remarkable advances in automated task-solving with multi-agent systems. However, most existing LLM-based multi-agent approaches rely on predefined agents to handle simple tasks, limiting the adaptability of multi-agent collaboration to different scenarios. Therefore, we introduce AutoAgents, an innovative framework that adaptively generates and coordinates multiple specialized agents to build an AI team according to different tasks. Specifically, AutoAgents couples the relationship between tasks and roles by dynamically generating multiple required agents based on task content and planning solutions for the current task based on the generated expert agents. Multiple specialized agents collaborate with each other to efficiently accomplish tasks. Concurrently, an observer role is incorporated into the framework to reflect on the designated plans and agents’ responses and improve upon them. Our experiments on various benchmarks demonstrate that AutoAgents generates more coherent and accurate solutions than the existing multi-agent methods. This underscores the significance of assigning different roles to different tasks and of team cooperation, offering new perspectives for tackling complex tasks. The repository of this project is available at https://github.com/Link-AGI/AutoAgents.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Applications
Natural Language Processing -> NLP: Applications
161
Boosting Model Resilience via Implicit Adversarial Data Augmentation
Xiaoling Zhou, Wei Ye, Zhemg Lee, Rui Xie, Shikun Zhang
12 min. talk | August 6th at 15:00 | Session: ML: Machine Learning (1/6)
Data augmentation plays a pivotal role in enhancing and diversifying training data. Nonetheless, consistently improving model performance in varied learning scenarios, especially those with inherent data biases, remains challenging. To address this, we propose to augment the deep features of samples by incorporating their adversarial and anti-adversarial perturbation distributions, enabling adaptive adjustment in the learning difficulty tailored to each sample’s specific characteristics. We then theoretically reveal that our augmentation process approximates the optimization of a surrogate loss function as the number of augmented copies increases indefinitely. This insight leads us to develop a meta-learning-based framework for optimizing classifiers with this novel loss, introducing the effects of augmentation while bypassing the explicit augmentation process. We conduct extensive experiments across four common biased learning scenarios: long-tail learning, generalized long-tail learning, noisy label learning, and subpopulation shift learning. The empirical results demonstrate that our method consistently achieves state-of-the-art performance, highlighting its broad adaptability.
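A minimal sketch of the explicit version of such feature-level perturbations, which the paper then bypasses via its surrogate loss; the gradient-step form and `epsilon` are assumptions.

```python
import torch

def adversarial_feature_pair(features, loss, epsilon=0.1):
    # `features` must be a graph leaf with requires_grad=True.
    grad = torch.autograd.grad(loss, features, retain_graph=True)[0]
    direction = grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
    adv = features + epsilon * direction    # harder (adversarial) version
    anti = features - epsilon * direction   # easier (anti-adversarial) version
    return adv, anti
```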
List of keywords
Machine Learning -> ML: Classification
Data Mining -> DM: Class imbalance and unequal cost
Machine Learning -> ML: Meta-learning
Machine Learning -> ML: Robustness
168
Negative-Binomial Randomized Gamma Dynamical Systems for Heterogeneous Overdispersed Count Time Sequences
Rui Huang, Sikun Yang, Heinz Koeppl
6 min. talk | August 8th at 15:00 | Session: ML: Time series and data streams
Modeling count-valued time sequences has received growing interest because count time sequences naturally arise in physical and social domains. Poisson gamma dynamical systems (PGDSs) are recently developed methods that can well capture the expressive latent transition structure and bursty dynamics behind count sequences. In particular, PGDSs demonstrate superior performance in data imputation and prediction compared with canonical linear dynamical system (LDS)-based methods. Despite these advantages, PGDSs cannot capture the heterogeneous overdispersed behaviours of the underlying dynamic processes. To mitigate this defect, we propose a negative-binomial-randomized gamma Markov process, which not only significantly improves the predictive performance of the proposed dynamical system, but also facilitates fast convergence of the inference algorithm. Moreover, we develop methods to estimate both factor-structured and graph-structured transition dynamics, which enable us to infer more explainable latent structure compared with PGDSs. Finally, we demonstrate the explainable latent structure learned by the proposed method and show its superior performance in imputing missing data and forecasting future observations compared with related models.
List of keywords
Machine Learning -> ML: Time series and data streams
Machine Learning -> ML: Bayesian learning
Machine Learning -> ML: Probabilistic machine learning
Uncertainty in AI -> UAI: Tractable probabilistic models
169
Scale and Direction Guided GAN for Inertial Sensor Signal Enhancement
Yifeng Wang, Yi Zhao
6 min. talk | August 9th at 11:30 | Session: ML: Generative models
Inertial sensors, serving as attitude and motion sensing components, are extensively used in various portable devices spanning consumer electronics, sports and health, aerospace, etc. However, the severe intrinsic errors of inertial sensors greatly restrict their capability to support advanced functions such as motion tracking and semantic recognition. Although generative models hold significant potential for signal enhancement, unsupervised or weakly-supervised generative methods may not achieve ideal generation results due to the absence of guidance from paired data. To address this, we propose a scale and direction-guided generative adversarial network (SDG-GAN), which provides dual guidance mechanisms for GANs with unpaired data across two practical application scenarios. In the unsupervised scenario, where only unpaired signals of varying quality are available, our scale-guided GAN (SG-GAN) forces the generator to learn high-quality signal characteristics at different scales simultaneously via the proposed self-supervised zoom constraint, thereby facilitating multi-scale interactive learning. In the weakly-supervised scenario, where additional experimental equipment can provide some motion information, our direction-guided GAN (DG-GAN) introduces auxiliary tasks to supervise signal generation while avoiding interference from the auxiliary tasks on the main generation task. Extensive experiments demonstrate that both the unsupervised SG-GAN and the weakly-supervised DG-GAN significantly outperform all comparison methods, including fully-supervised approaches. The combined SDG-GAN achieves remarkable results, enabling tasks previously infeasible with the raw inertial signal, such as 3D motion tracking.
List of keywords
Machine Learning -> ML: Generative models
Machine Learning -> ML: Unsupervised learning
Machine Learning -> ML: Weakly supervised learning
Multidisciplinary Topics and Applications -> MTA: Sensor networks and smart cities
170
Nukplex: An Efficient Local Search Algorithm for Maximum K-Plex Problem
Rui Sun, Yiyuan Wang, Shimao Wang, Hui Li, Ximing Li, Minghao Yin
6 min. talk | August 8th at 10:00 | Session: S: Search
The maximum k-plex problem (MKPP) is a significant relaxation of the maximum clique problem with extensive applications. Recently, many heuristic algorithms based on various methods have been proposed to solve the MKPP. In this work, to further improve performance on the MKPP, we propose an efficient local search algorithm based on three main ideas. First, we propose a relaxed bounded configuration checking strategy that considers two kinds of historical search information to relax the restriction strength of configuration checking and the forbidden condition of candidate vertices for the Add operation, respectively. Second, we present a novel solution-information-based vertex selection strategy that uses two kinds of solution information to select high-quality candidate vertices. Third, we define the solution core and introduce a core-based perturbation strategy to help the algorithm escape local optima. The experimental results show that the proposed algorithm significantly outperforms state-of-the-art MKPP algorithms on almost all instances.
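For reference, the defining property of a k-plex (every vertex may miss at most k-1 neighbors within the set) can be checked in a few lines; representing the graph as a dict of neighbor sets is our choice for illustration.

```python
def is_k_plex(adj, S, k):
    """True if S is a k-plex in the graph given by adj (vertex -> set of
    neighbors): every vertex of S has at least |S| - k neighbors in S."""
    S = set(S)
    return all(len(S & adj[v]) >= len(S) - k for v in S)
```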
List of keywords
Search -> S: Local search
Search -> S: Heuristic search
185
Design a Win-Win Strategy That Is Fair to Both Service Providers and Tasks When Rejection Is Not an Option
Yohai Trabelsi, Pan Xu, Sarit Kraus
12 min. talk | August 7th at 11:30 | Session: MAS: Agent-based and Multi-agent Systems (1/2)
Assigning tasks to service providers is a frequent procedure across various applications. Often the tasks arrive dynamically while the service providers remain static. Preventing task rejection caused by service provider overload is of utmost significance. To ensure a positive experience in relevant applications for both service providers and tasks, fairness must be considered. To address these issues, we model the problem as online matching within a bipartite graph and tackle two minimax problems: one focuses on minimizing the highest waiting time of a task, while the other aims to minimize the highest workload of a service provider. We show that the second problem can be expressed as a linear program and thus solved efficiently while maintaining a reasonable approximation to the objective of the first problem. We develop novel methods that utilize the two minimax problems, and extensive simulation experiments using real data demonstrate that our heuristics, based on the linear program, perform remarkably well.
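One plausible way to write the min-max workload problem as a linear program (notation ours, not necessarily the paper's formulation): with x_{ij} the fraction of task j assigned to provider i and c_j the load of task j,

```latex
\begin{aligned}
\min_{x,\,z}\quad & z \\
\text{s.t.}\quad & \textstyle\sum_{j} c_j\, x_{ij} \le z && \text{for every provider } i,\\
& \textstyle\sum_{i} x_{ij} = 1 && \text{for every task } j,\\
& x_{ij} \ge 0 .
\end{aligned}
```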
List of keywords
Agent-based and Multi-agent Systems -> MAS: Resource allocation
190
Bridging Stereo Geometry and BEV Representation with Reliable Mutual Interaction for Semantic Scene Completion
Bohan Li, Yasheng Sun, Zhujin Liang, Dalong Du, Zhuanghui Zhang, Xiaofeng Wang, Yunnan Wang, Xin Jin, Wenjun Zeng
6 min. talk | August 6th at 15:00 | Session: CV: 3D computer vision (2/2)
3D semantic scene completion (SSC) is an ill-posed perception task that requires inferring a dense 3D scene from limited observations. Previous camera-based methods struggle to predict accurate semantic scenes due to inherent geometric ambiguity and incomplete observations. In this paper, we resort to stereo matching and bird’s-eye-view (BEV) representation learning to address these issues in SSC. Complementary to each other, stereo matching mitigates geometric ambiguity with the epipolar constraint, while BEV representation enhances the hallucination ability for invisible regions with global semantic context. However, due to the inherent representation gap between stereo geometry and BEV features, it is non-trivial to bridge them for the dense prediction task of SSC. Therefore, we further develop a unified occupancy-based framework dubbed BRGScene, which effectively bridges these two representations with dense 3D volumes for reliable semantic scene completion. Specifically, we design a novel Mutual Interactive Ensemble (MIE) block for pixel-level reliable aggregation of stereo geometry and BEV features. Within the MIE block, a Bi-directional Reliable Interaction (BRI) module, enhanced with confidence re-weighting, is employed to encourage fine-grained interaction through mutual guidance. Besides, a Dual Volume Ensemble (DVE) module is introduced to facilitate complementary aggregation through channel-wise recalibration and multi-group voting. Our method outperforms all published camera-based methods on SemanticKITTI for semantic scene completion. Our code is available at https://github.com/Arlo0o/StereoScene.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Scene analysis and understanding   
195
Unbiased Active Semi-supervised Binary Classification Models
JooChul Lee, Weidong Ma, Ziyang Wang
6 min. talk | August 7th at 15:00 | Session: ML: Machine Learning (3/6)
Active learning is a well-motivated approach that aims to maximize model performance with relatively little data, but it introduces sampling bias due to active selection. To adjust for this bias, the current literature utilizes corrective weights in a supervised learning approach. However, those methods consider only the small amount of actively sampled data, so estimation efficiency can be improved by additionally exploiting unsampled data. In this paper, we develop an actively improved augmented estimation equation (AI-AEE) based on corrective weights as well as imputation models that allow us to leverage unlabeled data. We derive the asymptotic distribution of the proposed estimator as the solution to the AI-AEE and propose an optimal sampling scheme that minimizes the asymptotic mean squared error of the estimator. We then propose a general practical algorithm for training prediction models in the active and semi-supervised learning framework. The superiority of our method is demonstrated on synthetic and real data examples.
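Such augmented estimating equations typically build on the generic augmented inverse-probability-weighting form (notation ours; the paper's AI-AEE adds active-sampling-specific corrections): with R_i the sampling indicator, π_i the active-sampling probability, ψ the score function, and ψ̂ its imputation-model counterpart,

```latex
\sum_{i=1}^{n}\left[\frac{R_i}{\pi_i}\,\psi(x_i, y_i;\theta)
 + \left(1-\frac{R_i}{\pi_i}\right)\hat{\psi}(x_i;\theta)\right] = 0 .
```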
List of keywords
Machine Learning -> ML: Active learning
Machine Learning -> ML: Regression
Machine Learning -> ML: Semi-supervised learning
200
Proportion-based Sensitivity Analysis of Uncontrolled Confounding Bias in Causal Inference
Haruka Yoshida, Manabu Kuroki
6 min. talk | August 7th at 10:00 | Session: UAI: Causality, structural causal models and causal inference
Uncontrolled confounding bias causes a spurious relationship between an exposure variable and an outcome variable and precludes reliable evaluation of the causal effect from observed data. Thus, it is important to observe a sufficient set of confounders to reliably evaluate the causal effect. However, there is no statistical method for judging whether an available set of covariates is sufficient to derive a reliable estimator for the causal effect. To address this problem, we focus on the fact that the mean squared error (MSE) of the outcome variable with respect to the average causal risk can be described as the sum of "the conditional variance of the outcome variable given the exposure variable" and "the square of the uncontrolled confounding bias". We then propose a novel sensitivity analysis, namely, the proportion-based sensitivity analysis of uncontrolled confounding bias in causal effects (PSA), in which the sensitivity parameter is formulated as the proportion of "the square of the uncontrolled confounding bias" to the MSE, and we clarify some of its properties. We also demonstrate the applicability of the PSA through two case studies.
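In symbols, the decomposition the abstract describes and the resulting sensitivity parameter read (our notation, with b the uncontrolled confounding bias):

```latex
\mathrm{MSE} \;=\; \underbrace{\mathrm{Var}(Y \mid X)}_{\text{conditional variance}}
 \;+\; \underbrace{b^{2}}_{\text{squared confounding bias}},
\qquad
\phi \;=\; \frac{b^{2}}{\mathrm{MSE}} \in [0, 1].
```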
List of keywords
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
204
Contrastive General Graph Matching with Adaptive Augmentation Sampling
Jianyuan Bo, Yuan Fang
12 min. talk | August 8th at 11:30 | Session: ML: Unsupervised learning
Graph matching has important applications in pattern recognition and beyond. Current approaches predominantly adopt supervised learning, demanding extensive labeled data which can be limited or costly. Meanwhile, self-supervised learning methods for graph matching often require additional side information such as extra categorical information and input features, limiting their application to the general case. Moreover, designing the optimal graph augmentations for self-supervised graph matching presents another challenge to ensure robustness and efficacy. To address these issues, we introduce a novel Graph-centric Contrastive framework for Graph Matching (GCGM), capitalizing on a vast pool of graph augmentations for contrastive learning, yet without needing any side information. Given the variety of augmentation choices, we further introduce a Boosting-inspired Adaptive Augmentation Sampler (BiAS), which adaptively selects more challenging augmentations tailored for graph matching. Through various experiments, our GCGM surpasses state-of-the-art self-supervised methods across various datasets, marking a significant step toward more effective, efficient and general graph matching.
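A minimal sketch of a boosting-inspired sampler of this flavor: pick augmentations with probability increasing in their current difficulty. The exponential weighting and `temperature` are assumptions, not the paper's BiAS rule.

```python
import math
import random

def sample_augmentation(augmentations, difficulty, temperature=1.0):
    # Weight each augmentation by exp(difficulty / T), so harder views
    # (e.g., those inducing higher contrastive loss) are chosen more often.
    weights = [math.exp(d / temperature) for d in difficulty]
    return random.choices(augmentations, weights=weights, k=1)[0]
```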
List of keywords
Machine Learning -> ML: Unsupervised learning
Machine Learning -> ML: Self-supervised Learning
Machine Learning -> ML: Sequence and graph learning
214
FedSSA: Semantic Similarity-based Aggregation for Efficient Model-Heterogeneous Personalized Federated Learning
Liping Yi, Han Yu, Zhuan Shi, Gang Wang, Xiaoguang Liu, Lizhen Cui, Xiaoxiao Li
6 min. talk | August 6th at 15:00 | Session: ML: Federated learning (1/2)
Federated learning (FL) is a privacy-preserving collaborative machine learning paradigm. Traditional FL requires all data owners (a.k.a. FL clients) to train the same local model. This design is not well-suited for scenarios involving data and/or system heterogeneity. Model-Heterogeneous Personalized FL (MHPFL) has emerged to address this challenge. Existing MHPFL approaches often rely on a public dataset with the same nature as the learning task, or incur high computation and communication costs. To address these limitations, we propose the Federated Semantic Similarity Aggregation (FedSSA) approach for supervised classification tasks, which splits each client’s model into a heterogeneous (structure-different) feature extractor and a homogeneous (structure-same) classification header. It performs local-to-global knowledge transfer via semantic similarity-based header parameter aggregation. In addition, global-to-local knowledge transfer is achieved via an adaptive parameter stabilization strategy that fuses the seen-class parameters of historical local headers with those of the latest global header for each client. FedSSA does not rely on public datasets and only requires partial header parameter transmission, saving costs. Theoretical analysis proves the convergence of FedSSA. Extensive experiments show that FedSSA achieves up to 3.62% higher accuracy, 15.54 times higher communication efficiency, and 15.52 times higher computational efficiency compared to 7 state-of-the-art MHPFL baselines.
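A minimal sketch of the global-to-local fusion idea, assuming classification headers stored as (num_classes x dim) weight matrices; the blend coefficient `mu` and the row-wise rule are illustrative, while the paper's strategy adapts this fusion over training rounds.

```python
import torch

def stabilized_header(local_hdr, global_hdr, seen_classes, mu=0.5):
    # Blend the client's historical header with the latest global header,
    # but only on the rows of classes this client has actually seen.
    fused = global_hdr.clone()
    fused[seen_classes] = (mu * local_hdr[seen_classes]
                           + (1 - mu) * global_hdr[seen_classes])
    return fused
```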
List of keywords
Machine Learning -> ML: Federated learning
219
Empirical Analysis of Dialogue Relation Extraction with Large Language Models
Guozheng Li, Zijie Xu, Ziyu Shang, Jiajun Liu, Ke Ji, Yikai Guo
6 min. talk | August 7th at 15:00 | Session: NLP: Natural Language Processing (2/3)
Dialogue relation extraction (DRE) aims to extract relations between two arguments within a dialogue, which is more challenging than standard RE due to the higher frequency of person pronouns and lower information density in dialogues. However, existing DRE methods still suffer from two serious issues: (1) they are hard-pressed to capture long and sparse multi-turn information, and (2) they struggle to extract gold relations based on partial dialogues. This motivates us to discover more effective methods that can alleviate these issues. We notice that the rise of large language models (LLMs) has sparked considerable interest in evaluating their performance across diverse tasks. To this end, we investigate the capabilities of different LLMs in DRE, considering both proprietary and open-source models. Interestingly, we discover that LLMs significantly alleviate both issues in existing DRE methods. Generally, we have the following findings: (1) scaling up model size substantially boosts overall DRE performance and achieves exceptional results, tackling the difficulty of capturing long and sparse multi-turn information; (2) LLMs suffer a much smaller performance drop than existing methods when moving from the entire-dialogue setting to the partial-dialogue setting; (3) LLMs deliver competitive or superior performance under both full-shot and few-shot settings compared to the current state of the art; (4) LLMs show modest performance on inverse relations but much stronger improvements on general relations, and they can handle dialogues of various lengths, especially longer sequences.
List of keywords
Natural Language Processing -> NLP: Information extraction
223
Graph Contrastive Learning with Reinforcement Augmentation
Ziyang Liu, Chaokun Wang, Cheng Wu
6 min. talk | August 6th at 11:30 | Session: DM: Mining graphs (1/3)
Graph contrastive learning (GCL), designing contrastive objectives to learn embeddings from augmented graphs, has become a prevailing method for extracting embeddings from graphs in an unsupervised manner. As an important procedure in GCL, graph data augmentation (GDA) directly affects the model performance on downstream tasks. Currently, the GCL methods typically treat GDA as independent events, neglecting its continuity. In this paper, we regard the GDA in GCL as a Markov decision process and propose a novel graph reinforcement augmentation framework for GCL. Based on this framework, we design a Graph Advantage Actor-Critic (GA2C) model. We conduct extensive experiments to evaluate GA2C on unsupervised learning, transfer learning, and semi-supervised learning. The experimental results demonstrate the performance superiority of GA2C over the state-of-the-art GCL models. Furthermore, we verify that GA2C is more efficient than the other GCL methods with learnable GDA and provide two examples of chemical molecular graphs from ZINC-2M to demonstrate that GA2C generates meaningful augmented views, where the edge weights reflect the importance of chemical bonds in the molecule.
List of keywords
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Self-supervised Learning
230
Negative Prompt Driven Complementary Parallel Representation for Open-World 3D Object Retrieval
Yang Xu, Yifan Feng, Yue Gao
6 min. talk | August 6th at 15:00 | Session: CV: 3D computer vision (2/2)
The limited availability of supervised labels (positive information) poses a notable challenge for open-world retrieval. However, negative information is more easily obtained but remains underexploited in current methods. In this paper, we introduce the Negative Prompt Driven Complementary Parallel Representation (NPCP) framework, which navigates the complexities of open-world retrieval through the lens of Negative Prompts. Specifically, we employ the Parallel Exclusive Embedding (PEE) to effectively utilize the prompt information, bilaterally capturing both explicit negative and implicit positive signals. To address the challenges of embedding unification and generalization, our method leverages high-order correlations among objects through the Complementary Structure Tuning (CST), by constructing a complementary hypergraph based on bi-directional and cross-category correlations. We have developed four multimodal datasets for open-world 3D object retrieval with negative prompts: NPMN, NPAB, NPNT, and NPES. Extensive experiments and ablation studies on these four benchmarks demonstrate the superiority of our method over current state-of-the-art approaches.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Image and video retrieval 
Computer Vision -> CV: Representation learning
231
MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator
Xiao-Yin Liu, Xiao-Hu Zhou, Guotao Li, Hao Li, Mei-Jiang Gui, Tian-Yu Xiang, De-Xing Huang, Zeng-Guang Hou
6 min. talk | August 7th at 15:00 | Session: ML: Reinforcement learning (1/2)
Offline reinforcement learning (RL) faces a significant challenge of distribution shift. Model-free offline RL penalizes the Q value for out-of-distribution (OOD) data or constrains the policy to stay close to the behavior policy to tackle this problem, but this inhibits exploration of the OOD region. Model-based offline RL, which uses the trained environment model to generate more OOD data and performs conservative policy optimization within that model, has become an effective method for this problem. However, current model-based algorithms rarely consider agent robustness when incorporating conservatism into the policy. Therefore, we propose MICRO, a new model-based offline algorithm with a conservative Bellman operator. This method trades off performance and robustness by introducing the robust Bellman operator into the algorithm. Compared with previous model-based algorithms with robust adversarial models, MICRO significantly reduces computation cost by only choosing the minimal Q value in the state uncertainty set. Extensive experiments demonstrate that MICRO outperforms prior RL algorithms on offline RL benchmarks and is considerably robust to adversarial perturbations.
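A sketch of a conservative Bellman target in the spirit described: take the minimal Q value over a sampled state uncertainty set instead of training an adversarial environment model. The set construction, network signatures, and shapes here are illustrative assumptions.

```python
import torch

def conservative_target(q_net, policy, next_state_set, rewards, gamma=0.99):
    """next_state_set: iterable of (batch, state_dim) perturbed next states
    drawn from the state uncertainty set; q_net(s, a) -> (batch,) values."""
    with torch.no_grad():
        qs = torch.stack([q_net(s, policy(s)) for s in next_state_set])  # (n, batch)
        return rewards + gamma * qs.min(dim=0).values  # worst case over the set
```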
List of keywords
Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Model-based and model learning reinforcement learning
Machine Learning -> ML: Offline reinforcement learning
Machine Learning -> ML: Robustness
234
Spear: Evaluate the Adversarial Robustness of Compressed Neural Models
Chong Yu, Tao Chen, Zhongxue Gan, Jiayuan Fan
12 min. talk | August 7th at 11:30 | Session: CV: Adversarial learning, adversarial attack and defense methods
As Artificial Intelligence evolves, neural models vulnerable to adversarial attacks may produce fatal results in critical applications. This paper discusses the robustness of compressed neural models facing adversarial attacks. A few studies discuss the interaction between model compression and adversarial attacks. However, they focus on robustness against traditional attacks designed for dense models, not attacks intended explicitly for compressed models that use sparsity and quantization techniques. Compressed models often have fewer parameters and smaller sizes, making them more friendly to resource-limited devices than dense models, so they are widely deployed in various edge and mobile devices. However, introducing sparsity and quantization into neural models imposes higher attack risks. We propose a specific adversarial attack method (Spear) to generate particular adversarial samples for evaluating the robustness of compressed models. The Spear attack finds minimal perturbations to create attack samples that maximize the behavioral difference between the compressed and dense reference models. We demonstrate through quantitative and ablation experiments that the proposed Spear attack technique can generally be applied to various networks and tasks.
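A sketch of an objective in the spirit described: minimal perturbation, maximal behavioral gap between the compressed model and its dense reference. The KL-based gap and the weight `lam` are our assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def spear_style_objective(dense, compressed, x_adv, x, lam=10.0):
    # Behavioral gap between the two models on the perturbed input.
    log_p_dense = torch.log_softmax(dense(x_adv), dim=-1)
    p_comp = torch.softmax(compressed(x_adv), dim=-1)
    gap = F.kl_div(log_p_dense, p_comp, reduction="batchmean")
    # Minimizing this maximizes the gap while keeping x_adv close to x.
    return -gap + lam * (x_adv - x).pow(2).mean()
```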
List of keywords
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Machine Learning -> ML: Adversarial machine learning
Machine Learning -> ML: Robustness
247
RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM
Ziying Song, Guoxing Zhang, Lin Liu, Lei Yang, Shaoqing Xu, Caiyan Jia, Feiyang Jia, Li Wang
6 min. talk | August 6th at 11:30 | Session: CV: 3D computer vision (1/2)
Multi-modal 3D object detectors are dedicated to exploring secure and reliable perception systems for autonomous driving (AD). Although achieving state-of-the-art (SOTA) performance on clean benchmark datasets, they tend to overlook the complexity and harsh conditions of real-world environments. With the emergence of visual foundation models (VFMs), opportunities and challenges are presented for improving the robustness and generalization of multi-modal 3D object detection in AD. Therefore, we propose RoboFusion, a robust framework that leverages VFMs like SAM to tackle out-of-distribution (OOD) noise scenarios. We first adapt the original SAM to AD scenarios, yielding SAM-AD. To align SAM or SAM-AD with multi-modal methods, we then introduce AD-FPN for upsampling the image features extracted by SAM. We employ wavelet decomposition to denoise the depth-guided images, further reducing noise and weather interference. Finally, we employ self-attention mechanisms to adaptively reweight the fused features, enhancing informative features while suppressing excess noise. In summary, RoboFusion significantly reduces noise by leveraging the generalization and robustness of VFMs, thereby enhancing the resilience of multi-modal 3D object detection. Consequently, RoboFusion achieves SOTA performance in noisy scenarios, as demonstrated by the KITTI-C and nuScenes-C benchmarks. Code is available at https://github.com/adept-thu/RoboFusion.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Recognition (object detection, categorization)
Robotics -> ROB: Perception
260
A Semi-supervised Molecular Learning Framework for Activity Cliff Estimation
Fang Wu
6 min. talk | August 8th at 15:00 | Session: MTA: Bioinformatics
Machine learning (ML) enables accurate and fast molecular property predictions, which is of interest in drug discovery and material design. Its success rests on the principle of similarity, which assumes that similar molecules exhibit close properties. However, activity cliffs challenge this principle, and their presence leads to a sharp decline in the performance of existing ML algorithms, particularly graph-based methods. To overcome this obstacle under a low-data scenario, we propose a novel semi-supervised learning (SSL) method dubbed SemiMol, which employs predictions on numerous unannotated data as pseudo-signals for subsequent training. Specifically, we introduce an additional instructor model to evaluate the accuracy and trustworthiness of proxy labels, because existing pseudo-labeling approaches require probabilistic outputs to reveal the model’s confidence and fail to apply to regression tasks. Moreover, we design a self-adaptive curriculum learning algorithm to progressively move the target model toward hard samples at a controllable pace. Extensive experiments on 30 activity cliff datasets demonstrate that SemiMol significantly enhances graph-based ML architectures and surpasses state-of-the-art pretraining and SSL baselines.
List of keywords
Multidisciplinary Topics and Applications -> MTA: Health and medicine
Multidisciplinary Topics and Applications -> MTA: Life sciences
Humans and AI -> HAI: Applications
264
VSGT: Variational Spatial and Gaussian Temporal Graph Models for EEG-based Emotion Recognition
Chenyu Liu, Xinliang Zhou, Jiaping Xiao, Zhengri Zhu, Liming Zhai, Ziyu Jia, Yang Liu
6 min. talk | August 8th at 11:30 | Session: HAI: Cognitive modeling
Electroencephalogram (EEG), which directly reflects the emotional activity of the brain, has been increasingly utilized for emotion recognition. Most works exploit the spatial and temporal dependencies in EEG to learn emotional feature representations, but they still have two limitations to reach their full potential. First, prior knowledge is rarely used to capture the spatial dependency of brain regions. Second, the cross temporal dependency between consecutive time slices for different brain regions is ignored. To address these limitations, in this paper, we propose Variational Spatial and Gaussian Temporal (VSGT) graph models to investigate the spatial and temporal dependencies for EEG-based emotion recognition. The VSGT has two key components: Variational Spatial Encoder (VSE) and Gaussian Temporal Encoder (GTE). The VSE leverages the upper bound theorem to identify the dynamic spatial dependency based on prior knowledge by the variational Bayesian method. Besides, the GTE exploits the conditional Gaussian graph transform that computes comprehensive temporal dependency between consecutive time slices. Finally, the VSGT utilizes a recurrent structure to calculate the spatial and temporal dependencies for all time slices. Extensive experiments show the superiority of VSGT over state-of-the-art methods on multiple EEG datasets.
List of keywords
Humans and AI -> HAI: Cognitive modeling
Humans and AI -> HAI: Brain sciences
281
Probabilistically Robust Watermarking of Neural Networks
Mikhail Pautov, Nikita Bogdanov, Stanislav Pyatkin, Oleg Rogov, Ivan Oseledets
6 min. talk | August 7th at 15:00 | Session: ML: Machine Learning (4/6)
As deep learning (DL) models are widely and effectively used in Machine Learning as a Service (MLaaS) platforms, there is a rapidly growing interest in DL watermarking techniques that can be used to confirm the ownership of a particular model. Unfortunately, these methods usually produce watermarks susceptible to model stealing attacks. In our research, we introduce a novel trigger set-based watermarking approach that demonstrates resilience against functionality stealing attacks, particularly those involving extraction and distillation. Our approach does not require additional model training and can be applied to any model architecture. The key idea of our method is to compute the trigger set, which is transferable between the source model and the set of proxy models with a high probability. In our experimental study, we show that if the probability of the set being transferable is reasonably high, it can be effectively used for ownership verification of the stolen model. We evaluate our method on multiple benchmarks and show that our approach outperforms current state-of-the-art watermarking techniques in all considered experimental setups.
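A minimal sketch of trigger-set ownership verification of the kind described: if a suspect model agrees with the secret trigger labels far above chance, flag it as derived from the source model. The fixed `threshold` is an assumption; the paper gives a probabilistic guarantee instead of a hard cutoff.

```python
import numpy as np

def verify_ownership(suspect_predict, trigger_x, trigger_y, threshold=0.9):
    # Fraction of trigger inputs on which the suspect model matches the
    # secret labels that transfer from the source model.
    agreement = np.mean(suspect_predict(trigger_x) == trigger_y)
    return agreement >= threshold
```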
List of keywords
Machine Learning -> ML: Adversarial machine learning
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
Uncertainty in AI -> UAI: Applications
291
CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment
Kanglei Zhou, Junlin Li, Ruizhi Cai, Liyuan Wang, Xingxing Zhang, Xiaohui Liang
6 min. talk | August 8th at 15:00 | Session: CV: Computer Vision (2/2)
Action Quality Assessment (AQA) is pivotal for quantifying actions across domains like sports and medical care. Existing methods often rely on pre-trained backbones from large-scale action recognition datasets to boost performance on smaller AQA datasets. However, this common strategy yields suboptimal results due to the inherent struggle of these backbones to capture the subtle cues essential for AQA. Moreover, fine-tuning on smaller datasets risks overfitting. To address these issues, we propose Coarse-to-Fine Instruction Alignment (CoFInAl). Inspired by recent advances in large language model tuning, CoFInAl aligns AQA with broader pre-trained tasks by reformulating it as a coarse-to-fine classification task. Initially, it learns grade prototypes for coarse assessment and then utilizes fixed sub-grade prototypes for fine-grained assessment. This hierarchical approach mirrors the judging process, enhancing interpretability within the AQA framework. Experimental results on two long-term AQA datasets demonstrate that CoFInAl achieves state-of-the-art performance with significant correlation gains of 5.49% and 3.55% on Rhythmic Gymnastics and Fis-V, respectively. Our code is available at https://github.com/ZhouKanglei/CoFInAl_AQA.
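A rough sketch of a two-stage prototype lookup of the kind described (the distance metric and tensor shapes are assumptions for illustration):

```python
import torch

def coarse_to_fine_assignment(feat, grade_protos, sub_protos):
    """feat: (batch, d) features; grade_protos: (G, d) coarse prototypes;
    sub_protos: (G, S, d) fixed sub-grade prototypes per coarse grade."""
    g = torch.cdist(feat, grade_protos).argmin(dim=1)            # coarse grade
    sub = sub_protos[g]                                          # (batch, S, d)
    fine = torch.cdist(feat.unsqueeze(1), sub).squeeze(1).argmin(dim=1)
    return g, fine                                               # grade, sub-grade
```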
List of keywords
Computer Vision -> CV: Action and behavior recognition
Computer Vision -> CV: Video analysis and understanding   
297
Hacking Task Confounder in Meta-Learning
Jingyao Wang, Yi Ren, Zeen Song, Jianqi Zhang, Changwen Zheng, Wenwen Qiang
6 min. talk | August 6th at 15:00 | Session: ML: Machine Learning (2/6)
Meta-learning enables rapid generalization to new tasks by learning knowledge from various tasks. It is intuitively assumed that as training progresses, a model will acquire richer knowledge, leading to better generalization performance. However, our experiments reveal an unexpected result: there is negative knowledge transfer between tasks, affecting generalization performance. To explain this phenomenon, we construct Structural Causal Models (SCMs) for causal analysis. Our investigation uncovers the presence of spurious correlations between task-specific causal factors and labels in meta-learning. Furthermore, the confounding factors differ across batches. We refer to these confounding factors as “Task Confounders”. Based on these findings, we propose a plug-and-play Meta-learning Causal Representation Learner (MetaCRL) to eliminate task confounders. It encodes decoupled generating factors from multiple tasks and utilizes an invariant-based bi-level optimization mechanism to ensure their causality for meta-learning. Extensive experiments on various benchmark datasets demonstrate that our work achieves state-of-the-art (SOTA) performance. The code is available at https://github.com/WangJingyao07/MetaCRL.
List of keywords
Machine Learning -> ML: Meta-learning
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Machine Learning -> ML: Causality
Machine Learning -> ML: Few-shot learning
313
FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on
Chenhui Wang, Tao Chen, Zhihao Chen, Zhizhong Huang, Taoran Jiang, Qi Wang, Hongming Shan
6 min. talk | August 9th at 10:00 | Session: CV: Applications
Despite their impressive generative performance, latent diffusion model-based virtual try-on (VTON) methods lack faithfulness to crucial details of the clothes, such as style, pattern, and text. To alleviate these issues caused by the diffusion stochastic nature and latent supervision, we propose a novel Faithful Latent Diffusion Model for VTON, termed FLDM-VTON. FLDM-VTON improves the conventional latent diffusion process in three major aspects. First, we propose incorporating warped clothes as both the starting point and local condition, supplying the model with faithful clothes priors. Second, we introduce a novel clothes flattening network to constrain generated try-on images, providing clothes-consistent faithful supervision. Third, we devise a clothes-posterior sampling for faithful inference, further enhancing the model performance over conventional clothes-agnostic Gaussian sampling. Extensive experimental results on the benchmark VITON-HD and Dress Code datasets demonstrate that our FLDM-VTON outperforms state-of-the-art baselines and is able to generate photo-realistic try-on images with faithful clothing details.
List of keywords
Computer Vision -> CV: Applications
Computer Vision -> CV: Image and video synthesis and generation 
322
Bridging the Gap: Learning Pace Synchronization for Open-World Semi-Supervised Learning
Bo Ye, Kai Gan, Tong Wei, Min-Ling Zhang
12 min. talk | August 8th at 11:30 | Session: ML: Semi-supervised learning
In open-world semi-supervised learning, a machine learning model is tasked with uncovering novel categories from unlabeled data while maintaining performance on seen categories from labeled data. The central challenge is the substantial learning gap between seen and novel categories, as the model learns the former faster due to accurate supervisory information. Moreover, capturing the semantics of unlabeled novel category samples is also challenging due to the missing label information. To address the above issues, we introduce 1) the adaptive synchronizing marginal loss which imposes class-specific negative margins to alleviate the model bias towards seen classes, and 2) the pseudo-label contrastive clustering which exploits pseudo-labels predicted by the model to group unlabeled data from the same category together in the output space. Extensive experiments on benchmark datasets demonstrate that previous approaches may significantly hinder novel class learning, whereas our method strikingly balances the learning pace between seen and novel classes, achieving a remarkable 3% average accuracy increase on the ImageNet dataset. Importantly, we find that fine-tuning the self-supervised pre-trained model significantly boosts the performance, which is overlooked in prior literature. Our code is available at https://github.com/yebo0216best/LPS-main.
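A minimal sketch of a class-specific margin inside cross-entropy, the generic mechanism behind such losses; how the margins are set and adapted to synchronize the learning pace between seen and novel classes is the paper's contribution and is not reproduced here.

```python
import torch
import torch.nn.functional as F

def margin_adjusted_ce(logits, targets, class_margins):
    """class_margins: (num_classes,) per-class margins added to the logits
    inside cross-entropy; suitably chosen (negative) margins rebalance how
    fast seen vs. novel classes are learned."""
    return F.cross_entropy(logits + class_margins.unsqueeze(0), targets)
```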
List of keywords
Machine Learning -> ML: Semi-supervised learning
Machine Learning -> ML: Weakly supervised learning
334
Explore Internal and External Similarity for Single Image Deraining with Graph Neural Networks
Cong Wang, Wei Wang, Chengjin Yu, Jie Mu
6 min. talk | August 9th at 11:30 | Session: CV: Machine learning for vision
Patch-level non-local self-similarity is an important property of natural images. However, most existing methods do not incorporate this property into neural networks for image deraining, which limits recovery performance. Motivated by this property, we find that a rainy image exhibits significant patch recurrence: similar patches tend to recur many times within the image, its multi-scale versions, and external images. To better model this property for image deraining, we develop a multi-scale graph network with exemplars, called MSGNN, that contains two branches: 1) an internal data-based supervised branch models the internal relations of similar patches from the rainy image itself and its multi-scale versions, and 2) an external data-participated unsupervised branch models the external relations between similar patches in the rainy image and the exemplar. Specifically, we construct a graph model by searching for the k-nearest neighboring patches in both the multi-scale rainy images and the exemplar. After obtaining the corresponding k neighboring patches from the multi-scale images and exemplar, we build a graph and aggregate them in an attentional manner so that the graph can provide more information from similar patches for image deraining. We embed the proposed graph in a deep neural network and train it in an end-to-end manner. Extensive experiments demonstrate that the proposed algorithm performs favorably against eight state-of-the-art methods on five public synthetic datasets and one real-world dataset. The source codes will be available at https://github.com/supersupercong/MSGNN.
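A minimal sketch of the k-nearest-neighbor patch search used to build such a graph: for each query patch, find the k most similar candidate patches (e.g., from the multi-scale images or the exemplar). Flattened patch tensors and Euclidean distance are our assumptions for illustration.

```python
import torch

def knn_patches(query, candidates, k=5):
    """query: (n_q, d) flattened query patches;
    candidates: (n_c, d) flattened candidate patches."""
    d = torch.cdist(query, candidates)               # pairwise distances (n_q, n_c)
    return d.topk(k, dim=1, largest=False).indices   # indices of k nearest patches
```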
List of keywords
Computer Vision -> CV: Applications
Computer Vision -> CV: Computational photography
349
Contrastive and View-Interaction Structure Learning for Multi-view Clustering
Jing Wang, Songhe Feng
6 min. talk | August 8th at 10:00 | Session: ML: Clustering
Existing Deep Multi-view Clustering (DMVC) approaches typically concentrate on capturing consensus semantics from multiple views, where contrastive learning is widely used to align view-specific representations of each view. Unfortunately, view-specific representations are extracted from the content information of the corresponding instance, neglecting the relationships among different instances. Furthermore, existing contrastive losses introduce numerous false negative pairs that conflict with the clustering objectives. In response to these challenges, we propose a contraStive and viEw-interaction stRucture learning framework for multI-viEw cluStering (SERIES). Our method takes into account the structural relations among instances and boosts the contrastive loss to improve intra-class compactness. Meanwhile, a cross-view dual relation generation mechanism is introduced to achieve a consensus structural graph across multiple views for clustering. Specifically, we initially acquire view-specific representations using multiple graph autoencoders to exploit both content information and structural information. Furthermore, to pull together instances of the same cluster, a soft negative-pair-aware contrastive loss is employed to distinguish dissimilar instances while attracting similar ones. Thereafter, the view-specific representations are fed into cross-view dual relation generation layers to generate each other’s affinity matrices, aiming to reveal a consistent structural graph across views. Extensive experiments conducted on six benchmarks illustrate the superiority of our method compared to other state-of-the-art approaches.
List of keywords
Machine Learning -> ML: Multi-view learning
Machine Learning -> ML: Clustering
362
ELF-UA: Efficient Label-Free User Adaptation in Gaze Estimation
Yong Wu, Yang Wang, Sanqing Qu, Zhijun Li, Guang Chen
6 min. talk | August 6th at 15:00 | Session: CV: Transfer, low-shot, semi- and un- supervised learning   
We consider the problem of user-adaptive 3D gaze estimation. The performance of person-independent gaze estimation is limited due to interpersonal anatomical differences. Our goal is to provide a personalized gaze estimation model specifically adapted to a target user. Previous work on user-adaptive gaze estimation requires some labeled images of the target person to fine-tune the model at test time. However, this can be unrealistic in real-world applications, since it is cumbersome for an end-user to provide labeled images. In addition, previous work requires the training data to have both gaze labels and person IDs. This data requirement makes it infeasible to use some of the available data. To tackle these challenges, this paper proposes a new problem called efficient label-free user adaptation in gaze estimation. Our model only needs a few unlabeled images of a target user for model adaptation. During offline training, we have some labeled source data without person IDs and some unlabeled person-specific data. Our proposed method uses a meta-learning approach to learn how to adapt to a new user with only a few unlabeled images. Our key technical innovation is to use a generalization bound from domain adaptation to define the loss function in meta-learning, so that our method can effectively make use of both the labeled source data and the unlabeled person-specific data during training. Extensive experiments validate the effectiveness of our method on several challenging benchmarks.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Humans and AI -> HAI: Applications
366
Efficiency Calibration of Implicit Regularization in Deep Networks via Self-paced Curriculum-Driven Singular Value Selection
Zhe Li, Shuo Chen, Jian Yang, Lei Luo
6 min. talk | August 7th at 11:30 | Session: ML: Representation learning
The generalization of neural networks has been a major focus of research in deep learning. It is often interpreted as an implicit bias towards solutions with specific properties. In particular, it has been observed in practice that linear neural networks (LNNs) tend to favor low-rank solutions for matrix completion tasks. However, most existing methods rely on increasing the depth of the neural network to strengthen the low-rank bias of solutions, resulting in higher complexity. In this paper, we propose a new explicit regularization method that calibrates the implicit bias towards low-rank solutions in matrix completion tasks. Our approach automatically incorporates smaller singular values into the training process using a self-paced learning strategy, gradually restoring matrix information. By jointly using both implicit and explicit regularization, we effectively capture the low-rank structure of the LNN and accelerate its convergence. We also analyze how our proposed penalty term interacts with implicit regularization and provide theoretical guarantees for our new model. To evaluate the effectiveness of our method, we conduct a series of experiments on both simulated and real-world data. Our experimental results clearly demonstrate that our method has better robustness and generalization ability compared with other methods.
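One way to picture the self-paced singular value selection is a truncated-nuclear-norm-style penalty whose coverage of the smallest singular values grows on a schedule. The sketch below is a hedged illustration under that assumption; the paper's exact selection rule and penalty form may differ.

```python
import torch

def self_paced_tail_penalty(W, step, total_steps, lam=1e-3):
    # W: the factor / completed matrix being trained
    s = torch.linalg.svdvals(W)                  # singular values, descending
    frac = min(1.0, (step + 1) / total_steps)    # self-paced pace parameter
    tail = max(1, int(frac * s.numel()))         # how many small values join
    return lam * s[-tail:].sum()                 # penalize the smallest values
```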
List of keywords
Machine Learning -> ML: Representation learning
Data Mining -> DM: Recommender systems
Machine Learning -> ML: Theory of deep learning
376
Higher-Order Argumentation Frameworks: Principles and Gradual Semantics
Leila Amgoud, Dragan Doder, Marie-Christine Lagasquie-Schiex
12 min. talk | August 6th at 15:00 | Session: KRR: Argumentation
The paper investigates how to evaluate elements in complex argumentation frameworks, where both arguments and attacks are weighted and might be attacked by arguments. We propose the first gradual semantics that assign a numerical value to every argument and attack. The value represents the acceptance (seriousness) degree of an argument (attack). We start by highlighting various technical challenges facing semantics in such complex settings, including how to deal with attacks vs arguments, and how to combine their values. We present principles that describe different strategies offered to semantics to meet such challenges. Then, we introduce various semantics per strategy. For instance, some semantics evaluate attacks and arguments in the same way while others, called hybrid, treat them differently. Finally, the principles are used to compare the plethora of novel semantics. The final result is a catalogue of semantics with different formal guarantees and behaviours.
List of keywords
Knowledge Representation and Reasoning -> KRR: Argumentation
Knowledge Representation and Reasoning -> KRR: Common-sense reasoning
387
InfoMatch: Entropy Neural Estimation for Semi-Supervised Image Classification
Qi Han, Zhibo Tian, Chengwei Xia, Kun Zhan
6 min. talk | August 8th at 11:30 | Session: ML: Semi-supervised learning
Semi-supervised image classification, leveraging pseudo supervision and consistency regularization, has demonstrated remarkable success. However, the ongoing challenge lies in fully exploiting the potential of unlabeled data. To address this, we employ information entropy neural estimation to exploit the potential of unlabeled samples. Inspired by contrastive learning, the entropy is estimated by maximizing a lower bound on the mutual information across different augmented views. Moreover, we show theoretically that the information entropy of the classifier's posterior can be approximated by maximizing the likelihood of its softmax predictions. Guided by these insights, we optimize our model from both perspectives to ensure that the predicted probability distribution closely aligns with the ground-truth distribution. Given the theoretical connection to information entropy, we name our method InfoMatch. Through extensive experiments, we show its superior performance. The source code is available at https://github.com/kunzhan/InfoMatch.
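The mutual-information lower bound referenced here is, in spirit, the InfoNCE bound I(X1; X2) >= log N - CE. A minimal sketch of estimating it across two augmented views follows; the shapes and temperature `tau` are illustrative assumptions, not the paper's exact objective.

```python
import math
import torch
import torch.nn.functional as F

def infonce_lower_bound(h1, h2, tau=0.1):
    # h1, h2: (N, D) embeddings of two augmented views of the same N images
    h1, h2 = F.normalize(h1, dim=1), F.normalize(h2, dim=1)
    logits = h1 @ h2.t() / tau            # (N, N); matching views on the diagonal
    labels = torch.arange(len(h1))        # positives are the diagonal entries
    # InfoNCE: I(X1; X2) >= log N - CE(logits, labels); return the bound itself
    return math.log(len(h1)) - F.cross_entropy(logits, labels)
```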
List of keywords
Machine Learning -> ML: Semi-supervised learning
Machine Learning -> ML: Self-supervised Learning
Machine Learning -> ML: Unsupervised learning
Computer Vision -> CV: Representation learning
395
Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction
Zhaoxi Mu, Xinyu Yang
6 min. talk | August 8th at 10:00 | Session: NLP: Speech
The integration of visual cues has revitalized the performance of the target speech extraction task, elevating it to the forefront of the field. Nevertheless, this multi-modal learning paradigm often encounters the challenge of modality imbalance. In audio-visual target speech extraction tasks, the audio modality tends to dominate, potentially overshadowing the importance of visual guidance. To tackle this issue, we propose AVSepChain, drawing inspiration from the speech chain concept. Our approach partitions the audio-visual target speech extraction task into two stages: speech perception and speech production. In the speech perception stage, audio serves as the dominant modality, while visual information acts as the conditional modality. Conversely, in the speech production stage, the roles are reversed. This transformation of modality status aims to alleviate the problem of modality imbalance. Additionally, we introduce a contrastive semantic matching loss to ensure that the semantic information conveyed by the generated speech aligns with the semantic information conveyed by lip movements during the speech production stage. Through extensive experiments conducted on multiple benchmark datasets for audio-visual target speech extraction, we showcase the superior performance achieved by our proposed method.
List of keywords
Natural Language Processing -> NLP: Speech
Machine Learning -> ML: Multi-modal learning
404
Scalable Federated Unlearning via Isolated and Coded Sharding
Yijing Lin, Zhipeng Gao, Hongyang Du, Dusit Niyato, Gui Gui, Shuguang Cui, Jinke Ren
6 min. talk | August 6th at 15:00 | Session: ML: Federated learning (1/2)
Federated unlearning has emerged as a promising paradigm to erase the client-level data effect without affecting the performance of collaborative learning models. However, the federated unlearning process often introduces extensive storage overhead and consumes substantial computational resources, thus hindering its implementation in practice. To address this issue, this paper proposes a scalable federated unlearning framework based on isolated sharding and coded computing. We first divide distributed clients into multiple isolated shards across stages to reduce the number of clients being affected. Then, to reduce the storage overhead of the central server, we develop a coded computing mechanism by compressing the model parameters across different shards. In addition, we provide the theoretical analysis of time efficiency and storage effectiveness for the isolated and coded sharding. Finally, extensive experiments on two typical learning tasks, i.e., classification and generation, demonstrate that our proposed framework can achieve better performance than three state-of-the-art frameworks in terms of accuracy, retraining time, storage overhead, and F1 scores for resisting membership inference attacks.
List of keywords
Machine Learning -> ML: Federated learning
Machine Learning -> ML: Trustworthy machine learning
406
AllMatch: Exploiting All Unlabeled Data for Semi-Supervised Learning
Zhiyu Wu, Jinshi Cui
12 min. talk | August 8th at 11:30 | Session: ML: Semi-supervised learning
Existing semi-supervised learning algorithms adopt pseudo-labeling and consistency regularization techniques to introduce supervision signals for unlabeled samples. To overcome the inherent limitation of threshold-based pseudo-labeling, prior studies have attempted to align the confidence threshold with the evolving learning status of the model, which is estimated through the predictions made on the unlabeled data. In this paper, we further reveal that classifier weights can reflect the differentiated learning status across categories and consequently propose a class-specific adaptive threshold mechanism. Additionally, considering that even the optimal threshold scheme cannot resolve the problem of discarding unlabeled samples, a binary classification consistency regularization approach is designed to distinguish candidate classes from negative options for all unlabeled samples. By combining the above strategies, we present a novel SSL algorithm named AllMatch, which achieves improved pseudo-label accuracy and a 100% utilization ratio for the unlabeled data. We extensively evaluate our approach on multiple benchmarks, encompassing both balanced and imbalanced settings. The results demonstrate that AllMatch consistently outperforms existing state-of-the-art methods.
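A hedged sketch of how classifier weights might drive class-specific thresholds: per-class weight norms are read as learning status and rescale a global confidence threshold. The exact mapping used by AllMatch may differ; `base_tau` and the helper names are assumptions.

```python
import torch

def class_specific_thresholds(classifier_weight, base_tau=0.95):
    # classifier_weight: (num_classes, D) final-layer weight matrix
    norms = classifier_weight.norm(dim=1)       # per-class weight norms
    status = norms / norms.max()                # in [0, 1]: 1 = best learned
    return base_tau * status                    # well-learned classes get
                                                # stricter thresholds

def pseudo_label_mask(probs, thresholds):
    conf, cls = probs.max(dim=1)                # confidence and argmax class
    return conf >= thresholds[cls]              # accept per-class
```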
List of keywords
Machine Learning -> ML: Semi-supervised learning
418
IntensPure: Attack Intensity-aware Secondary Domain Adaptive Diffusion for Adversarial Purification
Eun-Gi Lee, Moon Seok Lee, Jae Hyun Yoon, Seok Bong Yoo
6 min. talk | August 7th at 11:30 | Session: CV: Adversarial learning, adversarial attack and defense methods
Adversarial attacks pose a severe threat to the accuracy of person re-identification (re-ID) systems, a critical security technology. Adversarial purification methods are promising approaches for defending against comprehensive attacks, including unseen ones. However, re-ID testing identities (IDs) are unseen, requiring more sophisticated purification than other classification tasks for adversarial defense. We propose IntensPure, an adversarial purification method in person re-ID that quantifies attack intensity via ID stability and attribute inconsistency to customize purification strength. Based on the estimated attack intensity, IntensPure employs secondary domain adaptive diffusion focused on purifying the low- and mid-frequency coefficients vulnerable to re-ID attacks. This method significantly reduces computational costs compared to the conventional diffusion method. For elaborate purification, IntensPure performs a directional diffusion process and refinements, leveraging the directional characteristics of secondary images. The experimental results on diverse attacks demonstrate that IntensPure outperforms the existing methods in terms of rank-1 accuracy.
List of keywords
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Recognition (object detection, categorization)
434
Efficient Tuning and Inference for Large Language Models on Textual Graphs
Yun Zhu, Yaoke Wang, Haizhou Shi, Siliang Tang
6 min. talk | August 8th at 15:00 | Session: ML: Sequence and graph learning
Rich textual and topological information of textual graphs needs to be modeled in real-world applications such as webpages, e-commerce, and academic articles. Practitioners have long followed the path of adopting a shallow text encoder and a subsequent graph neural network (GNN) to solve this problem. In light of recent advancements in large language models (LLMs), it is apparent that integrating LLMs for enhanced textual encoding can substantially improve performance on textual graphs. Nevertheless, the efficiency of these methods poses a significant challenge. In this paper, we propose ENGINE, a parameter- and memory-efficient fine-tuning method for textual graphs with an LLM encoder. The key insight is to combine the LLM and GNN through a tunable side structure, which significantly reduces the training complexity without impairing the joint model's capacity. Extensive experiments on textual graphs demonstrate our method's effectiveness: it achieves the best model performance while having the lowest training cost compared to previous methods. Moreover, we introduce two variants with caching and dynamic early exit to further enhance training and inference speed. Specifically, caching accelerates ENGINE's training by 12x, and dynamic early exit achieves up to 5x faster inference with a negligible performance drop (at most a 1.17% relative drop across 7 datasets). Our code is available at: https://github.com/ZhuYun97/ENGINE.
List of keywords
Machine Learning -> ML: Sequence and graph learning
Data Mining -> DM: Mining graphs
Natural Language Processing -> NLP: Language models
Machine Learning -> ML: Supervised Learning
439
Exploiting Multi-Label Correlation in Label Distribution Learning
Zhiqiang Kou, Jing Wang, Jiawei Tang, Yuheng Jia, Boyu Shi, Xin Geng
12 min. talk | August 9th at 11:30 | Session: ML: Multi-label learning
Label Distribution Learning (LDL) is a novel machine learning paradigm that assigns a label distribution to each instance. Numerous LDL methods have been proposed to leverage label correlation in the learning process and cope with the exponential-sized output space; among these, many exploit the low-rank structure of label distributions to capture label correlation. However, recent research has unveiled that label distribution matrices typically maintain full rank, posing a challenge to approaches relying on low-rank label correlation. Notably, low-rank label correlation finds widespread adoption in the multi-label learning (MLL) literature due to the often low-rank nature of multi-label matrices. Inspired by this, we introduce an auxiliary MLL process within the LDL framework, capturing low-rank label correlation within this auxiliary MLL component rather than in the LDL itself. By doing so, we adeptly exploit low-rank label correlation in our LDL methods. We conduct comprehensive experiments and demonstrate that our methods are superior to existing LDL methods. Besides, the ablation studies justify the advantages of exploiting low-rank label correlation in the auxiliary MLL.
List of keywords
Machine Learning -> ML: Multi-label learning
Machine Learning -> ML: Applications
Machine Learning -> ML: Classification
440
Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
Zuan Gao, Yuxin Wang, Yadong Qu, Boqiang Zhang, Zixiao Wang, Jianjun Xu, Hongtao Xie
6 min. talk | August 7th at 15:00 | Session: CV: Recognition (object detection, categorization)
In text recognition, self-supervised pre-training emerges as a good solution to reduce dependence on expensive annotated real data. Previous studies primarily focus on local visual representation by leveraging mask image modeling or sequence contrastive learning. However, they neglect the linguistic information in text images, which is crucial for recognizing text. To simultaneously capture local character features and linguistic information in visual space, we propose Symmetric Superimposition Modeling (SSM). The objective of SSM is to reconstruct the direction-specific pixel and feature signals from the symmetrically superimposed input. Specifically, we add the original image to its inverted views to create the symmetrically superimposed inputs. At the pixel level, we reconstruct the original and inverted images to capture character shapes and texture-level linguistic context. At the feature level, we reconstruct the features of the same original image and inverted image under different augmentations to model the semantic-level linguistic context and local character discrimination. By design, the superimposition disrupts character shapes and linguistic rules, so the dual-level reconstruction facilitates understanding character shapes and linguistic information from the perspective of visual texture and feature semantics. Experiments on various text recognition benchmarks demonstrate the effectiveness and generality of SSM, with 4.1% average performance gains and 86.6% new state-of-the-art average word accuracy on Union14M benchmarks. The code is available at https://github.com/FaltingsA/SSM.
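The symmetric superimposition input itself is simple to state. The sketch below, assuming "inversion" means a horizontal-plus-vertical flip and an equal-weight blend, builds the superimposed input and the two reconstruction targets; the exact blending used in the paper may differ.

```python
import torch

def superimpose(img):
    # img: (B, C, H, W); invert by flipping both spatial axes
    inverted = torch.flip(img, dims=(2, 3))
    mixed = 0.5 * (img + inverted)      # symmetrically superimposed input
    return mixed, img, inverted         # input, two reconstruction targets
```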
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Multimodal learning
Computer Vision -> CV: Representation learning
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
465
EvaNet: Elevation-Guided Flood Extent Mapping on Earth Imagery
Mirza Tanzim Sami, Da Yan, Saugat Adhikari, Lyuheng Yuan, Jiao Han, Zhe Jiang, Jalal Khalil, Yang Zhou
12 min. talk | August 6th at 11:30 | Session: CV: 3D computer vision (1/2)
Accurate and timely mapping of flood extent from high-resolution satellite imagery plays a crucial role in disaster management, such as damage assessment and relief activities. However, current state-of-the-art solutions are based on U-Net, which cannot segment flood pixels accurately due to ambiguous pixels (e.g., tree canopies, clouds) that prevent a direct judgement from spectral features alone. Thanks to the digital elevation model (DEM) data readily available from sources such as the United States Geological Survey (USGS), this work explores the use of an elevation map to improve flood extent mapping. We propose EvaNet, an elevation-guided segmentation model based on the encoder-decoder architecture with two novel techniques: (1) a loss function encoding the physical law of gravity: if a location is flooded (resp. dry), then its adjacent locations with a lower (resp. higher) elevation must also be flooded (resp. dry); (2) a new (de)convolution operation that integrates the elevation map via a location-sensitive gating mechanism to regulate how much spectral features flow through adjacent layers. Extensive experiments show that EvaNet significantly outperforms the U-Net baselines and works as a perfect drop-in replacement for U-Net in existing solutions to flood extent mapping. EvaNet is open-sourced at https://github.com/MTSami/EvaNet.
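The gravity-law loss can be pictured as a pairwise penalty over neighboring pixels: a pixel predicted flooded whose lower-elevation neighbor is predicted dry violates the constraint. The sketch below is one hedged instantiation (hinge form, right/down neighbors, border wraparound ignored for brevity), not the authors' exact loss.

```python
import torch
import torch.nn.functional as F

def gravity_loss(prob_flood, elevation):
    # prob_flood, elevation: (B, 1, H, W)
    loss = 0.0
    for shift in ((0, 1), (1, 0)):                 # right and down neighbors
        p = prob_flood
        pn = torch.roll(prob_flood, shifts=shift, dims=(2, 3))
        e = elevation
        en = torch.roll(elevation, shifts=shift, dims=(2, 3))
        lower = (en < e).float()                   # neighbor sits lower
        # a flooded pixel (p) with a lower, dry neighbor (pn) violates gravity
        loss = loss + (lower * F.relu(p - pn)).mean()
    return loss
```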
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Applications
Computer Vision -> CV: Segmentation
Data Mining -> DM: Mining spatial and/or temporal data
478
Dual Enhancement in ODI Super-Resolution: Adapting Convolution and Upsampling to Projection Distortion
Xiang Ji, Changqiao Xu, Lujie Zhong, Shujie Yang, Han Xiao, Gabriel-Miro Muntean
6 min. talk | August 6th at 11:30 | Session: CV: 3D computer vision (1/2)
Omnidirectional images (ODIs) demand considerably higher resolution to ensure high quality across all viewports. Traditional convolutional neural network (CNN)-based single-image super-resolution (SISR) networks, however, are not effective for spherical ODIs. This is due to the uneven pixel density distribution and varying texture complexity in different regions that arise when projecting from a sphere to a plane. Additionally, the computational and memory costs associated with large-sized ODIs present a challenge for real-world applications. To address these issues, we propose an efficient distortion-adaptive super-resolution network (ODA-SRN). Specifically, ODA-SRN employs a series of specially designed Distortion Attention Block Groups (DABG) as its backbone. Our Distortion Attention Blocks (DABs) utilize multi-segment parameterized convolution to generate dynamic filters, which compensate for distortion and texture fading during feature extraction. Moreover, we introduce an upsampling scheme that accounts for the dependence between pixel position and distortion degree to achieve pixel-level distortion offset. A comprehensive set of results demonstrates that our ODA-SRN significantly improves super-resolution performance for ODIs, both quantitatively and qualitatively, when compared to other state-of-the-art methods.
List of keywords
Computer Vision -> CV: 3D computer vision
Agent-based and Multi-agent Systems -> MAS: Human-agent interaction
Computer Vision -> CV: Applications
Computer Vision -> CV: Structural and model-based approaches, knowledge representation and reasoning
479
Structure-Aware Spatial-Temporal Interaction Network for Video Shadow Detection
Housheng Wei, Guanyu Xing, Jingwei Liao, Yanci Zhang, Yanli Liu
6 min. talk | August 7th at 15:00 | Session: CV: Recognition (object detection, categorization)
Video shadow detection faces significant challenges due to ambiguous semantics and variable shapes. Existing video shadow detection algorithms typically overlook the fine shadow details, resulting in inconsistent detection between consecutive frames in complex real-world video scenarios. To address this issue, we propose a spatial-temporal feature interaction strategy, which refines and enhances global shadow semantics with local prior features in the modeling of shadow relations between frames. Moreover, a structure-aware shadow prediction module is proposed, which focuses on modeling the distance relation between local shadow edges and regions. Quantitative experimental results demonstrate that our approach significantly outperforms the state-of-the-art methods, providing stable and consistent shadow detection results in complex video shadow scenarios.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Scene analysis and understanding   
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Video analysis and understanding   
481
Self-Supervised Monocular Depth Estimation in the Dark: Towards Data Distribution Compensation
Haolin Yang, Chaoqiang Zhao, Lu Sheng, Yang Tang
6 min. talk | August 6th at 15:00 | Session: CV: 3D computer vision (2/2)
Nighttime self-supervised monocular depth estimation has received increasing attention in recent years. However, using night images for self-supervision is unreliable because the photometric consistency assumption is usually violated in videos taken under complex lighting conditions. Even with domain adaptation or photometric loss repair, performance is still limited by the poor supervision that night images provide to trainable networks. In this paper, we propose a self-supervised nighttime monocular depth estimation method that does not use any night images during training. Our framework uses day images as a stable source of self-supervision and applies physical priors (e.g., wave optics, a reflection model, and a read-shot noise model) to compensate for key day-night differences. With day-to-night data distribution compensation, our framework can be trained in an efficient one-stage self-supervised manner. Although no nighttime images are seen during training, qualitative and quantitative results demonstrate that our method achieves SoTA depth estimation results on the challenging nuScenes-Night and RobotCar-Night benchmarks compared with existing methods.
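As a rough picture of the read-shot noise prior mentioned above, the sketch below darkens a day image and adds signal-dependent shot noise plus signal-independent read noise; the gain and noise scales are illustrative assumptions, not the paper's calibrated model.

```python
import torch

def day_to_night(img, gain=0.2, shot=0.01, read=0.02):
    # img: (B, C, H, W) in [0, 1]
    dark = img * gain                                    # global illumination drop
    noisy = (dark
             + torch.randn_like(dark) * (shot * dark).sqrt()  # shot noise ~ signal
             + torch.randn_like(dark) * read)                 # read noise, constant
    return noisy.clamp(0.0, 1.0)
```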
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Other
Computer Vision -> CV: Scene analysis and understanding   
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
485
SwiftThief: Enhancing Query Efficiency of Model Stealing by Contrastive Learning
Jeonghyun Lee, Sungmin Han, Sangkyun Lee
6 min. talk | August 7th at 10:00 | Session: ETF: Safety and robustness
Model-stealing attacks are emerging as a severe threat to AI-based services because an adversary can create models that duplicate the functionality of black-box AI models inside the services with regular query-based access. To avoid detection or query costs, a model-stealing adversary must minimize the number of queries needed to obtain an accurate clone model. To achieve this goal, we propose SwiftThief, a novel model-stealing framework that utilizes both queried and unqueried data to reduce query complexity. In particular, SwiftThief uses contrastive learning, a recent technique for representation learning. We formulate a new objective function for model stealing consisting of self-supervised (for abundant unqueried inputs from public datasets) and soft-supervised (for queried inputs) contrastive losses, jointly optimized with an output matching loss (for queried inputs). In addition, we suggest a new sampling strategy that prioritizes rarely queried classes to improve attack performance. Our experiments show that SwiftThief significantly enhances the efficiency of model-stealing attacks compared to existing methods, achieving similar attack performance using only half the query budget of competing approaches. SwiftThief also remains highly effective even when the victim deploys a defense.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Multidisciplinary Topics and Applications -> MTA: Security and privacy
486
PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation
Deyi Ji, Wenwei Jin, Hongtao Lu, Feng Zhao
6 min. talk | August 9th at 11:30 | Session: CV: Computer Vision (1/2)
The ascension of Unmanned Aerial Vehicles (UAVs) in various fields necessitates effective UAV image segmentation, which faces challenges due to the dynamic perspectives of UAV-captured images. Traditional segmentation algorithms falter as they cannot accurately mimic the complexity of UAV perspectives, and the cost of obtaining multi-perspective labeled datasets is prohibitive. To address these issues, we introduce the PPTFormer, a novel Pseudo Multi-Perspective Transformer network that revolutionizes UAV image segmentation. Our approach circumvents the need for actual multi-perspective data by creating pseudo perspectives for enhanced multi-perspective learning. The PPTFormer network boasts Perspective Decomposition, novel Perspective Prototypes, and a specialized encoder and decoder that together achieve superior segmentation results through Pseudo Multi-Perspective Attention (PMP Attention) and fusion. Our experiments demonstrate that PPTFormer achieves state-of-the-art performance across five UAV segmentation datasets, confirming its capability to effectively simulate UAV flight perspectives and significantly advance segmentation precision. This work presents a pioneering leap in UAV scene understanding and sets a new benchmark for future developments in semantic segmentation.
List of keywords
Computer Vision -> CV: Scene analysis and understanding   
Computer Vision -> CV: Segmentation
496
Online Submodular Maximization via Adaptive Thresholds
Zhengchen Yang, Jiping Zheng
12 min. talk | August 7th at 11:30 | Session: S: Combinatorial search and optimisation
Submodular function maximization has been studied extensively in recent years due to its numerous applications in machine learning and artificial intelligence. We study a natural online variant of this problem on massive streaming data, in which elements arrive one by one and the algorithm has to maintain a solution under a cardinality constraint k. Upon arrival of an element, the algorithm, which aims to maximize a monotone submodular function, has to decide whether to accept the element and may replace a previously chosen one. Existing algorithms cannot simultaneously achieve optimal performance in terms of competitive ratio, memory complexity, and running time; moreover, the algorithm with the best competitive ratio performs poorly in practice. In this paper, we propose a new algorithm, OnlineAdaptive, with optimal performance, which exploits adaptive thresholds to decide whether to accept an arriving element by replacement. We prove that the competitive ratio of OnlineAdaptive is at least 1/4; the ratio is about 0.2959 when k >= 4 and approaches 0.3178 as k tends to infinity. In addition, OnlineAdaptive needs only O(k) memory and performs just one oracle call per element. Experiments on diverse datasets confirm that OnlineAdaptive outperforms existing algorithms in both quality and efficiency.
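The family of algorithms this belongs to is easy to sketch: keep at most k elements, and once the buffer is full, swap in an arriving element only if the gain of the best swap clears an adaptive threshold. The rule below (gain >= c·f(S)/k) and the k-evaluation swap search are illustrative simplifications; OnlineAdaptive itself needs just one oracle call per element.

```python
def streaming_with_replacement(stream, f, k, c=1.0):
    # f: a monotone submodular set function taking a list of elements
    S = []
    for e in stream:
        if len(S) < k:
            S.append(e)                                   # fill up first
            continue
        base = f(S)
        # find the best element to evict if e were swapped in
        gains = [f(S[:i] + S[i + 1:] + [e]) - base for i in range(k)]
        i_best = max(range(k), key=lambda i: gains[i])
        if gains[i_best] >= c * base / k:                 # adaptive threshold
            S[i_best] = e                                 # replace
    return S
```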
List of keywords
Search -> S: Combinatorial search and optimisation
Search -> S: Heuristic search
509
EVE: Efficient Zero-Shot Text-Based Video Editing With Depth Map Guidance and Temporal Consistency Constraints
Yutao Chen, Xingning Dong, Tian Gan, Chunluan Zhou, Ming Yang, Qingpei Guo
6 min. talk | August 8th at 11:30 | Session: CV: Image and video synthesis and generation (1/2)
Motivated by the superior performance of image diffusion models, more and more researchers strive to extend these models to the text-based video editing task. Nevertheless, current video editing methods mainly suffer from the dilemma between high fine-tuning cost and limited generation capacity. Compared with images, we conjecture that videos necessitate more constraints to preserve temporal consistency during editing. Towards this end, we propose EVE, a robust and Efficient zero-shot Video Editing method. Under the guidance of depth maps and temporal consistency constraints, EVE derives satisfactory video editing results at an affordable computational and time cost. Moreover, recognizing the absence of a publicly available video editing dataset for fair comparisons, we construct a new benchmark named the ZVE-50 dataset. Through comprehensive experimentation, we validate that EVE achieves a satisfactory trade-off between performance and efficiency. The codebase, datasets, and video editing demos are available at https://github.com/alipay/Ant-Multi-Modal-Framework/blob/main/prj/EVE.
List of keywords
Computer Vision -> CV: Image and video synthesis and generation 
Computer Vision -> CV: Applications
524
OD-DETR: Online Distillation for Stabilizing Training of Detection Transformer
Shengjian Wu, Li Sun, Qingli Li
6 min. talk | August 7th at 15:00 | Session: CV: Recognition (object detection, categorization)
DEtection TRansformer (DETR) has become a dominant paradigm, mainly due to its common architecture with high accuracy and no post-processing. However, DETR suffers from unstable training dynamics: it consumes more data and epochs to converge than CNN-based detectors. This paper aims to stabilize DETR training through online distillation. It utilizes a teacher model, accumulated by an Exponential Moving Average (EMA), and distills its knowledge into the online model in the following three aspects. First, the matching relation between object queries and ground-truth (GT) boxes in the teacher is employed to guide the student, so queries within the student are not only assigned labels based on their own predictions but also refer to the matching results from the teacher. Second, the teacher's initial query is given to the online student, and its prediction is directly constrained by the corresponding output from the teacher. Finally, the object queries from the teacher's different decoding stages are used to build auxiliary groups to accelerate convergence. For each GT, the two queries with the least matching costs are selected into this extra group, where they predict the GT box and participate in the optimization. Extensive experiments show that the proposed OD-DETR successfully stabilizes training and significantly increases performance without bringing in more parameters.
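The EMA-accumulated teacher is a standard construction and can be sketched directly; `decay` is an assumed hyperparameter, and this is a generic illustration rather than the paper's code.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    # move each teacher parameter a small step toward the online student
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(decay).add_(ps, alpha=1.0 - decay)
    # keep non-trainable state (e.g., BatchNorm statistics) in sync
    for bt, bs in zip(teacher.buffers(), student.buffers()):
        bt.copy_(bs)
```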
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
549
MLP-DINO: Category Modeling and Query Graphing with Deep MLP for Object Detection
Guiping Cao, Wenjian Huang, Xiangyuan Lan, Jianguo Zhang, Dongmei Jiang, Yaowei Wang
6 min. talk | August 7th at 15:00 | Session: CV: Recognition (object detection, categorization)
Popular transformer-based detectors detect objects in a one-to-one manner, where both the bounding box and category of each object are predicted by a single query, leading to box-sensitive category predictions. Additionally, initializing positional queries solely based on predicted confidence scores or learnable embeddings neglects the significant spatial interrelation between different queries. This oversight leads to an imbalanced spatial distribution of queries (SDQ). In this paper, we propose a new MLP-DINO model to address these issues. First, we present a new Query-Independent Category Supervision (QICS) approach for modeling category information, decoupling it from the sensitive bounding box prediction process to improve detection performance. Second, to further improve the category predictions, we introduce a deep MLP model into the transformer-based detection framework to capture long-range and short-range information simultaneously. Third, to balance the SDQ, we design a novel Graph-based Query Selection (GQS) method that distributes each query point in a discrete manner by graphing the spatial information of queries to cover a broader range of potential objects, significantly enhancing the hit rate of queries. Experimental results on COCO indicate that our MLP-DINO achieves 54.6% AP with only 44M parameters under the 36-epoch setting, greatly outperforming the original DINO by +3.7% AP with fewer parameters and FLOPs. The source code will be available at https://github.com/Med-Process/MLP-DINO.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Scene analysis and understanding   
Machine Learning -> ML: Multi-label learning
554
Feature Norm Regularized Federated Learning: Utilizing Data Disparities for Model Performance Gains
Ke Hu, Liyao Xiang, Peng Tang, Weidong Qiu
6 min. talk | August 8th at 10:00 | Session: ML: Federated learning (2/2)
Federated learning (FL) is a machine learning paradigm that aggregates knowledge and utilizes computational power from multiple participants to train a global model. However, a commonplace challenge, non-independent and identically distributed (non-i.i.d.) data across participants, can lead to significant divergence in model updates, thus diminishing training efficacy. In this paper, we propose the Feature Norm Regularized Federated Learning (FNR-FL) algorithm to tackle the non-i.i.d. challenge. FNR-FL incorporates class-average feature norms into the loss function via a straightforward yet effective regularization strategy. The core idea of FNR-FL is to penalize the deviations in the update directions of local models caused by non-i.i.d. data. Theoretically, we provide convergence guarantees for FNR-FL when training under non-i.i.d. scenarios. Practically, our comprehensive experimental evaluations demonstrate that FNR-FL significantly outperforms existing FL algorithms in terms of test accuracy and maintains a competitive convergence rate with lower communication overhead and shorter duration. Compared to FedAvg, FNR-FL exhibits a 66.24% improvement in accuracy and an 11.40% reduction in training time, underscoring its enhanced effectiveness and efficiency. The code is available on GitHub at: https://github.com/LonelyMoonDesert/FNR-FL.
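A hedged sketch of a class-average feature norm regularizer in this spirit: on each client, per-class mean feature norms are pulled toward reference norms (e.g., from the global model). The exact aggregation used in FNR-FL may differ; the function name and signature are assumptions.

```python
import torch

def feature_norm_regularizer(features, labels, ref_norms, num_classes):
    # features: (N, D) penultimate activations on a client's local batch
    # ref_norms: (num_classes,) reference class-average feature norms
    reg = features.new_zeros(())
    for c in range(num_classes):
        idx = labels == c
        if idx.any():
            avg_norm = features[idx].norm(dim=1).mean()   # class-average norm
            reg = reg + (avg_norm - ref_norms[c]) ** 2    # penalize the drift
    return reg / num_classes
```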
List of keywords
Machine Learning -> ML: Federated learning
Machine Learning -> ML: Optimization
Machine Learning -> ML: Robustness
Machine Learning -> ML: Supervised Learning
564
Towards Counterfactual Fairness-aware Domain Generalization in Changing Environments
Yujie Lin, Chen Zhao, Minglai Shao, Baoluo Meng, Xujiang Zhao, Haifeng Chen
6 min. talk | August 8th at 15:00 | Session: ML: Time series and data streams
Domain generalization is a commonplace challenge in machine learning: in practical scenarios, the data distribution may progressively evolve across a continuum of sequential domains. While current methodologies primarily concentrate on bolstering model effectiveness within these new domains, they tend to neglect fairness throughout the learning process. In response, we propose an innovative framework called Disentanglement for Counterfactual Fairness-aware Domain Generalization (DCFDG). This approach adeptly removes domain-specific and sensitive information from the embedded representation of classification features. To scrutinize the intricate interplay between semantic information, domain-specific information, and sensitive attributes, we systematically partition the exogenous factors into four latent variables. By incorporating fairness regularization, we utilize semantic information exclusively for classification purposes. Empirical validation on synthetic and authentic datasets substantiates the efficacy of our approach, demonstrating elevated accuracy levels while ensuring the preservation of fairness amidst the evolving landscape of continuous domains.
List of keywords
Machine Learning -> ML: Time series and data streams
AI Ethics, Trust, Fairness -> ETF: Fairness and diversity
Machine Learning -> ML: Causality
Machine Learning -> ML: Generative models
567
Class-Specific Semantic Generation and Reconstruction Learning for Open Set Recognition
Liu Haoyang, Yaojin Lin, Peipei Li, Jun Hu, Xuegang Hu
6 min. talk | August 7th at 15:00 | Session: DM: Data Mining (1/2)
Open set recognition is a crucial research theme for open-environment machine learning. A common solution for this problem is to learn compact representations of known classes and identify unknown samples by measuring deviations from these known classes. However, the aforementioned methods (1) lack open training consideration, which is detrimental to the fitting of known classes, and (2) recognize unknown classes on an inadequate basis, which limits the accuracy of recognition. In this study, we propose an open reconstruction learning framework that learns a union boundary region of known classes to characterize unknown space. This facilitates the isolation of known space from unknown space to represent known classes compactly and provides a more reliable recognition basis from the perspective of both known and unknown space. Specifically, an adversarial constraint is used to generate class-specific boundary samples. Then, the known classes and approximate unknown space are fitted with manifolds represented by class-specific auto-encoders. Finally, the auto-encoders output the reconstruction error in terms of known and unknown spaces to recognize samples. Extensive experimental results show that the proposed method outperforms existing advanced methods and achieves new state-of-the-art performance. The code is available at https://github.com/Ashowman98/CSGRL.
List of keywords
Data Mining -> DM: Other
Data Mining -> DM: Anomaly/outlier detection
577
Learning with Posterior Sampling for Revenue Management under Time-varying Demand
Kazuma Shimizu, Junya Honda, Shinji Ito, Shinji Nakadai
6 min. talk | August 8th at 15:00 | Session: ML: Machine Learning (6/6)
This paper discusses the revenue management (RM) problem of maximizing revenue by pricing items or services. One challenge in this problem is that the demand distribution is unknown and varies over time in real applications such as the airline and retail industries. In particular, time-varying demand has not been well studied under scenarios of unknown demand due to the difficulty of jointly managing the remaining inventory and estimating the demand. To tackle this challenge, we first introduce an episodic generalization of the RM problem motivated by typical application scenarios. We then propose a computationally efficient algorithm based on posterior sampling, which effectively optimizes prices by solving a linear program. We derive a Bayesian regret upper bound for this algorithm for general models in which demand parameters can be correlated between time periods, and we also derive a regret lower bound for generic algorithms. Our empirical study shows that the proposed algorithm performs better than other benchmark algorithms and comparably to the optimal policy in hindsight. We also propose a heuristic modification of the proposed algorithm, which further efficiently learns the pricing policy in the experiments. An extended version of this paper with appendices is available at: http://arxiv.org/abs/2405.04910.
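Stripped of the inventory constraints and the LP, the posterior-sampling skeleton looks like classic Thompson sampling over prices. The sketch below assumes Bernoulli demand with Beta priors and replaces the paper's LP step with a naive argmax; it illustrates the loop, not the algorithm itself.

```python
import numpy as np

def thompson_pricing(prices, true_buy_prob, horizon, seed=0):
    prices = np.asarray(prices, dtype=float)
    rng = np.random.default_rng(seed)
    a = np.ones(len(prices))                       # Beta(1, 1) priors per price
    b = np.ones(len(prices))
    revenue = 0.0
    for _ in range(horizon):
        theta = rng.beta(a, b)                     # sample demand parameters
        i = int(np.argmax(prices * theta))         # act optimally vs the sample
        sold = rng.random() < true_buy_prob[i]     # observe (simulated) demand
        a[i] += sold                               # posterior update
        b[i] += 1 - sold
        revenue += prices[i] * sold
    return revenue
```

For instance, `thompson_pricing([1.0, 2.0, 3.0], np.array([0.9, 0.5, 0.2]), 1000)` runs the loop against a hypothetical demand curve.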
List of keywords
Machine Learning -> ML: Online learning
Machine Learning -> ML: Bayesian learning
Machine Learning -> ML: Multi-armed bandits
586
ParsNets: A Parsimonious Composition of Orthogonal and Low-Rank Linear Networks for Zero-Shot Learning
Jingcai Guo, Qihua Zhou, Xiaocheng Lu, Ruibin Li, Ziming Liu, Jie Zhang, Bo Han, Junyang Chen, Xin Xie, Song Guo
12 min. talk | August 7th at 15:00 | Session: ML: Machine Learning (4/6)
This paper provides a novel parsimonious yet efficient design for zero-shot learning (ZSL), dubbed ParsNets, in which we are interested in learning a composition of on-device friendly linear networks, each with orthogonality and low-rankness properties, to achieve equivalent or better performance than deep models. Concretely, we first refactor the core module of ZSL, i.e., the visual-semantics mapping function, into several base linear networks that correspond to diverse components of the semantic space, wherein the complex nonlinearity can be collapsed into simple local linearities. Then, to facilitate the generalization of local linearities, we construct a maximal-margin geometry on the learned features by enforcing low-rank constraints on intra-class samples and high-rank constraints on inter-class samples, resulting in orthogonal subspaces for different classes. To enhance the model's adaptability and counterbalance over- and under-fitting, a set of sample-wise indicators is employed to select a sparse subset of these base linear networks to form a composite semantic predictor for each sample. Notably, the maximal-margin geometry guarantees the diversity of features, while the local linearities guarantee efficiency. Thus, our ParsNets can generalize better to unseen classes and can be deployed flexibly on resource-constrained devices.
List of keywords
Machine Learning -> ML: Cost-sensitive learning
Machine Learning -> ML: Ensemble methods
Machine Learning -> ML: Few-shot learning
Machine Learning -> ML: Learning sparse models
603
A Swap Relaxation-Based Local Search for the Latin Square Completion Problem
Zhenxuan Xie, Zhipeng Lü, Zhouxing Su, Chu-Min Li, Junwen Ding, Yuxuan Wang
6 min. talk | August 8th at 10:00 | Session: S: Search
The Latin square completion (LSC) problem aims to assign n symbols to the empty cells of a partially filled Latin square such that each symbol appears exactly once in each row and each column. In this paper, we propose a swap relaxation-based fast local search algorithm called SRLS for solving the LSC problem. First, it introduces a novel search space definition that forbids row conflicts, based on which a swap-based neighborhood is defined. Second, a color domain relaxation technique is employed in the swap-based neighborhood, temporarily accepting the violation of some constraints to connect high-quality solutions. Third, two effective scoring functions are adopted to select neighborhood moves, minimizing the number of conflicting edges as well as the number of color domain violations. Finally, SRLS employs an adaptive restart mechanism to balance the exploitation and exploration of the search. Extensive experiments on 1819 public benchmark instances demonstrate that SRLS outperforms the state-of-the-art algorithms in the literature in terms of both success rate and computational efficiency.
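The swap-based neighborhood can be sketched as follows (assumptions throughout): rows are kept conflict-free by construction, so a move swaps two symbols within a row and is scored by the change in column conflicts, one of the scoring ingredients mentioned above. The color-domain relaxation and restart machinery are omitted.

```python
def column_conflicts(square, col):
    # count repeated symbols in one column of a filled square (list of lists)
    seen, conflicts = set(), 0
    for row in square:
        conflicts += row[col] in seen
        seen.add(row[col])
    return conflicts

def swap_delta(square, r, c1, c2):
    # score the move "swap the symbols at (r, c1) and (r, c2)"
    before = column_conflicts(square, c1) + column_conflicts(square, c2)
    square[r][c1], square[r][c2] = square[r][c2], square[r][c1]
    after = column_conflicts(square, c1) + column_conflicts(square, c2)
    square[r][c1], square[r][c2] = square[r][c2], square[r][c1]  # undo
    return after - before    # negative = fewer conflicts, accept greedily
```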
List of keywords
Search -> S: Local search
Search -> S: Heuristic search
Search -> S: Meta-reasoning and meta-heuristics
Search -> S: Combinatorial search and optimisation
605
LLM-based Multi-Level Knowledge Generation for Few-shot Knowledge Graph Completion
Qian Li, Zhuo Chen, Cheng Ji, Shiqi Jiang, Jianxin Li
6 min. talk | August 7th at 15:00 | Session: DM: Data Mining (1/2)
Knowledge Graphs (KGs) are pivotal in various NLP applications but often grapple with incompleteness, especially due to the long-tail problem where infrequent, unpopular relationships drastically reduce the KG completion performance. In this paper, we focus on Few-shot Knowledge Graph Completion (FKGC), a task addressing these gaps in long-tail scenarios. Amidst the rapid evolution of Large Language Models, we propose a generation-based FKGC paradigm facilitated by LLM distillation. Our MuKDC framework employs multi-level knowledge distillation for few-shot KG completion, generating supplementary knowledge to mitigate data scarcity in few-shot environments. MuKDC comprises two primary components: Multi-level Knowledge Generation, which enriches the KG at various levels, and Consistency Assessment, to ensure the coherence and reliability of the generated knowledge. Most notably, our method achieves SOTA results in both FKGC and multi-modal FKGC benchmarks, significantly advancing KG completion and enhancing the understanding and application of LLMs in structured knowledge generation and assessment.
List of keywords
Data Mining -> DM: Knowledge graphs and knowledge base completion
Natural Language Processing -> NLP: Applications
616
A Multi-Valued Decision Diagram-Based Approach to Constrained Optimal Path Problems over Directed Acyclic Graphs
Mingwei Zhang, Liangda Fang, Zhenhao Gu, Quanlong Guan, Yong Lai
6 min. talk | August 8th at 10:00 | Session: CSO: Constraint Satisfaction and Optimization
Numerous combinatorial optimization problems can be reduced to the optimal path problem over directed acyclic graphs (DAGs). The constrained version of the optimal path problem requires the solution to satisfy a given logical constraint. BDD-constrained search (BCS) is an efficient algorithm for the constrained optimal path problem over DAGs. This algorithm treats edges as variables and constraints as Boolean functions, and it maintains the constraints via binary decision diagrams (BDDs), a compact representation of Boolean functions. However, BCS involves redundant operations during the search process. To reduce these redundant operations, we use vertices instead of edges as variables and hence represent constraints as multi-valued functions. Owing to this multi-valued representation, we propose a novel algorithm, MDD-constrained search (MCS), which uses multi-valued decision diagrams (MDDs), an efficient representation of multi-valued functions, instead of BDDs. In addition, we improve MCS via domain reduction in multi-valued functions. Experimental results show that our proposed algorithm outperforms BCS.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint optimization problems
Constraint Satisfaction and Optimization -> CSO: Applications
Constraint Satisfaction and Optimization -> CSO: Solvers and tools
Knowledge Representation and Reasoning -> KRR: Knowledge compilation
620
GenSeg: On Generating Unified Adversary for Segmentation
Yuxuan Zhang, Zhenbo Shi, Wei Yang, Shuchang Wang, Shaowei Wang, Yinxing Xue
6 min. talk | August 8th at 10:00 | Session: CV: Segmentation
Great advancements in semantic, instance, and panoptic segmentation have been made in recent years, yet the top-performing models remain vulnerable to imperceptible adversarial perturbations. Current attacks on segmentation primarily focus on a single task, and these methods typically rely on iterative instance-specific strategies, resulting in limited attack transferability and low efficiency. In this paper, we propose GenSeg, a Generative paradigm that creates unified adversaries for Segmentation tasks. In particular, we propose an intermediate-level objective to enhance attack transferability, including a mutual agreement loss for feature deviation and a prototype obfuscating loss to disrupt intra-class and inter-class relationships. Moreover, GenSeg crafts an adversary in a single forward pass, significantly boosting attack efficiency. Besides, we unify multiple segmentation tasks within GenSeg through a novel category-and-mask view, which makes it possible to attack these segmentation tasks in one framework and to conduct cross-domain and cross-task attacks as well. Extensive experiments demonstrate the superiority of GenSeg in black-box attacks compared with state-of-the-art attacks. To the best of our knowledge, GenSeg is the first approach capable of conducting cross-domain and cross-task attacks on segmentation tasks, which are closer to real-world scenarios.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
634
Enhancing Scalability of Metric Differential Privacy via Secret Dataset Partitioning and Benders Decomposition
Chenxi Qiu
6 min. talk | August 7th at 10:00 | Session: CSO: Constraint optimization problems
Metric Differential Privacy (mDP) extends the concept of Differential Privacy (DP) to serve as a new paradigm of data perturbation. It is designed to protect secret data represented in general metric space, such as text data encoded as word embeddings or geo-location data on the road network or grid maps. To derive an optimal data perturbation mechanism under mDP, a widely used method is linear programming (LP), which, however, might suffer from a polynomial explosion of decision variables, rendering it impractical in large-scale mDP. In this paper, our objective is to develop a new computation framework to enhance the scalability of the LP-based mDP. Considering the connections established by the mDP constraints among the secret records, we partition the original secret dataset into various subsets. Building upon the partition, we reformulate the LP problem for mDP and solve it via Benders Decomposition, which is composed of two stages: (1) a master program to manage the perturbation calculation across subsets, and (2) a set of subproblems, each managing the perturbation derivation within a subset. Our experimental results on multiple datasets, including geo-location data in the road network/grid maps, text data, and synthetic data, underscore our proposed mechanism’s superior scalability and efficiency.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint optimization problems
Constraint Satisfaction and Optimization -> CSO: Constraint programming
Multidisciplinary Topics and Applications -> MTA: Security and privacy
638
Learning a Spiking Neural Network for Efficient Image Deraining
Tianyu Song, Guiyue Jin, Pengpeng Li, Kui Jiang, Xiang Chen, Jiyu Jin
6 min. talk | August 8th at 15:00 | Session: CV: Image and video synthesis and generation (2/2)
Recently, spiking neural networks (SNNs) have demonstrated substantial potential in computer vision tasks. In this paper, we present an Efficient Spiking Deraining Network, called ESDNet. Our work is motivated by the observation that rain pixel values lead to a more pronounced intensity of spike signals in SNNs. However, directly applying deep SNNs to the image deraining task remains a significant challenge, which is attributed to the information loss and training difficulties that arise from discrete binary activation and complex spatiotemporal dynamics. To this end, we develop a spiking residual block that converts the input into spike signals and then adaptively optimizes the membrane potential by introducing attention weights to adjust spike responses in a data-driven manner, alleviating the information loss caused by discrete binary activation. In this way, our ESDNet can effectively detect and analyze the characteristics of rain streaks by learning their fluctuations, which also enables better guidance for the deraining process and facilitates high-quality image reconstruction. Instead of relying on the ANN-SNN conversion strategy, we introduce a gradient proxy strategy to train the model directly, overcoming the challenge of training. Experimental results show that our approach achieves comparable performance to ANN-based methods while reducing energy consumption by 54%. The source code is available at https://github.com/MingTian99/ESDNet.
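The gradient-proxy (surrogate gradient) trick for direct SNN training can be sketched as a custom autograd function: the forward pass emits binary spikes, while the backward pass substitutes a smooth derivative for the non-differentiable step. The rectangular surrogate below is one common choice, not necessarily the paper's.

```python
import torch

class SpikeProxy(torch.autograd.Function):
    @staticmethod
    def forward(ctx, membrane, threshold=1.0):
        ctx.save_for_backward(membrane)
        ctx.threshold = threshold
        return (membrane >= threshold).float()       # binary spike

    @staticmethod
    def backward(ctx, grad_out):
        (membrane,) = ctx.saved_tensors
        # rectangular surrogate: pass gradients only near the threshold
        window = (membrane - ctx.threshold).abs() < 0.5
        return grad_out * window.float(), None       # no grad for threshold

spike = SpikeProxy.apply   # usage: s = spike(membrane_potential)
```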
List of keywords
Computer Vision -> CV: Image and video synthesis and generation 
Computer Vision -> CV: Applications
Computer Vision -> CV: Computational photography
652
Revitalizing Real Image Deraining via a Generic Paradigm towards Multiple Rainy Patterns
Xin Li, Yuxin Feng, Fan Zhou, Yun Liang, Zhuo Su
6 min. talk | August 9th at 11:30 | Session: CV: Computer Vision (1/2)
Synthetic data-driven methods perform well on the image rain removal task, but they still face many challenges in real rainfall scenarios due to the complexity and diversity of rainy patterns. In this paper, we propose a new generic paradigm for real image deraining from the perspective of synthesizing data that covers more rainy patterns and constructing image rain removal networks with strong generalization performance. First, instead of simply superimposing rain layers, we integrate various rainy patterns and design a synthesis pipeline that incorporates multiple degradation types. Second, we construct a Patterns-aware Rain Removal Network (PRRN), which learns from both synthetic and real data simultaneously. In addition, to eliminate the inevitable distribution differences between synthetic and real data, we design a new Multi-representation Inter-domain Alignment Module (MIAM) in PRRN. By using multiple parallel submodules, MIAM achieves alignment of data domains in multiple feature subspaces. Based on several authoritative objective evaluation metrics, we validate the effectiveness and robustness of the proposed method in real scenarios through extensive experiments carried out on five challenging real datasets.
List of keywords
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
669
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
Wenxi Chen, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, Xie Chen
6 min. talk | August 7th at 11:30 | Session: ML: Representation learning
Audio self-supervised learning (SSL) pre-training, which aims to learn good representations from unlabeled audio, has made remarkable progress. However, the extensive computational demands during pre-training pose a significant barrier to the potential application and optimization of audio SSL models. In this paper, inspired by the success of data2vec 2.0 in the image modality and Audio-MAE in the audio modality, we introduce the Efficient Audio Transformer (EAT) to further improve the effectiveness and efficiency of audio SSL. The proposed EAT adapts the bootstrap self-supervised training paradigm to the audio domain. A novel Utterance-Frame Objective (UFO) is designed to enhance the modeling capability of acoustic events. Furthermore, we reveal that the masking strategy is critical in audio SSL pre-training and that superior audio representations can be obtained with large inverse block masks. Experimental results demonstrate that EAT achieves state-of-the-art (SOTA) performance on a range of audio-related tasks, including AudioSet (AS-2M, AS-20K), ESC-50, and SPC-2, along with a significant pre-training speedup of up to ~15x compared to existing audio SSL models.
List of keywords
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Self-supervised Learning
Natural Language Processing -> NLP: Speech
670
Graph Collaborative Expert Finding with Contrastive Learning
Qiyao Peng, Wenjun Wang, Hongtao Liu, Cuiying Huo, Minglai Shao
6 min. talk | August 6th at 15:00 | Session: DM: Mining graphs (2/3)
[+] More
[-] Less
In Community Question Answering (CQA) websites, most current expert finding methods model expert embeddings from textual features and optimize them with first-order expert-question interactions, i.e., the fact that an expert has answered a question. In this paper, we address the limitation of current models that typically neglect the intrinsic high-order connectivity within expert-question interactions, which is pivotal for collaborative effects. We introduce a simple yet effective approach: we conceptualize expert-question interactions as a bipartite graph and propose a novel graph-based expert finding method, named CGEF, which uses contrastive learning to effectively capture both first-order and intricate high-order connectivity. Specifically, we employ a question encoder to model questions from their titles and use a graph attention network to recursively propagate embeddings. Besides, to alleviate the problem of sparse interactions, we devise two auxiliary tasks to enhance expert modeling. First, we generate multiple views of each expert: 1) behavior-level augmentation randomly drops interaction edges in the graph; 2) interest-level augmentation randomly replaces question titles with tags in the graph. We then maximize the agreement between an expert and the corresponding augmented expert on a specific view. In this way, the model effectively injects collaborative signals into expert modeling. Extensive experiments on six CQA datasets demonstrate significant improvements over recent methods.
List of keywords
Data Mining -> DM: Mining graphs
Data Mining -> DM: Mining text, web, social media
Data Mining -> DM: Networks
Data Mining -> DM: Recommender systems
671
Boosting Efficiency in Task-Agnostic Exploration through Causal Knowledge
Yupei Yang, Biwei Huang, Shikui Tu, Lei Xu
6 min. talk | August 8th at 15:00 | Session: ML: Reinforcement learning (2/2)
[+] More
[-] Less
The effectiveness of model training heavily relies on the quality of available training resources. However, budget constraints often impose limitations on data collection efforts. To tackle this challenge, we introduce causal exploration in this paper, a strategy that leverages the underlying causal knowledge for both data collection and model training. In particular, we focus on enhancing the sample efficiency and reliability of world model learning within the domain of task-agnostic reinforcement learning. During the exploration phase, the agent actively selects actions expected to yield the causal insights most beneficial for world model training. Concurrently, the causal knowledge is acquired and incrementally refined as data collection proceeds. We demonstrate that causal exploration aids in learning accurate world models using less data and provide theoretical guarantees for its convergence. Empirical experiments, on both synthetic data and real-world applications, further validate the benefits of causal exploration. The source code is available at https://github.com/CMACH508/CausalExploration.
List of keywords
Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Active learning
Machine Learning -> ML: Causality
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
685
Agentive Permissions in Multiagent Systems
Qi Shi
6 min. talk | August 8th at 15:00 | Session: KRR: Knowledge Representation and Reasoning (2/2)
[+] More
[-] Less
This paper proposes to distinguish four forms of agentive permissions in multiagent settings. The main technical results are the complexity analysis of model checking, the semantic undefinability of modalities that capture these forms of permissions through each other, and a complete logical system capturing the interplay between these modalities.
List of keywords
Knowledge Representation and Reasoning -> KRR: Reasoning about actions
AI Ethics, Trust, Fairness -> ETF: Moral decision making
AI Ethics, Trust, Fairness -> ETF: AI and law, governance, regulation
AI Ethics, Trust, Fairness -> ETF: Ethical, legal and societal issues
694
Exploring the Inefficiency of Heavy Ball as Momentum Parameter Approaches 1
Xiaoge Deng, Tao Sun, Dongsheng Li, Xicheng Lu
6 min. talk | August 9th at 10:00 | Session: ML: Optimization
[+] More
[-] Less
The heavy ball momentum method is a commonly used technique for accelerating training processes in the machine learning community. However, empirical evidence suggests that the convergence of stochastic gradient descent (SGD) with heavy ball may slow down when the momentum hyperparameter approaches 1. Despite this observation, there are no established theories or solutions to explain and address this issue. In this study, we provide the first theoretical result that elucidates why momentum slows down SGD as the momentum parameter tends to 1. To better understand this inefficiency, we focus on the quadratic convex objective in our analysis. Our findings show that momentum accelerates SGD when the momentum parameter is not very close to 1. Conversely, when the momentum parameter approaches 1, momentum impairs SGD and degrades its stability. Based on these theoretical findings, we propose a descending warmup technique for heavy ball momentum, which exploits the advantages of the heavy ball method while overcoming the inefficiency that arises when the momentum tends to 1. Numerical results demonstrate the effectiveness of the proposed SHB-DW algorithm.
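To make the descending warmup idea concrete, here is a minimal sketch on a noisy quadratic; the linear decay schedule and all constants are assumptions for illustration, since the abstract does not specify SHB-DW's exact schedule.

    import numpy as np

    def shb_dw_sketch(A, b, x0, lr=0.05, beta_start=0.99, beta_end=0.9,
                      warmup=200, iters=1000, noise=0.01, seed=0):
        """Heavy-ball SGD on f(x) = 0.5 x^T A x - b^T x with a descending
        momentum warmup: beta decays linearly from beta_start to beta_end
        over the first `warmup` iterations, then stays at beta_end."""
        rng = np.random.default_rng(seed)
        x, x_prev = x0.copy(), x0.copy()
        for t in range(iters):
            beta = beta_end if t >= warmup else (
                beta_start + (beta_end - beta_start) * t / warmup)
            grad = A @ x - b + noise * rng.standard_normal(x.shape)  # stochastic gradient
            x, x_prev = x - lr * grad + beta * (x - x_prev), x
        return x

    A = np.diag([1.0, 10.0]); b = np.array([1.0, 1.0])
    print(shb_dw_sketch(A, b, x0=np.zeros(2)))  # approaches A^{-1} b = [1, 0.1]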
List of keywords
Machine Learning -> ML: Optimization
Machine Learning -> ML: Applications
Machine Learning -> ML: Learning theory
705
Counterfactual User Sequence Synthesis Augmented with Continuous Time Dynamic Preference Modeling for Sequential POI Recommendation
Lianyong Qi, Yuwen Liu, Weiming Liu, Shichao Pei, Xiaolong Xu, Xuyun Zhang, Yingjie Wang, Wanchun Dou
6 min. talk | August 9th at 10:00 | Session: DM: Recommender systems
[+] More
[-] Less
With the proliferation of Location-based Social Networks (LBSNs), user check-in data at Points-of-Interest (POIs) has surged, offering rich insights into user preferences. However, sequential POI recommendation systems face two pivotal challenges. The first lies in the difficulty of modeling time in a discrete space, which fails to accurately capture the dynamic nature of user preferences. The second is the inherent sparsity and noise in continuous POI recommendation, which hinder the recommendation process. To address these challenges, we propose counterfactual user sequence synthesis with continuous time dynamic preference modeling (CussCtpm). CussCtpm innovatively combines Gated Recurrent Units (GRUs) with neural Ordinary Differential Equations (ODEs) to model user preferences in a continuous-time framework. CussCtpm captures user preferences at both the POI level and the interest level, identifying deterministic and non-deterministic preference concepts. Particularly at the interest level, we employ GRUs and neural ODEs to model users' dynamic preferences in continuous space, aiming to capture finer-grained shifts in user preferences over time. Furthermore, CussCtpm utilizes counterfactual data augmentation to generate counterfactual positive and negative user sequences. Our extensive experiments on two widely-used public datasets demonstrate that CussCtpm outperforms several advanced baseline models.
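As a rough sketch of the continuous-time preference modeling described above: a GRU cell updates the hidden state at each check-in, and a learned ODE evolves it across the time gap between check-ins. The architecture below is an assumption for illustration (the abstract does not give CussCtpm's equations), and a fixed-step Euler loop stands in for a full ODE solver.

    import torch
    import torch.nn as nn

    class ODEGRUSketch(nn.Module):
        """Hidden state h evolves by dh/dt = f(h) between check-ins (Euler
        steps) and is updated by a GRU cell at each observed check-in."""
        def __init__(self, input_dim, hidden_dim):
            super().__init__()
            self.ode_func = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Tanh())
            self.cell = nn.GRUCell(input_dim, hidden_dim)
            self.hidden_dim = hidden_dim

        def forward(self, events, deltas, n_euler=4):
            # events: (T, B, input_dim) check-in embeddings; deltas: (T, B) time gaps
            h = events.new_zeros(events.size(1), self.hidden_dim)
            for x_t, dt in zip(events, deltas):
                step = (dt / n_euler).unsqueeze(-1)
                for _ in range(n_euler):      # evolve h through the time gap
                    h = h + step * self.ode_func(h)
                h = self.cell(x_t, h)         # jump update at the check-in
            return h

    model = ODEGRUSketch(input_dim=32, hidden_dim=64)
    print(model(torch.randn(10, 4, 32), torch.rand(10, 4)).shape)  # torch.Size([4, 64])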
List of keywords
Data Mining -> DM: Recommender systems
709
Conflict-Alleviated Gradient Descent for Adaptive Object Detection
Wenxu Shi, Bochuan Zheng
6 min. talk | August 7th at 15:00 | Session: CV: Recognition (object detection, categorization)
[+] More
[-] Less
Unsupervised domain adaptive object detection (DAOD) aims to adapt detectors from a labeled source domain to an unlabelled target domain. Existing DAOD works learn feature representations that are both class-discriminative and domain-invariant by jointly minimizing the loss across the domain alignment and detection tasks. However, jointly solving different tasks may lead to conflicts, with one contributing factor being gradient conflicts during optimization. If left unaddressed, such disagreement may degrade adaptation performance. In this work, we propose an efficient optimization strategy named Conflict-Alleviated Gradient descent (CAGrad), which aims to alleviate the conflict between the two tasks (i.e., alignment and classification). Specifically, we alter the gradients by projecting each onto the normal plane of the other. The projection operation changes conflicting gradients from obtuse angles to acute angles, thus alleviating the conflict and achieving gradient harmonization. We further validate our theoretical analysis and methods on several domain adaptive object detection tasks, including cross-camera, weather, scene, and synthetic-to-real-world adaptation. Extensive experiments on multiple DAOD benchmarks demonstrate the effectiveness and superiority of our CAGrad.
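The projection step admits a compact sketch; the function below implements the generic conflicting-gradient projection the abstract describes (each gradient projected onto the normal plane of the other when their inner product is negative), with everything beyond that description assumed.

    import numpy as np

    def harmonize(g1, g2):
        """Project each gradient onto the normal plane of the other when they
        conflict (negative inner product), removing the opposing component."""
        g1p, g2p = g1.copy(), g2.copy()
        dot = g1 @ g2
        if dot < 0:
            g1p = g1 - dot / (g2 @ g2) * g2  # drop the component of g1 along g2
            g2p = g2 - dot / (g1 @ g1) * g1  # drop the component of g2 along g1
        return g1p, g2p

    g_align, g_detect = np.array([1.0, 0.5]), np.array([-0.8, 1.0])
    ga, gd = harmonize(g_align, g_detect)
    print(ga @ g_detect, gd @ g_align)  # both ~0: the conflict is removed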
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Machine Learning -> ML: Optimization
Machine Learning -> ML: Unsupervised learning
715
Invertible Residual Rescaling Models
Jinmin Li, Tao Dai, Yaohua Zha, Yilu Luo, Longfei Lu, Bin Chen, Zhi Wang, Shu-Tao Xia, Jingyun Zhang
6 min. talk | August 8th at 15:00 | Session: CV: Image and video synthesis and generation (2/2)
[+] More
[-] Less
Invertible Rescaling Networks (IRNs) and their variants have witnessed remarkable achievements in various image processing tasks like image rescaling. However, we observe that deeper IRNs are difficult to train, which limits their representational ability. To address this issue, we propose Invertible Residual Rescaling Models (IRRM) for image rescaling, which learn a bijection between a high-resolution image and its low-resolution counterpart with a specific distribution. Specifically, IRRM builds a deep network containing several Residual Downscaling Modules (RDMs) with long skip connections; each RDM consists of several Invertible Residual Blocks (IRBs) with short connections. In this way, the RDM allows rich low-frequency information to be bypassed by skip connections and forces the model to focus on extracting high-frequency information from the image. Extensive experiments show that our IRRM performs significantly better than other state-of-the-art methods with much fewer parameters and lower complexity. In particular, IRRM achieves PSNR gains of at least 0.3 dB over HCFlow and IRN in x4 rescaling, respectively, while using only 60% of the parameters and 50% of the FLOPs. The code will be available at https://github.com/THU-Kingmin/IRRM.
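One standard way to realize an invertible residual-style block is additive coupling: split the channels and let one half update the other, which is exactly invertible. The sketch below is an assumed illustration of this principle, not the IRB design from the paper, which the abstract does not spell out.

    import torch
    import torch.nn as nn

    class AdditiveCouplingBlock(nn.Module):
        """Invertible block via additive coupling: y1 = x1, y2 = x2 + F(x1);
        the inverse recovers x exactly as x2 = y2 - F(y1)."""
        def __init__(self, channels):
            super().__init__()
            half = channels // 2
            self.F = nn.Sequential(
                nn.Conv2d(half, half, 3, padding=1), nn.ReLU(),
                nn.Conv2d(half, half, 3, padding=1))

        def forward(self, x):
            x1, x2 = x.chunk(2, dim=1)
            return torch.cat([x1, x2 + self.F(x1)], dim=1)

        def inverse(self, y):
            y1, y2 = y.chunk(2, dim=1)
            return torch.cat([y1, y2 - self.F(y1)], dim=1)

    block = AdditiveCouplingBlock(8)
    x = torch.randn(1, 8, 16, 16)
    print(torch.allclose(block.inverse(block(x)), x, atol=1e-6))  # True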
List of keywords
Computer Vision -> CV: Image and video synthesis and generation 
Computer Vision -> CV: Applications
719
Probabilistic Contrastive Learning for Domain Adaptation
Junjie Li, Yixin Zhang, Zilei Wang, Saihui Hou, Keyu Tu, Man Zhang
6 min. talk | August 6th at 15:00 | Session: CV: Transfer, low-shot, semi- and un- supervised learning   
[+] More
[-] Less
Contrastive learning has shown impressive success in enhancing feature discriminability for various visual tasks in a self-supervised manner, but the standard contrastive paradigm (features + l2 normalization) has limited benefits when applied to domain adaptation. We find that this is mainly because the class weights (the weights of the final fully connected layer) are ignored in the domain adaptation optimization process, which makes it difficult for features to cluster around the corresponding class weights. To solve this problem, we propose the simple but powerful Probabilistic Contrastive Learning (PCL), which moves beyond the standard paradigm by removing l2 normalization and replacing the features with probabilities. PCL guides the probability distribution towards a one-hot configuration, thus minimizing the discrepancy between features and class weights. We conduct extensive experiments to validate the effectiveness of PCL and observe consistent performance gains on five tasks, i.e., Unsupervised/Semi-Supervised Domain Adaptation (UDA/SSDA), Semi-Supervised Learning (SSL), UDA Detection, and Semantic Segmentation. Notably, for UDA Semantic Segmentation on SYNTHIA, PCL surpasses the sophisticated CPSL-D by 2% in terms of mean IoU with a much lower training cost (PCL: 1x3090, 5 days vs. CPSL-D: 4xV100, 11 days). Code is available at https://github.com/ljjcoder/Probabilistic-Contrastive-Learning.
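A minimal sketch of the probability-based contrast: an InfoNCE-style loss where the usual l2-normalized features are replaced by softmax probabilities. The exact PCL objective is not given in the abstract, so the loss below is only an assumed illustration of the stated idea.

    import torch
    import torch.nn.functional as F

    def probabilistic_contrastive_loss(logits_q, logits_k, temperature=0.1):
        """InfoNCE-style loss on class probabilities instead of l2-normalized
        features; matching rows of the two views are the positive pairs."""
        p_q = F.softmax(logits_q, dim=1)   # (N, C), no l2 normalization
        p_k = F.softmax(logits_k, dim=1)
        sim = p_q @ p_k.t() / temperature  # (N, N) probability similarities
        targets = torch.arange(p_q.size(0))
        return F.cross_entropy(sim, targets)

    print(probabilistic_contrastive_loss(torch.randn(8, 10), torch.randn(8, 10)))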
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Computer Vision -> CV: Recognition (object detection, categorization)
756
WSRFNet: Wavelet-Based Scale-Specific Recurrent Feedback Network for Diabetic Retinopathy Lesion Segmentation
Xuan Li, Xiangqian Wu
6 min. talk | August 9th at 10:00 | Session: CV: Biomedical image analysis
[+] More
[-] Less
Diabetic retinopathy lesion segmentation (DRLS) faces the challenge of significant variation in the size of different lesions. An effective way to address this challenge is to fuse multi-scale features. To boost performance, most existing DRLS methods devise sophisticated multi-scale feature fusion modules. In contrast, we focus on improving the quality of the multi-scale features themselves to enhance the fused multi-scale representation. To this end, we design a Wavelet-based Scale-specific Recurrent Feedback Network (WSRFNet), which refines multi-scale features using a recurrent feedback mechanism. Specifically, to avoid information loss when introducing feedback to multi-scale features, we propose a wavelet-based feedback pyramid module (WFPM), which is based on a reversible downsampling operation, i.e., the Haar wavelet transform. Unlike the scale-agnostic feedback used in previous feedback methods, we develop a scale-specific refinement module (SRM), which utilizes scale-specific feedback to refine features of different scales in a targeted manner. Experimental results on the IDRiD and DDR datasets show that our approach outperforms state-of-the-art models. The code is available at https://github.com/xuanli01/WSRFNet.
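The reversible downsampling mentioned above is the standard one-level 2D Haar transform; a minimal sketch, independent of WSRFNet's exact wiring, shows why no information is lost:

    import numpy as np

    def haar_downsample(x):
        """One-level 2D Haar transform: (H, W) -> four (H/2, W/2) subbands."""
        a, b = x[0::2, 0::2], x[0::2, 1::2]
        c, d = x[1::2, 0::2], x[1::2, 1::2]
        return ((a + b + c + d) / 2, (a - b + c - d) / 2,
                (a + b - c - d) / 2, (a - b - c + d) / 2)

    def haar_upsample(ll, lh, hl, hh):
        """Exact inverse of haar_downsample."""
        h, w = ll.shape
        x = np.empty((2 * h, 2 * w))
        x[0::2, 0::2] = (ll + lh + hl + hh) / 2
        x[0::2, 1::2] = (ll - lh + hl - hh) / 2
        x[1::2, 0::2] = (ll + lh - hl - hh) / 2
        x[1::2, 1::2] = (ll - lh - hl + hh) / 2
        return x

    x = np.random.rand(8, 8)
    print(np.allclose(haar_upsample(*haar_downsample(x)), x))  # True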
List of keywords
Computer Vision -> CV: Biomedical image analysis
Computer Vision -> CV: Segmentation
768
Generating More Audios for End-to-End Spoken Language Understanding
Xuxin Cheng, Yuexian Zou
6 min. talk | August 6th at 11:30 | Session: NLP: Dialogue and interactive systems
[+] More
[-] Less
End-to-end spoken language understanding (SLU) aims to directly capture the comprehensive semantics of a given spoken utterance without generating any transcript. Since transcripts are not always available, textless SLU is attracting increasing attention: it eliminates the need for transcripts but often does not perform as well as SLU models trained with them. In this paper, we focus on scenarios where transcripts are not available and propose a framework, GMA-SLU, that generates more audios according to the labels. To alleviate the modality gap between text and audio, two language models are developed with discrete tokens as a bridge: the first uses labels to generate semantic tokens, and the second takes these semantic tokens together with the acoustic tokens of the source audios to generate synthetic audios. All experiments are conducted on the monolingual SLU dataset SLURP and the multilingual SLU dataset MINDS-14. Experimental results show that our method outperforms the previous best textless end-to-end SLU models and obtains performance comparable to models trained with the assistance of the corresponding transcripts.
List of keywords
Natural Language Processing -> NLP: Dialogue and interactive systems
775
A New Guaranteed Outlier Removal Method Based on Plane Constraints for Large-Scale LiDAR Point Cloud Registration
Gang Ma, Hui Wei, Runfeng Lin, Jialiang Wu
6 min. talk | August 8th at 11:30 | Session: ROB: Robotics (2/2)
[+] More
[-] Less
In this paper, we present a novel registration method based on plane constraints for large-scale LiDAR point clouds, effectively decoupling rotation estimation and translation estimation. For rotation estimation, we propose an outlier removal method that combines coarse filtering with rotation-invariant constraints and refined filtering based on computational geometric consistency checks, effectively pruning outliers and robustly estimating accurate relative rotations from plane normals. In translation estimation, we propose a component-wise method based on plane translation constraints to efficiently estimate relative translations. The robustness and effectiveness of our proposed method are empirically validated on three popular LiDAR point cloud datasets. The experimental results convincingly demonstrate that our approach achieves state-of-the-art performance.
List of keywords
Robotics -> ROB: Robotics and vision
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Scene analysis and understanding   
Robotics -> ROB: Perception
778
OSIC: A New One-Stage Image Captioner Coined
Bo Wang, Zhao Zhang, Mingbo Zhao, Xiaojie Jin, Mingliang Xu, Meng Wang
6 min. talk | August 9th at 11:30 | Session: CV: Machine learning for vision
[+] More
[-] Less
Mainstream image captioning models are usually two-stage captioners, i.e., they encode region features with a pre-trained detector and then feed them into a language model to generate captions. However, such a two-stage procedure leads to a task-based information gap that decreases performance, because the region features in the detection task are suboptimal representations and cannot provide all the information necessary for subsequent caption generation. Besides, region features are usually taken from the last layer of the detector, which loses the local details of images. In this paper, we propose a novel One-Stage Image Captioner (OSIC) with dynamic multi-sight learning, which directly transforms images into descriptive sentences in one stage to eliminate the information gap. Specifically, to obtain rich features, multi-level features are captured by a Swin Transformer and then fed into a novel dynamic multi-sight embedding module to exploit both the global structure and local texture of input images. To enhance the global modeling capacity of the visual encoder, we propose a new dual-dimensional refining module to non-locally model feature interaction. As a result, OSIC can directly obtain rich semantic information to improve captioning. Extensive comparisons on the MS-COCO, Flickr8K, and Flickr30K benchmarks verify the superior performance of our method.
List of keywords
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Scene analysis and understanding   
Computer Vision -> CV: Vision, language and reasoning
Natural Language Processing -> NLP: Language generation
805
Strengthening Layer Interaction via Dynamic Layer Attention
Kaishen Wang, Xun Xia, Jian Liu, Zhang Yi, Tao He
6 min. talk | August 7th at 15:00 | Session: ML: Machine Learning (3/6)
[+] More
[-] Less
In recent years, employing layer attention to enhance interaction among hierarchical layers has proven to be a significant advancement in building network structures. In this paper, we delve into the distinction between layer attention and the general attention mechanism, noting that existing layer attention methods achieve layer interaction on fixed feature maps in a static manner. These static layer attention methods limit the ability to extract contextual features across layers. To restore the dynamic context representation capability of the attention mechanism, we propose a Dynamic Layer Attention (DLA) architecture. DLA comprises dual paths: the forward path utilizes an improved recurrent neural network block, named the Dynamic Sharing Unit (DSU), for context feature extraction, and the backward path updates features using these shared context representations. Finally, the attention mechanism is applied to the dynamically refreshed feature maps across layers. Experimental results demonstrate the effectiveness of the proposed DLA architecture, which outperforms other state-of-the-art methods on image recognition and object detection tasks. Additionally, the DSU block has been evaluated as an efficient plugin in the proposed DLA architecture. The code is available at https://github.com/tunantu/Dynamic-Layer-attention.
List of keywords
Machine Learning -> ML: Attention models
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning
Machine Learning -> ML: Theory of deep learning
822
BeyondVision: An EMG-driven Micro Hand Gesture Recognition Based on Dynamic Segmentation
Nana Wang, Jianwei Niu, Xuefeng Liu, Dongqin Yu, Guogang Zhu, Xinghao Wu, Mingliang Xu, Hao Su
6 min. talk | August 6th at 11:30 | Session: MTA: Multidisciplinary Topics and Applications (1/2)
[+] More
[-] Less
Hand gesture recognition (HGR) plays a pivotal role in natural and intuitive human-computer interaction. Recent HGR methods focus on recognizing gestures from vision-based images or videos. However, vision-based methods are limited in recognizing micro hand gestures (MHGs) (e.g., a pinch within 1 cm) and gestures with occluded fingers. To address these issues, we combine the electromyography (EMG) technique with deep learning and propose BeyondVision, an EMG-driven MHG recognition system. BeyondVision consists of a wristband-style EMG sampling device and a tailored lightweight neural network, BV-Net, that accurately translates EMG signals of MHGs into control commands in real time. Moreover, we propose a post-processing mechanism and a weight segmentation algorithm that effectively improve the accuracy of MHG recognition. Subjective and objective experimental results show that our approach achieves an average recognition rate of over 95%, a 2000 Hz sampling frequency, and real-time micro gesture recognition. Our technique has been applied in a commercially available product, introduced at: https://github.com/tyc333/NoBarriers.
List of keywords
Multidisciplinary Topics and Applications -> MTA: AI hardware
Computer Vision -> CV: Biometrics, face, gesture and pose recognition
Machine Learning -> ML: Applications
Multidisciplinary Topics and Applications -> MTA: Interactive entertainment
825
Dynamically Anchored Prompting for Task-Imbalanced Continual Learning
Chenxing Hong, Yan Jin, Zhiqi Kang, Yizhou Chen, Mengke Li, Yang Lu, Hanzi Wang
6 min. talk | August 7th at 10:00 | Session: ML: Incremental learning
[+] More
[-] Less
Existing continual learning literature relies heavily on the strong assumption that tasks arrive as a balanced data stream, which is often unrealistic in real-world applications. In this work, we explore task-imbalanced continual learning (TICL) scenarios, where the distribution of task data is non-uniform across the whole learning process. We find that imbalanced tasks significantly challenge the capability of models to control the trade-off between stability and plasticity, from the perspective of recent prompt-based continual learning methods. On top of this finding, we propose Dynamically Anchored Prompting (DAP), a prompt-based method that maintains only a single general prompt to adapt to the shifts within a task stream dynamically. This general prompt is regularized in the prompt space with two specifically designed prompt anchors, called the boosting anchor and the stabilizing anchor, to balance stability and plasticity in TICL. Remarkably, DAP achieves this balance by storing only a single prompt across the data stream, therefore offering a substantial advantage in rehearsal-free continual learning. Extensive experiments demonstrate that the proposed DAP yields 4.5% to 15% absolute improvements over state-of-the-art methods on benchmarks under task-imbalanced settings. Our code is available at https://github.com/chenxing6666/DAP.
List of keywords
Machine Learning -> ML: Incremental learning
Computer Vision -> CV: Recognition (object detection, categorization)
Data Mining -> DM: Class imbalance and unequal cost
Machine Learning -> ML: Classification
883
Efficient Screen Content Image Compression via Superpixel-based Content Aggregation and Dynamic Feature Fusion
Sheng Shen, Huanjing Yue, Jingyu Yang
6 min. talk | August 8th at 11:30 | Session: CV: Image and video synthesis and generation (1/2)
[+] More
[-] Less
This paper addresses the challenge of efficiently compressing screen content images (SCIs): computer-generated images with unique attributes such as large uniform regions, sharp edges, and limited color palettes, which pose difficulties for conventional compression algorithms. We propose a Superpixel-based Content Aggregation Block (SCAB) that aggregates local pixels into one superpixel and aggregates non-local information via a superpixel transformer. Such aggregation enables the dynamic assimilation of non-local information while maintaining manageable complexity. Furthermore, we enhance our channel-wise context entropy model with a Dynamic Feature Fusion (DFF) mechanism, which integrates decoded slices and side information dynamically based on their global correlation, allowing the network to learn the optimal weights for global information usage. Extensive experiments on three SCI datasets (SCID, CCT, and SIQAD) show our method's superior RD performance and inference time, making it the first network comparable with the advanced VVC-SCC standard.
List of keywords
Computer Vision -> CV: Image and video synthesis and generation 
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Other
889
Bandits with Concave Aggregated Reward
Yingqi Yu, Sijia Zhang, Shaoang Li, Lan Zhang, Wei Xie, Xiang-Yang Li
6 min. talk | August 7th at 15:00 | Session: ML: Machine Learning (5/6)
[+] More
[-] Less
The multi-armed bandit is a simple but powerful algorithmic framework, and many effective algorithms have been proposed for various online models. In numerous applications, however, the decision-maker faces diminishing marginal utility, and under such non-linear aggregations those algorithms often have poor regret bounds. Motivated by this, we study a bandit problem with diminishing marginal utility, which we term bandits with concave aggregated reward (BCAR). To tackle this problem, we propose two algorithms, SW-BCAR and SWUCB-BCAR. Through theoretical analysis, we establish the effectiveness of these algorithms in addressing the BCAR issue. Extensive simulations demonstrate that our algorithms achieve better results than the most advanced bandit algorithms.
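To make the diminishing-marginal-utility setting concrete, here is a toy simulation in which the utility is a concave function (a square root, an assumed example; the abstract does not fix the aggregation) of each arm's accumulated reward, so spreading pulls across arms beats hammering the single best arm. Neither policy below is SW-BCAR or SWUCB-BCAR.

    import numpy as np

    rng = np.random.default_rng(0)
    means = np.array([0.8, 0.6, 0.4])   # Bernoulli arm means
    T = 3000

    def run(policy):
        sums = np.zeros(3)              # accumulated reward per arm
        pulls = np.zeros(3)
        for t in range(T):
            a = policy(sums, pulls, t)
            sums[a] += float(rng.random() < means[a])
            pulls[a] += 1
        return np.sqrt(sums).sum()      # concave aggregated utility

    best_single = run(lambda s, p, t: 0)        # always pull the best arm
    oracle_greedy = run(lambda s, p, t: int(np.argmax(
        np.sqrt(s + means) - np.sqrt(s))))      # oracle marginal-gain greedy
    print(best_single, oracle_greedy)           # spreading pulls wins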
List of keywords
Machine Learning -> ML: Multi-armed bandits
898
Shap-Mix: Shapley Value Guided Mixing for Long-Tailed Skeleton Based Action Recognition
Jiahang Zhang, Lilang Lin, Jiaying Liu
6 min. talk | August 8th at 15:00 | Session: CV: Computer Vision (2/2)
[+] More
[-] Less
In real-world scenarios, human actions often fall into a long-tailed distribution, which makes existing skeleton-based action recognition works, mostly designed for balanced datasets, suffer a sharp performance degradation. Recently, many efforts have been devoted to image/video long-tailed learning. However, directly applying these methods to skeleton data can be sub-optimal, since they do not consider the crucial spatial-temporal motion patterns, especially modality-specific methodologies such as data augmentation. To this end, considering the crucial role of body parts in spatially concentrated human actions, we attend to mixing augmentations and propose a novel method, Shap-Mix, which improves long-tailed learning by mining representative motion patterns for tail categories. Specifically, we first develop an effective spatial-temporal mixing strategy for skeletons to boost representation quality. Then, we present a saliency-guided mixing method, consisting of saliency estimation based on Shapley values and a tail-aware mixing policy. It preserves the salient motion parts of minority classes in the mixed data, explicitly establishing the relationships between crucial body structure cues and high-level semantics. Extensive experiments on three large-scale skeleton datasets show our remarkable performance improvement under both long-tailed and balanced settings. Our project is publicly available at: https://jhang2020.github.io/Projects/Shap-Mix/Shap-Mix.html.
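A minimal sketch of Shapley-value saliency by Monte Carlo permutation sampling, the generic estimator behind Shapley values; the body-part partition and value function here are assumptions, since the abstract does not give Shap-Mix's exact estimator.

    import numpy as np

    def shapley_saliency(value_fn, n_parts, n_samples=200, seed=0):
        """Estimate each part's Shapley value as its average marginal
        contribution to value_fn over random orderings of the parts."""
        rng = np.random.default_rng(seed)
        phi = np.zeros(n_parts)
        for _ in range(n_samples):
            order = rng.permutation(n_parts)
            mask = np.zeros(n_parts, dtype=bool)
            prev = value_fn(mask)
            for part in order:
                mask[part] = True
                cur = value_fn(mask)
                phi[part] += cur - prev
                prev = cur
        return phi / n_samples

    # Toy value function: parts 0 and 2 matter, part 1 is irrelevant.
    weights = np.array([0.7, 0.0, 0.3])
    print(shapley_saliency(lambda m: float(weights[m].sum()), 3))  # ~[0.7, 0, 0.3]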
List of keywords
Computer Vision -> CV: Action and behavior recognition
899
Class-consistent Contrastive Learning Driven Cross-dimensional Transformer for 3D Medical Image Classification
Qikui Zhu, Chuan Fu, Shuo Li
6 min. talk | August 9th at 10:00 | Session: CV: Biomedical image analysis
[+] More
[-] Less
The Transformer has emerged as an active research topic in medical image analysis. Yet, three substantial challenges limit the effectiveness of both 2D and 3D Transformers in 3D medical image classification: 1) the challenge of capturing spatial structure correlations, due to the unreasonable flattening operation; 2) the challenge of high computational complexity and memory consumption, which grow quadratically for 3D medical data; 3) the challenge of discriminative representation learning, due to data sensitivity. To address these challenges, a novel Cross-dimensional Transformer (CdTransformer) and a creative Class-consistent Contrastive Learning (CcCL) are proposed. Specifically, CdTransformer consists of two novel modules: 1) a Cross-dimensional Attention Module (CAM), which overcomes the limitation that Transformers cannot reasonably establish spatial structure correlations on 3D medical data, while reducing computational complexity and memory consumption; 2) an Inter-dimensional Feed-forward Network (IdFN), which addresses the inability of traditional feed-forward networks to learn the depth-dimension information unique to 3D medical data. CcCL innovatively takes full advantage of inter-class and intra-class features from slice-distorted samples to help the Transformer learn feature representations. CdTransformer and CcCL are validated on six 3D medical image classification tasks. Extensive experimental results demonstrate that CdTransformer outperforms state-of-the-art CNNs and Transformers on 3D medical image classification, and that CcCL significantly improves the Transformer's discriminative representation learning.
List of keywords
Computer Vision -> CV: Biomedical image analysis
Computer Vision -> CV: Applications
Machine Learning -> ML: Adversarial machine learning
Machine Learning -> ML: Classification
901
Hundred-Kilobyte Lookup Tables for Efficient Single-Image Super-Resolution
Binxiao Huang, Jason Chun Lok Li, Jie Ran, Boyu Li, Jiajun Zhou, Dahai Yu, Ngai Wong
6 min. talk | August 8th at 15:00 | Session: CV: Image and video synthesis and generation (2/2)
[+] More
[-] Less
Conventional super-resolution (SR) schemes make heavy use of convolutional neural networks (CNNs), which involve intensive multiply-accumulate (MAC) operations, and require specialized hardware such as graphics processing units. This contradicts the regime of edge AI that often runs on devices strained by power, computing, and storage resources. Such a challenge has motivated a series of lookup table (LUT)-based SR schemes that employ simple LUT readout and largely elude CNN computation. Nonetheless, the multi-megabyte LUTs in existing methods still prohibit on-chip storage and necessitate off-chip memory transport. This work tackles this storage hurdle and innovates hundred-kilobyte LUT (HKLUT) models amenable to on-chip cache. Utilizing an asymmetric two-branch multistage network coupled with a suite of specialized kernel patterns, HKLUT demonstrates an uncompromising performance and superior hardware efficiency over existing LUT schemes. Our implementation is publicly available at: https://github.com/jasonli0707/hklut.
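To illustrate how LUT-based SR elides CNN computation at inference, here is a toy single-table sketch: a small quantized pixel pattern indexes the table, which directly returns the upscaled pixels. HKLUT's two-branch multistage design and kernel patterns are not reproduced, and the table entries below are placeholders rather than values transferred from a trained network.

    import numpy as np

    def build_toy_lut(scale=2, bits=4):
        """Toy LUT mapping a quantized 2-pixel pattern to a scale x scale
        output patch; entries just replicate the first pixel (placeholder)."""
        levels = 1 << bits
        lut = np.zeros((levels, levels, scale * scale), dtype=np.uint8)
        for p0 in range(levels):
            lut[p0, :, :] = int(round(p0 * 255 / (levels - 1)))
        return lut  # 16*16*4 bytes = 1 KiB: easily on-chip

    def sr_lookup(img, lut, scale=2, bits=4):
        """Inference is pure table readout: no multiply-accumulate at all."""
        h, w = img.shape
        q = (img.astype(np.uint16) * ((1 << bits) - 1) // 255).astype(np.uint8)
        out = np.zeros((h * scale, w * scale), dtype=np.uint8)
        for i in range(h):
            for j in range(w):
                p0, p1 = q[i, j], q[i, (j + 1) % w]  # 2-pixel horizontal pattern
                out[i*scale:(i+1)*scale, j*scale:(j+1)*scale] = \
                    lut[p0, p1].reshape(scale, scale)
        return out

    img = (np.random.rand(4, 4) * 255).astype(np.uint8)
    print(sr_lookup(img, build_toy_lut()).shape)  # (8, 8)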
List of keywords
Computer Vision -> CV: Image and video synthesis and generation 
Computer Vision -> CV: Applications
907
MAS-SAM: Segment Any Marine Animal with Aggregated Features
Tianyu Yan, Zifu Wan, Xinhao Deng, Pingping Zhang, Yang Liu, Huchuan Lu
12 min. talk | August 6th at 11:30 | Session: ROB: Robotics (1/2)
[+] More
[-] Less
Recently, the Segment Anything Model (SAM) has shown exceptional performance in generating high-quality object masks and achieving zero-shot image segmentation. However, as a versatile vision model, SAM is primarily trained on large-scale natural-light images, and in underwater scenes it exhibits substantial performance degradation due to light scattering and absorption. Meanwhile, the simplicity of SAM's decoder might lead to the loss of fine-grained object details. To address these issues, we propose a novel feature learning framework named MAS-SAM for marine animal segmentation, which integrates effective adapters into SAM's encoder and constructs a pyramidal decoder. More specifically, we first build a new SAM encoder with effective adapters for underwater scenes. Then, we introduce a Hypermap Extraction Module (HEM) to generate multi-scale features for comprehensive guidance. Finally, we propose a Progressive Prediction Decoder (PPD) to aggregate the multi-scale features and predict the final segmentation results. When grafted with the Fusion Attention Module (FAM), our method can extract richer marine information, from global contextual cues to fine-grained local details. Extensive experiments on four public MAS datasets demonstrate that our MAS-SAM obtains better results than other typical segmentation methods. The source code is available at https://github.com/Drchip61/MAS-SAM.
List of keywords
Robotics -> ROB: Applications
Robotics -> ROB: Perception
Robotics -> ROB: Robotics and vision
932
Hierarchical Reinforcement Learning on Multi-Channel Hypergraph Neural Network for Course Recommendation
Lu Jiang, Yanan Xiao, Xinxin Zhao, Yuanbo Xu, Shuli Hu, Pengyang Wang, Minghao Yin
6 min. talk | August 7th at 11:30 | Session: DM: Applications
[+] More
[-] Less
With the widespread popularity of massive open online courses, personalized course recommendation has become increasingly important for enhancing users' learning efficiency. While achieving promising performance, current works suffer from the variability across users and other MOOC entities. To address this problem, we propose hierarchical reinforcement learning with a multi-channel hypergraph neural network for course recommendation (HHCoR). Specifically, we first construct an online course hypergraph as the environment to capture the complex relationships and historical information by considering all entities. Then, we design a multi-channel propagation mechanism to aggregate embeddings in the online course hypergraph and extract user interest through an attention layer. Besides, we employ two-level decision-making: the low-level agent focuses on rating courses, while the high-level agent integrates these considerations to finalize the decision. Furthermore, for co-optimization, we design a joint reward function to improve the policies of the two-level agents. Finally, we conducted extensive experiments on two real-world datasets, and the quantitative results demonstrate the effectiveness of the proposed method.
List of keywords
Data Mining -> DM: Applications
Data Mining -> DM: Mining graphs
Data Mining -> DM: Mining heterogenous data
Data Mining -> DM: Mining spatial and/or temporal data
936
Evolutionary Generalized Zero-Shot Learning
Dubing Chen, Chenyi Jiang, Haofeng Zhang
12 min. talk | August 6th at 15:00 | Session: CV: Transfer, low-shot, semi- and un- supervised learning   
[+] More
[-] Less
Attribute-based Zero-Shot Learning (ZSL) has revolutionized the ability of models to recognize new classes not seen during training. However, with the advancement of large-scale models, expectations have risen. Beyond merely achieving zero-shot generalization, there is a growing demand for universal models that can continually evolve in expert domains using unlabeled data. To address this, we introduce a scaled-down instantiation of this challenge: Evolutionary Generalized Zero-Shot Learning (EGZSL). This setting allows a low-performing zero-shot model to adapt to the test data stream and evolve online. We elaborate on three challenges of this special task, i.e., catastrophic forgetting, initial prediction bias, and evolutionary data class bias. Moreover, we propose targeted solutions for each challenge, resulting in a generic method capable of continuous evolution from a given initial GZSL model. Experiments on three popular GZSL benchmark datasets demonstrate that our model can learn from the test data stream while other baselines fail. The code is available at https://github.com/cdb342/EGZSL.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Computer Vision -> CV: Vision, language and reasoning
937
Learning Spatial Similarity Distribution for Few-shot Object Counting
Yuanwu Xu, Feifan Song, Haofeng Zhang
6 min. talk | August 6th at 15:00 | Session: CV: Transfer, low-shot, semi- and un- supervised learning   
[+] More
[-] Less
Few-shot object counting aims to count the number of objects in a query image that belong to the same class as given exemplar images. Existing methods compute the similarity between the query image and exemplars in the 2D spatial domain and perform regression to obtain the count. However, these methods overlook the rich information in the spatial distribution of similarity on the exemplar images, which significantly impacts matching accuracy. To address this issue, we propose a network that learns the Spatial Similarity Distribution (SSD) for few-shot object counting: it preserves the spatial structure of exemplar features and calculates a point-to-point 4D similarity pyramid between the query features and exemplar features, capturing the complete distribution information for each point in the 4D similarity space. We propose a Similarity Learning Module (SLM), which applies efficient center-pivot 4D convolutions on the similarity pyramid to map different similarity distributions to distinct predicted density values, thereby obtaining accurate counts. Furthermore, we introduce a Feature Cross Enhancement (FCE) module that enhances query and exemplar features mutually to improve the accuracy of feature matching. Our approach outperforms state-of-the-art methods on multiple datasets, including FSC-147 and CARPK. Code is available at https://github.com/CBalance/SSD.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Computer Vision -> CV: Recognition (object detection, categorization)
965
ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles
Kai Zhao, Jianye Hao, Yi Ma, Jinyi Liu, Yan Zheng, Zhaopeng Meng
6 min. talk | August 7th at 15:00 | Session: ML: Reinforcement learning (1/2)
[+] More
[-] Less
Offline reinforcement learning (RL) is a learning paradigm where an agent learns from a fixed dataset of experience. However, learning solely from a static dataset can limit performance due to the lack of exploration. To overcome this limitation, offline-to-online RL combines offline pre-training with online fine-tuning, which enables the agent to further refine its policy by interacting with the environment in real time. Despite its benefits, existing offline-to-online RL methods suffer from performance degradation and slow improvement during the online phase. To tackle these challenges, we propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL. By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance. Moreover, to expedite online performance enhancement, we appropriately loosen the pessimism of Q-value estimation and incorporate ensemble-based exploration mechanisms into our framework. Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods during online fine-tuning on a range of locomotion and navigation tasks, significantly outperforming existing offline-to-online RL methods.
List of keywords
Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Ensemble methods
Machine Learning -> ML: Offline reinforcement learning
Machine Learning -> ML: Online learning
973
Optimisation and Approximation in Abstract Argumentation: The Case of Stable Semantics
Matthias Thimm
6 min. talk | August 6th at 15:00 | Session: KRR: Argumentation
[+] More
[-] Less
We analyse two soft notions of stable extensions in abstract argumentation, one that weakens the requirement of having full range and one that weakens the requirement of conflict-freeness. We then consider optimisation problems over these two notions that represent optimisation variants of the credulous reasoning problem with stable semantics. We investigate the computational complexity of these two problems in terms of the complexity of solving the optimisation problem exactly and in terms of approximation complexity. We also present some polynomial-time approximation algorithms for these optimisation problems and investigate their approximation quality experimentally.
List of keywords
Knowledge Representation and Reasoning -> KRR: Argumentation
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
980
Delve into Base-Novel Confusion: Redundancy Exploration for Few-Shot Class-Incremental Learning
Haichen Zhou, Yixiong Zou, Ruixuan Li, Yuhua Li, Kui Xiao
6 min. talk | August 7th at 10:00 | Session: ML: Incremental learning
[+] More
[-] Less
Few-shot class-incremental learning (FSCIL) aims to acquire knowledge from novel classes with limited samples while retaining information about base classes. Existing methods address catastrophic forgetting and overfitting by freezing the feature extractor during novel-class learning. However, these methods tend to cause confusion between base and novel classes, i.e., novel-class samples are classified into base classes. In this paper, we delve into this phenomenon to study its cause and solution. We first interpret the confusion as a collision between the novel-class and base-class regions in the feature space. We then find that the collision is caused by label-irrelevant redundancies within the base-class feature and pixel space. Through qualitative and quantitative experiments, we identify this redundancy as a shortcut in base-class training, which can be decoupled to alleviate the collision. Based on this analysis, we propose a method for FSCIL named Redundancy Decoupling and Integration (RDI) to alleviate the collision between base and novel classes. RDI first decouples redundancies from the base-class space to shrink the intra-base-class feature space. It then integrates the redundancies as a dummy class to enlarge the inter-base-class feature space. This process effectively compresses the base-class feature space, creating buffer space for novel classes and alleviating the model's confusion between base and novel classes. Extensive experiments on benchmark datasets, including CIFAR-100, miniImageNet, and CUB-200-2011, demonstrate that our method achieves state-of-the-art performance.
List of keywords
Machine Learning -> ML: Incremental learning
Machine Learning -> ML: Few-shot learning
993
MetaISP: Efficient RAW-to-sRGB Mappings with Merely 1M Parameters
Zigeng Chen, Chaowei Liu, Yuan Yuan, Michael Bi Mi, Xinchao Wang
6 min. talk | August 9th at 11:30 | Session: CV: Computer Vision (1/2)
[+] More
[-] Less
State-of-the-art deep ISP models alleviate the dilemma of limited generalization capabilities across heterogeneous inputs by increasing the size and complexity of the network, which inevitably leads to considerable growth in parameter counts and FLOPs. To address this challenge, this paper presents MetaISP, a streamlined model that achieves superior reconstruction quality by adaptively modulating its parameters and architecture in response to diverse inputs. Our rationale revolves around obtaining spatial and channel-wise correction matrices for various inputs within distinct feature spaces, which assists in assigning optimal attention. This is achieved by predicting dynamic weights for each input image and combining these weights with multiple learnable basis matrices to construct the correction matrices. The proposed MetaISP achieves the best performance while remaining computationally efficient. SOTA results are achieved on two large-scale datasets, e.g., 23.80 dB PSNR on ZRR, exceeding the previous SOTA by 0.19 dB with only 9.2% of its parameter count and 10.6% of its FLOPs, and 25.06 dB PSNR on MAI21, exceeding the previous SOTA by 0.17 dB with only 0.9% of its parameter count and 2.7% of its FLOPs.
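A minimal sketch of the dynamic-weight mechanism described above: a small predictor maps the input to mixing weights over learnable basis matrices, which are combined into a per-input channel correction. Names, shapes, and the residual form are assumptions, as the abstract does not give MetaISP's exact formulation.

    import torch
    import torch.nn as nn

    class DynamicCorrection(nn.Module):
        """Predict K mixing weights from the input, combine K learnable basis
        matrices into a per-input channel-correction matrix, and apply it."""
        def __init__(self, channels, n_basis=4):
            super().__init__()
            self.basis = nn.Parameter(torch.randn(n_basis, channels, channels) * 0.01)
            self.weight_pred = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(channels, n_basis), nn.Softmax(dim=1))

        def forward(self, x):                                 # x: (B, C, H, W)
            b, c, h, w = x.shape
            weights = self.weight_pred(x)                     # (B, K)
            corr = torch.einsum('bk,kij->bij', weights, self.basis)  # (B, C, C)
            corr = corr + torch.eye(c, device=x.device)       # residual correction
            return torch.bmm(corr, x.reshape(b, c, h * w)).reshape(b, c, h, w)

    print(DynamicCorrection(8)(torch.randn(2, 8, 16, 16)).shape)  # (2, 8, 16, 16)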
List of keywords
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Applications
996
Tolerating Outliers: Gradient-Based Penalties for Byzantine Robustness and Inclusion
Latifa Errami, El Houcine Bergou
12 min. talk | August 9th at 10:00 | Session: ML: Robustness
[+] More
[-] Less
This work investigates the interplay between Robustness and Inclusion in the context of poisoning attacks targeting the convergence of Stochastic Gradient Descent (SGD). While robustness has received significant attention, the standard Byzantine defenses rely on the Independent and Identically Distributed (IID) assumption causing their performance to deteriorate on non-IID data distributions, even without any attack. This is largely due to these defenses being excessively cautious and discarding benign outliers. We introduce a penalty-based aggregation that accounts for the discrepancy between trusted clients and outliers. We propose the use of Linear Scalarization (LS) as an enhancing method to enable current defenses to simultaneously circumvent Byzantine attacks while also granting inclusion of outliers. This empowers existing defenses to not only counteract malicious adversaries effectively but also to incorporate outliers into the learning process. We conduct a theoretical analysis to demonstrate the convergence of our approach. Specifically, we establish the robustness and resilience of our method under standard assumptions. Empirical analysis further validates the viability of the proposed approach. Across mild to strong non-IID data splits, our method consistently either matches or surpasses the performance of current approaches in the literature, under state-of-the-art Byzantine attack scenarios.
List of keywords
Machine Learning -> ML: Robustness
AI Ethics, Trust, Fairness -> ETF: Fairness and diversity
Machine Learning -> ML: Trustworthy machine learning
998
LSPAN: Spectrally Localized Augmentation for Graph Consistency Learning
Heng-Kai Zhang, Yi-Ge Zhang, Zhi Zhou, Yu-Feng Li
6 min. talk | August 8th at 11:30 | Session: ML: Semi-supervised learning
[+] More
[-] Less
The graph-based consistency principle has been successfully applied to many semi-supervised problems in machine learning. Its performance largely depends on the quality of the augmented graphs, and it has recently been shown that revealing graph properties and maintaining the invariance of graphs are crucial for good performance. However, existing topology- or feature-based augmentation methods are spectrally non-localized: important spectral components are disturbed throughout the entire frequency range, and their invariance may not be well preserved. Efforts on this issue remain limited. This paper proposes a simple yet effective model called Localized SPectral AugmentatioN (LSPAN), which perturbs a concentrated part of the graph spectrum with equivalent intensity using Fourier orthogonality, so as to enhance graph spectrum preservation as well as model prediction. Moreover, it avoids the significant training time of the inverse Fourier transform. Extensive empirical evaluation on real-world datasets clearly shows the performance gain of spectrally localized augmentation, as well as its good convergence and efficiency compared to existing graph methods.
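A minimal sketch of a spectrally localized graph augmentation: perturb only a narrow band of Laplacian eigenvalues and rebuild the matrix, leaving the rest of the frequency range untouched. The band selection and noise form are assumptions; the abstract does not give LSPAN's exact operator.

    import numpy as np

    def localized_spectral_augment(adj, band=(0.2, 0.5), strength=0.1, seed=0):
        """Perturb only the Laplacian eigenvalues whose index falls inside
        `band` (as a fraction of the spectrum)."""
        rng = np.random.default_rng(seed)
        lap = np.diag(adj.sum(1)) - adj
        evals, evecs = np.linalg.eigh(lap)        # lap = U diag(evals) U^T
        n = len(evals)
        lo, hi = int(band[0] * n), int(band[1] * n)
        evals[lo:hi] += strength * rng.standard_normal(hi - lo)  # localized noise
        return evecs @ np.diag(evals) @ evecs.T   # perturbed Laplacian

    adj = np.array([[0, 1, 1, 0],
                    [1, 0, 1, 0],
                    [1, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
    print(localized_spectral_augment(adj).round(2))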
List of keywords
Machine Learning -> ML: Semi-supervised learning
Machine Learning -> ML: Active learning
Machine Learning -> ML: Classification
Machine Learning -> ML: Multi-task and transfer learning
1018
Scaling Up Unbiased Search-based Symbolic Regression
Paul Kahlmeyer, Joachim Giesen, Michael Habeck, Henrik Voigt
12 min. talk | August 8th at 15:00 | Session: ML: Machine Learning (6/6)
[+] More
[-] Less
In a regression task, a function is learned from labeled data to predict the labels at new data points. The goal is to achieve small prediction errors. In symbolic regression, the goal is more ambitious, namely, to learn an interpretable function that makes small prediction errors. This additional goal largely rules out the standard approach used in regression, that is, reducing the learning problem to learning parameters of an expansion of basis functions by optimization. Instead, symbolic regression methods search for a good solution in a space of symbolic expressions. To cope with the typically vast search space, most symbolic regression methods make implicit, or sometimes even explicit, assumptions about its structure. Here, we argue that the only obvious structure of the search space is that it contains small expressions, that is, expressions that can be decomposed into a few subexpressions. We show that systematically searching spaces of small expressions finds solutions that are more accurate and more robust against noise than those obtained by state-of-the-art symbolic regression methods. In particular, systematic search outperforms state-of-the-art symbolic regressors in terms of its ability to recover the true underlying symbolic expressions on established benchmark data sets.
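A toy illustration of systematically enumerating small expressions (the grammar and scoring below are assumptions, far simpler than the paper's method): generate all expression trees up to a few operators and keep the one with the lowest error on the data.

    import itertools
    import numpy as np

    X = np.linspace(-2, 2, 50)
    y = X * X + np.sin(X)                 # target to recover

    UNARY = {'sin': np.sin, 'cos': np.cos}
    BINARY = {'+': np.add, '*': np.multiply}

    def expressions(size):
        """Yield (name, values) for all expression trees with `size` operators."""
        if size == 0:
            yield 'x', X
            return
        for k in range(size):
            for (ln, lv), (rn, rv) in itertools.product(
                    expressions(k), expressions(size - 1 - k)):
                for op, fn in BINARY.items():
                    yield f'({ln} {op} {rn})', fn(lv, rv)
        for name, val in expressions(size - 1):
            for op, fn in UNARY.items():
                yield f'{op}({name})', fn(val)

    best = min((e for s in range(4) for e in expressions(s)),
               key=lambda e: float(np.mean((e[1] - y) ** 2)))
    print(best[0])  # recovers the target, e.g. ((x * x) + sin(x))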
List of keywords
Machine Learning -> ML: Symbolic methods
Machine Learning -> ML: Explainable/Interpretable machine learning
Machine Learning -> ML: Regression
Search -> S: Search and machine learning
1019
Bridging LiDAR Gaps: A Multi-LiDARs Domain Adaptation Dataset for 3D Semantic Segmentation
Shaoyang Chen, Bochun Yang, Yan Xia, Ming Cheng, Siqi Shen, Cheng Wang
12 min. talk | August 6th at 15:00 | Session: CV: 3D computer vision (2/2)
[+] More
[-] Less
We focus on the domain adaptation problem for 3D semantic segmentation, addressing the challenge of data variability in point clouds collected by different LiDARs. Existing benchmarks often mix different types of datasets, which blurs and complicates segmentation evaluations. Here, we introduce a Multi-LiDARs Domain Adaptation Segmentation (MLDAS) dataset, which contains point-wise semantically annotated point clouds captured simultaneously by a 128-beam LiDAR, a 64-beam LiDAR, and a 32-beam LiDAR. We select 31,875 scans from two representative scenarios: campus and urban street. Furthermore, we evaluate current 3D segmentation unsupervised domain adaptation methods on the proposed dataset and propose the Hierarchical Segmentation Network with Spatial Consistency (HSSC) as a novel knowledge transfer method that significantly mitigates the domain gap using spatial-temporal consistency constraints. Extensive experiments show that HSSC greatly improves the state-of-the-art cross-domain semantic segmentation methods. Our project is available at https://sychen320.github.io/projects/MLDAS.
List of keywords
Computer Vision -> CV: 3D computer vision
1025
A De-singularity Subgradient Approach for the Extended Weber Location Problem
Zhao-Rong Lai, Xiaotian Wu, Liangda Fang, Ziliang Chen
6 min. talk | August 9th at 10:00 | Session: ML: Optimization
[+] More
[-] Less
The extended Weber location problem is a classical optimization problem that has recently inspired new work in several machine learning scenarios. However, most existing algorithms may get stuck due to the singularity at the data points when the power of the cost function satisfies 1 ≤ q < 2, such as the widely-used iterative Weiszfeld approach. In this paper, we establish a de-singularity subgradient approach for this problem. We also provide a complete proof of convergence, which fixes some incomplete statements in the proofs of some previous Weiszfeld algorithms. Moreover, we deduce a new theoretical result of superlinear convergence for the iteration sequence in a special case where the minimum point is a singular point. We conduct extensive experiments in a real-world machine learning scenario to show that the proposed approach solves the singularity problem, produces the same results as in the non-singularity cases, and shows a reasonable rate of linear convergence. The results also indicate that the q-th power case (1 < q < 2) is more advantageous than the first-power and second-power cases in some situations. Hence the de-singularity subgradient approach is beneficial to advancing both theory and practice for the extended Weber location problem.
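For reference, the classical Weiszfeld iteration that the paper improves on, shown for q = 1 (the geometric median); the naive update is undefined whenever the iterate lands exactly on a data point, which is precisely the singularity the de-singularity subgradient addresses. The subgradient step itself is not reproduced here.

    import numpy as np

    def weiszfeld(points, iters=100, eps=1e-12):
        """Naive Weiszfeld iteration for the geometric median (q = 1):
        x <- sum_i(a_i / d_i) / sum_i(1 / d_i), with d_i = ||x - a_i||.
        Breaks down when x hits a data point (d_i = 0), hence the guard."""
        x = points.mean(0)                    # start from the centroid
        for _ in range(iters):
            d = np.linalg.norm(points - x, axis=1)
            if np.any(d < eps):               # singular point reached: stop
                break
            w = 1.0 / d
            x = (w[:, None] * points).sum(0) / w.sum()
        return x

    pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
    print(weiszfeld(pts))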
List of keywords
Machine Learning -> ML: Optimization
Constraint Satisfaction and Optimization -> CSO: Solvers and tools
Constraint Satisfaction and Optimization -> CSO: Other
1036
LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs
Taeho Kim, Yanming Wang, Vatshank Chaturvedi, Lokesh Gupta, Seyeon Kim, Yongin Kwon, Sangtae Ha
12 min. talk | August 9th at 10:00 | Session: NLP: Resources and evaluation
[+] More
[-] Less
Fine-tuning pre-trained large language models (LLMs) with limited hardware presents challenges due to GPU memory constraints. Various distributed fine-tuning methods have been proposed to alleviate memory constraints on GPU. However, determining the most effective method for achieving rapid fine-tuning while preventing GPU out-of-memory issues in a given environment remains unclear. To address this challenge, we introduce LLMem, a solution that estimates the GPU memory consumption when applying distributed fine-tuning methods across multiple GPUs and identifies the optimal method. We conduct GPU memory usage estimation prior to fine-tuning, leveraging the fundamental structure of transformer-based decoder models and the memory usage distribution of each method. Experimental results show that LLMem accurately estimates peak GPU memory usage on a single GPU, with an error rate of up to 1.6%. Additionally, it shows an average error rate of 3.0% when applying distributed fine-tuning methods to LLMs with more than a billion parameters on multi-GPU setups.
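For intuition about the quantity LLMem estimates, here is a back-of-the-envelope calculation for full fine-tuning with Adam; the constants are standard rules of thumb (fp16 weights and gradients, fp32 master weights, and two fp32 optimizer states), not LLMem's actual model, which also accounts for activations and the distribution strategy.

    def finetune_memory_gib(n_params, dtype_bytes=2, optimizer_states=2,
                            master_weights=True):
        """Rough per-GPU memory for full fine-tuning with Adam: weights and
        gradients in training dtype, plus fp32 master weights and two fp32
        optimizer states (momentum and variance)."""
        weights = n_params * dtype_bytes
        grads = n_params * dtype_bytes
        master = n_params * 4 if master_weights else 0
        opt = n_params * 4 * optimizer_states
        return (weights + grads + master + opt) / 1024**3

    # A hypothetical 1.3B-parameter model in fp16 with Adam:
    print(f"{finetune_memory_gib(1.3e9):.1f} GiB")  # ~19.4 GiB before activations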
List of keywords
Natural Language Processing -> NLP: Language models
Machine Learning -> ML: Deep learning architectures
1039
Markov Constraint as Large Language Model Surrogate
Alexandre Bonlarron, Jean-Charles Régin
6 min. talk | August 8th at 10:00 | Session: CSO: Constraint Satisfaction and Optimization
[+] More
[-] Less
This paper presents NgramMarkov, a variant of the Markov constraints, dedicated to text generation in constraint programming (CP). It involves a set of n-grams (i.e., sequences of n words) associated with probabilities given by a large language model (LLM), and it limits the product of the probabilities of the n-grams of a sentence. The propagator of this constraint can be seen as an extension of the ElementaryMarkov constraint propagator, incorporating the LLM distribution instead of the maximum likelihood estimation of n-grams. It uses a gliding threshold, i.e., it rejects n-grams whose local probabilities are too low, to guarantee balanced solutions. It can also be combined with a "look-ahead" approach to remove n-grams that are very unlikely to lead to acceptable sentences for a fixed-length horizon. This idea is based on the MDDMarkovProcess constraint propagator, but without explicitly using an MDD (Multi-Valued Decision Diagram). The experimental results show that the generated text is valued similarly to the LLM perplexity function. Using this new constraint dramatically reduces the number of candidate sentences produced, improves computation times, and allows larger corpora or smaller n-grams to be used. A real-world problem has been solved for the first time using 4-grams instead of 5-grams.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint programming
Constraint Satisfaction and Optimization -> CSO: Applications
Constraint Satisfaction and Optimization -> CSO: Modeling
Natural Language Processing -> NLP: Language generation
1042
Temporal Inductive Logic Reasoning over Hypergraphs
Yuan Yang, Siheng Xiong, Ali Payani, James C. Kerce, Faramarz Fekri
6 min. talk | August 7th at 11:30 | Session: KRR: Logic programming
Inductive logic reasoning is a fundamental task in graph analysis that aims to generalize patterns from data. It has been extensively studied for traditional graph representations, such as knowledge graphs (KGs), using techniques like inductive logic programming (ILP). Existing ILP methods assume learning from KGs with static facts and binary relations. Beyond KGs, graph structures are widely present in other applications such as procedural instructions, scene graphs, and program executions. While ILP would benefit these applications, applying it to such graphs is nontrivial: they are more complex than KGs, typically involving timestamps and n-ary relations, and are effectively hypergraphs with temporal events. In this work, we propose temporal inductive logic reasoning (TILR), an ILP method that reasons over temporal hypergraphs. To enable hypergraph reasoning, we introduce the multi-start random B-walk, a novel graph traversal method for hypergraphs. By combining it with a path-consistency algorithm, TILR learns logic rules by generalizing from both temporal and relational data. To address the lack of hypergraph benchmarks, we create and release two temporal hypergraph datasets: YouCook2-HG and nuScenes-HG. Experiments on these benchmarks demonstrate that TILR achieves superior reasoning capability over various strong baselines.
List of keywords
Knowledge Representation and Reasoning -> KRR: Logic programming
Data Mining -> DM: Knowledge graphs and knowledge base completion
1043
Imperfect-Recall Games: Equilibrium Concepts and Their Complexity
Emanuel Tewolde, Brian Hu Zhang, Caspar Oesterheld, Manolis Zampetakis, Tuomas Sandholm, Paul Goldberg, Vincent Conitzer
6 min. talk | August 8th at 15:00 | Session: GTEP: Game Theory and Economic Paradigms
We investigate optimal decision making under imperfect recall, that is, when an agent forgets information it once held. Examples include the absentminded driver game and team games in which the members have limited communication capabilities. In the framework of extensive-form games with imperfect recall, we analyze the computational complexity of finding equilibria in multiplayer settings across three solution concepts: Nash, multiselves based on evidential decision theory (EDT), and multiselves based on causal decision theory (CDT). We are interested in both exact and approximate solution computation. As special cases, we consider (1) single-player games, (2) two-player zero-sum games and relationships to maximin values, and (3) games without exogenous stochasticity (chance nodes). We relate these problems to the complexity classes PPAD, PLS, Σ_2^P, ∃R, and ∃∀R.
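For a concrete feel of imperfect recall, the classic absentminded driver game (Piccione and Rubinstein) mentioned in the abstract can be solved numerically in a few lines; the 0/4/1 payoffs are the standard textbook values:

    import numpy as np

    # Absentminded driver: the driver cannot tell the first exit from the second.
    # Payoffs: exit at the first exit = 0, exit at the second = 4, drive past both = 1.
    # Imperfect recall forces one continue-probability p at both nodes, so
    #   U(p) = (1 - p) * 0 + p * (1 - p) * 4 + p**2 * 1
    p = np.linspace(0, 1, 10001)
    U = 4 * p * (1 - p) + p**2
    best = p[np.argmax(U)]
    print(f"optimal p = {best:.3f}, U = {U.max():.3f}")   # p* = 2/3, U* = 4/3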
List of keywords
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
1062
Self-supervised Weighted Information Bottleneck for Multi-view Clustering
Zhengzheng Lou, Chaoyang Zhang, Hang Xue, Yangdong Ye, Qinglei Zhou, Shizhe Hu
6 min. talk | August 6th at 15:00 | Session: ML: Multi-view learning
Multi-view clustering (MVC) is a long-standing topic in the machine learning and data mining communities, focusing on investigating and utilizing the relationships among views to discover a consistent final cluster structure. Weighted MVC is a popular family of methods that learns and applies a weight/importance for each view to fully explore the complementary information across views. However, most existing weighted MVC methods only consider the quality of each view, ignoring the vital role of pseudo-label self-supervision information in weight learning. In this work, we propose a novel self-supervised weighted information bottleneck (SWIB) method for the multi-view clustering problem. It combines the weighted information from different views based on information bottleneck theory, with a newly designed view-weight learning mechanism that simultaneously takes into account both the quality of the information contained in each view and the self-supervised information from each view's data partition. Experimental results on multi-view text, multi-feature image, multi-angle video, and multi-modal text-image datasets, as well as large-scale datasets, show the superiority of the SWIB method. To our knowledge, this is the first work to incorporate self-supervised learning into weighted multi-view clustering.
List of keywords
Machine Learning -> ML: Multi-view learning
Machine Learning -> ML: Clustering
Machine Learning -> ML: Multi-modal learning
Machine Learning -> ML: Unsupervised learning
1074
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning
Zheqi He, Xinya Wu, Pengfei Zhou, Richeng Xuan, Guang Liu, Xi Yang, Qiannan Zhu, Hua Huang
6 min. talk | August 7th at 10:00 | Session: CV: Multimodal learning
Multi-modal large language models (MLLMs) have achieved remarkable progress and demonstrated powerful knowledge comprehension and reasoning abilities. However, the mastery of domain-specific knowledge, which is essential for evaluating the intelligence of MLLMs, continues to be a challenge. Current multi-modal benchmarks for domain-specific knowledge concentrate on multiple-choice questions and are predominantly available in English, which limits the comprehensiveness of the evaluation. To this end, we introduce CMMU, a novel benchmark for multi-modal and multi-type question understanding and reasoning in Chinese. CMMU consists of 3,603 questions in 7 subjects, covering knowledge from primary to high school. The questions can be categorized into 3 types: multiple-choice, multiple-response, and fill-in-the-blank, bringing greater challenges to MLLMs. In addition, we propose an evaluation strategy called Positional Error Variance for assessing multiple-choice questions, which performs a quantitative analysis of position bias. We evaluate seven open-source MLLMs along with GPT-4V, Gemini-Pro, and Qwen-VL-Plus. The results demonstrate that CMMU poses a significant challenge to recent MLLMs. The data and code are available at https://github.com/FlagOpen/CMMU.
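One plausible reading of the Positional Error Variance idea (the paper's exact definition may differ): rotate the correct option through the answer positions and measure how accuracy varies with position:

    import statistics

    def positional_error_variance(acc_by_position):
        """Evaluate each multiple-choice question several times with the correct
        option rotated through positions A-D, then measure how accuracy varies
        with the answer's position. High variance = strong position bias.
        Illustrative interpretation only."""
        return statistics.pvariance(acc_by_position)

    print(positional_error_variance([0.62, 0.55, 0.48, 0.41]))  # biased toward early slots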
List of keywords
Computer Vision -> CV: Multimodal learning
Multidisciplinary Topics and Applications -> MTA: Education
1086
IMM: An Imitative Reinforcement Learning Approach with Predictive Representation Learning for Automatic Market Making
Hui Niu, Siyuan Li, Jiahao Zheng, Zhouchi Lin, Bo An, Jian Li, Jian Guo
12 min. talk | August 7th at 10:00 | Session: MTA: Finance
Market making (MM) via reinforcement learning (RL) has attracted significant attention in financial trading. Most existing RL-based MM methods focus on optimizing single-price-level strategies, which suffer from frequent order cancellations and loss of queue priority. Strategies involving multiple price levels align better with actual trading scenarios, but the comprehensive trading action space that multi-price-level RL strategies entail makes effective training challenging. Inspired by the workflow of professional human market makers, we propose Imitative Market Maker (IMM), a novel RL framework that leverages knowledge from both suboptimal signal-based experts and direct policy interactions. Our framework starts by introducing effective state and action formulations that encode information about multi-price-level orders. Furthermore, IMM integrates a representation learning unit capable of capturing both short- and long-term market trends to mitigate adverse selection risk. Subsequently, IMM designs an expert strategy based on predictive signals and trains the agent through the integration of RL and imitation learning techniques to achieve efficient learning. Extensive experimental results on four real-world market datasets demonstrate the superiority of IMM over current RL-based MM strategies.
List of keywords
Multidisciplinary Topics and Applications -> MTA: Finance
Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Representation learning
1093
Linear-Time Optimal Deadlock Detection for Efficient Scheduling in Multi-Track Railway Networks
Hastyn Doshi, Ayush Tripathi, Keshav Agarwal, Harshad Khadilkar, Shivaram Kalyanakrishnan
6 min. talk | August 6th at 15:00 | Session: MTA: Transportation
The railway scheduling problem requires the computation of an operable timetable that satisfies constraints involving railway infrastructure and resource occupancy times, while minimising average delay over a set of events. Since this problem is computationally hard, practical solutions typically roll out feasible (but suboptimal) schedules one step at a time, by choosing which train to move next in every step. The choices made by such algorithms are necessarily myopic, and incur the risk of driving the system to a deadlock. To escape deadlocks, the predominant approach is to stay away from states flagged as potentially unsafe by some fast-to-compute rule R. While many choices of R guarantee deadlock avoidance, they are suboptimal in the sense of also flagging some safe states as unsafe. In this paper, we revisit the literature on process scheduling and describe a rule R0 that (i) is necessary and sufficient for deadlock detection when the network has at least two tracks in each resource (station / track section), (ii) is computable in linear time, and (iii) yields lower delays when combined with existing scheduling algorithms on both synthetic and real data sets from Indian Railways.
List of keywords
Multidisciplinary Topics and Applications -> MTA: Transportation
Planning and Scheduling -> PS: Applications
Planning and Scheduling -> PS: Markov decisions processes
Planning and Scheduling -> PS: Scheduling
1095
A Lightweight U-like Network Utilizing Neural Memory Ordinary Differential Equations for Slimming the Decoder
Quansong He, Xiaojun Yao, Jun Wu, Zhang Yi, Tao He
6 min. talk | August 8th at 10:00 | Session: CV: Segmentation
In recent years, advanced U-like networks have demonstrated remarkable performance in medical image segmentation tasks. However, their drawbacks, including excessive parameters, high computational complexity, and slow inference speed, pose challenges for practical implementation in scenarios with limited computational resources. Existing lightweight U-like networks alleviate some of these problems, but they often have pre-designed structures and consist of non-detachable modules, limiting their application scenarios. In this paper, we propose three plug-and-play decoders based on different discretization methods of the neural memory ordinary differential equation (nmODE). These decoders integrate features at various levels of abstraction by processing information from skip connections and performing numerical operations on upward paths. Through experiments on the PH2, ISIC2017, and ISIC2018 datasets, we embed these decoders into different U-like networks, demonstrating their effectiveness in significantly reducing the number of parameters and computation while maintaining performance. In summary, the proposed discretized nmODE decoders reduce the number of parameters by about 20% ~ 50% and computation by up to 74%, while being adaptable to all U-like networks. Our code is available at https://github.com/nayutayuki/Lightweight-nmODE-Decoders-For-U-like-networks.
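As a sketch of what a discretized nmODE stage might compute, assuming the commonly cited nmODE form y' = -y + sin^2(y + gamma) (stated from memory; verify against the paper) and a simple explicit-Euler discretization:

    import numpy as np

    def nmode_euler_step(y, gamma, h=0.1):
        """One explicit-Euler step of the assumed nmODE y' = -y + sin^2(y + gamma).
        A decoder stage would apply a few such steps to skip-connection features
        gamma, so the 'layer' has no weight matrices of its own."""
        return y + h * (-y + np.sin(y + gamma) ** 2)

    y = np.zeros(8)
    gamma = np.random.randn(8)           # stands in for encoder/skip-connection features
    for _ in range(20):
        y = nmode_euler_step(y, gamma)
    print(y)                             # moves toward a gamma-dependent fixed point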
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Biomedical image analysis
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Convolutional networks
1098
Common-Individual Semantic Fusion for Multi-View Multi-Label Learning
Gengyu Lyu, Weiqi Kang, Haobo Wang, Zheng Li, Zhen Yang, Songhe Feng
6 min. talk | August 9th at 11:30 | Session: ML: Multi-label learning
In multi-view multi-label learning, each instance is described by several heterogeneous features and is associated with multiple valid labels simultaneously. Existing methods mainly focus on feature-level view fusion to capture a common representation for multi-label classifier induction. In this paper, we take a new perspective and propose a semantic-level fusion model named Common-Individual Semantic Fusion Multi-View Multi-Label Learning (CISF). Unlike previous feature-level fusion models, our method focuses directly on semantic-level view fusion and simultaneously takes into consideration both the common semantics shared across views and the individual semantics of each specific view. Specifically, we assume each view involves some common semantic labels while owning a few exclusive semantic labels. The common and exclusive semantic labels are then forced, respectively, to be consistent and diverse, so as to exploit the consistency and complementarity among different views. Afterwards, we introduce low-rank and sparse constraints to highlight the label co-occurrence relationship of common semantics and the view-specific expression of individual semantics. We provide a theoretical guarantee of the strict convexity of our method under proper parameter settings. Extensive experiments on various datasets have verified the superiority of our method.
List of keywords
Machine Learning -> ML: Multi-label learning
Machine Learning -> ML: Classification
Machine Learning -> ML: Multi-view learning
Machine Learning -> ML: Weakly supervised learning
1106
Multi-Attention Based Visual-Semantic Interaction for Few-Shot Learning
Peng Zhao, Yin Wang, Wei Wang, Jie Mu, Huiting Liu, Cong Wang, Xiaochun Cao
6 min. talk | August 6th at 15:00 | Session: CV: Transfer, low-shot, semi- and un- supervised learning   
Few-shot learning (FSL) aims to train a model that can generalize to recognize new classes, each with only very limited training samples. Since extracting discriminative features for new classes from few samples is challenging, existing FSL methods leverage visual and semantic prior knowledge to guide discriminative feature learning. However, for meta-learning purposes the semantic knowledge of the query set is unavailable, so query features lack discriminability. To address this problem, we propose a novel Multi-Attention based Visual-Semantic Interaction (MAVSI) approach for FSL. Specifically, we utilize spatial and channel attention mechanisms to select discriminative visual features for the support set based on its ground-truth semantics, while using all the support-set semantics for each query-set sample. Then, a relation module with class prototypes of the support set is employed to supervise and select discriminative visual features for the query set. To further enhance the discriminability of the support set, we introduce a visual-semantic contrastive learning module to promote the similarity between visual features and their corresponding semantic features. Extensive experiments on four benchmark datasets demonstrate that MAVSI outperforms existing state-of-the-art FSL methods.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Machine Learning -> ML: Meta-learning
1114
Large Language Models Are Not Strong Abstract Reasoners
Gaël Gendron, Qiming Bao, Michael Witbrock, Gillian Dobbie
6 min. talk | August 9th at 11:30 | Session: NLP: Language models
Large language models have shown tremendous performance on a large variety of natural language processing tasks, ranging from text comprehension to common sense reasoning. However, the mechanisms responsible for this success remain opaque, and it is unclear whether LLMs can achieve human-like cognitive capabilities or whether these models are still fundamentally circumscribed. Abstract reasoning is a fundamental task for cognition, consisting of finding and applying a general pattern from limited data. Evaluating deep neural architectures on this task could give insight into their potential limitations regarding reasoning and their broad generalisation abilities, yet this is currently an under-explored area. In this paper, we introduce a new benchmark for evaluating language models beyond memorization on abstract reasoning tasks. We perform extensive evaluations of state-of-the-art LLMs, showing that they currently achieve very limited performance in contrast with other natural language tasks, even when applying techniques that have been shown to improve performance on other NLP tasks. We argue that guiding LLM generation to follow causal paths could help improve the generalisation and reasoning abilities of LLMs.
List of keywords
Natural Language Processing -> NLP: Language models
Machine Learning -> ML: Evaluation
Machine Learning -> ML: Robustness
Natural Language Processing -> NLP: Question answering
1120
Toward a Manifold-Preserving Temporal Graph Network in Hyperbolic Space
Viet Quan Le, Viet Cuong Ta
12 min. talk | August 8th at 15:00 | Session: ML: Sequence and graph learning
Hyperbolic geometry provides an ideal setting for naturally representing the scale-free or hierarchical characteristics of an input graph. Utilizing hyperbolic geometry to learn dynamic graph representations has gained growing interest in recent years. However, the majority of hyperbolic approaches rely on tangent spaces to perform graph operations, which can distort the structure of a dynamic graph as it grows over time. To avoid this tangent-space distortion, we propose a Hyperbolic Manifold-Preserving Temporal Graph Network that works directly on the hyperbolic manifold. The model includes a graph convolution module for learning spatial dependencies, an attention architecture for capturing temporal properties, and a gated recurrent unit for extracting spatio-temporal relationships. Evaluated on diverse real-world dynamic graphs, our model achieves significant improvements in link prediction and new-link prediction tasks in comparison with other baselines.
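For reference, the manifold quantity such a model works with directly, rather than via tangent-space round-trips, is the Poincare-ball geodesic distance:

    import numpy as np

    def poincare_dist(x, y, eps=1e-9):
        """Geodesic distance on the Poincare ball (curvature -1), a standard model
        of hyperbolic space: d(x, y) = arccosh(1 + 2||x-y||^2 / ((1-||x||^2)(1-||y||^2))).
        Inputs must lie strictly inside the unit ball."""
        sq = np.sum((x - y) ** 2)
        denom = (1 - np.sum(x**2)) * (1 - np.sum(y**2))
        return np.arccosh(1 + 2 * sq / max(denom, eps))

    print(poincare_dist(np.array([0.1, 0.0]), np.array([0.0, 0.8])))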
List of keywords
Machine Learning -> ML: Sequence and graph learning
Machine Learning -> ML: Geometric learning
Data Mining -> DM: Mining graphs
Data Mining -> DM: Mining spatial and/or temporal data
1125
Continual Compositional Zero-Shot Learning
Yang Zhang, Songhe Feng, Jiazheng Yuan
6 min. talk | August 6th at 15:00 | Session: CV: Transfer, low-shot, semi- and un- supervised learning   
Compositional zero-shot learning (CZSL) aims to recognize unseen compositions using knowledge learned from seen compositions, where each composition consists of two primitives (attribute and object). However, existing CZSL methods are designed to learn compositions from a fixed primitive set and cannot handle the continually expanding primitive sets of real-world applications. In this paper, we propose a new CZSL setting, named continual compositional zero-shot learning (CCZSL), which requires the model to recognize unseen compositions composed of learned primitives while the learned primitive set continually grows. Contextuality and catastrophic forgetting are the main issues to be addressed in this setting. Specifically, we capture similar contextuality in compositions through several learnable Super-Primitives that modify the invariant primitive embedding to better adapt to the contextuality of the corresponding composition. We then introduce a dual knowledge distillation loss that maintains knowledge learned in previous sessions while avoiding overfitting to the new session. We design the CCZSL evaluation protocol and conduct extensive experiments on widely used benchmarks, demonstrating the superiority of our method compared to state-of-the-art CZSL methods.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Machine Learning -> ML: Incremental learning
1129
Causality-enhanced Discreted Physics-informed Neural Networks for Predicting Evolutionary Equations
Ye Li, Siqi Chen, Bin Shan, Sheng-Jun Huang
6 min. talk | August 6th at 11:30 | Session: ML: Applications
Physics-informed neural networks (PINNs) have shown promising potential for solving partial differential equations (PDEs) using deep learning. However, PINNs face training difficulties for evolutionary PDEs, particularly for dynamical systems whose solutions exhibit multi-scale or turbulent behavior over time. The reason is that PINNs may violate the temporal causality property since all the temporal features in the PINNs loss are trained simultaneously. This paper proposes to use implicit time differencing schemes to enforce temporal causality, and use transfer learning to sequentially update the PINNs in space as surrogates for PDE solutions in different time frames. The evolving PINNs are better able to capture the varying complexities of the evolutionary equations, while only requiring minor updates between adjacent time frames. Our method is theoretically proven to be convergent if the time step is small and each PINN in different time frames is well-trained. In addition, we provide state-of-the-art (SOTA) numerical results for a variety of benchmarks for which existing PINNs formulations may fail or be inefficient. We demonstrate that the proposed method improves the accuracy of PINNs approximation for evolutionary PDEs and improves efficiency by a factor of 4–40x. The code is available at https://github.com/SiqiChen9/TL-DPINNs.
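As a generic illustration (not necessarily the paper's exact scheme), a backward-Euler discretization of an evolutionary PDE u_t = N[u] turns each time frame into its own PINN training target, so information flows only forward in time:

    % Backward-Euler (implicit) residual for u_t = N[u], frame n -> n+1:
    \[
        \mathcal{L}_{n+1}(\theta) \;=\; \frac{1}{M}\sum_{j=1}^{M}
        \left\| \, u_\theta(x_j) \;-\; u^{n}(x_j) \;-\; \Delta t\,
        \mathcal{N}\!\left[u_\theta\right](x_j) \right\|^2
    \]
    % u^n is the frozen PINN from the previous frame (also used, via transfer
    % learning, to warm-start u_theta); minimizing L_{n+1} only after frame n
    % is trained is what enforces temporal causality.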
List of keywords
Machine Learning -> ML: Applications
Machine Learning -> ML: Causality
Machine Learning -> ML: Deep learning architectures
Machine Learning -> ML: Regression
1133
SGDCL: Semantic-Guided Dynamic Correlation Learning for Explainable Autonomous Driving
Chengtai Cao, Xinhong Chen, Jianping Wang, Qun Song, Rui Tan, Yung-Hui Li
6 min. talk | August 9th at 11:30 | Session: CV: Computer Vision (1/2)
By learning expressive representations, deep learning (DL) has revolutionized autonomous driving (AD). Despite significant advancements, the inherent opacity of DL models engenders public distrust, impeding their widespread adoption. For explainable autonomous driving, current studies primarily concentrate on extracting features from input scenes to predict driving actions and their corresponding explanations. However, these methods underutilize semantics and correlation information within actions and explanations (collectively called categories in this work), leading to suboptimal performance. To address this issue, we propose Semantic-Guided Dynamic Correlation Learning (SGDCL), a novel approach that effectively exploits semantic richness and dynamic interactions intrinsic to categories. SGDCL employs a semantic-guided learning module to obtain category-specific representations and a dynamic correlation learning module to adaptively capture intricate correlations among categories. Additionally, we introduce an innovative loss term to leverage fine-grained co-occurrence statistics of categories for refined regularization. We extensively evaluate SGDCL on two well-established benchmarks, demonstrating its superiority over seven state-of-the-art baselines and a large vision-language model. SGDCL significantly promotes explainable autonomous driving with up to 15.3% performance improvement and interpretable attention scores, bolstering public trust in AD.
List of keywords
Computer Vision -> CV: Interpretability and transparency
Machine Learning -> ML: Explainable/Interpretable machine learning
Machine Learning -> ML: Classification
Computer Vision -> CV: Machine learning for vision
1138
Accelerating Diffusion Models for Inverse Problems through Shortcut Sampling
Gongye Liu, Haoze Sun, Jiayi Li, Fei Yin, Yujiu Yang
6 min. talk | August 8th at 15:00 | Session: CV: Image and video synthesis and generation (2/2)
Diffusion models have recently demonstrated an impressive ability to address inverse problems in an unsupervised manner. While existing methods primarily focus on modifying the posterior sampling process, the potential of the forward process remains largely unexplored. In this work, we propose Shortcut Sampling for Diffusion (SSD), a novel approach for solving inverse problems in a zero-shot manner. Instead of initiating from random noise, the core idea of SSD is to find a specific transitional state that bridges the measurement image y and the restored image x. By utilizing the shortcut path "input – transitional state – output", SSD achieves precise restoration with fewer steps. To derive the transitional state during the forward process, we introduce Distortion Adaptive Inversion. Moreover, we apply back projection as an additional consistency constraint during the generation process. Experimentally, we demonstrate SSD's effectiveness on multiple representative image restoration (IR) tasks. Our method achieves competitive results with only 30 NFEs compared to state-of-the-art zero-shot methods (100 NFEs), and outperforms them with 100 NFEs on certain tasks. Code is available at https://github.com/GongyeLiu/SSD.
List of keywords
Computer Vision -> CV: Image and video synthesis and generation 
Computer Vision -> CV: Applications
1163
Expressiveness is Effectiveness: Self-supervised Fashion-aware CLIP for Video-to-Shop Retrieval
Likai Tian, Zhengwei Yang, Zechao Hu, Hao Li, Yifang Yin, Zheng Wang
6 min. talk | August 7th at 10:00 | Session: CV: Image and video retrieval 
The rise of online shopping and social media has spurred the video-to-shop retrieval (VSR) task, which involves identifying fashion items (e.g., clothing) in videos and matching them with identical products provided by stores. In real-world scenarios, human movement in dynamic video scenes can cause substantial morphological alterations of fashion items, including occlusion, shifting viewpoints (parallax), and partial visibility (truncation). As a result, the few high-quality frames are overwhelmed by a vast number of redundant ones, making retrieval less effective. To this end, this paper introduces Self-supervised Fashion-aware CLIP (SF-CLIP), a framework for effective VSR. SF-CLIP enables the discovery of salient frames with high fashion expressiveness by generating pseudo-labels that assess the three key aspects of fashion expressiveness: occlusion, parallax, and truncation. With such pseudo-labels, the ability of CLIP is extended to facilitate the discovery of salient frames. Furthermore, to obtain comprehensive representations across salient frames, a dual-branch graph-based fusion module is proposed to extract and integrate inter-frame features. Extensive experiments demonstrate the superiority of SF-CLIP over state-of-the-art methods.
List of keywords
Computer Vision -> CV: Image and video retrieval 
Computer Vision -> CV: Interpretability and transparency
1175
Sub-Adjacent Transformer: Improving Time Series Anomaly Detection with Reconstruction Error from Sub-Adjacent Neighborhoods
Wenzhen Yue, Xianghua Ying, Ruohao Guo, DongDong Chen, Ji Shi, Bowei Xing, Yuqing Zhu, Taiyan Chen
12 min. talk | August 8th at 11:30 | Session: DM: Anomaly/outlier detection
In this paper, we present the Sub-Adjacent Transformer with a novel attention mechanism for unsupervised time series anomaly detection. Unlike previous approaches that rely on all the points within some neighborhood for time point reconstruction, our method restricts the attention to regions not immediately adjacent to the target points, termed sub-adjacent neighborhoods. Our key observation is that owing to the rarity of anomalies, they typically exhibit more pronounced differences from their sub-adjacent neighborhoods than from their immediate vicinities. By focusing the attention on the sub-adjacent areas, we make the reconstruction of anomalies more challenging, thereby enhancing their detectability. Technically, our approach concentrates attention on the non-diagonal areas of the attention matrix by enlarging the corresponding elements in the training stage. To facilitate the implementation of the desired attention matrix pattern, we adopt linear attention because of its flexibility and adaptability. Moreover, a learnable mapping function is proposed to improve the performance of linear attention. Empirically, the Sub-Adjacent Transformer achieves state-of-the-art performance across six real-world anomaly detection benchmarks, covering diverse fields such as server monitoring, space exploration, and water treatment.
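A sketch of the intended sparsity pattern; the paper softens this by enlarging non-diagonal attention weights during training rather than hard masking, and the skip/span values here are made up for illustration:

    import numpy as np

    def sub_adjacent_mask(n, skip=5, span=20):
        """Binary attention mask in the spirit of the Sub-Adjacent Transformer:
        each time point may attend only to neighbors at least `skip` and at most
        `skip + span` steps away, excluding its immediate vicinity (the diagonal
        band)."""
        idx = np.arange(n)
        dist = np.abs(idx[:, None] - idx[None, :])
        return ((dist >= skip) & (dist <= skip + span)).astype(float)

    M = sub_adjacent_mask(100)
    print(M[50, 45:60])   # zeros within the diagonal band, ones in the sub-adjacent band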
List of keywords
Data Mining -> DM: Anomaly/outlier detection
Machine Learning -> ML: Time series and data streams
1177
Meta-Learning via PAC-Bayesian with Data-Dependent Prior: Generalization Bounds from Local Entropy
Shiyu Liu, Wei Shi, Zenglin Xu, Shaogao Lv, Yehong Zhang, Hui Wang
6 min. talk | August 8th at 15:00 | Session: ML: Machine Learning (6/6)
Meta-learning accelerates the learning process on unseen learning tasks by acquiring prior knowledge from previous related tasks. PAC-Bayesian theory provides a theoretical framework for analyzing the generalization of meta-learning to unseen tasks. However, previous works encounter two notable limitations: (1) they merely focus on data-free priors, which often results in inappropriate regularization and loose generalization bounds; (2) more importantly, their optimization process usually involves nested optimization problems, incurring significant computational costs. To address these issues, we derive new generalization bounds and introduce a novel PAC-Bayesian framework for meta-learning that integrates data-dependent priors. This framework enables the extraction of optimal posteriors for each task in closed form, thereby allowing us to minimize generalization bounds that incorporate data-dependent priors with only a simple local-entropy term. The resulting algorithm, which employs SGLD to sample from the optimal posteriors, is stable, efficient, and computationally lightweight, eliminating the need for nested optimization. Extensive experimental results demonstrate that our proposed method outperforms the baselines.
List of keywords
Machine Learning -> ML: Bayesian learning
Machine Learning -> ML: Meta-learning
1185
DTS-TPT: Dual Temporal-Sync Test-time Prompt Tuning for Zero-shot Activity Recognition
Rui Yan, Hongyu Qu, Xiangbo Shu, Wenbin Li, Jinhui Tang, Tieniu Tan
6 min. talk | August 6th at 11:30 | Session: CV: Video analysis and understanding   
Fine-tuning large vision-language models on video data with a set of learnable prompts has shown promising performance on zero-shot activity recognition, but still requires extra video data and incurs expensive training costs. Inspired by recent test-time prompt tuning (TPT) in the image domain, this work extends TPT to video data for zero-shot activity recognition. However, monotonous spatial augmentation and short class names cannot capture the diverse and complicated semantics of human behavior during prompt tuning. To this end, this work proposes a Dual Temporal-Sync Test-time Prompt Tuning (DTS-TPT) framework for zero-shot activity recognition. DTS-TPT tunes the learnable prompts appended to text inputs on video feature sequences of different temporal scales in multiple steps during test time. In each tuning step, we encourage semantic consistency among the predictions from video feature sequences randomly augmented via AugMix, using both the original class names and the corresponding descriptions generated through an LLM. Compared with state-of-the-art methods, the proposed method improves zero-shot top-1 accuracy by approximately 2% ~ 5% on popular benchmarks. The code is available at https://github.com/quhongyu/DTS-TPT.
List of keywords
Computer Vision -> CV: Video analysis and understanding   
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
1194
On the Effects of Fairness to Adversarial Vulnerability
Cuong Tran, Keyu Zhu, Pascal Van Hentenryck, Ferdinando Fioretto
6 min. talk | August 8th at 11:30 | Session: ETF: AI Ethics, Trust, Fairness (2/2)
Fairness and robustness are two important notions of learning models. Fairness ensures that models do not disproportionately harm (or benefit) some groups over others, while robustness measures a model's resilience against small input perturbations. While both properties are equally important, this paper illustrates a dichotomy between fairness and robustness, and analyzes when striving for fairness decreases model robustness to adversarial samples. The reported analysis sheds light on the factors causing such contrasting behavior, suggesting distance to the decision boundary across groups as a key factor. Experiments on non-linear models and different architectures validate the theoretical findings. In addition to the theoretical analysis, the paper also proposes a simple yet effective solution to construct models achieving good tradeoffs between fairness and robustness.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Fairness and diversity
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
Constraint Satisfaction and Optimization -> CSO: Constraint optimization problems
1201
Hybrid Frequency Modulation Network for Image Restoration
Yuning Cui, Mingyu Liu, Wenqi Ren, Alois Knoll
6 min. talk | August 9th at 10:00 | Session: CV: Applications
Image restoration involves recovering a high-quality image from its corrupted counterpart. This paper presents an effective and efficient framework for image restoration, termed CSNet, based on "channel + spatial" hybrid frequency modulation. Different feature channels carry different degradation patterns and degrees; however, most current networks ignore the importance of channel interactions. To alleviate this issue, we propose a frequency-based channel feature modulation module that facilitates channel interactions through a channel-dimension Fourier transform. Furthermore, based on our observations, we develop a multi-scale frequency-based spatial feature modulation module to refine the direct-current component of features using extremely lightweight learnable parameters. This module adopts a densely connected coarse-to-fine learning paradigm to enhance multi-scale representation learning. In addition, we introduce a frequency-inspired loss function to achieve omni-frequency learning. Extensive experiments on nine datasets demonstrate that the proposed network achieves state-of-the-art performance on three image restoration tasks: image dehazing, image defocus deblurring, and image desnowing. The code and models are available at https://github.com/c-yn/CSNet.
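A minimal sketch of channel-dimension Fourier modulation, with a stand-in gain vector in place of learned parameters; CSNet's actual module also includes the multi-scale spatial branch and its coarse-to-fine design:

    import numpy as np

    def channel_fft_modulation(feat, gain):
        """FFT along the channel axis, reweight the channel-frequency spectrum,
        then invert. `gain` stands in for learnable parameters."""
        spec = np.fft.fft(feat, axis=0)          # channel-dimension Fourier transform
        spec = spec * gain[:, None, None]        # per-frequency modulation
        return np.fft.ifft(spec, axis=0).real    # back to the feature domain

    C, H, W = 8, 4, 4
    x = np.random.randn(C, H, W)
    g = np.ones(C); g[0] = 1.5                   # e.g., boost the DC (channel-mean) term
    print(channel_fft_modulation(x, g).shape)    # (8, 4, 4)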
List of keywords
Computer Vision -> CV: Applications
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Representation learning
1227
Continual Multi-Objective Reinforcement Learning via Reward Model Rehearsal
Lihe Li, Ruotong Chen, Ziqian Zhang, Zhichao Wu, Yi-Chen Li, Cong Guan, Yang Yu, Lei Yuan
6 min. talk | August 7th at 15:00 | Session: ML: Reinforcement learning (1/2)
Multi-objective reinforcement learning (MORL) approaches address real-world problems with multiple objectives by learning policies maximizing returns weighted by different user preferences. Typical methods assume the objectives remain unchanged throughout the agent’s lifetime. However, in some real-world situations, the agent may encounter dynamically changing learning objectives, i.e., different vector-valued reward functions at different learning stages. This issue has not been considered in problem formulation or algorithm design. To address this issue, we formalize the setting as a continual MORL (CMORL) problem for the first time, accounting for the evolution of objectives throughout the learning process. Subsequently, we propose Continual Multi-Objective Reinforcement Learning via Reward Model Rehearsal (CORe3), incorporating a dynamic agent network for rapid adaptation to new objectives. Moreover, we develop a reward model rehearsal technique to recover the reward signals for previous objectives, thus alleviating catastrophic forgetting. Experiments on four CMORL benchmarks showcase that CORe3 effectively learns policies satisfying different preferences on all encountered objectives, and outperforms the best baseline by 171%, highlighting the capability of CORe3 to handle situations with evolving objectives.
List of keywords
Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Incremental learning
Machine Learning -> ML: Optimization
1230
DCDet: Dynamic Cross-based 3D Object Detector
Shuai Liu, Boyang Li, Zhiyu Fang, Kai Huang
6 min. talk | August 6th at 15:00 | Session: CV: 3D computer vision (2/2)
Recently, significant progress has been made in 3D object detection research. However, most prior studies have focused on center-based or anchor-based label assignment schemes, and alternative label assignment strategies remain unexplored in 3D object detection. We find that center-based label assignment often fails to generate sufficient positive samples for training, while anchor-based label assignment tends to encounter an imbalance issue when handling objects of different scales. To solve these issues, we introduce a dynamic cross label assignment (DCLA) scheme, which dynamically assigns positive samples for each object from a cross-shaped region, thus providing sufficient and balanced positive samples for training. Furthermore, to address the challenge of accurately regressing objects of varying scales, we put forth a rotation-weighted intersection over union (RWIoU) metric to replace the widely used L1 metric in the regression loss. Extensive experiments demonstrate the generality and effectiveness of our DCLA scheme and RWIoU-based regression loss. The code is available at https://github.com/Say2L/DCDet.git.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Recognition (object detection, categorization)
1231
Sparse Multi-Relational Graph Convolutional Network for Multi-type Object Trajectory Prediction
Jianhui Zhang, Jun Yao, Liqi Yan, Yanhong Xu, Zheng Wang
6 min. talk | August 6th at 11:30 | Session: CV: Video analysis and understanding   
Object trajectory prediction is an active research topic with wide applications in video surveillance and autonomous driving. Previous studies consider interaction sparsity mainly among pedestrians rather than among multiple types of objects, yet mixed object types bring new kinds of interactions and, consequently, superfluous ones. This paper proposes a Multi-type Object Trajectory Prediction (MOTP) method with a Sparse Multi-relational Graph Convolutional Network (SMGCN) and a novel multi-round Global Temporal Aggregation (GTA). MOTP introduces a novel adaptive sparsification and multi-scale division method to model interactions among multiple types of objects. It further incorporates a Sparse Multi-relational Temporal Graph to capture the temporal division of multi-type trajectories, along with a multi-round GTA mechanism that mitigates error accumulation and enhances trajectory prediction accuracy. Extensive evaluation on the ETH, UCY, and SDD datasets shows that our method outperforms typical state-of-the-art works by significant margins. Code will be available at https://github.com/sounio/SMGCN.
List of keywords
Computer Vision -> CV: Video analysis and understanding   
Computer Vision -> CV: Action and behavior recognition
1239
DenseKoopman: A Plug-and-Play Framework for Dense Pedestrian Trajectory Prediction
Xianbang Li, Yilong Ren, Han Jiang, Haiyang Yu, Yanlei Cui, Liang Xu
6 min. talk | August 8th at 10:00 | Session: CV: Motion and tracking
Pedestrian trajectory prediction has emerged as a core component of human-robot interaction and autonomous driving. Fast and accurate prediction of surrounding pedestrians aids decision making and improves safety and efficiency. However, pedestrians' future trajectories interact with those of surrounding traffic participants. As pedestrian density increases, the complexity of such interactions also increases significantly, leading to an inevitable decrease in trajectory prediction accuracy. To address this issue, we propose DenseKoopman, a plug-and-play framework for dense pedestrian trajectory prediction. Specifically, we introduce Koopman operator theory to find an embedding space in which a global linear approximation of the nonlinear pedestrian motion system is possible. By encoding historical trajectories as linear state embeddings in the Koopman space, we transform the nonlinear trajectory data of pedestrians in dense scenes. This linearized representation greatly reduces the complexity of dense pedestrian trajectory prediction. Extensive experiments on pedestrian trajectory prediction benchmarks demonstrate the superiority of the proposed framework. We also analyze the data transformation to explore how our DenseKoopman framework works with each validation method and to uncover motion patterns that may be hidden within the trajectory data. Code is available at https://github.com/lixianbang/DenseKoopman.
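The generic recipe behind Koopman-based predictors, sketched with a hand-built embedding; DenseKoopman additionally learns the embedding so that dense-crowd dynamics become (near-)linear in it:

    import numpy as np

    def fit_koopman_operator(Z):
        """Given embedded states z_1..z_T (rows of Z), fit the best linear
        operator K with z_{t+1} ~ K z_t by least squares (the DMD-style
        estimate K = Y X^+)."""
        X, Y = Z[:-1].T, Z[1:].T                  # shape (dim, T-1)
        return Y @ np.linalg.pinv(X)

    # toy check: a rotation is exactly linear, so K is recovered and
    # one-step forecasting is a single matrix-vector product
    theta = 0.1
    R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
    Z = np.stack([np.linalg.matrix_power(R, t) @ np.array([1.0, 0.0]) for t in range(50)])
    K = fit_koopman_operator(Z)
    print(np.allclose(K, R, atol=1e-8), K @ Z[-1])  # True, plus a one-step forecast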
List of keywords
Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Other
1240
Boundary-aware Decoupled Flow Networks for Realistic Extreme Rescaling
Jinmin Li, Tao Dai, Jingyun Zhang, Kang Liu, Jun Wang, Shaoming Wang, Shu-Tao Xia, Rizen Guo
12 min. talk | August 8th at 15:00 | Session: CV: Image and video synthesis and generation (2/2)
Recently developed generative methods, including invertible rescaling network (IRN) based and generative adversarial network (GAN) based methods, have demonstrated exceptional performance in image rescaling. However, IRN-based methods tend to produce over-smoothed results, while GAN-based methods easily generate fake details, which hinders their practical application. To address this issue, we propose Boundary-aware Decoupled Flow Networks (BDFlow) to generate realistic and visually pleasing results. Unlike previous methods that model high-frequency information directly as a standard Gaussian distribution, our BDFlow first decouples the high-frequency information into a semantic part that adheres to a Boundary distribution and a non-semantic part that adheres to a Gaussian distribution. Specifically, to capture the semantic high-frequency part accurately, we use a Boundary-aware Mask (BAM) to constrain the model to produce rich textures, while the non-semantic high-frequency part is randomly sampled from a Gaussian distribution. Comprehensive experiments demonstrate that BDFlow significantly outperforms other state-of-the-art methods while maintaining lower complexity. Notably, BDFlow improves PSNR by 4.4 dB and SSIM by 0.1 on average over GRAIN, using only 74% of the parameters and 20% of the computation. The code will be available at https://github.com/THU-Kingmin/BAFlow.
List of keywords
Computer Vision -> CV: Image and video synthesis and generation 
Computer Vision -> CV: Applications
1256
PointTFA: Training-Free Clustering Adaption for Large 3D Point Cloud Models
Jinmeng Wu, Chong Cao, Hao Zhang, Basura Fernando, Yanbin Hao, Hanyu Hong
6 min. talk | August 6th at 15:00 | Session: CV: 3D computer vision (2/2)
The success of contrastive learning models like CLIP, known for aligning 2D image-text pairs, has inspired the development of triplet alignment for large 3D point cloud models (3D-PCM). Examples like ULIP integrate images, text, and point clouds into a unified semantic space. However, despite showing impressive zero-shot capabilities, a frozen 3D-PCM still falls short of fine-tuned methods, especially when downstream 3D datasets differ significantly from upstream data. To address this, we propose a data-efficient, training-free 3D adaptation method named PointTFA that adjusts ULIP outputs with representative samples. PointTFA comprises the Representative Memory Cache (RMC) for selecting a representative support set, Cloud Query Refactor (CQR) for reconstructing a query cloud using the support set, and the Training-Free 3D Adapter (3D-TFA) for inferring query categories from the support set. A key advantage of PointTFA is that it introduces no extra training parameters, yet outperforms vanilla frozen ULIP and closely approaches few-shot fine-tuning methods on downstream cloud classification tasks such as ModelNet10 & 40 and ScanObjectNN. The code is available at: https://github.com/CaoChong-git/PointTFA.
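A generic training-free cache classifier in the Tip-Adapter style, to show how inference without new trainable parameters can work; PointTFA builds its cache differently (RMC for the support set, CQR for query refactoring), so this is only a sketch of the shared idea:

    import numpy as np

    def cache_classify(query_feat, support_feats, support_labels, n_classes, beta=5.0):
        """Score a query by its similarity to a cached support set and aggregate
        one-hot labels; no gradient step is ever taken."""
        s = support_feats @ query_feat                     # cosine sims (features L2-normed)
        w = np.exp(beta * (s - 1.0))                       # sharpened affinities
        onehot = np.eye(n_classes)[support_labels]
        return (w[None, :] @ onehot).ravel()               # per-class evidence

    feats = np.random.randn(10, 16)
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)
    q = feats[3] + 0.01 * np.random.randn(16)
    q /= np.linalg.norm(q)
    labels = np.arange(10) % 5
    print(cache_classify(q, feats, labels, n_classes=5).argmax())  # likely labels[3] == 3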
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Recognition (object detection, categorization)
1258
R2V-MIF: Rule-to-Vector Contrastive Learning and Multi-channel Information Fusion for Therapy Recommendation
Nengjun Zhu, Jieyun Huang, Jian Cao, Liang Hu, Zixuan Yuan, Huanjing Gao
6 min. talk | August 9th at 10:00 | Session: DM: Recommender systems
Integrating data-driven and rule-based approaches is crucial for therapy recommendation, since the two can collaborate to achieve better performance. Medical rules, i.e., chains of reasoning that can infer therapies, exist widely. However, their symbolic and logical forms make them hard to integrate with data-driven modeling technologies. A few attempts have indirectly modeled rules using the data that supports them, but the poor generalization of medical rules leads to inadequate supporting data and thus diminishes their benefit. To this end, we propose R2V-MIF, which fills this gap with rule-to-vector contrastive learning (R2V) and multi-channel information fusion (MIF). R2V is a data-free module that uses a hypergraph, with condition and result nodes, to instantiate the logic of medical rules. Each rule is reflected in the relations between nodes, and node representations are determined through contrastive learning. Taking rule representations as a bridge, MIF integrates knowledge from medical rules, similar neighbors, and patient contents, and then recommends therapies. Extensive experiments show that R2V-MIF outperforms the baselines on several metrics using real-world medical data. Our code is available at https://github.com/vgeek-z/r2vmif.
List of keywords
Data Mining -> DM: Recommender systems
Data Mining -> DM: Mining heterogenous data
Multidisciplinary Topics and Applications -> MTA: Health and medicine
1301
Constructive Interpolation and Concept-Based Beth Definability for Description Logics via Sequents
Timothy S. Lyon, Jonas Karge
6 min. talk | August 7th at 15:00 | Session: KRR: Knowledge Representation and Reasoning (1/2)
We introduce a constructive method applicable to a large number of description logics (DLs) for establishing the concept-based Beth definability property (CBP) based on sequent systems. Using the highly expressive DL RIQ as a case study, we introduce novel sequent calculi for RIQ-ontologies and show how certain interpolants can be computed from sequent calculus proofs, which permit the extraction of explicit definitions of implicitly definable concepts. To the best of our knowledge, this is the first sequent-based approach to computing interpolants and definitions within the context of DLs, as well as the first proof that RIQ enjoys the CBP. Moreover, due to the modularity of our sequent systems, our results hold for any restriction of RIQ, and are applicable to other DLs by suitable modifications.
List of keywords
Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
Knowledge Representation and Reasoning -> KRR: Automated reasoning and theorem proving
1315
UniM-OV3D: Uni-Modality Open-Vocabulary 3D Scene Understanding with Fine-Grained Feature Representation
Qingdong He, Jinlong Peng, Zhengkai Jiang, Kai Wu, Xiaozhong Ji, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Mingang Chen, Yunsheng Wu
6 min. talk | August 6th at 15:00 | Session: CV: 3D computer vision (2/2)
3D open-vocabulary scene understanding aims to recognize arbitrary novel categories beyond the base label space. However, existing works not only fail to fully utilize all the available modal information in the 3D domain but also lack sufficient granularity in representing the features of each modality. In this paper, we propose UniM-OV3D, a unified multimodal 3D open-vocabulary scene understanding network that aligns point clouds with image, language, and depth. To better integrate global and local features of the point clouds, we design a hierarchical point cloud feature extraction module that learns fine-grained feature representations. Further, to facilitate the learning of coarse-to-fine point-semantic representations from captions, we propose the use of hierarchical 3D caption pairs, capitalizing on geometric constraints across various viewpoints of 3D scenes. Extensive experimental results demonstrate the effectiveness and superiority of our method in open-vocabulary semantic and instance segmentation, achieving state-of-the-art performance on both indoor and outdoor benchmarks such as ScanNet, ScanNet200, S3DIS, and nuScenes. Code is available at https://github.com/hithqd/UniM-OV3D.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Applications
Computer Vision -> CV: Scene analysis and understanding   
1323
KTCN: Enhancing Open-World Object Detection with Knowledge Transfer and Class-Awareness Neutralization
Xing Xi, Yangyang Huang, Jinhao Lin, Ronghua Luo
6 min. talk | August 9th at 11:30 | Session: CV: Computer Vision (1/2)
[+] More
[-] Less
Open-world object detection (OWOD) has garnered widespread attention due to its ability to recall unannotated objects. Existing works generate pseudo-labels for the model using heuristic priors, which limits performance. In this paper, we leverage the knowledge of a large-scale visual model to provide supervision for unknown categories. Specifically, we use the Segment Anything Model (SAM) to generate raw pseudo-labels for potential objects and refine them through intersection over union (IoU) and the shortest bounding-box side length. Nevertheless, the abundance of pseudo-labels still exacerbates the competition issue in one-to-many label assignment. To address this, we propose the Dual Matching Label Assignment (DMLA) strategy. Furthermore, we propose the Class-Awareness Neutralizer (CAN) to reduce the model's bias towards known categories. Evaluation results on open-world object detection benchmarks, including MS COCO and Pascal VOC, show that our method achieves nearly twice the unknown recall rate of previous state-of-the-art (SOTA) methods, reaching 41.5 U-Recall. Additionally, our approach adds no extra parameters, maintaining the inference-speed advantage of Faster R-CNN and leading SOTA methods based on deformable DETR while running at over 10 FPS. Our code is available at https://github.com/xxyzll/KTCN.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
1351
Ansatz-Agnostic Exponential Resource Saving in Variational Quantum Algorithms Using Shallow Shadows
Afrad Basheer, Yuan Feng, Christopher Ferrie, Sanjiang Li
6 min. talk | August 6th at 15:00 | Session: ML: Machine Learning (2/6)
Variational Quantum Algorithms (VQA) have been identified as a promising candidate for the demonstration of near-term quantum advantage in solving optimization tasks in chemical simulation, quantum information, and machine learning. The standard model of training requires a significant amount of quantum resources, which led researchers to use classical shadows to devise an alternative that consumes exponentially fewer quantum resources. However, the approach only works when the observables are local and the ansatz is the shallow Alternating Layered Ansatz (ALA), thus severely limiting its potential in solving problems such as quantum state preparation, where the ideal state might not be approximable with an ALA. In this work, we present a protocol based on shallow shadows that achieves similar levels of savings for almost any shallow ansatz studied in the literature, when combined with observables of low Frobenius norm. We show that two important applications in quantum information for which VQAs can be a powerful option, namely variational quantum state preparation and variational quantum circuit synthesis, are compatible with our protocol. We also experimentally demonstrate orders of magnitude improvement in comparison to the standard VQA model.
List of keywords
Machine Learning -> ML: Other
Machine Learning -> ML: Optimization
1358
Span-based Unified Named Entity Recognition Framework via Contrastive Learning
Hongli Mao, Xian-Ling Mao, Hanlin Tang, Yu-Ming Shang, Xiaoyan Gao, Ao-Jie Ma, Heyan Huang
6 min. talk | August 6th at 15:00 | Session: NLP: Natural Language Processing (1/3)
Traditional named entity recognition (NER) models are typically designed for domain-specific datasets and are limited to fixed predefined types, making it difficult to generalize to new domains. Recently, prompt-based generative methods attempt to mitigate this constraint by training models jointly on diverse datasets and extracting specified entities via prompt instructions. However, due to their autoregressive structure, these methods cannot directly model entity spans and suffer from slow sequential decoding. To address these issues, we propose SUNER, a novel span-based unified NER framework via contrastive learning, which aligns text-span and entity-type representations in a shared semantic space to extract entities in parallel. Specifically, we first extract mention spans without considering entity types to better generalize across datasets. Then, leveraging contrastive learning and a well-designed entity-marker structure, we map candidate spans and their textual type descriptions into the same vector space to differentiate entities across domains. Extensive experiments in both supervised and zero/few-shot settings demonstrate that the proposed SUNER model achieves better performance and higher efficiency than previous state-of-the-art unified NER models.
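The core scoring step such a span-based framework relies on, sketched with stand-in vectors for the encoder outputs; SUNER's entity-marker structure and training loss are omitted:

    import numpy as np

    def rank_types(span_vec, type_vecs, temperature=0.07):
        """Embed a candidate mention span and all textual type descriptions
        into one space, then softmax over cosine similarities; contrastive
        training would pull a span toward its true type's description."""
        sims = type_vecs @ span_vec / (
            np.linalg.norm(type_vecs, axis=1) * np.linalg.norm(span_vec))
        logits = sims / temperature
        p = np.exp(logits - logits.max())
        return p / p.sum()

    span = np.array([0.9, 0.1, 0.0])
    types = np.array([[1.0, 0.0, 0.0],    # e.g., "person"
                      [0.0, 1.0, 0.0],    # e.g., "location"
                      [0.0, 0.0, 1.0]])   # e.g., "organization"
    print(rank_types(span, types))        # mass concentrates on the first type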
List of keywords
Natural Language Processing -> NLP: Named entities
Natural Language Processing -> NLP: Information extraction
1366
Hundredfold Accelerating for Pathological Images Diagnosis and Prognosis through Self-reform Critical Region Focusing
Xiaotian Yu, Haoming Luo, Jiacong Hu, Xiuming Zhang, Yuexuan Wang, Wenjie Liang, Yijun Bei, Mingli Song, Zunlei Feng
12 min. talk | August 9th at 10:00 | Session: CV: Biomedical image analysis
Pathological slides are commonly gigapixel images with abundant information and are therefore significant for clinical diagnosis. However, their ultra-large size makes both training and evaluation extremely time-consuming, and most existing methods must crop each slide into patches, which also leads to large memory requirements. In this paper, we propose the Self-reform Multilayer Transformer (SMT) to accelerate pathological image diagnosis and prognosis. Inspired by pathologists' diagnostic procedure, SMT is designed to focus layer by layer on critical regions. In the forward process, the first layer takes thumbnails as input and measures the significance of each patch that deserves focus. Images from the focused regions are cropped at higher magnification and used as input to the next layer. By analogy, the inputs of the third layer are the focused images of the second layer, which contain abundant cellular features. In addition to forward focusing, a backward reform strategy is proposed to improve the precision of earlier layers. This cyclic process yields iterative interactions for better performance on both classification and focusing. In this way, SMT requires only a small fraction of critical patches for diagnosis and prognosis. Extensive experiments demonstrate that SMT is hundreds of times faster than existing SOTA methods while achieving comparable accuracy and requiring less storage.
List of keywords
Computer Vision -> CV: Biomedical image analysis
Computer Vision -> CV: Recognition (object detection, categorization)
Machine Learning -> ML: Classification
1367
Optimal Graph Learning and Nuclear Norm Maximization for Deep Cross-Domain Robust Label Propagation
Wei Wang, Hanyang Li, Ke Shi, Chao Huang, Yang Cao, Cong Wang, Xiaochun Cao
6 min. talk | August 6th at 15:00 | Session: CV: Transfer, low-shot, semi- and un- supervised learning   
Domain adaptation aims to achieve label transfer from a labeled source domain to an unlabeled target domain, where the two domains exhibit different distributions. Existing methods primarily concentrate on designing a feature extractor to learn better domain-invariant features, along with developing an effective classifier for reliable predictions. In this paper, we introduce optimal graph learning to generate a cross-domain graph that effectively connects the two domains, and two domain-specific graphs to capture domain-specific structures. On the one hand, we incorporate the three graphs into the label propagation (LP) classifier to enhance its robustness to distribution differences. On the other hand, we leverage the three graphs to introduce graph embedding losses, promoting the learning of locally discriminative and domain-invariant features. Furthermore, we maximize the nuclear norm of predictions in LP to enhance class diversity, thereby improving its robustness to the class imbalance problem. Correspondingly, we develop an efficient algorithm to solve the associated optimization problem. Finally, we integrate the proposed LP and graph embedding losses into a deep neural network, resulting in our proposed deep cross-domain robust LP. Extensive experiments conducted on three cross-domain benchmark datasets demonstrate that our proposed approach outperforms existing state-of-the-art domain adaptation methods.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Machine Learning -> ML: Classification
Machine Learning -> ML: Feature extraction, selection and dimensionality reduction
1377
DifTraj: Diffusion Inspired by Intrinsic Intention and Extrinsic Interaction for Multi-Modal Trajectory Prediction
Yanghong Liu, Xingping Dong, Yutian Lin, Mang Ye
6 min. talk | August 8th at 10:00 | Session: CV: Motion and tracking
Recent years have witnessed the success of generative adversarial networks and diffusion models in multi-modal trajectory prediction. However, prevailing algorithms only explicitly consider human interaction while ignoring the modeling of human intention, so the generated results deviate substantially from real trajectories in some complex scenes. In this paper, we analyze the conditions of multi-modal trajectory prediction from two objective perspectives and propose a novel end-to-end framework based on the diffusion model to predict more precise and socially acceptable trajectories for humans. First, a spatial-temporal aggregation module is built to extract extrinsic interaction features for capturing socially acceptable behaviors. Second, we explicitly construct an intrinsic intention module to obtain intention features for precise prediction. Finally, we estimate a noise trajectory distribution with these two features as the initialization of the diffusion model and leverage the denoising process to obtain the final trajectories. Furthermore, to reduce the noise of the initial trajectory estimation, we present a novel sample consistency loss to constrain multiple predictions. Extensive experiments demonstrate that our method outperforms state-of-the-art methods on the ETH-UCY and SDD benchmarks, achieving in particular a 19.0%/24.2% ADE/FDE improvement on ETH-UCY.
List of keywords
Computer Vision -> CV: Motion and tracking
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Machine Learning -> ML: Time series and data streams
1383
ClothPPO: A Proximal Policy Optimization Enhancing Framework for Robotic Cloth Manipulation with Observation-Aligned Action Spaces
Libing Yang, Yang Li, Long Chen
6 min. talk | August 6th at 11:30 | Session: ROB: Robotics (1/2)
Vision-based robotic cloth unfolding has made great progress recently. However, prior works predominantly rely on value learning and have not fully explored policy-based techniques. Recently, the success of reinforcement learning on large language models has shown that policy gradient algorithms can enhance policies over huge action spaces. In this paper, we introduce ClothPPO, a framework that employs a policy gradient algorithm based on an actor-critic architecture to enhance a pre-trained model with a huge, observation-aligned action space of 10^6 actions in the task of unfolding clothes. To this end, we redefine the cloth manipulation problem as a partially observable Markov decision process. A supervised pre-training stage is employed to train a baseline model of our policy. In the second stage, Proximal Policy Optimization (PPO) is utilized to guide the supervised model within the observation-aligned action space. By optimizing and updating the strategy, our proposed method increases the garment’s surface area for cloth unfolding under the soft-body manipulation task. Experimental results show that our proposed framework can further improve the unfolding performance of other state-of-the-art methods. Our project is available at https://vpx-ecnu.github.io/ClothPPO-website/.
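The second training stage relies on standard PPO. Below is a minimal sketch of the clipped surrogate loss that would guide the supervised policy; this is generic PPO, not ClothPPO's full pipeline, and the toy batch is ours.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    # Probability ratio between the updated policy and the behavior policy.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    # Clipping keeps each update close to the pre-trained policy.
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -torch.min(unclipped, clipped).mean()

# Toy batch: log-probabilities of sampled actions plus their advantages.
loss = ppo_clip_loss(torch.randn(32), torch.randn(32), torch.randn(32))
```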
List of keywords
Robotics -> ROB: Learning in robotics
Robotics -> ROB: Manipulation
Robotics -> ROB: Perception
Machine Learning -> ML: Reinforcement learning
1385
Rethinking Centered Kernel Alignment in Knowledge Distillation
Zikai Zhou, Yunhang Shen, Shitong Shao, Linrui Gong, Shaohui Lin
6 min. talk | August 7th at 11:30 | Session: ML: Deep learning architectures
Knowledge distillation has emerged as a highly effective method for bridging the representation discrepancy between large-scale models and lightweight models. Prevalent approaches involve leveraging appropriate metrics to minimize the divergence or distance between the knowledge extracted from the teacher model and the knowledge learned by the student model. Centered Kernel Alignment (CKA) is widely used to measure representation similarity and has been applied in several knowledge distillation methods. However, these methods are complex and fail to uncover the essence of CKA, leaving open the question of how to use CKA properly to achieve simple and effective distillation. This paper first provides a theoretical perspective to illustrate the effectiveness of CKA, decoupling CKA into an upper bound on the Maximum Mean Discrepancy (MMD) plus a constant term. Drawing from this, we propose a novel Relation-Centered Kernel Alignment (RCKA) framework, which practically establishes a connection between CKA and MMD. Furthermore, we dynamically customize the application of CKA based on the characteristics of each task, requiring fewer computational resources yet delivering performance comparable to previous methods. Extensive experiments on CIFAR-100, ImageNet-1k, and MS-COCO demonstrate that our method achieves state-of-the-art performance on almost all teacher-student pairs for image classification and object detection, validating the effectiveness of our approach. Our code is available at https://github.com/Klayand/PCKA.
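For readers unfamiliar with the measure, here is a minimal NumPy sketch of standard linear CKA between two representation matrices over the same examples. This is the textbook measure the abstract builds on, not the paper's RCKA variant.

```python
import numpy as np

def linear_cka(X, Y):
    # Rows are the same n examples; columns are (possibly different) features.
    X = X - X.mean(axis=0, keepdims=True)   # center each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-style formulation specialized to linear kernels.
    num = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    den = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return num / den

Z = np.random.randn(128, 64)
print(linear_cka(Z, Z))                          # identical features -> 1.0
print(linear_cka(Z, np.random.randn(128, 32)))   # unrelated features -> near 0
```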
List of keywords
Machine Learning -> ML: Deep learning architectures
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning
Machine Learning -> ML: Representation learning
1399
Massively Parallel Single-Source SimRanks in o(log n) Rounds
Siqiang Luo, Zulun Zhu
6 min. talk | August 8th at 15:00 | Session: DM: Data Mining (2/2)
SimRank is one of the most fundamental measures that evaluate the structural similarity between two nodes in a graph and has been applied in a plethora of data mining and machine learning tasks. These tasks often involve single-source SimRank computation, which evaluates the SimRank values between a source node u and all other nodes. Due to its high computational complexity, single-source SimRank computation for large graphs is notoriously challenging, and hence recent studies resort to distributed processing. To our surprise, although SimRank has been widely adopted for two decades, theoretical aspects of distributed SimRank computation with provable results have rarely been studied. In this paper, we conduct a theoretical study on single-source SimRank computation in the Massive Parallel Computation (MPC) model, which is the standard theoretical framework modeling distributed systems. Existing distributed SimRank algorithms enforce either Ω(log n) communication round complexity or Ω(n) machine space for a graph of n nodes. We overcome this barrier. Particularly, given a graph of n nodes, for any query node v and constant error ϵ>3/n, we show that O(log² log n) rounds of communication among machines suffice to compute single-source SimRank values with at most ϵ absolute error, while each machine only needs space sub-linear in n. To the best of our knowledge, this is the first single-source SimRank algorithm in MPC that can overcome the Θ(log n) round-complexity barrier with provable result accuracy.
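For context, the sequential textbook definition that the MPC algorithm parallelizes: the SimRank of two distinct nodes is a decayed average of the SimRank over all pairs of their in-neighbors. A naive iterative sketch follows (illustrative only; the paper's contribution is the sub-logarithmic-round distributed computation, not this O(n²)-per-iteration loop).

```python
import numpy as np

def simrank(adj_in, c=0.6, iters=10):
    """Naive iterative SimRank; adj_in[v] lists the in-neighbors of node v."""
    n = len(adj_in)
    s = np.eye(n)
    for _ in range(iters):
        new = np.eye(n)
        for a in range(n):
            for b in range(n):
                if a == b or not adj_in[a] or not adj_in[b]:
                    continue
                # Decayed average similarity over all in-neighbor pairs.
                total = sum(s[i][j] for i in adj_in[a] for j in adj_in[b])
                new[a][b] = c * total / (len(adj_in[a]) * len(adj_in[b]))
        s = new
    return s

# Single-source query: SimRank of node 0 versus all other nodes.
adj_in = {0: [2], 1: [2], 2: [0, 1], 3: [2]}
print(simrank(adj_in)[0])
```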
List of keywords
Data Mining -> DM: Parallel, distributed and cloud-based high performance mining
Data Mining -> DM: Theoretical foundations of data mining
1423
Truth Table Net: Scalable, Compact & Verifiable Neural Networks with a Dual Convolutional Small Boolean Circuit Networks Form
Adrien Benamira, Thomas Peyrin, Trevor Yap, Tristan Guérand, Bryan Hooi
12 min. talk | August 8th at 11:30 | Session: MAS: Formal verification, validation and synthesis
We introduce "Truth Table net"’ (TTnet), a novel Deep Neural Network (DNN) architecture designed to provide excellent scalability/compactness trade-offs among DNNs, allowing in turn to tackle the DNN challenge of fast formal verification. TTnet is constructed using Learning Truth Table (LTT) filters, analogous to how a Deep Convolutional Neural Network (DCNN) is built upon convolutional filters. The differentiable LTT filters are unique by their dual form: they are both a neural network-based function and a small-sized truth table that can be computed within a practical time frame. This characteristic guarantees, by design and independently of the overall architecture, the ability to practically extract an efficient (in terms of the number of logical gates) and functionally equivalent Conjunctive Normal Form (CNF) Boolean logic gate implementation. This CNF circuit is even optimal when the LTT truth table’s input bit size n < 12. In particular, TTnet architecture is the first differentiable DNN with as dual form a compact logic gate representation that can scale to datasets larger than CIFAR-10: we achieve an accuracy of 41% on the ImageNet dataset while ensuring that each LTT filter truth table is fully computable within 2^{16} operations. We further compare the compactness and scalability performances of TTnet Boolean logic circuit representation to state-of-the-art differentiable logic DNNs across tabular, MNIST, and CIFAR-10 datasets. We emphasize that TTnet is the first solution to the open problem of designing differentiable convolutional neural networks with an exact dual logic gate circuit representation, bridging the gap between symbolic AI and trainable DCNNs. Finally, as improving DNNs compactness in Boolean logic circuit form reduces the complexity of their formal verification, we demonstrate TTnet effectiveness in exact sound and complete formal verification. Notably, our model achieves robustness verification in 10ms vs 100s for traditional state-of-the-art DNNs solvers.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
Constraint Satisfaction and Optimization -> CSO: Satisfiabilty
Machine Learning -> ML: Convolutional networks
1425
SemanticMask: A Contrastive View Design for Anomaly Detection in Tabular Data
Shuting Tao, Tongtian Zhu, Hongwei Wang, Xiangming Meng
6 min. talk | August 8th at 11:30 | Session: DM: Anomaly/outlier detection
Contrastive learning based on data augmentation techniques has recently achieved substantial advances in learning representations well-suited for anomaly detection in the image domain. However, due to the lack of spatial structure, designing effective data augmentation methods for tabular data remains challenging. Conventional techniques, such as random masking, disregard inter-feature correlations and fail to accurately represent the data. To address this issue, we propose a novel augmentation technique called SemanticMask, which leverages the semantic information in column names to generate better augmented views. SemanticMask aims to ensure that the information shared between views is sufficient for anomaly detection without redundancy. We analyze the relationship between shared information and anomaly detection performance and empirically demonstrate that good views for tabular anomaly detection tasks are feature-dependent. Our experimental results validate the superiority of SemanticMask over state-of-the-art anomaly detection methods and existing augmentation techniques for tabular data. In further evaluations on the multi-class novelty detection task, SemanticMask also significantly outperforms the baseline.
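A rough sketch of the core idea, masking whole groups of semantically related columns to form contrastive views: the real method derives the groups from column-name semantics, whereas here the grouping is supplied by hand, and all names are ours.

```python
import numpy as np

def semantic_mask(x, column_groups, rng, mask_ratio=0.5):
    """Generate two views of a tabular sample by masking whole groups of
    semantically related columns together (groups assumed to come from
    column-name semantics; here they are supplied directly)."""
    views = []
    for _ in range(2):
        keep = [g for g in column_groups if rng.random() > mask_ratio]
        view = np.zeros_like(x)
        for g in keep:
            view[g] = x[g]   # unmask the whole correlated group at once
        views.append(view)
    return views

rng = np.random.default_rng(0)
x = rng.normal(size=8)
# Hypothetical grouping, e.g. {blood-pressure cols}, {lab-result cols}, ...
groups = [[0, 1], [2, 3, 4], [5], [6, 7]]
view1, view2 = semantic_mask(x, groups, rng)
```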
List of keywords
Data Mining -> DM: Anomaly/outlier detection
Machine Learning -> ML: Unsupervised learning
1427
Proximal Curriculum with Task Correlations for Deep Reinforcement Learning
Georgios Tzannetos, Parameswaran Kamalaruban, Adish Singla
12 min. talk | August 8th at 15:00 | Session: ML: Reinforcement learning (2/2)
Curriculum design for reinforcement learning (RL) can speed up an agent’s learning process and help it learn to perform well on complex tasks. However, existing techniques typically require domain-specific hyperparameter tuning, involve expensive optimization procedures for task selection, or are suitable only for specific learning objectives. In this work, we consider curriculum design in contextual multi-task settings where the agent’s final performance is measured w.r.t. a target distribution over complex tasks. We base our curriculum design on the Zone of Proximal Development concept, which has proven effective in accelerating the learning process of RL agents for a uniform distribution over all tasks. We propose a novel curriculum, ProCuRL-Target, that effectively balances the need for selecting tasks that are not too difficult for the agent while progressing the agent’s learning toward the target distribution via leveraging task correlations. We theoretically justify the task selection strategy of ProCuRL-Target by analyzing a simple learning setting with a REINFORCE learner model. Our experimental results across various domains with challenging target task distributions affirm the effectiveness of our curriculum strategy over state-of-the-art baselines in accelerating the training process of deep RL agents.
List of keywords
Machine Learning -> ML: Reinforcement learning
Planning and Scheduling -> PS: Markov decisions processes
1455
A Meta-Game Evaluation Framework for Deep Multiagent Reinforcement Learning
Zun Li, Michael P. Wellman
6 min. talk | August 7th at 15:00 | Session: MAS: Multi-agent learning
Evaluating deep multiagent reinforcement learning (MARL) algorithms is complicated by stochasticity in training and sensitivity of agent performance to the behavior of other agents. We propose a meta-game evaluation framework for deep MARL, by framing each MARL algorithm as a meta-strategy, and repeatedly sampling normal-form empirical games over combinations of meta-strategies resulting from different random seeds. Each empirical game captures both self-play and cross-play factors across seeds. These empirical games provide the basis for constructing a sampling distribution, using bootstrapping, over a variety of game analysis statistics. We use this approach to evaluate state-of-the-art deep MARL algorithms on a class of negotiation games. From statistics on individual payoffs, social welfare, and empirical best-response graphs, we uncover strategic relationships among self-play, population-based, model-free, and model-based MARL methods. We also investigate the effect of run-time search as a meta-strategy operator, and find via meta-game analysis that the search version of a meta-strategy generally leads to improved performance.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
1461
Unified Physical-Digital Face Attack Detection
Hao Fang, Ajian Liu, Haocheng Yuan, Junze Zheng, Dingheng Zeng, Yanhong Liu, Jiankang Deng, Sergio Escalera, Xiaoming Liu, Jun Wan, Zhen Lei
6 min. talk | August 7th at 11:30 | Session: CV: Biometrics, face, gesture and pose recognition
Face Recognition (FR) systems can suffer from physical (i.e., print photo) and digital (i.e., DeepFake) attacks. However, previous related work rarely considers both situations at the same time, which implies deploying multiple models and thus a greater computational burden. This lack of an integrated model stems from two factors: (1) the lack of a dataset that includes both physical and digital attacks in which the same ID covers the real face and all attack types; (2) given the large intra-class variance between these two kinds of attacks, it is difficult to learn a compact feature space that detects both attacks simultaneously. To address these issues, we collect a Unified physical-digital Attack dataset, called UniAttackData. The dataset covers 1,800 participants, each with 2 physical and 12 digital attacks, resulting in a total of 28,706 videos. Then, we propose a Unified Attack Detection framework based on Vision-Language Models (VLMs), namely UniAttackDetection, which includes three main modules: the Teacher-Student Prompts (TSP) module, focused on acquiring unified and specific knowledge respectively; the Unified Knowledge Mining (UKM) module, designed to capture a comprehensive feature space; and the Sample-Level Prompt Interaction (SLPI) module, aimed at grasping sample-level semantics. These three modules seamlessly form a robust unified attack detection framework. Extensive experiments on UniAttackData and three other datasets demonstrate the superiority of our approach for unified face attack detection. Dataset link: https://sites.google.com/view/face-anti-spoofing-challenge/dataset-download/uniattackdatacvpr2024
List of keywords
Computer Vision -> CV: Biometrics, face, gesture and pose recognition
Machine Learning -> ML: Multi-modal learning
1500
Memorizing Documents with Guidance in Large Language Models
Bumjin Park, Jaesik Choi
6 min. talk | August 9th at 11:30 | Session: NLP: Language models
Training data plays a pivotal role in AI models. Large language models (LLMs) are trained on massive amounts of documents, and their parameters hold document-related content. Several recent studies identify content-specific locations in LLMs by examining the parameters. Instead of such post hoc interpretation, we propose another approach: a document-wise memory architecture that tracks document memories during training. The proposed architecture maps document representations to memory entries, which softly mask memories in the forward process of LLMs. Additionally, we propose a document guidance loss, which increases the likelihood of text given its own document’s memories and reduces the likelihood of that text given the memories of other documents. Experimental results on Wikitext-103-v1 with Pythia-1B show that the proposed methods provide distinct memory entries for documents and high recall of document-related content in generation with trained document-wise memories.
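One way to picture the document-wise memory is a routing layer that softly masks rows of a shared memory table from a document representation and injects the result into the model's hidden states. The sketch below uses hypothetical shapes and module names and is not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DocMemory(nn.Module):
    """Sketch: a document representation softly selects memory entries
    that are added into the forward pass (hypothetical shapes)."""
    def __init__(self, doc_dim, n_entries, mem_dim):
        super().__init__()
        self.router = nn.Linear(doc_dim, n_entries)
        self.memory = nn.Parameter(torch.randn(n_entries, mem_dim) * 0.02)

    def forward(self, doc_repr, hidden):
        gate = torch.sigmoid(self.router(doc_repr))   # soft mask over entries
        mem = gate @ self.memory                      # (batch, mem_dim)
        return hidden + mem.unsqueeze(1)              # inject into hidden states

mem = DocMemory(doc_dim=64, n_entries=128, mem_dim=512)
out = mem(torch.randn(2, 64), torch.randn(2, 10, 512))
```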
List of keywords
Natural Language Processing -> NLP: Language models
Natural Language Processing -> NLP: Embeddings
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
Natural Language Processing -> NLP: Language generation
1505
Meta In-Context Learning Makes Large Language Models Better Zero and Few-Shot Relation Extractors
Guozheng Li, Peng Wang, Jiajun Liu, Yikai Guo, Ke Ji, Ziyu Shang, Zijie Xu
6 min. talk | August 7th at 11:30 | Session: NLP: Information extraction
Relation extraction (RE) is an important task that aims to identify the relationships between entities in texts. While large language models (LLMs) have revealed remarkable in-context learning (ICL) capability for general zero- and few-shot learning, recent studies indicate that current LLMs still struggle with zero- and few-shot RE. Previous studies are mainly dedicated to designing prompt formats and selecting good examples for improving ICL-based RE. Although both factors are vital for ICL, if one could fundamentally boost the ICL capability of LLMs in RE, zero- and few-shot RE performance via ICL would improve significantly. To this end, we introduce Micre (Meta In-Context learning of LLMs for Relation Extraction), a new meta-training framework for zero- and few-shot RE in which an LLM is tuned to do ICL on a diverse collection of RE datasets (i.e., learning to learn in context for RE). Through meta-training, the model learns a new RE task in context more effectively, conditioning on a few training examples with no parameter updates or task-specific templates at inference time, which enables better zero- and few-shot task generalization. We apply Micre to various LLMs of different model scales and 12 public RE datasets, and then evaluate it on unseen RE benchmarks under zero- and few-shot settings. Micre delivers performance comparable or superior to a range of baselines, including supervised fine-tuning and typical in-context learning methods. We find that the gains are particularly significant for larger model scales, and that using a diverse set of meta-training RE datasets is key to the improvements. Empirically, we show that Micre can transfer relation semantic knowledge via relation label names during inference on target RE datasets.
List of keywords
Natural Language Processing -> NLP: Information extraction
1525
Efficient and Stable Offline-to-online Reinforcement Learning via Continual Policy Revitalization
Rui Kong, Chenyang Wu, Chen-Xiao Gao, Zongzhang Zhang, Ming Li
6 min. talk | August 7th at 15:00 | Session: ML: Reinforcement learning (1/2)
In offline Reinforcement Learning (RL), pre-trained policies are utilized for initialization and subsequent online fine-tuning. However, existing methods suffer from instability and low sample efficiency compared to pure online learning. This paper identifies that these limitations stem from directly initializing the policy with offline-trained policy models. We propose Continual Policy Revitalization (CPR) as a novel, efficient, and stable fine-tuning method. CPR incorporates a periodic policy revitalization technique, restoring the overtrained policy network to full learning capacity while ensuring stable initial performance. This approach enables fine-tuning without being adversely affected by low-quality pre-trained policies. In contrast to previous research, CPR initializes the new policy with an adaptive policy constraint in policy optimization. Such optimization keeps the new policy close to a behavior policy constructed from historical policies. This contributes to stable policy improvement and optimal converged performance. Practically, CPR can seamlessly integrate into existing offline RL algorithms with minimal modification. We empirically validate the effectiveness of our method through extensive experiments, demonstrating substantial improvements in learning stability and efficiency compared to previous approaches. Our code is available at https://github.com/LAMDA-RL/CPR.
List of keywords
Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Offline reinforcement learning
1531
Optimal Auction Design with User Coupons in Advertising Systems
Xiaodong Liu, Zhikang Fan, Yiming Ding, Yuan Guo, Lihua Zhang, Changcheng Li, Dongying Kong, Han Li, Weiran Shen
6 min. talk | August 8th at 15:00 | Session: GTEP: Game Theory and Economic Paradigms
Online advertising is a major revenue source for most Internet companies. The advertising opportunities are usually sold to advertisers through auctions that take into account the bids of the advertisers and the click-through rates (CTRs) and conversion rates (CVRs) of the users. Standard auction design theory treats both the CTRs and the CVRs as constants. We consider a new auction mechanism that offers coupons to users when displaying the ads. Such coupons allow the user to buy the advertisers’ products or services at a lower price, which increases both the CTRs and the CVRs of the ads. In this paper, we formulate the problem mathematically and perform a systematic analysis. We characterize the set of individually rational and incentive compatible mechanisms in our setting. Based on the characterization, we identify the optimal strategy of offering coupons that maximizes the platform’s expected revenue. We also conduct extensive experiments on both synthetic data and industrial data. Our experiment results show that our mechanism significantly improves both the revenue and welfare of the platform, thereby creating a win-win situation for all parties, including the platform, the advertisers, and the users.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Mechanism design
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
1532
FedPFT: Federated Proxy Fine-Tuning of Foundation Models
Zhaopeng Peng, Xiaoliang Fan, Yufan Chen, Zheng Wang, Shirui Pan, Chenglu Wen, Ruisheng Zhang, Cheng Wang
6 min. talk | August 6th at 15:00 | Session: ML: Federated learning (1/2)
Adapting Foundation Models (FMs) for downstream tasks through Federated Learning (FL) emerges as a promising strategy for protecting data privacy and valuable FMs. Existing methods fine-tune an FM by allocating a sub-FM to each client in FL; however, this leads to suboptimal performance due to insufficient tuning and the inevitable accumulation of gradient errors. In this paper, we propose Federated Proxy Fine-Tuning (FedPFT), a novel method enhancing FM adaptation to downstream tasks through FL via two key modules. First, the sub-FM construction module employs a layer-wise compression approach, facilitating comprehensive FM fine-tuning across all layers by emphasizing crucial neurons. Second, the sub-FM alignment module conducts two-step distillation (layer-level and neuron-level) before and during FL fine-tuning, respectively, to reduce gradient error by accurately aligning the sub-FM with the FM under theoretical guarantees. Experimental results on seven commonly used datasets (i.e., four text and three vision) demonstrate the superiority of FedPFT. Our code is available at https://github.com/pzp-dzd/FedPFT.
List of keywords
Machine Learning -> ML: Federated learning
Machine Learning -> ML: Trustworthy machine learning
Multidisciplinary Topics and Applications -> MTA: Security and privacy
1540
Unsupervised Anomaly Detection via Masked Diffusion Posterior Sampling
Di Wu, Shicai Fan, Xue Zhou, Li Yu, Yuzhong Deng, Jianxiao Zou, Baihong Lin
6 min. talk | August 8th at 11:30 | Session: DM: Anomaly/outlier detection
Reconstruction-based methods have been commonly used for unsupervised anomaly detection, in which a normal image is reconstructed and compared with the given test image to detect and locate anomalies. Recently, diffusion models have shown promising applications for anomaly detection due to their powerful generative ability. However, these models lack strict mathematical support for normal image reconstruction and unexpectedly suffer from low reconstruction quality. To address these issues, this paper proposes a novel and highly interpretable method named Masked Diffusion Posterior Sampling (MDPS). In MDPS, the problem of normal image reconstruction is mathematically modeled as multiple diffusion posterior samplings of normal images, based on the devised masked noisy observation model and the diffusion-based normal image prior, under a Bayesian framework. Using a metric designed from pixel-level and perceptual-level perspectives, MDPS can effectively compute the difference map between each normal posterior sample and the given test image. Anomaly scores are obtained by averaging all difference maps over multiple posterior samples. Exhaustive experiments on the MVTec and BTAD datasets demonstrate that MDPS achieves state-of-the-art performance in normal image reconstruction quality as well as anomaly detection and localization.
List of keywords
Data Mining -> DM: Anomaly/outlier detection
Computer Vision -> CV: Applications
Computer Vision -> CV: Image and video synthesis and generation 
1542
Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer
Kepeng Xu, Li Xu, Gang He, Wenxin Yu, Yunsong Li
6 min. talk | August 8th at 15:00 | Session: CV: Computer Vision (2/2)
Multiple complex degradations are coupled in low-quality video faces in the real world. Therefore, blind video face restoration is a highly challenging ill-posed problem, requiring not only hallucinating high-fidelity details but also enhancing temporal coherence across diverse pose variations. Restoring each frame independently in a naive manner inevitably introduces temporal incoherence and artifacts from pose changes and keypoint localization errors. To address this, we propose the first blind video face restoration approach with a novel parsing-guided temporal-coherent transformer (PGTFormer) without pre-alignment. PGTFormer leverages semantic parsing guidance to select optimal face priors for generating temporally coherent artifact-free results. Specifically, we pre-train a temporal-spatial vector quantized auto-encoder on high-quality video face datasets to extract expressive context-rich priors. Then, the temporal parse-guided codebook predictor (TPCP) restores faces in different poses based on face parsing context cues without performing face pre-alignment. This strategy reduces artifacts and mitigates jitter caused by cumulative errors from face pre-alignment. Finally, the temporal fidelity regulator (TFR) enhances fidelity through temporal feature interaction and improves video temporal consistency. Extensive experiments on face videos show that our method outperforms previous face restoration baselines. The code will be released on https://github.com/kepengxu/PGTFormer.
List of keywords
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Image and video synthesis and generation 
Computer Vision -> CV: Biometrics, face, gesture and pose recognition
Computer Vision -> CV: Applications
1546
Unified Evidence Enhancement Inference Framework for Fake News Detection
Lianwei Wu, Linyong Wang, Yongqiang Zhao
6 min. talk | August 8th at 11:30 | Session: NLP: Applications
Current approaches for fake news detection are mainly devoted to extracting candidate evidence from comments (or external articles) and establishing interactive reasoning with the news itself to verify its falsehood. However, they still have several drawbacks: 1) the interaction objects are coarse-grained: the entire news item is driven to participate in the interaction, while the learning of potentially suspicious segments in the news is ignored; 2) the reasoning patterns are limited, making it difficult to explore the various possible correlations between news and candidate evidence. To this end, we propose the Unified Evidence Enhancement Inference framework (UEEI) to discover and infer high-quality evidence that reveals the false parts of news for detection. Specifically, UEEI first promotes interactive fusion between comments and news from the perspectives of semantics and emotion, thereby learning the potentially suspicious fragments in news. Then, the model constructs entity-level and relationship-level retrievals to screen sufficient candidate evidence from external sources. Finally, we measure the coherence between suspicious fragments and candidate evidence by multi-view reasoning, and further infer explainable evidence that reveals the false parts of news. Experiments on three public datasets confirm the effectiveness and interpretability of UEEI.
List of keywords
Natural Language Processing -> NLP: Applications
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Multidisciplinary Topics and Applications -> MTA: Social sciences
1551
An NCDE-based Framework for Universal Representation Learning of Time Series
Zihan Liu, Bowen Du, Junchen Ye, Xianqing Wen, Leilei Sun
6 min. talk | August 7th at 11:30 | Session: ML: Representation learning
Exploiting self-supervised learning (SSL) to extract universal representations of time series can not only capture the natural properties of time series but also greatly benefit downstream tasks. Nevertheless, existing time series representation learning (TSRL) methods face challenges in attaining universality. Indeed, methods relying solely on one SSL strategy (either contrastive learning (CL) or generative) often fall short of capturing rich semantic information for various downstream tasks. Moreover, time series exhibit diverse distributions and inherent characteristics, particularly the common occurrence of missing values, posing a notable challenge for existing backbones in effectively handling such diverse data. To bridge these gaps, we propose CTRL, a framework for universal TSRL. For the first time, we employ a Neural Controlled Differential Equation (NCDE) as the backbone for TSRL, which captures continuous processes and exhibits robustness to missing data. Additionally, a dual-task SSL strategy, integrating both reconstruction and contrasting tasks, is proposed to enrich the semantic information of the learned representations. Furthermore, novel hard negative construction and false negative elimination mechanisms are proposed to improve sampling efficiency and reduce sampling bias in CL. Finally, extensive experiments demonstrate the superiority of CTRL in forecasting, classification, and imputation tasks, particularly its outstanding robustness to missing data.
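For intuition, a neural CDE evolves a hidden state z by dz = f_theta(z) dX along a control path X interpolated from the (possibly irregular) observations, which is what makes it naturally robust to missing values. Below is a minimal Euler-discretized sketch under our own shapes and names; CTRL builds on full NCDE machinery, not this toy.

```python
import torch
import torch.nn as nn

class TinyNCDE(nn.Module):
    """Euler-discretized neural CDE: z_{t+1} = z_t + f_theta(z_t) dX_t,
    where dX_t is the increment of the observed (interpolated) path."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Linear(input_dim, hidden_dim)
        # f_theta maps the hidden state to a (hidden_dim x input_dim) matrix.
        self.f = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim * input_dim))

    def forward(self, x):                    # x: (batch, time, input_dim)
        z = self.embed(x[:, 0])              # initial hidden state
        for t in range(1, x.size(1)):
            dX = (x[:, t] - x[:, t - 1]).unsqueeze(-1)          # control increment
            F_z = self.f(z).view(z.size(0), z.size(1), x.size(2))
            z = z + (F_z @ dX).squeeze(-1)   # Euler step: z += f(z) dX
        return z                             # final representation

rep = TinyNCDE(input_dim=3, hidden_dim=32)(torch.randn(8, 50, 3))
```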
List of keywords
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Self-supervised Learning
Machine Learning -> ML: Time series and data streams
1564
CausVSR: Causality Inspired Visual Sentiment Recognition
Xinyue Zhang, Zhaoxia Wang, Hailing Wang, Jing Xiang, Chunwei Wu, Guitao Cao
6 min. talk | August 8th at 11:30 | Session: HAI: Cognitive modeling
Visual Sentiment Recognition (VSR) is an evolving field that aims to detect emotional tendencies within visual content. Despite its growing significance, detecting the emotions depicted in visual content such as images faces challenges, notably misleading or spurious correlations arising from contextual information. In response to these challenges, we propose a causality-inspired VSR approach, called CausVSR. CausVSR is rooted in the fundamental principles of Emotional Causality theory, mimicking the human process from receiving emotional stimuli to deriving emotional states. CausVSR takes a deliberate stride toward conquering the VSR challenges. It harnesses the power of a structural causal model, intricately designed to encapsulate the dynamic causal interplay between visual content and its corresponding pseudo sentiment regions. This strategic approach allows for a deep exploration of contextual information, elevating the accuracy of emotional inference. Additionally, CausVSR utilizes a global category elicitation module, strategically employed to execute front-door adjustment techniques, effectively detecting and handling spurious correlations. Experiments, conducted on four widely-used datasets, demonstrate CausVSR’s superiority in enhancing emotion perception within VSR, surpassing existing methods.
List of keywords
Humans and AI -> HAI: Cognitive modeling
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Recognition (object detection, categorization)
Machine Learning -> ML: Deep learning architectures
1566
Modeling Selective Feature Attention for Lightweight Text Matching
Jianxiang Zang, Hui Liu
6 min. talk | August 7th at 15:00 | Session: NLP: Natural Language Processing (2/3)
Representation-based Siamese networks have risen to popularity in lightweight text matching due to their low deployment and inference costs. While word-level attention mechanisms have been implemented within Siamese networks to improve performance, we propose Feature Attention (FA), a novel downstream block designed to enrich the modeling of dependencies among embedding features. Employing "squeeze-and-excitation" techniques, the FA block dynamically adjusts the emphasis on individual features, enabling the network to concentrate more on features that significantly contribute to the final classification. Building upon FA, we introduce a dynamic "selection" mechanism called Selective Feature Attention (SFA), which leverages a stacked BiGRU Inception structure. The SFA block facilitates multi-scale semantic extraction by traversing different stacked BiGRU layers, encouraging the network to selectively concentrate on semantic information and embedding features across varying levels of abstraction. Both the FA and SFA blocks offer a seamless integration capability with various Siamese networks, showcasing a plug-and-play characteristic. Experimental evaluations conducted across diverse text matching baselines and benchmarks underscore the indispensability of modeling feature attention and the superiority of the "selection" mechanism.
List of keywords
Natural Language Processing -> NLP: Natural language semantics
Machine Learning -> ML: Attention models
Machine Learning -> ML: Deep learning architectures
Natural Language Processing -> NLP: Embeddings
1570
By Fair Means or Foul: Quantifying Collusion in a Market Simulation with Deep Reinforcement Learning
Michael Schlechtinger, Damaris Kosack, Franz Krause, Heiko Paulheim
6 min. talk | August 6th at 15:00 | Session: ETF: AI Ethics, Trust, Fairness (1/2)
In the rapidly evolving landscape of eCommerce, Artificial Intelligence (AI) based pricing algorithms, particularly those utilizing Reinforcement Learning (RL), are becoming increasingly prevalent. This rise has led to an intricate pricing situation with the potential for market collusion. Our research employs an experimental oligopoly model of repeated price competition, systematically varying the environment to cover scenarios from basic economic theory to subjective consumer demand preferences. We also introduce a novel demand framework that enables the implementation of various demand models, allowing for a weighted blending of different models. In contrast to existing research in this domain, we aim to investigate the strategies and emerging pricing patterns developed by the agents, which may lead to a collusive outcome. Furthermore, we investigate a scenario where agents cannot observe their competitors’ prices. Finally, we provide a comprehensive legal analysis across all scenarios. Our findings indicate that RL-based AI agents converge to a collusive state characterized by the charging of supracompetitive prices, without necessarily requiring inter-agent communication. Implementing alternative RL algorithms, altering the number of agents or simulation settings, and restricting the scope of the agents’ observation space do not significantly impact the collusive market outcome.
List of keywords
AI Ethics, Trust, Fairness -> ETF: AI and law, governance, regulation
Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence
Machine Learning -> ML: Reinforcement learning
Multidisciplinary Topics and Applications -> MTA: Economics
1578
Primal Grammars Driven Automated Induction
Adel Bouhoula, Miki Hermann
6 min. talk | August 7th at 15:00 | Session: KRR: Knowledge Representation and Reasoning (1/2)
Automated induction is a powerful method for the validation of critical systems. However, the inductive proof process faces major challenges: it is undecidable and diverges even with small examples. Previous methods have proposed ad-hoc heuristics to speculate on additional lemmas that hopefully stop the divergence. Although these methods have succeeded in proving interesting theorems, they have significant limitations: in particular, they often fail to find appropriate lemmas, and the lemmas they provide may not be valid. We present a new method that allows us to perform inductive proofs in conditional theories. This method automatically detects divergence in proof traces and derives primal grammars as well as new lemmas that schematize the divergent sequence. This new construction allows us to break the divergence and complete the proof. Our method is presented as a set of inference rules whose soundness and refutational completeness have been formally proved. Unlike previous methods, our method is fully automated and has no risk of over-generalization. Moreover, our technique for capturing and schematizing divergence represents the most general decidable schematization, with respect to description power, among all known schematizations. Our method has been implemented in C++ and successfully proved over fifty complex examples that fail with well-known theorem provers (e.g., ACL2, Isabelle, PVS, SPIKE) and related methods for handling divergence in proofs by induction. Our method represents a significant contribution to the field of automated reasoning as it can be integrated with existing automated and interactive inductive proof systems to enhance their performance. Moreover, it has the potential to substantially reduce the time needed for the verification of critical systems.
List of keywords
Knowledge Representation and Reasoning -> KRR: Automated reasoning and theorem proving
1593
DFMDA-Net: Dense Fusion and Multi-dimension Aggregation Network for Image Restoration
Huibin Yan, Shuoyao Wang
6 min. talk | August 9th at 11:30 | Session: CV: Machine learning for vision
The U-shape (encoder-decoder) architecture, combined with effective blocks, has shown significant success in image restoration. In U-shape models, there is insufficient focus on the problem of fusing encoder and decoder features at the same level. Current methods often employ simplistic operations like summation or concatenation, which makes it difficult to strike a balance between performance and complexity. To address this issue, we propose a compression-in-the-middle mechanism, termed Integration-Compression-Integration (ICI), which effectively conducts dense fusion and avoids information loss. From the block design perspective, we design a multi-dimension aggregation (MDA) mechanism, capable of effectively aggregating features along both the channel and spatial dimensions. Combining the Integration-Compression-Integration feature fusion and the multi-dimension aggregation, our dense fusion and multi-dimension aggregation network (DFMDA-Net) achieves superior performance over state-of-the-art algorithms on 16 benchmark datasets for numerous image restoration tasks.
List of keywords
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Representation learning
Machine Learning -> ML: Attention models
Machine Learning -> ML: Convolutional networks
1601
Robust Losses for Decision-Focused Learning
Noah Schutte, Krzysztof Postek, Neil Yorke-Smith
12 min. talk | August 9th at 10:00 | Session: ML: Robustness
Optimization models used to make discrete decisions often contain uncertain parameters that are context-dependent and estimated through prediction. To account for the quality of the decision made based on the prediction, decision-focused learning (end-to-end predict-then-optimize) trains the predictive model to minimize regret, i.e., the loss incurred by making a suboptimal decision. Although the gradient of this loss with respect to the predictive model’s parameters is zero almost everywhere for optimization problems with a linear objective, effective gradient-based learning approaches have been proposed to minimize the expected loss, using the empirical loss as a surrogate. However, empirical regret can be an ineffective surrogate because empirically optimal decisions can vary substantially from expected optimal decisions. To understand the impact of this deficiency, we evaluate the effect of aleatoric and epistemic uncertainty on the accuracy of empirical regret as a surrogate. Next, we propose three novel loss functions that approximate expected regret more robustly. Experimental results show that training two state-of-the-art decision-focused learning approaches with the robust regret losses generally improves test-sample empirical regret while keeping computational time equivalent relative to the number of training epochs.
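Regret here is the gap between the objective value of the decision induced by the predicted parameters and that of the decision optimal under the true parameters. A toy sketch for a linear objective over an explicit finite decision set follows; the paper's problems are combinatorial, and all names here are ours.

```python
import numpy as np

def regret(c_pred, c_true, decisions):
    """Empirical regret of predict-then-optimize with objective min_x c^T x
    over a finite decision set (rows of `decisions`)."""
    x_pred = decisions[np.argmin(decisions @ c_pred)]   # act on the prediction
    x_star = decisions[np.argmin(decisions @ c_true)]   # oracle decision
    return (x_pred - x_star) @ c_true                   # extra true cost incurred

# Three feasible 0/1 decisions over 4 items; predicted vs. true costs.
decisions = np.array([[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]])
print(regret(np.array([1., 2., 3., 4.]),
             np.array([4., 3., 2., 1.]), decisions))    # 4.0
```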
List of keywords
Machine Learning -> ML: Robustness
Constraint Satisfaction and Optimization -> CSO: Constraint optimization problems
Machine Learning -> ML: Regression
Machine Learning -> ML: Optimization
1614
Protecting Object Detection Models from Model Extraction Attack via Feature Space Coverage
Zeyu Li, Yuwen Pu, Xuhong Zhang, Yu Li, Jinbao Li, Shouling Ji
6 min. talk | August 7th at 10:00 | Session: ETF: Safety and robustness
A model extraction attack aims at stealing a well-trained machine learning model’s functionality or private information. With the gradual popularization of AI-related technologies in daily life, various well-trained models are being deployed. As a result, these models are considered valuable assets and are attractive to model extraction attackers. Currently, the academic community focuses primarily on defenses against model extraction attacks in the context of classification, paying little attention to the more common task scenario of object detection. Therefore, in this paper we propose a detection framework targeting model extraction attacks against object detection models. The framework first locates suspicious users based on the feature coverage of their query traffic and uses an active verification module to confirm whether the identified suspicious users are attackers. Through experiments conducted in multiple task scenarios, we validate the effectiveness and detection efficiency of the proposed method.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Computer Vision -> CV: Recognition (object detection, categorization)
1625
Exploring Cross-Domain Few-Shot Classification via Frequency-Aware Prompting
Tiange Zhang, Qing Cai, Feng Gao, Lin Qi, Junyu Dong
6 min. talk | August 9th at 11:30 | Session: ML: Classification
Cross-Domain Few-Shot Learning has made great strides with the development of meta-learning. However, most existing methods pay more attention to learning domain-adaptive inductive bias (meta-knowledge) through feature-wise manipulation or task diversity improvement, while neglecting the phenomenon that deep networks tend to rely more on high-frequency cues to make classification decisions. This degrades the robustness of the learned inductive bias, since high-frequency information is vulnerable and easily disturbed by noise. Hence, in this paper we make one of the first attempts to propose a Frequency-Aware Prompting method with mutual attention for Cross-Domain Few-Shot classification, which lets networks simulate the human visual perception of selecting different frequency cues when facing new recognition tasks. Specifically, a frequency-aware prompting mechanism is first proposed, in which the high-frequency components of the decomposed source image are either switched with normal-distribution sampling or zeroed to obtain frequency-aware augmented samples. Then, a mutual attention module is designed to learn a generalizable inductive bias under CD-FSL settings. More importantly, the proposed method is a plug-and-play module that can be directly applied to most off-the-shelf CD-FSL methods. Experimental results on CD-FSL benchmarks demonstrate the effectiveness of our proposed method, which also robustly improves the performance of existing CD-FSL methods. Resources at https://github.com/tinkez/FAP_CDFSC.
List of keywords
Machine Learning -> ML: Classification
Machine Learning -> ML: Few-shot learning
Machine Learning -> ML: Meta-learning
Machine Learning -> ML: Multi-task and transfer learning
1627
Bridge to Non-Barrier Communication: Gloss-Prompted Fine-Grained Cued Speech Gesture Generation with Diffusion Model
Wentao Lei, Li Liu, Jun Wang
6 min. talk | August 8th at 10:00 | Session: NLP: Speech
Cued Speech (CS) is an advanced visual phonetic encoding system that integrates lip reading with hand codings, enabling people with hearing impairments to communicate efficiently. CS video generation aims to produce the specific lip and gesture movements of CS from audio or text inputs. The main challenge is that, given limited CS data, we strive to simultaneously generate fine-grained hand and finger movements as well as lip movements, while the two kinds of movements need to be asynchronously aligned. Existing CS generation methods are fragile and prone to poor performance due to template-based statistical models and careful hand-crafted pre-processing to fit the models. Therefore, we propose a novel Gloss-prompted Diffusion-based CS Gesture generation framework (called GlossDiff). Specifically, to integrate additional knowledge of linguistic rules into the model, we first introduce a bridging instruction called Gloss, an automatically generated descriptive text that establishes a direct and more delicate semantic connection between spoken language and CS gestures. Moreover, we are the first to suggest that rhythm is an important paralinguistic feature of CS that improves communication efficacy, and we propose a novel Audio-driven Rhythmic Module (ARM) to learn rhythm that matches the audio speech. In addition, in this work we design, record, and publish the first Chinese CS dataset with four CS cuers. Extensive experiments demonstrate that our method quantitatively and qualitatively outperforms current state-of-the-art (SOTA) methods. We will release the code and data at glossdiff.github.io/.
List of keywords
Natural Language Processing -> NLP: Speech
Computer Vision -> CV: Applications
1631
Provable Acceleration of Nesterov’s Accelerated Gradient Method over Heavy Ball Method in Training Over-Parameterized Neural Networks
Xin Liu, Wei Tao, Wei Li, Dazhi Zhan, Jun Wang, Zhisong Pan
6 min. talk | August 6th at 15:00 | Session: ML: Machine Learning (1/6)
Due to its simplicity and efficiency, the first-order gradient method has been extensively employed in training neural networks. Although the optimization problem of the neural network is non-convex, recent research has proved that first-order methods are capable of attaining a global minimum when training over-parameterized neural networks, where the number of parameters is significantly larger than the number of training instances. Momentum methods, including the heavy ball (HB) method and Nesterov’s accelerated gradient (NAG) method, are the workhorses of first-order gradient methods owing to their accelerated convergence. In practice, NAG often exhibits performance superior to HB. However, current theoretical works fail to distinguish their convergence difference in training neural networks. To fill this gap, we consider the training problem of a two-layer ReLU neural network under over-parameterization and random initialization. Leveraging high-resolution dynamical systems and neural tangent kernel (NTK) theory, our result not only establishes tighter upper bounds on the convergence rate for both HB and NAG, but also provides the first theoretical guarantee for the acceleration of NAG over HB in training neural networks. Finally, we validate our theoretical results on three benchmark datasets.
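The two momentum methods differ only in where the gradient is evaluated: HB uses the current iterate, while NAG uses the look-ahead point. A small sketch on a toy quadratic follows, using the standard textbook update forms rather than the paper's high-resolution dynamical-system analysis.

```python
import numpy as np

def heavy_ball(grad, w, w_prev, lr=0.01, beta=0.9):
    # Polyak's heavy ball: gradient taken at the current point.
    return w - lr * grad(w) + beta * (w - w_prev)

def nesterov(grad, w, w_prev, lr=0.01, beta=0.9):
    # NAG: gradient taken at the momentum look-ahead point.
    lookahead = w + beta * (w - w_prev)
    return lookahead - lr * grad(lookahead)

# Toy quadratic f(w) = 0.5 * w^T A w to compare the two trajectories.
A = np.diag([1.0, 100.0])
grad = lambda w: A @ w
for step in (heavy_ball, nesterov):
    w, w_prev = np.array([1.0, 1.0]), np.array([1.0, 1.0])
    for _ in range(100):
        w, w_prev = step(grad, w, w_prev), w
    print(step.__name__, np.linalg.norm(w))
```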
List of keywords
Machine Learning -> ML: Theory of deep learning
Machine Learning -> ML: Optimization
1635
Spatial-Temporal-Decoupled Masked Pre-training for Spatiotemporal Forecasting
Haotian Gao, Renhe Jiang, Zheng Dong, Jinliang Deng, Yuxin Ma, Xuan Song
6 min. talk | August 8th at 15:00 | Session: ML: Time series and data streams
Spatiotemporal forecasting techniques are significant for various domains such as transportation, energy, and weather. Accurate prediction of spatiotemporal series remains challenging due to the complex spatiotemporal heterogeneity. In particular, current end-to-end models are limited by input length and thus often fall into spatiotemporal mirage, i.e., similar input time series followed by dissimilar future values and vice versa. To address these problems, we propose a novel self-supervised pre-training framework Spatial-Temporal-Decoupled Masked Pre-training (STD-MAE) that employs two decoupled masked autoencoders to reconstruct spatiotemporal series along the spatial and temporal dimensions. Rich-context representations learned through such reconstruction could be seamlessly integrated by downstream predictors with arbitrary architectures to augment their performances. A series of quantitative and qualitative evaluations on four widely used benchmarks (PEMS03, PEMS04, PEMS07, and PEMS08) are conducted to validate the state-of-the-art performance of STD-MAE. Codes are available at https://github.com/Jimmy-7664/STD-MAE.
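The decoupling can be pictured as two masking patterns over the same spatiotemporal tensor: one hides whole time steps, the other whole sensors, and each feeds its own masked autoencoder. A shape-level sketch under hypothetical shapes, not the released implementation:

```python
import torch

def decoupled_masks(batch, mask_ratio=0.25):
    """Build temporally- and spatially-decoupled masks for a tensor of
    shape (batch, time, nodes, feat), one per reconstruction target."""
    B, T, N, _ = batch.shape
    t_mask = torch.rand(B, T, 1, 1) < mask_ratio   # hide whole time steps
    s_mask = torch.rand(B, 1, N, 1) < mask_ratio   # hide whole sensors/nodes
    return batch.masked_fill(t_mask, 0.0), batch.masked_fill(s_mask, 0.0)

x = torch.randn(4, 12, 207, 3)          # e.g. 12 steps over 207 sensors
x_temporal, x_spatial = decoupled_masks(x)
```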
List of keywords
Machine Learning -> ML: Time series and data streams
Data Mining -> DM: Mining spatial and/or temporal data
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
1645
Let’s Start Over: Retraining with Selective Samples for Generalized Category Discovery
Zhimao Peng, Enguang Wang, Xialei Liu, Ming-Ming Cheng
6 min. talk | August 8th at 10:00 | Session: ML: Clustering
Generalized Category Discovery (GCD) presents a realistic and challenging problem in open-world learning. Given a partially labeled dataset, GCD aims to categorize unlabeled data by leveraging visual knowledge from the labeled data, where the unlabeled data includes both known and unknown classes. Existing methods based on parametric/non-parametric classifiers attempt to generate pseudo-labels/relationships for the unlabeled data to enhance representation learning. However, the lack of ground-truth labels for novel classes often leads to noisy pseudo-labels/relationships, resulting in suboptimal representation learning. This paper introduces a novel method using Nearest Neighbor Distance-aware Label Consistency sample selection. It creates class-consistent subsets for novel class sample clusters from the current GCD method, acting as “pseudo-labeled sets” to mitigate representation bias. We propose progressive supervised representation learning with selected samples to optimize the trade-off between quantity and purity in each subset. Our method is versatile and applicable to various GCD methods, whether parametric or non-parametric. We conducted extensive experiments on multiple generic and fine-grained image classification datasets to evaluate the effectiveness of our approach. The results demonstrate the superiority of our method in achieving improved performance in generalized category discovery tasks.
List of keywords
Machine Learning -> ML: Clustering
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Machine Learning -> ML: Classification
Computer Vision -> CV: Recognition (object detection, categorization)
1653
Convexity Certificates for Symbolic Tensor Expressions
Paul G. Rump, Niklas Merk, Julien Klaus, Maurice Wenig, Joachim Giesen
6 min. talk | August 8th at 10:00 | Session: CSO: Constraint Satisfaction and Optimization
Knowing that a function is convex ensures that any local minimum is also a global minimum. Here, we implement an approach to certify the convexity of twice-differentiable functions by certifying that their second-order derivative is positive semidefinite. Both the computation of the second-order derivative and the certification of positive semidefiniteness are done symbolically. Previous implementations of this approach assume that the function to be minimized takes scalar or vector inputs, meaning that the second-order derivative is at most a matrix. However, the input of many machine learning problems is naturally given in the form of matrices or higher order tensors, in which case the second-order derivative becomes a tensor of at least fourth order. The familiar linear algebra notations and known rules for determining whether a matrix is positive semidefinite are not sufficient to deal with these higher order expressions. Here, we present a formal language for tensor expressions that allows us to generalize semidefiniteness to higher-order tensors and thereby certify the convexity of a broader set of functions.
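In the familiar scalar/vector case, the approach amounts to symbolically computing the Hessian and certifying that it is positive semidefinite; the paper's contribution is generalizing this check to tensor-valued inputs, where the second-order derivative is a fourth-order tensor. A minimal SymPy sketch of the vector case (example function chosen by us):

```python
import sympy as sp

# Certify convexity of f(x, y) = log(exp(x) + exp(y)) by checking that
# its symbolic Hessian is positive semidefinite.
x, y = sp.symbols("x y", real=True)
f = sp.log(sp.exp(x) + sp.exp(y))
H = sp.hessian(f, (x, y))

# For a symmetric 2x2 matrix, PSD holds iff trace >= 0 and det >= 0.
print(sp.simplify(H.trace()))   # strictly positive for all x, y
print(sp.simplify(H.det()))     # simplifies to 0, so H is PSD: f is convex
```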
List of keywords
Constraint Satisfaction and Optimization -> CSO: Solvers and tools
Machine Learning -> ML: Optimization
Machine Learning -> ML: Symbolic methods
1688
Attribution Quality Metrics with Magnitude Alignment
Chase Walker, Dominic Simon, Kenny Chen, Rickard Ewetz
6 min. talk | August 6th at 11:30 | Session: ETF: Explainability and interpretability
Attribution algorithms play an instrumental role in human interpretation of AI models. The methods measure the importance of the input features to the model output decision, which can be displayed as an attribution map for image classifiers. Perturbation tests are the state-of-the-art approach to evaluate the quality of an attribution map. Unfortunately, we observe that perturbation tests fail to consider attribution magnitude, which translates into inconsistent quality scores. In this paper, we propose Magnitude Aligned Scoring (MAS), a new attribution quality metric that measures the alignment between the magnitude of the attributions and the model response. In particular, the metric accounts for both the relative ordering and the magnitude of the pixels within an attribution. In the experimental evaluation, we compare the MAS metric with existing metrics across a wide range of models, datasets, attributions, and evaluations. The results demonstrate that the MAS metric is 4x more sensitive to attribution changes, 2x more consistent, and 1.6x more invariant to baseline modifications. Our code and the referenced appendix are publicly available via https://github.com/chasewalker26/Magnitude-Aligned-Scoring.
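The deletion-style perturbation test that the paper critiques can be sketched as follows; note that only the attribution ranking enters, never the magnitude, which is precisely the gap MAS addresses. The model callable and step count are illustrative placeholders, and this is the baseline test, not the MAS formula itself.

import numpy as np

def deletion_curve(model, image, attribution, steps=10):
    # Remove pixels in order of attribution rank, tracking the model response.
    order = np.argsort(attribution.ravel())[::-1]  # rank only; magnitude ignored
    x = image.copy().ravel()
    chunk = max(1, len(order) // steps)
    scores = []
    for s in range(steps):
        x[order[s * chunk:(s + 1) * chunk]] = 0.0
        scores.append(model(x.reshape(image.shape)))
    return np.array(scores)  # a fast drop counts as a "good" map under this test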
List of keywords
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
Computer Vision -> CV: Interpretability and transparency
Machine Learning -> ML: Explainable/Interpretable machine learning
1700
MediTab: Scaling Medical Tabular Data Predictors via Data Consolidation, Enrichment, and Refinement
Zifeng Wang, Chufan Gao, Cao Xiao, Jimeng Sun
6 min. talk | August 9th at 11:30 | Session: MTA: Health and medicine
Tabular data prediction has been employed in medical applications such as patient health risk prediction. However, existing methods usually revolve around the algorithm design while overlooking the significance of data engineering. Medical tabular datasets frequently exhibit significant heterogeneity across different sources, with limited sample sizes per source. As such, previous predictors are often trained on manually curated small datasets that struggle to generalize across different tabular datasets during inference. This paper proposes to scale medical tabular data predictors (MediTab) to various tabular inputs with varying features. The method uses a data engine that leverages large language models (LLMs) to consolidate tabular samples, overcoming the barrier across tables with distinct schemas. It also aligns out-domain data with the target task using a “learn, annotate, and refinement” pipeline. The expanded training data then enables the pre-trained MediTab to make inferences on arbitrary tabular inputs in the domain without fine-tuning, resulting in significant improvements over supervised baselines: it reaches an average ranking of 1.57 and 1.00 on 7 patient outcome prediction datasets and 3 trial outcome prediction datasets, respectively. In addition, MediTab exhibits impressive zero-shot performance: it outperforms supervised XGBoost models by 8.9% and 17.2% on average in two prediction tasks, respectively.
List of keywords
Multidisciplinary Topics and Applications -> MTA: Health and medicine
Multidisciplinary Topics and Applications -> MTA: Bioinformatics
Multidisciplinary Topics and Applications -> MTA: Life sciences
1703
Towards Robust Multi-Label Learning against Dirty Label Noise
Yuhai Zhao, Yejiang Wang, Zhengkui Wang, Wen Shan, Miaomiao Huang, Meixia Wang, Min Huang, Xingwei Wang
6 min. talk | August 9th at 11:30 | Session: ML: Multi-label learning
In multi-label learning, one of the major challenges is that the data are associated with label noise, including random noisy labels (e.g., data encoding errors) and noisy labels created by annotators (e.g., missing, extra, or erroneous labels), where the noise follows different structures (e.g., Gaussian, sparse, or subjective). Existing methods are tailored to handle noise with one specific structure. However, they do not consider the fact that, in real applications, data often carry dirty noisy labels that are simultaneously Gaussian, sparse, and subjective. In this paper, we formalize multi-label learning with dirty noise as a new learning problem, namely Noisy Multi-label Learning (NML). To solve the NML problem, we decompose a corrupted label matrix as a noise matrix plus a true label matrix (possibly high-rank). For the noise matrix, a mixed-norm penalty is developed as a regularizer for the dirty noise distribution. Under this norm, the conditions required for exact noise recovery are provided theoretically. For the true label matrix, which is not necessarily low-rank, we apply a non-linear mapping to ensure its low-rankness so that high-order label correlations can be utilized. Experimental results show that the proposed method significantly outperforms state-of-the-art methods.
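In plain notation, the decomposition reads Y = L + E. One plausible shape for the objective, with the exact rank surrogate and mixed norm being the paper's contribution rather than what is written here, is

min over L, E of rank_surrogate(φ(L)) + λ1·‖E‖_F^2 + λ2·‖E‖_1 + λ3·‖E‖_{2,1}, subject to Y = L + E,

where the Frobenius term absorbs Gaussian noise, the ℓ1 term sparse noise, the ℓ2,1 term row-structured (subjective, annotator-driven) noise, and φ is the non-linear mapping that makes φ(L) low-rank.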
List of keywords
Machine Learning -> ML: Multi-label learning
Machine Learning -> ML: Optimization
Machine Learning -> ML: Weakly supervised learning
1708
EMOTE: An Explainable Architecture for Modelling the Other through Empathy
Manisha Senadeera, Thommen Karimpanal George, Stephan Jacobs, Sunil Gupta, Santu Rana
12 min. talk | August 6th at 11:30 | Session: ML: Multiagent Reinforcement Learning
Empathy allows us to assume others are like us and have goals analogous to our own. This can at times also apply to multi-agent games – e.g., Agent 1’s attraction to green balls is analogous to Agent 2’s attraction to red balls. Drawing inspiration from empathy, we propose EMOTE, a simple and explainable inverse reinforcement learning (IRL) approach designed to model another agent’s action-value function and, from it, infer a unique reward function. This is done by referencing the learning agent’s own action-value function, removing the need to maintain independent action-value estimates for the modelled agents while simultaneously addressing the ill-posed nature of IRL by inferring a unique reward function. We experiment on minigrid environments, showing that EMOTE: (a) produces more consistent reward estimates than other IRL baselines; (b) is robust in scenarios with composite reward and action-value functions; and (c) produces human-interpretable states, helping to explain how the agent views other agents.
List of keywords
Machine Learning -> ML: Multiagent Reinforcement Learning
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Machine Learning -> ML: Reinforcement learning
1716
Towards Dynamic Trend Filtering through Trend Point Detection with Reinforcement Learning
Jihyeon Seong, Sekwang Oh, Jaesik Choi
6 min. talk | August 8th at 10:00 | Session: DM: Mining spatial and/or temporal data (2/2)
Trend filtering simplifies complex time series data by applying smoothness to filter out noise while emphasizing proximity to the original data. However, existing trend filtering methods fail to reflect abrupt changes in the trend due to “approximateness,” resulting in constant smoothness. This approximateness uniformly filters out the tail distribution of time series data, characterized by extreme values, including both abrupt changes and noise. In this paper, we propose Trend Point Detection formulated as a Markov Decision Process (MDP), a novel approach to identifying essential points that should be reflected in the trend, departing from approximations. We term these essential points Dynamic Trend Points (DTPs) and extract trends by interpolating them. To identify DTPs, we utilize Reinforcement Learning (RL) within a discrete action space and a forecasting sum-of-squares loss function as a reward, referred to as the Dynamic Trend Filtering network (DTF-net). DTF-net integrates flexible noise filtering, preserving critical original subsequences while removing noise as required for other subsequences. We demonstrate that DTF-net excels at capturing abrupt changes compared to other trend filtering algorithms and enhances forecasting performance, as abrupt changes are predicted rather than smoothed out.
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
Machine Learning -> ML: Reinforcement learning
1752
3DBench: A Scalable 3D Benchmark and Instruction-Tuning Dataset
Junjie Zhang, Tianci Hu, Xiaoshui Huang, Yongshun Gong, Dan Zeng
6 min. talk | August 6th at 11:30 | Session: CV: 3D computer vision (1/2)
Evaluating the performance of Multi-modal Large Language Models (MLLMs), integrating both point cloud and language, presents significant challenges. The lack of a comprehensive assessment hampers determining whether these models truly represent advancements, thereby impeding further progress in the field. Current evaluations heavily rely on classification and caption tasks, falling short in providing a thorough assessment of MLLMs. A pressing need exists for a more sophisticated evaluation method capable of thoroughly analyzing the spatial understanding and expressive capabilities of these models. To address these issues, we introduce a scalable 3D benchmark, accompanied by a large-scale instruction-tuning dataset known as 3DBench, providing an extensible platform for a comprehensive evaluation of MLLMs. Specifically, we establish the benchmark that spans a wide range of spatial and semantic scales, from object-level to scene-level, addressing both perception and planning tasks. Furthermore, we present a rigorous pipeline for automatically constructing scalable 3D instruction-tuning datasets, covering 10 diverse multi-modal tasks with more than 0.23 million QA pairs generated in total. Thorough experiments evaluating trending MLLMs, comparisons against existing datasets, and variations of training protocols demonstrate the superiority of 3DBench, offering valuable insights into current limitations and potential research directions. Codes are available at https://github.com/Inshsang/3DBench.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Multimodal learning
Computer Vision -> CV: Scene analysis and understanding   
1758
Self-adaptive Extreme Penalized Loss for Imbalanced Time Series Prediction
Yiyang Wang, Yuchen Han, Yuhan Guo
6 min. talk | August 8th at 15:00 | Session: ML: Time series and data streams
Forecasting time series in imbalanced data presents a significant research challenge that requires considerable attention. Although there are specialized techniques available to tackle imbalanced time series prediction, existing approaches tend to prioritize extreme predictions at the expense of the forecasting accuracy of normal samples. In this paper, we propose an extreme penalized loss function that relaxes the constraint on overestimating extreme events, while imposing large penalties on errors for normal events and on underestimated extreme events. In addition, we provide a self-adaptive way of setting the hyperparameters of the loss function. Both the proposed loss function and an attention module are then integrated with LSTM networks in a decomposition-based framework. Extensive experiments conducted on real-world datasets demonstrate the superiority of our framework compared to other state-of-the-art approaches for both time series prediction and block maxima prediction tasks.
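A minimal sketch of such an asymmetric loss follows, assuming a fixed extreme threshold tau and a fixed weight gamma; the paper's version sets these self-adaptively, so both hyperparameters here are illustrative stand-ins.

import numpy as np

def extreme_penalized_loss(y_true, y_pred, tau, gamma=3.0):
    # Relaxed penalty only when an extreme event is overestimated; everything
    # else (normal errors, underestimated extremes) is penalized gamma-fold.
    err = y_pred - y_true
    over_extreme = (y_true > tau) & (err > 0)
    weight = np.where(over_extreme, 1.0, gamma)
    return np.mean(weight * err ** 2)

y = np.array([0.2, 0.1, 5.0]); yhat = np.array([0.3, 0.1, 3.0])
print(extreme_penalized_loss(y, yhat, tau=2.0))  # the underestimated extreme dominates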
List of keywords
Machine Learning -> ML: Time series and data streams
1762
Enhancing Boundary Segmentation for Topological Accuracy with Skeleton-based Methods
Chuni Liu, Boyuan Ma, Xiaojuan Ban, Yujie Xie, Hao Wang, Weihua Xue, Jingchao Ma, Ke Xu
12 min. talk | August 8th at 10:00 | Session: CV: Segmentation
Topological consistency plays a crucial role in the task of boundary segmentation for reticular images, such as cell membrane segmentation in neuron electron microscopic images, grain boundary segmentation in material microscopic images and road segmentation in aerial images. In these fields, topological changes in segmentation results have a serious impact on the downstream tasks, which can even exceed the misalignment of the boundary itself. To enhance the topology accuracy in segmentation results, we propose the Skea-Topo Aware loss, a novel loss function that takes into account the shape of each object and the topological significance of the pixels. It consists of two components. First, a skeleton-aware weighted loss improves the segmentation accuracy by better modeling the object geometry with skeletons. Second, a boundary rectified term effectively identifies and emphasizes topologically critical pixels in the prediction errors, using both foreground and background skeletons in the ground truth and predictions. Experiments show that our method improves topological consistency by up to 7 points in VI compared to 13 state-of-the-art methods, based on objective and subjective assessments across three different boundary segmentation datasets. The code is available at https://github.com/clovermini/Skea_topo.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Biomedical image analysis
1768
Unified Single-Stage Transformer Network for Efficient RGB-T Tracking
Jianqiang Xia, Dianxi Shi, Ke Song, Linna Song, Xiaolei Wang, Songchang Jin, Chenran Zhao, Yu Cheng, Lei Jin, Zheng Zhu, Jianan Li, Gang Wang, Junliang Xing, Jian Zhao
6 min. talk | August 8th at 10:00 | Session: CV: Motion and tracking
Most existing RGB-T tracking networks extract modality features in a separate manner, which lacks interaction and mutual guidance between modalities. This limits the network’s ability to adapt to the diverse dual-modality appearances of targets and the dynamic relationships between the modalities. Additionally, the three-stage fusion tracking paradigm followed by these networks significantly restricts the tracking speed. To overcome these problems, we propose a unified single-stage Transformer RGB-T tracking network, namely USTrack, which unifies the above three stages into a single ViT (Vision Transformer) backbone through joint feature extraction, fusion and relation modeling. With this structure, the network can not only extract the fusion features of templates and search regions under the interaction of modalities, but also significantly improve tracking speed through the single-stage fusion tracking paradigm. Furthermore, we introduce a novel feature selection mechanism based on modality reliability to mitigate the influence of invalid modalities for final prediction. Extensive experiments on three mainstream RGB-T tracking benchmarks show that our method achieves the new state-of-the-art while achieving the fastest tracking speed of 84.2 FPS. Code is available at https://github.com/xiajianqiang/USTrack.
List of keywords
Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Multimodal learning
Computer Vision -> CV: Video analysis and understanding   
1790
WPML3CP: Wasserstein Partial Multi-Label Learning with Dual Label Correlation Perspectives
Ximing Li, Yuanchao Dai, Bing Wang, Changchun Li, Renchu Guan, Fangming Gu, Jihong Ouyang
6 min. talk | August 8th at 10:00 | Session: ML: Weakly supervised learning
Partial multi-label learning (PMLL) refers to a weakly-supervised classification problem, where each instance is associated with a set of candidate labels that covers its ground-truth labels but also includes irrelevant ones. The current methodology of PMLL is to estimate the ground-truth confidences of candidate labels, i.e., the likelihood of a candidate label being a ground-truth one, and induce the multi-label predictor with them rather than with the candidate labels. In this paper, we aim to estimate precise ground-truth confidences by leveraging precise label correlations, which also need to be estimated. To this end, we propose to capture label correlations from both the measuring and modeling perspectives. Specifically, we measure the loss between ground-truth confidences and predictions by employing the Wasserstein distance involving label correlations, and form a label correlation-aware regularization to constrain the predictive parameters. The two techniques are coupled to promote precise estimation of label correlations. Upon these ideas, we propose a novel PMLL method, namely Wasserstein Partial Multi-Label Learning with dual Label Correlation Perspectives (WPML3CP). We conduct extensive experiments on several benchmark datasets. Empirical results demonstrate that WPML3CP can outperform existing PMLL baselines.
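The measuring side uses the standard discrete Wasserstein (optimal transport) form: given a ground-cost matrix C built from label correlations (C_jk small when labels j and k are strongly correlated), the loss between a confidence vector p and a prediction vector p̂ is

W_C(p, p̂) = min over transport plans T ≥ 0 with T·1 = p and Tᵀ·1 = p̂ of ⟨T, C⟩,

so that probability mass moves cheaply between correlated labels. This is the textbook definition the abstract refers to; constructing C and coupling it with the correlation-aware regularizer is the paper's contribution.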
List of keywords
Machine Learning -> ML: Weakly supervised learning
Machine Learning -> ML: Classification
Machine Learning -> ML: Self-supervised Learning
1799
FedTAD: Topology-aware Data-free Knowledge Distillation for Subgraph Federated Learning
Yinlin Zhu, Xunkai Li, Zhengyu Wu, Di Wu, Miao Hu, Rong-Hua Li
6 min. talk | August 8th at 15:00 | Session: ML: Sequence and graph learning
Subgraph federated learning (subgraph-FL) is a new distributed paradigm that facilitates the collaborative training of graph neural networks (GNNs) on multi-client subgraphs. Unfortunately, a significant challenge of subgraph-FL arises from subgraph heterogeneity, which stems from node and topology variation and impairs the performance of the global GNN. Despite various studies, the impact mechanism of subgraph heterogeneity has not yet been thoroughly investigated. To this end, we decouple node and topology variation, revealing that they correspond to differences in label distribution and structure homophily. Remarkably, these variations lead to significant differences in the class-wise knowledge reliability of multiple local GNNs, misguiding the model aggregation to varying degrees. Building on this insight, we propose topology-aware data-free knowledge distillation technology (FedTAD), enhancing reliable knowledge transfer from the local models to the global model. Extensive experiments on six public datasets consistently demonstrate the superiority of FedTAD over state-of-the-art baselines.
List of keywords
Machine Learning -> ML: Sequence and graph learning
Machine Learning -> ML: Classification
Machine Learning -> ML: Federated learning
1803
Enhancing Length Generalization for Attention Based Knowledge Tracing Models with Linear Biases
Xueyi Li, Youheng Bai, Teng Guo, Zitao Liu, Yaying Huang, Xiangyu Zhao, Feng Xia, Weiqi Luo, Jian Weng
6 min. talk | August 6th at 11:30 | Session: MTA: Multidisciplinary Topics and Applications (1/2)
Knowledge tracing (KT) is the task of predicting students’ future performance based on their historical learning interaction data. With the rapid advancement of attention mechanisms, many attention-based KT models have been developed. However, existing attention-based KT models exhibit performance drops as the number of student interactions grows beyond the number of interactions on which the models were trained. We refer to this as the length generalization of KT models. In this paper, we propose stableKT, which enhances length generalization: it learns from short sequences and maintains high prediction performance when generalizing to long sequences. Furthermore, we design a multi-head aggregation module to capture the complex relationships between questions and the corresponding knowledge components (KCs) by combining dot-product attention and hyperbolic attention. Experimental results on three public educational datasets show that our model exhibits a robust capability for length generalization and outperforms all baseline models in terms of AUC. To encourage reproducible research, we make our data and code publicly available at https://pykt.org.
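The linear biases of the title are in the spirit of ALiBi: a distance-proportional penalty is subtracted from the attention logits, so a model trained on short interaction sequences degrades gracefully on longer ones. A sketch follows; the slope, shapes, and masking are illustrative assumptions, not stableKT's exact design.

import numpy as np

def attention_with_linear_bias(q, k, v, slope=0.5):
    # q, k, v: (seq_len, dim). Logits are penalized linearly in |i - j|.
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    scores = scores - slope * dist                     # the linear bias
    scores = np.where(np.tri(n) == 1.0, scores, -1e9)  # causal mask
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v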
List of keywords
Multidisciplinary Topics and Applications -> MTA: Education
Humans and AI -> HAI: Computer-aided education
1810
InstructME: An Instruction Guided Music Edit Framework with Latent Diffusion Models
Bing Han, Junyu Dai, Weituo Hao, Xinyan He, Dong Guo, Jitong Chen, Yuxuan Wang, Yanmin Qian, Xuchen Song
6 min. talk | August 9th at 10:00 | Session: MTA: Multidisciplinary Topics and Applications (2/2)
Music editing primarily entails modifying instrument tracks or remixing the piece as a whole, which offers a novel reinterpretation of the original work through a series of operations. These music processing methods hold immense potential across various applications but demand substantial expertise. Prior methodologies, although effective for image and audio modifications, falter when directly applied to music. This is attributed to music’s distinctive data nature, where such methods can inadvertently compromise the intrinsic harmony and coherence of the music. In this paper, we develop InstructME, an Instruction guided Music Editing and remixing framework based on latent diffusion models. Our framework fortifies the U-Net with multi-scale aggregation in order to maintain consistency before and after editing. In addition, we introduce a chord progression matrix as conditioning information and incorporate it in the semantic space to improve melodic harmony while editing. To accommodate extended musical pieces, InstructME employs a chunk transformer, enabling it to discern long-term temporal dependencies within music sequences. We tested InstructME in instrument editing, remixing, and multi-round editing. Both subjective and objective evaluations indicate that our proposed method significantly surpasses preceding systems in music quality, text relevance and harmony. Demo samples are available at https://musicedit.github.io
List of keywords
Multidisciplinary Topics and Applications -> MTA: Arts and creativity
Multidisciplinary Topics and Applications -> MTA: Other
1819
Improving Adversarial Robustness via Feature Pattern Consistency Constraint
Jiacong Hu, Jingwen Ye, Zunlei Feng, Jiazhen Yang, Shunyu Liu, Xiaotian Yu, Lingxiang Jia, Mingli Song
6 min. talk | August 7th at 11:30 | Session: CV: Adversarial learning, adversarial attack and defense methods
Convolutional Neural Networks (CNNs) are well-known for their vulnerability to adversarial attacks, posing significant security concerns. In response to these threats, various defense methods have emerged to bolster the model’s robustness. However, most existing methods either focus on learning from adversarial perturbations, leading to overfitting to the adversarial examples, or aim to eliminate such perturbations during inference, inevitably increasing computational burdens. Conversely, clean training, which strengthens the model’s robustness by relying solely on clean examples, can address the aforementioned issues. In this paper, we align with this methodological stream and enhance its generalizability to unknown adversarial examples. This enhancement is achieved by scrutinizing the behavior of latent features within the network. Recognizing that a correct prediction relies on the correctness of the latent feature’s pattern, we introduce a novel and effective Feature Pattern Consistency Constraint (FPCC) method to reinforce the latent feature’s capacity to maintain the correct feature pattern. Specifically, we propose Spatial-wise Feature Modification and Channel-wise Feature Selection to enhance latent features. Subsequently, we employ the Pattern Consistency Loss to constrain the similarity between the feature pattern of the latent features and the correct feature pattern. Our experiments demonstrate that the FPCC method empowers latent features to uphold correct feature patterns even in the face of adversarial examples, resulting in inherent adversarial robustness surpassing state-of-the-art models.
List of keywords
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Computer Vision -> CV: Recognition (object detection, categorization)
Machine Learning -> ML: Classification
1823
AMO-aware Aggregates in Answer Set Programming
Mario Alviano, Carmine Dodaro, Salvatore Fiorentino, Marco Maratea
6 min. talk | August 7th at 11:30 | Session: KRR: Logic programming
Aggregates such as sum and count are among the most frequently used linguistic extensions of Answer Set Programming (ASP). At-most-one (AMO) constraints are a specific form of aggregate that excludes the simultaneous truth of multiple elements in a set. This article unleashes a powerful propagation strategy for the case in which groups of elements in an aggregate are also involved in AMO constraints. In fact, the combined knowledge given by aggregates and AMO constraints significantly increases the effectiveness of search-space pruning, resulting in noticeable performance gains.
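The pruning gain is easy to see on a weighted sum aggregate: within an at-most-one group, at most one weight can contribute, so the attainable upper bound shrinks from the sum of the group's weights to its maximum. A toy Python illustration of this arithmetic (the weights are made up; the paper's propagator operates inside an ASP solver):

def sum_upper_bound(groups):
    # groups: lists of non-negative weights; each group is one AMO constraint.
    naive = sum(sum(g) for g in groups)      # bound that ignores AMO knowledge
    amo_aware = sum(max(g) for g in groups)  # at most one literal per group
    return naive, amo_aware

print(sum_upper_bound([[3, 5, 2], [4, 4]]))  # (18, 9): a much tighter bound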
List of keywords
Knowledge Representation and Reasoning -> KRR: Logic programming
Knowledge Representation and Reasoning -> KRR: Non-monotonic reasoning
1828
Dynamic Brightness Adaptation for Robust Multi-modal Image Fusion
Yiming Sun, Bing Cao, Pengfei Zhu, Qinghua Hu
6 min. talk | August 9th at 10:00 | Session: CV: Applications
Infrared and visible image fusion aims to integrate modality strengths to produce visually enhanced, informative images. Visible imaging in real-world scenarios is susceptible to dynamic environmental brightness fluctuations, leading to texture degradation. Existing fusion methods lack robustness against such brightness perturbations, significantly compromising the visual fidelity of the fused imagery. To address this challenge, we propose the Brightness Adaptive multimodal dynamic fusion framework (BA-Fusion), which achieves robust image fusion despite dynamic brightness fluctuations. Specifically, we introduce a Brightness Adaptive Gate (BAG) module, which is designed to dynamically select features from brightness-related channels for normalization, while preserving brightness-independent structural information within the source images. Furthermore, we propose a brightness consistency loss function to optimize the BAG module. The entire framework is tuned via alternating training strategies. Extensive experiments validate that our method surpasses state-of-the-art methods in preserving multi-modal image information and visual fidelity, while exhibiting remarkable robustness across varying brightness levels. Our code is available: https://github.com/SunYM2020/BA-Fusion.
List of keywords
Computer Vision -> CV: Applications
Computer Vision -> CV: Multimodal learning
1862
Concentration Tail-Bound Analysis of Coevolutionary and Bandit Learning Algorithms
Per Kristian Lehre, Shishen Lin
6 min. talk | August 8th at 10:00 | Session: S: Search
Runtime analysis, as a branch of the theory of AI, studies how the number of iterations algorithms take before finding a solution (their runtime) depends on the design of the algorithm and the problem structure. Drift analysis is a state-of-the-art tool for estimating the runtime of randomised algorithms, such as bandit and evolutionary algorithms. Drift refers roughly to the expected progress towards the optimum per iteration. This paper considers the problem of deriving concentration tail bounds on the runtime of algorithms. It provides a novel drift theorem that gives precise exponential tail bounds given positive, weak, zero and even negative drift. Previously, such exponential tail bounds were missing in the case of weak, zero, or negative drift. Our drift theorem can be used to prove a strong concentration of the runtime/regret of algorithms in AI. For example, we prove that the regret of the RWAB bandit algorithm is highly concentrated, while previous analyses only considered the expected regret. This means that the algorithm obtains the optimum within a given time frame with high probability, i.e. a form of algorithm reliability. Moreover, our theorem implies that the time needed by the co-evolutionary algorithm RLS-PD to obtain a Nash equilibrium in a bilinear max-min benchmark problem is highly concentrated. However, we also prove that the algorithm forgets the Nash equilibrium, and the time until this occurs is highly concentrated. This highlights a weakness in RLS-PD which should be addressed by future work.
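For context, the classical additive drift theorem bounds only the expectation: if a non-negative process X_t satisfies E[X_t − X_{t+1} | X_t > 0] ≥ δ > 0, then the hitting time T = min{t : X_t = 0} obeys E[T] ≤ X_0/δ. The contribution here is exponential bounds on Pr[T > t] itself, including the weak-, zero-, and negative-drift regimes where expectation bounds of this kind say little.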
List of keywords
Search -> S: Evolutionary computation
Search -> S: Heuristic search
Search -> S: Other
1872
FactCHD: Benchmarking Fact-Conflicting Hallucination Detection
Xiang Chen, Duanzheng Song, Honghao Gui, Chenxi Wang, Ningyu Zhang, Yong Jiang, Fei Huang, Chengfei Lyu, Dan Zhang, Huajun Chen
6 min. talk | August 9th at 10:00 | Session: NLP: Resources and evaluation
Despite their impressive generative capabilities, LLMs are hindered by fact-conflicting hallucinations in real-world applications. The accurate identification of hallucinations in texts generated by LLMs, especially in complex inferential scenarios, is a relatively unexplored area. To address this gap, we present FactCHD, a dedicated benchmark designed for the detection of fact-conflicting hallucinations from LLMs. FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation. A distinctive element of FactCHD is its integration of fact-based evidence chains, significantly enhancing the depth of evaluating the detectors’ explanations. Experiments on different LLMs expose the shortcomings of current approaches in detecting factual errors accurately. Furthermore, we introduce TRUTH-TRIANGULATOR which synthesizes reflective considerations by tool-enhanced ChatGPT and LoRA-tuning based on Llama2, aiming to yield more credible detection through the amalgamation of predictive results and evidence.
List of keywords
Natural Language Processing -> NLP: Resources and evaluation
Natural Language Processing -> NLP: Applications
1883
Two-stage Semi-supervised Speaker Recognition with Gated Label Learning
Xingmei Wang, Jiaxiang Meng, Kong Aik Lee, Boquan Li, Jinghan Liu
6 min. talk | August 8th at 10:00 | Session: NLP: Speech
Speaker recognition technologies have been successfully applied in diverse domains, benefiting from the advances of deep learning. Nevertheless, current efforts are still hampered by the lack of labeled data. Such issues have been addressed in computer vision through semi-supervised learning (SSL), which assigns pseudo labels to unlabeled data so that it can play the role of labeled data. In our empirical evaluations, the state-of-the-art SSL methods show unsatisfactory performance in speaker recognition tasks, due to the imbalance between the quantity and quality of pseudo labels. Therefore, in this work, we propose a two-stage SSL framework that aims to address the data scarcity challenge. We first construct an initial contrastive learning network, where the encoder outputs the embedding representation of utterances. We then construct an iterative holistic semi-supervised learning network that involves a clustering strategy to assign pseudo labels and a gated label learning (GLL) strategy to further select reliable pseudo-labeled data. Systematic evaluations show that our proposed framework achieves performance in speaker recognition superior to the state-of-the-art methods, matching the performance of supervised learning.
List of keywords
Natural Language Processing -> NLP: Speech
Machine Learning -> ML: Semi-supervised learning
1886
A Fast Algorithm for MaxSAT above Half Number of Clauses
Junqiang Peng, Mingyu Xiao
6 min. talk | August 9th at 10:00 | Session: CSO: Satisfiability
We study the following parameterization of the MaxSAT problem: Given a CNF formula F with m clauses, decide whether at least m/2 + μ clauses in F could be satisfied, where μ is the excess of the number of satisfied clauses over the trivial lower bound m/2 and is taken as the parameter. This perspective is known as the "above guarantee" parameterization. Since its introduction by Mahajan and Raman [1999], the analysis of parameterization above guarantee has become a highly active and fruitful line of research. In this paper, we develop a new algorithm with runtime O*(2.1479^μ), significantly improving the previous best upper bound O*(5.4064^μ) for this important problem. Here, the O* notation omits polynomial factors.
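The decision problem itself is easy to state; a brute-force reference implementation (exponential in the number of variables n, unlike the paper's algorithm, whose exponential part depends only on μ) might look like:

from itertools import product

def above_half(clauses, n, mu):
    # clauses: tuples of non-zero ints; k means variable k true, -k means false.
    # Decide whether some assignment satisfies >= m/2 + mu clauses.
    m = len(clauses)
    for bits in product([False, True], repeat=n):
        sat = sum(any((lit > 0) == bits[abs(lit) - 1] for lit in c)
                  for c in clauses)
        if sat >= m / 2 + mu:
            return True
    return False

print(above_half([(1, 2), (-1,), (-2,), (1, -2)], n=2, mu=1))  # True: 3 >= 2 + 1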
List of keywords
Constraint Satisfaction and Optimization -> CSO: Satisfiability
Constraint Satisfaction and Optimization -> CSO: Constraint optimization problems
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
Search -> S: Combinatorial search and optimisation
1888
InstructEdit: Instruction-Based Knowledge Editing for Large Language Models
Ningyu Zhang, Bozhong Tian, Siyuan Cheng, Xiaozhuan Liang, Yi Hu, Kouying Xue, Yanjie Gou, Xi Chen, Huajun Chen
6 min. talk | August 9th at 11:30 | Session: NLP: Natural Language Processing (3/3)
Knowledge editing for large language models can offer an efficient solution to alter a model’s behavior without negatively impacting overall performance. However, current approaches encounter issues with limited generalizability across tasks, necessitating one distinct editor for each task, which significantly hinders broader applications. To address this, we take the first step toward analyzing the multi-task generalization issue in knowledge editing. Specifically, we develop an instruction-based editing technique, termed InstructEdit, which facilitates the editor’s adaptation to various task performances simultaneously using simple instructions. With only one unified editor for each LLM, we empirically demonstrate that InstructEdit can improve the editor’s control, leading to an average 14.86% increase in Reliability in the multi-task editing setting. Furthermore, experiments involving held-out unseen tasks illustrate that InstructEdit consistently surpasses previous strong baselines. To further investigate the underlying mechanisms of instruction-based knowledge editing, we analyze the principal components of the editing gradient directions, which unveils that instructions can help control the optimization direction with stronger OOD generalization.
List of keywords
Natural Language Processing -> NLP: Language models
Natural Language Processing -> NLP: Applications
1919
Subgraph Pooling: Tackling Negative Transfer on Graphs
Zehong Wang, Zheyuan Zhang, Chuxu Zhang, Yanfang Ye
6 min. talk | August 8th at 15:00 | Session: ML: Sequence and graph learning
Transfer learning aims to enhance performance on a target task by using knowledge from related tasks. However, when the source and target tasks are not closely aligned, it can lead to reduced performance, known as negative transfer. Unlike in image or text data, we find that negative transfer commonly occurs in graph-structured data, even when source and target graphs have semantic similarities. Specifically, we identify that structural differences significantly amplify the dissimilarities in node embeddings across graphs. To mitigate this, this paper offers a new insight: for semantically similar graphs, although structural differences lead to a significant distribution shift in node embeddings, their impact on subgraph embeddings can be marginal. Building on this insight, we introduce Subgraph Pooling (SP), which aggregates nodes sampled from a k-hop neighborhood, and Subgraph Pooling++ (SP++), which uses a random walk, to mitigate the impact of graph structural differences on knowledge transfer. We theoretically analyze the role of SP in reducing graph discrepancy and conduct extensive experiments to evaluate its superiority under various settings. The proposed SP methods are effective yet elegant and can be easily applied on top of any backbone graph neural network (GNN). Our code and data are available at: https://github.com/Zehong-Wang/Subgraph-Pooling.
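The core operation is simple enough to sketch: replace each node's embedding with the mean over its k-hop neighborhood before comparing graphs. The toy adjacency and embeddings below are placeholders, and SP++ would swap the k-hop ball for random-walk sampling.

import numpy as np

def khop_pool(adj, h, k=2):
    # adj: (n, n) 0/1 adjacency; h: (n, d) node embeddings.
    n = len(adj)
    reach = np.eye(n, dtype=int)
    power = np.eye(n, dtype=int)
    for _ in range(k):
        power = power @ adj            # walks one hop longer
        reach = reach + power
    mask = reach > 0                   # nodes within k hops, incl. self
    return np.stack([h[m].mean(axis=0) for m in mask])

adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
h = np.random.default_rng(0).normal(size=(3, 4))
print(khop_pool(adj, h, k=1).shape)    # (3, 4): one pooled vector per node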
List of keywords
Machine Learning -> ML: Sequence and graph learning
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Multi-task and transfer learning
Machine Learning -> ML: Semi-supervised learning
1920
Game Transformations That Preserve Nash Equilibria or Best-Response Sets
Emanuel Tewolde, Vincent Conitzer
6 min. talk | August 8th at 15:00 | Session: GTEP: Game Theory and Economic Paradigms
In this paper, we investigate under which conditions normal-form games are (guaranteed to be) strategically equivalent. First, we show for N-player games (N >= 3) that (A) it is NP-hard to decide whether a given strategy is a best response to some strategy profile of the opponents, and that (B) it is co-NP-hard to decide whether two games have the same best-response sets. Combining that with known results from the literature, we move our attention to equivalence-preserving game transformations. It is a widely used fact that a positive affine (linear) transformation of the utility payoffs neither changes the best-response sets nor the Nash equilibrium set. We investigate which other game transformations also possess either of the following two properties when being applied to an arbitrary N-player game (N >= 2): (i) The Nash equilibrium set stays the same; (ii) The best-response sets stay the same. For game transformations that operate player-wise and strategy-wise, we prove that (i) implies (ii) and that transformations with property (ii) must be positive affine. The resulting equivalence chain highlights the special status of positive affine transformations among all the transformation procedures that preserve key game-theoretic characteristics.
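In their standard form, the positive affine transformations in question act, for each player i, as

u_i'(s_i, s_{-i}) = α_i · u_i(s_i, s_{-i}) + β_i(s_{-i}), with α_i > 0,

where the offset β_i may depend on the opponents' strategy profile. Such maps change neither player i's best-response sets nor the Nash equilibrium set; the result above shows they are essentially the only player-wise, strategy-wise transformations with these properties.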
List of keywords
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Game Theory and Economic Paradigms -> General
1944
Eliminating the Cross-Domain Misalignment in Text-guided Image Inpainting
Muqi Huang, Chaoyue Wang, Yong Luo, Lefei Zhang
12 min. talk | August 8th at 15:00 | Session: CV: Image and video synthesis and generation (2/2)
Text-guided image inpainting has rapidly garnered prominence as a task in user-directed image synthesis, aiming to complete the occluded image regions following the textual prompt provided. However, current methods usually grapple with issues arising from the disparity between low-level pixel data and high-level semantic descriptions, which results in inpainted sections not harmonizing with the original image (either structurally or texturally). In this study, we introduce a Structure-Aware Inpainting Learning scheme and an Asymmetric Cross Domain Attention to address these cross-domain misalignment challenges. The proposed structure-aware learning scheme employs features of an intermediate modality as structure guidance to bridge the gap between text information and low-level pixels. Meanwhile, asymmetric cross-domain attention enhances the texture consistency between inpainted and unmasked regions. Our experiments show exceptional performance on leading datasets such as MS-COCO and Open Images, surpassing state-of-the-art text-guided image inpainting methods. Code is released at: https://github.com/MucciH/ECDM-inpainting
List of keywords
Computer Vision -> CV: Image and video synthesis and generation 
Computer Vision -> CV: Vision, language and reasoning
1946
Fraud Risk Mitigation in Real-Time Payments: A Strategic Agent-Based Analysis
Katherine Mayo, Nicholas Grabill, Michael P. Wellman
12 min. talk | August 9th at 10:00 | Session: MAS: Applications
Whereas standard financial mechanisms for payment may take days to finalize, real-time payments (RTPs) provide immediate processing and final receipt of funds. The speed of settlement benefits customers, but raises vulnerability to fraud. We seek to understand how bank nodes may strategically mitigate fraud risk in RTPs, through investment in fraud detection and restricting payments eligible for real-time processing. To study this, we introduce an agent-based model of the payment network supporting both real-time and standard payments, and define a game among banks and fraudsters. Using empirical game-theoretic analysis, we identify Nash equilibria in nine game configurations defined by network attributes. Our analysis finds that as banks become more liable for fraud, they continue to allow RTPs but are more likely to employ both restrictions and a high level of fraud detection. Fraudsters, in response, switch from targeting only RTPs to attempting fraud with any type of payment and tend to exploit banks where they have historically been most successful. We also conduct a strategic feature gains assessment to further understand the benefit offered by each of the bank’s risk mitigation measures, which confirms the importance of selective RTP restrictions. Finally, we find that in equilibrium bank strategic decisions negatively affect fraudsters while minimally impacting customers.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Applications
Multidisciplinary Topics and Applications -> MTA: Economics
Multidisciplinary Topics and Applications -> MTA: Finance
1949
Concept-Level Causal Explanation Method for Brain Function Network Classification
Jinduo Liu, Feipeng Wang, Junzhong Ji
6 min. talk | August 7th at 15:00 | Session: HAI: Humans and AI
Using deep models to classify brain functional networks (BFNs) for the auxiliary diagnosis and treatment of brain diseases has become increasingly popular. However, the unexplainability of deep models has seriously hindered their applications in computer-aided diagnosis. In addition, current explanation methods mostly focus on natural images, which cannot be directly used to explain the deep model for BFN classification. In this paper, we propose a concept-level causal explanation method for BFN classification called CLCEM. First, CLCEM employs the causal learning method to extract concepts that are meaningful to humans from BFNs. Second, it aggregates the same concepts to obtain the contribution of each concept to the model output. Finally, CLCEM adds the contribution of each concept to make a diagnosis. The experimental results show that our CLCEM can not only accurately identify brain regions related to specific brain diseases but also make decisions based on the concepts of these brain regions, which enables humans to understand the decision-making process without performance degradation.
List of keywords
Humans and AI -> HAI: Brain sciences
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Data Mining -> DM: Networks
Machine Learning -> ML: Causality
1951
Domain-Hierarchy Adaptation via Chain of Iterative Reasoning for Few-shot Hierarchical Text Classification
Ke Ji, Peng Wang, Wenjun Ke, Guozheng Li, Jiajun Liu, Jingsheng Gao, Ziyu Shang
6 min. talk | August 8th at 11:30 | Session: NLP: Applications
Recently, various pre-trained language models (PLMs) have been proposed and have proven their impressive performance on a wide range of few-shot tasks. However, limited by the unstructured prior knowledge in PLMs, it is difficult to maintain consistent performance on complex hierarchically dependent tasks, especially when the downstream data are extremely scarce. The main challenge is how to transfer the unstructured semantic space in PLMs to the downstream domain hierarchy. Unlike previous work on hierarchical text classification (HTC), which directly performs multi-label classification or uses graph neural networks (GNNs) to inject the label hierarchy, in this work we study the HTC problem under a few-shot setting to adapt knowledge in PLMs from an unstructured manner to the downstream hierarchy. Technically, we design a simple yet effective method named Hierarchical Iterative Conditional Random Field (HierICRF), which searches the most domain-challenging directions and casts domain-hierarchy adaptation as a hierarchical iterative language modeling problem; it then encourages the model to perform hierarchical consistency self-correction during inference, thereby achieving knowledge transfer with hierarchical consistency preservation. We apply HierICRF to various architectures, and extensive experiments on two popular HTC datasets demonstrate that prompting with HierICRF significantly boosts few-shot HTC performance, with average Micro-F1 improvements ranging from 28.80% to 1.50% and Macro-F1 improvements from 36.29% to 1.50% over the previous state-of-the-art (SOTA) baselines as supervision grows from 1-shot to 16-shot, while retaining SOTA hierarchical consistency performance.
List of keywords
Natural Language Processing -> NLP: Applications
Natural Language Processing -> NLP: Text classification
1958
Feedback-Based Adaptive Crossover-Rate in Evolutionary Computation
Xiaoyuan Guan, Tianyi Yang, Chunliang Zhao, Yuren Zhou
6 min. talk | August 8th at 11:30 | Session: S: Evolutionary computation
We propose a novel approach to improve multi-objective evolutionary algorithms by modifying crossover operations. Our approach uses a modifiable cross distribution and virtual point to rebalance the probability distribution of all crossover options. This design reduces runtime for typical pseudo-Boolean functions. Experiments and analysis show our approach effectively optimizes bi-objective problems COCZ and LOTZ in Θ(n) time during crossover, outperforming conventional crossover multi-objective evolutionary algorithms (C-MOEA) which require O(n log n) steps. For the tri-objective problem Hierarchical-COCZ, our approach guarantees an expected runtime of Θ(n^2 log n), while C-MOEA needs at least Ω(n^2 log n) and at most O(n^2 log^2 n) steps.
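For reference, the two bi-objective benchmark functions named above in their standard textbook form (x is a 0/1 list; COCZ assumes an even length):

def lotz(x):
    # LOTZ: (number of leading ones, number of trailing zeros).
    n = len(x)
    leading_ones = next((i for i, b in enumerate(x) if b == 0), n)
    trailing_zeros = next((i for i, b in enumerate(reversed(x)) if b == 1), n)
    return leading_ones, trailing_zeros

def cocz(x):
    # COCZ: (ones in x, ones in the first half plus zeros in the second half).
    half = len(x) // 2
    return sum(x), sum(x[:half]) + sum(1 - b for b in x[half:])

print(lotz([1, 1, 0, 1, 0, 0]))  # (2, 2)
print(cocz([1, 1, 0, 1, 0, 0]))  # (3, 4)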
List of keywords
Search -> S: Evolutionary computation
Machine Learning -> ML: Evolutionary learning
1959
Cross-View Diversity Embedded Consensus Learning for Multi-View Clustering
Chong Peng, Kai Zhang, Yongyong Chen, Chenglizhao Chen, Qiang Cheng
6 min. talk | August 6th at 15:00 | Session: ML: Multi-view learning
Multi-view clustering (MVC) has garnered significant attention in recent studies. In this paper, we propose a novel MVC method, named CCL-MVC. The method constructs a cross-order neighbor tensor of multi-view data to recover a low-rank essential tensor, preserving noise-free, comprehensive, and complementary cross-order relationships among the samples. Furthermore, it constructs a consensus representation matrix by fusing the low-rank essential tensor with auto-adjusted cross-view diversity embedding, fully exploiting both consensus and discriminative information of the data. An effective optimization algorithm is developed, which is theoretically guaranteed to converge. Extensive experimental results confirm the effectiveness of the proposed method.
List of keywords
Machine Learning -> ML: Multi-view learning
Machine Learning -> ML: Clustering
1967
On the Logic of Theory Change: Iteration of KM-Update, Revised
Liangda Fang, Tong Zhu, Quanlong Guan, Junming Qiu, Zhao-Rong Lai, Weiqi Luo, Hai Wan
6 min. talk | August 8th at 15:00 | Session: KRR: Knowledge Representation and Reasoning (2/2)
Belief revision and update, two significant types of belief change, both focus on how an agent modifies her beliefs in presence of new information. The most striking difference between them is that the former studies the change of beliefs in a static world while the latter concentrates on a dynamically-changing world. The famous AGM and KM postulates were proposed to capture rational belief revision and update, respectively. However, both of them are too permissive to exclude some unreasonable changes in the iteration. In response to this weakness, the DP postulates and its extensions for iterated belief revision were presented. Furthermore, Ferme and Goncalves integrated these postulates in belief update. Unfortunately, some redundant components are included in the definitions of belief states and the faithful assignments for semantic characterizations. Moreover, their approach does not meet the desired property of iterated belief update. They also do not discuss the rationale of any DP postulate within the update context. This paper is intended to fix these deficiencies of Ferme and Goncalves’s approach. Firstly, we present a modification of the original KM postulates based on belief states, and propose the notion of faithful collective assignments of belief states to partial preorders. Subsequently, we migrate several well-known postulates for iterated belief revision to iterated belief update. Moreover, we provide the exact semantic characterizations based on partial preorders for each of the proposed postulates. Finally, we analyze the compatibility between the above iterated postulates and the KM postulates for belief update.
List of keywords
Knowledge Representation and Reasoning -> KRR: Belief change
Knowledge Representation and Reasoning -> KRR: Reasoning about knowledge and belief
1975
PACIA: Parameter-Efficient Adapter for Few-Shot Molecular Property Prediction
Shiguang Wu, Yaqing Wang, Quanming Yao
6 min. talk | August 6th at 11:30 | Session: ML: Applications
Molecular property prediction (MPP) plays a crucial role in biomedical applications, but it often encounters challenges due to a scarcity of labeled data. Existing works commonly adopt gradient-based strategy to update a large amount of parameters for task-level adaptation. However, the increase of adaptive parameters can lead to overfitting and poor performance. Observing that graph neural network (GNN) performs well as both encoder and predictor, we propose PACIA, a parameter-efficient GNN adapter for few-shot MPP. We design a unified adapter to generate a few adaptive parameters to modulate the message passing process of GNN. We then adopt a hierarchical adaptation mechanism to adapt the encoder at task-level and the predictor at query-level by the unified GNN adapter. Extensive results show that PACIA obtains the state-of-the-art performance in few-shot MPP problems, and our proposed hierarchical adaptation mechanism is rational and effective.
List of keywords
Machine Learning -> ML: Applications
Machine Learning -> ML: Few-shot learning
1982
Instance-Level Metalearning for Outlier Detection
Long Vu, Peter Kirchner, Charu C. Aggarwal, Horst Samulowitz
6 min. talk | August 8th at 11:30 | Session: DM: Anomaly/outlier detection
A machine learning task can be viewed as a sequential pipeline of different algorithmic choices, including data preprocessing, model selection, and hyper-parameter tuning. Automated machine learning selects this sequence in an automated manner. While such approaches are natural in supervised settings, they remain challenging for unsupervised tasks such as outlier detection because of the lack of availability of label-centric feedback. In this paper, we present an instance-level metalearning approach for outlier detection. This approach learns how outlier instances are related to normal points in many labeled data sets to create a supervised meta-model. This meta-model is then used on a new (unlabeled) data set to predict outliers. We show the robustness of our approach on several benchmarks from the OpenML repository.
List of keywords
Data Mining -> DM: Anomaly/outlier detection
Machine Learning -> ML: Automated machine learning
Machine Learning -> ML: Meta-learning
1993
Natural Language-centered Inference Network for Multi-modal Fake News Detection
Qiang Zhang, Jiawei Liu, Fanrui Zhang, Jingyi Xie, Zheng-Jun Zha
6 min. talk | August 7th at 15:00 | Session: DM: Data Mining (1/2)
The proliferation of fake news combining images and text on the internet has triggered widespread concern. Existing research has made important contributions to cross-modal information interaction and fusion, but fails to fundamentally address the modality gap among news image, text, and news-related external knowledge representations. In this paper, we propose a novel Natural Language-centered Inference Network (NLIN) for multi-modal fake news detection by aligning multi-modal news content with the natural language space and introducing an encoder-decoder architecture to fully comprehend the news in context. Specifically, we first unify multi-modal news content into the textual modality by converting news images and news-related external knowledge into plain textual content. Then, we design a multi-modal feature reasoning module, which consists of a multi-modal encoder, a unified-modal context encoder and an inference decoder with a prompt phrase. This framework not only fully extracts the latent representation of cross-modal news content, but also utilizes the prompt phrase to stimulate the powerful in-context learning ability of the pre-trained large language model to reason about the truthfulness of the news content. In addition, to support research in the field of multi-modal fake news detection, we produce a challenging large-scale, multi-platform, multi-domain, multi-modal Chinese Fake News Detection (CFND) dataset. Extensive experiments show that our CFND dataset is challenging and that the proposed NLIN outperforms state-of-the-art methods.
List of keywords
Data Mining -> DM: Mining text, web, social media
Multidisciplinary Topics and Applications -> MTA: News and media
1997
Kernel Readout for Graph Neural Networks
Jiajun Yu, Zhihao Wu, Jinyu Cai, Adele Lu Jia, Jicong Fan
6 min. talk | August 9th at 11:30 | Session: DM: Mining graphs (3/3)
Graph neural networks (GNNs) for graph classification or representation learning require a pooling operation to convert the nodes’ embeddings of each graph to a vector as the graph-level representation and the operation has a significant impact on model accuracy. The paper presents a novel graph pooling method called Kernel Readout (KerRead). KerRead maps the node embeddings from the sample space with limited nodes to an augmented sample space with infinite nodes, and then calculates the inner product between some learnable adaptive centers and the augmented node embeddings, which forms a final graph-level feature vector. We apply the proposed strategy to six supervised and two unsupervised graph neural networks such as GCN, GIN, GUNet, InfoGraph, and GraphCL, and the experiments on eight benchmark datasets show that the proposed readout outperforms classical pooling methods such as Sum and seven state-of-the-art pooling methods such as SRead and Janossy GRU. Code and Appendix are both available at https://github.com/jiajunCAU/KerRead.
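A minimal sketch of a kernel-style readout follows, assuming an RBF kernel (whose feature space is infinite-dimensional) and randomly initialized stand-ins for the learnable centers; KerRead's exact kernel and training procedure may differ.

import numpy as np

def kernel_readout(h, centers, gamma=0.5):
    # h: (n_nodes, d) node embeddings; centers: (n_centers, d).
    sq = ((h[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # pairwise distances
    k = np.exp(-gamma * sq)                                    # RBF kernel values
    return k.sum(axis=0)                                       # fixed-size graph vector

rng = np.random.default_rng(0)
print(kernel_readout(rng.normal(size=(7, 16)), rng.normal(size=(4, 16))).shape)  # (4,)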
List of keywords
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Representation learning
2005
When Fairness Meets Privacy: Exploring Privacy Threats in Fair Binary Classifiers via Membership Inference Attacks
Huan Tian, Guangsheng Zhang, Bo Liu, Tianqing Zhu, Ming Ding, Wanlei Zhou
6 min. talk | August 8th at 11:30 | Session: ETF: AI Ethics, Trust, Fairness (2/2)
While in-processing fairness approaches show promise in mitigating biased predictions, their potential impact on privacy leakage remains under-explored. We aim to address this gap by assessing the privacy risks of fairness-enhanced binary classifiers with membership inference attacks (MIAs). Surprisingly, our results reveal that these fairness interventions exhibit increased resilience against existing attacks, indicating that enhancing fairness does not necessarily lead to privacy compromises. However, we find that current attack methods are ineffective, as they typically degrade into simple threshold models with limited attack effectiveness. Following this observation, we discover a novel threat dubbed Fairness Discrepancy Membership Inference Attacks (FD-MIA) that exploits prediction discrepancies between fair and biased models. This attack reveals more potent vulnerabilities and poses significant privacy risks to model privacy. Extensive experiments across multiple datasets, attack methods, and representative fairness approaches confirm our findings and demonstrate the efficacy of the proposed attack method. Our study exposes the overlooked privacy threats in fairness studies, advocating for thorough evaluations of potential security vulnerabilities before model deployment.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
AI Ethics, Trust, Fairness -> ETF: Fairness and diversity
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
2006
Ten Words Only Still Help: Improving Black-Box AI-Generated Text Detection via Proxy-Guided Efficient Re-Sampling
Yuhui Shi, Qiang Sheng, Juan Cao, Hao Mi, Beizhe Hu, Danding Wang
6 min. talk | August 9th at 10:00 | Session: ETF: Trustworthy AI
With the rapidly increasing application of large language models (LLMs), their abuse has caused many undesirable societal problems such as fake news, academic dishonesty, and information pollution. This makes AI-generated text (AIGT) detection of great importance. Among existing methods, white-box methods are generally superior to black-box methods in terms of performance and generalizability, but they require access to LLMs’ internal states and are not applicable to black-box settings. In this paper, we propose to estimate word generation probabilities as pseudo white-box features via multiple re-sampling to help improve AIGT detection under the black-box setting. Specifically, we design POGER, a proxy-guided efficient re-sampling method, which selects a small subset of representative words (e.g., 10 words) for performing multiple re-sampling in black-box AIGT detection. Experiments on datasets containing texts from humans and seven LLMs show that POGER outperforms all baselines in macro F1 under black-box, partial white-box, and out-of-distribution settings and maintains lower re-sampling costs than its existing counterparts.
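The pseudo white-box feature can be pictured as follows, where generate is a hypothetical black-box call that samples one next token for a given context; the word's generation probability is estimated by its empirical sampling frequency.

def estimate_word_prob(generate, context, target_word, n_samples=30):
    # Monte Carlo estimate of P(target_word | context) for a black-box LLM
    # via repeated re-sampling (simplified sketch of the idea).
    hits = sum(generate(context) == target_word for _ in range(n_samples))
    return hits / n_samples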
List of keywords
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
Natural Language Processing -> NLP: Applications
2007
SDformer: Transformer with Spectral Filter and Dynamic Attention for Multivariate Time Series Long-term Forecasting
Ziyu Zhou, Gengyu Lyu, Yiming Huang, Zihao Wang, Ziyu Jia, Zhen Yang
12 min. talk | August 8th at 15:00 | Session: ML: Time series and data streams
Transformer has gained widespread adoption in modeling time series due to the exceptional ability of its self-attention mechanism in capturing long-range dependencies. However, when processing time series data with numerous variates, the vanilla self-attention mechanism tends to distribute attention weights evenly and smoothly, causing row-homogenization in attention maps and further hampering time series forecasting. To tackle this issue, we propose an advanced Transformer architecture entitled SDformer, which introduces two novel modules, Spectral-Filter-Transform (SFT) and Dynamic-Directional-Attention (DDA), and integrates them into the encoder of Transformer to achieve more intensive attention allocation. Specifically, the SFT module utilizes the Fast Fourier Transform to select the most prominent frequencies, along with a Hamming Window to smooth and denoise the filtered series data; the DDA module applies a specialized kernel function to the query and key vectors projected from the denoised data, concentrating this innovative attention mechanism more effectively on the most informative variates to obtain a sharper attention distribution. These two modules jointly enable attention weights to be more salient among numerous variates, which in turn enhances the attention’s ability to capture multivariate correlations, improving forecasting performance. Extensive experiments on public datasets demonstrate its superior performance over other state-of-the-art models. Code is available at https://github.com/zhouziyu02/SDformer.
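The SFT step might be sketched roughly as below, assuming the prominent frequencies are selected by magnitude and the Hamming window is applied to the filtered series; details such as the choice of k are assumptions, not the paper's exact design.

import torch

def spectral_filter(x, k=8):
    # x: (batch, length) series. Keep the k most prominent frequencies,
    # then smooth the filtered series with a Hamming window.
    spec = torch.fft.rfft(x, dim=-1)
    topk = spec.abs().topk(k, dim=-1).indices
    mask = torch.zeros_like(spec.abs()).scatter_(-1, topk, 1.0)
    filtered = torch.fft.irfft(spec * mask, n=x.size(-1), dim=-1)
    return filtered * torch.hamming_window(x.size(-1), device=x.device)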
List of keywords
Machine Learning -> ML: Time series and data streams
Data Mining -> DM: Mining spatial and/or temporal data
Machine Learning -> ML: Attention models
2024
VCC-INFUSE: Towards Accurate and Efficient Selection of Unlabeled Examples in Semi-supervised Learning
Shijie Fang, Qianhan Feng, Tong Lin
6 min. talk | August 8th at 11:30 | Session: ML: Semi-supervised learning
Despite the progress of Semi-supervised Learning (SSL), existing methods fail to utilize unlabeled data effectively and efficiently. Many pseudo-label-based methods select unlabeled examples based on inaccurate confidence scores from the classifier. Most prior work also uses all available unlabeled data without pruning, making it difficult to handle large amounts of unlabeled data. To address these issues, we propose two methods: Variational Confidence Calibration (VCC) and Influence-Function-based Unlabeled Sample Elimination (INFUSE). VCC is a universal plugin for SSL confidence calibration, using a variational autoencoder to select more accurate pseudo labels based on three types of consistency scores. INFUSE is a data pruning method that constructs a core dataset of unlabeled examples under SSL. Our methods are effective across multiple datasets and settings, reducing classification error rates and saving training time. Together, VCC-INFUSE reduces the error rate of FlexMatch on the CIFAR-100 dataset by 1.08% while saving nearly half of the training time.
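The INFUSE side can be pictured as a plain ranking-and-truncation step, assuming a per-example influence score has already been computed; the helper below is an illustrative sketch, not the paper's implementation.

def prune_unlabeled(examples, influence, keep_ratio=0.5):
    # Keep the most influential fraction of the unlabeled pool as the core set.
    ranked = sorted(zip(influence, range(len(examples))), key=lambda t: -t[0])
    k = int(len(ranked) * keep_ratio)
    return [examples[i] for _, i in ranked[:k]]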
List of keywords
Machine Learning -> ML: Semi-supervised learning
2026
Balancing Multimodal Learning via Online Logit Modulation
Daoming Zong, Chaoyue Ding, Baoxiang Li, Jiakui Li, Ken Zheng
6 min. talk | August 9th at 10:00 | Session: ML: Optimization
Multimodal learning is provably superior to unimodal learning. However, in practice, the best-performing unimodal networks often outperform jointly trained multimodal networks. This phenomenon can be attributed to the varying convergence and generalization rates across different modalities, leading to the dominance of one modality and causing underfitting of other modalities in simple multimodal joint training. To mitigate this issue, we propose two key ingredients: i) disentangling the learning of unimodal features and multimodal interaction through an intermediate representation fusion block; ii) modulating the logits of different modalities via dynamic coefficients during training to align their magnitudes with the target values, referred to as online logit modulation (OLM). Remarkably, OLM is model-agnostic and can be seamlessly integrated with most existing multimodal training frameworks. Empirical evidence shows that our approach brings significant enhancements over baselines on a wide range of multimodal tasks, covering video, audio, text, image, and depth modalities.
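A rough sketch of the modulation idea, under the assumption that each modality's logits are rescaled to a shared target magnitude before fusion; the paper's exact dynamic-coefficient update may differ.

import torch

def modulate_logits(logits_per_mod, target_norm=1.0):
    # Rescale each modality's logits so their magnitudes match a common
    # target, preventing one modality from dominating joint training.
    fused = 0.0
    for z in logits_per_mod:
        coeff = target_norm / (z.norm(dim=-1, keepdim=True) + 1e-8)
        fused = fused + coeff * z
    return fused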
List of keywords
Machine Learning -> ML: Optimization
Computer Vision -> CV: Multimodal learning
Machine Learning -> ML: Applications
Machine Learning -> ML: Attention models
2029
PTDE: Personalized Training with Distilled Execution for Multi-Agent Reinforcement Learning
Yiqun Chen, Hangyu Mao, Jiaxin Mao, Shiguang Wu, Tianle Zhang, Bin Zhang, Wei Yang, Hongxing Chang
6 min. talk | August 7th at 15:00 | Session: MAS: Multi-agent learning
Centralized Training with Decentralized Execution (CTDE) has emerged as a widely adopted paradigm in multi-agent reinforcement learning, emphasizing the utilization of global information for learning an enhanced joint Q-function or centralized critic. In contrast, our investigation delves into harnessing global information to directly enhance individual Q-functions or individual actors. Notably, we discover that applying identical global information universally across all agents proves insufficient for optimal performance. Consequently, we advocate for the customization of global information tailored to each agent, creating agent-personalized global information to bolster overall performance. Furthermore, we introduce a novel paradigm named Personalized Training with Distilled Execution (PTDE), wherein agent-personalized global information is distilled into the agent’s local information. This distilled information is then utilized during decentralized execution, resulting in minimal performance degradation. PTDE can be seamlessly integrated with state-of-the-art algorithms, leading to notable performance enhancements across diverse benchmarks, including the SMAC benchmark, Google Research Football (GRF) benchmark, and Learning to Rank (LTR) task.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
2034
HeterGCL: Graph Contrastive Learning Framework on Heterophilic Graph
Chenhao Wang, Yong Liu, Yan Yang, Wei Li
6 min. talk | August 6th at 15:00 | Session: DM: Mining graphs (2/3)
Graph Contrastive Learning (GCL) has attracted significant research attention due to its self-supervised ability to learn robust node representations. Unfortunately, most methods primarily focus on homophilic graphs, rendering them less effective for heterophilic graphs. In addition, the complexity of node interactions in heterophilic graphs poses considerable challenges to augmentation schemes, encoding architectures, and contrastive designs for traditional GCL. In this work, we propose HeterGCL, a novel graph contrastive learning framework with structural and semantic learning to explore the true potential of GCL on heterophilic graphs. Specifically, we abandon random augmentation schemes that destroy the graph structure and instead introduce an adaptive neighbor aggregation strategy (ANA) to extract topology-supervised signals from neighboring nodes at different distances and explore the structural information with an adaptive local-to-global contrastive loss. In the semantic learning module, we jointly consider the original node features and the similarity between nodes in the latent feature space to explore hidden associations between nodes. Experimental results on homophilic and heterophilic graphs demonstrate that HeterGCL outperforms existing self-supervised and semi-supervised baselines across various downstream tasks.
List of keywords
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Self-supervised Learning
2043
Recall, Retrieve and Reason: Towards Better In-Context Relation Extraction
Guozheng Li, Peng Wang, Wenjun Ke, Yikai Guo, Ke Ji, Ziyu Shang, Jiajun Liu, Zijie Xu
12 min. talk | August 7th at 11:30 | Session: NLP: Information extraction
Relation extraction (RE) aims to identify relations between entities mentioned in texts. Although large language models (LLMs) have demonstrated impressive in-context learning (ICL) abilities in various tasks, they still suffer from poor performance compared to most supervised fine-tuned RE methods. Utilizing ICL for RE with LLMs encounters two challenges: (1) retrieving good demonstrations from training examples, and (2) enabling LLMs to exhibit strong ICL abilities in RE. On the one hand, retrieving good demonstrations is a non-trivial process in RE, which easily results in low relevance regarding entities and relations. On the other hand, ICL with an LLM achieves poor performance in RE when RE differs from language modeling in nature or the LLM is not large enough. In this work, we propose a novel recall-retrieve-reason RE framework that synergizes LLMs with retrieval corpora (training examples) to enable relevant retrieving and reliable in-context reasoning. Specifically, we distill consistent ontological knowledge from training datasets to let LLMs generate relevant entity pairs grounded by retrieval corpora as valid queries. These entity pairs are then used to retrieve relevant training examples from the retrieval corpora as demonstrations for LLMs to conduct better ICL via instruction tuning. Extensive experiments on different LLMs and RE datasets demonstrate that our method generates relevant and valid entity pairs and boosts the ICL abilities of LLMs, achieving competitive or new state-of-the-art performance on sentence-level RE compared to previous supervised fine-tuning methods and ICL-based methods.
List of keywords
Natural Language Processing -> NLP: Information extraction
2045
ReliaAvatar: A Robust Real-Time Avatar Animator with Integrated Motion Prediction
Bo Qian, Zhenhuan Wei, Jiashuo Li, Xing Wei
6 min. talk | August 7th at 15:00 | Session: HAI: Humans and AI
Efficiently estimating the full-body pose with minimal wearable devices presents a worthwhile research direction. Despite significant advancements in this field, most current research neglects to explore full-body avatar estimation under low-quality signal conditions, which is prevalent in practical usage. To bridge this gap, we summarize three scenarios that may be encountered in real-world applications: standard scenario, instantaneous data-loss scenario, and prolonged data-loss scenario, and propose a new evaluation benchmark. The solution we propose to address data-loss scenarios is integrating the full-body avatar pose estimation problem with motion prediction. Specifically, we present ReliaAvatar, a real-time, reliable avatar animator equipped with predictive modeling capabilities employing a dual-path architecture. ReliaAvatar operates effectively, with an impressive performance rate of 109 frames per second (fps). Extensive comparative evaluations on widely recognized benchmark datasets demonstrate ReliaAvatar’s superior performance in both standard and low data-quality conditions. The code is available at https://github.com/MIV-XJTU/ReliaAvatar.
List of keywords
Humans and AI -> HAI: Applications
Humans and AI -> HAI: Human-computer interaction
Humans and AI -> HAI: Personalization and user modeling
Robotics -> ROB: Human robot interaction
2051
Allocating Mixed Goods with Customized Fairness and Indivisibility Ratio
Bo Li, Zihao Li, Shengxin Liu, Zekai Wu
6 min. talk | August 8th at 10:00 | Session: GTEP: Fair division
We consider the problem of fairly allocating a combination of divisible and indivisible goods. While fairness criteria like envy-freeness (EF) and proportionality (PROP) can always be achieved for divisible goods, only their relaxed versions, such as the “up to one” relaxations EF1 and PROP1, can be satisfied when the goods are indivisible. The “up to one” relaxations require the fairness conditions to be satisfied provided that one good can be completely eliminated or added in the comparison. In this work, we bridge the gap between the two extremes and propose “up to a fraction” relaxations for the allocation of mixed divisible and indivisible goods. The fraction is determined based on the proportion of indivisible goods, which we call the indivisibility ratio. The new concepts also introduce asymmetric conditions that are customized for individuals with varying indivisibility ratios. We provide both upper and lower bounds on the fractions of the modified item in order to satisfy the fairness criterion. Our results are tight up to a constant for EF and asymptotically tight for PROP.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
2053
Imperio: Language-Guided Backdoor Attacks for Arbitrary Model Control
Ka-Ho Chow, Wenqi Wei, Lei Yu
6 min. talk | August 7th at 11:30 | Session: CV: Adversarial learning, adversarial attack and defense methods
Natural language processing (NLP) has received unprecedented attention. While advancements in NLP models have led to extensive research into their backdoor vulnerabilities, the potential for these advancements to introduce new backdoor threats remains unexplored. This paper proposes Imperio, which harnesses the language understanding capabilities of NLP models to enrich backdoor attacks. Imperio provides a new model control experience. Demonstrated through controlling image classifiers, it empowers the adversary to manipulate the victim model with arbitrary output through language-guided instructions. This is achieved using a language model to fuel a conditional trigger generator, with optimizations designed to extend its language understanding capabilities to backdoor instruction interpretation and execution. Our experiments across three datasets, five attacks, and nine defenses confirm Imperio’s effectiveness. It can produce contextually adaptive triggers from text descriptions and control the victim model with desired outputs, even in scenarios not encountered during training. The attack reaches a high success rate without compromising the accuracy of clean inputs and exhibits resilience against representative defenses. Supplementary materials are available at https://khchow.com/Imperio.
List of keywords
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
2055
Cross-View Contrastive Fusion for Enhanced Molecular Property Prediction
Yan Zheng, Song Wu, Junyu Lin, Yazhou Ren, Jing He, Xiaorong Pu, Lifang He
6 min. talk | August 6th at 15:00 | Session: ML: Multi-view learning
Machine learning based molecular property prediction (MPP) has been a hot topic in the field of computer aided drug discovery (CADD). However, current MPP methods face two prominent challenges: 1) single-view MPP methods do not sufficiently exploit the complementary information of molecular data across multiple views, generally producing suboptimal performance, and 2) most existing multi-view MPP methods ignore the disparities in data quality among different views, inadvertently introducing the risk of models being overshadowed by inferior views. To address the above challenges, we introduce a novel cross-view contrastive fusion method for enhanced molecular property prediction (MolFuse). First, we extract intricate molecular semantics and structures from both sequence and graph views to leverage the complementarity of multi-view data. Then, MolFuse employs two distinct graphs, the atomic graph and chemical bond graph, to enhance the representation of the molecular graph, allowing us to integrate both the fundamental backbone attributes and the nuanced shape characteristics. Notably, we incorporate a dual learning mechanism to refine the initial feature representations, and global features are obtained by maximizing the coherence among diverse view-specific molecular representations for the downstream task. The overall learning processes are combined into a unified optimization problem for iterative training. Experiments on multiple benchmark datasets demonstrate the superiority of our MolFuse.
List of keywords
Machine Learning -> ML: Multi-view learning
Data Mining -> DM: Mining graphs
Multidisciplinary Topics and Applications -> MTA: Bioinformatics
2072
Langshaw: Declarative Interaction Protocols Based on Sayso and Conflict
Munindar P. Singh, Samuel H. Christie V., Amit K. Chopra
6 min. talk | August 9th at 11:30 | Session: MAS: Agent-based and Multi-agent Systems (2/2)
Current languages for specifying multiagent protocols either over-constrain protocol enactments or complicate capturing their meanings. We propose Langshaw, a declarative protocol language based on (1) sayso, a new construct that captures who has priority over setting each attribute, and (2) nono and nogo, two constructs to capture conflicts between actions. Langshaw combines flexibility with an information model to express meaning. We give a formal semantics for Langshaw, procedures for determining the safety and liveness of a protocol, and a method to generate a message-oriented protocol (embedding needed coordination) suitable for flexible asynchronous enactment.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Agent communication
Agent-based and Multi-agent Systems -> MAS: Engineering methods, platforms, languages and tools
Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis
Agent-based and Multi-agent Systems -> General
2089
Vulnerabilities of Single-Round Incentive Compatibility in Auto-bidding: Theory and Evidence from ROI-Constrained Online Advertising Markets
Juncheng Li, Pingzhong Tang
6 min. talk | August 8th at 15:00 | Session: GTEP: Game Theory and Economic Paradigms
Most of the work in the auction design literature assumes that bidders behave rationally based on the information available for every individual auction, and the revelation principle enables designers to restrict their efforts to incentive compatible (IC) mechanisms. However, in today’s online advertising markets, one of the most important real-life applications of auction design, the data and computational power required to bid optimally are only available to the platform, and an advertiser can only participate by setting performance objectives and constraints for its proxy auto-bidder provided by the platform. The prevalence of auto-bidding necessitates a review of auction theory. In this paper, we examine the markets through the lens of ROI-constrained value-maximizing campaigns. We show that second price auction exhibits many undesirable properties (computational hardness, non-monotonicity, instability of bidders’ utilities, and interference in A/B testing) and loses its dominant theoretical advantages in single-item scenarios. In addition, we make it clear how IC and its runner-up-winner interdependence contribute to each property. We hope that our work could bring new perspectives to the community and help practitioners attain a better grasp of real-world markets.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Multidisciplinary Topics and Applications -> MTA: Economics
2093
Large Language Model-Enhanced Algorithm Selection: Towards Comprehensive Algorithm Representation
Xingyu Wu, Yan Zhong, Jibin Wu, Bingbing Jiang, Kay Chen Tan
12 min. talk | August 6th at 11:30 | Session: ML: Applications
Algorithm selection, a critical process of automated machine learning, aims to identify the most suitable algorithm for solving a specific problem prior to execution. Mainstream algorithm selection techniques heavily rely on problem features, while the role of algorithm features remains largely unexplored. Due to the intrinsic complexity of algorithms, effective methods for universally extracting algorithm information are lacking. This paper takes a significant step towards bridging this gap by introducing Large Language Models (LLMs) into algorithm selection for the first time. By comprehending the code text, the LLM not only captures the structural and semantic aspects of the algorithm, but also demonstrates contextual awareness and an understanding of library functions. The high-dimensional algorithm representation extracted by the LLM, after undergoing a feature selection module, is combined with the problem representation and passed to the similarity calculation module. The selected algorithm is determined by the matching degree between a given problem and different algorithms. Extensive experiments validate the performance superiority of the proposed model and the efficacy of each key module. Furthermore, we present a theoretical upper bound on model complexity, showcasing the influence of the algorithm representation and feature selection modules. This provides valuable theoretical guidance for the practical implementation of our method.
List of keywords
Machine Learning -> ML: Automated machine learning
Search -> S: Algorithm portfolios and configuration
Machine Learning -> ML: Applications
Natural Language Processing -> NLP: Language models
2119
Seed Selection in the Heterogeneous Moran Process
Petros Petsinis, Andreas Pavlogiannis, Josef Tkadlec, Panagiotis Karras
6 min. talk | August 7th at 15:00 | Session: DM: Data Mining (1/2)
The Moran process is a classic stochastic process that models the rise and takeover of novel traits in network-structured populations. In biological terms, a set of mutants, each with fitness m ∈ (0, ∞), invades a population of residents with fitness 1. Each agent reproduces at a rate proportional to its fitness and each offspring replaces a random network neighbor. The process ends when the mutants either fixate (take over the whole population) or go extinct. The fixation probability measures the success of the invasion. To account for environmental heterogeneity, we study a generalization of the standard process, called the Heterogeneous Moran process. Here, the fitness of each agent is determined both by its type (resident/mutant) and the node it occupies. We study the natural optimization problem of seed selection: given a budget k, which k agents should initiate the mutant invasion to maximize the fixation probability? We show that the problem is strongly inapproximable: it is NP-hard to distinguish between maximum fixation probability 0 and 1. We then focus on mutant-biased networks, where each node exhibits at least as large mutant fitness as resident fitness. We show that the problem remains NP-hard, but the fixation probability becomes submodular, and thus the optimization problem admits a greedy (1 − 1/e)-approximation. An experimental evaluation of the greedy algorithm along with various heuristics on real-world data sets corroborates our results.
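Since the fixation probability is submodular on mutant-biased networks, the standard greedy algorithm attains the (1 − 1/e)-approximation; a generic sketch follows, where fixation_prob is assumed to be a (Monte Carlo) estimator of the fixation probability of a seed set.

def greedy_seeds(nodes, fixation_prob, k):
    # Classic greedy maximization of a monotone submodular set function.
    S = set()
    for _ in range(k):
        best = max((v for v in nodes if v not in S),
                   key=lambda v: fixation_prob(S | {v}))
        S.add(best)
    return S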
List of keywords
Data Mining -> DM: Networks
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Constraint Satisfaction and Optimization -> CSO: Constraint optimization problems
Search -> S: Evolutionary computation
2126
Score-CDM: Score-Weighted Convolutional Diffusion Model for Multivariate Time Series Imputation
Shunyang Zhang, Senzhang Wang, Hao Miao, Hao Chen, Changjun Fan, Jian Zhang
6 min. talk | August 7th at 10:00 | Session: DM: Mining spatial and/or temporal data (1/2)
Multivariate time series (MTS) data are usually incomplete in real scenarios, and imputing the incomplete MTS is practically important to facilitate various time series mining tasks. Recently, diffusion model-based MTS imputation methods have achieved promising results by utilizing CNN or attention mechanisms for temporal feature learning. However, it is hard to adaptively trade off the diverse effects of local and global temporal features by simply combining CNN and attention. To address this issue, we propose a Score-weighted Convolutional Diffusion Model (Score-CDM for short), whose backbone consists of a Score-weighted Convolution Module (SCM) and an Adaptive Reception Module (ARM). SCM adopts a score map to capture the global temporal features in the time domain, while ARM uses a Spectral2Time Window Block (S2TWB) to convolve the local time series data in the spectral domain. Benefiting from the time convolution properties of Fast Fourier Transformation, ARM can adaptively change the receptive field of the score map, and thus effectively balance the local and global temporal features. We conduct extensive evaluations on three real MTS datasets of different domains, and the results verify the effectiveness of the proposed Score-CDM.
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
2127
Cooperation and Control in Delegation Games
Oliver Sourbut, Lewis Hammond, Harriet Wood
6 min. talk | August 7th at 10:00 | Session: MAS: Coordination and cooperation
Many settings of interest involving humans and machines – from virtual personal assistants to autonomous vehicles – can naturally be modelled as principals (humans) delegating to agents (machines), which then interact with each other on their principals’ behalf. We refer to these multi-principal, multi-agent scenarios as delegation games. In such games, there are two important failure modes: problems of control (where an agent fails to act in line with their principal’s preferences) and problems of cooperation (where the agents fail to work well together). In this paper we formalise and analyse these problems, further breaking them down into issues of alignment (do the players have similar preferences?) and capabilities (how competent are the players at satisfying those preferences?). We show – theoretically and empirically – how these measures determine the principals’ welfare, how they can be estimated using limited observations, and thus how they might be used to help us design more aligned and cooperative AI systems.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Game Theory and Economic Paradigms -> GTEP: Other
Humans and AI -> HAI: Human-AI collaboration
2128
RealDex: Towards Human-like Grasping for Robotic Dexterous Hand
Yumeng Liu, Yaxun Yang, Youzhuo Wang, Xiaofei Wu, Jiamin Wang, Yichen Yao, Sören Schwertfeger, Sibei Yang, Wenping Wang, Jingyi Yu, Xuming He, Yuexin Ma
6 min. talk | August 6th at 11:30 | Session: ROB: Robotics (1/2)
In this paper, we introduce RealDex, a pioneering dataset capturing authentic dexterous hand grasping motions infused with human behavioral patterns, enriched by multi-view and multimodal visual data. Utilizing a teleoperation system, we seamlessly synchronize human-robot hand poses in real time. This collection of human-like motions is crucial for training dexterous hands to mimic human movements more naturally and precisely. RealDex holds immense promise in advancing humanoid robots toward automated perception, cognition, and manipulation in real-world scenarios. Moreover, we introduce a cutting-edge dexterous grasping motion generation framework, which aligns with human experience and enhances real-world applicability through effectively utilizing Multimodal Large Language Models. Extensive experiments have demonstrated the superior performance of our method on RealDex and other open datasets. The dataset and associated code are available at https://4dvlab.github.io/RealDex_page/.
List of keywords
Robotics -> ROB: Learning in robotics
Robotics -> ROB: Manipulation
Robotics -> ROB: Robotics and vision
2140
Facility Location Problems with Capacity Constraints: Two Facilities and Beyond
Gennaro Auricchio, Zihe Wang, Jie Zhang
6 min. talk | August 8th at 11:30 | Session: GTEP: Mechanism design
In this paper, we investigate the Mechanism Design aspects of the m-Capacitated Facility Location Problem (m-CFLP) on a line. We focus on two frameworks. In the first framework, the number of facilities is arbitrary, all facilities have the same capacity, and the number of agents is equal to the total capacity of all facilities. In the second framework, we aim to place two facilities, each with a capacity of at least half of the total agents. For both of these frameworks, we propose truthful mechanisms with bounded approximation ratios with respect to the Social Cost (SC) and the Maximum Cost (MC). When m>2, the result sharply contrasts with the impossibility results known for the classic m-Facility Location Problem, where capacity constraints are not considered. Furthermore, all our mechanisms are optimal with respect to the MC and optimal or nearly optimal with respect to the SC among anonymous mechanisms. For both frameworks, we provide a lower bound on the approximation ratio that any truthful and deterministic mechanism can achieve with respect to the SC and MC.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Mechanism design
Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Agent-based and Multi-agent Systems -> MAS: Resource allocation
2144
Robust Reward Placement under Uncertainty
Petros Petsinis, Kaichen Zhang, Andreas Pavlogiannis, Jingbo Zhou, Panagiotis Karras
6 min. talk | August 9th at 11:30 | Session: PS: Planning and Scheduling (2/2)
We consider a problem of placing generators of rewards to be collected by randomly moving agents in a network. In many settings, the precise mobility pattern may be one of several possible, based on parameters outside our control, such as weather conditions. The placement should be robust to this uncertainty, guaranteeing a competitive total reward across possible networks. To study such scenarios, we introduce the Robust Reward Placement problem (RRP). Agents move randomly by a Markovian Mobility Model with a predetermined set of locations whose connectivity is chosen adversarially from a known set Π of candidates. We aim to select a set of reward states within a budget that maximizes the minimum ratio, among all candidates in Π, of the collected total reward over the optimal collectable reward under the same candidate. We prove that RRP is NP-hard and inapproximable, and develop Ψ-Saturate, a pseudo-polynomial time algorithm that achieves an ϵ-additive approximation by exceeding the budget constraint by a factor that scales as O(ln |Π|/ϵ). In addition, we present several heuristics, most prominently one inspired by a dynamic programming algorithm for the max–min 0–1 KNAPSACK problem. We corroborate our theoretical analysis with an experimental evaluation on synthetic and real data.
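The max-min objective itself is easy to state in code. The sketch below assumes oracles reward (expected reward collected under a candidate network) and opt_reward (the optimum under that candidate); it only illustrates the ratio that Ψ-Saturate approximately maximizes.

def robust_ratio(placement, candidates, reward, opt_reward):
    # Worst ratio, over candidate connectivities, of collected reward
    # to the optimal collectable reward under the same candidate.
    return min(reward(placement, c) / opt_reward(c) for c in candidates)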
List of keywords
Planning and Scheduling -> PS: Planning under uncertainty
Data Mining -> DM: Networks
Planning and Scheduling -> PS: Markov decisions processes
Search -> S: Combinatorial search and optimisation
2165
A Deep Reinforcement Learning Approach to Balance Viewport Prediction and Video Transmission in 360° Video Streaming
Guanghui Zhang, Jing Guo
6 min. talk | August 6th at 15:00 | Session: MTA: Transportation
360° video streaming has seen tremendous growth in recent years. However, our measurement reveals a dilemma that severely limits QoE. On the one hand, viewport prediction requires the shortest possible prediction distance for high prediction accuracy; on the other hand, video transmission requires more buffered data to compensate for bandwidth fluctuations, otherwise substantial playback rebuffering would be incurred. Since no existing method can break this dilemma, QoE optimization has naturally been bottlenecked. This work tackles this challenge by developing QUTA – a novel learning-based streaming system. Specifically, our measurement shows that three kinds of internal streaming parameters have significant impacts on the prediction distance, namely, download pause, data rate threshold, and playback rate. On top of this, we design a new long-term-planning (LTP) learning method that tunes the parameters dynamically based on the network and streaming context. Evaluations with large-scale streaming trace data show that QUTA not only improves the prediction accuracy and QoE by up to 68.4% but also exhibits strong robustness.
List of keywords
Multidisciplinary Topics and Applications -> MTA: Transportation
2184
Comparing Ways of Obtaining Candidate Orderings from Approval Ballots
Théo Delemazure, Chris Dong, Dominik Peters, Magdalena Tydrichova
6 min. talk | August 6th at 11:30 | Session: GTEP: Computational social choice (1/2)
To understand and summarize approval preferences and other binary evaluation data, it is useful to order the items on an axis which explains the data. In a political election using approval voting, this could be an ideological left-right axis such that each voter approves adjacent candidates, an analogue of single-peakedness. In a perfect axis, every approval set would be an interval, which is usually not possible, and so we need to choose an axis that gets closest to this ideal. The literature has developed algorithms for optimizing several objective functions (e.g., minimize the number of added approvals needed to get a perfect axis), but provides little help with choosing among different objectives. In this paper, we take a social choice approach and compare 5 different axis selection rules axiomatically, by studying the properties they satisfy. We establish some impossibility theorems, and characterize (within the class of scoring rules) the rule that chooses the axes that maximize the number of votes that form intervals, using the axioms of ballot monotonicity and resistance to cloning. Finally, we study the behavior of the rules on data from French election surveys, on the votes of justices of the US Supreme Court, and on synthetic data.
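The objective of the characterized rule, the number of ballots forming intervals of a given axis, is straightforward to compute; the sketch below implements exactly that count.

def interval_votes(axis, ballots):
    # axis: ordered list of candidates; ballots: iterable of approval sets.
    pos = {c: i for i, c in enumerate(axis)}
    count = 0
    for ballot in ballots:
        idx = sorted(pos[c] for c in ballot)
        # A ballot is an interval iff its positions form a contiguous block.
        if idx and idx[-1] - idx[0] + 1 == len(idx):
            count += 1
    return count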
List of keywords
Game Theory and Economic Paradigms -> GTEP: Computational social choice
2188
Determining Winners in Elections with Absent Votes
Qishen Han, Amelie Marian, Lirong Xia
6 min. talk | August 6th at 11:30 | Session: GTEP: Computational social choice (1/2)
An important question in elections is determining whether a candidate can be a winner when some votes are absent. We study this determining winners with absent votes (WAV) problem for elections with top-truncated ballots. We show that the WAV problem is NP-complete for single transferable vote, Maximin, and Copeland, and propose a special case of positional scoring rules for which the problem can be computed in polynomial time. Our results for top-truncated rankings differ from those for full rankings: there, the hardness results still hold when the number of candidates or the number of missing votes is bounded, whereas we show that the problem can be solved in polynomial time in either case.
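For intuition, the best-case test is simple under plurality (an illustrative special case, not the paper's general algorithm): award every absent vote to the candidate in question and check whether any rival still beats it.

def can_win_plurality(scores, num_absent, c):
    # scores: current plurality scores per candidate from the cast votes.
    # In the best case for c, all absent votes rank c first.
    best = dict(scores)
    best[c] += num_absent
    return all(best[c] >= s for cand, s in best.items() if cand != c)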
List of keywords
Game Theory and Economic Paradigms -> GTEP: Computational social choice
2224
NELLIE: A Neuro-Symbolic Inference Engine for Grounded, Compositional, and Explainable Reasoning
Nathaniel Weir, Peter Clark, Benjamin Van Durme
6 min. talk | August 7th at 15:00 | Session: KRR: Knowledge Representation and Reasoning (1/2)
Our goal is to develop a modern approach to answering questions via systematic reasoning where answers are supported by human interpretable proof trees grounded in an NL corpus of facts. Such a system would help alleviate the challenges of interpretability and hallucination with modern LMs, and the lack of grounding of current explanation methods (e.g., Chain-of-Thought). This paper proposes a new take on Prolog-based inference engines, where we replace handcrafted rules with a combination of neural language modeling, guided generation, and semiparametric dense retrieval. Our implementation, NELLIE, is the first system to demonstrate fully interpretable, end-to-end grounded QA as entailment tree proof search, going beyond earlier work explaining known-to-be-true facts from text. In experiments, NELLIE outperforms a similar-sized state-of-the-art reasoner while producing knowledge-grounded explanations. We also find NELLIE can exploit both semi-structured and NL text corpora to guide reasoning. Together these suggest a new way to jointly reap the benefits of both modern neural methods and traditional symbolic reasoning.
List of keywords
Knowledge Representation and Reasoning -> KRR: Automated reasoning and theorem proving
Knowledge Representation and Reasoning -> KRR: Reasoning about knowledge and belief
Natural Language Processing -> NLP: Question answering
Search -> S: Other
2226
Hypergraph Self-supervised Learning with Sampling-efficient Signals
Fan Li, Xiaoyang Wang, Dawei Cheng, Wenjie Zhang, Ying Zhang, Xuemin Lin
12 min. talk | August 7th at 11:30 | Session: ML: Self-supervised Learning
Self-supervised learning (SSL) provides a promising alternative for representation learning on hypergraphs without costly labels. However, existing hypergraph SSL models are mostly based on contrastive methods with the instance-level discrimination strategy, suffering from two significant limitations: (1) They select negative samples arbitrarily, which is unreliable in deciding similar and dissimilar pairs, causing training bias. (2) They often require a large number of negative samples, resulting in expensive computational costs. To address the above issues, we propose SE-HSSL, a hypergraph SSL framework with three sampling-efficient self-supervised signals. Specifically, we introduce two sampling-free objectives leveraging the canonical correlation analysis as the node-level and group-level self-supervised signals. Additionally, we develop a novel hierarchical membership-level contrast objective motivated by the cascading overlap relationship in hypergraphs, which can further reduce membership sampling bias and improve the efficiency of sample utilization. Through comprehensive experiments on 7 real-world hypergraphs, we demonstrate the superiority of our approach over the state-of-the-art method in terms of both effectiveness and efficiency.
List of keywords
Machine Learning -> ML: Self-supervised Learning
Data Mining -> DM: Mining graphs
2228
Predictive Modeling with Temporal Graphical Representation on Electronic Health Records
Jiayuan Chen, Changchang Yin, Yuanlong Wang, Ping Zhang
6 min. talk | August 9th at 11:30 | Session: MTA: Health and medicine
Deep learning-based predictive models, leveraging Electronic Health Records (EHR), are receiving increasing attention in healthcare. An effective representation of a patient’s EHR should hierarchically encompass both the temporal relationships between historical visits and medical events, and the inherent structural information within these elements. Existing patient representation methods can be roughly categorized into sequential representation and graphical representation. The sequential representation methods focus only on the temporal relationships among longitudinal visits. On the other hand, the graphical representation approaches, while adept at extracting the graph-structured relationships between various medical events, fall short in effectively integrating temporal information. To capture both types of information, we model a patient’s EHR as a novel temporal heterogeneous graph. This graph includes historical visit nodes and medical event nodes. It propagates structured information from medical event nodes to visit nodes and utilizes time-aware visit nodes to capture changes in the patient’s health status. Furthermore, we introduce a novel temporal graph transformer (TRANS) that integrates temporal edge features, global positional encoding, and local structural encoding into heterogeneous graph convolution, capturing both temporal and structural information. We validate the effectiveness of TRANS through extensive experiments on three real-world datasets. The results show that our proposed approach achieves state-of-the-art performance.
List of keywords
Multidisciplinary Topics and Applications -> MTA: Health and medicine
Data Mining -> DM: Applications
2243
Towards Sharper Generalization Bounds for Adversarial Contrastive Learning
Wen Wen, Han Li, Tieliang Gong, Hong Chen
6 min. talk | August 7th at 15:00 | Session: ML: Machine Learning (4/6)
Recently, enhancing the adversarial robustness of machine learning algorithms has gained significant attention across various application domains. Given the widespread label scarcity issue in real-world data, adversarial contrastive learning (ACL) has been proposed to adversarially train robust models using unlabeled data. Despite the empirical success, its generalization behavior remains poorly understood and far from being well-characterized. This paper aims to address this issue from a learning theory perspective. We establish novel high-probability generalization bounds for general Lipschitz loss functions. The derived bounds scale as O(log k) in the number of negative samples k, improving on the existing bounds with linear dependency. Our results are generally applicable to many prediction models, including linear models and deep neural networks. In particular, we obtain an optimistic generalization bound of O(1/n) in the sample size n under a smoothness assumption on the loss function. To the best of our knowledge, this is the first fast-rate bound valid for ACL. Empirical evaluations on real-world datasets verify our theoretical findings.
List of keywords
Machine Learning -> ML: Adversarial machine learning
Machine Learning -> ML: Learning theory
Machine Learning -> ML: Self-supervised Learning
2249
FastScene: Text-Driven Fast Indoor 3D Scene Generation via Panoramic Gaussian Splatting
Yikun Ma, Dandan Zhan, Zhi Jin
6 min. talk | August 6th at 15:00 | Session: CV: 3D computer vision (2/2)
Text-driven 3D indoor scene generation holds broad applications, ranging from gaming and smart homes to AR/VR applications. Fast and high-fidelity scene generation is paramount for ensuring user-friendly experiences. However, existing methods are characterized by lengthy generation processes or necessitate the intricate manual specification of motion parameters, which introduces inconvenience for users. Furthermore, these methods often rely on narrow-field viewpoint iterative generations, compromising global consistency and overall scene quality. To address these issues, we propose FastScene, a framework for fast and higher-quality 3D scene generation, while maintaining scene consistency. Specifically, given a text prompt, we generate a panorama and estimate its depth, since the panorama encompasses information about the entire scene and exhibits explicit geometric constraints. To obtain high-quality novel views, we introduce the Coarse View Synthesis (CVS) and Progressive Novel View Inpainting (PNVI) strategies, ensuring both scene consistency and view quality. Subsequently, we utilize Multi-View Projection (MVP) to form perspective views, and apply 3D Gaussian Splatting (3DGS) for scene reconstruction. Comprehensive experiments demonstrate FastScene surpasses other methods in both generation speed and quality with better scene consistency. Notably, guided only by a text prompt, FastScene can generate a 3D scene within a mere 15 minutes, which is at least one hour faster than state-of-the-art methods, making it a paradigm for user-friendly scene generation.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Image and video synthesis and generation 
Computer Vision -> CV: Multimodal learning
Computer Vision -> CV: Scene analysis and understanding   
2255
Motion-Aware Heatmap Regression for Human Pose Estimation in Videos
Inpyo Song, Jongmin Lee, Moonwook Ryu, Jangwon Lee
6 min. talk | August 7th at 11:30 | Session: CV: Biometrics, face, gesture and pose recognition
We present an approach to 2D human pose estimation in videos. Estimating human poses in videos differs from estimating them in static images, since videos contain rich motion-related information. Thus, we investigate how to utilize information about human body movements across a sequence of video frames for estimating human poses. To do this, we introduce a novel heatmap regression method, which we call motion-aware heatmap regression. Our approach computes motion vectors of joint keypoints from adjacent frames. We then design a new style of heatmap, which we call Motion-Aware Heatmaps, to reflect the motion uncertainty of each joint point. Unlike traditional heatmaps, our motion-aware heatmaps not only consider the current joint locations but also account for how joints move over time. Furthermore, we introduce a simple yet effective framework designed to incorporate motion information into heatmap regression. We evaluate our motion-aware heatmap regression on the PoseTrack (2018, 21) and Sub-JHMDB datasets. Our results validate that the proposed motion-aware heatmaps significantly improve the precision of human pose estimation in videos, particularly in challenging scenarios such as sports game footage with substantial human motion. (Code and related materials are available at https://github.com/Songinpyo/MTPose.)
List of keywords
Computer Vision -> CV: Biometrics, face, gesture and pose recognition
Computer Vision -> CV: Action and behavior recognition
Computer Vision -> CV: Video analysis and understanding   
2260
Continual Multi-View Clustering with Consistent Anchor Guidance
Chao Zhang, Deng Xu, Xiuyi Jia, Chunlin Chen, Huaxiong Li
6 min. talk | August 6th at 15:00 | Session: ML: Multi-view learning
Multi-view clustering (MVC) has recently attracted much attention. Most existing approaches are designed for fixed multi-view data and cannot deal with the streaming data common in the real world. In this paper, we address this problem by proposing a consistent Anchor guided Continual MVC (ACMVC) method that works in two stages. In the initial learning stage, a low-rank anchor graph based model is constructed. In the continual learning stage, to leverage historical knowledge, the multi-level anchor information is reused to refine the model via adding consistency regularization. It not only provides prior knowledge to enhance the exploration of current data, but also captures the similarity relationship between previous and current data, enabling comprehensive exploitation of streaming data. The proposed model can be optimized efficiently with linear time and space complexity. Experiments demonstrate the effectiveness and efficiency of our method compared with some state-of-the-art approaches.
List of keywords
Machine Learning -> ML: Multi-view learning
Machine Learning -> ML: Clustering
Machine Learning -> ML: Unsupervised learning
Data Mining -> DM: Mining data streams
2262
Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents
Zihao Zhou, Bin Hu, Chenyang Zhao, Pu Zhang, Bin Liu
6 min. talk | August 8th at 15:00 | Session: ML: Reinforcement learning (2/2)
Recent studies have uncovered the potential of Large Language Models (LLMs) in addressing complex sequential decision-making tasks through the provision of high-level instructions. However, LLM-based agents lack specialization in tackling specific target problems, particularly in real-time dynamic environments. Additionally, deploying an LLM-based agent in practical scenarios can be both costly and time-consuming. On the other hand, reinforcement learning (RL) approaches train agents that specialize in the target task but often suffer from low sampling efficiency and high exploration costs. In this paper, we introduce a novel framework that addresses these challenges by training a smaller, specialized student RL agent using instructions from an LLM-based teacher agent. By incorporating the guidance from the teacher agent, the student agent can distill the prior knowledge of the LLM into its own model. Consequently, the student agent can be trained with significantly less data. Moreover, through further training with environment feedback, the student agent surpasses the capabilities of its teacher for completing the target task. We conducted experiments on challenging MiniGrid and Habitat environments, specifically designed for embodied AI research, to evaluate the effectiveness of our framework. The results clearly demonstrate that our approach achieves superior performance compared to strong baseline methods. Our code is available at https://github.com/ZJLAB-AMMI/LLM4Teach.
List of keywords
Machine Learning -> ML: Reinforcement learning
Natural Language Processing -> NLP: Language models
Uncertainty in AI -> UAI: Sequential decision making
2264
Learning Hierarchy-Enhanced POI Category Representations Using Disentangled Mobility Sequences
Hongwei Jia, Meng Chen, Weiming Huang, Kai Zhao, Yongshun Gong
6 min. talk | August 7th at 10:00 | Session: DM: Mining spatial and/or temporal data (1/2)
Points of interest (POIs) carry a wealth of semantic information of varying locations in cities and thus have been widely used to enable various location-based services. To understand POI semantics, existing methods usually model contextual correlations of POI categories in users’ check-in sequences and embed categories into a latent space based on the word2vec framework. However, such an approach does not fully capture the underlying hierarchical relationship between POI categories and can hardly integrate the category hierarchy into various deep sequential models. To overcome this shortcoming, we propose a Semantically Disentangled POI Category Embedding Model (SD-CEM) to generate hierarchy-enhanced category representations using disentangled mobility sequences. Specifically, first, we construct disentangled mobility sequences using human mobility data based on the semantics of POIs. Then we utilize the POI category hierarchy to initialize a hierarchy-enhanced representation for each category in the disentangled sequences, employing an attention mechanism. Finally, we optimize these category representations by incorporating both the masked category prediction task and the next category prediction task. To evaluate the effectiveness of SD-CEM, we conduct comprehensive experiments using two check-in datasets covering three tasks. Experimental results demonstrate that SD-CEM outperforms several competitive baselines, highlighting its substantial improvement in performance as well as the understanding of learned category representations.
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
2280
Protecting Split Learning by Potential Energy Loss
Fei Zheng, Chaochao Chen, Lingjuan Lyu, Xinyi Fu, Xing Fu, Weiqiang Wang, Xiaolin Zheng, Jianwei Yin
6 min. talk | August 8th at 10:00 | Session: ML: Federated learning (2/2)
As a practical privacy-preserving learning method, split learning has drawn much attention in academia and industry. However, its security is constantly being questioned since the intermediate results are shared during training and inference. In this paper, we focus on the privacy leakage from the forward embeddings of split learning. Specifically, since the forward embeddings contain too much information about the label, the attacker can either use a few labeled samples to fine-tune the top model or perform unsupervised attacks such as clustering to infer the true labels from the forward embeddings. To prevent this kind of privacy leakage, we propose the potential energy loss to make the forward embeddings more ‘complicated’, by pushing embeddings of the same class towards the decision boundary. Therefore, it is hard for the attacker to learn from the forward embeddings. Experiment results show that our method significantly lowers the performance of both fine-tuning attacks and clustering attacks.
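A sketch of a potential-energy-style penalty, under the assumption that it takes a Coulomb-like pairwise repulsion between same-class embeddings; the paper's exact functional form may differ.

import torch

def potential_energy_loss(emb, labels):
    # Same-class embeddings repel like charges, spreading them toward the
    # decision boundary so forward embeddings reveal less about labels.
    loss = emb.new_zeros(())
    for c in labels.unique():
        z = emb[labels == c]
        if z.size(0) < 2:
            continue
        d = torch.cdist(z, z) + 1e-6  # epsilon guards against zero distances
        loss = loss + (1.0 / d).triu(diagonal=1).sum() / z.size(0)
    return loss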
List of keywords
Machine Learning -> ML: Federated learning
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Multidisciplinary Topics and Applications -> MTA: Security and privacy
2282
EPIC: Graph Augmentation with Edit Path Interpolation via Learnable Cost
Jaeseung Heo, Seungbeom Lee, Sungsoo Ahn, Dongwoo Kim
6 min. talk | August 8th at 15:00 | Session: ML: Sequence and graph learning
Data augmentation plays a critical role in improving model performance across various domains, but it becomes challenging with graph data due to their complex and irregular structure. To address this issue, we propose EPIC (Edit Path Interpolation via learnable Cost), a novel interpolation-based method for augmenting graph datasets. To interpolate between two graphs lying in an irregular domain, EPIC leverages the concept of graph edit distance, constructing an edit path that represents the transformation process between two graphs via edit operations. Moreover, our method introduces a context-sensitive cost model that accounts for the importance of specific edit operations formulated through a learning framework. This allows for a more nuanced transformation process, where the edit distance is not merely count-based but reflects meaningful graph attributes. With randomly sampled graphs from the edit path, we enrich the training set to enhance the generalization capability of classification models. Experimental evaluations across several benchmark datasets demonstrate that our approach outperforms existing augmentation techniques in many tasks.
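The interpolation step can be sketched with hypothetical graph and edit-operation helpers: apply a prefix of the edit path transforming one graph into another to obtain an intermediate training graph.

import random

def interpolate_on_edit_path(g_src, edit_path, alpha):
    # Apply the first alpha-fraction of edit operations (node/edge additions,
    # deletions, relabelings) along the path from g_src towards the target graph.
    g = g_src.copy()                      # hypothetical graph object
    for op in edit_path[:int(alpha * len(edit_path))]:
        op.apply(g)                       # hypothetical edit operation
    return g

# Usage: augmented = interpolate_on_edit_path(g1, path_to_g2, random.random())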
List of keywords
Machine Learning -> ML: Sequence and graph learning
2284
Estimating Conditional Average Treatment Effects via Sufficient Representation Learning
Pengfei Shi, Wei Zhong, Xinyu Zhang, Ningtao Wang, Xing Fu, Weiqiang Wang, Yin Jin
6 min. talk | August 7th at 15:00 | Session: ML: Machine Learning (5/6)
Estimating conditional average treatment effects (CATE) is very important in causal inference and has a wide range of applications across many fields. In the estimation of CATE, the unconfoundedness assumption is typically required to ensure the identifiability of the regression problems. When estimating CATE from high-dimensional data, many variable selection methods and neural network approaches based on representation learning have been proposed; however, these methods do not provide a way to verify whether the subset of variables after dimensionality reduction, or the learned representations, still satisfy the unconfoundedness assumption during the estimation process, which can lead to ineffective estimates of the treatment effects. Additionally, these methods typically use data from only the treatment or the control group when estimating the regression function for each group. This paper proposes a novel neural network approach named CrossNet that learns a sufficient representation of the features, based on which we then estimate the CATE; ‘cross’ indicates that, in estimating the regression functions, we use data from each group itself and also cross-utilize data from the other group. Numerical simulations and empirical results demonstrate that our method outperforms the competitive approaches.
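For orientation, under unconfoundedness the estimand is the difference of two group-wise regressions, tau(x) = mu1(x) - mu0(x). The sketch below is a plain two-model baseline (a T-learner), not CrossNet itself, which additionally learns a sufficient representation and cross-uses both groups' data when fitting each regression.

from sklearn.ensemble import GradientBoostingRegressor

def t_learner_cate(X, y, t, X_new):
    # Fit one outcome regression per treatment group, then take the difference.
    mu1 = GradientBoostingRegressor().fit(X[t == 1], y[t == 1])
    mu0 = GradientBoostingRegressor().fit(X[t == 0], y[t == 0])
    return mu1.predict(X_new) - mu0.predict(X_new)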
List of keywords
Machine Learning -> ML: Regression
Machine Learning -> ML: Causality
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Supervised Learning
2309
STAR: Spatio-Temporal State Compression for Multi-Agent Tasks with Rich Observations
Chao Li, Yujing Hu, Shangdong Yang, Tangjie Lv, Changjie Fan, Wenbin Li, Chongjie Zhang, Yang Gao
6 min. talk | August 7th at 15:00 | Session: MAS: Multi-agent learning
This paper focuses on the problem of learning compressed state representations for multi-agent tasks. Under the rich-observation assumption, we pinpoint that the state representations should be compressed both spatially and temporally to enable efficient prioritization of task-relevant features, which existing works typically fail to achieve. To overcome this limitation, we propose a novel method named Spatio-Temporal stAte compRession (STAR) that explicitly defines both spatial and temporal compression operations on the learned state representations to encode per-agent task-relevant features. Specifically, we first formalize this problem by introducing the Task Informed Partially Observable Stochastic Game (TI-POSG). Then, we identify the spatial representation compression in it as encoding the latent states from the joint observations of all agents, and achieve this by learning representations that approximate the latent states based on an information-theoretic principle. After that, we further extract the task-relevant features of each agent from these representations by aligning them based on their reward similarities, which is regarded as the temporal representation compression. Structurally, we implement these two compressions by learning a set of agent-specific decoding functions and incorporating them into a critic shared by agents for scalable learning. We evaluate our method by developing decentralized policies on 12 maps of the StarCraft Multi-Agent Challenge benchmark, and the superior performance demonstrates its effectiveness.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Machine Learning -> ML: Reinforcement learning
2312
Modeling Personalized Retweeting Behaviors for Multi-Stage Cascade Popularity Prediction
Mingyang Zhou, Yanjie Lin, Gang Liu, Zuwen Li, Hao Liao, Rui Mao
6 min. talk | August 7th at 15:00 | Session: DM: Data Mining (1/2)
Predicting the size of message cascades is critical in various applications, such as online advertising and early detection of rumors. However, most existing deep learning approaches rely on cascade observation, which hinders accurate cascade prediction before a message is posted. Moreover, these approaches overlook personalized retweeting behaviors that reflect users’ inclination to retweet specific types of information. In this study, we propose a universal cascade prediction framework, namely Cascade prediction regarding Multiple Stage (CasMS), that effectively predicts cascade popularity across the message generation stage as well as the short-term and long-term stages. Unlike previous methods, our approach not only captures users’ personalized retweeting behaviors but also incorporates temporal cascade features. We perform experiments on datasets we collected ourselves as well as on public datasets. The results show that our method significantly surpasses existing approaches in predicting the cascade during the message generation stage and across different time periods of the cascade dynamics.
List of keywords
Data Mining -> DM: Mining text, web, social media
Data Mining -> DM: Recommender systems
Machine Learning -> ML: Applications
2319
Multi-Granularity Graph-Convolution-Based Method for Weakly Supervised Person Search
Haichun Tai, De Cheng, Jie Li, Nannan Wang, Xinbo Gao
12 min. talk | August 9th at 10:00 | Session: CV: Representation learning
One-step Weakly Supervised Person Search (WSPS) jointly performs pedestrian detection and person Re-IDentification (ReID) with only bounding box annotations, which makes the traditional person ReID problem more suitable and efficient for real-world applications. However, this task is very challenging due to the following reasons: 1) a large feature gap between person ReID and general object detection tasks when learning shared representations; 2) difficult pseudo-identity estimation for each person image under unrefined raw detections and dramatic scale changes. To address the above issues, we propose a multi-granularity graph convolution framework to jointly optimize the aligned task features, as well as to assist the pseudo-label estimation. Specifically, the multi-granularity feature alignment module (MFA) in the designed two-branch framework employs cluster-level bi-directional interaction of various granularity information to narrow down the large feature gap. Further, upon the MFA module, we introduce the multi-granularity graph-convolution-based pseudo-label estimation module to enhance feature representations for distinguishing diverse identities. Extensive experimental results demonstrate the effectiveness of the proposed method and show superior performance over state-of-the-art methods by a large margin on the CUHK-SYSU and PRW datasets.
List of keywords
Computer Vision -> CV: Representation learning
Computer Vision -> CV: Image and video retrieval 
Computer Vision -> CV: Recognition (object detection, categorization)
2320
P2P: Transforming from Point Supervision to Explicit Visual Prompt for Object Detection and Segmentation
Guangqian Guo, Dian Shao, Chenguang Zhu, Sha Meng, Xuan Wang, Shan Gao
6 min. talk | August 8th at 10:00 | Session: ML: Weakly supervised learning
Point-supervised vision tasks, including detection and segmentation, which aim to learn a network that transforms points into pseudo labels, have attracted much attention in recent years. However, the lack of precise object size and boundary annotations in the point-supervised condition results in a large performance gap between point- and fully-supervised methods. In this paper, we propose a novel iterative learning framework, Point to Prompt (P2P), for point-supervised object detection and segmentation, with the key insight of transforming point supervision into explicit visual prompts for the foundation model. P2P is formulated as an iterative refinement process of two stages: Semantic Explicit Prompt Generation (SEPG) and Prompt Guided Spatial Refinement (PGSR). Specifically, SEPG serves as a prompt generator for generating semantic-explicit prompts from point input via a group-based learning strategy. In the PGSR stage, prompts guide the visual foundation model to further refine the object regions by leveraging the outstanding generalization ability of the foundation model. The two stages are iterated multiple times to progressively improve the quality of predictions. Experimental results on multiple datasets demonstrate that P2P achieves SOTA performance in both detection and segmentation tasks, further narrowing the performance gap with fully-supervised methods. The source code and supplementary material can be found at https://github.com/guangqian-guo/P2P.
List of keywords
Machine Learning -> ML: Weakly supervised learning
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Segmentation
2322
Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors
Shiyin Dong, Mingrui Zhu, Kun Cheng, Nannan Wang, Xinbo Gao
12 min. talk | August 9th at 10:00 | Session: CV: Representation learning
The remarkable prowess of diffusion models in image generation has spurred efforts to extend their application beyond generative tasks. However, a persistent challenge remains: the lack of a unified approach for applying diffusion models to visual perception tasks with diverse semantic granularity requirements. Our purpose is to establish a unified visual perception framework, capitalizing on the potential synergies between generative and discriminative models. In this paper, we propose Vermouth, a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an Adapted-Expert providing discriminative priors. Comprehensive investigations unveil potential characteristics of Vermouth, such as the varying granularity of perception concealed in latent variables at distinct time steps and at various U-Net stages. We emphasize that there is no need to incorporate a heavyweight or intricate decoder to transform diffusion models into potent representation learners. Extensive comparative evaluations against tailored discriminative models showcase the efficacy of our approach on zero-shot sketch-based image retrieval (ZS-SBIR), few-shot classification, and open-vocabulary (OV) semantic segmentation tasks. The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations.
List of keywords
Computer Vision -> CV: Representation learning
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
2331
A Dataset and Model for Realistic License Plate Deblurring
Haoyan Gong, Yuzheng Feng, Zhenrong Zhang, Xianxu Hou, Jingxin Liu, Siqi Huang, Hongbin Liu
6 min. talk | August 7th at 11:30 | Session: CV: Adversarial learning, adversarial attack and defense methods
Vehicle license plate recognition is a crucial task in intelligent traffic management systems. However, the challenge of achieving accurate recognition persists due to motion blur from fast-moving vehicles. Despite the widespread use of image synthesis approaches in existing deblurring and recognition algorithms, their effectiveness in real-world scenarios remains unproven. To address this, we introduce the first large-scale license plate deblurring dataset, named License Plate Blur (LPBlur), captured by a dual-camera system and processed through a post-processing pipeline to avoid misalignment issues. Then, we propose a License Plate Deblurring Generative Adversarial Network (LPDGAN) to tackle license plate deblurring, with: 1) a Feature Fusion Module to integrate multi-scale latent codes; 2) a Text Reconstruction Module to restore structure through the textual modality; 3) a Partition Discriminator Module to enhance the model’s perception of details in each letter. Extensive experiments validate the reliability of the LPBlur dataset for both model training and testing, showcasing that our proposed model outperforms other state-of-the-art motion deblurring methods in realistic license plate deblurring scenarios. The dataset and code are available at https://github.com/haoyGONG/LPDGAN.
List of keywords
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Computer Vision -> CV: Applications
Computer Vision -> CV: Image and video synthesis and generation 
2348
LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation
Wentao Jiang, Jing Zhang, Di Wang, Qiming Zhang, Zengmao Wang, Bo Du
6 min. talk | August 9th at 10:00 | Session: CV: Representation learning
Due to spatial redundancy in remote sensing images, sparse tokens containing rich information are usually involved in self-attention (SA) to reduce the overall number of tokens in the computation, avoiding the high computational cost of Vision Transformers. However, such methods usually obtain sparse tokens via hand-crafted or parallel-unfriendly designs, posing a challenge to reaching a better balance between efficiency and performance. In contrast, this paper proposes to use learnable meta tokens to formulate sparse tokens, which effectively learn key information while improving inference speed. Technically, the meta tokens are first initialized from image tokens via cross-attention. Then, we propose Dual Cross-Attention (DCA) to promote information exchange between image tokens and meta tokens, where they alternately serve as query and key (value) tokens in a dual-branch structure, significantly reducing the computational complexity compared to self-attention. By employing DCA in the early stages with dense visual tokens, we obtain the hierarchical architecture LeMeViT in various sizes. Experimental results on classification and dense prediction tasks show that LeMeViT has a significant 1.7× speedup, fewer parameters, and competitive performance compared to the baseline models, achieving a better trade-off between efficiency and performance. The code is released at https://github.com/ViTAE-Transformer/LeMeViT.
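To make the complexity argument concrete, here is a hedged PyTorch sketch of a dual cross-attention block in which a small set of learnable meta tokens and the image tokens alternately serve as query versus key/value; the module layout, dimensions, and names are illustrative assumptions, not LeMeViT's exact architecture:

```python
import torch
import torch.nn as nn

class DualCrossAttention(nn.Module):
    """Sketch of DCA: with M meta tokens and N image tokens, each call
    costs O(N*M) instead of the O(N^2) of full self-attention."""

    def __init__(self, dim: int = 64, heads: int = 4, num_meta: int = 8):
        super().__init__()
        self.meta = nn.Parameter(torch.randn(1, num_meta, dim))
        self.meta_from_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_from_meta = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        m = self.meta.expand(x.shape[0], -1, -1)
        m, _ = self.meta_from_img(m, x, x)  # meta queries image: summarize
        x, _ = self.img_from_meta(x, m, m)  # image queries meta: broadcast
        return x

tokens = torch.randn(2, 196, 64)            # batch of N = 196 image tokens
print(DualCrossAttention()(tokens).shape)   # torch.Size([2, 196, 64])
```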
List of keywords
Computer Vision -> CV: Representation learning
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Segmentation
2354
Distribution-Independent Cell Type Identification for Single-Cell RNA-seq Data
Yuyao Zhai, Liang Chen, Minghua Deng
6 min. talk | August 8th at 15:00 | Session: MTA: Bioinformatics
Automatic cell type annotation aims to transfer label knowledge from label-abundant reference data to label-scarce target data, and has made encouraging progress in single-cell RNA-seq data analysis. While previous works have focused on classifying close-set cells and detecting open-set cells during testing, it remains essential to be able to classify unknown cell types the way human experts can. Additionally, few efforts have been devoted to addressing the common long-tail dilemma in cell type annotation data. Therefore, in this paper, we propose an innovative distribution-independent universal cell type identification framework called scDET from the perspective of autonomously equilibrated dual-consultative contrastive learning. Our model can generate fine-grained predictions for both close-set and open-set cell types in a long-tailed open-world environment. scDET consists of a contrastive-learning branch and a pseudo-labeling branch, which work collaboratively to provide interactive supervision. Specifically, the contrastive-learning branch provides reliable distribution estimation to regularize the predictions of the pseudo-labeling branch, which in turn guides itself through self-balanced knowledge transfer and a designed novel soft contrastive loss. Extensive experimental results on various evaluation datasets demonstrate the superior performance of scDET over other state-of-the-art single-cell clustering and annotation methods.
List of keywords
Multidisciplinary Topics and Applications -> MTA: Bioinformatics
Multidisciplinary Topics and Applications -> MTA: Other
2360
Efficient Event Stream Super-Resolution with Recursive Multi-Branch Fusion
Quanmin Liang, Zhilin Huang, Xiawu Zheng, Feidiao Yang, Jun Peng, Kai Huang, Yonghong Tian
6 min. talk | August 8th at 11:30 | Session: CV: Image and video synthesis and generation (1/2)
Current Event Stream Super-Resolution (ESR) methods overlook the redundant and complementary information present in positive and negative events within the event stream, employing a direct mixing approach for super-resolution, which may lead to detail loss and inefficiency. To address these issues, we propose an efficient Recursive Multi-Branch Information Fusion Network (RMFNet) that separates positive and negative events for complementary information extraction, followed by mutual supplementation and refinement. In particular, we introduce Feature Fusion Modules (FFM) and Feature Exchange Modules (FEM). FFM is designed for the fusion of contextual information within neighboring event streams, leveraging the coupling relationship between positive and negative events to mitigate misleading noise in the respective branches. FEM efficiently promotes the fusion and exchange of information between the positive and negative branches, enabling superior local information enhancement and global information complementation. Experimental results demonstrate that our approach achieves over 17% and 31% improvement on synthetic and real datasets respectively, accompanied by a 2.3x acceleration. Furthermore, we evaluate our method on two downstream event-driven applications, i.e., object recognition and video reconstruction, achieving remarkable results that outperform existing methods. Our code and supplementary material are available at https://github.com/Lqm26/RMFNet.
List of keywords
Computer Vision -> CV: Image and video synthesis and generation 
Computer Vision -> CV: Other
2365
Parameterized Complexity of Kidney Exchange Revisited
Úrsula Hébert-Johnson, Daniel Lokshtanov, Chinmay Sonar, Vaishali Surianarayanan
6 min. talk | August 7th at 11:30 | Session: MAS: Agent-based and Multi-agent Systems (1/2)
As of January 2023, there are more than 90,000 people on the national transplant waiting list in need of a kidney in the United States. These patients often have a friend or family member who is willing to donate, but whose kidney type might not be compatible. To help match these patients to suitable donors, patient-donor compatibility can be modeled as a directed graph. Specifically, in the Kidney Exchange problem, the input is a directed graph G, a subset B of vertices (altruistic donors), and two integers l_p and l_c. An altruistic donor is a donor who is not paired with a patient, and the remaining vertices are patient-donor pairs. Whenever a donor is compatible with a patient from a patient-donor pair, we place a directed edge from the donor vertex to the patient-donor pair. Here the donor vertex can be either altruistic or non-altruistic. The goal is to find a collection of vertex-disjoint cycles and paths covering the maximum number of patients such that each cycle has length at most l_c, and such that each path has length at most l_p and begins at a vertex in B. The path and cycle lengths are bounded so that the surgeries for a given path or cycle can be performed simultaneously. Kidney Exchange has received a great deal of attention in recent years. We contribute to this line of work by closing two open problems from IJCAI ’18 and IJCAI ’22: “Is Kidney Exchange FPT when parameterized by (i) the treewidth (omega) of G and (ii) the number of vertex types in G?” Two vertices have the same vertex type if they have the same in- and out-neighborhoods. We show that Kidney Exchange is FPT parameterized by the number of vertex types. On the other hand, we show W[1]-hardness with respect to omega. We also design a randomized 4^t * n^O(1)-time algorithm parameterized by t, the number of patients helped, significantly improving upon the previous state of the art, which was 161^t * n^O(1).
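To ground the problem definition, here is a toy enumerate-and-pack heuristic (not one of the paper's exact FPT or randomized algorithms): it lists cycles of length at most l_c among patient-donor pairs and altruist-initiated paths of length at most l_p, then greedily keeps vertex-disjoint ones:

```python
import networkx as nx

def greedy_kidney_exchange(G: nx.DiGraph, B: set, l_p: int, l_c: int):
    """Toy heuristic for the Kidney Exchange problem statement: enumerate
    feasible cycles and altruist-started paths, then greedily pack
    vertex-disjoint ones. Exact algorithms (as in the paper) would
    optimize the number of covered patients instead."""
    candidates = []
    pairs = set(G) - B
    for cyc in nx.simple_cycles(G.subgraph(pairs)):   # cycles of pairs only
        if len(cyc) <= l_c:
            candidates.append(set(cyc))               # all vertices: patients
    for b in B:                                       # paths must start in B
        for v in pairs:
            for path in nx.all_simple_paths(G, b, v, cutoff=l_p):
                candidates.append(set(path))          # covers len(path) - 1
    covered, solution = set(), []
    for cand in sorted(candidates, key=len, reverse=True):
        if cand.isdisjoint(covered):
            covered |= cand
            solution.append(cand)
    return solution

G = nx.DiGraph([("a", "p1"), ("p1", "p2"), ("p2", "p3"), ("p3", "p1")])
print(greedy_kidney_exchange(G, B={"a"}, l_p=2, l_c=3))  # the 3-cycle of pairs
```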
List of keywords
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Constraint Satisfaction and Optimization -> CSO: Constraint optimization problems
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Computational social choice
2371
FreqFormer: Frequency-aware Transformer for Lightweight Image Super-resolution
Tao Dai, Jianping Wang, Hang Guo, Jinmin Li, Jinbao Wang, Zexuan Zhu
6 min. talk | August 9th at 10:00 | Session: CV: Applications
Transformer-based models have been widely and successfully used in various low-level vision tasks and have achieved remarkable performance in single image super-resolution (SR). Despite this significant progress, Transformer-based SR methods (e.g., SwinIR) still suffer from heavy computational cost and a low-frequency preference, while ignoring the reconstruction of rich high-frequency information, hence hindering the representational power of Transformers. To address these issues, in this paper, we propose a novel Frequency-aware Transformer (FreqFormer) for lightweight image SR. Specifically, a Frequency Division Module (FDM) is first introduced to separately handle high- and low-frequency information in a divide-and-conquer manner. Moreover, we present a Frequency-aware Transformer Block (FTB) that extracts both spatial frequency attention and channel-transposed attention to recover high-frequency details. Extensive experimental results on public datasets demonstrate the superiority of our FreqFormer over state-of-the-art SR methods in terms of both quantitative metrics and visual quality. Code and models are available at https://github.com/JPWang-CS/FreqFormer.
List of keywords
Computer Vision -> CV: Applications
Computer Vision -> CV: Image and video synthesis and generation 
Computer Vision -> CV: Interpretability and transparency
Computer Vision -> CV: Machine learning for vision
2383
Machine Unlearning via Null Space Calibration
Huiqiang Chen, Tianqing Zhu, Xin Yu, Wanlei Zhou
6 min. talk | August 9th at 10:00 | Session: ETF: Trustworthy AI
Machine unlearning aims to enable models to forget specific data instances when receiving deletion requests. Current research centers on efficient unlearning to erase the influence of data from the model and neglects the subsequent impact on the remaining data. Consequently, existing unlearning algorithms degrade the model’s performance after unlearning, a phenomenon known as over-unlearning. This paper addresses this critical yet under-explored issue by introducing machine Unlearning via Null Space Calibration (UNSC), which can accurately unlearn target samples without over-unlearning. Moreover, by calibrating the decision space during unlearning, UNSC can significantly improve the model’s performance on the remaining samples. In particular, our approach hinges on confining the unlearning process to a specified null space tailored to the remaining samples, which is augmented by strategically pseudo-labeling the unlearning samples. Comparison against several established baselines affirms the superiority of our approach.
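One common way to realize a null-space constraint, sketched below under the assumption that UNSC's calibration resembles gradient-projection methods from continual learning: compute (via SVD) a projector onto the null space of the remaining data's feature matrix and project unlearning updates through it, so outputs on retained samples are barely disturbed; the pseudo-labeling step is not reproduced:

```python
import torch

def null_space_projector(feats: torch.Tensor, tol: float = 1e-5) -> torch.Tensor:
    """Projector onto the null space of the remaining data's features:
    right-singular vectors with (near-)zero singular values span it."""
    _, s, vh = torch.linalg.svd(feats, full_matrices=True)
    rank = int((s > tol * s.max()).sum())
    v_null = vh[rank:].T                     # basis of the null space
    return v_null @ v_null.T                 # (d, d) projection matrix

feats = torch.randn(100, 32)                 # activations of remaining data
feats[:, -8:] = 0                            # ensure a non-trivial null space
P = null_space_projector(feats)
grad = torch.randn(10, 32)                   # raw unlearning gradient for W
update = grad @ P                            # calibrated update direction
print((feats @ update.T).abs().max())        # ~0: retained outputs unchanged
```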
List of keywords
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
AI Ethics, Trust, Fairness -> ETF: Accountability
AI Ethics, Trust, Fairness -> ETF: Ethical, legal and societal issues
AI Ethics, Trust, Fairness -> ETF: Other
2388
Encoding Auxiliary Information to Restore Compressed Point Cloud Geometry
Gexin Liu, Jiahao Zhu, Dandan Ding, Zhan Ma
6 min. talk | August 8th at 10:00 | Session: DM: Mining spatial and/or temporal data (2/2)
The standardized Geometry-based Point Cloud Compression (G-PCC) suffers from limited coding performance and low-quality reconstruction. To address this, we propose AuxGR, a performance-complexity tradeoff solution for point cloud geometry restoration: leveraging an auxiliary bitstream to enhance the quality of G-PCC compressed point cloud geometry. This auxiliary bitstream efficiently encapsulates spatio-temporal information. For static coding, we perform paired information embedding (PIE) on the G-PCC decoded frame by employing target convolutions from its original counterpart, producing an auxiliary bitstream containing abundant original information. For dynamic coding, in addition to PIE, we propose temporal information embedding (TIE) to capture motion information between the previously restored and the current G-PCC decoded frames. TIE applies target kNN attention between them, which ensures temporal neighborhood construction for each point and implicitly represents motions. Due to the similarity across temporal frames, only the residuals between the TIE and PIE outputs are compressed as the auxiliary bitstream. Experimental results demonstrate that AuxGR notably outperforms existing methods in both static and dynamic coding scenarios. Moreover, our framework enables the flexible incorporation of auxiliary information under computation constraints, which is attractive for real-world applications.
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
2397
OTOcc: Optimal Transport for Occupancy Prediction
Pengteng Li, Ying He, F. Richard Yu, Pinhao Song, Xingchen Zhou, Guang Zhou
6 min. talk | August 6th at 11:30 | Session: CV: 3D computer vision (1/2)
The autonomous driving community is highly interested in 3D occupancy prediction due to its outstanding geometric perception and object recognition capabilities. However, previous methods are limited to existing semantic conversion mechanisms for solving the sparse ground-truth problem, causing excessive computational demands and sub-optimal voxel representations. To tackle the above limitations, we propose OTOcc, a novel 3D occupancy prediction framework that models semantic conversion from 2D pixels to 3D voxels as an Optimal Transport (OT) problem, offering accurate semantic mapping to adapt to sparse scenarios without attention or depth estimation. Specifically, the unit transportation cost between each demander (voxel) and supplier (pixel) pair is defined as the weighted occupancy prediction loss. Then, we utilize the Sinkhorn-Knopp iteration to find the best mapping matrices with minimal transportation costs. To reduce the computational cost, we propose a block reading technique with multi-perspective feature representation, which also brings fine-grained scene understanding. Extensive experiments show that OTOcc not only achieves competitive prediction performance but also reduces computational overhead by more than 4.58% compared to state-of-the-art methods.
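The Sinkhorn-Knopp iteration itself is standard; below is a minimal numpy version for reference, where the random cost matrix stands in for the paper's weighted occupancy prediction loss between each supplier (pixel) and demander (voxel):

```python
import numpy as np

def sinkhorn_knopp(cost, a, b, reg=0.1, n_iters=200):
    """Entropic optimal transport via Sinkhorn-Knopp: alternately rescale
    the Gibbs kernel so the plan's marginals match supplies a and demands b.
    Returns the (n, m) transport plan, i.e., a soft pixel-to-voxel mapping."""
    K = np.exp(-cost / reg)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # satisfy demand marginals
        u = a / (K @ v)                  # satisfy supply marginals
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
cost = rng.random((6, 4))                # stand-in for prediction losses
plan = sinkhorn_knopp(cost, np.full(6, 1 / 6), np.full(4, 1 / 4))
print(plan.sum(axis=0))                  # ~[0.25 0.25 0.25 0.25]
```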
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Applications
Computer Vision -> CV: Machine learning for vision
2403
SCAT: A Time Series Forecasting with Spectral Central Alternating Transformers
Chengjie Zhou, Chao Che, Pengfei Wang, Qiang Zhang
6 min. talk | August 8th at 15:00 | Session: ML: Time series and data streams
Time series forecasting has essential applications across various domains. For instance, forecasting power time series can optimize energy usage and bolster grid stability and reliability. Existing Transformer-based models are limited to classical designs, ignoring the impact of spatial information and noise on architecture design. We therefore propose an atypical design of Transformer-based models for multivariate time series forecasting. This design consists of two critical components: (i) spectral clustering centers of the time series are employed as the focal points for attention computation; (ii) an alternating attention mechanism wherein each query is matched against the spectral clustering centers, executing attention at the sequence level instead of the token level. The alternating design has a two-fold benefit: first, it eliminates the uncertainty noise present in the dependent variable sequence of the channel input; second, it incorporates the Euclidean distance to mitigate the impact of extreme values on the attention matrix, thereby aligning predictions more closely with the sequence’s natural progression. Experiments on ten real-world datasets, encompassing Wind, Electricity, Weather, and others, demonstrate that our Spectral Central Alternating Transformer (SCAT) outperforms state-of-the-art (SOTA) methods by an average of 42.8% in prediction performance on power time series forecasting.
List of keywords
Machine Learning -> ML: Time series and data streams
2414
Laying the Foundations for Solving FOND HTN Problems: Grounding, Search, Heuristics (and Benchmark Problems)
Mohammad Yousefi, Pascal Bercher
6 min. talk | August 7th at 15:00 | Session: PS: Planning and Scheduling (1/2)
Building upon recent advancements in formalising Fully Observable Non-Deterministic (FOND) Hierarchical Task Network (HTN) planning, we present the first approach to find strong solutions for HTN problems with uncertainty in action outcomes. We present a search algorithm, along with a compilation that relaxes a FOND HTN problem to a deterministic one. This allows the utilisation of existing grounders and heuristics from the deterministic HTN planning literature.
List of keywords
Planning and Scheduling -> PS: Hierarchical planning
Planning and Scheduling -> PS: Planning algorithms
Planning and Scheduling -> PS: Planning under uncertainty
Planning and Scheduling -> PS: Search in planning and scheduling
2426
AnchorGT: Efficient and Flexible Attention Architecture for Scalable Graph Transformers
Wenhao Zhu, Guojie Song, Liang Wang, Shaoguo Liu
6 min. talk | August 8th at 15:00 | Session: ML: Sequence and graph learning
Graph Transformers (GTs) have significantly advanced the field of graph representation learning by overcoming the limitations of message-passing graph neural networks (GNNs) and demonstrating promising performance and expressive power. However, the quadratic complexity of the self-attention mechanism in GTs has limited their scalability, and previous approaches to address this issue often suffer from expressiveness degradation or lack of versatility. To this end, we propose AnchorGT, a novel attention architecture for GTs with a global receptive field and almost linear complexity, which serves as a flexible building block to improve the scalability of a wide range of GT models. Inspired by anchor-based GNNs, we employ a structurally important k-dominating node set as anchors and design an attention mechanism that focuses on the relationship between individual nodes and anchors, while retaining the global receptive field for all nodes. With its intuitive design, AnchorGT can easily replace the attention module in various GT models with different network architectures and structural encodings, resulting in reduced computational overhead without sacrificing performance. In addition, we theoretically prove that AnchorGT attention can be strictly more expressive than the Weisfeiler-Lehman test, showing its superiority in representing graph structures. Our experiments on three state-of-the-art GT models demonstrate that their AnchorGT variants can achieve similar results while being faster and significantly more memory efficient.
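As one simple way to obtain such anchors, the sketch below greedily builds a k-dominating set, i.e., a set of nodes such that every vertex lies within k hops of some anchor; AnchorGT's exact selection procedure may differ:

```python
import networkx as nx

def greedy_k_dominating_set(G: nx.Graph, k: int = 2) -> list:
    """Greedily pick the node whose k-hop ball covers the most uncovered
    vertices until every vertex is within distance k of an anchor."""
    balls = {v: set(nx.single_source_shortest_path_length(G, v, cutoff=k))
             for v in G}
    uncovered, anchors = set(G), []
    while uncovered:
        best = max(G, key=lambda v: len(balls[v] & uncovered))
        anchors.append(best)
        uncovered -= balls[best]
    return anchors

G = nx.karate_club_graph()
anchors = greedy_k_dominating_set(G, k=2)
print(len(anchors), "anchors dominate", G.number_of_nodes(), "nodes:", anchors)
```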
List of keywords
Machine Learning -> ML: Sequence and graph learning
2434
Diffusion Mask-Driven Visual-language Tracking
Guangtong Zhang, Bineng Zhong, Qihua Liang, Zhiyi Mo, Shuxiang Song
6 min. talk | August 8th at 10:00 | Session: CV: Motion and tracking
Most existing visual-language trackers rely heavily on the initial language description of a target object to extract multi-modal features. However, the initial language description is often inaccurate for a highly time-varying video sequence, greatly deteriorating tracking performance due to the low quality of the extracted multi-modal features. To address this challenge, we propose a Diffusion Mask-Driven Visual-language Tracker (DMTrack) based on a diffusion model. Confronting the issue of low-quality multi-modal features caused by inaccurate language descriptions, we leverage the diffusion model to capture high-quality semantic information from multi-modal features and transform it into target mask features. During the training phase, we further enhance the diffusion model’s perception of pixel-level features by computing the loss between the target mask features and the ground-truth masks. Additionally, we perform joint localization of the target using both target mask features and visual features, instead of relying solely on multi-modal features for localization. Through extensive experiments on four tracking benchmarks (i.e., LaSOT, TNL2K, LaSOText, and OTB-Lang), we validate that our proposed tracker improves the robustness and effectiveness of the model.
List of keywords
Computer Vision -> CV: Motion and tracking
2447
Unified Unsupervised Salient Object Detection via Knowledge Transfer
Yao Yuan, Wutao Liu, Pan Gao, Qun Dai, Jie Qin
6 min. talk | August 7th at 15:00 | Session: CV: Recognition (object detection, categorization)
Recently, unsupervised salient object detection (USOD) has gained increasing attention due to its annotation-free nature. However, current methods mainly focus on specific tasks such as RGB and RGB-D, neglecting the potential for task migration. In this paper, we propose a unified USOD framework for generic USOD tasks. Firstly, we propose a Progressive Curriculum Learning-based Saliency Distilling (PCL-SD) mechanism to extract saliency cues from a pre-trained deep network. This mechanism starts with easy samples and progressively moves towards harder ones, to avoid initial interference caused by hard samples. Afterwards, the obtained saliency cues are utilized to train a saliency detector, and we employ a Self-rectify Pseudo-label Refinement (SPR) mechanism to improve the quality of pseudo-labels. Finally, an adapter-tuning method is devised to transfer the acquired saliency knowledge, leveraging shared knowledge to attain superior transfer performance on the target tasks. Extensive experiments on five representative SOD tasks confirm the effectiveness and feasibility of our proposed method. Code and supplementary materials are available at https://github.com/I2-Multimedia-Lab/A2S-v3.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Scene analysis and understanding   
Machine Learning -> ML: Unsupervised learning
2454
How to Learn Domain-Invariant Representations for Visual Reinforcement Learning: An Information-Theoretical Perspective
Shuo Wang, Zhihao Wu, Jinwen Wang, Xiaobo Hu, Youfang Lin, Kai Lv
6 min. talk | August 9th at 11:30 | Session: CV: Computer Vision (1/2)
Despite the impressive success in visual control challenges, Visual Reinforcement Learning (VRL) policies have struggled to generalize to other scenarios. Existing works attempt to improve the generalization capability empirically but lack theoretical support. In this work, we explore how to learn domain-invariant representations for VRL from an information-theoretical perspective. Specifically, we identify three Mutual Information (MI) terms. These terms highlight that a robust representation should preserve domain-invariant information (return and dynamic transition) under significant observation perturbation. Furthermore, we relax the MI terms to derive three components for implementing a practical Mutual Information-based Invariant Representation (MIIR) algorithm for VRL. Extensive experiments demonstrate that MIIR achieves state-of-the-art generalization performance and the best sample efficiency in the DeepMind Control suite, Robotic Manipulation, and CARLA.
List of keywords
Computer Vision -> CV: Embodied vision: Active agents, simulation
2462
Improved Parallel Algorithm for Non-Monotone Submodular Maximization under Knapsack Constraint
Tan D. Tran, Canh V. Pham, Dung T. K. Ha, Phuong N. H. Pham
6 min. talk | August 7th at 10:00 | Session: CSO: Constraint optimization problems
This work proposes an efficient parallel algorithm for non-monotone submodular maximization under a knapsack constraint over a ground set of size n. Our algorithm improves the best approximation factor of existing parallel algorithms from 8 to 7 with O(log n) adaptive complexity. The key idea of our approach is an alternate threshold algorithmic framework. This new strategy alternately constructs two disjoint candidate solutions within a constant number of sequential rounds. The algorithm then boosts solution quality without sacrificing adaptive complexity. Extensive experimental studies on three applications, Revenue Maximization, Image Summarization, and Maximum Weighted Cut, show that our algorithm not only significantly increases solution quality but also requires adaptivity comparable to state-of-the-art algorithms.
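For intuition about thresholding, here is a plain sequential density-threshold greedy on a toy (monotone) coverage instance; it only illustrates the threshold sweep, and the paper's low-adaptivity parallel construction of two alternating disjoint candidate solutions is not reproduced:

```python
def threshold_greedy(ground, f, cost, budget, eps=0.25):
    """Sweep a geometrically decreasing density threshold tau and add any
    element whose marginal gain per unit cost still meets it, while the
    knapsack budget allows. (Sequential baseline sketch only.)"""
    tau = max(f({e}) / cost(e) for e in ground)
    tau_min = eps * tau / len(ground)
    S, spent = set(), 0.0
    while tau > tau_min:
        for e in sorted(ground - S):
            gain = f(S | {e}) - f(S)
            if spent + cost(e) <= budget and gain / cost(e) >= tau:
                S.add(e)
                spent += cost(e)
        tau *= 1 - eps
    return S

# Toy weighted-coverage instance with unit costs and budget 2.
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}, 4: {"e"}}
f = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0
print(threshold_greedy(set(sets), f, cost=lambda e: 1.0, budget=2.0))  # {1, 3}
```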
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint optimization problems
Constraint Satisfaction and Optimization -> CSO: Applications
Constraint Satisfaction and Optimization -> CSO: Constraint learning and acquisition
Data Mining -> DM: Big data and scalability
2476
Revealing the Two Sides of Data Augmentation: An Asymmetric Distillation-based Win-Win Solution for Open-Set Recognition
Yunbing Jia, Xiaoyu Kong, Fan Tang, Yixing Gao, Weiming Dong, Yi Yang
6 min. talk | August 9th at 10:00 | Session: CV: Representation learning
In this paper, we reveal the two sides of data augmentation: gains in closed-set recognition correlate with a significant decrease in open-set recognition. Through empirical investigation, we find that multi-sample-based augmentations reduce feature discrimination, thereby diminishing open-set criteria. Although knowledge distillation could transfer discriminative features via imitation, mixed features with ambiguous semantics hinder the distillation. To this end, we propose an asymmetric distillation framework that feeds the teacher model extra raw data to enlarge the teacher’s benefit. Moreover, a joint mutual information loss and a selective relabel strategy are utilized to alleviate the influence of hard mixed samples. Our method successfully mitigates the decline in open-set recognition and outperforms SOTAs by 2%~3% AUROC on the Tiny-ImageNet dataset, and experiments on the large-scale ImageNet-21K dataset demonstrate the generalization of our method.
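The multi-sample augmentations in question are typified by mixup; the sketch below shows the semantically ambiguous mixed inputs and one hedged reading of the asymmetric setup, in which the teacher additionally receives the raw (unmixed) batch:

```python
import torch

def mixup(x, y, alpha=0.4):
    """Standard mixup: convex combinations of shuffled sample pairs. The
    mixed images carry ambiguous semantics, which (per the paper) reduces
    open-set feature discrimination."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], (y, y[idx], lam)

x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
x_mix, (y_a, y_b, lam) = mixup(x, y)

# Asymmetric distillation (one hedged reading): the student sees only the
# mixed batch, while the teacher also sees the raw batch so its targets
# stay semantically unambiguous.
teacher_in = torch.cat([x, x_mix])
student_in = x_mix
print(teacher_in.shape, student_in.shape)
```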
List of keywords
Computer Vision -> CV: Representation learning
Computer Vision -> CV: Structural and model-based approaches, knowledge representation and reasoning
2484
CMACE: CMAES-based Counterfactual Explanations for Black-box Models
Xudong Yin, Yao Yang
6 min. talk | August 6th at 11:30 | Session: ETF: Explainability and interpretability
Explainable Artificial Intelligence plays a vital role in machine learning, due to its widespread application in decision-making scenarios, e.g., credit lending. Counterfactual Explanation (CFE) is a kind of explanatory method that asks “what if”: what would have happened if model inputs had changed slightly. To answer this question, Counterfactual Explanation aims at finding a minimum perturbation of model inputs that leads to a different model decision. Compared with model-agnostic approaches, model-specific CFE approaches designed only for a specific type of model usually perform better at finding optimal counterfactual perturbations, owing to access to the inner workings of the models. To deal with this dilemma, this work proposes CMAES-based Counterfactual Explanations (CMACE): an effective model-agnostic counterfactual generation approach based on the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and a warm-starting scheme that provides good initialization of the counterfactual’s mean and covariance parameters for CMA-ES, taking advantage of prior information from training samples. CMACE significantly outperforms another state-of-the-art (SOTA) model-agnostic approach (Bayesian Counterfactual Generator, BayCon) under various experimental settings. Extensive experiments also demonstrate that CMACE is superior to a SOTA model-specific approach (Flexible Optimizable Counterfactual Explanations for Tree Ensembles, FOCUS) that is designed for tree-based models using gradient-based optimization.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
Machine Learning -> ML: Optimization
Machine Learning -> ML: Trustworthy machine learning
2486
Learning Low-Rank Tensor Cores with Probabilistic ℓ0-Regularized Rank Selection for Model Compression
Tianxiao Cao, Lu Sun, Canh Hao Nguyen, Hiroshi Mamitsuka
6 min. talk | August 6th at 15:00 | Session: ML: Machine Learning (2/6)
Compressing deep neural networks is of great importance for real-world applications on resource-constrained devices. Tensor decomposition is one promising answer that retains the functionality and most of the expressive power of the original deep models by replacing the weights with their decomposed cores. Decomposition with optimal ranks can achieve a good compression-accuracy trade-off, but it is expensive to optimize due to its discrete and combinatorial nature. A common practice is to set all ranks equal and tune one hyperparameter, but this may significantly harm flexibility and generalization. In this paper, we propose a novel automatic rank selection method for deep model compression that allows learning model weights and decomposition ranks simultaneously. We propose to penalize the ℓ0 (quasi-)norm of the slices of decomposed tensor cores during model training. To avoid combinatorial optimization, we develop a probabilistic formulation and apply an approximate Bernoulli gate to each of the slices of the tensor cores, which can be implemented in an end-to-end and scalable framework via gradient descent. This enables automatic rank selection to be incorporated with arbitrary tensor decompositions and neural network layers such as linear layers, convolutional layers, and embedding layers. Comprehensive experiments on various tasks, including image classification, text sentiment classification, and neural machine translation, demonstrate the superior effectiveness of the proposed method over baselines.
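A minimal sketch of such a gate, assuming a straight-through Bernoulli relaxation; the paper's probabilistic formulation may use a different estimator, and the slice shapes and penalty weight here are illustrative:

```python
import torch
import torch.nn as nn

class SliceGate(nn.Module):
    """One Bernoulli gate per slice of a tensor core: the sum of open
    probabilities is the expected l0 penalty, and a straight-through
    estimator keeps the sampled 0/1 gates differentiable end-to-end."""

    def __init__(self, n_slices: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_slices))

    def forward(self):
        p = torch.sigmoid(self.logits)        # P(slice is kept)
        hard = torch.bernoulli(p)             # sampled 0/1 gates
        gate = hard + p - p.detach()          # straight-through gradient
        return gate, p.sum()                  # gates and expected l0 norm

core = torch.randn(16, 8, 8)                  # 16 rank-slices of one core
gate, expected_l0 = SliceGate(16)()
pruned = core * gate[:, None, None]           # closed slices are zeroed out
loss = pruned.square().mean() + 1e-3 * expected_l0   # task + l0 regularizer
loss.backward()
```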
List of keywords
Machine Learning -> ML: Matrix/tensor methods
Machine Learning -> ML: Learning sparse models
2495
A Complete Landscape of EFX Allocations on Graphs: Goods, Chores and Mixed Manna
Yu Zhou, Tianze Wei, Minming Li, Bo Li
6 min. talk | August 8th at 10:00 | Session: GTEP: Fair division
We study envy-free up to any item (EFX) allocations on graphs where vertices and edges represent agents and items respectively. An agent is only interested in items that are incident to her and all other items have zero marginal values to her. Christodoulou et al. first proposed this setting and studied the case of goods. We extend this setting to the case of mixed manna where an item may be liked or disliked by its endpoint agents. In our problem, an agent has an arbitrary valuation over her incident items such that the items she likes have non-negative marginal values to her and those she dislikes have non-positive marginal values. We provide a complete study of the four notions of EFX for mixed manna in the literature, which differ by whether the removed item can have zero marginal value. We prove that an allocation that satisfies the notion of EFX where the virtually-removed item could always have zero marginal value may not exist and determining its existence is NP-complete, while one that satisfies any of the other three notions always exists and can be computed in polynomial time. We also prove that an orientation (i.e., a special allocation where each edge must be allocated to one of its endpoint agents) that satisfies any of the four notions may not exist, and determining its existence is NP-complete.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Computational social choice
2504
G2LTraj: A Global-to-Local Generation Approach for Trajectory Prediction
Zhanwei Zhang, Zishuo Hua, Minghao Chen, Wei Lu, Binbin Lin, Deng Cai, Wenxiao Wang
6 min. talk | August 7th at 10:00 | Session: DM: Mining spatial and/or temporal data (1/2)
Accurately predicting the future trajectories of traffic agents is of substantial importance in various applications such as autonomous driving. Previous methods commonly infer all future steps of an agent either recursively or simultaneously. However, the recursive strategy suffers from accumulated error, while the simultaneous strategy overlooks the constraints among future steps, resulting in kinematically infeasible predictions. To address these issues, in this paper, we propose G2LTraj, a plug-and-play global-to-local generation approach for trajectory prediction. Specifically, we generate a series of global key steps that uniformly cover the entire future time range. Subsequently, the local intermediate steps between adjacent key steps are recursively filled in. In this way, we prevent the accumulated error from propagating beyond the adjacent key steps. Moreover, to boost kinematic feasibility, we not only introduce spatial constraints among the key steps but also strengthen temporal constraints among the intermediate steps. Finally, to ensure the optimal granularity of the key steps, we design a selectable granularity strategy that caters to each predicted trajectory. G2LTraj significantly improves the performance of seven existing trajectory predictors across the ETH, UCY and nuScenes datasets. Experimental results demonstrate its effectiveness. Code will be available at https://github.com/Zhanwei-Z/G2LTraj.
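A toy numpy sketch of the global-to-local idea: key steps uniformly cover the horizon and the intermediate steps between adjacent key steps are filled in recursively, so a filler's error cannot propagate past its two enclosing key steps; plain midpoints stand in for G2LTraj's learned prediction heads:

```python
import numpy as np

def global_to_local(key_steps: np.ndarray, depth: int = 2) -> np.ndarray:
    """Recursively insert an intermediate step between each pair of
    neighboring steps (midpoints here; learned fillers in G2LTraj)."""
    traj = key_steps
    for _ in range(depth):
        mids = (traj[:-1] + traj[1:]) / 2          # local steps
        out = np.empty((len(traj) + len(mids), 2))
        out[0::2], out[1::2] = traj, mids          # interleave key/local
        traj = out
    return traj

# Four global key (x, y) steps uniformly covering the future horizon.
keys = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 0.5], [3.0, 1.5]])
print(global_to_local(keys).shape)                 # (13, 2): 4 keys + 9 fills
```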
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
Computer Vision -> CV: Motion and tracking
Machine Learning -> ML: Time series and data streams
2505
TFCD: Towards Multi-modal Sarcasm Detection via Training-Free Counterfactual Debiasing
Zhihong Zhu, Xianwei Zhuang, Yunyan Zhang, Derong Xu, Guimin Hu, Xian Wu, Yefeng Zheng
12 min. talk | August 7th at 10:00 | Session: NLP: Sentiment analysis, stylistic analysis, and argument mining
Multi-modal sarcasm detection (MSD), which aims to identify whether a given sample with multi-modal information (i.e., text and image) is sarcastic, has garnered widespread attention. Recent approaches focus on designing sophisticated architectures or mechanisms to extract sarcastic cues from entire or local image and text features. Nevertheless, a long-overlooked issue is that the current MSD task invariably suffers from unintended dataset biases, especially statistical label bias and sarcasmless word bias. Concretely, such harmful biases are confounders that may mislead existing models into learning spurious correlations, significantly limiting models’ performance. To tackle this issue, this paper proposes a Training-Free Counterfactual Debiasing framework, TFCD, which first formulates the causalities among variables in MSD via a tailored causal graph. Then, TFCD extracts biases from the conventionally-trained model by generating counterfactual utterances and contexts and mitigates them using element-wise subtraction. Extensive experiments on two benchmarks demonstrate the effectiveness of the proposed TFCD. Remarkably, TFCD requires neither data balancing nor model modifications, and can thus be seamlessly integrated into diverse state-of-the-art approaches to achieve considerable improvement margins.
List of keywords
Natural Language Processing -> NLP: Sentiment analysis, stylistic analysis, and argument mining
2519
Partial Optimal Transport Based Out-of-Distribution Detection for Open-Set Semi-Supervised Learning
Yilong Ren, Chuanwen Feng, Xike Xie, S. Kevin Zhou
6 min. talk | August 8th at 11:30 | Session: ML: Semi-supervised learning
Semi-supervised learning (SSL) is a machine learning paradigm that utilizes both labeled and unlabeled data to enhance the performance of learning tasks. However, SSL methods operate under the assumption that the label spaces of the labeled and unlabeled data are identical, which may not hold in open-world applications. In such scenarios, the unlabeled data may contain novel categories that are not present in the labeled training data, i.e., outliers. This specific challenge is referred to as the Open-set Semi-supervised Learning (OSSL) problem. In OSSL, a pivotal concern is the detection of out-of-distribution (OOD) samples within the unlabeled data. Existing methods often struggle to provide effective OOD detection strategies, especially when dealing with datasets comprising a large number of training categories. In response to this challenge, we model the OOD detection problem in OSSL as a partial optimal transport (POT) problem. With POT theory, we devise a mass score function to measure the likelihood of a sample being an outlier, which enables a binary classifier for OOD detection. Further, we put forward an OOD loss, enabling the seamless integration of the binary classifier and off-the-shelf SSL methods under OSSL settings, all within an end-to-end training framework. We extensively evaluate our proposal under various datasets and OSSL configurations, consistently demonstrating its superior performance. Codes are available at https://github.com/ryl0427/Code_for_POT_OSSL.
List of keywords
Machine Learning -> ML: Semi-supervised learning
Machine Learning -> ML: Optimization
Machine Learning -> ML: Robustness
2536
MMVQA: A Comprehensive Dataset for Investigating Multipage Multimodal Information Retrieval in PDF-based Visual Question Answering
Yihao Ding, Kaixuan Ren, Jiabin Huang, Siwen Luo, Soyeon Caren Han
6 min. talk | August 6th at 11:30 | Session: ML: Multi-modal learning
Document Question Answering (QA) presents a challenge in understanding visually-rich documents (VRD), particularly those with lengthy textual content. Existing studies primarily focus on real-world documents with sparse text, while challenges persist in comprehending the hierarchical semantic relations among multiple pages to locate multimodal components. This paper introduces PDF-MVQA, tailored for research journal articles, encompassing multiple pages and multimodal retrieval. Our approach aims to retrieve entire paragraphs containing answers, or visually rich document entities such as tables and figures. The main contribution is a comprehensive PDF Document VQA dataset that allows the examination of semantically hierarchical layout structures in text-dominant documents. We also present new VRD-QA frameworks that grasp textual contents and relations among document layouts simultaneously, extending page-level understanding to the entire multi-page document. We aim to enhance the capabilities of existing vision-and-language models in handling the challenges posed by text-dominant documents in VRD-QA. Code and appendix are available at https://github.com/adlnlp/pdfmvqa.
List of keywords
Natural Language Processing -> NLP: Applications
Natural Language Processing -> NLP: Resources and evaluation
Natural Language Processing -> NLP: Information extraction
2537
A Deep Probabilistic Spatiotemporal Framework for Dynamic Graph Representation Learning with Application to Brain Disorder Identification
Sin-Yee Yap, Junn Yong Loo, Chee-Ming Ting, Fuad Noman, Raphaël C.-W. Phan, Adeel Razi, David L. Dowe
6 min. talk | August 6th at 15:00 | Session: ML: Machine Learning (2/6)
Recent applications of pattern recognition techniques to brain connectome classification using functional connectivity (FC) are shifting towards acknowledging the non-Euclidean topology and dynamic aspects of brain connectivity across time. In this paper, a deep spatiotemporal variational Bayes (DSVB) framework is proposed to learn time-varying topological structures in dynamic FC networks for identifying autism spectrum disorder (ASD) in human participants. The framework incorporates a spatial-aware recurrent neural network with an attention-based message-passing scheme to capture rich spatiotemporal patterns across dynamic FC networks. To overcome model overfitting on limited training datasets, an adversarial training strategy is introduced to learn graph embedding models that generalize well to unseen brain networks. Evaluation on the ABIDE resting-state functional magnetic resonance imaging dataset shows that our proposed framework substantially outperforms state-of-the-art methods in identifying patients with ASD. Dynamic FC analyses with DSVB-learned embeddings reveal apparent group differences between ASD and healthy controls in brain network connectivity patterns and switching dynamics of brain states.
List of keywords
Machine Learning -> ML: Learning graphical models
Machine Learning -> ML: Probabilistic machine learning
Machine Learning -> ML: Adversarial machine learning
Multidisciplinary Topics and Applications -> MTA: Health and medicine
2540
MARS: Multimodal Active Robotic Sensing for Articulated Characterization
Hongliang Zeng, Ping Zhang, Chengjiong Wu, Jiahua Wang, Tingyu Ye, Fang Li
6 min. talk | August 6th at 11:30 | Session: CV: 3D computer vision (1/2)
Precise perception of articulated objects is vital for empowering service robots. Recent studies mainly focus on point clouds, a single-modal approach that often neglects vital texture and lighting details and assumes ideal conditions such as optimal viewpoints, which is unrepresentative of real-world scenarios. To address these limitations, we introduce MARS, a novel framework for articulated object characterization. It features a multi-modal fusion module utilizing multi-scale RGB features to enhance point cloud features, coupled with reinforcement learning-based active sensing for autonomous optimization of observation viewpoints. In experiments conducted with various articulated object instances from the PartNet-Mobility dataset, our method outperformed current state-of-the-art methods in joint parameter estimation accuracy. Additionally, through active sensing, MARS further reduces errors, demonstrating enhanced efficiency in handling suboptimal viewpoints. Furthermore, our method effectively generalizes to real-world articulated objects, enhancing robot interactions. Code is available at https://github.com/robhlzeng/MARS.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Multimodal learning
Robotics -> ROB: Perception
Robotics -> ROB: Manipulation
2565
Graph Attention Network with High-Order Neighbor Information Propagation for Social Recommendation
Fei Xiong, Haoran Sun, Guixun Luo, Shirui Pan, Meikang Qiu, Liang Wang
6 min. talk | August 9th at 11:30 | Session: DM: Mining graphs (3/3)
In recommender systems, graph neural networks (GNN) can integrate interactions between users and items with their attributes, which makes GNN-based methods more powerful. However, directly stacking multiple layers in a graph neural network can easily lead to over-smoothing, hence recommendation systems based on graph neural networks typically underutilize higher-order neighborhoods in their learning. Although some heterogeneous graph random walk methods based on meta-paths can achieve higher-order aggregation, their focus is predominantly on the nodes at the ends of the paths. Moreover, these methods require manually defined meta-paths, which limits the model’s expressiveness and flexibility. Furthermore, path encoding in graph neural networks usually focuses only on the sequence leading to the target node. However, real-world interactions often do not follow this strict sequence, limiting the predictive performance of sequence-based network models. These problems prevent GNN-based methods from being fully effective. We propose a Graph Attention network with Information Propagation path aggregation for Social Recommendation (GAIPSRec). Firstly, we propose a universal heterogeneous graph sampling framework that does not require manually defined meta-paths for path sampling, thereby offering greater flexibility. Moreover, our method takes into account all nodes on the aggregation path and is capable of learning information from higher-order neighbors without succumbing to over-smoothing. Finally, our method utilizes a gate mechanism to fuse sequential and non-sequential dependencies in encoding path instances, allowing a more holistic view of the data. Extensive experiments on real-world datasets show that our proposed GAIPSRec improves performance significantly and outperforms state-of-the-art methods.
List of keywords
Data Mining -> DM: Mining graphs
Data Mining -> DM: Recommender systems
2594
C3L: Content Correlated Vision-Language Instruction Tuning Data Generation via Contrastive Learning
Ji Ma, Wei Suo, Peng Wang, Yanning Zhang
6 min. talk | August 8th at 15:00 | Session: CV: Computer Vision (2/2)
Vision-Language Instruction Tuning (VLIT) is a critical training phase for Large Vision-Language Models (LVLMs). With the improving capabilities of open-source LVLMs, researchers have increasingly turned to generating VLIT data using open-source LVLMs and have achieved significant progress. However, such data generation approaches are bottlenecked by the following challenges: 1) since multi-modal models tend to be influenced by prior language knowledge, directly using LVLMs to generate VLIT data inevitably leads to low content relevance between the generated data and the images; 2) to improve the ability of the models to generate VLIT data, previous methods have incorporated an additional training phase to boost the generative capacity, which hurts the generalization of the models to unseen inputs (i.e., the “exposure bias” problem). In this paper, we propose a new approach: Content Correlated VLIT data generation via Contrastive Learning (C3L). Specifically, we design a new content relevance module which enhances the content relevance between VLIT data and images by computing Image Instruction Correspondence Scores S(I2C). Moreover, a contrastive learning module is introduced to further boost the VLIT data generation capability of the LVLMs. Extensive automatic evaluations on four benchmarks show the effectiveness of our method.
List of keywords
Computer Vision -> CV: Vision, language and reasoning
Computer Vision -> CV: Multimodal learning
Natural Language Processing -> NLP: Language models
2608
FineFMPL: Fine-grained Feature Mining Prompt Learning for Few-Shot Class Incremental Learning
Hongbo Sun, Jiahuan Zhou, Xiangteng He, Jinglin Xu, Yuxin Peng
6 min. talk | August 8th at 15:00 | Session: CV: Computer Vision (2/2)
Few-Shot Class Incremental Learning (FSCIL) aims to continually learn new classes from few training samples without forgetting already learned old classes. Existing FSCIL methods generally fix the backbone network in incremental sessions to strike a balance between suppressing the forgetting of old classes and learning new ones. However, the fixed backbone network causes insufficient learning of new classes from a few samples. Benefiting from the powerful visual and textual understanding of Vision-Language (VL) pre-training models, we propose a Fine-grained Feature Mining Prompt Learning (FineFMPL) approach that adapts the VL model to FSCIL and comprehensively learns and memorizes fine-grained discriminative information of emerging classes. Concretely, a visual probe prompt is first proposed to guide the image encoder of the VL model to extract global-level coarse-grained features and object-level fine-grained features, and visual prototypes are preserved based on image patch significance, capturing the discriminative characteristics exclusive to each class. Second, a textual context prompt is constructed by cross-modal mapping of the visual prototypes and fed into the text encoder of the VL model to memorize class information as textual prototypes. Finally, integrating the visual and textual prototypes based on fine-grained feature mining into the model improves recognition performance on all classes in FSCIL. Extensive experiments on three benchmark datasets demonstrate that FineFMPL achieves a new state of the art. The code is available at https://github.com/PKU-ICST-MIPL/FineFMPL_IJCAI2024.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
2616
Joint Multimodal Aspect Sentiment Analysis with Aspect Enhancement and Syntactic Adaptive Learning
Linlin Zhu, Heli Sun, Qunshu Gao, Tingzhou Yi, Liang He
6 min. talk | August 7th at 10:00 | Session: NLP: Sentiment analysis, stylistic analysis, and argument mining
As an important task in sentiment analysis, joint multimodal aspect sentiment analysis (JMASA) has received increasing attention in recent years. However, previous approaches either i) directly fuse multimodal data without fully exploiting the correlation between multimodal input data, or ii) equally utilize the dependencies of words in the text for sentiment analysis, ignoring the differences in the importance of different words. To address these limitations, we propose a joint multimodal sentiment analysis method based on Aspect Enhancement and Syntactic Adaptive Learning (AESAL). Specifically, we construct an aspect enhancement pre-training task to enable the model to fully learn the correlation of aspects between multimodal input data. In order to capture the differences in the importance of different words in the text, we design a syntactic adaptive learning mechanism. First, we construct different syntactic dependency graphs based on the distance between words to learn global and local information in the text. Second, we use a multi-channel adaptive graph convolutional network to maintain the uniqueness of each modality while fusing the correlations between different modalities. Experimental results on benchmark datasets show that our method outperforms state-of-the-art methods.
List of keywords
Natural Language Processing -> NLP: Sentiment analysis, stylistic analysis, and argument mining
Computer Vision -> CV: Multimodal learning
2617
Pluggable Watermarking of Deepfake Models for Deepfake Detection
Han Bao, Xuhong Zhang, Qinying Wang, Kangming Liang, Zonghui Wang, Shouling Ji, Wenzhi Chen
6 min. talk | August 9th at 10:00 | Session: ETF: Trustworthy AI
Deepfake model misuse poses major security concerns. Existing passive and active Deepfake detection methods both suffer from a lack of generalizability and robustness. In this study, we propose a pluggable and efficient active model watermarking framework for Deepfake detection. This approach facilitates the embedding of identification watermarks across a variety of Deepfake generation models, enabling easy extraction by authorities for detection purposes. Specifically, our method leverages the universal convolutional structure in generative model decoders. It employs convolutional kernel sparsification to adaptively position the embedded watermark and introduces convolutional kernel normalization to seamlessly integrate the watermark parameters with those of the generative model. For watermark extraction, we jointly train a watermark extractor based on a Deepfake detection model and use BCH encoding to identify watermark images effectively. Finally, we apply our approach to eight major types of Deepfake generation models. Experiments show our method successfully detects Deepfakes with an average accuracy exceeding 94%, even over heavily lossy channels. The approach operates independently of the generation model’s training and does not affect the original model’s performance. Furthermore, it requires training only a very limited number of parameters, and it is resilient against three major adaptive attacks. The source code can be found at https://github.com/GuaiZao/Pluggable-Watermarking.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
2622
PEACH: Pretrained-Embedding Explanation across Contextual and Hierarchical Structure
Feiqi Cao, Soyeon Caren Han, Hyunsuk Chung
12 min. talk | August 7th at 15:00 | Session: NLP: Natural Language Processing (2/3)
In this work, we propose a novel tree-based explanation technique, PEACH (Pretrained-embedding Explanation Across Contextual and Hierarchical Structure), which explains how text-based documents are classified by using any pretrained contextual embeddings in a tree-based, human-interpretable manner. Note that PEACH can adopt any contextual embeddings of PLMs as training input for the decision tree. Using the proposed PEACH, we perform a comprehensive analysis of several contextual embeddings on nine NLP text classification benchmarks. This analysis demonstrates the flexibility of the model by applying several PLM contextual embeddings, attribute selections, scaling, and clustering methods. Furthermore, we show the utility of the explanations by visualizing feature selection and important trends in text classification via human-interpretable word-cloud-based trees, which clearly identify model mistakes and assist in dataset debugging. Besides interpretability, PEACH’s classification performance matches or exceeds that of the pretrained models. Code and appendix are available at https://github.com/adlnlp/peach.
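The pipeline the abstract describes, fitting an interpretable decision tree on top of frozen contextual embeddings, can be sketched in a few lines; the random features below merely stand in for pooled PLM vectors, and the hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy stand-in for pretrained contextual embeddings: in PEACH these would
# come from a PLM (e.g., mean-pooled BERT vectors); random data is used
# here only to illustrate the tree-on-embeddings pipeline.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 32))
y_train = rng.integers(0, 2, size=200)

tree = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)
print(export_text(tree))  # human-readable splits over embedding features
```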
List of keywords
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
Knowledge Representation and Reasoning -> KRR: Other
2656
An Efficient Prototype-Based Clustering Approach for Edge Pruning in Graph Neural Networks to Battle Over-Smoothing
Yuyang Huang, Wenjing Lu, Yang Yang
6 min. talk | August 8th at 15:00 | Session: ML: Sequence and graph learning
Topology augmentation is a popular strategy to address the issue of over-smoothing in graph neural networks (GNNs). To prevent potential distortion of node representations, an essential principle is to enhance the separability between embeddings of nodes from different classes while preserving smoothness among nodes of the same class. However, differentiating between inter-class and intra-class edges becomes arduous when class labels are unavailable or the graph is partially labeled. While clustering offers an alternative for identifying closely connected groups of nodes, traditional clustering methods face challenges when applied to GNNs in terms of accuracy, efficiency, adaptability, and scalability to diverse graphs. To address these limitations, we introduce ClusterDrop, which uses learnable prototypes for efficient clustering and incorporates supervised signals to enhance accuracy and adaptability across different graphs. Experiments on six datasets with varying graph structures demonstrate its effectiveness in alleviating over-smoothing and enhancing GNN performance.
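To make the prototype-based pruning idea concrete, here is a tiny sketch: nodes are assigned to their nearest prototype, and edges crossing cluster boundaries are dropped. In ClusterDrop the prototypes are learnable and trained with supervised signals; the fixed random arrays below are assumptions for illustration.

```python
import numpy as np

def cluster_drop_sketch(x, edges, prototypes):
    # x: (n, d) node embeddings; prototypes: (k, d); edges: list of (u, v).
    # Assign each node to its nearest prototype (squared Euclidean distance).
    dists = ((x[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    assign = np.argmin(dists, axis=1)
    # Keep only intra-cluster edges; cross-cluster edges are pruned.
    return [(u, v) for u, v in edges if assign[u] == assign[v]]

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))
prototypes = rng.normal(size=(2, 8))
print(cluster_drop_sketch(x, [(0, 1), (1, 2), (3, 4)], prototypes))
```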
List of keywords
Machine Learning -> ML: Sequence and graph learning
Data Mining -> DM: Mining graphs
Data Mining -> DM: Networks
2666
Incorporating Schema-Aware Description into Document-Level Event Extraction
Zijie Xu, Peng Wang, Wenjun Ke, Guozheng Li, Jiajun Liu, Ke Ji, Xiye Chen, Chenxiao Wu
6 min. talk | August 7th at 11:30 | Session: NLP: Information extraction
Document-level event extraction (DEE) aims to extract the structured event information from a given document, facing two critical challenges: (1) event arguments always scatter across sentences (arguments-scattering); (2) multiple events can co-occur in one document (multi-event). Most recent studies mainly follow two simplified settings to ease the challenges: one simplifies DEE with the no-trigger-words design (NDEE), and the other focuses on event argument extraction (DEAE), a sub-task of DEE. However, the former excludes trigger extraction and suffers from error propagation in the sub-tasks. The latter relies heavily on the gold triggers as prerequisites and struggles to distinguish multiple arguments playing the same role in different events. To address the limitations above, we propose a novel joint trigger and argument extraction paradigm SEELE to enhance the DEE model via incorporating SchEma-awarE descriptions into Document-Level Event extraction. Specifically, the schema-aware descriptions are leveraged from two aspects: (1) guiding the attention mechanism among event-aware tokens across sentences, which relieves arguments-scattering without error propagation; (2) performing the fine-grained contrastive learning to distinguish different events, which mitigates multi-event without gold triggers. Extensive experiments show the superiority of SEELE, achieving notable improvements (2.1% to 9.7% F1) on three NDEE datasets and competitive performance on two DEAE datasets. Our code is available at https://github.com/TheoryRhapsody/SEELE.
List of keywords
Natural Language Processing -> NLP: Information extraction
2669
Bridging the Gap between General and Down-Closed Convex Sets in Submodular Maximization
Loay Mualem, Murad Tukan, Moran Feldman
12 min. talk | August 7th at 10:00 | Session: CSO: Constraint optimization problems
Optimization of DR-submodular functions has experienced a notable surge in significance in recent times, marking a pivotal development within the domain of non-convex optimization. Motivated by real-world scenarios, some recent works have delved into the maximization of non-monotone DR-submodular functions over general (not necessarily down-closed) convex set constraints. Up to this point, these works have all used the minimum L-infinity norm of any feasible solution as a parameter. Unfortunately, a recent hardness result due to Mualem and Feldman shows that this approach cannot yield a smooth interpolation between down-closed and non-down-closed constraints. In this work, we suggest novel offline and online algorithms that provably provide such an interpolation based on a natural decomposition of the convex body constraint into two distinct convex bodies: a down-closed convex body and a general convex body. We also empirically demonstrate the superiority of our proposed algorithms across three offline and two online applications.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint optimization problems
Machine Learning -> ML: Online learning
Search -> S: Combinatorial search and optimisation
2671
CLIP-FSAC: Boosting CLIP for Few-Shot Anomaly Classification with Synthetic Anomalies
Zuo Zuo, Yao Wu, Baoqiang Li, Jiahao Dong, You Zhou, Lei Zhou, Yanyun Qu, Zongze Wu
6 min. talk | August 6th at 15:00 | Session: CV: Transfer, low-shot, semi- and un- supervised learning   
Few-shot anomaly classification (FSAC) is a vital task in the manufacturing industry. Recent methods focus on utilizing CLIP for zero/few-normal-shot anomaly detection instead of custom models. However, anomaly classification lacks task-specific text prompts, and most methods ignore the modality gap between image and text. Meanwhile, there is a distribution discrepancy between the pre-training data and the target data. To provide a remedy, in this paper we propose a method to boost CLIP for few-normal-shot anomaly classification, dubbed CLIP-FSAC, which comprises two training stages that alternately fine-tune two modality-specific adapters. Specifically, in the first stage, we train the image adapter with the text representations output by the text encoder and introduce image-to-text tuning to enhance multi-modal interaction and facilitate a better language-compatible visual representation. In the second stage, we freeze the image adapter and train the text adapter. Both are constrained by a fusion-text contrastive loss. Comprehensive experimental results are provided for evaluating our method on few-normal-shot anomaly classification, where it outperforms the state-of-the-art method by 12.2%, 10.9%, and 10.4% AUROC on VisA in the 1-, 2-, and 4-shot settings, respectively.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Computer Vision -> CV: Multimodal learning
Computer Vision -> CV: Recognition (object detection, categorization)
2674
Learning Big Logical Rules by Joining Small Rules
Céline Hocquette, Andreas Niskanen, Rolf Morel, Matti Järvisalo, Andrew Cropper
6 min. talk | August 7th at 11:30 | Session: KRR: Logic programming
A major challenge in inductive logic programming is learning big rules. To address this challenge, we introduce an approach where we join small rules to learn big rules. We implement our approach in a constraint-driven system and use constraint solvers to efficiently join rules. Our experiments on many domains, including game playing and drug design, show that our approach can (i) learn rules with more than 100 literals, and (ii) drastically outperform existing approaches in terms of predictive accuracies.
List of keywords
Knowledge Representation and Reasoning -> KRR: Logic programming
Machine Learning -> ML: Symbolic methods
2677
Learning Logic Programs by Discovering Higher-Order Abstractions
Céline Hocquette, Sebastijan Dumancic, Andrew Cropper
6 min. talk | August 7th at 11:30 | Session: KRR: Logic programming
We introduce the higher-order refactoring problem, where the goal is to compress a logic program by discovering higher-order abstractions, such as map, filter, and fold. We implement our approach in Stevie, which formulates the refactoring problem as a constraint optimisation problem. Our experiments on multiple domains, including program synthesis and visual reasoning, show that refactoring can improve the learning performance of an inductive logic programming system, specifically improving predictive accuracies by 27% and reducing learning times by 47%. We also show that Stevie can discover abstractions that transfer to multiple domains.
List of keywords
Knowledge Representation and Reasoning -> KRR: Logic programming
Machine Learning -> ML: Symbolic methods
2679
LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization
Qianhui Liu, Jiaqi Yan, Malu Zhang, Gang Pan, Haizhou Li
6 min. talk | August 8th at 11:30 | Session: HAI: Cognitive modeling
Spiking Neural Networks (SNNs) mimic the information-processing mechanisms of the human brain and are highly energy-efficient, making them well-suited for low-power edge devices. However, the pursuit of accuracy in current studies leads to large, long-timestep SNNs, conflicting with the resource constraints of these devices. In order to design lightweight and efficient SNNs, we propose a new approach named LitE-SNN that incorporates both spatial and temporal compression into the automated network design process. Spatially, we present a novel Compressive Convolution block (CompConv) to expand the search space to support pruning and mixed-precision quantization. Temporally, we are the first to propose a compressive timestep search to identify the optimal number of timesteps under specific computation cost constraints. Finally, we formulate a joint optimization to simultaneously learn the architecture parameters and spatial-temporal compression strategies to achieve high performance while minimizing memory and computation costs. Experimental results on the CIFAR-10, CIFAR-100, and Google Speech Command datasets demonstrate that our proposed LitE-SNNs can achieve competitive or even higher accuracy with remarkably smaller model sizes and lower computation costs.
List of keywords
Humans and AI -> HAI: Cognitive modeling
Humans and AI -> HAI: Cognitive systems
2681
Theoretical Study on Multi-objective Heuristic Search
Shawn Skyler, Shahaf Shperberg, Dor Atzmon, Ariel Felner, Oren Salzman, Shao-Hung Chan, Han Zhang, Sven Koenig, William Yeoh, Carlos Hernandez Ulloa
12 min. talk | August 8th at 10:00 | Session: S: Search
This paper provides a theoretical study on Multi-Objective Heuristic Search. We first classify states in the state space into must-expand, maybe-expand, and never-expand states and then transfer these definitions to nodes in the search tree. We then formalize a framework that generalizes A* to Multi-Objective Search. We study different ways to order nodes under this framework and their relation to traditional tie-breaking policies and provide theoretical findings. Finally, we study and empirically compare different ordering functions.
List of keywords
Search -> S: Heuristic search
Search -> S: Other
Search -> General
2684
Cross-Domain Feature Augmentation for Domain Generalization
Yingnan Liu, Yingtian Zou, Rui Qiao, Fusheng Liu, Mong Li Lee, Wynne Hsu
6 min. talk | August 6th at 15:00 | Session: CV: Transfer, low-shot, semi- and un- supervised learning   
Domain generalization aims to develop models that are robust to distribution shifts. Existing methods focus on learning invariance across domains to enhance model robustness, and data augmentation has been widely used to learn invariant predictors, with most methods performing augmentation in the input space. However, augmentation in the input space offers limited diversity, whereas augmentation in the feature space is more versatile and has shown promising results. Nonetheless, feature semantics is seldom considered, and existing feature augmentation methods suffer from a limited variety of augmented features. We decompose features into class-generic, class-specific, domain-generic, and domain-specific components. We propose a cross-domain feature augmentation method named XDomainMix that enables us to increase sample diversity while emphasizing the learning of invariant representations to achieve domain generalization. Experiments on widely used benchmark datasets demonstrate that our proposed method achieves state-of-the-art performance. Quantitative analysis indicates that our feature augmentation approach facilitates the learning of effective models that are invariant across domains.
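A minimal sketch of the cross-domain mixing idea: keep a feature's class-related component and interpolate its domain-related component with one drawn from another domain. The four-way decomposition itself is assumed to have happened upstream, and the interpolation form is an illustrative assumption rather than XDomainMix's exact operator.

```python
import torch

def xdomain_mix_sketch(class_part: torch.Tensor,
                       domain_part_a: torch.Tensor,
                       domain_part_b: torch.Tensor,
                       lam: float = 0.5) -> torch.Tensor:
    # Preserve class semantics; vary only the domain-related component.
    mixed_domain = lam * domain_part_a + (1.0 - lam) * domain_part_b
    return class_part + mixed_domain

# Augment a batch of 4 features of width 128 with another domain's style.
f = xdomain_mix_sketch(torch.randn(4, 128), torch.randn(4, 128),
                       torch.randn(4, 128), lam=0.7)
```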
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Computer Vision -> CV: Machine learning for vision
2699
Semantics for Non-Flat Assumption-Based Argumentation, Revisited
Jesse Heyninck, Ofer Arieli
6 min. talk | August 6th at 15:00 | Session: KRR: Argumentation
Assumption-based argumentation (ABA) is an argumentative formalism that allows for reasoning on the basis of defeasible assumptions and strict rules. Standard semantics for this formalism sometimes give rise to problematic behaviour in the presence of rules with assumptions in their heads. In this paper, we introduce a six-valued labelling semantics that overcomes these shortcomings while preserving all the usual properties of the standard Dung-style three-valued semantics for ABA frameworks, including existence of the complete semantics, uniqueness of the grounded semantics and preservation of the computational complexity of all main reasoning processes.
List of keywords
Knowledge Representation and Reasoning -> KRR: Argumentation
Knowledge Representation and Reasoning -> KRR: Non-monotonic reasoning
2704
CompetEvo: Towards Morphological Evolution from Competition
Kangyao Huang, Di Guo, Xinyu Zhang, Xiangyang Ji, Huaping Liu
6 min. talk | August 9th at 11:30 | Session: MAS: Agent-based and Multi-agent Systems (2/2)
Training an agent to adapt to specific tasks through co-optimization of morphology and control has attracted wide attention. However, whether an optimal configuration and tactics exist for agents in a multi-agent competition scenario remains an open question. In this context, we propose competitive evolution (CompetEvo), which co-evolves agents’ designs and tactics in confrontation. We build arenas consisting of three animals and their evolved derivatives, placing agents with different morphologies in direct competition with each other. The results reveal that our method enables agents to evolve designs and strategies better suited for fighting than those of fixed-morph agents, giving them an advantage in combat scenarios. Moreover, we demonstrate the striking behaviors that emerge when confrontations are conducted between agents with asymmetrical morphs.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence
Robotics -> ROB: Learning in robotics
Search -> S: Evolutionary computation
2707
TIM: An Efficient Temporal Interaction Module for Spiking Transformer
Sicheng Shen, Dongcheng Zhao, Guobin Shen, Yi Zeng
6 min. talk | August 8th at 11:30 | Session: HAI: Cognitive modeling
Spiking Neural Networks (SNNs), as the third generation of neural networks, have gained prominence for their biological plausibility and computational efficiency, especially in processing diverse datasets. The integration of attention mechanisms, inspired by advancements in neural network architectures, has led to the development of Spiking Transformers. These have shown promise in enhancing SNNs’ capabilities, particularly in the realms of both static and neuromorphic datasets. Despite their progress, a discernible gap exists in these systems, specifically in the Spiking Self Attention (SSA) mechanism’s effectiveness in leveraging the temporal processing potential of SNNs. To address this, we introduce the Temporal Interaction Module (TIM), a novel, convolution-based enhancement designed to augment the temporal data processing abilities within SNN architectures. TIM’s integration into existing SNN frameworks is seamless and efficient, requiring minimal additional parameters while significantly boosting their temporal information handling capabilities. Through rigorous experimentation, TIM has demonstrated its effectiveness in exploiting temporal information, leading to state-of-the-art performance across various neuromorphic datasets. The code is available at https://github.com/BrainCog-X/Brain-Cog/tree/main/examples/TIM.
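A minimal sketch of a convolution-based temporal interaction step, assuming features laid out as (batch, channels, time); the kernel size and residual form are illustrative assumptions rather than TIM's actual configuration.

```python
import torch
import torch.nn as nn

class TemporalInteractionSketch(nn.Module):
    """Causal 1-D convolution over the timestep axis, added residually so
    each timestep's features interact with a summary of earlier ones."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.pad = kernel_size - 1  # pad left-heavy, then crop the right
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=self.pad)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim, time); cropping keeps the convolution causal.
        return x + self.conv(x)[..., : x.shape[-1]]

tim = TemporalInteractionSketch(dim=32)
y = tim(torch.randn(2, 32, 10))  # -> (2, 32, 10)
```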
List of keywords
Humans and AI -> HAI: Cognitive modeling
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning
Humans and AI -> HAI: Cognitive systems
2710
Learning Embeddings for Sequential Tasks Using Population of Agents
Mridul Mahajan, Georgios Tzannetos, Goran Radanovic, Adish Singla
12 min. talk | August 7th at 15:00 | Session: ML: Reinforcement learning (1/2)
We present an information-theoretic framework to learn fixed-dimensional embeddings for tasks in reinforcement learning. We leverage the idea that two tasks are similar if observing an agent’s performance on one task reduces our uncertainty about its performance on the other. This intuition is captured by our information-theoretic criterion which uses a diverse agent population as an approximation for the space of agents to measure similarity between tasks in sequential decision-making settings. In addition to qualitative assessment, we empirically demonstrate the effectiveness of our techniques based on task embeddings by quantitative comparisons against strong baselines on two application scenarios: predicting an agent’s performance on a new task by observing its performance on a small quiz of tasks, and selecting tasks with desired characteristics from a given set of options.
List of keywords
Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Multi-task and transfer learning
Machine Learning -> ML: Representation learning
2717
MOSER: Learning Sensory Policy for Task-specific Viewpoint via View-conditional World Model
Shenghua Wan, Hai-Hang Sun, Le Gan, De-Chuan Zhan
6 min. talk | August 7th at 15:00 | Session: ML: Reinforcement learning (1/2)
Reinforcement learning from visual observations is a challenging problem with many real-world applications. Existing algorithms mostly rely on a single observation from a well-designed fixed camera that requires human knowledge. Recent studies learn from different viewpoints with multiple fixed cameras, but this incurs high computation and storage costs and may not guarantee the coverage of the optimal viewpoint. To alleviate these limitations, we propose a straightforward View-conditional Partially Observable Markov Decision Processes (VPOMDPs) assumption and develop a new method, the MOdel-based SEnsor controlleR (MOSER). MOSER jointly learns a view-conditional world model (VWM) to simulate the environment, a sensory policy to control the camera, and a motor policy to complete tasks. We design intrinsic rewards from the VWM without additional modules to guide the sensory policy to adjust the camera parameters. Experiments on locomotion and manipulation tasks demonstrate that MOSER autonomously discovers task-specific viewpoints and significantly outperforms most baseline methods.
List of keywords
Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Model-based and model learning reinforcement learning
Machine Learning -> ML: Partially observable reinforcement learning and POMDPs
2718
FBLG: A Local Graph Based Approach for Handling Dual Skewed Non-IID Data in Federated Learning
Yi Xu, Ying Li, Haoyu Luo, Xiaoliang Fan, Xiao Liu
12 min. talk | August 6th at 15:00 | Session: ML: Federated learning (1/2)
In real-world situations, federated learning often needs to process non-IID (non-independent and identically distributed) data with multiple skews, causing inadequate model performance. Existing federated learning methods mainly address a single skew of non-IID data, and hence the performance of global models can degrade when faced with dual-skewed non-IID data caused by heterogeneous label distributions and sample sizes among clients. To address the problem of dual-skewed non-IID data, in this paper we propose a federated learning algorithm based on local graphs, named FBLG. Specifically, to address the label distribution skew, we first construct a local graph based on clients’ local losses and Jensen-Shannon (JS) divergence, so that similar clients can be selected for aggregation to ensure a highly consistent global model. Afterwards, to address the sample size skew, we design the objective function to favor clients with more samples, as models trained with more samples tend to carry more useful information. Experiments on four datasets with dual-skewed non-IID data demonstrate that FBLG outperforms nine baseline methods and achieves up to a 9% improvement in accuracy. Both theoretical analysis and experiments further show that FBLG converges quickly.
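The JS-divergence side of the local-graph construction can be pictured as follows: connect clients whose label distributions are close. The full method also factors in local losses, and the threshold below is an illustrative assumption.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def client_similarity_graph(label_dists, threshold=0.3):
    """Connect clients whose label distributions have small JS divergence,
    a sketch of the graph-building step described in the FBLG abstract."""
    n = len(label_dists)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # scipy returns the JS *distance* (square root of the divergence).
            if jensenshannon(label_dists[i], label_dists[j]) ** 2 < threshold:
                edges.append((i, j))
    return edges

dists = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]])
print(client_similarity_graph(dists))  # clients 0 and 1 end up connected
```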
List of keywords
Machine Learning -> ML: Federated learning
Machine Learning -> ML: Classification
Machine Learning -> ML: Optimization
Machine Learning -> ML: Supervised Learning
2729
Reconfigurability-Aware Selection for Contrastive Active Domain Adaptation
Zeyu Zhang, Chun Shen, Shuai Lü, Shaojie Zhang
6 min. talk | August 7th at 15:00 | Session: ML: Machine Learning (3/6)
Active domain adaptation (ADA) aims to label a small portion of target samples to drastically improve adaptation performance. Existing ADA methods mostly rely on the output of a domain discriminator or the original prediction probability to design sample selection strategies and do not fully explore the semantic information of source and target domain features, which may lead to selecting valueless target samples. Moreover, most of them require complex network structures (such as an additional domain discriminator, multiple classifiers, or loss predictors) and multiple query functions. In this work, we propose a concise but effective ADA method called Reconfigurability-Aware Selection for Contrastive active domain adaptation (RASC). With its reconfigurability-aware sample selection strategy, RASC selects the most valuable target samples for annotation in the presence of domain shift. To better utilize the selected target samples, we further design a contrastive learning-based gradual active domain adaptation framework. In addition, we propose a variant of RASC called RASC-Ob, which uses a simpler sample annotation method and supplements the learning of misclassified samples. Extensive experimental results on multiple benchmarks demonstrate the superiority of RASC.
List of keywords
Machine Learning -> ML: Multi-task and transfer learning
Machine Learning -> ML: Semi-supervised learning
2744
Temporal Knowledge Graph Extrapolation via Causal Subhistory Identification
Kai Chen, Ye Wang, Xin Song, Siwei Chen, Han Yu, Aiping Li
6 min. talk | August 8th at 10:00 | Session: KRR: Learning and reasoning
Temporal knowledge graph extrapolation has attracted growing research interest in recent years. Numerous extrapolation methods have been put forth that mine query-relevant information from history to generate forecasts. However, existing approaches normally do not discriminate between causal and non-causal effects in reasoning; instead, they focus on the statistical correlation between the future events to be predicted and the given historical data, which may be deceptive and hinder the model’s capacity to learn the real causal information that actually affects reasoning conclusions. To tackle this, we propose a novel approach called Causal Subhistory Identification (CSI), which extracts the causal subhistory from a large amount of historical data for reasoning purposes. By prioritizing the causal subhistory and eliminating non-causal correlations, CSI improves the clarity and transparency of the reasoning process and more effectively conveys the logic behind conclusions. Extensive experiments demonstrate the remarkable potential of our CSI in terms of superiority, improvement, explainability, and robustness.
List of keywords
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Data Mining -> DM: Knowledge graphs and knowledge base completion
Natural Language Processing -> NLP: Applications
2752
Navigating Continual Test-time Adaptation with Symbiosis Knowledge
Xu Yang, Moqi Li, Jie Yin, Kun Wei, Cheng Deng
6 min. talk | August 8th at 11:30 | Session: ML: Unsupervised learning
Continual test-time domain adaptation seeks to adapt the source pre-trained model to a continually changing target domain without incurring additional data acquisition or labeling costs. Unfortunately, existing mainstream methods may result in a detrimental cycle: noisy pseudo-labels caused by the domain shift immediately degrade the model’s knowledge, and the long-term accumulation of these negative effects exacerbates the model’s difficulty in generalizing to future domain shifts and contributes to catastrophic forgetting. To address these challenges, this paper introduces a Dual-stream Network that independently optimizes different parameters in each stream to capture symbiotic knowledge from continual domains, thereby ensuring generalization while enhancing instantaneous discrimination. Furthermore, to prevent catastrophic forgetting, a weighted soft parameter alignment method is designed to leverage knowledge from the source model. Finally, efforts are made to calibrate and explore reliable supervision signals to mitigate instantaneous negative optimization. These include label calibration with prior knowledge, label selection using self-adaptive confidence thresholds, and a soft-weighted contrastive module for capturing potential semantics. Extensive experimental results demonstrate that our method achieves state-of-the-art performance on several benchmark datasets.
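The weighted soft parameter alignment term can be pictured as a per-tensor anchor to the source model, as in the sketch below; the uniform default weight is an assumption, since the paper derives its weights rather than fixing them.

```python
import torch

def weighted_alignment_loss(model, source_params, weights):
    """Penalize drift of the adapted model from a frozen source copy,
    with a per-parameter weight controlling the anchoring strength."""
    loss = torch.tensor(0.0)
    for (name, p), p_src in zip(model.named_parameters(), source_params):
        loss = loss + weights.get(name, 1.0) * ((p - p_src) ** 2).sum()
    return loss

net = torch.nn.Linear(4, 2)
source = [p.detach().clone() for p in net.parameters()]
print(weighted_alignment_loss(net, source, {"weight": 2.0}))
```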
List of keywords
Machine Learning -> ML: Unsupervised learning
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
2758
Equilibria in Two-Stage Facility Location with Atomic Clients
Simon Krogmann, Pascal Lenzner, Alexander Skopalik, Marc Uetz, Marnix C. Vos
12 min. talk | August 6th at 15:00 | Session: GTEP: Noncooperative games
We consider competitive facility location as a two-stage multi-agent system with two types of clients. For a given host graph with weighted clients on the vertices, first facility agents strategically select vertices for opening their facilities. Then, the clients strategically select which of the opened facilities in their neighborhood to patronize. Facilities want to attract as much client weight as possible, clients want to minimize congestion on the chosen facility. All recently studied versions of this model assume that clients can split their weight strategically. We consider clients with unsplittable weights, but allow mixed strategies. So clients may randomize over which facility to patronize. Besides modeling a natural client behavior, this subtle change yields drastic changes, e.g., for a given facility placement, qualitatively different client equilibria are possible. As our main result, we show that pure subgame perfect equilibria always exist if all client weights are identical. For this, we use a novel potential function argument, employing a hierarchical classification of the clients and sophisticated rounding in each step. In contrast, for non-identical clients, we show that deciding the existence of even approximately stable states is computationally intractable. On the positive side, we give a tight bound of 2 on the price of anarchy which implies high social welfare of equilibria, if they exist.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
2759
Angluin-Style Learning of Deterministic Büchi and Co-Büchi Automata
Yong Li, Sven Schewe, Qiyi Tang
12 min. talk | August 8th at 15:00 | Session: ML: Machine Learning (6/6)
While recently developed Angluin-style learning algorithms for omega-automata have much in common with her classic DFA learning algorithm, there is a huge difference in the cost of the equivalence queries about the target automata. For omega-regular languages, the target is to learn nondeterministic Büchi automata (NBAs) through the vehicle of Families of DFAs (FDFAs). While the cost of equivalence queries is usually idealised as constant in learning, it makes a practical difference that language equivalence checking for the learned NBAs is computationally hard. We develop efficient techniques for the cases where we learn deterministic Büchi automata (DBAs) or deterministic co-Büchi automata (DCAs). This is based on the observation that some classes of FDFAs can be used to learn DBAs for DBA-recognisable languages, rather than having to resort to nondeterministic ones. We believe that the restriction to DBAs and DCAs in equivalence queries also makes our algorithm more appealing for realistic applications, as the operations are cheap (in NL) for DBAs and DCAs.
List of keywords
Machine Learning -> ML: Active learning
Knowledge Representation and Reasoning -> KRR: Automated reasoning and theorem proving
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Machine Learning -> ML: Model-based and model learning reinforcement learning
2765
Compilation and Fast Model Counting beyond CNF
Alexis de Colnet, Stefan Szeider, Tianwei Zhang
6 min. talk | August 7th at 15:00 | Session: KRR: Knowledge Representation and Reasoning (1/2)
Circuits in deterministic decomposable negation normal form (d-DNNF) are representations of Boolean functions that enable linear-time model counting. This paper strengthens our theoretical knowledge of what classes of functions can be efficiently transformed, or compiled, into d-DNNF. Our main contribution is the fixed-parameter tractable (FPT) compilation of conjunctions of specific constraints parameterized by incidence treewidth. This subsumes the known result for CNF. The constraints in question are all functions representable by constant-width ordered binary decision diagrams (OBDDs) for all variable orderings. For instance, this includes parity constraints and cardinality constraints with constant threshold. The running time of the FPT compilation is singly exponential in the incidence treewidth but hides large constants in the exponent. To balance that, we give a more efficient FPT algorithm for model counting that applies to a sub-family of the constraints and does not require compilation.
List of keywords
Knowledge Representation and Reasoning -> KRR: Knowledge compilation
2768
The Orthogonality of Weight Vectors: The Key Characteristics of Normalization and Residual Connections
Zhixing Lu, Yuanyuan Sun, Zhihao Yang, Qin Zhou, Hongfei Lin
6 min. talk | August 8th at 11:30 | Session: ML: Explainable/Interpretable machine learning
Normalization and residual connections find extensive application within the intricate architecture of deep neural networks, contributing significantly to their heightened performance. Nevertheless, the precise factors responsible for this elevated performance have remained elusive. Our theoretical investigations unveil a noteworthy finding: the use of normalization and residual connections enhances the orthogonality of the weight vectors of deep neural networks. This, in turn, induces the Gram matrix of the network weights to exhibit a pronounced tendency towards strict diagonal dominance, thereby amplifying the network’s capacity for feature learning. Meanwhile, we design the parameters independence index (PII) to precisely characterize the orthogonality of parameter vectors. In tandem with our theoretical findings, we undertake empirical validation through experiments on prevalent network models, including fully connected networks (FNNs), convolutional neural networks (CNNs), Transformers, pre-trained language models (PLMs), and large language models (LLMs) composed of Transformers. Finally, we find that a fine-tuning technique (LoRA) preserves the orthogonality of parameter vectors, a finding that carries importance for fine-tuning techniques for LLMs.
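The diagonal-dominance claim is easy to probe numerically: normalize a layer's weight rows, form the Gram matrix, and compare diagonal to off-diagonal mass. The check below is a simple proxy in the spirit of the paper's PII, not its exact definition.

```python
import numpy as np

def gram_diagonal_dominance(W):
    """For a weight matrix W with one weight vector per row, return a
    boolean per row indicating strict diagonal dominance of the Gram
    matrix of the row-normalized weights (a rough orthogonality probe)."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    G = Wn @ Wn.T                      # cosine similarities between rows
    off = np.abs(G).sum(axis=1) - np.abs(np.diag(G))
    return np.diag(G) > off            # True where a row dominates

# High-dimensional random vectors are nearly orthogonal, so most rows pass.
print(gram_diagonal_dominance(np.random.randn(16, 256)))
```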
List of keywords
Machine Learning -> ML: Explainable/Interpretable machine learning
Machine Learning -> ML: Deep learning architectures
2773
Fast and Continual Knowledge Graph Embedding via Incremental LoRA
Jiajun Liu, Wenjun Ke, Peng Wang, Jiahao Wang, Jinhua Gao, Ziyu Shang, Guozheng Li, Zijie Xu, Ke Ji, Yining Li
6 min. talk | August 7th at 15:00 | Session: DM: Data Mining (1/2)
Continual Knowledge Graph Embedding (CKGE) aims to efficiently learn new knowledge and simultaneously preserve old knowledge. Dominant approaches primarily focus on alleviating catastrophic forgetting of old knowledge but neglect efficient learning for the emergence of new knowledge. However, in real-world scenarios, knowledge graphs (KGs) are continuously growing, which brings a significant challenge to fine-tuning KGE models efficiently. To address this issue, we propose a fast CKGE framework (FastKGE), incorporating an incremental low-rank adapter (IncLoRA) mechanism to efficiently acquire new knowledge while preserving old knowledge. Specifically, to mitigate catastrophic forgetting, FastKGE isolates and allocates new knowledge to specific layers based on the fine-grained influence between old and new KGs. Subsequently, to accelerate fine-tuning, FastKGE devises an efficient IncLoRA mechanism, which embeds the specific layers into incremental low-rank adapters with fewer training parameters. Moreover, IncLoRA introduces adaptive rank allocation, which makes the LoRA aware of the importance of entities and adjusts its rank scale adaptively. We conduct experiments on four public datasets and two new datasets with a larger initial scale. Experimental results demonstrate that FastKGE can reduce training time by 34%-49% while still achieving competitive link prediction performance against state-of-the-art models on four public datasets (average MRR score of 21.0% vs. 21.1%). Meanwhile, on two newly constructed datasets, FastKGE saves 51%-68% training time and improves link prediction performance by 1.5%.
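A stripped-down picture of the incremental low-rank adapter idea: the old embedding matrix stays frozen and each growth step learns only a small low-rank delta. The fixed rank and initialization here are assumptions; FastKGE allocates ranks adaptively per layer and entity importance.

```python
import torch
import torch.nn as nn

class IncrementalLoRA(nn.Module):
    """Frozen base embedding plus a trainable low-rank delta, sketching
    the IncLoRA mechanism described in the FastKGE abstract."""

    def __init__(self, base: torch.Tensor, rank: int = 8):
        super().__init__()
        n, d = base.shape
        self.register_buffer("base", base)            # old knowledge, frozen
        self.A = nn.Parameter(torch.randn(n, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, d))   # delta starts at zero

    def forward(self) -> torch.Tensor:
        return self.base + self.A @ self.B

emb = IncrementalLoRA(torch.randn(1000, 128), rank=8)
print(emb().shape)  # torch.Size([1000, 128]); only A and B receive gradients
```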
List of keywords
Data Mining -> DM: Knowledge graphs and knowledge base completion
2778
Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition
Bangbang Zhou, Yadong Qu, Zixiao Wang, Zicheng Li, Boqiang Zhang, Hongtao Xie
6 min. talk | August 7th at 15:00 | Session: CV: Recognition (object detection, categorization)
Recently, scene text recognition (STR) models have shown significant performance improvements. However, existing models still encounter difficulties in recognizing challenging texts that involve factors such as severely distorted and perspective characters. These challenging texts mainly cause two problems: (1) large intra-class variance; (2) small inter-class variance. An extremely distorted character may differ prominently in appearance from other characters within the same category, while the variance between characters from different classes is relatively small. To address the above issues, we propose a novel method that enriches character features to enhance the discriminability of characters. First, we propose the Character-Aware Constraint Encoder (CACE), built from multiple stacked blocks. CACE introduces a decay matrix in each block to explicitly guide the attention region for each token. By continuously employing the decay matrix, CACE enables tokens to perceive morphological information at the character level. Second, an Intra-Inter Consistency Loss (I^2CL) is introduced to promote intra-class compactness and inter-class separability in feature space. I^2CL improves the discriminative capability of features by learning a long-term memory unit for each character category. Trained with synthetic data, our model achieves state-of-the-art performance on common benchmarks (94.1% accuracy) and Union14M-Benchmark (61.6% accuracy). Code is available at https://github.com/bang123-box/CFE.
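One way to picture a decay matrix guiding attention: add log(decay^|i-j|) to the attention logits so each token is biased toward nearby positions. This is a simplified stand-in for CACE's per-block decay matrices, with the decay rate an arbitrary assumption.

```python
import torch

def decayed_attention(q, k, v, decay: float = 0.8):
    """Single-head attention whose logits are penalized by distance |i-j|
    through a decay matrix, biasing each token toward its neighborhood."""
    t, d = q.shape[-2], q.shape[-1]
    idx = torch.arange(t)
    log_decay = (idx[:, None] - idx[None, :]).abs().float() \
        * torch.log(torch.tensor(decay))
    scores = q @ k.transpose(-2, -1) / d ** 0.5 + log_decay
    return torch.softmax(scores, dim=-1) @ v

out = decayed_attention(torch.randn(10, 32), torch.randn(10, 32),
                        torch.randn(10, 32))  # -> (10, 32)
```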
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Multimodal learning
2780
Improved Approximation of Weighted MMS Fairness for Indivisible Chores
Fangxiao Wang, Bo Li, Pinyan Lu
6 min. talk | August 8th at 10:00 | Session: GTEP: Fair division
We study how to fairly allocate a set of indivisible chores among n agents who may have different weights corresponding to their involvement in completing these chores. We find that some existing fairness notions may place agents with lower weights at a disadvantage, which motivates us to explore weighted maximin share fairness (WMMS). While it is known that a WMMS allocation may not exist, no non-trivial approximation has been discovered thus far. In this paper, we first design a simple sequential picking algorithm that relies solely on the agents’ ordinal rankings of the items and achieves an approximation ratio of O(log n). Then, for the case involving two agents, we improve the approximation ratio to (√3+1)/2 ≈ 1.366 and prove that it is optimal. We also consider an online setting in which the items arrive one after another, and design an O(√n)-competitive online algorithm given that the valuations are normalized.
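A sketch of a purely ordinal sequential picking protocol for chores follows: agents take plain round-robin turns, and on each turn an agent takes its least-burdensome remaining chore. The paper's algorithm additionally accounts for agent weights when ordering turns, so treat this only as an illustration of ordinal picking.

```python
def sequential_picking(rankings, num_agents):
    """rankings[a] lists the chores in agent a's order from least to most
    burdensome; agents pick round-robin until all chores are assigned."""
    remaining = set(rankings[0])              # all chores appear in each list
    bundles = [[] for _ in range(num_agents)]
    turn = 0
    while remaining:
        agent = turn % num_agents
        # Take the chore this agent ranks as least bad among those left.
        pick = min(remaining, key=lambda c: rankings[agent].index(c))
        bundles[agent].append(pick)
        remaining.remove(pick)
        turn += 1
    return bundles

print(sequential_picking([[0, 1, 2, 3], [3, 2, 1, 0]], 2))
```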
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
2782
Online Learning with Off-Policy Feedback in Adversarial MDPs
Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Papini, Alberto Maria Metelli, Nicola Gatti
12 min. talk | August 7th at 15:00 | Session: ML: Machine Learning (4/6)
In this paper, we face the challenge of online learning in adversarial Markov decision processes with off-policy feedback. In this setting, the learner chooses a policy, but, differently from the traditional on-policy setting, the environment is explored by means of a different, fixed, and possibly unknown policy (named colleague’s policy). The off-policy feedback presents an additional issue that is not present in traditional settings: the learner is charged with the regret of its chosen policy but it observes only the rewards gained by the colleague’s policy. First, we present a lower-bound for the setting we propose, which shows that the optimal dependency of the sublinear regret is w.r.t. the dissimilarity between the optimal policy in hindsight and the colleague’s policy. Then, we propose novel algorithms that, by employing pessimistic estimators—commonly adopted in the off-line reinforcement learning literature—ensure sublinear regret bounds depending on the desired dissimilarity, even when the colleague’s policy is unknown.
List of keywords
Machine Learning -> ML: Online learning
Machine Learning -> ML: Reinforcement learning
2785
Optimizing Viscous Democracy
Ben Armstrong, Shiri Alouf-Heffetz, Nimrod Talmon
6 min. talk | August 7th at 15:00 | Session: GTEP: Computational social choice (2/2)
Viscous democracy is a generalization of liquid democracy, a social choice framework in which voters may transitively delegate their votes. In viscous democracy, a "viscosity" factor decreases the weight of a delegation the further it travels, reducing the chance of excessive weight flowing between ideologically misaligned voters. We demonstrate that viscous democracy often significantly improves the quality of group decision-making over liquid democracy. We first show that finding optimal delegations within a viscous setting is NP-hard. However, simulations allow us to explore the practical effects of viscosity. Across social network structures, competence distributions, and delegation mechanisms we find high viscosity reduces the chance of “super-voters” attaining large amounts of weight and increases the number of voters that are able to affect the outcome of elections. This, in turn, improves group accuracy as a whole. As a result, we argue that viscosity should be considered a core component of liquid democracy.
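The viscosity mechanism itself is tiny: a delegated vote loses a factor alpha per hop it travels. The helper below computes a casting voter's effective weight under that rule; the delegation encoding and alpha value are illustrative assumptions.

```python
def effective_weight(delegations, voter, alpha=0.5):
    """delegations maps each voter to the voter it delegates to, or None
    for voters who cast their ballot directly. A vote is discounted by
    alpha for every hop along its delegation chain."""
    total = 0.0
    for v in delegations:
        w, cur = 1.0, v
        while delegations[cur] is not None:   # follow the chain to its end
            cur = delegations[cur]
            w *= alpha                        # viscosity: lose weight per hop
        if cur == voter:
            total += w
    return total

# Chain 0 -> 1 -> 2 (casts); voter 3 casts directly.
print(effective_weight({0: 1, 1: 2, 2: None, 3: None}, voter=2))  # 1.75
```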
List of keywords
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence
Multidisciplinary Topics and Applications -> MTA: Web and social networks
2802
Reschedule Diffusion-based Bokeh Rendering
Shiyue Yan, Xiaoshi Qiu, Qingmin Liao, Jing-Hao Xue, Shaojun Liu
6 min. talk | August 8th at 15:00 | Session: CV: Image and video synthesis and generation (2/2)
Bokeh rendering for images shot with small apertures has drawn much attention in practice. Very recently, researchers have started to explore diffusion models for bokeh rendering, aiming to leverage the models’ surging power of image generation. However, two big issues can clearly be observed in the images rendered by diffusion models: large fluctuation and severe color deviation. To address these issues, we propose in this paper a prior-aware sampling approach, which can adaptively control the noise scale through learned priors, and a prior-aware noise scheduling strategy, which can greatly reduce the number of inference steps without sacrificing performance. Extensive experiments show that our method effectively alleviates the fluctuation problem of sampling results while ensuring color styles similar to those of the input image. In addition, our method outperforms state-of-the-art methods, sometimes even with only two steps of sampling. Our code is available at https://github.com/Loeiii/Reschedule-Diffusion-based-Bokeh-Rendering.
List of keywords
Computer Vision -> CV: Image and video synthesis and generation 
Humans and AI -> HAI: Applications
Machine Learning -> ML: Applications
2809
Cross-Scale Domain Adaptation with Comprehensive Information for Pansharpening
Meiqi Gong, Hao Zhang, Hebaixu Wang, Jun Chen, Jun Huang, Xin Tian, Jiayi Ma
6 min. talk | August 7th at 10:00 | Session: CV: Multimodal learning
Deep learning-based pansharpening methods typically use simulated data at the reduced-resolution scale for training. This limits their performance when the trained model is generalized to the full-resolution scale, owing to incomplete utilization of the information in panchromatic (PAN) images at the full-resolution scale and low generalization ability. In this paper, we adopt two targeted strategies to address these two problems. On the one hand, we introduce a cross-scale comprehensive information capture module, which improves the information utilization of the original PAN image through fully-supervised reconstruction. On the other hand, we pioneer a domain adaptation strategy to tackle the problem of low generalization across different scales. Considering the intrinsic domain gap between different scales, we leverage the maximum mean discrepancy loss and the inherent pixel-level correlations between features at different scales to reduce the scale variance, thus boosting the generalization ability of our model. Experiments on various satellites demonstrate the superiority of our method over the state of the art in terms of information retention. Our code is publicly available at https://github.com/Meiqi-Gong/SDIPS.
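For reference, the maximum mean discrepancy term mentioned above can be computed with a plain RBF kernel as sketched below; the fixed bandwidth is an assumption rather than the paper's choice.

```python
import torch

def mmd_rbf(x, y, sigma: float = 1.0) -> torch.Tensor:
    """Biased RBF-kernel MMD between two feature batches: small when the
    two batches come from similar distributions, large under a domain gap."""
    def k(a, b):
        d = torch.cdist(a, b) ** 2
        return torch.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# Features from two "scales"; the offset creates a visible discrepancy.
print(mmd_rbf(torch.randn(32, 64), torch.randn(32, 64) + 1.0))
```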
List of keywords
Computer Vision -> CV: Multimodal learning
Computer Vision -> CV: Computational photography
2821
Learning Pareto Set for Multi-Objective Continuous Robot Control
Tianye Shu, Ke Shang, Cheng Gong, Yang Nan, Hisao Ishibuchi
12 min. talk | August 7th at 15:00 | Session: ML: Reinforcement learning (1/2)
For a control problem with multiple conflicting objectives, there exists a set of Pareto-optimal policies called the Pareto set instead of a single optimal policy. When a multi-objective control problem is continuous and complex, traditional multi-objective reinforcement learning (MORL) algorithms search for many Pareto-optimal deep policies to approximate the Pareto set, which is quite resource-consuming. In this paper, we propose a simple and resource-efficient MORL algorithm that learns a continuous representation of the Pareto set in a high-dimensional policy parameter space using a single hypernet. The learned hypernet can directly generate various well-trained policy networks for different user preferences. We compare our method with two state-of-the-art MORL algorithms on seven multi-objective continuous robot control problems. Experimental results show that our method achieves the best overall performance with the least training parameters. An interesting observation is that the Pareto set is well approximated by a curved line or surface in a high-dimensional parameter space. This observation will provide insight for researchers to design new MORL algorithms.
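The core of the method, a single hypernet mapping a preference vector to an entire policy's parameters, can be sketched as follows; all sizes and the two-layer architecture are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PreferenceHypernet(nn.Module):
    """Map an n_obj-dimensional preference vector to the flattened
    parameters of a policy network, giving a continuous parameterization
    of the (approximate) Pareto set."""

    def __init__(self, n_obj: int, policy_params: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_obj, 128), nn.ReLU(),
                                 nn.Linear(128, policy_params))

    def forward(self, pref: torch.Tensor) -> torch.Tensor:
        return self.net(pref)

# Parameters for a toy 4->64 linear policy (weights plus biases).
hyper = PreferenceHypernet(n_obj=2, policy_params=4 * 64 + 64)
theta = hyper(torch.tensor([0.3, 0.7]))  # one point on the trade-off curve
```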
List of keywords
Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Optimization
Robotics -> ROB: Learning in robotics
2823
Towards Geometric Normalization Techniques in SE(3) Equivariant Graph Neural Networks for Physical Dynamics Simulations
Ziqiao Meng, Liang Zeng, Zixing Song, Tingyang Xu, Peilin Zhao, Irwin King
12 min. talk | August 7th at 11:30 | Session: MTA: Physical sciences
SE(3) equivariance is a fundamental property that is highly desirable in physical dynamics modeling. This property ensures that neural outputs remain robust when the inputs are translated or rotated. Recently, there have been several proposals for SE(3) equivariant graph neural networks (GNNs) that have shown promising results in simulating particle dynamics. However, existing works have neglected an important issue: current SE(3) equivariant GNNs cannot scale to large particle systems. Although some simple normalization techniques are already in use to stabilize the training dynamics of equivariant graph networks, they actually break the SE(3) equivariance of the architectures. In this work, we first show the numerical instability of training equivariant GNNs on large particle systems and then analyze some existing normalization strategies adopted in modern works. We propose a new normalization layer called GeoNorm, which satisfies SE(3) equivariance while stabilizing the training process. We conduct comprehensive experiments on N-body system simulation tasks with larger particle system sizes. The experimental results demonstrate that GeoNorm successfully preserves SE(3) equivariance compared to baseline techniques and stabilizes the training dynamics of SE(3) equivariant GNNs on large systems.
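Why can a normalization be equivariant at all? If vector features are rescaled only by rotation-invariant statistics (lengths), then rotating the input rotates the output identically. The sketch below illustrates that principle; it is not the GeoNorm layer itself.

```python
import torch

def equivariant_norm_sketch(v: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Rescale vector features by the mean vector norm, a rotation-invariant
    scalar, so the operation commutes with rotations of the input.
    v: (n_nodes, 3) relative vectors (translation already removed)."""
    scale = v.norm(dim=-1).mean().clamp_min(eps)
    return v / scale

v = torch.randn(100, 3)
print(equivariant_norm_sketch(v).norm(dim=-1).mean())  # ~1.0 after scaling
```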
List of keywords
Multidisciplinary Topics and Applications -> MTA: Physical sciences
Machine Learning -> ML: Deep learning architectures
Machine Learning -> ML: Geometric learning
Machine Learning -> ML: Sequence and graph learning
2836
FedGCS: A Generative Framework for Efficient Client Selection in Federated Learning via Gradient-based Optimization
Zhiyuan Ning, Chunlin Tian, Meng Xiao, Wei Fan, Pengyang Wang, Li Li, Pengfei Wang, Yuanchun Zhou
6 min. talk | August 6th at 15:00 | Session: ML: Federated learning (1/2)
Federated Learning faces significant challenges in statistical and system heterogeneity, along with high energy consumption, necessitating efficient client selection strategies. Traditional approaches, including heuristic and learning-based methods, fall short of addressing these complexities holistically. In response, we propose FedGCS, a novel generative client selection framework that innovatively recasts the client selection process as a generative task. Drawing inspiration from the methodologies used in large language models, FedGCS efficiently encodes abundant decision-making knowledge within a continuous representation space, enabling efficient gradient-based optimization to search for optimal client selection that will be finally output via generation. The framework comprises four steps: (1) automatic collection of diverse “selection-score” pair data using classical client selection methods; (2) training an encoder-evaluator-decoder framework on this data to construct a continuous representation space; (3) employing gradient-based optimization in this space for optimal client selection; (4) generating the final optimal client selection via using beam search for the well-trained decoder. FedGCS outperforms traditional methods by being more comprehensive, generalizable, and efficient, simultaneously optimizing for model performance, latency, and energy consumption. The effectiveness of FedGCS is proven through extensive experimental analyses.
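Step (3), gradient-based search in the learned representation space, reduces to ascending a differentiable evaluator, as sketched below with a toy scorer standing in for the trained evaluator network; the optimizer, step count, and learning rate are assumptions.

```python
import torch

def latent_search(evaluator, z0, steps: int = 100, lr: float = 0.05):
    """Ascend a differentiable evaluator over a continuous representation;
    a decoder (not shown) would turn the resulting embedding into a
    concrete client selection."""
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-evaluator(z)).backward()   # maximize the predicted score
        opt.step()
    return z.detach()

# Toy evaluator whose optimum sits at the all-ones embedding.
z = latent_search(lambda z: -(z - 1.0).pow(2).sum(), torch.zeros(16))
print(z.round())
```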
List of keywords
Machine Learning -> ML: Federated learning
Data Mining -> DM: Applications
Data Mining -> DM: Other
Machine Learning -> ML: Applications
2851
Extremal Separation Problems for Temporal Instance Queries
Jean Christoph Jung, Vladislav Ryzhikov, Frank Wolter, Michael Zakharyaschev
6 min. talk | August 7th at 15:00 | Session: KRR: Knowledge Representation and Reasoning (1/2)
The separation problem for a class Q of database queries is to find a query in Q that distinguishes between a given set of ‘positive’ and ‘negative’ data examples. Separation provides explanations of examples and underpins the query-by-example paradigm to support database users in constructing and refining queries. As the space of all separating queries can be large, it is helpful to succinctly represent this space by means of its most specific (logically strongest) and general (weakest) members. We investigate this extremal separation problem for classes of instance queries formulated in linear temporal logic LTL with the operators conjunction, ‘next’, and ‘eventually’. Our results range from tight complexity bounds for verifying and counting extremal separators to algorithms computing them.
List of keywords
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Multidisciplinary Topics and Applications -> MTA: Databases
2852
Boosting Single Positive Multi-label Classification with Generalized Robust Loss
Yanxi Chen, Chunxiao Li, Xinyang Dai, Jinhuan Li, Weiyu Sun, Yiming Wang, Renyuan Zhang, Tinghe Zhang, Bo Wang
6 min. talk | August 9th at 11:30 | Session: ML: Multi-label learning
Multi-label learning (MLL) requires comprehensive multi-semantic annotations that are hard to fully obtain, often resulting in missing-label scenarios. In this paper, we investigate Single Positive Multi-label Learning (SPML), where each image is associated with merely one positive label. Existing SPML methods focus only on designing losses using mechanisms such as hard pseudo-labeling and robust losses, mostly leading to unacceptable false negatives. To address this issue, we first propose a generalized loss framework based on expected risk minimization that provides soft pseudo labels, and point out that existing losses can be seamlessly converted into our framework. In particular, we design a novel robust loss based on our framework, which enjoys flexible coordination between false positives and false negatives and can additionally deal with the imbalance between positive and negative samples. Extensive experiments show that our approach significantly improves SPML performance and outperforms the vast majority of state-of-the-art methods on all four benchmarks. Our code is available at https://github.com/yan4xi1/GRLoss.
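The soft pseudo-labeling idea can be illustrated with a tiny stand-alone loss; the soft target value, the class weights, and the function name below are illustrative assumptions, not the paper's exact GRLoss.

import numpy as np

def single_positive_soft_loss(logits, pos_idx, soft_neg=0.05, w_pos=1.0, w_neg=1.0):
    # logits: (n_classes,) scores for one image; pos_idx: the single
    # observed positive label. Unobserved labels get a soft target
    # `soft_neg` instead of a hard 0, tempering false negatives, and
    # w_pos/w_neg rebalance positives against the many assumed negatives.
    p = 1.0 / (1.0 + np.exp(-logits))
    targets = np.full_like(p, soft_neg)
    targets[pos_idx] = 1.0
    bce = -(targets * np.log(p + 1e-12) + (1 - targets) * np.log(1 - p + 1e-12))
    weights = np.full_like(p, w_neg)
    weights[pos_idx] = w_pos
    return float((weights * bce).mean())

print(single_positive_soft_loss(np.array([2.0, -1.0, 0.5]), pos_idx=0))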
List of keywords
Machine Learning -> ML: Multi-label learning
Machine Learning -> ML: Classification
Machine Learning -> ML: Weakly supervised learning
2859
SketchEdit: Editing Freehand Sketches at the Stroke-Level
Tengjie Li, Shikui Tu, Lei Xu
6 min. talk | August 9th at 11:30 | Session: ML: Generative models
Recent sketch synthesis methods can generate lifelike results. However, these methods encode entire sketches directly, making it challenging to decouple strokes from sketches, and they struggle to control local sketch synthesis, e.g., stroke editing. Moreover, sketch editing must position the edited strokes accurately, because users may not draw at the exact position and the same stroke may appear in various locations in different sketches. We propose SketchEdit to realize, for the first time, flexible editing of sketches at the stroke level. To tackle the challenge of decoupling strokes, SketchEdit divides the drawing sequence of a sketch into a series of strokes based on the pen state, aligns the stroke segments to share the same starting position, and learns an embedding for every stroke with a proposed stroke encoder. Moreover, we overcome the stroke placement problem via a diffusion process, which progressively generates the locations of the strokes to be synthesized, using the stroke features as the guiding condition. Experiments demonstrate that SketchEdit is effective for stroke-level sketch editing and sketch reconstruction. The source code is publicly available at https://github.com/CMACH508/SketchEdit/.
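The stroke decoupling and alignment step can be sketched in a few lines, assuming the common 'stroke-5' drawing encoding (dx, dy, pen-down, pen-up, end); the encoding choice and helper name are assumptions for illustration.

import numpy as np

def split_and_align_strokes(seq):
    # Split a drawing sequence into strokes at pen-up states, then
    # translate each stroke so it starts at the origin (the alignment
    # step described in the abstract).
    strokes, current = [], []
    for step in seq:
        current.append(step[:2])
        if step[3] == 1:                      # pen lifted: stroke ends
            pts = np.cumsum(np.array(current, dtype=float), axis=0)
            strokes.append(pts - pts[0])      # same starting position
            current = []
    return strokes

seq = [(1, 0, 1, 0, 0), (0, 1, 0, 1, 0), (2, 2, 1, 0, 0), (1, -1, 0, 1, 0)]
for s in split_and_align_strokes(seq):
    print(s)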
List of keywords
Machine Learning -> ML: Generative models
Computer Vision -> CV: Applications
Computer Vision -> CV: Image and video synthesis and generation 
Computer Vision -> CV: Representation learning
2864
Large Language Model Guided Knowledge Distillation for Time Series Anomaly Detection
Chen Liu, Shibo He, Qihang Zhou, Shizhong Li, Wenchao Meng
6 min. talk | August 8th at 11:30 | Session: DM: Anomaly/outlier detection
Self-supervised methods have gained prominence in time series anomaly detection due to the scarcity of available annotations. Nevertheless, they typically demand extensive training data to acquire a generalizable representation map, which conflicts with scenarios where only a few samples are available, thereby limiting their performance. To overcome this limitation, we propose AnomalyLLM, a knowledge distillation-based time series anomaly detection approach in which the student network is trained to mimic the features of the large language model (LLM)-based teacher network that is pretrained on large-scale datasets. During the testing phase, anomalies are detected when the discrepancy between the features of the teacher and student networks is large. To prevent the student network from learning the teacher network’s features on anomalous samples, we devise two key strategies. 1) Prototypical signals are incorporated into the student network to consolidate the normal feature extraction. 2) We use synthetic anomalies to enlarge the representation gap between the two networks. AnomalyLLM demonstrates state-of-the-art performance on 15 datasets, improving accuracy by at least 14.5% on the UCR dataset.
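The detection rule itself is simple to state: score each time step by the teacher-student feature gap. A plain L2 gap is used in the sketch below for illustration; the paper's exact discrepancy measure may differ.

import numpy as np

def anomaly_scores(teacher_feats, student_feats):
    # feats: (T, d) per-step features. The student mimics the teacher on
    # normal data, so a large gap flags an anomaly.
    return np.linalg.norm(teacher_feats - student_feats, axis=-1)

rng = np.random.default_rng(1)
teacher = rng.normal(size=(100, 16))
student = teacher + 0.01 * rng.normal(size=(100, 16))  # good mimicry
student[40] += 2.0                                     # mimicry fails here
print(int(anomaly_scores(teacher, student).argmax()))  # -> 40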
List of keywords
Data Mining -> DM: Anomaly/outlier detection
Data Mining -> DM: Mining spatial and/or temporal data
Machine Learning -> ML: Unsupervised learning
2865
The Transformation Logics
Alessandro Ronca
6 min. talk | August 7th at 15:00 | Session: KRR: Knowledge Representation and Reasoning (1/2)
We introduce a new family of temporal logics designed to finely balance the trade-off between expressivity and complexity. Their key feature is the possibility of defining operators of a new kind that we call transformation operators. Some of them subsume existing temporal operators, while others are entirely novel. Of particular interest are transformation operators based on semigroups. They enable logics to harness the richness of semigroup theory, and we show them to yield logics capable of creating hierarchies of increasing expressivity and complexity which are non-trivial to characterise in existing logics. The result is a genuinely novel and yet unexplored landscape of temporal logics, each with the potential to match the trade-off between expressivity and complexity required by specific applications.
List of keywords
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
2875
First-Order Progression beyond Local-Effect and Normal Actions
Daxin Liu, Jens Claßen
6 min. talk | August 9th at 11:30 | Session: KRR: Reasoning about actions
One of the fundamental problems in reasoning about action is progression, which is to update a knowledge base according to the effects of an action into another knowledge base that retains all proper information. The problem is notoriously challenging, as in general it requires second-order logic. Efforts have been made to find fragments where progression is first-order definable. Liu and Lakemeyer showed that for actions that have only local effects, progression is always first-order definable. They also generalized the result to so-called normal actions, which allow for non-local effects as long as the affected fluent predicates only depend on local-effect ones, under certain restrictions on the knowledge base. In addition, they showed that for so-called proper+ knowledge bases, progression for normal actions can be efficient under reasonable assumptions. In this paper, we consider a larger class of theories, called the acyclic ones, that strictly subsumes normal actions. In such theories, dependencies between non-local-effect fluent predicates are allowed, as long as they do not contain any cycles. We prove progression to be equally first-order definable for this class. Furthermore, under similar but stronger assumptions than those made by Liu and Lakemeyer, we show that progression is efficient as well.
List of keywords
Knowledge Representation and Reasoning -> KRR: Reasoning about actions
2878
BATON: Aligning Text-to-Audio Model Using Human Preference Feedback
Huan Liao, Haonan Han, Kai Yang, Tianjiao Du, Rui Yang, Qinmei Xu, Zunnan Xu, Jingquan Liu, Jiasheng Lu, Xiu Li
6 min. talk | August 8th at 10:00 | Session: NLP: Speech
With the development of AI-Generated Content (AIGC), text-to-audio models are gaining widespread attention. However, it is challenging for these models to generate audio aligned with human preference due to the inherent information density of natural language and the limited understanding ability of models. To alleviate this issue, we propose BATON, the first framework specifically designed to enhance the alignment between generated audio and text prompts using human preference feedback. BATON comprises three key stages: Firstly, we curated a dataset containing both prompts and the corresponding generated audio, which was then annotated based on human feedback. Secondly, we introduced a reward model trained on the constructed dataset, which can mimic human preference by assigning rewards to input text-audio pairs. Finally, we employed the reward model to fine-tune an off-the-shelf text-to-audio model. The experimental results demonstrate that BATON can significantly improve the generation quality of the original text-to-audio models with respect to audio integrity, temporal relationships, and alignment with human preference. The project page is available at https://baton2024.github.io.
List of keywords
Machine Learning -> ML: Generative models
Multidisciplinary Topics and Applications -> MTA: Arts and creativity
Natural Language Processing -> NLP: Speech
2879
Truthful Interval Covering
Argyrios Deligkas, Aris Filos-Ratsikas, Alexandros A. Voudouris
6 min. talk | August 8th at 11:30 | Session: GTEP: Mechanism design
We initiate the study of a novel problem in mechanism design without money, which we term Truthful Interval Covering (TIC). An instance of TIC consists of a set of agents each associated with an individual interval on a line, and the objective is to decide where to place a covering interval to minimize the total social or egalitarian cost of the agents, which is determined by the intersection of this interval with their individual ones. This fundamental problem can model situations of provisioning a public good, such as the use of power generators to prevent or mitigate load shedding in developing countries. In the strategic version of the problem, the agents wish to minimize their individual costs, and might misreport the position and/or length of their intervals to achieve that. Our goal is to design truthful mechanisms to prevent such strategic misreports and achieve good approximations to the best possible social or egalitarian cost. We consider the fundamental setting of known intervals with equal lengths and provide tight bounds on the approximation ratios achieved by truthful deterministic mechanisms. For the social cost, we also design a randomized truthful mechanism that outperforms all possible deterministic ones. Finally, we highlight a plethora of natural extensions of our model for future work, as well as some natural limitations of those settings.
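One natural instantiation of the cost model (an assumption: the abstract only says cost is determined by the intersection) is the uncovered length of each agent's interval, which makes the objectives easy to state:

def agent_cost(agent, cover):
    # Intervals are (left, right) pairs; cost = part of the agent's
    # interval left uncovered by the covering interval (illustrative).
    (a, b), (c, d) = agent, cover
    overlap = max(0.0, min(b, d) - max(a, c))
    return (b - a) - overlap

def social_cost(agents, cover):
    return sum(agent_cost(ag, cover) for ag in agents)

agents = [(0.0, 1.0), (0.5, 1.5), (3.0, 4.0)]
for pos in (0.0, 0.5, 3.0):   # slide a unit-length covering interval
    print(pos, social_cost(agents, (pos, pos + 1.0)))

Under such a model, an agent could gain by misreporting its interval to pull the covering interval towards itself, which is exactly what truthful mechanisms must rule out.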
List of keywords
Game Theory and Economic Paradigms -> GTEP: Mechanism design
Game Theory and Economic Paradigms -> GTEP: Computational social choice
2885
Revisiting Causal Discovery from a Complexity-Theoretic Perspective
Robert Ganian, Viktoriia Korchemna, Stefan Szeider
6 min. talk | August 7th at 15:00 | Session: KRR: Knowledge Representation and Reasoning (1/2)
Causal discovery seeks to unveil causal relationships (represented as a so-called causal graph) from observational data. This paper investigates the complex relationship between the graph structure and the efficiency of constraint-based causal discovery algorithms. Our main contributions include (i) a near-tight characterization of which causal graphs admit a small d-separating set for each pair of vertices and thus can potentially be efficiently recovered by a constraint-based causal discovery algorithm, (ii) the explicit construction of a sequence of causal graphs on which the influential PC algorithm might need exponential time, although there is a small d-separating set between every pair of variables, and (iii) the formulation of a new causal discovery algorithm which achieves fixed-parameter running time by considering the maximum number of edge-disjoint paths between variables in the (undirected) super-structure as the parameter. A distinguishing feature of our investigation is that it is carried out within a more fine-grained model which more faithfully captures the infeasibility of performing accurate independence tests for large sets of conditioning variables.
List of keywords
Knowledge Representation and Reasoning -> KRR: Causality
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
2901
Vision-fused Attack: Advancing Aggressive and Stealthy Adversarial Text against Neural Machine Translation
Yanni Xue, Haojie Hao, Jiakai Wang, Qiang Sheng, Renshuai Tao, Yu Liang, Pu Feng, Xianglong Liu
6 min. talk | August 9th at 11:30 | Session: NLP: Natural Language Processing (3/3)
While neural machine translation (NMT) models achieve success in our daily lives, they show vulnerability to adversarial attacks. Despite being harmful, these attacks also offer benefits for interpreting and enhancing NMT models, thus drawing increased research attention. However, existing studies on adversarial attacks are insufficient in both attacking ability and human imperceptibility because they focus solely on the language modality. This paper proposes a novel vision-fused attack (VFA) framework to acquire powerful adversarial text, i.e., text that is more aggressive and stealthy. Regarding attacking ability, we design the vision-merged solution space enhancement strategy to enlarge the limited semantic solution space, which enables us to search for adversarial candidates with higher attacking ability. For human imperceptibility, we propose the perception-retained adversarial text selection strategy to align with the human text-reading mechanism. Thus, the finally selected adversarial text is more deceptive. Extensive experiments on various models, including large language models (LLMs) like LLaMA and GPT-3.5, strongly support that VFA outperforms comparison methods by large margins (up to 81%/14% improvements on ASR/SSIM).
List of keywords
Natural Language Processing -> NLP: Machine translation and multilinguality
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Machine Learning -> ML: Trustworthy machine learning
2906
Enhanced DouDiZhu Card Game Strategy Using Oracle Guiding and Adaptive Deep Monte Carlo Method
Qian Luo, Tien Ping Tan, Daochen Zha, Tianqiao Zhang
6 min. talk | August 9th at 10:00 | Session: MTA: Multidisciplinary Topics and Applications (2/2)
Deep Reinforcement Learning (DRL) has achieved significant advances in games with both perfect and imperfect information, such as Go, Chess, Texas Hold’em, and Dota2. However, DRL encounters considerable challenges when tackling the card game DouDiZhu because of its imperfect information, large state-action space, and sparse rewards. This paper presents OADMCDou, which combines Oracle Guiding and an Adaptive Deep Monte Carlo method to address these challenges. Oracle Guiding trains an Oracle agent with both imperfect and perfect information, gradually reducing the reliance on perfect information to transition to a standard agent. Adaptive Deep Monte Carlo uses gradient weight clipping and constrains the magnitude of updates to prevent extreme policy updates. We conduct extensive experiments to evaluate the effectiveness of the proposed methods, demonstrating OADMCDou’s superior performance over the state-of-the-art DouDiZhu AI, DouZero. This superiority is reflected in two metrics: a 95% confidence interval of 0.104 ± 0.041 for performance, and a 28.6% reduction in loss.
List of keywords
Multidisciplinary Topics and Applications -> MTA: Entertainment
Multidisciplinary Topics and Applications -> MTA: Computer games
Multidisciplinary Topics and Applications -> MTA: Game playing
Agent-based and Multi-agent Systems -> MAS: Other
2920
Zero-shot High-fidelity and Pose-controllable Character Animation
Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Guo-Jun Qi, Yu-Gang Jiang
6 min. talk | August 8th at 11:30 | Session: CV: Image and video synthesis and generation (1/2)
Image-to-video (I2V) generation aims to create a video sequence from a single image, which requires high temporal coherence and visual fidelity. However, existing approaches suffer from inconsistency of character appearances and poor preservation of fine details. Moreover, they require a large amount of video data for training, which can be computationally demanding. To address these limitations, we propose PoseAnimate, a novel zero-shot I2V framework for character animation. PoseAnimate contains three key components: 1) a Pose-Aware Control Module (PACM) that incorporates diverse pose signals into text embeddings, to preserve character-independent content and maintain precise alignment of actions. 2) a Dual Consistency Attention Module (DCAM) that enhances temporal consistency and retains character identity and intricate background details. 3) a Mask-Guided Decoupling Module (MGDM) that refines distinct feature perception abilities, improving animation fidelity by decoupling the character and background. We also propose a Pose Alignment Transition Algorithm (PATA) to ensure smooth action transitions. Extensive experimental results demonstrate that our approach outperforms the state-of-the-art training-based methods in terms of character consistency and detail fidelity. Moreover, it maintains a high level of temporal coherence throughout the generated animations.
List of keywords
Computer Vision -> CV: Image and video synthesis and generation 
Machine Learning -> ML: Multi-modal learning
2924
Evaluation of Project Performance in Participatory Budgeting
Niclas Boehmer, Piotr Faliszewski, Łukasz Janeczko, Dominik Peters, Grzegorz Pierczyński, Šimon Schierreich, Piotr Skowron, Stanisław Szufa
6 min. talk | August 6th at 11:30 | Session: GTEP: Computational social choice (1/2)
We study ways of evaluating the performance of losing projects in participatory budgeting (PB) elections by seeking actions that would make them win. We focus on lowering their costs, obtaining additional approvals, and removing approvals for competing projects: the larger the change needed, the less successful the given project. We seek efficient algorithms for computing our measures and analyze them experimentally, focusing on the GreedyAV, Phragmén, and Equal-Shares PB rules.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Computational social choice
2925
Geometry-Guided Conditional Adaptation for Surrogate Models of Large-Scale 3D PDEs on Arbitrary Geometries
Jingyang Deng, Xingjian Li, Haoyi Xiong, Xiaoguang Hu, Jinwen Ma
6 min. talk | August 7th at 11:30 | Session: MTA: Physical sciences
Deep learning surrogate models aim to accelerate the solving of partial differential equations (PDEs) and have achieved promising results. Although several mainstream models based on neural operator learning have been applied to PDEs on varying geometries, they were designed to map the complex geometry to a latent uniform grid, which remains challenging for networks with general architectures to learn. In this work, we rethink the critical factors of PDE solutions and propose a novel model-agnostic framework, called 3D Geometry-Guided Conditional Adaptation (3D-GeoCA), for solving PDEs on arbitrary 3D geometries. Starting with a 3D point cloud geometry encoder, 3D-GeoCA extracts essential and robust representations of geometric shapes of any kind, which conditionally guide the adaptation of hidden features in the surrogate model. We conduct experiments on two public computational fluid dynamics datasets, the Shape-Net Car and Ahmed-Body datasets, using several surrogate models as backbones with various point cloud geometry encoders to simulate the corresponding large-scale Reynolds-Averaged Navier-Stokes equations. Equipped with 3D-GeoCA, these backbone models reduce the L2 error by a large margin. Moreover, 3D-GeoCA is model-agnostic, so it can be applied to any surrogate model.
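The conditional guidance of hidden features suggests a modulation of the surrogate's hidden states by the geometry code. The sketch below uses FiLM-style scale-and-shift conditioning as a plausible stand-in; the weight matrices and the exact adaptation rule are assumptions.

import numpy as np

def geometry_conditioned_adapt(h, g, W_gamma, W_beta):
    # h: (n_points, d) hidden features of the surrogate model.
    # g: (k,) geometry embedding from a point cloud encoder.
    gamma = g @ W_gamma              # (d,) per-channel scale
    beta = g @ W_beta                # (d,) per-channel shift
    return h * (1.0 + gamma) + beta  # geometry-conditioned features

rng = np.random.default_rng(2)
h, g = rng.normal(size=(128, 64)), rng.normal(size=(32,))
out = geometry_conditioned_adapt(h, g,
                                 0.1 * rng.normal(size=(32, 64)),
                                 0.1 * rng.normal(size=(32, 64)))
print(out.shape)  # (128, 64)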
List of keywords
Multidisciplinary Topics and Applications -> MTA: Physical sciences
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Machine Learning -> ML: Supervised Learning
2929
MARCO: A Memory-Augmented Reinforcement Framework for Combinatorial Optimization
Andoni I. Garmendia, Quentin Cappart, Josu Ceberio, Alexander Mendiburu
12 min. talk | August 7th at 11:30 | Session: S: Combinatorial search and optimisation
Neural Combinatorial Optimization (NCO) is an emerging domain where deep learning techniques are employed to address combinatorial optimization problems as a standalone solver. Despite their potential, existing NCO methods often suffer from inefficient search space exploration, frequently leading to local optima entrapment or redundant exploration of previously visited states. This paper introduces a versatile framework, referred to as Memory-Augmented Reinforcement for Combinatorial Optimization (MARCO), that can be used to enhance both constructive and improvement methods in NCO through an innovative memory module. MARCO stores data collected throughout the optimization trajectory and retrieves contextually relevant information at each state. This way, the search is guided by two competing criteria: making the best decision in terms of solution quality and avoiding revisits of already explored solutions. This approach promotes a more efficient use of the available optimization budget. Moreover, thanks to the parallel nature of NCO models, several search threads can run simultaneously, all sharing the same memory module, enabling efficient collaborative exploration. Empirical evaluations, carried out on the maximum cut, maximum independent set, and travelling salesman problems, reveal that the memory module effectively increases exploration, enabling the model to discover diverse, higher-quality solutions. MARCO achieves good performance at a low computational cost, establishing a promising new direction in the field of NCO.
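The two competing criteria can be captured in a one-line scoring rule; the max-similarity penalty over stored binary solutions below is an illustrative stand-in for MARCO's richer retrieval.

import numpy as np

def memory_penalized_score(candidate, value, memory, penalty=1.0):
    # Prefer high solution value, but discount candidates that resemble
    # solutions already stored in the shared memory.
    if not memory:
        return value
    sims = [float(np.mean(candidate == m)) for m in memory]
    return value - penalty * max(sims)

memory = [np.array([1, 0, 1, 1]), np.array([0, 0, 1, 0])]
print(memory_penalized_score(np.array([1, 0, 1, 0]), 3.0, memory))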
List of keywords
Search -> S: Combinatorial search and optimisation
Search -> S: Search and machine learning
2937
Guidance Graph Optimization for Lifelong Multi-Agent Path Finding
Yulun Zhang, He Jiang, Varun Bhatt, Stefanos Nikolaidis, Jiaoyang Li
6 min. talk | August 7th at 11:30 | Session: MAS: Agent-based and Multi-agent Systems (1/2)
We study how to use guidance to improve the throughput of lifelong Multi-Agent Path Finding (MAPF). Previous studies have demonstrated that, while incorporating guidance, such as highways, can accelerate MAPF algorithms, this often results in a trade-off with solution quality. In addition, how to generate good guidance automatically remains largely unexplored, with current methods falling short of surpassing manually designed ones. In this work, we introduce the guidance graph as a versatile representation of guidance for lifelong MAPF, framing Guidance Graph Optimization (GGO) as the task of optimizing its edge weights. We present two GGO algorithms to automatically generate guidance for arbitrary lifelong MAPF algorithms and maps. The first method directly optimizes edge weights, while the second optimizes an update model capable of generating edge weights. Empirically, we show that (1) our guidance graphs improve the throughput of three representative lifelong MAPF algorithms on eight benchmark maps, and (2) our update model can generate guidance graphs for maps as large as 93 x 91 and for as many as 3,000 agents. We include the source code at: https://github.com/lunjohnzhang/ggo_public. All optimized guidance graphs are available online at: https://yulunzhang.net/publication/zhang2024ggo.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
Planning and Scheduling -> PS: Applications
Robotics -> ROB: Multi-robot systems
Search -> S: Evolutionary computation
2939
Capturing Knowledge Graphs and Rules with Octagon Embeddings
Victor Charpenay, Steven Schockaert
6 min. talk | August 8th at 10:00 | Session: KRR: Learning and reasoning
Region-based knowledge graph embeddings represent relations as geometric regions. This has the advantage that the rules captured by the model are made explicit, making it straightforward to incorporate prior knowledge and to inspect learned models. Unfortunately, existing approaches are severely restricted in their ability to model relational composition, and hence also in their ability to model rules, thus failing to deliver on the main promise of region-based models. With the aim of addressing these limitations, we investigate regions which are composed of axis-aligned octagons. Such octagons are particularly easy to work with, as intersections and compositions can be straightforwardly computed, while they are still sufficiently expressive to model arbitrary knowledge graphs. Among other results, we show that our octagon embeddings can properly capture a non-trivial class of rule bases. Finally, we show that our model achieves competitive experimental results.
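Why octagons are easy to work with can be seen from one plausible representation (our reading of "axis-aligned octagons", not code from the paper): bounds on x, y, x+y and x-y, under which intersection is just elementwise tightening.

import numpy as np

# An octagon as [x_lo, x_hi, y_lo, y_hi, s_lo, s_hi, d_lo, d_hi],
# where s = x + y and d = x - y.
def octagon_intersect(o1, o2):
    out = np.empty(8)
    out[0::2] = np.maximum(o1[0::2], o2[0::2])  # tighten lower bounds
    out[1::2] = np.minimum(o1[1::2], o2[1::2])  # tighten upper bounds
    return out

o1 = np.array([0, 4, 0, 4, 0, 6, -4, 4], dtype=float)
o2 = np.array([1, 5, -1, 3, 1, 7, -2, 5], dtype=float)
print(octagon_intersect(o1, o2))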
List of keywords
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Data Mining -> DM: Knowledge graphs and knowledge base completion
Machine Learning -> ML: Neuro-symbolic methods
2942
Enabling Mixed Effects Neural Networks for Diverse, Clustered Data Using Monte Carlo Methods
Andrej Tschalzev, Paul Nitschke, Lukas Kirchdorfer, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt
12 min. talk | August 7th at 11:30 | Session: ML: Deep learning architectures
Neural networks often assume independence among input data samples, disregarding correlations arising from inherent clustering patterns in real-world datasets (e.g., due to different sites or repeated measurements). Recently, mixed effects neural networks (MENNs) which separate cluster-specific ‘random effects’ from cluster-invariant ‘fixed effects’ have been proposed to improve generalization and interpretability for clustered data. However, existing methods only allow for approximate quantification of cluster effects and are limited to regression and binary targets with only one clustering feature. We present MC-GMENN, a novel approach employing Monte Carlo techniques to train Generalized Mixed Effects Neural Networks. We empirically demonstrate that MC-GMENN outperforms existing mixed effects deep learning models in terms of generalization performance, time complexity, and quantification of inter-cluster variance. Additionally, MC-GMENN is applicable to a wide range of datasets, including multi-class classification tasks with multiple high-cardinality categorical features. For these datasets, we show that MC-GMENN outperforms conventional encoding and embedding methods, simultaneously offering a principled methodology for interpreting the effects of clustering patterns.
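The Monte Carlo ingredient is easiest to see in a one-parameter special case: marginalizing a random intercept out of a binary likelihood by sampling. The full method trains networks with multiple clustering features; this sketch shows only the integral being approximated.

import numpy as np

def mc_marginal_nll(logit_fixed, y, sigma, n_samples=1000, seed=0):
    # Estimate -log p(y | x) with p(y|x) = E_b[ Bernoulli(sigmoid(f(x)+b)) ],
    # b ~ N(0, sigma^2) a cluster-level random effect (illustrative sketch).
    b = np.random.default_rng(seed).normal(0.0, sigma, size=n_samples)
    p = 1.0 / (1.0 + np.exp(-(logit_fixed + b)))
    lik = p if y == 1 else 1.0 - p
    return float(-np.log(lik.mean() + 1e-12))

print(mc_marginal_nll(logit_fixed=0.3, y=1, sigma=1.5))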
List of keywords
Machine Learning -> ML: Deep learning architectures
Machine Learning -> ML: Classification
Machine Learning -> ML: Explainable/Interpretable machine learning
Machine Learning -> ML: Probabilistic machine learning
2947
Ordinal Maximin Guarantees for Group Fair Division
Pasin Manurangsi, Warut Suksompong
12 min. talk | August 8th at 10:00 | Session: GTEP: Fair division
We investigate fairness in the allocation of indivisible items among groups of agents using the notion of maximin share (MMS). While previous work has shown that no nontrivial multiplicative MMS approximation can be guaranteed in this setting for general group sizes, we demonstrate that ordinal relaxations are much more useful. For example, we show that if n agents are distributed equally across g groups, there exists a 1-out-of-k MMS allocation for k = O(g log(n/g)), while if all but a constant number of agents are in the same group, we obtain k = O(log n / log log n). We also establish the tightness of these bounds and provide non-asymptotic results for the case of two groups.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Computational social choice
2950
On Using Admissible Bounds for Learning Forward Search Heuristics
Carlos Núñez-Molina, Masataro Asai, Pablo Mesejo, Juan Fernandez-Olivares
6 min. talk | August 7th at 15:00 | Session: PS: Planning and Scheduling (1/2)
In recent years, there has been growing interest in utilizing modern machine learning techniques to learn heuristic functions for forward search algorithms. Despite this, there has been little theoretical understanding of what they should learn, how to train them, and why we do so. This lack of understanding has resulted in the adoption of diverse training targets (suboptimal vs. optimal costs vs. admissible heuristics) and loss functions (e.g., square vs. absolute errors) in the literature. In this work, we focus on how to effectively utilize the information provided by admissible heuristics in heuristic learning. We argue that learning from poly-time admissible heuristics by minimizing mean square errors (MSE) is not the correct approach, since the result is merely a noisy, inadmissible copy of an efficiently computable heuristic. Instead, we propose to model the learned heuristic as a truncated Gaussian, where admissible heuristics are used not as training targets but as lower bounds of this distribution. This results in a different loss function from the MSE commonly employed in the literature, which implicitly models the learned heuristic as a Gaussian distribution. We conduct experiments where both MSE and our novel loss function are applied to learning a heuristic from optimal plan costs. Results show that our proposed method converges faster during training and yields better heuristics.
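The resulting loss is the negative log-likelihood of a Gaussian truncated below at the admissible bound; up to additive constants it is a short self-contained computation (the parameterization of sigma is an assumption here):

import math

def truncated_gaussian_nll(mu, sigma, target, lower):
    # mu, sigma: predicted heuristic distribution; target: optimal cost;
    # lower: admissible heuristic used as the truncation point.
    z = (target - mu) / sigma
    alpha = (lower - mu) / sigma
    surv = max(1.0 - 0.5 * (1.0 + math.erf(alpha / math.sqrt(2))), 1e-12)
    return 0.5 * z * z + math.log(sigma) + math.log(surv)

# Unlike MSE, targets below the admissible bound carry zero likelihood,
# and predictions near the bound are reweighted by the log(surv) term.
print(truncated_gaussian_nll(mu=10.0, sigma=2.0, target=12.0, lower=9.0))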
List of keywords
Planning and Scheduling -> PS: Learning in planning and scheduling
Machine Learning -> ML: Knowledge-aided learning
Search -> S: Heuristic search
Machine Learning -> ML: Neuro-symbolic methods
2971
Towards Automatic Composition of ASP Programs from Natural Language Specifications
Manuel Borroto Santana, Irfan Kareem, Francesco Ricca
6 min. talk | August 8th at 11:30 | Session: NLP: Applications
This paper takes a first step towards automating the composition of Answer Set Programming (ASP) specifications. In particular, the following contributions are provided: (i) a dataset focused on graph-related problem specifications, designed to develop and assess tools for automatic ASP coding; (ii) a two-step architecture, implemented in the NL2ASP tool, for generating ASP programs from natural language specifications. NL2ASP uses neural machine translation to transform natural language into Controlled Natural Language (CNL) statements. Subsequently, the CNL statements are converted into ASP code using the CNL2ASP tool. An experimental analysis confirms the viability of the approach.
List of keywords
Natural Language Processing -> NLP: Applications
Knowledge Representation and Reasoning -> KRR: Logic programming
Machine Learning -> ML: Generative models
2975
ProtoPFormer: Concentrating on Prototypical Parts in Vision Transformers for Interpretable Image Recognition
Mengqi Xue, Qihan Huang, Haofei Zhang, Jingwen Hu, Jie Song, Mingli Song, Canghong Jin
6 min. talk | August 8th at 15:00 | Session: CV: Computer Vision (2/2)
Prototypical part network (ProtoPNet) and its variants have drawn wide attention and been applied to various tasks due to their inherent self-explanatory property. Previous ProtoPNets are primarily built upon convolutional neural networks (CNNs), so it is natural to investigate whether these explainable methods can be advantageous for the recently emerged Vision Transformers (ViTs). However, directly utilizing ViT backbones can lead to prototypes paying excessive attention to background positions rather than foreground objects (i.e., the “distraction” problem). To address this problem, this paper proposes the prototypical part Transformer (ProtoPFormer) for interpretable image recognition. Based on the architectural characteristics of ViTs, we modify the original ProtoPNet by creating separate global and local branches, each accompanied by corresponding prototypes that capture and highlight representative holistic and partial features. Specifically, the global prototypes guide the local prototypes to concentrate on the foreground and effectively suppress the background influence. Subsequently, the local prototypes are explicitly supervised to concentrate on different discriminative visual parts. Finally, the two branches mutually correct each other and jointly make the final decisions. Extensive experiments demonstrate that ProtoPFormer consistently achieves superior performance in accuracy, visualization results, and quantitative interpretability evaluation over state-of-the-art (SOTA) baselines. Our code has been released at https://github.com/zju-vipa/ProtoPFormer.
List of keywords
Computer Vision -> CV: Interpretability and transparency
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning
2979
Epistemic Logic Programs: Non-Ground and Counting Complexity
Thomas Eiter, Johannes K. Fichte, Markus Hecher, Stefan Woltran
6 min. talk | August 7th at 15:00 | Session: KRR: Knowledge Representation and Reasoning (1/2)
Answer Set Programming (ASP) is a prominent problem-modeling and solving framework, whose solutions are called answer sets. Epistemic logic programs (ELP) extend ASP to reason about all or some answer sets. Solutions to an ELP can be seen as consequences over multiple collections of answer sets, known as world views. While the complexity of propositional programs is well studied, the non-ground case remains open. This paper establishes the complexity of non-ground ELPs. We provide a comprehensive picture for well-known program fragments, which turn out to be complete for the class NEXPTIME with access to oracles up to Σᵖ₂. In the quantitative setting, we establish complexity results for counting complexity beyond #EXP. To mitigate the high complexity, we establish results for the case of bounded predicate arity, reaching up to the fourth level of the polynomial hierarchy. Finally, we provide ETH-tight runtime results for the parameter treewidth, which has applications in quantitative reasoning, where we reason about (marginal) probabilities of epistemic literals.
List of keywords
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
Knowledge Representation and Reasoning -> KRR: Logic programming
Knowledge Representation and Reasoning -> KRR: Non-monotonic reasoning
2980
HVOFusion: Incremental Mesh Reconstruction Using Hybrid Voxel Octree
Shaofan Liu, Junbo Chen, Jianke Zhu
6 min. talk | August 6th at 11:30 | Session: ROB: Robotics (1/2)
Incremental scene reconstruction is essential to navigation in robotics. Conventional methods typically make use of either TSDF (truncated signed distance function) volumes or neural networks to implicitly represent the surface. Due to the voxel representation or the time-consuming sampling involved, they have difficulty balancing speed, memory storage, and surface quality. In this paper, we propose a novel hybrid voxel-octree approach that effectively fuses octree with voxel structures, so that we can take advantage of both implicit surface and explicit triangular mesh representations. This sparse structure preserves triangular faces in the leaf nodes and produces partial meshes sequentially for incremental reconstruction. The storage scheme allows us to naturally optimize the mesh in explicit 3D space and thereby achieve higher surface quality. We iteratively deform the mesh towards the target and recover vertex colors by optimizing a shading model. Experimental results on several datasets show that our proposed approach is capable of quickly and accurately reconstructing a scene with realistic colors. Code is available at https://github.com/Frankuzi/HVOFusion.
List of keywords
Robotics -> ROB: Localization, mapping, state estimation
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Applications
Robotics -> ROB: Robotics and vision
2991
Advancing Generalized Transfer Attack with Initialization Derived Bilevel Optimization and Dynamic Sequence Truncation
Yaohua Liu, Jiaxin Gao, Xuan Liu, Xianghao Jiao, Xin Fan, Risheng Liu
6 min. talk | August 7th at 11:30 | Session: CV: Adversarial learning, adversarial attack and defense methods
Transfer attacks generate significant interest for real-world black-box applications by crafting transferable adversarial examples through surrogate models. However, existing works essentially optimize a single-level objective w.r.t. the surrogate model directly, which often leads to poor interpretability of the attack mechanism and limited generalization performance over unknown victim models. In this work, we propose the BilEvel Transfer AttacK (BETAK) framework by establishing an initialization-derived bilevel optimization paradigm, which explicitly reformulates the nested constraint relationship between the Upper-Level (UL) pseudo-victim attacker and the Lower-Level (LL) surrogate attacker. Algorithmically, we introduce Hyper Gradient Response (HGR) estimation as effective feedback on transferability over pseudo-victim attackers, and propose the Dynamic Sequence Truncation (DST) technique to dynamically adjust the back-propagation path for HGR and simultaneously reduce computational overhead. Meanwhile, we conduct detailed algorithmic analysis and provide a convergence guarantee that supports the non-convexity of the LL surrogate attacker. Extensive evaluations demonstrate substantial improvements of BETAK (e.g., a 53.41% increase in attack success rate against the IncRes-v2_ens victim) against different victims and defense methods in targeted and untargeted attack scenarios.
List of keywords
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Computer Vision -> CV: Machine learning for vision
2993
Learning from Long-Tailed Noisy Data with Sample Selection and Balanced Loss
Lefan Zhang, Zhang-Hao Tian, Wujun Zhou, Wei Wang
6 min. talk | August 9th at 11:30 | Session: ML: Classification
The success of deep learning depends on large-scale and well-curated training data, while data in real-world applications are commonly long-tailed and noisy. Existing methods usually depend on label frequency to tackle class imbalance; however, model bias on different classes is not directly related to label frequency, and the true label frequency is inaccessible under label noise. To solve this, we propose a robust method for learning from long-tailed noisy data with sample selection and balanced loss. Specifically, we separate the noisy training data into a clean labeled set and an unlabeled set via sample selection, and train the deep neural network in a semi-supervised manner with a balanced loss based on model bias. Extensive experiments on benchmarks demonstrate that our method outperforms existing state-of-the-art methods.
List of keywords
Machine Learning -> ML: Classification
2994
Preferred Reasoning in ABA by Cycle-Breaking
Kiet Nguyen Anh, Markus Ulbricht
12 min. talk | August 6th at 15:00 | Session: KRR: Argumentation
We develop a fixed-parameter tractable (FPT) algorithm for skeptical preferred reasoning in assumption-based argumentation (ABA). To this end, we make use of so-called backdoors, i.e., sets of assumptions that need to be evaluated such that the remaining ABA framework (ABAF) belongs to a computationally beneficial subclass. In order to identify such target classes, we employ a suitable notion of a dependency graph of an ABAF. We show that these graphs can be constructed in polynomial time and that one can efficiently check sufficient properties ensuring that reasoning in the underlying ABAF is tractable. After establishing the theoretical foundations, we test our implementation against the ASPforABA solver, which convincingly won the ABA track of the ICCMA’23 competition. As it turns out, our algorithm outperforms ASPforABA on instances with small backdoor sizes.
List of keywords
Knowledge Representation and Reasoning -> KRR: Argumentation
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
3000
PRASS: Probabilistic Risk-averse Robust Learning with Stochastic Search
Tianle Zhang, Yanghao Zhang, Ronghui Mu, Jiaxu Liu, Jonathan Fieldsend, Wenjie Ruan
6 min. talk | August 9th at 10:00 | Session: ETF: Trustworthy AI
Deep learning models, despite their remarkable success in various tasks, have been shown to be vulnerable to adversarial perturbations. Although robust learning techniques that consider adversarial risks against worst-case perturbations can effectively increase a model’s robustness, they may not always be the most suitable approach, because in certain scenarios perturbations are more likely to occur probabilistically rather than being intentionally crafted by attackers. To address this challenge, we propose a novel risk-averse robust learning method based on entropic value-at-risk, called PRASS (Probabilistic Risk-Averse Robust Learning with Stochastic Search). Our approach leverages principles of stochastic optimisation and considers perturbing distributions rather than solely worst-case adversaries. By applying adaptive stochastic search to parameterised distributions, we further enhance the scalability of PRASS to handle distributional robustness. Empirical experiments demonstrate that PRASS outperforms existing state-of-the-art baselines.
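Entropic value-at-risk has a closed sample-based form, which makes the risk objective easy to state; the grid search over t below stands in for the paper's stochastic optimisation.

import numpy as np

def evar(losses, alpha, t_grid=np.logspace(-3, 1, 200)):
    # EVaR_alpha(X) = inf_{t>0} (1/t) * log( E[exp(t X)] / alpha ),
    # estimated from samples of the loss under the perturbation
    # distribution; log E[exp(tX)] is computed stably via log-sum-exp.
    vals = []
    for t in t_grid:
        m = (t * losses).max()
        log_mgf = m + np.log(np.mean(np.exp(t * losses - m)))
        vals.append((log_mgf - np.log(alpha)) / t)
    return float(min(vals))

losses = np.random.default_rng(3).normal(1.0, 0.5, size=10000)
print(evar(losses, alpha=0.1))  # sits between the mean and the worst case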
List of keywords
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Machine Learning -> ML: Adversarial machine learning
Machine Learning -> ML: Robustness
3009
CPa-WAC: Constellation Partitioning-based Scalable Weighted Aggregation Composition for Knowledge Graph Embedding
Sudipta Modak, Aakarsh Malhotra, Sarthak Malik, Anil Surisetty, Esam Abdel-Raheem
6 min. talk | August 8th at 15:00 | Session: KRR: Knowledge Representation and Reasoning (2/2)
Scalability and training time are crucial for any graph neural network model processing a knowledge graph (KG). While partitioning knowledge graphs helps reduce the training time, the prediction accuracy drops significantly compared to training the model on the whole graph. In this paper, we propose CPa-WAC: a lightweight architecture that incorporates graph convolutional networks and modularity-maximization-based constellation partitioning to harness the power of local graph topology. The proposed CPa-WAC method reduces the training time and memory cost of knowledge graph embedding, making the learning model scalable. The results of our experiments on standard databases, such as WordNet and Freebase, show that by achieving meaningful partitioning, any knowledge graph can be broken down into subgraphs and processed separately to learn embeddings. Furthermore, these learned embeddings can be used for knowledge graph completion, retaining performance similar to training a GCN on the whole KG while speeding up the training process by up to five times. Additionally, the proposed CPa-WAC method outperforms several other state-of-the-art KG embedding methods in terms of prediction accuracy.
List of keywords
Knowledge Representation and Reasoning -> KRR: Applications
Uncertainty in AI -> UAI: Graphical models
Machine Learning -> ML: Knowledge-aided learning
Data Mining -> DM: Knowledge graphs and knowledge base completion
3020
Mechanisms That Play a Game, Not Toss a Coin
Toby Walsh
6 min. talk | August 8th at 11:30 | Session: GTEP: Mechanism design
Randomized mechanisms can have good normative properties compared to their deterministic counterparts. However, randomized mechanisms are problematic in several ways, such as in their verifiability. We propose here to de-randomize such mechanisms by having agents play a game instead of tossing a coin. The game is designed so that agents play randomly, and this play injects “randomness” into the mechanism. Surprisingly, this de-randomization retains many of the good normative properties of the original randomized mechanism but gives a mechanism that is deterministic and easy, for instance, to audit. We consider three general-purpose methods to de-randomize mechanisms and apply them to six different domains: voting, facility location, task allocation, school choice, peer selection, and resource allocation. We propose a number of novel de-randomized mechanisms for these six domains with good normative properties (such as equilibria in which agents sincerely report preferences over the original problem). In one domain, we additionally show that a new and desirable normative property emerges as a result of de-randomization.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Mechanism design
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Game Theory and Economic Paradigms -> GTEP: Fair division
3024
Scalable Landmark Hub Labeling for Optimal and Bounded Suboptimal Pathfinding
Sabine Storandt
6 min. talk | August 7th at 15:00 | Session: PS: Planning and Scheduling (1/2)
Hub Labeling and A* are two well-established algorithms for shortest path computation in large graphs. Hub Labeling offers excellent query times for distance computation, but at the cost of a high space consumption for label storage. Landmark-based A* search requires less space but answers queries much slower. Recently, Landmark Hub Labeling (LHL) has been proposed, which combines both concepts and achieves a smaller space consumption than Hub Labeling and also much better query times than A*. However, the known algorithms for computing a LHL do not scale to large graphs, limiting its applicability. In this paper, we devise novel algorithms for LHL construction that work on graphs with millions of edges. We also further improve the LHL query answering algorithm and investigate how to reduce the space consumption of labeling techniques by performing bounded suboptimal pathfinding. In an extensive experimental study, we demonstrate the effectiveness of our methods and illuminate that sensible trade-offs between space consumption, query time, and path quality can be achieved with LHL.
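For readers unfamiliar with the baseline, the pure Hub Labeling distance query that LHL builds on is a few lines: each vertex stores a hub-to-distance label, and a query scans for the best common hub (LHL additionally mixes in landmark-based A*, not shown here).

def hub_label_query(label_u, label_v):
    # label_u, label_v: dicts mapping hub -> distance from u (resp. v).
    best = float("inf")
    for hub, du in label_u.items():
        dv = label_v.get(hub)
        if dv is not None and du + dv < best:
            best = du + dv
    return best

label_u = {"h1": 2.0, "h2": 5.0}
label_v = {"h2": 1.0, "h3": 4.0}
print(hub_label_query(label_u, label_v))  # 6.0, via hub h2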
List of keywords
Planning and Scheduling -> PS: Routing
Multidisciplinary Topics and Applications -> MTA: Transportation
Search -> S: Applications
Search -> S: Combinatorial search and optimisation
3030
Toward Completing the Picture of Control in Schulze and Ranked Pairs Elections
Cynthia Maushagen, David Niclaus, Paul Nüsken, Jörg Rothe, Tessa Seeger
6 min. talk | August 7th at 15:00 | Session: GTEP: Computational social choice (2/2)
Both Schulze and ranked pairs are voting rules that satisfy many natural, desirable axioms. Many standard types of electoral control (with a chair seeking to change the outcome of an election by interfering with the election structure) have already been studied. However, for control by replacing candidates or voters and for (exact) multimode control that combines multiple standard attacks, many questions remain open. We solve a number of these open cases for Schulze and ranked pairs. In addition, we fix a flaw in the reduction of Menton and Singh showing that Schulze is resistant to constructive control by deleting candidates and re-establish a vulnerability result for destructive control by deleting candidates. In some of our proofs, we study variants of s-t vertex cuts in graphs that are related to our control problems.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Computational social choice
3049
Attention Based Document-level Relation Extraction with None Class Ranking Loss
Xiaolong Xu, Chenbin Li, Haolong Xiang, Lianyong Qi, Xuyun Zhang, Wanchun Dou
6 min. talk | August 7th at 11:30 | Session: NLP: Information extraction
Document-level relation extraction (RE) makes it feasible to analyze global relations between entities in a text and to obtain more comprehensive and accurate semantic information. In document-level RE, the model needs to infer implicit relations between two entities located in different sentences. To obtain more semantic information, existing methods mainly focus on exploring entity representations. However, they ignore the correlations and indivisibility between relations, entities, and contexts. Furthermore, current methods only independently estimate the cases of predefined relations, ignoring the case of “no relation”, which results in poor predictions. To address the above issues, we propose a document-level RE method based on attention mechanisms that considers the “no relation” case. Specifically, our approach leverages graph attention and multi-head attention networks to capture the correlations and indivisibility among relations, entities, and contexts. In addition, a novel multi-label loss function that promotes large margins in label confidence scores between each predefined class and the none class is employed to improve prediction performance. Extensive experiments conducted on benchmark datasets demonstrate that our proposed method outperforms state-of-the-art baselines with higher accuracy.
List of keywords
Natural Language Processing -> NLP: Information extraction
Natural Language Processing -> NLP: Embeddings
3057
Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces
Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou
12 min. talk | August 8th at 10:00 | Session: MTA: Security and privacy
Deepfake videos are becoming increasingly realistic, showing few tampering traces on facial areas that vary between frames. Consequently, existing Deepfake detection methods struggle to detect unknown-domain Deepfake videos while accurately locating the tampered region. To address this limitation, we propose Delocate, a novel Deepfake detection model that can both recognize and localize unknown-domain Deepfake videos. Our method consists of two stages: recovery and localization. In the recovery stage, the model randomly masks regions of interest (ROIs) and reconstructs real faces without tampering traces, leading to a relatively good recovery effect for real faces and a poor recovery effect for fake faces. In the localization stage, the output of the recovery phase and the forgery ground truth mask serve as supervision to guide the forgery localization process. This process strategically emphasizes the recovery phase of fake faces with poor recovery, facilitating the localization of tampered regions. Our extensive experiments on four widely used benchmark datasets demonstrate that Delocate not only excels in localizing tampered areas but also enhances cross-domain detection performance.
List of keywords
Multidisciplinary Topics and Applications -> MTA: Security and privacy
Computer Vision -> CV: Biometrics, face, gesture and pose recognition
3058
Deep Multi-Dimensional Classification with Pairwise Dimension-Specific Features
Teng Huang, Bin-Bin Jia, Min-Ling Zhang
6 min. talk | August 9th at 11:30 | Session: ML: Classification
In multi-dimensional classification (MDC), each instance is associated with multiple class variables characterizing the semantics of objects from different dimensions. To consider the dependencies among class variables and the specific characteristics contained in different semantic dimensions, a novel deep MDC approach named PIST is proposed to jointly deal with the two issues via learning pairwise dimension-specific features. Specifically, PIST conducts pairwise grouping to model the dependencies between each pair of class variables, which are more reliable with limited training samples. For extracting pairwise dimension-specific features, PIST weights the feature embedding with a feature importance vector, which is learned via utilizing a global loss measurement based on intra-class and inter-class covariance. Final prediction w.r.t. each dimension is determined by combining the joint probabilities related to this dimension. Comparative studies with eleven real-world MDC data sets clearly validate the effectiveness of the proposed approach.
List of keywords
Machine Learning -> ML: Classification
Machine Learning -> ML: Multi-label learning
3083
Fast One-Stage Unsupervised Domain Adaptive Person Search
Tianxiang Cui, Huibing Wang, Jinjia Peng, Ruoxi Deng, Xianping Fu, Yang Wang
6 min. talk | August 7th at 15:00 | Session: CV: Recognition (object detection, categorization)
Unsupervised person search aims to localize a particular target person in a gallery set of scene images without annotations, which is extremely challenging due to the unexpected variations of unlabeled domains. However, most existing methods are dedicated to developing multi-stage models that adapt to domain variations while using clustering for iterative model training, which inevitably increases model complexity. To address this issue, we propose Fast One-stage Unsupervised person Search (FOUS), which integrates domain adaptation with label adaptation in a complementary, end-to-end manner without iterative clustering. To minimize the domain discrepancy, FOUS introduces an Attention-based Domain Alignment Module (ADAM) which can not only align various domains for both the detection and ReID tasks but also construct an attention mechanism to reduce the adverse impact of low-quality candidates resulting from unsupervised detection. Moreover, to avoid the redundant iterative clustering mode, FOUS adopts a prototype-guided labeling method which minimizes redundant correlation computations for partial samples and assigns noisy coarse label groups efficiently. The coarse label groups are continuously refined via a label-flexible training network with an adaptive selection strategy. With the adapted domains and labels, FOUS achieves state-of-the-art (SOTA) performance on two benchmark datasets, CUHK-SYSU and PRW. The code is available at https://github.com/whbdmu/FOUS.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Image and video retrieval 
3090
Discriminative Feature Decoupling Enhancement for Speech Forgery Detection
Yijun Bei, Xing Zhou, Erteng Liu, Yang Gao, Sen Lin, Kewei Gao, Zunlei Feng
6 min. talk | August 6th at 15:00 | Session: ETF: AI Ethics, Trust, Fairness (1/2)
The emergence of AIGC has brought attention to the issue of generating realistic deceptive content. While AIGC has the potential to revolutionize content creation, it also facilitates criminal activities. Specifically, the manipulation of speech has been exploited in tele-fraud and financial fraud schemes, posing a significant threat to societal security. Current deep learning-based methods for detecting forged speech extract mixed features from the original speech, which often contain redundant information. Moreover, these methods fail to consider the distinct characteristics of human voice-specific features and the diversity of background environmental sounds. This paper introduces a framework called Discriminative fEature dEcoupling enhanceMent (DEEM) for detecting speech forgery. Initially, the framework decouples the original speech into human voice features and background sound features. Subsequently, DEEM enhances voice-specific features through temporal-dimension aggregation and improves continuity-related features in the background sound map via spectral-dimension aggregation. By employing the decoupled and enhanced features, extensive experiments demonstrate that DEEM achieves an accuracy improvement of over 5% on the FoR dataset compared to state-of-the-art methods.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
AI Ethics, Trust, Fairness -> ETF: AI and law, governance, regulation
AI Ethics, Trust, Fairness -> ETF: Fairness and diversity
AI Ethics, Trust, Fairness -> ETF: Societal impact of AI
3100
Rethinking Correlation Learning via Label Prior for Open Set Domain Adaptation
Zi-Xian Huang, Chuan-Xian Ren
12 min. talk | August 9th at 11:30 | Session: CV: Machine learning for vision
Open Set Domain Adaptation (OSDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain, where known classes exist across domains while unknown classes are present only in the target domain. Existing methods rely on the clustering structure to identify the unknown classes, which empirically induces a large identification error if the unknown classes are a mixture of multiple components. To break through this barrier, we formulate OSDA from the view of correlation and propose a correlation metric-based framework called Balanced Correlation Learning (BCL). BCL employs the Hilbert-Schmidt Independence Criterion (HSIC) to characterize the separation between unknown and known classes, where HSIC is reformulated as the nodes’ relation on a graph. By treating the label prior as a variable, we derive theoretical results that analytically show a sufficient condition for the desired learning direction in OSDA. Methodologically, a class-balanced HSIC is proposed to preserve domain-invariant and class-discriminative features. With the guarantee of correlation learning, an entropy-based principle can effectively identify the unknown classes via uncertainty. Extensive empirical evaluations show that BCL achieves significant performance improvements.
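For readers unfamiliar with HSIC, its empirical estimator is compact. A minimal sketch, assuming RBF kernels (the kernel choice here is an assumption; BCL reformulates HSIC as a relation between graph nodes rather than computing it this way):

import numpy as np

def rbf_kernel(X, sigma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    # Empirical HSIC: trace(K H L H) / (n - 1)^2, with centering matrix H
    n = X.shape[0]
    K, L = rbf_kernel(X, sigma), rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

Larger values indicate stronger statistical dependence between X and Y, which is the quantity the framework uses to characterize the separation between known and unknown classes.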
List of keywords
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
3106
Exploring the Role of Node Diversity in Directed Graph Representation Learning
Jincheng Huang, Yujie Mo, Ping Hu, Xiaoshuang Shi, Shangbo Yuan, Zeyu Zhang, Xiaofeng Zhu
6 min. talk | August 6th at 15:00 | Session: DM: Mining graphs (2/3)
Many Directed Graph Neural Networks (DGNNs) are designed to treat all nodes in the same neighbor set (i.e., the out-neighbor set and the in-neighbor set) equally for every node, without considering node diversity in directed graphs, so they are often unable to adaptively acquire suitable information from neighbors in different directions. To alleviate this issue, in this paper we investigate a new way to consider node diversity for representation learning on directed graphs, i.e., neighbor diversity and degree diversity, and propose a new framework, NDDGNN, that adaptively assigns weights to outgoing and incoming information at the node level, as sketched below. Extensive experiments on seven real-world datasets validate the superior performance of our method compared to state-of-the-art methods on both node classification and link prediction tasks.
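A minimal sketch of node-level direction weighting (a hypothetical module; it assumes the in-neighbor and out-neighbor aggregations h_in and h_out are already computed by some message-passing layer):

import torch
import torch.nn as nn

class DirectionGate(nn.Module):
    # Per-node soft weights over incoming vs. outgoing information
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2)

    def forward(self, h_in, h_out):          # both: (num_nodes, dim)
        w = torch.softmax(self.gate(torch.cat([h_in, h_out], dim=-1)), dim=-1)
        return w[..., :1] * h_in + w[..., 1:] * h_out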
List of keywords
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Semi-supervised learning
3110
With a Little Help from Language: Semantic Enhanced Visual Prototype Framework for Few-Shot Learning
Hecheng Cai, Yang Liu, Shudong Huang, Jiancheng Lv
6 min. talk | August 6th at 11:30 | Session: ML: Multi-modal learning
Few-shot learning (FSL) aims to recognize new categories given limited training samples. The core challenge is to avoid overfitting to the minimal data while ensuring good generalization to novel classes. One mainstream approach employs prototypes from visual feature extractors as classifier weights, so performance depends on the quality of the prototype. Since different categories may have similar visual features, the visual prototype has limitations: existing methods only learn a simple visual feature extractor during the pre-training stage and neglect the importance of a well-developed feature space for the prototype. We introduce the Semantic Enhanced Visual Prototype framework (SEVpro) to address this issue. SEVpro refines prototype learning from the pre-training stage onward and serves as a versatile plug-and-play framework for all prototype-based FSL methods. Specifically, we enhance prototype discriminability by transforming semantic embeddings into the visual space, aiding in separating categories with similar visual features. For novel class learning, we leverage knowledge from base classes and incorporate semantic information to elevate prototype quality further. Extensive experiments on FSL benchmarks and ablation studies demonstrate the superiority of our proposed SEVpro for FSL.
List of keywords
Machine Learning -> ML: Few-shot learning
Machine Learning -> ML: Multi-modal learning
3113
SAEIR: Sequentially Accumulated Entropy Intrinsic Reward for Cooperative Multi-Agent Reinforcement Learning with Sparse Reward
Xin He, Hongwei Ge, Yaqing Hou, Jincheng Yu
6 min. talk | August 6th at 11:30 | Session: ML: Multiagent Reinforcement Learning
Multi-agent reinforcement learning (MARL) performs well for solving complex cooperative tasks when the scenarios have well-defined dense rewards. However, many real-world multi-agent systems have sparse reward settings, which makes it difficult for MARL algorithms to successfully learn an effective strategy. To tackle this problem, we propose a novel sequentially accumulated entropy intrinsic reward named SAEIR, which utilizes the entropy of the multi-agent system as a bonus to accelerate learning. Specifically, a multi-scale hypergraph critic is proposed to obtain a high-order system state representation, which also enhances the ability to effectively evaluate the actions produced by the actor. Based on this comprehensive and compact system state representation, the orderliness of the multi-agent system can be measured to determine the highly valuable states at which entropy-based intrinsic rewards are added, leading to a highly efficient learning process. Empirical results demonstrate that our proposed method achieves state-of-the-art performance in several complex cooperative multi-agent environments with sparse reward settings.
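The entropy-as-bonus idea can be sketched in a few lines (a toy version; SAEIR’s orderliness measure is built on the hypergraph-based state representation, which is omitted here):

import numpy as np

def entropy_bonus(state_counts, beta=0.1):
    # Shannon entropy of a discretized system-state histogram
    p = state_counts / state_counts.sum()
    p = p[p > 0]
    return beta * float(-(p * np.log(p)).sum())

# shaped reward at a selected, highly valuable state:
# r_total = r_env + entropy_bonus(counts)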
List of keywords
Machine Learning -> ML: Multiagent Reinforcement Learning
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
3119
Towards Dynamic-Prompting Collaboration for Source-Free Domain Adaptation
Mengmeng Zhan, Zongqian Wu, Rongyao Hu, Ping Hu, Heng Tao Shen, Xiaofeng Zhu
12 min. talk | August 6th at 15:00 | Session: CV: Transfer, low-shot, semi- and un- supervised learning   
In domain adaptation, challenges such as data privacy constraints can impede access to source data, catalyzing the development of source-free domain adaptation (SFDA) methods. However, current approaches heavily rely on models trained on source data, posing the risk of overfitting and suboptimal generalization. This paper introduces a dynamic prompt learning paradigm that harnesses the power of large-scale vision-language models to enhance the semantic transfer of source models. Specifically, our approach fosters robust and adaptive collaboration between the source-trained model and the vision-language model, facilitating the reliable extraction of domain-specific information from unlabeled target data, while consolidating domain-invariant knowledge. Without the need for accessing source data, our method amalgamates the strengths inherent in both traditional SFDA approaches and vision-language models, formulating a collaborative framework for addressing SFDA challenges. Extensive experiments conducted on three benchmark datasets showcase the superiority of our framework over previous SOTA methods.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Computer Vision -> CV: Multimodal learning
Computer Vision -> CV: Representation learning
3125
Hard-Thresholding Meets Evolution Strategies in Reinforcement Learning
Chengqian Gao, William de Vazelhes, Hualin Zhang, Bin Gu, Zhiqiang Xu
6 min. talk | August 6th at 15:00 | Session: ML: Machine Learning (1/6)
Evolution Strategies (ES) have emerged as a competitive alternative for model-free reinforcement learning, showcasing exemplary performance in tasks like Mujoco and Atari. Notably, they shine in scenarios with imperfect reward functions, making them invaluable for real-world applications where dense reward signals may be elusive. Yet, an inherent assumption in ES—that all input features are task-relevant—poses challenges, especially when confronted with irrelevant features common in real-world problems. This work scrutinizes this limitation, particularly focusing on the Natural Evolution Strategies (NES) variant. We propose NESHT, a novel approach that integrates Hard-Thresholding (HT) with NES to champion sparsity, ensuring only pertinent features are employed. Backed by rigorous analysis and empirical tests, NESHT demonstrates its promise in mitigating the pitfalls of irrelevant features and shines in complex decision-making problems like noisy Mujoco and Atari tasks. Our code is available at https://github.com/cangcn/NES-HT.
List of keywords
Machine Learning -> ML: Evolutionary learning
Machine Learning -> ML: Feature extraction, selection and dimensionality reduction
Machine Learning -> ML: Learning sparse models
Machine Learning -> ML: Optimization
3146
SceneDiff: Generative Scene-Level Image Retrieval with Text and Sketch Using Diffusion Models
Ran Zuo, Haoxiang Hu, Xiaoming Deng, Cangjun Gao, Zhengming Zhang, Yu-Kun Lai, Cuixia Ma, Yong-Jin Liu, Hongan Wang
6 min. talk | August 6th at 11:30 | Session: CV: Video analysis and understanding   
Jointly using text and sketch for scene-level image retrieval exploits the complementarity between the two modalities to describe fine-grained scene content and retrieve the target image, which plays a pivotal role in accurate image retrieval. Existing methods directly fuse the features of sketch and text and thus suffer from the bottleneck of limited utilization of crucial semantic and structural information, leading to inaccurate matching with images. In this paper, we propose SceneDiff, a novel retrieval network that leverages a pre-trained diffusion model to establish a shared generative latent space, enabling joint latent representation learning for sketch and text features and precise alignment with the corresponding image. Specifically, we encode text, sketch and image features, and project them into the diffusion-based shared space, conditioning the denoising process on sketch and text features to generate latent fusion features, while employing the pre-trained autoencoder for latent image features. Within this space, we introduce a content-aware feature transformation module to reconcile encoded sketch and image features with the diffusion latent space’s dimensional requirements and preserve their visual content information. We then augment the representation capability of the generated latent fusion features by integrating multiple samplings with partition attention, and utilize contrastive learning to align both direct fusion features and generated latent fusion features with the corresponding image representations. Our method outperforms state-of-the-art works in extensive experiments, providing novel insight into the related retrieval field.
List of keywords
Computer Vision -> CV: Image and video retrieval 
Computer Vision -> CV: Multimodal learning
3164
Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes
Kang You, Kai Liu, Li Yu, Pan Gao, Dandan Ding
6 min. talk | August 7th at 15:00 | Session: ML: Machine Learning (4/6)
Despite considerable progress in point cloud geometry compression, effectively compressing large-scale scenes with sparse surfaces remains a challenge. Another key challenge lies in reducing decoding latency, a crucial requirement in real-world applications. In this paper, we propose Pointsoup, an efficient learning-based geometry codec that attains high performance and extremely low decoding latency simultaneously. Inspired by the conventional Trisoup codec, a point-model-based strategy is devised to characterize local surfaces. Specifically, skin features are embedded from local windows via an attention-based encoder, and dilated windows are introduced as cross-scale priors to infer the distribution of quantized features in parallel. During decoding, features undergo fast refinement, followed by a folding-based point generator that reconstructs point coordinates at a fairly fast speed. Experiments show that Pointsoup achieves state-of-the-art performance on multiple benchmarks with significantly lower decoding complexity, i.e., up to 90~160× faster than the G-PCCv23 Trisoup decoder on a comparatively low-end platform (e.g., one RTX 2080Ti). Furthermore, it offers variable-rate control with a single neural model (2.9MB), which is attractive for industrial practitioners.
List of keywords
Machine Learning -> ML: Geometric learning
Computer Vision -> CV: 3D computer vision
Multidisciplinary Topics and Applications -> MTA: Real-time systems
Robotics -> ROB: Robotics and vision
3165
ADMN: Agent-Driven Modular Network for Dynamic Parameter Sharing in Cooperative Multi-Agent Reinforcement Learning
Yang Yu, Qiyue Yin, Junge Zhang, Pei Xu, Kaiqi Huang
6 min. talk | August 7th at 15:00 | Session: MAS: Multi-agent learning
Parameter sharing is a common strategy in multi-agent reinforcement learning (MARL) to make training more efficient and scalable. However, applying parameter sharing among agents indiscriminately hinders the emergence of agent diversity and degrades the final cooperative performance. To better balance parameter sharing and agent diversity, we propose a novel Agent-Driven Modular Network (ADMN), where agents share a base network consisting of multiple specialized modules and each agent has its own routing to connect these modules. In ADMN, modules are shared among agents to improve training efficiency, while the combination of different modules brings rich diversity. The agent routing at different time steps is learned end-to-end to achieve a dynamic and adaptive balance. We also propose an information-theoretic regularization between the routing of agents and their behavior to further guarantee the identifiability of different routings. We evaluate ADMN in challenging StarCraft micromanagement games and Google Research Football games; the results demonstrate the superior performance of ADMN, particularly in larger or heterogeneous cooperative tasks.
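A minimal sketch of the shared-modules-plus-routing idea (hypothetical code; ADMN learns routing per time step, whereas this toy version uses a static per-agent router for brevity):

import torch
import torch.nn as nn

class ModularRouting(nn.Module):
    def __init__(self, dim, n_modules, n_agents):
        super().__init__()
        # one shared bank of modules for all agents
        self.bank = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_modules)])
        # each agent owns only its routing weights
        self.router = nn.Embedding(n_agents, n_modules)

    def forward(self, x, agent_id):                  # x: (B, dim), agent_id: (B,)
        w = torch.softmax(self.router(agent_id), dim=-1)          # (B, M)
        outs = torch.stack([m(x) for m in self.bank], dim=1)      # (B, M, dim)
        return (w.unsqueeze(-1) * outs).sum(dim=1)

Sharing the bank keeps the parameter count low, while distinct routings let agents behave differently.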
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Machine Learning -> ML: Reinforcement learning
3180
Unlearning from Weakly Supervised Learning
Yi Tang, Yi Gao, Yong-gang Luo, Ju-Cheng Yang, Miao Xu, Min-Ling Zhang
12 min. talk | August 8th at 10:00 | Session: ML: Weakly supervised learning
Machine unlearning provides users with the right to remove their private data from a well-trained model. Existing approaches to machine unlearning mainly focus on data removal within supervised learning (SL) tasks. However, weakly supervised learning (WSL) is more applicable to real-world scenarios since collecting WSL data is less laborious than collecting fully supervised data. In this paper, we first propose a machine unlearning approach for WSL, which updates the model parameters. Motivated by the uniform distribution of untrained model predictions, we derive a formulated target that forces the model’s predictions on removed data to be indistinguishable. This encourages the model to forget its ability to recognize features of data slated for unlearning. Moreover, we employ formulated targets to transform classification unlearning into convex regression, which significantly reduces computational cost and avoids extra information storage during training. Additionally, we discuss how to design a target that ensures the model’s predictions on removed data are indistinguishable in different learning scenarios, e.g., SL or WSL. Owing to the flexibility in formulating targets, the proposed approach effectively handles the WSL problem while still excelling on SL models. Empirical studies show the superiority of the proposed approach.
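The formulated-target idea can be sketched for the SL case as follows (a minimal sketch with hypothetical names; the paper’s convex-regression formulation is more specific than this generic loss):

import torch
import torch.nn.functional as F

def unlearning_loss(logits_removed, logits_retained, targets_retained, lam=1.0):
    # push predictions on removed data toward the uniform target
    k = logits_removed.size(-1)
    uniform = torch.full_like(logits_removed, 1.0 / k)
    forget = F.mse_loss(logits_removed.softmax(dim=-1), uniform)
    # keep fitting the retained data as usual
    retain = F.cross_entropy(logits_retained, targets_retained)
    return retain + lam * forget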
List of keywords
Machine Learning -> ML: Weakly supervised learning
Machine Learning -> ML: Other
3202
A Prior-information-guided Residual Diffusion Model for Multi-modal PET Synthesis from MRI
Zaixin Ou, Caiwen Jiang, Yongsheng Pan, Yuanwang Zhang, Zhiming Cui, Dinggang Shen
6 min. talk | August 9th at 11:30 | Session: ML: Generative models
Alzheimer’s disease (AD) leads to abnormalities in various biomarkers (i.e., amyloid-β and tau proteins), which makes PET imaging (which can detect these biomarkers) essential in AD diagnosis. However, the high radiation risk of PET imaging limits the number of scans within a short period, presenting challenges to the joint multi-biomarker diagnosis of AD. In this paper, we propose a novel unified model to simultaneously synthesize multi-modal PET images from MRI, to achieve low-cost and time-efficient joint multi-biomarker diagnosis of AD. Specifically, we incorporate residual learning into the diffusion model to emphasize inter-domain differences between PET and MRI, thereby forcing each modality to maximally reconstruct its modality-specific details. Furthermore, we leverage prior information, such as age and gender, to guide the diffusion model in synthesizing PET images with semantic consistency, enhancing their diagnostic value. Additionally, we develop an intra-domain difference loss to ensure that the intra-domain differences among synthesized PET images closely match those among real PET images, promoting more accurate synthesis, especially for modality-specific information. Extensive experiments conducted on the ADNI dataset demonstrate that our method achieves superior performance both quantitatively and qualitatively compared to the state-of-the-art methods. All code for this study has been uploaded to GitHub (https://github.com/Ouzaixin/ResDM).
List of keywords
Machine Learning -> ML: Generative models
Machine Learning -> ML: Deep learning architectures
Machine Learning -> ML: Multi-modal learning
Machine Learning -> ML: Supervised Learning
3203
Fine-grained Analysis of Stability and Generalization for Stochastic Bilevel Optimization
Xuelin Zhang, Hong Chen, Bin Gu, Tieliang Gong, Feng Zheng
6 min. talk | August 7th at 15:00 | Session: ML: Machine Learning (3/6)
Stochastic bilevel optimization (SBO) has recently been integrated into many machine learning paradigms, including hyperparameter optimization, meta learning, reinforcement learning, etc. Along with this wide range of applications, there have been abundant studies concerning the computational behavior of SBO. However, the generalization guarantees of SBO methods are far less understood through the lens of statistical learning theory. In this paper, we provide a systematic generalization analysis of first-order gradient-based bilevel optimization methods. Firstly, we establish quantitative connections between the on-average argument stability and the generalization gap of SBO methods. Then, we derive upper bounds on the on-average argument stability for single-timescale stochastic gradient descent (SGD) and two-timescale SGD, considering three settings: nonconvex-nonconvex (NC-NC), convex-convex (C-C), and strongly-convex-strongly-convex (SC-SC). Experimental analysis validates our theoretical findings. Compared with previous algorithmic stability analyses, our results do not require re-initialization of the inner-level parameters before each iteration and are suited to more general objective functions.
List of keywords
Machine Learning -> ML: Learning theory
3228
Boosting Diffusion Models with an Adaptive Momentum Sampler
Xiyu Wang, Anh-Dung Dinh, Daochang Liu, Chang Xu
12 min. talk | August 8th at 11:30 | Session: CV: Image and video synthesis and generation (1/2)
Diffusion probabilistic models (DPMs) have been shown to generate high-quality images without the need for delicate adversarial training. The sampling process of DPMs is mathematically similar to Stochastic Gradient Descent (SGD), with both being iteratively updated with a function increment. Building on this, we present a novel reverse sampler for DPMs in this paper, drawing inspiration from the widely-used Adam optimizer. Our proposed sampler can be readily applied to a pre-trained diffusion model, utilizing momentum mechanisms and adaptive updating to enhance the generated image’s quality. By effectively reusing update directions from early steps, our proposed sampler achieves a better balance between high-level semantics and low-level details. Additionally, this sampler is flexible and can be easily integrated into pre-trained DPMs regardless of the sampler used during training. Our experimental results on multiple benchmarks demonstrate that our proposed reverse sampler yields remarkable improvements over different baselines.
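Schematically, the sampler treats the per-step reverse increment the way Adam treats a gradient. A sketch under that analogy only (the paper’s exact update and hyperparameters may differ):

import torch

def momentum_step(x, increment, state, t, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam-style first/second moments over the reverse-diffusion increment
    m = beta1 * state.get("m", torch.zeros_like(x)) + (1 - beta1) * increment
    v = beta2 * state.get("v", torch.zeros_like(x)) + (1 - beta2) * increment ** 2
    state["m"], state["v"] = m, v
    m_hat = m / (1 - beta1 ** t)      # bias correction, as in Adam
    v_hat = v / (1 - beta2 ** t)
    return x + m_hat / (v_hat.sqrt() + eps)

Reusing the smoothed direction from early steps is what lets the sampler trade off high-level semantics against low-level detail.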
List of keywords
Computer Vision -> CV: Image and video synthesis and generation 
3230
Image Retrieval with Self-Supervised Divergence Minimization and Cross-Attention Classification
Vivek Trivedy, Longin Jan Latecki
12 min. talk | August 8th at 15:00 | Session: CV: Computer Vision (2/2)
Common approaches to image retrieval include contrastive methods and specialized loss functions such as ranking losses and entropy regularizers. We present DMCAC (Divergence Minimization with Cross-Attention Classification), a novel image retrieval method that offers a new perspective on this training paradigm. We use self-supervision with a novel divergence loss framework alongside a simple data flow adjustment that minimizes a distribution over a database directly during training. We show that jointly learning a query representation over a database is a competitive and often improved alternative to traditional contrastive methods for image retrieval. We evaluate our method across several model configurations and four datasets, achieving state-of-the-art performance in multiple settings. We also conduct a thorough set of ablations that show the robustness of our method across full vs. approximate retrieval and different hyperparameter configurations.
List of keywords
Computer Vision -> CV: Image and video retrieval 
Computer Vision -> CV: Representation learning
3244
Learning Label Dependencies for Visual Information Extraction
Minghong Yao, Liansheng Zhuang, Houqiang Li, Jiuchang Wei
6 min. talk | August 8th at 11:30 | Session: NLP: Applications
Visual Information Extraction (VIE), which aims to extract structured information from visually rich document images, has drawn much attention due to its wide applications in document understanding. However, previous methods often treat the VIE task as a sequence labeling problem and ignore the label correlations in the sequence, which may significantly degrade their performance. To address this issue, this paper proposes a novel framework that exploits label correlations to improve the performance of VIE models. Its key idea is to learn the label dependency of entities and use it to regularize the label sequence. Specifically, to capture the label dependency of entities, a label transformer is pre-trained to assign a higher likelihood to label sequences that respect the label patterns of document layouts. During the testing stage, an inference transformer predicts the label sequence by considering not only the features of each entity but also the likelihood of the label sequence evaluated by the label transformer. Our framework can be combined with existing popular VIE models such as LayoutLM and GeoLayoutLM. Extensive experiments on public datasets have demonstrated the effectiveness of our framework.
List of keywords
Natural Language Processing -> NLP: Applications
Natural Language Processing -> NLP: Information extraction
3245
Nonparametric Detection of Gerrymandering in Multiparty Plurality Elections
Dariusz Stolicki, Wojciech Słomczyński, Stanisław Szufa
6 min. talk | August 7th at 15:00 | Session: GTEP: Computational social choice (2/2)
Partisan gerrymandering, i.e., manipulation of electoral district boundaries for political advantage, is one of the major challenges to election integrity in modern-day democracies. Yet most existing methods for detecting partisan gerrymandering are narrowly tailored toward fully contested two-party elections, and fail if there are more parties or if the number of candidates per district varies. We propose a new method that applies nonparametric statistical learning to detect anomalies in the relation between (aggregate) votes and (aggregate) seats. Unlike most existing methods, we propose to learn the standard of fairness in districting from empirical data rather than assume one a priori. Finally, we test the proposed method against experimental data as well as real-life data from 17 countries employing the plurality (FPTP) system.
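The learn-the-standard-from-data idea can be sketched with a simple kernel regression of seat shares on vote shares (illustrative only; the paper’s estimator and anomaly test are more involved):

import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth=0.05):
    # kernel-weighted average: learn the empirical votes -> seats relation
    w = np.exp(-((x_query[:, None] - x_train[None, :]) ** 2) / (2 * bandwidth ** 2))
    return (w * y_train).sum(axis=1) / w.sum(axis=1)

# a districting whose observed seat share falls far from the learned curve,
# relative to the residual spread, would be flagged as anomalous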
List of keywords
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Multidisciplinary Topics and Applications -> MTA: Social sciences
3261
An Image-enhanced Molecular Graph Representation Learning Framework
Hongxin Xiang, Shuting Jin, Jun Xia, Man Zhou, Jianmin Wang, Li Zeng, Xiangxiang Zeng
6 min. talk | August 8th at 15:00 | Session: MTA: Bioinformatics
Extracting rich molecular representations is a crucial prerequisite for accurate drug discovery. Recent molecular representation learning methods have achieved impressive progress, but the paradigm of learning from a single modality gradually encounters the bottleneck of limited representation capability. In this work, we fully consider the rich visual information contained in 3D-conformation molecular images (i.e., texture, shadow, color and planar spatial information) and distill graph-based models for more discriminative drug discovery. Specifically, we propose an image-enhanced molecular graph representation learning framework that leverages multi-view molecular images rendered from 3D conformations to boost molecular graph representations. To extract useful auxiliary knowledge from multi-view images, we design a teacher, which is pre-trained on 2 million molecules with conformations through five meticulously designed pre-training tasks. To transfer knowledge from teacher to graph-based students, we propose an efficient cross-modal knowledge distillation strategy with a knowledge enhancer and a task enhancer. Notably, the distillation architecture of IEM can be directly integrated into existing graph-based models, and significantly improves the capabilities of these models (e.g., GIN, EdgePred, GraphMVP, MoleBERT) for molecular representation learning. In particular, GraphMVP and MoleBERT equipped with IEM achieve new state-of-the-art performance on the MoleculeNet benchmark, achieving average ROC-AUC of 73.89% and 73.81%, respectively. Code is available at https://github.com/HongxinXiang/IEM.
List of keywords
Multidisciplinary Topics and Applications -> MTA: Bioinformatics
Machine Learning -> ML: Knowledge-aided learning
Machine Learning -> ML: Self-supervised Learning
Machine Learning -> ML: Representation learning
3265
Formalisation and Evaluation of Properties for Consequentialist Machine Ethics
Raynaldio Limarga, Yang Song, Abhaya Nayak, David Rajaratnam, Maurice Pagnucco
6 min. talk | August 6th at 15:00 | Session: ETF: AI Ethics, Trust, Fairness (1/2)
As artificial intelligence (AI) technologies continue to influence our daily lives, there has been a growing need to ensure that AI-enabled decision-making systems adhere to principles expected of human decision makers. This need has given rise to the area of Machine Ethics. We formalise several ethical principles from the philosophical literature in the situation calculus framework to verify the ethical permissibility of a plan. Moreover, we propose several important properties, including some of our own that are intuitively appealing and a number derived from the social choice literature that appear relevant for evaluating the various approaches. Finally, we assess how the various situation calculus models of Machine Ethics we examine satisfy the important properties we have identified.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Moral decision making
Knowledge Representation and Reasoning -> KRR: Reasoning about actions
Knowledge Representation and Reasoning -> KRR: Common-sense reasoning
Knowledge Representation and Reasoning -> KRR: Other
3276
Multi-scale Context-Aware Networks Based on Fragment Association for Human Activity Recognition
Zhiqiong Wang, Hanyu Liu, Boyang Zhao, Qi Shen, Mingzhe Li, Ningfeng Que, Mingke Yan, Junchang Xin
6 min. talk | August 7th at 15:00 | Session: HAI: Humans and AI
Sensor-based Human Activity Recognition (HAR) is a key component of many artificial intelligence applications. Although deep feature extraction techniques continue to improve, finding a balance between performance and computational efficiency remains difficult. Through an in-depth exploration of the inherent characteristics of HAR data, we propose a lightweight feature perception model comprising an internal feature extractor and a contextual feature perceiver. The model consists of two stages. The first stage is a hierarchical multi-scale feature extraction module, composed of depthwise separable convolutions and a multi-head attention mechanism, which extracts conventional features for Human Activity Recognition. After a fragment recombination operation, the features are passed into the second-stage Context-Aware module, which is based on the Retentive Transformer and optimized with the DropKey method, to efficiently extract the relationships between feature fragments and mine more valuable feature information. Importantly, this adds little complexity to the model, preventing excessive resource consumption. We conducted extensive experimental validation on multiple publicly available HAR datasets.
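The first-stage building block, a depthwise separable convolution, factorizes a standard convolution into a per-channel filter plus a 1×1 channel mixer, which is where the lightweight footprint comes from. A minimal PyTorch sketch for 1-d sensor streams (illustrative, not the authors’ implementation):

import torch.nn as nn

class DepthwiseSeparableConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=5):
        super().__init__()
        # depthwise: one filter per channel (groups = in_ch)
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        # pointwise: 1x1 conv mixes channels
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):                 # x: (batch, channels, time)
        return self.pointwise(self.depthwise(x))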
List of keywords
Humans and AI -> HAI: Applications
Data Mining -> DM: Networks
3281
LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory
Zicheng Liu, Li Wang, Siyuan Li, Zedong Wang, Haitao Lin, Stan Z. Li
6 min. talk | August 6th at 15:00 | Session: ML: Machine Learning (2/6)
Transformer models have been successful in various sequence processing tasks, but the self-attention mechanism’s computational cost limits its practicality for long sequences. Although there are existing attention variants that improve computational efficiency, they have a limited ability to abstract global information effectively based on their hand-crafted mixing strategies. On the other hand, state-space models (SSMs) are tailored for long sequences but cannot capture complicated local information. Therefore, their combination as a unified token mixer is a trend in recent long-sequence models. However, the linearized attention degrades performance significantly even when equipped with SSMs. To address the issue, we propose a new method called LongVQ. LongVQ uses the vector quantization (VQ) technique to compress the global abstraction into a length-fixed codebook, enabling linear-time computation of the attention matrix. This technique effectively maintains dynamic global and local patterns, which helps address the long-range dependency issue. Our experiments on the Long Range Arena benchmark, autoregressive language modeling, and image and speech classification demonstrate the effectiveness of LongVQ. Our model achieves significant improvements over other sequence models, including variants of Transformers, Convolutions, and recent State Space Models.
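The core computational trick is easy to see: attending over a length-fixed codebook instead of all tokens makes the cost linear in sequence length. A minimal sketch (the codebook learning via VQ is omitted; names are illustrative):

import torch
import torch.nn.functional as F

def codebook_attention(queries, codebook):
    # cost is O(n * m), with m = codebook size, linear in sequence length n
    scores = queries @ codebook.T / codebook.size(-1) ** 0.5   # (n, m)
    return F.softmax(scores, dim=-1) @ codebook                # (n, d)

q = torch.randn(1024, 64)     # n = 1024 tokens
c = torch.randn(32, 64)       # m = 32 codewords
out = codebook_attention(q, c)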
List of keywords
Machine Learning -> ML: Deep learning architectures
Machine Learning -> ML: Representation learning
Natural Language Processing -> NLP: Embeddings
3286
Unlearning during Learning: An Efficient Federated Machine Unlearning Method
Hanlin Gu, Gongxi Zhu, Jie Zhang, Xinyuan Zhao, Yuxing Han, Lixin Fan, Qiang Yang
6 min. talk | August 8th at 10:00 | Session: ML: Federated learning (2/2)
In recent years, Federated Learning (FL) has garnered significant attention as a distributed machine learning paradigm. To facilitate the implementation of the "right to be forgotten," the concept of federated machine unlearning (FMU) has also emerged. However, current FMU approaches often involve additional time-consuming steps and may not offer comprehensive unlearning capabilities, which renders them less practical in real FL scenarios. In this paper, we introduce FedAU, an innovative and efficient FMU framework aimed at overcoming these limitations. Specifically, FedAU incorporates a lightweight auxiliary unlearning module into the learning process and employs a straightforward linear operation to facilitate unlearning. This approach eliminates the requirement for extra time-consuming steps, rendering it well-suited for FL. Furthermore, FedAU exhibits remarkable versatility. It not only enables multiple clients to carry out unlearning tasks concurrently but also supports unlearning at various levels of granularity, including individual data samples, specific classes, and even at the client level. We conducted extensive experiments on MNIST, CIFAR10, and CIFAR100 datasets to evaluate the performance of FedAU. The results demonstrate that FedAU effectively achieves the desired unlearning effect while maintaining model accuracy.
List of keywords
Machine Learning -> ML: Federated learning
Multidisciplinary Topics and Applications -> MTA: Security and privacy
3291
A Behavior-Aware Approach for Deep Reinforcement Learning in Non-stationary Environments without Known Change Points
Zihe Liu, Jie Lu, Guangquan Zhang, Junyu Xuan
6 min. talk | August 8th at 15:00 | Session: ML: Reinforcement learning (2/2)
Deep reinforcement learning is used in various domains, but usually under the assumption that the environment has stationary conditions like transitions and state distributions. When this assumption is not met, performance suffers. For this reason, tracking continuous environmental changes and adapting to unpredictable conditions is challenging yet crucial because it ensures that systems remain reliable and flexible in practical scenarios. Our research introduces Behavior-Aware Detection and Adaptation (BADA), an innovative framework that merges environmental change detection with behavior adaptation. The key inspiration behind our method is that policies exhibit different global behaviors in changing environments. Specifically, environmental changes are identified by analyzing variations between behaviors using Wasserstein distances without manually set thresholds. The model adapts to the new environment through behavior regularization based on the extent of changes. The results of a series of experiments demonstrate better performance relative to several current algorithms. This research also indicates significant potential for tackling this long-standing challenge.
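The detection step can be sketched directly with SciPy, assuming policy behavior is summarized as one-dimensional samples (e.g., action statistics); BADA’s behavior representation and its data-driven threshold are more elaborate than this toy reference:

import numpy as np
from scipy.stats import wasserstein_distance

def changed(past_behavior, recent_behavior, reference_distances):
    # distance between behavior samples from the two windows
    d = wasserstein_distance(past_behavior, recent_behavior)
    # compare against distances observed within the stationary phase,
    # rather than a manually set threshold
    return d > np.mean(reference_distances) + 3.0 * np.std(reference_distances)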
List of keywords
Machine Learning -> ML: Reinforcement learning
3294
Welfare Loss in Connected Resource Allocation
Xiaohui Bei, Alexander Lam, Xinhang Lu, Warut Suksompong
6 min. talk | August 8th at 10:00 | Session: GTEP: Fair division
We study the allocation of indivisible goods that form an undirected graph and investigate the worst-case welfare loss when requiring that each agent must receive a connected subgraph. Our focus is on both egalitarian and utilitarian welfare. Specifically, we introduce the concept of egalitarian (resp., utilitarian) price of connectivity, which captures the worst-case ratio between the optimal egalitarian (resp., utilitarian) welfare among all allocations and that among the connected allocations. We provide tight or asymptotically tight bounds on the price of connectivity for various large classes of graphs when there are two agents, and for paths, stars and cycles in the general case. Many of our results are supplemented with algorithms which find connected allocations with a welfare guarantee corresponding to the price of connectivity.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Agent-based and Multi-agent Systems -> MAS: Resource allocation
3297
CausalNET: Unveiling Causal Structures on Event Sequences by Topology-Informed Causal Attention
Hua Zhu, Hong Huang, Kehan Yin, Zejun Fan, Hai Jin, Bang Liu
6 min. talk | August 7th at 10:00 | Session: UAI: Causality, structural causal models and causal inference
Causal discovery on event sequences holds a pivotal significance across domains such as healthcare, finance, and industrial systems. The crux of this endeavor lies in unraveling causal structures among event types, typically portrayed as directed acyclic graphs (DAGs). Nonetheless, prevailing methodologies often grapple with untenable assumptions and intricate optimization hurdles. To address these challenges, we present a novel model named CausalNET. At the heart of CausalNET is a special prediction module based on the Transformer architecture, which predicts forthcoming events by leveraging historical occurrences, with its predictive power amplified by a trainable causal graph engineered to capture causal relationships among event types. Further, to augment the predictive paradigm, we devise a causal decay matrix to encapsulate the reciprocal influence of events upon each other within the topological network. During training, we alternately refine the prediction module and fine-tune the causal graph. Comprehensive evaluation on a spectrum of real-world and synthetic datasets underscores the superior performance and scalability of CausalNET, which marks a promising step forward in the realm of causal discovery. Code and Appendix are available at https://github.com/CGCL-codes/CausalNET.
List of keywords
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
Machine Learning -> ML: Causality
3305
Personalized Heart Disease Detection via ECG Digital Twin Generation
Yaojun Hu, Jintai Chen, Lianting Hu, Dantong Li, Jiahuan Yan, Haochao Ying, Huiying Liang, Jian Wu
6 min. talk | August 9th at 11:30 | Session: MTA: Health and medicine
Heart diseases rank among the leading causes of global mortality, demonstrating a crucial need for early diagnosis and intervention. Most traditional electrocardiogram (ECG)-based automated diagnosis methods are trained at the population level, neglecting the customization of personalized ECGs to enhance individual healthcare management. A potential solution to address this limitation is to employ digital twins to simulate symptoms of diseases in real patients. In this paper, we present an innovative prospective learning approach for personalized heart disease detection, which generates digital twins of healthy individuals’ anomalous ECGs and enhances the model’s sensitivity to personalized symptoms. In our approach, a vector quantized feature separator is proposed to locate and isolate the disease-symptom and normal segments in ECG signals under ECG report guidance. The ECG digital twins can thus simulate specific heart diseases, and are used to train a personalized heart disease detection model. Experiments demonstrate that our approach not only excels in generating high-fidelity ECG signals but also improves personalized heart disease detection. Moreover, our approach ensures robust privacy protection, safeguarding patient data in model development. The code can be found at https://github.com/huyjj/LAVQ-Editor.
List of keywords
Multidisciplinary Topics and Applications -> MTA: Health and medicine
Machine Learning -> ML: Applications
Machine Learning -> ML: Generative models
3314
Label Distribution Learning from Logical Label
Yuheng Jia, Jiawei Tang, Jiahao Jiang
6 min. talk | August 9th at 11:30 | Session: ML: Multi-label learning
Label distribution learning (LDL) is an effective method to predict the label description degree (a.k.a. label distribution) of a sample. However, annotating the label distribution (LD) for training samples is extremely costly. So recent studies often first use label enhancement (LE) to generate an estimated label distribution from the logical label and then apply external LDL algorithms to the recovered label distribution to predict the label distribution for unseen samples. But this step-wise manner overlooks the possible connections between LE and LDL. Moreover, existing LE approaches may assign some description degrees to invalid labels. To solve the above problems, we propose a novel method to learn an LDL model directly from the logical label, which unifies LE and LDL into a joint model and avoids the drawbacks of previous LE methods. We also give the generalization error bound of our method and theoretically prove that directly learning an LDL model from logical labels is feasible. Extensive experiments on various datasets prove that the proposed approach can construct a reliable LDL model directly from the logical label, and produce more accurate label distributions than state-of-the-art LE methods. The code and the supplementary file can be found at https://github.com/seutjw/DLDL.
List of keywords
Machine Learning -> ML: Multi-label learning
Constraint Satisfaction and Optimization -> CSO: Constraint optimization problems
Machine Learning -> ML: Optimization
3319
Optimizing Prosumer Policies in Periodic Double Auctions Inspired by Equilibrium Analysis
Bharat Manvi, Sanjay Chandlekar, Easwar Subramanian
6 min. talk | August 6th at 15:00 | Session: GTEP: Noncooperative games
We consider a periodic double auction (PDA) wherein the main participants are wholesale suppliers and brokers representing retailers. The suppliers are represented by a composite supply curve and the brokers are represented by individual bids. Additionally, the brokers can also participate in small-scale selling by placing individual asks; hence, they act as prosumers. Specifically, in a PDA, the prosumers who are net buyers have multiple opportunities to buy or sell multiple units of a commodity with the aim of minimising the cost of buying across multiple rounds of the PDA. Formulating optimal bidding strategies for such a PDA setting involves planning across current and future rounds while taking into account the bidding strategies of other agents. In this work, we propose Markov perfect Nash equilibrium (MPNE) policies for a setup where multiple prosumers with knowledge of the composite supply curve compete to procure commodities. Thereafter, the MPNE policies are used to develop an algorithm called MPNE-BBS for the case wherein the prosumers need to reconstruct an approximate composite supply curve using past auction information. The efficacy of the proposed algorithm is demonstrated on the PowerTAC wholesale market simulator against several baselines and state-of-the-art bidding policies.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
3333
Dynamic Weighted Graph Fusion for Deep Multi-View Clustering
Yazhou Ren, Jingyu Pu, Chenhang Cui, Yan Zheng, Xinyue Chen, Xiaorong Pu, Lifang He
6 min. talk | August 6th at 15:00 | Session: ML: Multi-view learning
By exploring complex graph information hidden in data from multiple views, multi-view clustering (MVC) based on graph neural networks significantly enhances clustering performance and has drawn increasing attention in recent years. Although considerable progress has been made, most existing GNN-based MVC models merely consider the explicit graph structure present in raw data and ignore that the latent graphs of different views also provide specific information for the clustering task. We propose dynamic weighted graph fusion for deep multi-view clustering (DFMVC) to address this issue. Specifically, DFMVC learns embedded features via deep autoencoders and then constructs latent graphs for each individual view. It then concatenates the embedded features of all views to form a global feature that leverages complementary information, and generates a fusion graph by combining all latent graphs to accurately capture the topological information among samples. Based on the informative fusion graph and global features, a graph convolution module is adopted to derive a representation with globally comprehensive information, which is further used to generate pseudo-label information. In a self-supervised manner, such information guides each view to dynamically learn discriminative features and latent graphs. Extensive experimental results demonstrate the efficacy of DFMVC.
List of keywords
Machine Learning -> ML: Multi-view learning
Machine Learning -> ML: Clustering
3339
Dual Semantic Fusion Hashing for Multi-Label Cross-Modal Retrieval
Kaiming Liu, Yunhong Gong, Yu Cao, Zhenwen Ren, Dezhong Peng, Yuan Sun
6 min. talk | August 6th at 11:30 | Session: ML: Multi-modal learning
Cross-modal hashing (CMH) has been widely used for multi-modal retrieval tasks due to its low storage cost and fast query speed. Although existing CMH methods achieve promising performance, most of them mainly rely on coarse-grained supervision information (i.e., a pairwise similarity matrix) to measure the semantic similarities between all instances, ignoring the impact of the multi-label distribution. To address this issue, we construct fine-grained semantic similarity to explore the cluster-level semantic relationships between multi-label data, and propose a new dual semantic fusion hashing (DSFH) method for multi-label cross-modal retrieval. Specifically, we first learn the modal-specific representation and consensus hash codes, thereby merging specificity with consistency. Then, we fuse the coarse-grained and fine-grained semantics to mine multi-level semantic relationships, thereby enhancing hash code discrimination. Extensive experiments on three benchmarks demonstrate the superior performance of our DSFH compared with 16 state-of-the-art methods.
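The coarse- vs. fine-grained distinction is easy to see on multi-label annotations (a schematic only, not DSFH’s exact cluster-level construction):

import numpy as np

def similarity_matrices(labels):          # labels: (n, c) binary multi-label matrix
    # coarse: two instances are similar iff they share any label
    coarse = (labels @ labels.T > 0).astype(float)
    # finer: cosine similarity of full label vectors, a graded relation
    norms = np.linalg.norm(labels, axis=1, keepdims=True)
    fine = (labels / norms) @ (labels / norms).T
    return coarse, fine

L = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1]], dtype=float)
coarse, fine = similarity_matrices(L)     # coarse is 0/1; fine is graded

The binary matrix collapses instances sharing one label with those sharing all labels, which is exactly the information a fine-grained similarity restores.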
List of keywords
Machine Learning -> ML: Multi-modal learning
Machine Learning -> ML: Multi-view learning
3347
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
Jiafeng Liang, Shixin Jiang, Zekun Wang, Haojie Pan, Zerui Chen, Zheng Chu, Ming Liu, Ruiji Fu, Zhongyuan Wang, Bing Qin
6 min. talk | August 6th at 11:30 | Session: CV: Video analysis and understanding   
There are substantial instructional videos on the Internet, providing tutorials for completing various tasks. Existing instructional video datasets only focus on specific steps at the video level, lacking experiential guidelines at the task level, which can leave beginners struggling to learn new tasks for want of relevant experience. Moreover, specific steps without guidelines are trivial and unsystematic, making it difficult to provide a clear tutorial. To address these problems, we present the GUIDE (Guideline-Guided) dataset, which contains 3.5K videos of 560 instructional tasks in 8 domains related to daily life. Specifically, we annotate each instructional task with a guideline, representing a common pattern shared by all task-related videos. On this basis, we annotate systematic specific steps, including their associated guideline steps, specific step descriptions and timestamps. Our proposed benchmark consists of three sub-tasks to evaluate the comprehension ability of models: (1) Step Captioning: models have to generate captions for specific steps from videos. (2) Guideline Summarization: models have to mine the common pattern in task-related videos and summarize a guideline from them. (3) Guideline-Guided Captioning: models have to generate captions for specific steps under the guidance of the guideline. We evaluate a variety of foundation models on GUIDE and perform in-depth analysis. Given the diversity and practicality of GUIDE, we believe that it can serve as a better benchmark for instructional video comprehension.
List of keywords
Computer Vision -> CV: Video analysis and understanding   
Natural Language Processing -> NLP: Resources and evaluation
3355
DiffStega: Towards Universal Training-Free Coverless Image Steganography with Diffusion Models
Yiwei Yang, Zheyuan Liu, Jun Jia, Zhongpai Gao, Yunhao Li, Wei Sun, Xiaohong Liu, Guangtao Zhai
6 min. talk | August 9th at 11:30 | Session: CV: Machine learning for vision
Traditional image steganography focuses on concealing one image within another, aiming to avoid steganalysis by unauthorized entities. Coverless image steganography (CIS) enhances imperceptibility by not using any cover image. Recent works have utilized text prompts as keys in CIS through diffusion models. However, this approach faces three challenges: it is invalidated once the private prompt is guessed, crafting public prompts for semantic diversity is difficult, and prompts risk leakage during frequent transmission. To address these issues, we propose DiffStega, an innovative training-free diffusion-based CIS strategy for universal application. DiffStega uses a password-dependent reference image as an image prompt alongside the text, ensuring that only authorized parties can retrieve the hidden information. Furthermore, we develop a Noise Flip technique to further secure the steganography against unauthorized decryption. To comprehensively assess our method across general CIS tasks, we create a dataset comprising various image steganography instances. Experiments indicate substantial improvements of our method over existing ones, particularly in versatility, password sensitivity, and recovery quality. Codes are available at https://github.com/evtricks/DiffStega.
List of keywords
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Applications
Computer Vision -> CV: Structural and model-based approaches, knowledge representation and reasoning
3356
Computational Complexity of Verifying the Group No-show Paradox
Farhad Mohsin, Qishen Han, Sikai Ruan, Pin-Yu Chen, Francesca Rossi, Lirong Xia
6 min. talk | August 7th at 15:00 | Session: GTEP: Computational social choice (2/2)
The (group) no-show paradox refers to the undesirable situation where a group of agents has an incentive to abstain from voting to make the winner more favorable to them. To understand whether it is a critical concern in practice, in this paper we take a computational approach by examining the complexity of verifying whether the group no-show paradox exists given agents’ preferences and the voting rule. We prove that, unfortunately, the verification problem is NP-hard for some commonly studied voting rules, i.e., Copeland, maximin, single transferable vote, and all Condorcetified positional scoring rules such as Black’s rule. We propose integer linear programming-based algorithms and a search-based algorithm for the verification problem under different voting rules. Experimental results on synthetic data illustrate that the former are efficient when the number of unique rankings in a profile is not too high, and the latter is efficient for a small number of agents. With the help of these algorithms, we observe that group no-show paradoxes rarely occur in real-world data.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Computational social choice
3359
KG-CoT: Chain-of-Thought Prompting of Large Language Models over Knowledge Graphs for Knowledge-Aware Question Answering
Ruilin Zhao, Feng Zhao, Long Wang, Xianzhi Wang, Guandong Xu
6 min. talk | August 6th at 15:00 | Session: NLP: Natural Language Processing (1/3)
Large language models (LLMs) encounter challenges such as hallucination and factual errors in knowledge-intensive tasks. On the one hand, LLMs sometimes struggle to generate reliable answers based on black-box parametric knowledge, due to the lack of responsible knowledge. On the other hand, fragmented knowledge facts extracted by knowledge retrievers fail to provide explicit and coherent reasoning paths for improving LLM reasoning. To address these challenges, we propose KG-CoT, a novel knowledge-augmented paradigm that leverages a small-scale step-by-step graph reasoning model to reason over knowledge graphs (KGs) and utilizes a reasoning path generation method to generate high-confidence chains of reasoning for large-scale LLMs. Extensive experiments demonstrate that KG-CoT significantly improves the performance of LLMs on knowledge-intensive question answering tasks, such as multi-hop, single-hop, and open-domain question answering benchmarks, without fine-tuning the LLMs. KG-CoT outperforms CoT prompting as well as prior retrieval-augmented and knowledge base question answering baselines. Moreover, KG-CoT reduces the number of API calls and the cost, and generalizes to various LLM backbones in a lightweight plug-and-play manner.
List of keywords
Natural Language Processing -> NLP: Question answering
Natural Language Processing -> NLP: Language generation
3360
Global Optimality of Single-Timescale Actor-Critic under Continuous State-Action Space: A Study on Linear Quadratic Regulator
Xuyang Chen, Jingliang Duan, Lin Zhao
6 min. talk | August 7th at 15:00 | Session: ML: Reinforcement learning (1/2)
Actor-critic methods have achieved state-of-the-art performance in various challenging tasks. However, theoretical understanding of their performance remains elusive and challenging. Existing studies mostly focus on practically uncommon variants such as double-loop or two-timescale stepsize actor-critic algorithms for simplicity. These results certify local convergence on finite state or action spaces only. We push the boundary by investigating the classic single-sample, single-timescale actor-critic on a continuous (infinite) state-action space, where we employ the canonical linear quadratic regulator (LQR) problem as a case study. We show that the popular single-timescale actor-critic can attain an ε-optimal solution with a sample complexity of order ε^(-2) for solving LQR on the demanding continuous state-action space. Our work provides new insights into the performance of the single-timescale actor-critic, further bridging the gap between theory and practice.
List of keywords
Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Learning theory
3364
Learning to Solve Geometry Problems via Simulating Human Dual-Reasoning Process
Tong Xiao, Jiayu Liu, Zhenya Huang, Jinze Wu, Jing Sha, Shijin Wang, Enhong Chen
6 min. talk | August 9th at 11:30 | Session: NLP: Natural Language Processing (3/3)
Geometry Problem Solving (GPS), which is a classic and challenging math problem, has attracted much attention in recent years. It requires a solver to comprehensively understand both text and diagram, master essential geometry knowledge, and appropriately apply it in reasoning. However, existing works follow the paradigm of neural machine translation and focus only on enhancing the capability of encoders, which neglects essential characteristics of human geometry reasoning. In this paper, inspired by dual-process theory, we propose a Dual-Reasoning Geometry Solver (DualGeoSolver) to simulate the dual-reasoning process of humans for GPS. Specifically, we construct two systems in DualGeoSolver, namely Knowledge System and Inference System. Knowledge System controls an implicit reasoning process, which is responsible for providing diagram information and geometry knowledge according to a step-wise reasoning goal generated by Inference System. Inference System conducts an explicit reasoning process, which specifies the goal in each reasoning step and applies the knowledge to generate program tokens for resolving it. The two systems carry out the above process iteratively, which behaves more in line with human cognition. We conduct extensive experiments on two benchmark datasets, GeoQA and GeoQA+. The results demonstrate the superiority of DualGeoSolver in both solving accuracy and robustness, owing to its explicit modeling of the human reasoning process and knowledge application.
List of keywords
Natural Language Processing -> NLP: Question answering
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
3379
BlockEcho: Retaining Long-Range Dependencies for Imputing Block-Wise Missing Data
Qiao Han, Mingqian Li, Yao Yang, Yiteng Zhai
6 min. talk | August 9th at 11:30 | Session: ML: Generative models
Block-wise missing data poses significant challenges in real-world data imputation tasks. Compared to scattered missing data, block-wise gaps exacerbate adverse effects on subsequent analytic and machine learning tasks, as the lack of local neighboring elements significantly reduces interpolation capability and predictive power. However, this issue has not received adequate attention. Most SOTA matrix completion methods appear less effective, primarily due to overreliance on neighboring elements for predictions. We systematically analyze the issue and propose a novel matrix completion method, "BlockEcho", for a more comprehensive solution. This method creatively integrates Matrix Factorization (MF) within Generative Adversarial Networks (GAN) to explicitly retain long-distance inter-element relationships in the original matrix. Besides, we incorporate an additional discriminator for the GAN, comparing the generator's intermediate progress with pre-trained MF results to constrain high-order feature distributions. Subsequently, we evaluate BlockEcho on public datasets across three domains. Results demonstrate superior performance over both traditional and SOTA methods when imputing block-wise missing data, especially at higher missing rates. The advantage also holds for scattered missing data at high missing rates. We also contribute analyses providing theoretical justification for the optimality and convergence of fusing MF and GAN for block-wise missing data.
List of keywords
Machine Learning -> ML: Generative models
Data Mining -> DM: Other
3383
A Grassmannian Manifold Self-Attention Network for Signal Classification
Rui Wang, Chen Hu, Ziheng Chen, Xiao-Jun Wu, Xiaoning Song
6 min. talk | August 7th at 15:00 | Session: ML: Machine Learning (4/6)
In the community of artificial intelligence, significant progress has been made in encoding sequential data using deep learning techniques. Nevertheless, how to effectively mine useful information from channel dimensions remains a major challenge, as these features have a submanifold structure. The linear subspace, the basic element of the Grassmannian manifold, has proven to be an effective manifold-valued feature descriptor in statistical representation. Besides, the Euclidean self-attention mechanism has shown great success in capturing long-range relationships of data. Inspired by these facts, we extend the self-attention mechanism to the Grassmannian manifold. Our framework can effectively characterize the spatiotemporal fluctuations of sequential data encoded in the Grassmannian manifold. Extensive experimental results on three benchmark datasets (a drone recognition dataset and two EEG signal classification datasets) demonstrate the superiority of our method over the state-of-the-art. The code and supplementary material for this work can be found at https://github.com/ChenHu-ML/GDLNet.
List of keywords
Machine Learning -> ML: Attention models
Machine Learning -> ML: Classification
Machine Learning -> ML: Geometric learning
3384
Fair Distribution of Delivery Orders
Hadi Hosseini, Shivika Narang, Tomasz Wąs
6 min. talk | August 8th at 10:00 | Session: GTEP: Fair division
We initiate the study of fair distribution of delivery tasks among a set of agents wherein delivery jobs are placed along the vertices of a graph. Our goal is to fairly distribute delivery costs (modeled as a submodular function) among a fixed set of agents while satisfying some desirable notions of economic efficiency. We adapt well-established fairness concepts—such as envy-freeness up to one item (EF1) and minimax share (MMS)—to our setting and show that fairness is often incompatible with the efficiency notion of social optimality. Yet, we characterize instances that admit fair and socially optimal solutions by exploiting graph structures. We further show that achieving fairness along with Pareto optimality is computationally intractable. Nonetheless, we design an XP algorithm (parameterized by the number of agents) for finding MMS and Pareto optimal solutions on every tree instance, and show that the same algorithm can be modified to find efficient solutions along with EF1, when such solutions exist. We complement these results by theoretically and experimentally analyzing the price of fairness.
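For readers unfamiliar with the notions, the standard cost-setting forms of the two concepts (our notation; the paper adapts them to its submodular delivery costs) are

    \[
    \text{EF1:}\;\; \forall\, i, j \;\; \exists\, o \in A_i :\;
        c_i(A_i \setminus \{o\}) \le c_i(A_j),
    \qquad
    \text{MMS}_i = \min_{(P_1, \dots, P_n)} \max_{k}\; c_i(P_k),
    \]

where \(c_i\) is agent \(i\)'s cost function and an allocation \((A_1, \dots, A_n)\) satisfies MMS if \(c_i(A_i) \le \text{MMS}_i\) for every agent \(i\).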
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Computational social choice
3391
Unified View Imputation and Feature Selection Learning for Incomplete Multi-view Data
Yanyong Huang, Zongxin Shen, Tianrui Li, Fengmao Lv
6 min. talk | August 8th at 15:00 | Session: ML: Machine Learning (6/6)
Although multi-view unsupervised feature selection (MUFS) is an effective technology for reducing dimensionality in machine learning, existing methods cannot directly deal with incomplete multi-view data where some samples are missing in certain views. These methods must first impute the missing data with predetermined values and then perform feature selection on the completed dataset. Separating the imputation and feature selection processes fails to capitalize on the potential synergy where local structural information gleaned from feature selection could guide the imputation, thereby improving the feature selection performance in turn. Additionally, previous methods only focus on leveraging samples' local structure information, while ignoring the intrinsic locality of the feature space. To tackle these problems, a novel MUFS method, called UNified view Imputation and Feature selectIon lEaRning (UNIFIER), is proposed. UNIFIER explores the local structure of multi-view data by adaptively learning similarity-induced graphs from both the sample and feature spaces. Then, UNIFIER dynamically recovers the missing views, guided by the sample and feature similarity graphs during the feature selection procedure. Furthermore, the half-quadratic minimization technique is used to automatically weight different instances, alleviating the impact of outliers and unreliable restored data. Comprehensive experimental results demonstrate that UNIFIER outperforms other state-of-the-art methods.
List of keywords
Machine Learning -> ML: Feature extraction, selection and dimensionality reduction
Machine Learning -> ML: Unsupervised learning
3406
SaSDim: Self-Adaptive Noise Scaling Diffusion Model for Spatial Time Series Imputation
Shunyang Zhang, Senzhang Wang, Xianzhen Tan, Renzhi Wang, Ruochen Liu, Jian Zhang, Jianxin Wang
6 min. talk | August 7th at 10:00 | Session: DM: Mining spatial and/or temporal data (1/2)
Spatial time series imputation is of great importance to various real-world applications. As the state-of-the-art generative models, diffusion models (e.g., CSDI) have outperformed statistical and autoregressive models in time series imputation. However, diffusion models may introduce unstable noise owing to the inherent uncertainty in sampling, causing the generated noise to deviate from the intended Gaussian distribution. Consequently, the imputed data may deviate from the real data. To this end, we propose a Self-adaptive noise Scaling Diffusion Model named SaSDim for spatial time series imputation. Specifically, we introduce a novel Probabilistic High-Order SDE Solver Module to scale the noise to follow the standard Gaussian distribution. The noise scaling operation helps the noise prediction module of the diffusion model estimate the variance of the noise more accurately. To effectively learn the spatial and temporal features, we also propose a Spatial-guided Global Convolution Module (SgGConv), which captures multi-periodic temporal dependencies via the Fast Fourier Transform and dynamic spatial dependencies via dynamic graph convolution. Extensive experiments conducted on three real-world spatial time series datasets verify the effectiveness of SaSDim.
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
3412
vMFER: Von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy Improvement
Yiwen Zhu, Jinyi Liu, Wenya Wei, Qianyi Fu, Yujing Hu, Zhou Fang, Bo An, Jianye Hao, Tangjie Lv, Changjie Fan
6 min. talk | August 8th at 15:00 | Session: ML: Reinforcement learning (2/2)
Reinforcement Learning (RL) is a widely employed technique in decision-making problems, encompassing two fundamental operations — policy evaluation and policy improvement. Enhancing learning efficiency remains a key challenge in RL, with many efforts focused on using ensemble critics to boost policy evaluation efficiency. However, when using multiple critics, the actor in the policy improvement process can obtain different gradients. Previous studies have combined these gradients without considering their disagreements. Therefore, optimizing the policy improvement process is crucial to enhance learning efficiency. This study focuses on investigating the impact of gradient disagreements caused by ensemble critics on policy improvement. We introduce the concept of uncertainty of gradient directions as a means to measure the disagreement among gradients utilized in the policy improvement process. Through measuring the disagreement among gradients, we find that transitions with lower uncertainty of gradient directions are more reliable in the policy improvement process. Building on this analysis, we propose a method called von Mises-Fisher Experience Resampling (vMFER), which optimizes the policy improvement process by resampling transitions and assigning higher confidence to transitions with lower uncertainty of gradient directions. Our experiments demonstrate that vMFER significantly outperforms the benchmark and is particularly well-suited for ensemble structures in RL.
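The agreement measure lends itself to a short sketch: the mean resultant length of the normalized per-critic gradients (the sufficient statistic for the concentration of a fitted von Mises-Fisher distribution) serves as a certainty score. This is a hedged illustration, not the authors' code:

    import numpy as np

    def direction_certainty(grads):
        # grads: (n_critics, dim) policy gradients for ONE transition.
        units = grads / (np.linalg.norm(grads, axis=1, keepdims=True) + 1e-8)
        # Mean resultant length in [0, 1]; 1 means all critics agree.
        return np.linalg.norm(units.mean(axis=0))

    def resample_probs(batch_grads):
        # batch_grads: (batch, n_critics, dim); more agreement -> higher prob.
        r = np.array([direction_certainty(g) for g in batch_grads])
        return r / r.sum()

    batch = np.random.randn(64, 5, 128)              # toy per-critic gradients
    p = resample_probs(batch)
    resampled = np.random.choice(64, size=64, p=p)   # favoured transitions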
List of keywords
Machine Learning -> ML: Reinforcement learning
3415
Contrastive Transformer Cross-Modal Hashing for Video-Text Retrieval
Xiaobo Shen, Qianxin Huang, Long Lan, Yuhui Zheng
6 min. talk | August 7th at 10:00 | Session: CV: Image and video retrieval 
As video-based social networks continue to grow exponentially, there is a rising interest in video retrieval using natural language. Cross-modal hashing, which learns compact hash codes for encoding multi-modal data, has proven to be widely effective in large-scale cross-modal retrieval, e.g., image-text retrieval, primarily due to its computation and storage efficiency. However, when applied to video-text retrieval, existing cross-modal hashing methods generally extract features at the frame- or word-level for videos and texts individually, thereby ignoring their long-term dependencies. To address this issue, we propose Contrastive Transformer Cross-Modal Hashing (CTCH), a novel approach designed for the video-text retrieval task. CTCH employs a bidirectional transformer encoder to encode videos and texts and leverages their long-term dependencies. CTCH further introduces a supervised multi-modality contrastive loss that effectively exploits inter-modality and intra-modality similarities among videos and texts. The experimental results on three video benchmark datasets demonstrate that CTCH outperforms the state-of-the-art methods in video-text retrieval tasks.
List of keywords
Computer Vision -> CV: Image and video retrieval 
Machine Learning -> ML: Multi-modal learning
Machine Learning -> ML: Multi-view learning
3435
Unsupervised Deep Graph Structure and Embedding Learning
Xiaobo Shen, Lei Shi, Xiuwen Gong, Shirui Pan
6 min. talk | August 9th at 11:30 | Session: DM: Mining graphs (3/3)
Graph Neural Networks (GNNs) are powerful in graph embedding learning, but their performance has been shown to degrade heavily under adversarial attacks. Deep graph structure learning (GSL) has been proposed to defend against attacks by jointly learning the graph structure and graph embeddings, typically for node classification. Label supervision is expensive in real-world applications, and thus unsupervised GSL is more challenging and remains less studied. To fill this gap, this paper proposes a new unsupervised GSL method, i.e., unsupervised property GNN (UPGNN). UPGNN first refines the graph structure by exploring the properties of low rank, sparsity, and feature smoothness. UPGNN then employs a graph mutual information loss to learn graph embeddings by maximizing their correlation with the refined graph. The proposed UPGNN learns graph structure and embeddings without label supervision, and thus can be applied to various downstream tasks. We further propose Accelerated UPGNN (AUPGNN) to reduce computational complexity, providing an efficient alternative to UPGNN. Our extensive experiments on node classification and clustering demonstrate the effectiveness of the proposed method over the state-of-the-art, especially under heavy perturbation.
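As a rough illustration of what property-based structure refinement can look like, here is a hedged sketch under our own assumptions, not UPGNN's implementation; the weights lam_* are illustrative:

    import torch

    def structure_loss(S, X, lam_nuc=1.0, lam_l1=0.1, lam_sm=0.5):
        # S: (n, n) learnable graph adjacency; X: (n, d) node features.
        nuc = torch.linalg.matrix_norm(S, ord='nuc')     # low-rank prior
        l1 = S.abs().sum()                               # sparsity prior
        lap = torch.diag(S.sum(dim=1)) - S               # graph Laplacian
        smooth = torch.trace(X.T @ lap @ X)              # feature smoothness
        return lam_nuc * nuc + lam_l1 * l1 + lam_sm * smooth

    S = torch.rand(50, 50, requires_grad=True)
    X = torch.randn(50, 16)
    structure_loss(S, X).backward()   # gradients refine the structure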
List of keywords
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Sequence and graph learning
Machine Learning -> ML: Unsupervised learning
3450
Group-Aware Coordination Graph for Multi-Agent Reinforcement Learning
Wei Duan, Jie Lu, Junyu Xuan
6 min. talk | August 6th at 11:30 | Session: ML: Multiagent Reinforcement Learning
Cooperative Multi-Agent Reinforcement Learning (MARL) necessitates seamless collaboration among agents, often represented by an underlying relation graph. Existing methods for learning this graph primarily focus on agent-pair relations, neglecting higher-order relationships. While several approaches attempt to extend cooperation modelling to encompass behaviour similarities within groups, they commonly fall short in concurrently learning the latent graph, thereby constraining the information exchange among partially observed agents. To overcome these limitations, we present a novel approach to infer the Group-Aware Coordination Graph (GACG), which is designed to capture both the cooperation between agent pairs based on current observations and group-level dependencies from behaviour patterns observed across trajectories. This graph is further used in graph convolution for information exchange between agents during decision-making. To further ensure behavioural consistency among agents within the same group, we introduce a group distance loss, which promotes group cohesion and encourages specialization between groups. Our evaluations, conducted on StarCraft II micromanagement tasks, demonstrate GACG’s superior performance. An ablation study further provides experimental evidence of the effectiveness of each component of our method.
List of keywords
Machine Learning -> ML: Multiagent Reinforcement Learning
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
3452
Make Bricks with a Little Straw: Large-Scale Spatio-Temporal Graph Learning with Restricted GPU-Memory Capacity
Binwu Wang, Pengkun Wang, Zhengyang Zhou, Zhe Zhao, Wei Xu, Yang Wang
6 min. talk | August 8th at 10:00 | Session: DM: Mining spatial and/or temporal data (2/2)
Traffic prediction plays a key role in various smart city applications, which can help traffic managers make traffic plans in advance, assist online ride-hailing companies in deploying vehicles reasonably, and provide early warning of congestion for safety authorities. While increasingly complex models achieve impressive prediction performance, there are concerns about the effectiveness of these models in handling large-scale road networks. Especially for researchers who don't have access to powerful GPU devices, the expensive memory burden limits the usefulness of these models. In this paper, we take the first step of learning on the large-scale spatio-temporal graph and propose a divide-and-conquer training strategy for Large Spatio-Temporal Graph Learning, namely LarSTL. The core idea behind this strategy is to divide the large graph into multiple subgraphs, which are treated as a task stream along which the model is trained sequentially, conquering each subgraph one by one. We introduce a novel perspective based on the continual learning paradigm to achieve this goal. To avoid forgetting the knowledge learned from previous subgraphs, an experience-replay strategy consolidates the learned knowledge by replaying nodes sampled from previous subgraphs. Moreover, we configure specific feature adaptors for each subgraph to extract personalized features, which also helps consolidate the learned knowledge from the perspective of parameters. We conduct experiments using multiple large-scale traffic network datasets on a V100 GPU with only 16GB memory, and the results demonstrate that our LarSTL can achieve competitive performance and high efficiency.
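The divide-and-conquer idea can be sketched schematically as follows; this is a toy stand-in (node-ID batches, a print in place of a real training step), none of it LarSTL's API:

    import random

    def train_on_subgraph_stream(subgraph_nodes, steps_per_task=3, replay_cap=8):
        replay = []                                  # nodes from past subgraphs
        for task, nodes in enumerate(subgraph_nodes):
            for step in range(steps_per_task):
                batch = random.sample(nodes, min(4, len(nodes)))
                if replay:                           # consolidate old knowledge
                    batch += random.sample(replay, min(2, len(replay)))
                print(f"task {task}, step {step}: train on {sorted(batch)}")
            replay.extend(random.sample(nodes, min(replay_cap, len(nodes))))

    train_on_subgraph_stream([list(range(10)), list(range(10, 20))])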
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
Data Mining -> DM: Big data and scalability
Data Mining -> DM: Mining graphs
3466
Active Deep Multi-view Clustering
Helin Zhao, Wei Chen, Peng Zhou
6 min. talk | August 6th at 15:00 | Session: ML: Multi-view learning
Deep multi-view clustering has been widely studied. However, since it is an unsupervised task, where no labels are used to guide the training, it is still unreliable, especially when handling complicated data. Although deep semi-supervised multi-view clustering can alleviate this problem by using some supervised information, the supervised information is often given in advance or randomly selected. Unfortunately, the clustering performance depends heavily on the quality of the supervised information, and most semi-supervised methods ignore supervised information selection. To tackle this problem, in this paper, we propose a novel active deep multi-view clustering method, which can actively select important data for querying human annotations. In this method, we carefully design a fusion module, an active selection module, a supervised module, and an unsupervised module, and integrate them into a unified framework seamlessly. In this framework, we can obtain a more reliable clustering result with as few annotations as possible. The extensive experiments on benchmark data sets show that our method can outperform state-of-the-art unsupervised and semi-supervised methods, demonstrating the effectiveness and superiority of the proposed method. The code is available at https://github.com/wodedazhuozi/ADMC.
List of keywords
Machine Learning -> ML: Multi-view learning
Machine Learning -> ML: Active learning
Machine Learning -> ML: Multi-modal learning
3473
Decoupled Invariant Attention Network for Multivariate Time-series Forecasting
Haihua Xu, Wei Fan, Kun Yi, Pengyang Wang
6 min. talk | August 8th at 10:00 | Session: DM: Mining spatial and/or temporal data (2/2)
To achieve more accurate prediction results in Time Series Forecasting (TSF), it is essential to distinguish between the valuable patterns (invariant patterns) of the spatial-temporal relationship and the patterns that are prone to generate distribution shift (variant patterns), and then combine them for forecasting. Existing works, such as transformer-based and GNN-based models, focus on capturing the main forecasting dependencies, whether stable or not, and tend to overlook patterns that carry both useful information and distribution shift. In this paper, we propose a model for better forecasting time series: Decoupled Invariant Attention Network (DIAN), which contains two modules to learn spatial and temporal relationships respectively: 1) Spatial Decoupled Invariant-Variant Learning (SDIVL) to decouple the spatial invariant and variant attention scores, and then leverage convolutional networks to effectively integrate them for subsequent layers; 2) Temporal Augmented Invariant-Variant Learning (TAIVL) to decouple temporal invariant and variant patterns and combine them for further forecasting. In this module, we also design a Temporal Intervention Mechanism to create multiple intervened samples by reassembling variant patterns across time stamps, eliminating the spurious impacts of variant patterns. In addition, we propose Joint Optimization to minimize a loss function that considers all invariant, variant, and intervened patterns so that our model gains a more stable predictive ability. Extensive experiments on five datasets demonstrate our superior performance and higher efficiency compared with state-of-the-art methods.
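The intervention step lends itself to a short sketch. Hedged: we assume the decoupled patterns recombine additively, which is our simplification, not necessarily DIAN's:

    import torch

    def make_intervened(invariant, variant, n_interventions=4):
        # invariant, variant: (batch, time, dim) decoupled pattern tensors.
        samples = []
        for _ in range(n_interventions):
            perm = torch.randperm(variant.size(1))   # shuffle the time axis
            samples.append(invariant + variant[:, perm, :])
        return samples

    inv, var = torch.randn(8, 24, 16), torch.randn(8, 24, 16)
    intervened = make_intervened(inv, var)   # predictions should stay stable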
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
3481
DGR: A General Graph Desmoothing Framework for Recommendation via Global and Local Perspectives
Leilei Ding, Dazhong Shen, Chao Wang, Tianfu Wang, Le Zhang, Yanyong Zhang
6 min. talk | August 8th at 15:00 | Session: DM: Data Mining (2/2)
Graph Convolutional Networks (GCNs) have become pivotal in recommendation systems for learning user and item embeddings by leveraging the user-item interaction graph's node information and topology. However, these models often face the famous over-smoothing issue, leading to indistinct user and item embeddings and reduced personalization. Traditional desmoothing methods in GCN-based systems are model-specific, lacking a universal solution. This paper introduces a novel, model-agnostic approach named Desmoothing Framework for GCN-based Recommendation Systems (DGR). It effectively addresses over-smoothing in general GCN-based recommendation models by considering both global and local perspectives. Specifically, we first introduce vector perturbations during each message passing layer to penalize the tendency of node embeddings to become overly similar, guided by the global topological structure. Meanwhile, we further develop a tailor-designed loss term for the readout embeddings to preserve the local collaborative relations between users and their neighboring items. In particular, items that exhibit a high correlation with neighboring items are also incorporated to enhance the local topological information. To validate our approach, we conduct extensive experiments on 5 benchmark datasets based on 5 well-known GCN-based recommendation models, demonstrating the effectiveness and generalization of our proposed framework. Our code is available at https://github.com/me-sonandme/DGR.
List of keywords
Data Mining -> DM: Collaborative filtering
Data Mining -> DM: Recommender systems
3488
Denoising-Aware Contrastive Learning for Noisy Time Series
Shuang Zhou, Daochen Zha, Xiao Shen, Xiao Huang, Rui Zhang, Korris Chung
6 min. talk | August 7th at 11:30 | Session: ML: Self-supervised Learning
Time series self-supervised learning (SSL) aims to exploit unlabeled data for pre-training to mitigate the reliance on labels. Despite the great success in recent years, there is limited discussion on the potential noise in the time series, which can severely impair the performance of existing SSL methods. To mitigate the noise, the de facto strategy is to apply conventional denoising methods before model training. However, this pre-processing approach may not fully eliminate the effect of noise in SSL for two reasons: (i) the diverse types of noise in time series make it difficult to automatically determine suitable denoising methods; (ii) noise can be amplified after mapping raw data into latent space. In this paper, we propose denoising-aware contrastive learning (DECL), which uses contrastive learning objectives to mitigate the noise in the representation and automatically selects suitable denoising methods for every sample. Extensive experiments on various datasets verify the effectiveness of our method. The code is open-sourced.
List of keywords
Machine Learning -> ML: Self-supervised Learning
Machine Learning -> ML: Classification
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Time series and data streams
3496
Improving Pseudo Labels with Global-Local Denoising Framework for Cross-lingual Named Entity Recognition
Zhuojun Ding, Wei Wei, Xiaoye Qu, Dangyang Chen
6 min. talk | August 6th at 15:00 | Session: NLP: Natural Language Processing (1/3)
Cross-lingual named entity recognition (NER) aims to train an NER model for the target language leveraging only labeled source-language data and unlabeled target-language data. Prior approaches either perform label projection on translated source-language data or employ a source model to assign pseudo labels to target-language data and train a target model on these pseudo-labeled data to generalize to the target language. However, these automatic labeling procedures inevitably introduce noisy labels, thus leading to a performance drop. In this paper, we propose a Global-Local Denoising framework (GLoDe) for cross-lingual NER. Specifically, GLoDe introduces a progressive denoising strategy to rectify incorrect pseudo labels by leveraging both global and local distribution information in the semantic space. The refined pseudo-labeled target-language data significantly improves the model's generalization ability. Moreover, previous methods only consider improving the model with language-agnostic features; however, we argue that target language-specific features are also important and should not be ignored. To this end, we employ a simple auxiliary task to achieve this goal. Experimental results on two benchmark datasets with six target languages demonstrate that our proposed GLoDe significantly outperforms current state-of-the-art methods.
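As a rough picture of local-distribution denoising, here is a simplified nearest-neighbour majority vote in feature space, our own stand-in rather than GLoDe's actual progressive strategy:

    import numpy as np

    def local_denoise(feats, pseudo, k=5):
        # Relabel each sample by the majority pseudo label among its
        # k nearest neighbours in feature space.
        d = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)                  # exclude self-matches
        nn = np.argsort(d, axis=1)[:, :k]
        refined = pseudo.copy()
        for i, neigh in enumerate(nn):
            refined[i] = np.bincount(pseudo[neigh]).argmax()
        return refined

    feats = np.random.randn(100, 32)
    labels = np.random.randint(0, 4, size=100)
    print((local_denoise(feats, labels) != labels).sum(), "labels rectified")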
List of keywords
Natural Language Processing -> NLP: Named entities
Natural Language Processing -> NLP: Information extraction
3498
BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion
Yonghao Yu, Shunan Zhu, Huai Qin, Haorui Li
6 min. talk | August 9th at 11:30 | Session: ML: Generative models
Witnessing the evolution of text-to-image diffusion models, significant strides have been made in text-to-3D generation. Currently, two primary paradigms dominate the field of text-to-3D: the feed-forward generation solutions, capable of swiftly producing 3D assets but often yielding coarse results, and the Score Distillation Sampling (SDS) based solutions, known for generating high-fidelity 3D assets albeit at a slower pace. The synergistic integration of these methods holds substantial promise for advancing 3D generation techniques. In this paper, we present BoostDream, a highly efficient plug-and-play 3D refining method designed to transform coarse 3D assets into high-quality ones. The BoostDream framework comprises three distinct processes: (1) We introduce 3D model distillation that fits differentiable representations from the 3D assets obtained through feed-forward generation. (2) A novel multi-view SDS loss is designed, which utilizes a multi-view aware 2D diffusion model to refine the 3D assets. (3) We propose to use prompt and multi-view consistent normal maps as guidance in refinement. Our extensive experiments are conducted on different differentiable 3D representations, revealing that BoostDream excels in generating high-quality 3D assets rapidly, overcoming the Janus problem compared to conventional SDS-based methods. This breakthrough signifies a substantial advancement in both the efficiency and quality of 3D generation processes.
List of keywords
Machine Learning -> ML: Generative models
Computer Vision -> CV: 3D computer vision
Multidisciplinary Topics and Applications -> MTA: Arts and creativity
3519
M2Beats: When Motion Meets Beats in Short-form Videos
Dongxiang Jiang, Yongchang Zhang, Shuai He, Anlong Ming
6 min. talk | August 8th at 15:00 | Session: CV: Image and video synthesis and generation (2/2)
In recent years, short-form videos have gained popularity, and the editing of these videos, particularly when motion is synchronized with music, is highly favored due to its beat-matching effect. However, detecting motion rhythm poses a significant challenge, as it is influenced by multiple factors that make it difficult to define using explicit rules. While traditional methods attempt to define motion rhythm, they often yield unsatisfactory results. On the other hand, learning-based methods can extract motion rhythm without relying on explicit rules but require high-quality datasets. Unfortunately, existing datasets simply substitute music rhythm for motion rhythm, although the two are not equivalent. To address these challenges, we present the motion rhythm dataset AIST-M2B, which is annotated with meticulously curated motion rhythm labels derived from the profound correlation between motion and music in professional dance. We propose a novel network architecture called M2BNet that is specifically trained on AIST-M2B to effectively extract intricate motion rhythms by incorporating both human body structure and temporal information. Additionally, we introduce a pioneering algorithm for enhancing motion rhythm synchronization with beats. Experimental results substantiate the superior performance of our method compared to other existing algorithms in the domain of motion rhythm analysis. Our code is available at https://github.com/mRobotit/M2Beats.
List of keywords
Computer Vision -> CV: Image and video synthesis and generation 
Computer Vision -> CV: Image and video retrieval 
Computer Vision -> CV: Motion and tracking
3521
Advancing Medical Image Segmentation via Self-supervised Instance-adaptive Prototype Learning
Guoyan Liang, Qin Zhou, Jingyuan Chen, Zhe Wang, Chang Yao
6 min. talk | August 9th at 10:00 | Session: CV: Biomedical image analysis
Medical Image Segmentation (MIS) plays a crucial role in medical therapy planning and robot navigation. Prototype learning methods in MIS focus on generating segmentation masks through pixel-to-prototype comparison. However, current approaches often overlook sample diversity by using a fixed prototype per semantic class and neglect intra-class variation within each input. In this paper, we propose to generate instance-adaptive prototypes for MIS by integrating a common prototype proposal (CPP) capturing common visual patterns and an instance-specific prototype proposal (IPP) tailored to each input. To further account for the intra-class variation, we propose to guide the IPP generation by re-weighting the intermediate feature maps according to their confidence scores. These confidence scores are hierarchically generated using a transformer decoder. Additionally, we introduce a novel self-supervised filtering strategy to prioritize the foreground pixels during the training of the transformer decoder. Extensive experiments demonstrate the favorable performance of our method.
List of keywords
Computer Vision -> CV: Biomedical image analysis
Computer Vision -> CV: Representation learning
Computer Vision -> CV: Segmentation
Machine Learning -> ML: Self-supervised Learning
3525
Spatial-Temporal Perceiving: Deciphering User Hierarchical Intent in Session-Based Recommendation
Xiao Wang, Tingting Dai, Qiao Liu, Shuang Liang
6 min. talk | August 9th at 10:00 | Session: DM: Recommender systems
Session-based recommendation (SBR) aims to predict the next-interacted item based on anonymous users’ behavior sequences. The main challenge is how to recognize the user intent with limited interactions to achieve a more accurate inference of user behavior. Existing works usually regard several consecutive items in the current session as intent. However, we argue such intent generation based on temporal transition ignores the fact that each item also has its semantically connected items in the feature space, which can be regarded as spatial intent. The limited consideration of intent fails to capture complex behavioral patterns in real-world scenarios, leading to sub-optimal solutions. To address this issue, we propose the Hierarchical Intent Perceiving Contrastive Learning Framework (HearInt) for SBR, which proposes a hierarchical consideration of intents from both temporal and spatial perspective. Specifically, we first propose that the user’s temporal intents are mutually exclusive while the spatial intents are mutually compatible. Following these analyses, we design a Temporal Intent Decoupling module to mitigate the mutual influence of long-term and short-term intents, and a Cross-scale Contrastive Learning task to enhance the consistency of intents across different spatial scales. Experimental results on three real-world datasets exhibit that HearInt achieves state-of-the-art performance.
List of keywords
Data Mining -> DM: Recommender systems
3528
Towards Robust Trajectory Representations: Isolating Environmental Confounders with Causal Learning
Kang Luo, Yuanshao Zhu, Wei Chen, Kun Wang, Zhengyang Zhou, Sijie Ruan, Yuxuan Liang
12 min. talk | August 6th at 15:00 | Session: MTA: Transportation
Trajectory modeling refers to characterizing human movement behavior, serving as a pivotal step in understanding mobility patterns. Nevertheless, existing studies typically ignore the confounding effects of geospatial context, leading to the acquisition of spurious correlations and limited generalization capabilities. To bridge this gap, we initially formulate a Structural Causal Model (SCM) to decipher the trajectory representation learning process from a causal perspective. Building upon the SCM, we further present a Trajectory modeling framework (TrajCL) based on Causal Learning, which leverages the backdoor adjustment theory as an intervention tool to eliminate the spurious correlations between geospatial context and trajectories. Extensive experiments on two real-world datasets verify that TrajCL markedly enhances performance in trajectory classification tasks while showcasing superior generalization and interpretability.
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
Multidisciplinary Topics and Applications -> MTA: Transportation
3538
Practical Hybrid Gradient Compression for Federated Learning Systems
Sixu Hu, Linshan Jiang, Bingsheng He
6 min. talk | August 6th at 15:00 | Session: ML: Federated learning (1/2)
The high communication cost is a major challenge in the federated learning (FL) training process. Several methods have been proposed to reduce communication costs on the uplink channel, primarily sparsification-based methods, which have overlooked the impact of downlink channels. However, model accuracy and communication cost issues arise when applying them in practical FL applications, especially when the bandwidth is limited both on the uplink and downlink channels. In this paper, we propose a novel secure-FL-compatible hybrid gradient compression framework (HGC) that handles both uplink and downlink communication. Specifically, HGC identifies and exploits three types of redundancies in the FL training process. With proposed optimization methods based on compression ratio correction and dynamic momentum correction, HGC improves the trade-off between communication cost and model performance. The extensive theoretical and empirical analysis demonstrates the effectiveness of our framework in achieving a high compression ratio for both uplink and downlink communications with negligible loss of model accuracy, surpassing the state-of-the-art compression methods.
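For background, the standard sparsification building block that such frameworks refine is top-k compression with error feedback; a generic sketch (not HGC's exact scheme):

    import numpy as np

    def compress_with_feedback(grad, residual, ratio=0.01):
        # Add back previously dropped mass, then keep the k largest entries.
        corrected = grad + residual
        k = max(1, int(corrected.size * ratio))
        idx = np.argpartition(np.abs(corrected), -k)[-k:]
        sparse = np.zeros_like(corrected)
        sparse[idx] = corrected[idx]
        return idx, corrected[idx], corrected - sparse   # new residual

    g = np.random.randn(10_000)
    residual = np.zeros_like(g)
    idx, vals, residual = compress_with_feedback(g, residual)
    print(f"sent {vals.size} of {g.size} values upstream")

Carrying the residual forward means dropped gradient mass is eventually transmitted, which is what keeps such schemes from losing accuracy at high compression ratios.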
List of keywords
Machine Learning -> ML: Federated learning
3549
Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns for Traffic Speed Prediction
Yicheng Zhou, Pengfei Wang, Hao Dong, Denghui Zhang, Dingqi Yang, Yanjie Fu, Pengyang Wang
6 min. talk | August 8th at 10:00 | Session: DM: Mining spatial and/or temporal data (2/2)
Urban traffic speed prediction aims to estimate the future traffic speed for improving urban transportation services. Enormous efforts have been made to exploit Graph Neural Networks (GNNs) for modeling spatial correlations and temporal dependencies of traffic speed evolving patterns, regularized by graph topology. While achieving promising results, current traffic speed prediction methods still suffer from ignoring topology-free patterns, which cannot be captured by GNNs. To tackle this challenge, we propose a generic model for enabling the current GNN-based methods to preserve topology-free patterns. Specifically, we first develop a Dual Cross-Scale Transformer (DCST) architecture, including a Spatial Transformer and a Temporal Transformer, to preserve the cross-scale topology-free patterns and associated dynamics, respectively. Then, to further integrate both topology-regularized/-free patterns, we propose a distillation-style learning framework, in which the existing GNN-based methods are considered as the teacher model, and the proposed DCST architecture is considered as the student model. The teacher model would inject the learned topology-regularized patterns into the student model for integrating topology-free patterns. The extensive experimental results demonstrated the effectiveness of our methods.
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
3577
AK4Prompts: Aesthetics-driven Automatically Keywords-Ranking for Prompts in Text-To-Image Models
Haiyang Zhang, Mengchao Wang, Shuai He, Anlong Ming
6 min. talk | August 8th at 11:30 | Session: CV: Image and video synthesis and generation (1/2)
Current text-to-image synthesis (TIS) models have demonstrated the ability to generate high-fidelity images based on textual prompts. However, the efficacy of these models heavily relies on the keywords present in the prompts, and there is a dearth of objective analysis regarding how different keywords impact the ultimate quality of generated results. Therefore, manual evaluation becomes necessary, yet it is limited and inefficient for ascertaining the role played by keywords. In this paper, we propose automated keywords-ranking for prompts (AK4Prompts), a keyword evaluation model based on mainstream TIS models that explicitly quantifies the multidimensional impact of various keywords on image generation based on prompts. To enable personalized keyword evaluation based on prompt content, we propose decoupling the latent representations of keywords and prompts in TIS models, followed by integrating the semantic features of prompts into keywords. For quantitative and multidimensional evaluation, we align the fused features of keywords using HPSv2, aesthetic score, and CLIP score, each representing distinct factors contributing to keyword impact. Our AK4Prompts can flexibly and automatically select the keywords that best match the original prompt based on individual user preferences. Extensive experimental results show the superiority of AK4Prompts in significantly improving the quality of generated images over strong baselines. Our approach not only enhances usability and user experience but also addresses the current gap in automated analysis and evaluation of keyword effects. Our code is available at https://github.com/mRobotit/AK4Prompts.
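The final ranking step reduces to a preference-weighted combination of per-keyword scores. A hypothetical sketch with fabricated numbers (the real model derives these from HPSv2, aesthetic, and CLIP evaluations):

    def rank_keywords(scores, weights):
        # scores: {keyword: (hps, aesthetic, clip)}; weights: same-order triple.
        combined = {k: sum(w * s for w, s in zip(weights, v))
                    for k, v in scores.items()}
        return sorted(combined, key=combined.get, reverse=True)

    toy = {"masterpiece": (0.8, 0.9, 0.6),
           "4k": (0.5, 0.7, 0.4),
           "oil painting": (0.6, 0.8, 0.7)}
    print(rank_keywords(toy, weights=(0.5, 0.3, 0.2)))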
List of keywords
Computer Vision -> CV: Image and video synthesis and generation 
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Machine learning for vision
3578
Label-efficient Semantic Scene Completion with Scribble Annotations
Song Wang, Jiawei Yu, Wentong Li, Hao Shi, Kailun Yang, Junbo Chen, Jianke Zhu
6 min. talk | August 9th at 11:30 | Session: CV: Computer Vision (1/2)
Semantic scene completion aims to infer 3D geometric structures with semantic classes from camera or LiDAR data, providing essential occupancy information for autonomous driving. Prior endeavors concentrate on constructing networks or benchmarks in a fully supervised manner, while the dense occupancy grids need point-wise semantic annotations, which incur expensive and tedious labeling costs. In this paper, we build a new label-efficient benchmark, named ScribbleSC, where sparse scribble-based semantic labels are combined with dense geometric labels for semantic scene completion. In particular, we propose a simple yet effective approach called Scribble2Scene, which bridges the gap between sparse scribble annotations and full supervision. Our method consists of geometric-aware auto-labeler construction and online model training with an offline-to-online distillation module to enhance performance. Experiments on SemanticKITTI demonstrate that Scribble2Scene achieves competitive performance against fully-supervised counterparts, reaching 99% of the performance of fully-supervised models with only 13.5% of voxels labeled. Both the annotations of ScribbleSC and our full implementation are available at https://github.com/songw-zju/Scribble2Scene.
List of keywords
Computer Vision -> CV: Scene analysis and understanding   
Computer Vision -> CV: Applications
3582
Generalized Taxonomy-Guided Graph Neural Networks
Yu Zhou, Di Jin, Jianguo Wei, Dongxiao He, Zhizhi Yu, Weixiong Zhang
6 min. talk | August 6th at 15:00 | Session: DM: Mining graphs (2/3)
Graph neural networks have been demonstrated to be an effective analytic apparatus for mining network data. Most real-world networks are inherently hierarchical, offering unique opportunities to acquire latent, intrinsic network organizational properties by utilizing network taxonomies. The existing approaches for learning implicit hierarchical network structures focus on introducing taxonomy into graph neural networks but often fall short of exploiting the rich network semantics and structural properties in the taxonomy, resulting in poor generalizability and reusability. To address these issues, we propose generalized Taxonomy-Guided Graph Neural Networks (TG-GNN) to integrate taxonomy into network representation learning. We first construct a taxonomy representation learning module that introduces the concept of the ego network to propagate and aggregate rich semantic and structural information in the taxonomy. We then design a taxonomy-guided Markov mechanism, which encapsulates taxonomy knowledge in pairwise potential functions, to refine network embeddings. Extensive experiments on various real-world networks illustrate the effectiveness of TG-GNN over the state-of-the-art methods in scenarios involving incomplete taxonomies and inductive settings.
List of keywords
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Sequence and graph learning
3586
Individual Causal Structure Learning from Population Data
Wei Chen, Xiaokai Huang, Zijian Li, Ruichu Cai, Zhiyi Huang, Zhifeng Hao
6 min. talk | August 7th at 10:00 | Session: UAI: Causality, structural causal models and causal inference
Learning the causal structure of each individual plays a crucial role in fields such as neuroscience and biology. Existing methods consider data from each individual separately, which may yield inaccurate causal structure estimations in limited samples. To leverage more samples, we consider incorporating data from all individuals as population data. We observe that the variables of all individuals are influenced by the common environment variables they share. These shared environment variables can be modeled as latent variables and serve as a bridge connecting data from different individuals. In particular, we propose an Individual Linear Acyclic Model (ILAM) for each individual from population data, which models each individual's variables as being linearly influenced by their parents, in addition to environment variables and noise terms. Theoretical analysis shows that the model is identifiable when all environment variables are non-Gaussian, or even if some are Gaussian, provided there is adequate diversity in the noise variances across individuals. We then develop an individual causal structure learning method based on the Share Independence Component Analysis technique. Experimental results on synthetic and real-world data demonstrate the correctness of the method even when the sample size of each individual's data is small.
List of keywords
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
3589
Robust Contrastive Multi-view Kernel Clustering
Peng Su, Yixi Liu, Shujian Li, Shudong Huang, Jiancheng Lv
6 min. talk | August 6th at 15:00 | Session: ML: Multi-view learning
Multi-view kernel clustering (MKC) aims to fully reveal the consistency and complementarity of multiple views in a potential Hilbert space, thereby enhancing clustering performance. The clustering results of most MKC methods are highly sensitive to the quality of the constructed kernels, as traditional methods independently compute kernel matrices for each view without fully considering complementary information across views. In previous contrastive multi-view kernel learning, the goal was to bring cross-view instances of the same sample closer during the kernel construction process while pushing apart instances across samples to achieve a comprehensive integration of cross-view information. However, its inherent drawback is the potential inappropriate amplification of distances between different instances of the same clusters (i.e., false negative pairs) during the training process, leading to a reduction in inter-class discriminability. To address this challenge, we propose a Robust Contrastive multi-view kernel Learning approach (R-CMK) against false negative pairs. It partitions negative pairs into different intervals based on distance or similarity, and for false negative pairs, reverses their optimization gradient. This effectively avoids further amplification of distances for false negative pairs while simultaneously pushing true negative pairs farther apart. We conducted comprehensive experiments on various MKC methods to validate the effectiveness of the proposed method. The code is available at https://github.com/Duo-laimi/rcmk_main.
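A minimal sketch of the reversed-gradient treatment of suspected false negatives follows; this is our own toy loss with an illustrative similarity threshold, not R-CMK's full interval-based scheme:

    import torch

    def robust_negative_loss(sim_neg, threshold=0.8):
        # sim_neg: similarities of negative pairs; above the threshold we
        # suspect a false negative and reverse the push-apart gradient.
        suspected_fn = sim_neg > threshold
        push = sim_neg[~suspected_fn].sum()   # minimize -> repel true negatives
        pull = sim_neg[suspected_fn].sum()    # reversed sign -> attract instead
        return push - pull

    sim = torch.rand(32, requires_grad=True)
    robust_negative_loss(sim).backward()

Minimizing this loss drives ordinary negative pairs apart while pulling highly similar (likely same-cluster) pairs together, which is the gradient-reversal effect described above.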
List of keywords
Machine Learning -> ML: Multi-view learning
Machine Learning -> ML: Clustering
Machine Learning -> ML: Kernel methods
3593
Regression Residual Reasoning with Pseudo-labeled Contrastive Learning for Uncovering Multiple Complex Compositional Relations
Chengtai Li, Yuting He, Jianfeng Ren, Ruibin Bai, Yitian Zhao, Heng Yu, Xudong Jiang
6 min. talk | August 8th at 10:00 | Session: KRR: Learning and reasoning
Abstract Visual Reasoning (AVR) has been widely studied in literature. Our study reveals that AVR models tend to rely on appearance matching rather than a genuine understanding of underlying rules. We hence develop a challenging benchmark, Multiple Complex Compositional Reasoning (MC2R), composed of diverse compositional rules on attributes with intentionally increased variations. It aims to identify two outliers from five given images, in contrast to single-answer questions in previous AVR tasks. To solve MC2R tasks, a Regression Residual Reasoning with Pseudo-labeled Contrastive Learning (R3PCL) is proposed, which first transforms the original problem by selecting three images following the same rule, and iteratively regresses one normal image by using the other two, allowing the model to gradually comprehend the underlying rules. The proposed PCL leverages a set of min-max operations to generate more reliable pseudo labels, and exploits contrastive learning with data augmentation on pseudo-labeled images to boost the discrimination and generalization of features. Experimental results on two AVR datasets show that the proposed R3PCL significantly outperforms state-of-the-art models.
List of keywords
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
3597
ScreenAgent: A Vision Language Model-driven Computer Control Agent
Runliang Niu, Jindong Li, Shiqi Wang, Yali Fu, Xiyu Hu, Xueyuan Leng, He Kong, Yi Chang, Qi Wang
6 min. talk | August 6th at 11:30 | Session: NLP: Dialogue and interactive systems
Large Language Models (LLMs) can invoke a variety of tools and APIs to complete complex tasks. The computer, as the most powerful and universal tool, could potentially be controlled by a trained LLM agent. Powered by the computer, we can hopefully build a more generalized agent to assist humans in various daily digital works. In this paper, we construct an environment for a Vision Language Model (VLM) agent to interact with a real computer screen. Within this environment, the agent can observe screenshots and manipulate the Graphical User Interface (GUI) by outputting mouse and keyboard actions. We also design an automated control pipeline that includes planning, acting, and reflecting phases, guiding the agent to continuously interact with the environment and complete multi-step tasks. Additionally, we construct the ScreenAgent Dataset, which collects screenshots and action sequences when completing daily computer tasks. Finally, we train a model, ScreenAgent, which achieves computer control capabilities comparable to GPT-4V and demonstrates more precise UI positioning capabilities. Our attempts could inspire further research on building a generalist LLM agent. The code and more detailed information are at https://github.com/niuzaisheng/ScreenAgent.
List of keywords
Natural Language Processing -> NLP: Dialogue and interactive systems
Agent-based and Multi-agent Systems -> MAS: Human-agent interaction
Computer Vision -> CV: Vision, language and reasoning
Natural Language Processing -> NLP: Resources and evaluation
3600
TFLOP: Table Structure Recognition Framework with Layout Pointer Mechanism
Minsoo Khang, Teakgyu Hong
6 min. talk | August 7th at 15:00 | Session: CV: Recognition (object detection, categorization)
Table Structure Recognition (TSR) is a task aimed at converting table images into a machine-readable format (e.g., HTML) to facilitate other applications such as information retrieval. Recent works tackle this problem by identifying the HTML tags and text regions, where the latter are used for text extraction from the table document. These works, however, suffer from misalignment issues when mapping text into the identified text regions. In this paper, we introduce a new TSR framework, called TFLOP (TSR Framework with LayOut Pointer mechanism), which reformulates the conventional text region prediction and matching into a direct text region pointing problem. Specifically, TFLOP utilizes text region information to identify both the table's structure tags and its aligned text regions simultaneously. Without the need for region prediction and alignment, TFLOP circumvents the additional text region matching stage, which requires finely-calibrated post-processing. TFLOP also employs span-aware contrastive supervision to enhance the pointing mechanism in tables with complex structure. As a result, TFLOP achieves state-of-the-art performance across multiple benchmarks such as PubTabNet, FinTabNet, and SynthTabNet. In our extensive experiments, TFLOP not only exhibits competitive performance but also shows promising results in industrial document TSR scenarios, such as documents with watermarks or in non-English domains. The source code of our work is publicly available at: https://github.com/UpstageAI/TFLOP.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Applications
Natural Language Processing -> NLP: Applications
3601
A Bias-Free Revenue-Maximizing Bidding Strategy for Data Consumers in Auction-based Federated Learning
Xiaoli Tang, Han Yu, Zengxiang Li, Xiaoxiao Li
6 min. talk | August 6th at 15:00 | Session: ML: Federated learning (1/2)
Auction-based Federated Learning (AFL) is a burgeoning research area. However, existing bidding strategies for AFL data consumers (DCs) primarily focus on maximizing expected accumulated utility, disregarding the more complex goal of revenue maximization. They also only consider winning bids, leading to biased estimates by overlooking information from losing bids. To address these issues, we propose a Bias-free Revenue-maximizing Federated bidding strategy for DCs in AFL (BR-FEDBIDDER). Our theoretical exploration of the relationships between Return on Investment (ROI), bid costs, and utility, and their impact on overall revenue underscores the complexity of maximizing revenue solely by prioritizing ROI enhancement. Leveraging these insights, BR-FEDBIDDER optimizes bid costs with any given ROI constraint. In addition, we incorporate an auxiliary task of winning probability estimation into the framework to achieve bias-free learning by leveraging bid records from historical bid requests, including both winning and losing ones. Extensive experiments on six widely used benchmark datasets show that BR-FEDBIDDER outperforms eight state-of-the-art methods, surpassing the best-performing baseline by 5.66%, 6.08% and 2.44% in terms of the total revenue, ROI, and test accuracy of the resulting FL models, respectively.
List of keywords
Machine Learning -> ML: Federated learning
3643
A Neural Column Generation Approach to the Vehicle Routing Problem with Two-Dimensional Loading and Last-In-First-Out Constraints
Yifan Xia, Xiangyi Zhang
6 min. talk | August 7th at 10:00 | Session: CSO: Constraint optimization problems
The vehicle routing problem with two-dimensional loading constraints (2L-CVRP) and the last-in-first-out (LIFO) rule presents significant practical and algorithmic challenges. While numerous heuristic approaches have been proposed to address its complexity, which stems from two NP-hard problems, the vehicle routing problem (VRP) and the two-dimensional bin packing problem (2D-BPP), less attention has been paid to developing exact algorithms. Bridging this gap, this article presents an exact algorithm that integrates advanced machine learning techniques, specifically a novel combination of attention and recurrence mechanisms. This integration accelerates the state-of-the-art exact algorithm by a median of 29.79% across various problem instances. Moreover, the proposed algorithm successfully resolves an open instance in the standard test-bed, demonstrating the significant improvements brought about by the incorporation of machine learning models. Code is available at https://github.com/xyfffff/NCG-for-2L-CVRP.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint optimization problems
Constraint Satisfaction and Optimization -> CSO: Modeling
Machine Learning -> ML: Applications
Multidisciplinary Topics and Applications -> MTA: Transportation
3654
KDDC: Knowledge-Driven Disentangled Causal Metric Learning for Pre-Travel Out-of-Town Recommendation
Yinghui Liu, Guojiang Shen, Chengyong Cui, Zhenzhen Zhao, Xiao Han, Jiaxin Du, Xiangyu Zhao, Xiangjie Kong
6 min. talk | August 9th at 10:00 | Session: DM: Recommender systems
Pre-travel recommendation is developed to provide a variety of out-