SC8842
Self-Repellent Random Walks on General Graphs – Achieving Minimal Sampling Variance via Nonlinear Markov Chains (Extended Abstract)
12 min. talk | August 6th at 11:30 | Session: DM: Mining graphs (1/3)
We consider random walks on discrete state spaces, such as general undirected graphs, where the random walkers are designed to approximate a target quantity over the network topology via sampling and neighborhood exploration in the form of Markov chain Monte Carlo (MCMC) procedures. Given any Markov chain corresponding to a target probability distribution, we design a self-repellent random walk (SRRW) which is less likely to transition to nodes that were highly visited in the past, and more likely to transition to seldom-visited nodes. For a class of SRRWs parameterized by a positive real α, we prove that the empirical distribution of the process converges almost surely to the target (stationary) distribution of the underlying Markov chain kernel. We then provide a central limit theorem and derive the exact form of the arising asymptotic covariance matrix, which allows us to show that an SRRW with stronger repellence (larger α) always achieves a smaller asymptotic covariance, in the sense of the Loewner ordering of covariance matrices. In particular, for SRRW-driven MCMC algorithms, we show that the decrease in the asymptotic sampling variance is of order O(1/α), eventually going down to zero. After generalizing these results to a class of weighted empirical measures, we use them as a stepping stone to show that a similar performance ordering can also be obtained for distributed stochastic optimization tasks using token algorithms. More explicitly, by replacing a Markovian token with an SRRW version with the same target distribution, we show that the asymptotic covariance of the optimization iterates decreases at rate O(1/α^2), thereby amplifying the performance benefit of using SRRW in the stochastic optimization context. Empirical results support our theoretical findings.
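The transition mechanism can be made concrete with a minimal NumPy sketch: a base chain's transition probabilities are re-weighted by the ratio of empirical visit frequency to target probability, raised to the power −α, so over-visited nodes are avoided. The specific weighting and all names below are illustrative assumptions, not the paper's exact kernel.

```python
import numpy as np

def srrw_step(P, mu, visits, current, alpha, rng):
    """One step of a self-repellent random walk (illustrative sketch only).

    P       : base Markov transition matrix whose stationary distribution is mu
    mu      : target distribution
    visits  : running visit counts for every node
    current : index of the current node
    alpha   : repellence strength; alpha = 0 recovers the base chain
    """
    # Empirical occupation measure so far (smoothed to avoid division by zero).
    x = (visits + 1.0) / (visits + 1.0).sum()
    # Down-weight moves toward nodes visited more often than their target share.
    weights = P[current] * (x / mu) ** (-alpha)
    probs = weights / weights.sum()
    return rng.choice(len(mu), p=probs)

# Usage: a symmetric 3-node chain; the walk's empirical distribution tracks mu.
rng = np.random.default_rng(0)
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
mu = np.full(3, 1.0 / 3.0)
visits = np.zeros(3)
node = 0
for _ in range(10_000):
    visits[node] += 1
    node = srrw_step(P, mu, visits, node, alpha=2.0, rng=rng)
print(visits / visits.sum())  # close to mu
```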
SC8844
A Single Vector Is Not Enough: Taxonomy Expansion via Box Embeddings (Extended Abstract)
6 min. talk | August 7th at 15:00 | Session: KRR: Knowledge Representation and Reasoning (1/2)
Taxonomies support various practical web applications such as product navigation in online shopping and user profile tagging on social platforms. Most existing methods for expanding taxonomies encode entities into vector embeddings (i.e., single points). However, we argue that vectors are insufficient to model the “is-a” hierarchy in a taxonomy (an asymmetric relation), because two points can only represent pairwise similarity (a symmetric relation). To this end, we propose to project taxonomy entities into boxes (i.e., hyperrectangles). Two boxes can be contained (one within the other), disjoint, or intersecting, thus naturally representing an asymmetric taxonomic hierarchy. Building on box embeddings, we propose BoxTaxo, a novel model for taxonomy expansion. The core of BoxTaxo is to learn boxes for entities that capture their child-parent hierarchies. Extensive experiments on two benchmarks demonstrate the effectiveness of BoxTaxo compared to vector-based models.
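As a rough illustration of why boxes can express the asymmetric “is-a” relation, the sketch below scores containment by the fraction of the child box covered by the parent box. This toy score and all names are our own illustration, not BoxTaxo's training objective.

```python
import numpy as np

class Box:
    """An axis-aligned hyperrectangle given by its min/max corners."""
    def __init__(self, lo, hi):
        self.lo, self.hi = np.asarray(lo, float), np.asarray(hi, float)

    def volume(self):
        return float(np.prod(np.clip(self.hi - self.lo, 0.0, None)))

    def intersect(self, other):
        return Box(np.maximum(self.lo, other.lo), np.minimum(self.hi, other.hi))

def containment(child, parent):
    """Score for "child is-a parent": vol(child ∩ parent) / vol(child); asymmetric."""
    v = child.volume()
    return child.intersect(parent).volume() / v if v > 0 else 0.0

# "dog" sits inside "animal", so the is-a score is high in one direction only.
animal = Box([0.0, 0.0], [1.0, 1.0])
dog = Box([0.2, 0.2], [0.4, 0.4])
print(containment(dog, animal))  # 1.0
print(containment(animal, dog))  # 0.04
```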
SC8845
All in One: Multi-task Prompting for Graph Neural Networks (Extended Abstract)
12 min. talk | August 6th at 15:00 | Session: DM: Mining graphs (2/3)
This paper is an extended abstract of our original work published at KDD 2023, where it won the best research paper award. The paper introduces a novel approach to bridging the gap between pre-trained graph models and the diverse tasks they are applied to, inspired by the success of prompt learning in NLP. Recognizing the challenge of aligning pre-trained models with varied graph tasks (node-level, edge-level, and graph-level), which can lead to negative transfer and poor performance, we propose a multi-task prompting method for graphs. This method unifies graph and language prompt formats, enabling NLP’s prompting strategies to be adapted for graph tasks. By analyzing the task space of graph applications, we reformulate problems to fit graph-level tasks and apply meta-learning to improve prompt initialization for multiple tasks. Experiments show our method’s effectiveness in enhancing model performance across different graph tasks. Beyond the original work, this extended abstract also discusses graph prompting from a broader perspective and surveys some of the latest work in this area.
SC8846
Towards Efficient MCMC Sampling in Bayesian Neural Networks by Exploiting Symmetry (Extended Abstract)
12 min. talk | August 6th at 15:00 | Session: ML: Machine Learning (2/6)
Bayesian inference in deep neural networks is challenging due to the high-dimensional, strongly multi-modal parameter posterior density landscape. Markov chain Monte Carlo approaches asymptotically recover the true posterior but are considered prohibitively expensive for large modern architectures. We argue that the dilemma between exact-but-unaffordable and cheap-but-inexact approaches can be mitigated by exploiting symmetries in the posterior landscape. We show theoretically that the posterior predictive density in Bayesian neural networks can be restricted to a symmetry-free parameter reference set. By further deriving an upper bound on the number of Monte Carlo chains required to capture the functional diversity, we propose a straightforward approach for feasible Bayesian inference.
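The kind of symmetry being exploited can be shown with a small sketch: permuting the hidden units of a network yields a different point in parameter space but exactly the same function, so the posterior contains many functionally equivalent copies of each mode. The snippet is illustrative only and independent of the paper's specific reference-set construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    """A tiny one-hidden-layer network: x -> tanh(W1 x + b1) -> W2 h + b2."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

x = rng.normal(size=4)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

# Permuting the hidden units gives a different point in parameter space ...
perm = rng.permutation(8)
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

# ... but exactly the same function, hence a symmetric copy of the posterior mode.
print(np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2)))  # True
```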
SC8847
gSASRec: Reducing Overconfidence in Sequential Recommendation Trained with Negative Sampling (Extended Abstract)
12 min. talk | August 9th at 10:00 | Session: DM: Recommender systems
Sequential recommendation models predict the next item in a sequence of user-item interactions, akin to how language models predict the next token. These models often adapt language model architectures, treating item IDs as if they were token IDs. However, the number of potential items in recommender systems makes calculating the interaction probability for all items impractical during training; therefore, recommender systems frequently employ negative sampling, where the model learns to differentiate between actual user interactions (positives) and randomly chosen non-interactions (negatives), often using the Binary Cross-Entropy (BCE) loss and framing the problem as a binary classification task.
We demonstrate that using negative sampling with BCE can lead to overconfidence, where the model’s predicted probabilities for user interactions are higher than the actual probabilities. Although the absolute score magnitude is not important for ranking items (only the order of scores matters), overconfidence leads to training instability when using the BCE loss. We show that overconfidence explains the performance gap between two leading sequential recommendation models, SASRec and BERT4Rec: the former uses negative sampling, while the latter does not. To counter overconfidence, we introduce the Generalised Binary Cross-Entropy (gBCE) loss and the gSASRec model that utilises gBCE. We mathematically prove and empirically validate that gSASRec effectively addresses the issue of overconfidence. Consequently, gSASRec outperforms SASRec and matches the state-of-the-art BERT4Rec while retaining negative sampling. On the Gowalla dataset with more than one million items, where training BERT4Rec is infeasible, gSASRec outperforms the original SASRec model by 41% in terms of NDCG@10.
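For context, here is a minimal PyTorch sketch of the sampled-BCE setup the abstract analyses; the gBCE correction itself is not reproduced here, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def sampled_bce_loss(pos_scores, neg_scores):
    """BCE with negative sampling: positives vs. k randomly drawn negatives.

    pos_scores: (batch,)    logits for observed interactions
    neg_scores: (batch, k)  logits for k sampled non-interactions per positive
    """
    pos_loss = F.binary_cross_entropy_with_logits(pos_scores, torch.ones_like(pos_scores))
    neg_loss = F.binary_cross_entropy_with_logits(neg_scores, torch.zeros_like(neg_scores))
    # With k far smaller than the catalogue size, the implied probabilities
    # sigmoid(pos_scores) can rise far above the true interaction probabilities:
    # the overconfidence discussed in the abstract.
    return pos_loss + neg_loss

# Usage: 4 positives, 16 sampled negatives each (the catalogue may hold millions).
loss = sampled_bce_loss(torch.randn(4), torch.randn(4, 16))
```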
SC8848
MPGraf: a Modular and Pre-trained Graphformer for Learning to Rank at Web-scale (Extended Abstract)
12 min. talk | August 6th at 15:00 | Session: DM: Mining graphs (2/3)
Both Transformers and Graph Neural Networks (GNNs) have been used in learning to rank (LTR); however, they adhere to two distinct yet complementary problem formulations: ranking score regression based on query-webpage pairs, and link prediction within query-webpage bipartite graphs, respectively. Though it is possible to pre-train GNNs or Transformers on source datasets and fine-tune them separately on sparsely annotated LTR datasets, the source-target distribution shifts across the pair and bipartite-graph domains make it extremely difficult to integrate these diverse models into a single LTR framework at web scale. We introduce the novel MPGraf model, which utilizes a modular and capsule-based pre-training approach, aiming to cohesively incorporate the regression capacities of Transformers and the link prediction capabilities of GNNs.
We conduct extensive experiments to evaluate the performance of MPGraf using real-world datasets collected from large-scale search engines. The results show that MPGraf can outperform baseline algorithms on several major metrics. Further, we deploy and evaluate MPGraf atop a large-scale search engine with realistic web traffic via A/B tests, where we can still observe significant improvement. MPGraf performs consistently in both offline and online evaluations.
SC8849
Bagging is an Optimal PAC Learner (Extended Abstract)
12 min. talk | August 7th at 15:00 | Session: ML: Machine Learning (3/6)
Determining the optimal sample complexity of PAC learning in the realizable setting was a central open problem in learning theory for decades. Finally, seminal work by Hanneke gave an algorithm with a provably optimal sample complexity. His algorithm is based on a careful and structured sub-sampling of the training data, returning a majority vote among hypotheses trained on each of the sub-samples. While this is a very exciting theoretical result, it has had little impact in practice, in part due to its inefficiency: it constructs a polynomial number of sub-samples of the training data, each of linear size.
In this work, we prove the surprising result that the practical and classic heuristic bagging (a.k.a. bootstrap aggregation), due to Breiman, is in fact also an optimal PAC learner. Bagging pre-dates Hanneke’s algorithm by twenty years and is taught in most undergraduate machine learning courses. Moreover, we show that it only requires a logarithmic number of sub-samples to reach optimality.
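For reference, bagging as described above admits a very short sketch; `base_fit` is a placeholder for any base learner and the snippet is illustrative rather than the paper's analysis.

```python
import numpy as np

def bagging_predict(base_fit, X_train, y_train, X_test, n_bags, rng):
    """Bagging: train base_fit on bootstrap resamples and take a majority vote.

    base_fit(X, y) must return a predictor f with f(X_test) -> labels in {0, 1}.
    The abstract's result says n_bags only needs to grow logarithmically.
    """
    n = len(X_train)
    votes = np.zeros(len(X_test))
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)          # bootstrap sample, drawn with replacement
        f = base_fit(X_train[idx], y_train[idx])  # base learner trained on this bag
        votes += f(X_test)
    return (votes * 2 >= n_bags).astype(int)      # majority vote over the bags
```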
SC8850
GS2P: A Generative Pre-trained Learning to Rank Model with Over-parameterization for Web-Scale Search (Extended Abstract)
12 min. talk | August 8th at 15:00 | Session: DM: Data Mining (2/2)
While Learning to Rank (LTR) is widely employed in web search to prioritize pertinent webpages among retrieved content based on input queries, traditional LTR models suffer from two principal problems that lead to subpar performance: 1) the scarcity of well-annotated query-webpage pairs with ranking scores, which limits coverage of search queries across the popularity spectrum, and 2) ill-trained models that are incapable of inducing generalized representations for LTR, culminating in overfitting. To tackle these challenges, we propose a Generative Semi-supervised Pre-trained (GS2P) LTR model. Specifically, GS2P first generates pseudo-labels for unlabeled samples using tree-based LTR models after a series of co-training procedures, then learns representations of query-webpage pairs with self-attentive transformers via both discriminative and generative losses. Finally, GS2P boosts LTR performance by incorporating Random Fourier Features to over-parameterize the model into the "interpolating regime", so as to enjoy the further descent of generalization error with the learned representations. We conduct extensive offline experiments on a publicly available dataset and a real-world dataset collected from a large-scale search engine. The results show that GS2P achieves the best performance on both datasets compared to the baselines.
We also deploy GS2P at a large-scale web search engine with realistic traffic, where we can still observe significant improvement in real-world applications.
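As background for the over-parameterization step, below is a minimal sketch of the standard Random Fourier Features map; it illustrates the generic technique, not GS2P's exact configuration, and all names and numbers are assumptions for illustration.

```python
import numpy as np

def random_fourier_features(X, n_features, gamma, rng):
    """Map inputs to a high-dimensional random Fourier feature space.

    Approximates an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2); widening
    n_features well beyond the sample size pushes a downstream linear ranker
    toward the over-parameterised (interpolating) regime.
    """
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Usage: lift 64-d query-webpage representations to 8192 random features.
rng = np.random.default_rng(0)
Z = random_fourier_features(rng.normal(size=(100, 64)), n_features=8192, gamma=0.1, rng=rng)
```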
SC8851
Defending Against Backdoor Attacks by Layer-wise Feature Analysis (Extended Abstract)
12 min. talk | August 6th at 15:00 | Session: ETF: AI Ethics, Trust, Fairness (1/2)
Training deep neural networks (DNNs) usually requires massive training data and computational resources. Users who cannot afford this may prefer to outsource training to a third party or resort to publicly available pre-trained models. Unfortunately, doing so facilitates a new training-time attack (i.e., the backdoor attack) against DNNs. This attack aims to induce misclassification of input samples containing adversary-specified trigger patterns. In this paper, we first conduct a layer-wise feature analysis of poisoned and benign samples from the target class. We find that the feature difference between benign and poisoned samples tends to be largest at a critical layer, which is not always the layer typically used in existing defenses, namely the layer before the fully-connected layers. We also demonstrate how to locate this critical layer based on the behavior of benign samples. We then propose a simple yet effective method to filter poisoned samples by analyzing the feature differences between suspicious and benign samples at the critical layer. Extensive experiments on two benchmark datasets confirm the effectiveness of our defense.
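A minimal sketch of the kind of layer-wise filtering described above is given below; the distance measure, threshold, and helper names are illustrative assumptions, not the paper's exact statistic, and feature extraction from the inspected model is assumed to be done beforehand.

```python
import numpy as np

def find_critical_layer(benign_feats, suspicious_feats):
    """Pick the layer where benign and suspicious features differ most.

    benign_feats / suspicious_feats: dict mapping layer name -> (n, d) feature arrays.
    """
    def gap(layer):
        mu_b = benign_feats[layer].mean(axis=0)
        mu_s = suspicious_feats[layer].mean(axis=0)
        return np.linalg.norm(mu_s - mu_b)
    return max(benign_feats, key=gap)

def filter_poisoned(benign_feats, suspicious_feats, layer, quantile=0.95):
    """Flag suspicious samples whose distance to the benign centroid at the
    critical layer exceeds what benign samples themselves exhibit."""
    mu_b = benign_feats[layer].mean(axis=0)
    benign_dist = np.linalg.norm(benign_feats[layer] - mu_b, axis=1)
    threshold = np.quantile(benign_dist, quantile)
    suspicious_dist = np.linalg.norm(suspicious_feats[layer] - mu_b, axis=1)
    return suspicious_dist > threshold  # True -> likely poisoned
```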
SC8852
Voting by Axioms (Extended Abstract)
12 min. talk | August 7th at 15:00 | Session: GTEP: Computational social choice (2/2)
We develop an approach for collective decision making from first principles. In this approach, rather than using a—necessarily imperfect—voting rule to map any given scenario where individual agents report their preferences into a collective decision, we identify for every concrete such scenario the most appealing set of normative principles (known as axioms in social choice theory) that would entail a unique decision and then implement that decision. We analyse some of the fundamental properties of this new approach, from both an algorithmic and a normative point of view.
SC8853
The Information Retrieval Experiment Platform (Extended Abstract)
12 min. talk | August 6th at 15:00 | Session: NLP: Natural Language Processing (1/3)
We have built TIREx, the information retrieval experiment platform, to promote standardized, reproducible, scalable, and blinded retrieval experiments. Standardization is achieved through integration with PyTerrier’s interfaces and compatibility with ir_datasets and ir_measures. Reproducibility and scalability are based on the underlying TIRA framework, which runs dockerized software in a cloud-native execution environment. Using Docker images of 50 standard retrieval approaches, we evaluated all of them on 32 tasks (i.e., 1,600 runs) in less than a week on a midsize cluster (1,620 CPU cores and 24 GPUs), demonstrating multi-task scalability. Importantly, TIRA also enables blind evaluation of AI experiments, as the test data can be hidden from public access and the tested approaches run in a sandbox that prevents data leaks. Keeping the test data hidden from public access ensures that it cannot be used by third parties for LLM training, preventing future training-test leaks.
SC8854
Capturing (Optimal) Relaxed Plans with Stable and Supported Models of Logic Programs (Extended Abstract)
6 min. talk | August 7th at 15:00 | Session: KRR: Knowledge Representation and Reasoning (1/2)
We establish a novel relation between delete-free planning, an important task for the AI Planning community also known as relaxed planning, and logic programming. We show that given a planning problem, all subsets of actions that could be ordered to produce relaxed plans for the problem can be bijectively captured with stable models of a logic program describing the corresponding relaxed planning problem. We also consider the supported model semantics of logic programs, and introduce one causal and one diagnostic encoding of the relaxed planning problem as logic programs, both capturing relaxed plans with their supported models. Our experimental results show that these new encodings can provide major performance gains when computing optimal relaxed plans.
SC8855
Proof Logging for Smart Extensional Constraints (Extended Abstract)
12 min. talk | August 8th at 10:00 | Session: CSO: Constraint Satisfaction and Optimization
Proof logging provides an auditable way of guaranteeing that a solver has produced a correct answer using sound reasoning. This is standard practice for Boolean satisfiability solving, but for constraint programming, a challenge is that every propagator must be able to justify all inferences it performs. Here we demonstrate how to support proof logging for a wide range of previously uncertified global constraints. We do this by showing how to justify every inference that could be performed by the propagation algorithms for two families of generalised extensional constraints: "Smart Table" and "Regular Language Membership".
SC8856
Streamlining Input/Output Logics with Sequent Calculi (Extended Abstract)
6 min. talk | August 8th at 15:00 | Session: KRR: Knowledge Representation and Reasoning (2/2)
Input/Output (I/O) logic is a general framework for reasoning about conditional norms and/or causal relations. We streamline Bochman’s causal I/O logics, as well as the original I/O logics, via proof-search-oriented sequent calculi. As a byproduct, we obtain new, simple semantics for all these logics, complexity bounds, embeddings into normal modal logics, and efficient deduction methods. Our work encompasses many scattered results and provides uniform solutions to various unresolved problems.
SC8857
Planning for Temporally Extended Goals in Pure-Past Linear Temporal Logic (Extended Abstract)
12 min. talk | August 9th at 11:30 | Session: KRR: Reasoning about actions
We study classical planning for temporally extended goals expressed in Pure-Past Linear Temporal Logic (PPLTL).
PPLTL is as expressive as Linear-time Temporal Logic on finite traces (LTLf), but as shown in this paper, it is computationally much better behaved for planning.
Specifically, we show that planning for PPLTL goals can be encoded into classical planning with minimal overhead, introducing only a number of new fluents that is at most linear in the PPLTL goal and no spurious additional actions.
Based on these results, we implemented a system called Plan4Past, which can be used along with state-of-the-art classical planners, such as LAMA.
An empirical analysis demonstrates the practical effectiveness of Plan4Past, showing that a classical planner generally performs better with our compilation than with other existing compilations for LTLf goals over the considered benchmarks.
SC8858
Selective Learning for Sample-Efficient Training in Multi-Agent Sparse Reward Tasks (Extended Abstract)
12 min. talk | August 6th at 15:00 | Session: ML: Machine Learning (1/6)
Learning effective strategies in sparse reward tasks is one of the fundamental challenges in reinforcement learning. This becomes extremely difficult in multi-agent environments, as the concurrent learning of multiple agents induces the non-stationarity problem and a sharply increased joint state space. Existing works have attempted to promote multi-agent cooperation through experience sharing. However, learning from a large collection of shared experiences is inefficient as there are only a few high-value states in sparse reward tasks, which may instead lead to the curse of dimensionality in large-scale multi-agent systems. This paper focuses on sparse-reward multi-agent cooperative tasks and proposes an effective experience-sharing method, Multi-Agent Selective Learning (MASL), to boost sample-efficient training by reusing valuable experiences from other agents. MASL adopts a retrogression-based selection method to identify high-value traces of agents from the team rewards, based on which some recall traces are generated and shared among agents to motivate effective exploration. Moreover, MASL selectively considers information from other agents to cope with the non-stationarity issue while enabling efficient training in large-scale multi-agent systems. Experimental results show that MASL significantly improves sample efficiency compared with state-of-the-art MARL algorithms in cooperative tasks with sparse rewards.
SC8859
Inferring Ontological Categories of OWL Classes Using Foundational Rules (Extended Abstract)
6 min. talk | August 8th at 15:00 | Session: KRR: Knowledge Representation and Reasoning (2/2)
Several efforts that leverage the tools of formal ontology have demonstrated the fruitfulness of considering key metaproperties of classes in ontology engineering. Despite this, it is still common practice to apply representation schemes and approaches, such as OWL, that do not benefit from identifying ontological categories and simply treat all classes in the same manner. In the original study, we proposed an approach to support the automated classification of classes into the ontological categories underlying the (g)UFO foundational ontology. We proposed a set of inference rules derived from (g)UFO’s axiomatization that, given an initial classification of the classes in an OWL ontology, supports the inference of the classification for the remaining classes in the ontology. We formalized these rules, implemented them in a tool, and assessed them against a catalog of ontologies.
SC8860
POWL: Partially Ordered Workflow Language (Extended Abstract)
12 min. talk | August 7th at 15:00 | Session: DM: Data Mining (1/2)
Processes in real-life scenarios tend to inherently establish partial orders over their constituent activities. This makes partially ordered graphs viable for process modeling. While partial orders capture both concurrent and sequential interactions among activities in a compact way, they fall short in modeling choice and cyclic behavior. To address this gap, we introduce the Partially Ordered Workflow Language (POWL), a novel language for process modeling that combines traditional hierarchical modeling languages with partial orders. In a POWL model, sub-models are combined into larger ones either as partial orders or using control-flow operators that enable the representation of choice and loop structures. This integration of hierarchical structure and partial orders not only offers an effective solution for process modeling but also provides quality guarantees that make POWL particularly suitable for the automated discovery of process models.
SC8861
Content Matters: A Computational Investigation into the Effectiveness of Retrieval Practice and Worked Examples (Extended Abstract)
12 min. talk | August 7th at 15:00 | Session: HAI: Humans and AI
In this paper, we argue that computational models of learning can contribute precise theory to explain surprising student learning phenomena. In some past studies, retrieval practice produces better learning than studying worked examples, whereas other studies show the opposite result. We explain this contradiction by suggesting that retrieval practice and example study involve different cognitive learning processes, namely memorization and induction, and that each process is optimal for different types of knowledge. We implement and test this theoretical explanation by extending an AI model of human cognition to include both memory and induction processes and comparing the behavior of the simulated learners to that of human participants. We show that the behavior of simulated learners with forgetting matches that of human participants better than that of simulated learners without forgetting. Simulated learners with forgetting learn best using retrieval practice in situations that emphasize memorization (such as learning facts), whereas studying examples improves learning when multiple pieces of information are available, so induction and generalization are necessary (such as learning skills).