DM7820
PyClause – Simple and Efficient Rule Handling for Knowledge Graphs
Patrick Betz, Luis Galárraga, Simon Ott, Christian Meilicke, Fabian Suchanek, Heiner Stuckenschmidt
30 min. talk | August 6th at 15:00 | Session: Demos-6-PM-B2
Rule mining finds patterns in structured data such as knowledge graphs. Rules can predict facts, help correct errors, and yield explainable insights about the data. However, existing rule mining implementations focus exclusively on mining rules — and not on their application. The PyClause library offers a rich toolkit for the application of the mined rules: from explaining facts to predicting links, scoring rules, and deducing query results. The library is easy to use and can handle substantial data loads.
DM8163
Interactive Visual Learning for Stable Diffusion
Seongmin Lee, Benjamin Hoover, Hendrik Strobelt, Zijie J. Wang, ShengYun Peng, Austin Wright, Kevin Li, Haekyu Park, Haoyang Yang, Duen Horng Chau
30 min. talk | August 6th at 11:30 | Session: Demos-6-AM2-B2
Diffusion-based generative models’ impressive ability to create convincing images has garnered global attention. However, their complex internal structures and operations often pose challenges for non-experts to grasp. We introduce Diffusion Explainer, the first interactive visualization tool designed to elucidate how Stable Diffusion transforms text prompts into images. It tightly integrates a visual overview of Stable Diffusion’s complex components with detailed explanations of their underlying operations. This integration enables users to fluidly transition between multiple levels of abstraction through animations and interactive elements. Offering real-time hands-on experience, Diffusion Explainer allows users to adjust Stable Diffusion’s hyperparameters and prompts without the need for installation or specialized hardware. Accessible via users’ web browsers, Diffusion Explainer is making significant strides in democratizing AI education, fostering broader public access. More than 7,200 users spanning 113 countries have used our open-sourced tool at https://poloclub.github.io/diffusion-explainer/. A video demo is available at https://youtu.be/MbkIADZjPnA.
DM8176
ReportParse: A Unified NLP Tool for Extracting Document Structure and Semantics of Corporate Sustainability Reporting
Gaku Morio, Soh Young In, Jungah Yoon, Harri Rowlands, Christopher Manning
30 min. talk | August 8th at 15:00 | Session: Demos-8-PM-B1
We introduce ReportParse, a Python-based tool designed to parse corporate sustainability reports. It combines document structure analysis with natural language processing (NLP) models to extract sustainability-related information from the reports. We also provide easy-to-use web and command interfaces. The tool is expected to aid researchers and analysts in evaluating corporate commitment to and management of sustainability efforts.
DM8296
PiShield: A PyTorch Package for Learning with Requirements
Mihaela C. Stoian, Alex Tatomir, Thomas Lukasiewicz, Eleonora Giunchiglia
30 min. talk | August 7th at 15:00 | Session: Demos-7-PM-B1
Deep learning models have shown their strengths in various application domains; however, they often struggle to meet safety requirements on their outputs. In this paper, we introduce PiShield, the first package that allows requirements to be integrated into a neural network's topology. PiShield guarantees compliance with these requirements regardless of the input. Additionally, it allows requirements to be integrated at inference time, at training time, or both, depending on the practitioner's needs. Given the widespread application of deep learning, there is a growing need for frameworks that support the integration of requirements across various domains. Here, we explore three application scenarios: functional genomics, autonomous driving, and tabular data generation.
DM8297
From 2D to 3D: AISG-SLA Visual Localization Challenge
Jialin Gao, Bill Ong, Darld Lwi, Zhen Hao Ng, Xun Wei Yee, Mun-Thye Mak, Wee Siong Ng, See-Kiong Ng, Hui Ying Teo, Victor Khoo, Georg Bökman, Johan Edstedt, Kirill Brodt, Clémentin Boittiaux, Maxime Ferrera, Stepan Konev
30 min. talk | August 7th at 10:00 | Session: Demos-7-AM1-B2
Research in 3D mapping is crucial for smart city applications, yet the cost of acquiring 3D data often hinders progress. Visual localization, particularly monocular camera position estimation, offers a solution by determining the camera’s pose solely through visual cues. However, this task is challenging due to limited data from a single camera. To tackle these challenges, we organized the AISG–SLA Visual Localization Challenge (VLC) at IJCAI 2023 to explore how AI can accurately extract camera pose data from 2D images in 3D space. The challenge attracted over 300 participants worldwide, forming 50+ teams. Winning teams achieved high accuracy in pose estimation using images from a car-mounted camera with low frame rates. The VLC dataset is available for research purposes upon request via vlc-dataset@aisingapore.org.
DM8299
AVIN-Chat: An Audio-Visual Interactive Chatbot System with Emotional State Tuning
Chanhyuk Park, Jungbin Cho, Junwan Kim, Seongmin Lee, Jungsu Kim, Sanghoon Lee
30 min. talk | August 7th at 11:30 | Session: Demos-7-AM2-B1
This work presents an audio-visual interactive chatbot (AVIN-Chat) system that allows users to have real-time, face-to-face conversations with 3D avatars. Compared to previous chatbot services, which provide text-only or speech-only communication, AVIN-Chat offers audio-visual communication, giving users a superior quality of experience. In addition, AVIN-Chat speaks and expresses emotions according to the user's emotional state, enabling users to form a strong bond with the chatbot system and increasing their immersion. User subjective tests demonstrate that the proposed system provides a higher sense of immersion than previous chatbot systems. The demonstration video is available at https://www.youtube.com/watch?v=Z74uIV9k7_k.
DM8308
Carbon Market Simulation with Adaptive Mechanism Design
Han Wang, Wenhao Li, Hongyuan Zha, Baoxiang Wang
30 min. talk | August 7th at 15:00 | Session: Demos-7-PM-B1
A carbon market is a market-based tool that incentivizes economic agents to align individual profits with the global utility, i.e., reducing carbon emissions to tackle climate change. Cap and trade is a central principle based on allocating and trading carbon allowances (carbon emission credits), enabling economic agents to follow planned emissions and penalizing excess emissions. A central authority is responsible for introducing and allocating these allowances. However, the complexity of carbon market dynamics makes accurate simulation intractable, which in turn hinders the design of effective allocation strategies. To address this, we propose an adaptive mechanism design framework that simulates the market using hierarchical, model-free multi-agent reinforcement learning (MARL). Government agents allocate carbon credits, while enterprises engage in economic activities and carbon trading. This framework illustrates agents' behavior comprehensively. Numerical results show that MARL enables government agents to balance productivity, equality, and carbon emissions. Our project is available at https://anonymous.4open.science/r/Carbon-Simulator.
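The cap-and-trade principle behind this simulator can be made concrete with a toy settlement routine. This is purely illustrative and is not the paper's MARL simulator; the function name `settle` and all numbers are invented for the example. Agents under their allowance sell spare credits at a market price, while emissions beyond the credits available for purchase incur a per-unit penalty.

```python
def settle(allowances, emissions, price, penalty):
    """Toy cap-and-trade settlement: each agent's payoff from the
    allowance market, processed in allocation order (illustrative only)."""
    # Total spare credits offered for sale by under-emitting agents.
    surplus = sum(max(a - e, 0) for a, e in zip(allowances, emissions))
    payoffs = []
    for a, e in zip(allowances, emissions):
        if e <= a:
            payoffs.append((a - e) * price)        # sell unused credits
        else:
            deficit = e - a
            bought = min(deficit, surplus)         # buy what is available
            surplus -= bought
            # Emissions not covered by purchased credits are penalized.
            payoffs.append(-bought * price - (deficit - bought) * penalty)
    return payoffs

# Agent 0 under-emits by 4 units and sells them to agent 1.
print(settle([10, 5], [6, 9], price=1.0, penalty=3.0))  # [4.0, -4.0]
```

In a real cap-and-trade market the price itself is endogenous; here it is fixed only to keep the accounting visible.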
DM8311
LangXAI: Integrating Large Vision Models for Generating Textual Explanations to Enhance Explainability in Visual Perception Tasks
Hung Nguyen, Tobias Clement, Loc Nguyen, Nils Kemmerzell, Binh Truong, Khang Nguyen, Mohamed Abdelaal, Hung Cao
30 min. talk | August 8th at 15:00 | Session: Demos-8-PM-B2
LangXAI is a framework that integrates Explainable Artificial Intelligence (XAI) with advanced vision models to generate textual explanations for visual recognition tasks. Despite XAI advancements, an understanding gap persists for end-users with limited domain knowledge in artificial intelligence and computer vision. LangXAI addresses this by furnishing text-based explanations for classification, object detection, and semantic segmentation model outputs to end-users. Preliminary results demonstrate LangXAI’s enhanced plausibility, with high BERTScore across tasks, fostering a more transparent and reliable AI framework on vision tasks for end-users. The code and demo of this work can be found at https://analytics-everywhere-lab.github.io/langxai.io/.
DM8315
InViTe: Individual Virtual Transfer for Personalized 3D Face Generation System
Mingyu Jang, Kyungjune Lee, Seongmin Lee, Hoseok Tong, Juwan Chung, Yusung Ro, Sanghoon Lee
30 min. talk | August 7th at 11:30 | Session: Demos-7-AM2-B1
With the expansion of the virtual communication industry through VR/AR, enabling users to express their personalities via 3D avatars has attracted increasing attention. As the face of a 3D avatar plays a crucial role in conveying personality, a system that generates and manipulates 3D faces is desirable. However, building such a system is challenging due to the human effort and specialized knowledge required. To fill this void, we present Individual Virtual Transfer (InViTe), which enables the creation and customization of a 3D face according to the user's preference. Our system features 1) 3D face reconstruction with a high-fidelity texture map, 2) 3D face personalization, 3) realistic rendering results, and 4) real-time mobile virtual applications. We conduct an experiment demonstrating that the proposed system achieves sufficient individual personalization of 3D faces. Furthermore, we evaluate the system's data transmission protocol and demonstrate its efficiency. The demonstration video is available at https://www.youtube.com/watch?v=D_4pXZvGUWU.
DM8316
E-QGen: Educational Lecture Abstract-based Question Generation System
Mao-Siang Chen, An-Zi Yen
30 min. talk | August 7th at 15:00 | Session: Demos-7-PM-B2
To optimize the preparation process for educators in academic lectures and associated question-and-answer sessions, this paper presents E-QGen, a lecture abstract-based question generation system. Given a lecture abstract, E-QGen generates potential student inquiries. The questions suggested by our system are expected to not only facilitate teachers in preparing answers in advance but also enable them to supply additional resources when necessary.
DM8317
Place Anything into Any Video
Ziling Liu, Jinyu Yang, Mingqi Gao, Feng Zheng
30 min. talk | August 9th at 11:30 | Session: Demos-9-AM2-B2
Controllable video editing has demonstrated remarkable potential across diverse applications, particularly in scenarios where capturing or re-capturing real-world videos is impractical or costly. This paper introduces Place-Anything, a novel and efficient system that inserts any object into any video based solely on a picture or text description of the target object. The system comprises three modules: 3D generation, video reconstruction, and 3D target insertion. This integrated approach offers an efficient and effective solution for producing and editing high-quality videos by naturally inserting realistic objects. Through experiments, we demonstrate that our system can effortlessly place any object into any video using just a photograph of the object. Our demo video can be found at https://youtu.be/afXqgLLRnTE. Please also visit our project page at https://place-anything.github.io for more information.
DM8320
AUTODRAITEC: An Infrastructure-Based AUTOnomous DRiving System Using Artificial Intelligence and TEleCommunication Technologies
Zine el abidine Kherroubi, Fouzi Boukhalfa, Thierry Lestable, Carlos-Faouzi Bader
30 min. talk | August 6th at 11:30 | Session: Demos-6-AM2-B1
This paper introduces AUTODRAITEC, a novel AI-based system deployed on road infrastructure to control the driving of Connected and Autonomous Vehicles (CAVs). We present a convincing proof of concept that demonstrates the effectiveness of our solution. The system uses a hybrid machine learning approach that combines a supervised learning classifier, which characterizes the behaviors of human drivers, with a deep reinforcement learning policy that provides speed recommendations for CAVs. The system is implemented using perception sensors and an industrial computer (IPC), both intended to be deployed on road infrastructure. Using a 1:18-scale testbed that faithfully replicates real-world driving scenarios, we demonstrate that AUTODRAITEC improves driving safety and efficiency while preserving the traffic flow rate.
DM8322
SPARK: Harnessing Human-Centered Workflows with Biomedical Foundation Models for Drug Discovery
Bum Chul Kwon, Simona Rabinovici-Cohen, Beldine Moturi, Ruth Mwaura, Kezia Wahome, Oliver Njeru, Miguel Shinyenyi, Catherine Wanjiru, Sekou Remy, William Ogallo, Itai Guez, Partha Suryanarayanan, Joseph Morrone, Shreyans Sethi, Seung-Gu Kang, Tien Huynh, Kenney Ng, Diwakar Mahajan, Hongyang Li, Matan Ninio, Shervin Ayati, Efrat Hexter, Wendy Cornell
30 min. talk | August 8th at 11:30 | Session: Demos-8-AM2-B1
Biomedical foundation models, trained on diverse sources of small molecule data, hold great potential for accelerating drug discovery. However, their complex nature often presents a barrier for researchers seeking scientific insights and drug candidate generation. SPARK addresses this challenge by providing a user-friendly, web-based interface that empowers researchers to leverage these powerful models in their scientific workflows. Through SPARK, users can specify target proteins and desired molecule properties, adjust pre-trained models for tailored inferences, generate lists of potential drug candidates, analyze and compare molecules through interactive visualizations, and filter candidates based on key metrics (e.g., toxicity). By seamlessly integrating human knowledge and biomedical AI models’ capabilities through an interactive web-based system, SPARK can improve the efficiency of collaboration between human experts and AI, thereby accelerating drug candidate discovery and ultimately leading to breakthroughs in finding cures for various diseases.
DM8326
Inside Out: Emotional Multiagent Multimodal Dialogue Systems
Andrey V. Savchenko, Lyudmila V. Savchenko
30 min. talk | August 7th at 15:00 | Session: Demos-7-PM-B1
In this paper, we introduce a novel technological framework for developing emotional dialogue systems. Inspired by the film "Inside Out", we propose using multiple emotional agents based on Large Language Models (LLMs) to prepare answers to a user query. Their answers are aggregated into a single response that takes into account the user's current emotional state, which is estimated by video-based facial expression recognition (FER). We introduce several publicly available lightweight neural networks that achieve near state-of-the-art results on the AffectNet dataset. Qualitative examples using GPT-3.5, LLama2, and Mistral demonstrate that the proposed approach leads to more emotional responses from LLMs.
DM8327
O2ARC 3.0: A Platform for Solving and Creating ARC Tasks
Suyeon Shim, Dohyun Ko, Hosung Lee, Seokki Lee, Doyoon Song, Sanha Hwang, Sejin Kim, Sundong Kim
30 min. talk | August 7th at 10:00 | Session: Demos-7-AM1-B2
We introduce O2ARC 3.0, an interface for the Abstraction and Reasoning Corpus (ARC). O2ARC 3.0 gamifies the experience, fostering user engagement through competitive features and community-driven problem creation and evaluation. Built with a React frontend and a NestJS backend, the platform provides a responsive and intuitive interface for efficient rule inference. This approach not only improves data collection for AI training but also enhances the problem-solving process, offering a scalable solution for advancing cognitive AI research. O2ARC is available at https://o2arc.com.
DM8329
Rhythm Inference Helping Writing Music Scores
François Schwarzentruber
30 min. talk | August 8th at 15:00 | Session: Demos-8-PM-B2
We present a new problem called rhythm inference: inferring the duration of each note and each rest from a partial specification. We formulate rhythm inference as a constraint satisfaction problem and solve it with mixed integer linear programming. The solution is implemented for abcd, a language for representing music scores. Interestingly, the rhythm is inferred in real time from partial musical indications.
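The duration-inference task can be illustrated with a toy constraint search. This sketch is illustrative only: it is not the paper's solver-based formulation or its abcd implementation, and the allowed duration set, bar length of one whole note, and function name `infer_rhythm` are all assumptions made for the example.

```python
from itertools import product

# Allowed note values: whole, half, quarter, eighth (fractions of a bar).
ALLOWED = (1.0, 0.5, 0.25, 0.125)

def infer_rhythm(partial, bar_length=1.0):
    """partial: list of known durations, with None for unspecified notes.
    Brute-force search for one completion whose durations fill the bar
    exactly; returns the completed list, or None if none exists."""
    unknown = [i for i, d in enumerate(partial) if d is None]
    known_sum = sum(d for d in partial if d is not None)
    for choice in product(ALLOWED, repeat=len(unknown)):
        if abs(known_sum + sum(choice) - bar_length) < 1e-9:
            result = list(partial)
            for i, d in zip(unknown, choice):
                result[i] = d
            return result
    return None  # no consistent assignment of durations

# Two quarter notes known, two durations to infer in a 4/4 bar.
print(infer_rhythm([0.25, None, 0.25, None]))  # [0.25, 0.25, 0.25, 0.25]
```

A real solver replaces this exponential enumeration with integer-programming constraints, which is what makes real-time inference feasible on full scores.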
DM8330
Benchmarking Stroke Forecasting with Stroke-Level Badminton Dataset
Wei-Yao Wang, Wei-Wei Du, Wen-Chih Peng, Tsi-Ui Ik
30 min. talk | August 7th at 15:00 | Session: Demos-7-PM-B1
In recent years, badminton analytics has drawn attention due to the advancement of artificial intelligence and the efficiency of data collection. While there is a line of effective applications for improving and investigating player performance, only a few public badminton datasets can be used by researchers outside the badminton domain. Existing badminton singles datasets focus on specific matchups and cannot support comprehensive studies of different players and various matchups. In this paper, we provide ShuttleSet22, a badminton singles dataset collected from high-ranking matches in 2022. ShuttleSet22 consists of 30,172 strokes in 2,888 rallies in the training set, 1,400 strokes in 450 rallies in the validation set, and 2,040 strokes in 654 rallies in the testing set, with detailed stroke-level metadata within each rally. To benchmark existing work with ShuttleSet22, we held a challenge, Track 2: Forecasting Future Turn-Based Strokes in Badminton Rallies, at the CoachAI Badminton Challenge @ IJCAI 2023 to encourage researchers to tackle this real-world problem with innovative approaches and to exchange insights between the state-of-the-art baseline and improved techniques. The baseline code and the dataset are available at https://github.com/wywyWang/CoachAI-Projects/tree/main/CoachAI-Challenge-IJCAI2023.
DM8336
REAVER: Real-time Earthquake Prediction with Attention-based Sliding-Window Spectrograms
Lotfy Abdel Khaliq, Sabine Janzen, Wolfgang Maass
30 min. talk | August 7th at 15:00 | Session: Demos-7-PM-B2
Predicting earthquakes with precision remains an ongoing challenge for earthquake early warning systems (EEWS), which struggle with accuracy and fail to provide timely warnings for impending earthquakes. Recent efforts employing deep learning techniques have shown promise in overcoming these limitations. However, current methods cannot capture, in real time, the subtle frequency changes indicative of seismic activity, limiting their effectiveness in EEWS. To address this gap, we propose REAVER, a novel approach for real-time prediction of the P- and S-waves of earthquakes using attention-based sliding-window spectrograms. REAVER leverages Mel-spectrogram signal representations to capture temporal frequency changes in seismic signals effectively. By employing an encoder-decoder architecture with attention mechanisms, REAVER accurately predicts the onset moments of P- and S-waves when an earthquake occurs. We benchmark the effectiveness of REAVER in terms of both accuracy and real-time prediction capability against existing methods. Additionally, we provide a web-based implementation of REAVER that allows users to monitor seismic activity in real time and analyze historical earthquake waveforms.
DM8346
Probabilistic Feature Matching for Fast Scalable Visual Prompting
Thomas Frick, Cezary Skura, Filip M. Janicki, Roy Assaf, Niccolo Avogaro, Daniel Caraballo, Yagmur G. Cinar, Brown Ebouky, Ioana Giurgiu, Takayuki Katsuki, Piotr Kluska, Cristiano Malossi, Haoxiang Qiu, Tomoya Sakai, Florian Scheidegger, Andrej Simeski, Daniel Yang, Andrea Bartezzaghi, Mattia Rigotti
30 min. talk | August 8th at 15:00 | Session: Demos-8-PM-B2
In this work, we propose a novel framework for image segmentation guided by visual prompting which leverages the power of vision foundation models. Inspired by recent advancements in computer vision, our approach integrates multiple large-scale pretrained models to address the challenges of segmentation tasks with limited and sparsely annotated data interactively provided by a user. Our method combines a frozen feature extraction backbone with a scalable and efficient probabilistic feature correspondence (soft matching) procedure derived from Optimal Transport to couple pixels between reference and target images. Moreover, a pretrained segmentation model is harnessed to translate user scribbles into reference masks and matched target pixels into output target segmentation masks. This results in a framework that we name Softmatcher, a versatile and fast training-free architecture for image segmentation by visual prompting. We demonstrate the efficiency and scalability of Softmatcher for real-time interactive image segmentation by visual prompting and showcase it in diverse visual domains including technical visual inspection use cases.
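The probabilistic (soft) matching step can be sketched with a plain Sinkhorn iteration for entropy-regularized optimal transport, the general technique the abstract names. This is a minimal illustration, not the Softmatcher code: the function name, uniform marginals, and parameter values are assumptions, and real feature sets are far larger than this 2×2 toy.

```python
import math

def sinkhorn_soft_match(cost, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport between two equal-weight
    feature sets. cost[i][j] is the dissimilarity between reference
    feature i and target feature j. Returns a soft correspondence matrix
    whose rows and columns approach uniform marginals 1/n and 1/m."""
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternate scaling so row/column sums match the marginals.
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Two reference and two target features; low cost means a likely match.
P = sinkhorn_soft_match([[0.1, 1.0], [1.0, 0.1]])
# P concentrates its mass on the low-cost pairs (0,0) and (1,1).
```

Because matching is soft, each reference pixel spreads probability over several target pixels instead of committing to a single nearest neighbor, which is what makes the procedure robust and differentiable-friendly.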
DM8347
FasterVD: On Acceleration of Video Diffusion Models
Pinrui Yu, Dan Luo, Timothy Rupprecht, Lei Lu, Zhenglun Kong, Pu Zhao, Yanyu Li, Octavia Camps, Xue Lin, Yanzhi Wang
30 min. talk | August 6th at 15:00 | Session: Demos-6-PM-B2
Equipped with Denoising Diffusion Probabilistic Models, video content generation has gained significant research interest recently. However, diffusion pipelines call for intensive computation and model storage, which poses challenges for their wide and efficient deployment. In this work, we address this issue by integrating LCM-LoRA to reduce the number of denoising steps and by accelerating the video generation process with frame skipping and interpolation. Our framework achieves approximately 10× inference acceleration for high-quality, realistic video generation on commonly available GPUs.
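The frame skipping and interpolation idea can be sketched as follows. This is a schematic, not FasterVD's actual pipeline: `denoise` stands in for the expensive diffusion step, frames are scalars rather than image tensors, and linear interpolation replaces whatever interpolator the system actually uses.

```python
def generate_with_skipping(latent_frames, denoise, skip=2):
    """Run the expensive `denoise` step only on every `skip`-th frame,
    then fill the gaps by linearly interpolating between keyframes."""
    keys = {i: denoise(x) for i, x in enumerate(latent_frames) if i % skip == 0}
    idxs = sorted(keys)
    out = []
    for i in range(len(latent_frames)):
        if i in keys:
            out.append(keys[i])          # keyframe: use the real result
            continue
        lo = max(j for j in idxs if j < i)
        later = [j for j in idxs if j > i]
        if not later:                    # trailing frame: hold last keyframe
            out.append(keys[lo])
            continue
        hi = min(later)
        t = (i - lo) / (hi - lo)         # interpolation weight in [0, 1]
        out.append((1 - t) * keys[lo] + t * keys[hi])
    return out

frames = [0.0, 1.0, 2.0, 3.0, 4.0]
video = generate_with_skipping(frames, denoise=lambda x: 2 * x, skip=2)
print(video)  # [0.0, 2.0, 4.0, 6.0, 8.0]
```

The speedup comes from calling `denoise` on only 1/`skip` of the frames; the quality question is how well interpolation hides the skipped ones.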
DM8350
M2RL: A Multi-player Multi-agent Reinforcement Learning Framework for Complex Games
Tongtong Yu, Chenghua He, Qiyue Yin
30 min. talk | August 8th at 15:00 | Session: Demos-8-PM-B1
Distributed deep reinforcement learning (DDRL) has gained increasing attention due to emerging requirements for addressing complex games like Go and StarCraft. However, how to use DDRL to effectively and stably train bots with asynchronous, heterogeneous agents cooperating and competing across multiple players and multiple machines (with multiple CPUs and GPUs) remains an open problem. We propose and open-source M2RL, a Multi-player and Multi-agent Reinforcement Learning framework, to make training bots for complex games easy. Experiments training a two-player multi-agent Wargame AI and a sixteen-player multi-agent AI for the community game Neural MMO demonstrate the effectiveness of the framework, which won a silver award and beat high-level AI bots designed by professional players.
DM8354
GigaPevt: Multimodal Medical Assistant
Pavel Blinov, Konstantin Egorov, Ivan Sviridov, Nikolay Ivanov, Stepan Botman, Evgeniy Tagin, Stepan Kudin, Galina Zubkova, Andrey V. Savchenko
30 min. talk | August 6th at 15:00 | Session: Demos-6-PM-B1
Building an intelligent and efficient medical assistant is still a challenging AI problem. The major limitation is the scarcity of data modalities, which reduces comprehensive patient perception. This demo paper presents GigaPevt, the first multimodal medical assistant that combines the dialog capabilities of large language models with specialized medical models. This approach shows immediate advantages in dialog quality and metric performance, with a 1.18% accuracy improvement on the question-answering task.
DM8355
Reinforcement Learning for Athletic Intelligence: Lessons from the 1st “AI Olympics with RealAIGym” Competition
Felix Wiebe, Niccolò Turcato, Alberto Dalla Libera, Chi Zhang, Theo Vincent, Shubham Vyas, Giulio Giacomuzzo, Ruggero Carli, Diego Romeres, Akhil Sathuluri, Markus Zimmermann, Boris Belousov, Jan Peters, Frank Kirchner, Shivesh Kumar
30 min. talk | August 7th at 11:30 | Session: Demos-7-AM2-B2
As artificial intelligence gains new capabilities, it becomes important to evaluate it on real-world tasks. In particular, the fields of robotics and reinforcement learning (RL) lack standardized benchmarking tasks on real hardware. To facilitate reproducibility and stimulate algorithmic advancements, we held an AI Olympics competition at the IJCAI 2023 conference based on the double pendulum system in the RealAIGym project, in which participants were asked to develop a controller for the swing-up and stabilization task. This paper presents the methods and results of the top participating teams and provides insights into the real-world performance of RL algorithms relative to a baseline time-varying LQR controller.
DM8356
Do You Remember the Future? Weak-to-Strong Generalization in 3D Object Detection
Alexander Gambashidze, Aleksandr Dadukin, Maxim Golyadkin, Maria Razzhivina, Ilya Makarov
30 min. talk | August 9th at 11:30 | Session: Demos-9-AM2-B2
This paper demonstrates a novel method for LiDAR-based 3D object detection that addresses the field's major challenges: sparsity and occlusion. Our approach leverages temporal point cloud sequences to generate frames that provide comprehensive views of objects from multiple angles. To generate these frames in real time, we employ Knowledge Distillation within a Teacher-Student framework, allowing the Student model to emulate the Teacher's advanced perception. We pioneer the application of weak-to-strong generalization in computer vision by training our Teacher model on enriched, object-complete data. In this demo, we showcase the exceptional quality of labels produced by the X-Ray Teacher on object-complete frames and show how our method distills this knowledge to enhance 3D object detection models.
DM8357
ToDo: Token Downsampling for Efficient Generation of High-Resolution Images
Ethan Smith, Nayan Saxena, Aninda Saha
30 min. talk | August 9th at 10:00 | Session: Demos-9-AM1-B1
Attention has been a crucial component in the success of image diffusion models; however, its quadratic computational complexity limits the sizes of images we can process within reasonable time and memory constraints. This paper investigates the importance of dense attention in generative image models, which often contain redundant features, making them suitable for sparser attention mechanisms. We propose ToDo, a novel training-free method that relies on token downsampling of the key and value tokens to accelerate Stable Diffusion inference by up to 2x for common sizes and up to 4.5x or more for high resolutions such as 2048 × 2048. We demonstrate that our approach outperforms previous methods in balancing efficient throughput and fidelity.
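The key/value downsampling trick is easy to state in code. Below is a scalar-token sketch of attention (not the ToDo implementation; real diffusion models use vector tokens and spatially aware downsampling): queries are kept intact while keys and values are strided, so the score matrix shrinks from |q|×|k| to |q|×(|k|/stride) and the output length is unchanged.

```python
import math

def attention(q, k, v):
    """Toy attention over scalar tokens: q, k, v are lists of floats."""
    out = []
    for qi in q:
        scores = [qi * kj for kj in k]            # dot products (scalars)
        mx = max(scores)                          # stabilize the softmax
        w = [math.exp(s - mx) for s in scores]
        z = sum(w)
        out.append(sum(wi / z * vi for wi, vi in zip(w, v)))
    return out

def todo_attention(q, k, v, stride=2):
    """ToDo-style sketch: downsample only keys and values, keep all
    queries, reducing compute while preserving output length."""
    return attention(q, k[::stride], v[::stride])

q = [0.1, 0.2, 0.3, 0.4]
full = attention(q, q, q)        # quadratic in sequence length
fast = todo_attention(q, q, q)   # half the key/value tokens
```

Because each output is a convex combination of the (possibly downsampled) values, `fast` stays in the range of the kept values while costing roughly half the score computations.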
DM8359
DFRP: A Dual-Track Feedback Recommendation System for Educational Resources
ChaoJun Meng, Changfan Pan, Zilong Li, Cong Zhou, Xinran Cao, Jia Zhu
30 min. talk | August 8th at 11:30 | Session: Demos-8-AM2-B2
Educational disparities among regions are significant, and educational resource platforms can effectively bridge the gap in educational capability between regions. Most existing recommendation algorithms consider only interaction history, while we argue that the dependencies between knowledge points and education-related features are crucial for educational resource recommendation. To address this, we propose DFRP, an educational resource recommendation platform based on knowledge graphs (KGs) and educational-scale feedback. DFRP employs a recommendation algorithm based on teaching pathways and educational dimensions to achieve accurate recommendations and active feedback on educational resources. We also describe the system framework in detail and present a demonstration scenario that uses educational scales for active feedback and KGs to show dependencies between knowledge points.
DM8361
RLOP: A Framework for Reinforcement Learning, Optimization and Planning Algorithms
Song Zhang
30 min. talk | August 9th at 11:30 | Session: Demos-9-AM2-B1
Reinforcement learning, optimization, and planning/search are interconnected domains in artificial intelligence. Algorithms within these domains share many similarities, complement each other in solving complex decision-making problems, and offer opportunities for cross-disciplinary integration. However, conducting research on algorithms across these domains typically requires learning specialized libraries, which often couple algorithms with domain-specific problem classes and make cross-disciplinary research difficult. To solve this problem, we developed RLOP, a generic and lightweight framework for reinforcement learning, optimization, and planning/search algorithms. It implements only the core logic of each algorithm and abstracts away domain-specific details behind interface functions, enabling flexible customization and efficient integration across domains. The framework has been open-sourced at https://github.com/songzhg/RLOP.
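The interface-function idea can be illustrated with a tiny example. The `Problem` class and `hill_climb` routine below are hypothetical, not RLOP's actual API (RLOP is C++-oriented; this is a Python sketch of the design principle): the search algorithm touches the domain only through interface functions, so swapping in a new domain requires no changes to the algorithm.

```python
from abc import ABC, abstractmethod

class Problem(ABC):
    """Domain-agnostic interface in the spirit of RLOP (hypothetical API):
    algorithms see only these three functions, never the domain itself."""
    @abstractmethod
    def initial(self): ...
    @abstractmethod
    def neighbors(self, state): ...
    @abstractmethod
    def score(self, state): ...

def hill_climb(problem, steps=100):
    """Generic local search written purely against the interface."""
    state = problem.initial()
    for _ in range(steps):
        best = max(problem.neighbors(state), key=problem.score)
        if problem.score(best) <= problem.score(state):
            break                      # local optimum reached
        state = best
    return state

class MaximizeParabola(Problem):
    """One concrete domain: maximize -(s - 3)^2 over the integers."""
    def initial(self): return -7
    def neighbors(self, s): return [s - 1, s + 1]
    def score(self, s): return -(s - 3) ** 2

print(hill_climb(MaximizeParabola()))  # climbs to the optimum s = 3
```

Any other domain (a scheduling problem, a game position) plugs in by implementing the same three functions, which is the decoupling the framework aims for.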
DM8362
Real-time Multi-modal Object Detection and Tracking on Edge for Regulatory Compliance Monitoring
Jia Syuen Lim, Ziwei Wang, Jiajun Liu, Abdelwahed Khamis, Reza Arablouei, Robert Barlow, Ryan McAllister
30 min. talk | August 9th at 10:00 | Session: Demos-9-AM1-B2
Regulatory compliance auditing in agrifood processing facilities is crucial for upholding the highest standards of quality assurance and traceability. However, the current manual and intermittent approaches to auditing present significant challenges and risks, potentially leading to gaps or loopholes in the system. To address these shortcomings, we introduce a real-time, multi-modal sensing system that utilizes 3D time-of-flight and RGB cameras and leverages unsupervised learning techniques on edge AI devices. The proposed system enables continuous object tracking, leading to improved efficiency in record-keeping and reduced manual labor. We demonstrate the effectiveness of the system in a knife sanitization monitoring scenario, showcasing its capability to overcome occlusion and low-light performance limitations commonly encountered with conventional RGB cameras.
DM8363
XGA-Osteo: Towards XAI-Enabled Knee Osteoarthritis Diagnosis with Adversarial Learning
Hieu Phan, Loc Le, Mao Nguyen, Phat Nguyen, Sang Nguyen, Minh-Triet Tran, Tho Quan
30 min. talk | August 7th at 10:00 | Session: Demos-7-AM1-B1
This research introduces XGA-Osteo, an innovative approach that leverages Explainable Artificial Intelligence (XAI) to enhance the accuracy and interpretability of knee osteoarthritis diagnosis. Recent studies have utilized AI approaches to automate the diagnosis using knee joint X-ray images. However, these studies have primarily focused on predicting the severity of osteoarthritis without providing additional information to assist doctors in their diagnoses. In addition to accurately diagnosing the severity of the condition, XGA-Osteo generates an anomaly map, produced from a reconstructed image of a healthy knee using adversarial learning. Thus, the abnormal regions in X-ray images can be highlighted, offering valuable supplementary information to medical experts during the diagnosis process.
DM8365
FD-UAD: Unsupervised Anomaly Detection Platform Based on Defect Autonomous Imaging and Enhancement
Yang Chang, Yuxuan Lin, Boyang Wang, Qing Zhao, Yan Wang, Wenqiang Zhang
30 min. talk | August 8th at 15:00 | Session: Demos-8-PM-B2
In industrial quality control, detecting defects is essential. However, manual checks and machine vision encounter challenges in complex conditions, as defects vary among products made of different materials and shapes. We present FD-UAD, an Unsupervised Anomaly Detection Platform Based on Defect Autonomous Imaging and Enhancement. It uses multi-sensor technology, combining RGB and infrared imaging with liquid lenses for adjustable focal lengths, and applies image fusion to capture multidimensional features. The system incorporates image restoration techniques such as enhancement, deblurring, denoising, and super-resolution, alongside an unsupervised anomaly detection model for enhanced accuracy. FD-UAD has been successfully deployed at a top diesel engine manufacturer, demonstrating its value in AI-enhanced industrial applications.
DM8372
An LLM-enhanced Agent-based Simulation Tool for Information Propagation
Yuxuan Hu, Gemju Sherpa, Lan Zhang, Weihua Li, Quan Bai, Yijun Wang, Xiaodan Wang
30 min. talk | August 7th at 15:00 | Session: Demos-7-PM-B2
Influence diffusion models are used for simulating information propagation in social networks. While most existing influence diffusion models are probabilistic, the emergence of Large Language Models (LLMs) sheds light on the language-level inferences and interactions of user agents. This paper presents an LLM-enhanced Agent-based Influence Diffusion model (LAID) and a web-based visualization tool, LAIDSim, for simulating information propagation in social networks.
DM8376
PyXAI: An XAI Library for Tree-Based Models
Gilles Audemard, Jean-Marie Lagniez, Pierre Marquis, Nicolas Szczepanski
30 min. talk | August 6th at 15:00 | Session: Demos-6-PM-B2
PyXAI (Python eXplainable AI) is a Python library designed for providing explanations and correcting tree-based Machine Learning (ML) models. It is suited to decision trees, random forests, and boosted trees, when used for regression or classification tasks. In contrast to many model-agnostic approaches to XAI, PyXAI exploits the model itself to generate explanations, ensuring that they are faithful. PyXAI includes several algorithms for the generation of explanations, which can be abductive or contrastive. PyXAI also includes algorithms for correcting tree-based models when their predictions conflict with pieces of user knowledge.
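PyXAI's own API is not reproduced here. As an illustrative sketch of the underlying idea, the following scikit-learn snippet (all variable names are our own) collects the feature conditions along a decision tree's prediction path: a set of conditions that together imply the prediction, and the raw material an abductive explainer would then minimize.

```python
# Sketch: conditions along a decision tree's prediction path (not PyXAI's API).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

x = X[0:1]                                   # instance to explain
node_ids = clf.decision_path(x).indices      # nodes visited by x
tree = clf.tree_

conditions = []
for node in node_ids:
    if tree.children_left[node] == -1:       # leaf node: no test here
        continue
    feat, thr = tree.feature[node], tree.threshold[node]
    op = "<=" if x[0, feat] <= thr else ">"
    conditions.append(f"feature[{feat}] {op} {thr:.2f}")

# Together these conditions are sufficient to entail clf.predict(x).
print(conditions)
```

An abductive explanation proper would further prune this set to a minimal sufficient subset; the sketch only shows where the candidate conditions come from.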
DM8387
Demo: Enhancing Wildlife Acoustic Data Annotation Efficiency through Transfer and Active Learning
Hannes Kath, Patricia P. Serafini, Ivan B. Campos, Thiago S. Gouvêa, Daniel Sonntag
30 min. talk | August 7th at 15:00 | Session: Demos-7-PM-B2
Passive Acoustic Monitoring (PAM) has become a key technology in wildlife monitoring, generating large amounts of acoustic data. However, the effective application of machine learning methods for sound event detection in PAM datasets is highly dependent on the accessibility of annotated data, a process that can be labour intensive. As a team of domain experts and machine learning researchers, in this paper we present a no-code annotation tool designed for PAM datasets that incorporates transfer learning and active learning strategies to address the data annotation challenge inherent in PAM. Transfer learning is applied to use pre-trained models to compute meaningful embeddings from the PAM audio files. Active learning iteratively identifies the most informative samples and then presents them to the user for annotation. This iterative approach improves the performance of the model compared to random sample selection. In a preliminary evaluation of the tool, a domain expert annotated part of a real PAM data set. Compared to conventional tools, the workflow of the proposed tool showed a speed improvement of 2-4 times. Further enhancements, such as the incorporation of sound examples, have the potential to further improve efficiency.
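The iterative select-then-annotate loop described above can be sketched with uncertainty sampling, a common active-learning strategy. The snippet below uses synthetic 2-D features as a stand-in for the tool's pre-trained audio embeddings; the model choice and data are illustrative assumptions, not the tool's actual pipeline.

```python
# Sketch of pool-based active learning via uncertainty sampling.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # stand-in "annotations"

# Seed set: extremes of the margin, guaranteeing both classes are present.
order = np.argsort(X[:, 0] + X[:, 1])
labeled = list(map(int, np.concatenate([order[:5], order[-5:]])))
pool = [i for i in range(200) if i not in labeled]

for _ in range(5):                           # five annotation rounds
    clf = LogisticRegression().fit(X[labeled], y[labeled])
    probs = clf.predict_proba(X[pool])[:, 1]
    # Most informative = closest to the decision boundary (p ~ 0.5).
    pick = pool[int(np.argmin(np.abs(probs - 0.5)))]
    labeled.append(pick)                     # "user annotates" this sample
    pool.remove(pick)

print(len(labeled))  # 15
```

Each round retrains on the growing labeled set and queries the single most ambiguous sample, which is what lets the model improve faster than random selection.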
DM8388
ComVas: Contextual Moral Values Alignment System
Inkit Padhi, Pierre Dognin, Jesus Rios, Ronny Luss, Swapnaja Achintalwar, Matthew Riemer, Miao Liu, Prasanna Sattigeri, Manish Nagireddy, Kush R. Varshney, Djallel Bouneffouf
30 min. talk | August 6th at 11:30 | Session: Demos-6-AM2-B2
In contemporary society, the integration of artificial intelligence (AI) systems into various aspects of daily life raises significant ethical concerns. One critical aspect is to ensure that AI systems align with the moral values of the end-users. To that end, we introduce the Contextual Moral Value Alignment System, ComVas. Unlike traditional AI systems, whose moral values are predefined, ComVas empowers users to dynamically select and customize the desired moral values, thereby guiding the system's decision-making process. Through a user-friendly interface, individuals can specify their preferred morals, allowing the system to steer the model's responses and actions accordingly. ComVas utilizes advanced natural language processing techniques to engage users in meaningful dialogue, understanding their preferences and reasoning about moral dilemmas in diverse contexts. This demo article showcases the functionality of ComVas, illustrating its potential to foster ethical decision-making in AI systems while respecting individual autonomy and promoting user-centric design principles.
DM8392
Digital Avatars: Framework Development and Their Evaluation
Timothy Rupprecht, Sung-En Chang, Yushu Wu, Lei Lu, Enfu Nan, Chih-hsiang Li, Caiyue Lai, Zhimin Li, Zhijun Hu, Yumei He, David Kaeli, Yanzhi Wang
30 min. talk | August 6th at 15:00 | Session: Demos-6-PM-B2
We present a novel prompting strategy for artificial intelligence driven digital avatars. To better quantify how our prompting strategy affects anthropomorphic features like humor, authenticity, and favorability, we present Crowd Vote, an adaptation of Crowd Score that allows judges to elect a large language model (LLM) candidate over competitors answering the same or similar prompts. To visualize the responses of our LLM and the effectiveness of our prompting strategy, we propose an end-to-end framework for creating high-fidelity artificial intelligence (AI) driven digital avatars. This pipeline effectively captures an individual's essence for interaction, and our streaming algorithm delivers a high-quality digital avatar with real-time audio-video streaming from server to mobile device. Both our visualization tool and our Crowd Vote metrics demonstrate that our AI driven digital avatars have state-of-the-art humor, authenticity, and favorability, outperforming all competitors and baselines. In the case of our Donald Trump and Joe Biden avatars, their authenticity and favorability are rated higher than even their real-world equivalents.
DM8393
LLM-powered GraphQL Generator for Data Retrieval
Balaji Ganesan, Sambit Ghosh, Nitin Gupta, Manish Kesarwani, Sameep Mehta, Renuka Sindhgatta
30 min. talk | August 6th at 15:00 | Session: Demos-6-PM-B1
GraphQL offers an efficient, powerful, and flexible alternative to REST APIs. However, application developers writing GraphQL clients need both technical and domain-specific expertise to reap its benefits and avoid over-fetching or under-fetching data. Automated GraphQL generation has so far proven to be a hard problem because of complex GraphQL schemas and a lack of benchmark datasets. To address these issues, our work focuses on building an LLM-powered pipeline that can accept user requirements in natural language along with the complex GraphQL schema and automatically produce the GraphQL query needed to retrieve the necessary data. Automated GraphQL generation helps reduce entry barriers for application developers, broadening GraphQL adoption.
DM8396
A Framework for Centralized Traffic Routing in Urban Areas
Matyáš Švadlenka, Lukas Chrpa
30 min. talk | August 7th at 15:00 | Session: Demos-7-PM-B2
Dealing with the ever-increasing demand for traffic management is one of the main challenges of the 21st century. The issue is much more apparent in urban areas during rush hours. Traffic congestion causes economic losses due to delays and increased fuel consumption and, on top of that, is a major health risk. Intelligent centralized traffic routing is an important concept aiming at reducing traffic congestion in urban areas by more effectively utilizing road networks. In this demo, we present a framework that, in a nutshell, integrates techniques for intelligent centralized traffic routing into the well-known SUMO simulator, so these techniques can be evaluated in realistic settings on real/realistic datasets. In particular, the framework automatically identifies “problematic” urban regions by analyzing historical traffic data, then simplifies the road networks by precomputing promising routes (for each considered traffic flow), and finally, leverages a planning-based approach to generate routes. Our framework is evaluated on a real dataset from Dublin’s metropolitan area.
DM8397
Enhancing Manufacturing with AI-powered Process Design
Gianmarco Genalti, Gabriele Corbo, Tommaso Bianchi, Marco Missaglia, Luca Negri, Andrea Sala, Luca Magri, Giacomo Boracchi, Giovanni Miragliotta, Nicola Gatti
30 min. talk | August 7th at 11:30 | Session: Demos-7-AM2-B2
Manufacturing companies are experiencing a transformative journey, moving from labor-intensive processes to integrating cutting-edge technologies such as digitalization and AI. In this demo paper, we present a novel AI tool to enhance manufacturing processes. Remarkably, our work has been developed in collaboration with Agrati S.p.A., a worldwide leading company in the bolt manufacturing sector. In particular, we propose an AI-powered tool to address the problem of automatically generating the production cycle of a bolt. Currently, this decision-making task is performed by process engineers who spend several days studying, drawing, and testing multiple alternatives before finding the desired production cycle. We cast this task as a model-based planning problem, mapping bolt technical drawings and metal deformations to, potentially continuous, states and actions, respectively. Furthermore, we resort to computer vision tools and visual transformers to design efficient heuristics that make the search affordable in concrete applications. Agrati S.p.A.'s process engineers extensively validated our tool, and they are currently using it to support their work. To the best of our knowledge, ours is the first AI tool dealing with production cycle design in bolt manufacturing.
DM8399
NegoLog: An Integrated Python-based Automated Negotiation Framework with Enhanced Assessment Components
Anıl Doğru, Mehmet Onur Keskin, Catholijn M. Jonker, Tim Baarslag, Reyhan Aydoğan
30 min. talk | August 7th at 15:00 | Session: Demos-7-PM-B1
The complexity of automated negotiation research calls for dedicated, user-friendly research frameworks that facilitate advanced analytics, comprehensive loggers, visualization tools, and auto-generated domains and preference profiles. This paper introduces NegoLog, a platform that provides advanced and customizable analysis modules to agent developers for exhaustive performance evaluation. NegoLog introduces an automated scenario and tournament generation tool in its Web-based user interface so that agent developers can adjust the competitiveness and complexity of the negotiations. One of the key novelties of NegoLog is the individual assessment of preference estimation models, independent of negotiation strategies.
DM8401
Mahjong AI Competition: Exploring AI Application in Complex Real-World Games
Yunlong Lu, Wenxin Li
30 min. talk | August 6th at 15:00 | Session: Demos-6-PM-B1
This paper presents three Mahjong AI competitions we held at IJCAI. We briefly introduce the rules of Mahjong and its challenges to AI algorithms. By showing the results and the application of various algorithms in the competitions, we claim that existing algorithms show promising results in Mahjong, while open problems remain and more efforts are needed towards solving this complex game.
DM8402
3D-FuM: Benchmarking 3D Molecule Learning with Functional Groups
Tingwei Chen, Jianpeng Chen, Dawei Zhou
30 min. talk | August 6th at 15:00 | Session: Demos-6-PM-B1
Molecular graph representation learning plays a crucial role in various domains, such as drug discovery and chemical reaction prediction, where molecular graphs are typically depicted as 2D topological structures. However, recent insights highlight the critical role of 3D geometric information and functional groups in accurately predicting molecular properties, aspects often neglected in existing molecular graph benchmark datasets. To bridge the research gap, we introduce a comprehensive molecular learning benchmark named 3D-FuM, which incorporates both 3D geometric information and functional groups of a large number of molecules. 3D-FuM integrates 18 state-of-the-art algorithms and 19 evaluation metrics on three molecular learning tasks, including general molecule generation, conditional molecule generation, and property prediction. 3D-FuM, for the first time, takes into consideration both 3D geometric information and molecular functional groups, enabling researchers and practitioners to effectively and impartially evaluate newly proposed methods against existing baselines across diverse datasets. Furthermore, we design a user interface for user-friendly interaction and development with the benchmark, covering evaluation metric selection, parameter adjustment, and leaderboard comparison. To ensure accessibility and reproducibility, we open-source our benchmark 3D-FuM and experimental results at https://3dfunctiongroupmoleculedataset.github.io/3D-FuM/#/Home.
DM8405
Integrating LLM, VLM, and Text-to-Image Models for Enhanced Information Graphics: A Methodology for Accurate and Visually Engaging Visualizations
Chao-Ting Chen, Hen-Hsen Huang
30 min. talk | August 6th at 15:00 | Session: Demos-6-PM-B1
This study presents an innovative approach to the creation of information graphics, where the accuracy of content and aesthetic appeal are of paramount importance. Traditional methods often struggle to balance these two aspects, particularly in complex visualizations like phylogenetic trees. Our methodology integrates the strengths of Large Language Models (LLMs), Vision Language Models (VLMs), and advanced text-to-image models to address this challenge. Initially, an LLM plans the layout and structure, employing Mermaid—a JavaScript-based tool that uses Markdown-like scripts for diagramming—to establish a precise and structured foundation. This structured script is crucial for ensuring data accuracy in the graphical representation. Following this, text-to-image models are employed to enhance the vector graphic generated by Mermaid, adding rich visual elements and enhancing overall aesthetic appeal. The integration of text-to-image models is a key innovation, enabling the creation of graphics that are not only informative but also visually captivating. Finally, a VLM performs quality control, ensuring that the visual enhancements align with the informational accuracy. This comprehensive approach effectively combines the accuracy of structured data representation, the creative potential of text-to-image models, and the validation capabilities of VLMs. The result is a new standard in information graphic creation, suitable for diverse applications ranging from education to scientific communication, where both information integrity and visual engagement are essential.
DM8407
CVAT-BWV: A Web-Based Video Annotation Platform for Police Body-Worn Video
Parsa Hejabi, Akshay Kiran Padte, Preni Golazizian, Rajat Hebbar, Jackson Trager, Georgios Chochlakis, Aditya Kommineni, Ellie Graeden, Shrikanth Narayanan, Benjamin A.T. Graham, Morteza Dehghani
30 min. talk | August 8th at 10:00 | Session: Demos-8-AM1-B2
We introduce an open-source platform for annotating body-worn video (BWV) footage aimed at enhancing transparency and accountability in policing. Despite the widespread adoption of BWVs in police departments, analyzing the vast amount of footage generated has presented significant challenges. This is primarily due to resource constraints, the sensitive nature of the data, which limits widespread access, and consequently, lack of annotations for training machine learning models. Our platform, called CVAT-BWV, offers a secure, locally hosted annotation environment that integrates several AI tools to assist in annotating multimodal data. With features such as automatic speech recognition, speaker diarization, object detection, and face recognition, CVAT-BWV aims to reduce the manual annotation workload, improve annotation quality, and allow for capturing perspectives from a diverse population of annotators. This tool aims to streamline the collection of annotations and the building of models, enhancing the use of BWV data for oversight and learning purposes to uncover insights into police-civilian interactions.
DM8408
SmartTransit.AI: A Dynamic Paratransit and Microtransit Application
Sophie Pavia, David Rogers, Amutheezan Sivagnanam, Michael Wilbur, Danushka Edirimanna, Youngseo Kim, Ayan Mukhopadhyay, Philip Pugliese, Samitha Samaranayake, Aron Laszka, Abhishek Dubey
30 min. talk | August 8th at 10:00 | Session: Demos-8-AM1-B1
New rideshare and shared mobility services have transformed urban mobility in recent years. Such services have the potential to improve efficiency and reduce costs by allowing users to share rides in high-capacity vehicles and vans. Most transit agencies already operate various ridepooling services, including microtransit and paratransit. However, the objectives and constraints for implementing these services vary greatly between agencies and can be challenging. First, off-the-shelf ridepooling formulations must be adapted for real-world conditions and constraints. Second, the lack of modular and reusable software makes it hard to implement and evaluate new ridepooling algorithms and approaches in real-world settings. We demonstrate a modular on-demand public transportation scheduling software for microtransit and paratransit services. The software is aimed at transit agencies looking to incorporate state-of-the-art rideshare and ridepooling algorithms in their everyday operations. We provide management software for dispatchers and mobile applications for drivers and users and conclude with results from the demonstration in Chattanooga, TN.
DM8409
SaGol: Using MiniGPT-4 to Generate Alt Text for Improving Image Accessibility
Yunseo Moon, Hyunmin Lee, SeungYoung Oh, Hyunggu Jung
30 min. talk | August 6th at 15:00 | Session: Demos-6-PM-B1
SaGol is an AI-powered application to improve image accessibility for people with visual impairments (PVI). Alternative (alt) text, a general method of web accessibility for PVI users, consists of text or phrases that describe images on a website in an understandable way. SaGol generates alt text for the images on the user's smartphone using a vision large language model called MiniGPT-4, and then searches for similar images based on the generated alt text. We evaluated the length of the alt text and the search accuracy. This paper shows a potential opportunity to improve image accessibility for PVI users.
DM8411
AI-Olympics: Exploring the Generalization of Agents through Open Competitions
Chen Wang, Yan Song, Shuai Wu, Sa Wu, Ruizhi Zhang, Shu Lin, Haifeng Zhang
30 min. talk | August 6th at 15:00 | Session: Demos-6-PM-B2
Between 2021 and 2023, AI-Olympics, a series of online AI competitions, was hosted by the online evaluation platform Jidi in collaboration with the IJCAI committee. In these competitions, an agent is required to accomplish diverse sports tasks in a two-dimensional continuous world, while competing against an opponent. This paper provides a brief overview of the competition series and highlights notable findings. We aim to contribute insights to the field of multi-agent decision-making and explore the generalization of agents through engineering efforts.
DM8413
Oasis: Data Curation and Assessment System for Pretraining of Large Language Models
Tong Zhou, Yubo Chen, Pengfei Cao, Kang Liu, Shengping Liu, Jun Zhao
30 min. talk | August 8th at 10:00 | Session: Demos-8-AM1-B2
Data is one of the most critical elements in building a large language model. However, existing systems either fail to customize a corpus curation pipeline or neglect to leverage comprehensive corpus assessment for iterative optimization of the curation. To this end, we present a pretraining corpus curation and assessment platform called Oasis, a one-stop system for data quality improvement and quantification with user-friendly interactive interfaces. Specifically, the interactive modular rule filter module can devise customized rules according to explicit feedback. The debiased neural filter module builds the quality classification dataset in a negative-centric manner to remove undesired bias. The adaptive document deduplication module can execute large-scale deduplication with limited memory resources. These three parts constitute the customized data curation module. In the holistic data assessment module, a corpus can be assessed from local and global views, with three evaluation means: human judgment, GPT-4, and heuristic metrics. We exhibit a complete process of using Oasis for the curation and assessment of pretraining data. In addition, an 800GB bilingual corpus curated by Oasis is publicly released.
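Oasis's adaptive deduplication algorithm is not detailed in the abstract; the sketch below only illustrates the general memory-bounded idea behind large-scale deduplication: keep a fixed-size digest per seen document rather than the text itself, so memory grows with the number of unique documents, not corpus size.

```python
# Hedged sketch of streaming, exact-match document deduplication
# with a small per-document memory footprint (16-byte digests).
import hashlib

def dedup_stream(docs):
    seen = set()
    for doc in docs:
        # Cheap normalization: lowercase, collapse whitespace.
        normalized = " ".join(doc.lower().split())
        digest = hashlib.md5(normalized.encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield doc

docs = ["Hello  world", "hello world", "Something else"]
unique = list(dedup_stream(docs))
print(len(unique))  # 2
```

Real pretraining-corpus pipelines typically add near-duplicate detection (e.g. shingling or MinHash) on top of exact matching; that is beyond this sketch.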
DM8416
iFakeDetector: Real Time Integrated Web-based Deepfake Detection System
Kangjun Lee, Inho Jung, Simon S. Woo
30 min. talk | August 8th at 15:00 | Session: Demos-8-PM-B1
Deepfake detection has recently become an active research area. While many deepfake detectors have been proposed, validating the practicality of such systems in real-world settings has not been explored much; indeed, there are gaps and disparities when they are applied in the real world. In this work, we developed a real-time integrated web-based deepfake detection system, iFakeDetector, which incorporates recent high-performing deepfake detectors and enables easy access for non-expert users to evaluate deepfake videos. Our system takes a video as input, allowing users to upload videos and select different detectors, and reports whether the uploaded video is a deepfake or not. We also provide an analysis tool that enables a video to be analyzed on a frame-by-frame basis, with the probability of each frame being manipulated. Finally, we tested and deployed iFakeDetector in a real-world scenario to verify its practicality and feasibility.
DM8419
Design of a Data-driven Intervention Dashboard for SDG Localization
Pooja Bassin, Abraham G K, Srinath Srinivasa
30 min. talk | August 7th at 10:00 | Session: Demos-7-AM1-B1
The localization problem of the United Nations Sustainable Development Goals (SDGs) involves adopting strategies that are in tune with local conditions to achieve a given SDG target. However, even within a given region, localized conditions may vary drastically. With increasing amounts of Open Government Data (OGD) becoming available, there is an opportunity to systematically address the localization problem using predictive and prescriptive modeling techniques. This work presents a predictive and prescriptive modeling dashboard for the SDG indicator maternal deaths (MD) for the Indian state of Karnataka. The dashboard was created by examining a vast set of data points to focus on four factors that showed high correlations with MD. We then construct a multivariate linear regression model to showcase the differential impact that a given factor has on the indicator and to identify prescribed values for different factors to achieve a given target value of the indicator. Finally, a budget allocation dashboard is also provided that helps policymakers allocate budgets to specific schemes to help operationalize these changes. This dashboard was built by combining data from five different OGD sources.
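The predictive-then-prescriptive step described above can be sketched as follows: fit a multivariate linear regression of the indicator on four factors, then invert the fitted model to find the value of one factor (holding the others fixed) that achieves a target indicator value. The data and factor names below are synthetic, purely for illustration; the dashboard's actual data and model details are not shown in the abstract.

```python
# Sketch: fit indicator ~ four factors, then a prescriptive query.
import numpy as np

rng = np.random.default_rng(0)
F = rng.uniform(size=(100, 4))                    # four factors
md = F @ np.array([2.0, -1.0, 0.5, 3.0]) + 1.0    # synthetic indicator (MD)

A = np.hstack([F, np.ones((100, 1))])             # add intercept column
coef, *_ = np.linalg.lstsq(A, md, rcond=None)     # 4 slopes + intercept

# Prescriptive query: which value of factor 0 yields target MD = 2.0,
# with factors 1-3 held at their observed means?
target = 2.0
fixed = F.mean(axis=0)
residual = target - (coef[1:4] @ fixed[1:4] + coef[4])
prescribed_f0 = residual / coef[0]
print(round(float(prescribed_f0), 3))
```

The slope coefficients give the "differential impact" of each factor on the indicator; solving the fitted equation for one factor gives its prescribed value.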
DM8421
Plug-and-Play Unsupervised Fault Detection and Diagnosis for Complex Industrial Monitoring
Maksim Golyadkin, Maria Shtark, Petr Ivanov, Alexander Kozhevnikov, Leonid Zhukov, Ilya Makarov
30 min. talk | August 8th at 15:00 | Session: Demos-8-PM-B1
Today, industrial facilities are equipped with numerous sensors throughout the production line for monitoring purposes. The gathered data can be used to detect and predict failures; however, manually labeling large amounts of data for supervised learning is complicated. This paper introduces an innovative approach to unsupervised fault detection and diagnosis tailored for monitoring industrial chemical processes. We showcase the efficacy of our model using two publicly accessible datasets from the Tennessee Eastman Process, each containing various faults. Furthermore, we illustrate that by fine-tuning the model on a limited amount of labeled data, it achieves performance close to that of a state-of-the-art model trained on the entire dataset.
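The paper's model is not described in the abstract; a classic unsupervised baseline for exactly this setting is PCA-based monitoring, where samples whose reconstruction error (the SPE/Q statistic) exceeds a threshold fitted on normal operation are flagged as faults. The sketch below uses synthetic data as a stand-in for Tennessee Eastman measurements.

```python
# Sketch: PCA reconstruction-error (SPE) fault detection on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
# Correlated "normal operation" measurements, 500 samples x 10 sensors.
normal = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10)) * 0.1
mean = normal.mean(axis=0)
Xc = normal - mean

# Principal subspace from the SVD of centered normal-operation data.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:3].T                                   # keep 3 components

def spe(x):
    """Squared reconstruction error outside the principal subspace."""
    xc = x - mean
    recon = xc @ P @ P.T
    return float(((xc - recon) ** 2).sum())

# Threshold at the 99th percentile of normal-operation SPE.
threshold = np.quantile([spe(x) for x in normal], 0.99)

faulty = normal[0] + 5.0                       # large shift = simulated fault
print(spe(faulty) > threshold)
</p>```

No fault labels are used anywhere above, which is what makes the approach attractive when manual labeling is impractical.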
DM8422
Upgrading Search Applications in the Era of LLMs: A Demonstration with Practical Lessons
Shuang Yu, Nirandika Wanigasekara, Jeff Tan, Kent Fitch, Jaydeep Sen
30 min. talk | August 9th at 10:00 | Session: Demos-9-AM1-B1
While traditional search systems have mostly relied, satisfactorily, on lexical sparse retrievers such as BM25, recent research advances in neural models and present-day large language models (LLMs) hold great promise for practical search applications as well. In this work, we discuss a collaboration between IBM and the National Library of Australia to upgrade an existing search application (referred to as NLA) over terabytes of Australian Web Archive data, serving thousands of daily users. We posit and demonstrate, both empirically and through qualitative user studies, that LLMs and neural models can indeed provide good gains when combined effectively with traditional search. We believe this demonstration will show the unique challenges associated with real-world practical deployments and also offer valuable insights into how to effectively upgrade legacy search applications in the era of LLMs.
DM8423
IntEr-HRI Competition: Intrinsic Error Evaluation during Human – Robot Interaction
Kartik Chari, Niklas Kueper, Su Kyoung Kim, Frank Kirchner, Elsa Andrea Kirchner
30 min. talk | August 8th at 11:30 | Session: Demos-8-AM2-B1
Reliable detection of human intentions from electroencephalogram (EEG) to improve human-robot interaction (HRI) has recently gained significant importance. To ensure safe and satisfactory interactions, implicit detection of erroneous behavior of robotic systems, particularly assistive devices, is essential. This can be achieved by detecting error-related potentials (ErrPs) in EEG, evoked by visual, tactile, or visuo-tactile stimuli. Of these, ErrPs evoked tactilely with the help of a robot remain unexplored and have been the main focus of this competition. The task for participating teams was to develop robust AI models for continuous real-time classification of erroneous behavior of assistive robotic devices from the human EEG. Even though the competition results prove its feasibility, a performance gap (balanced accuracy and computation time) of more than 10% was observed between the offline and online classification of errors in real-world scenarios. In addition to the competitive AI models developed by the participating teams, this competition also contributed a one-of-its-kind open-access EEG and EMG dataset, a lossless live streaming solution for EEG data, and a novel quantitative metric for benchmarking online asynchronous EEG detection solutions.
DM8424
Towards a Resilient Intelligent Automation System
Segev Shlomov, Sami Marreed, Avi Yaeli
30 min. talk | August 6th at 11:30 | Session: Demos-6-AM2-B1
Intelligent Process Automation (IPA) solutions must adapt to changes in user interfaces autonomously, without manual intervention. Addressing this critical challenge and aiming to advance the state-of-the-art, last year we introduced the IPA Challenge competition at IJCAI 2023. This demo paper presents IDA – our novel UI automation solution, developed to tackle complex resiliency issues. Leveraging the capabilities of large language models and employing grounded instructions, our system demonstrates a significant advancement towards resilient IPA. We provide an overview of the IPA Challenge, detail the architecture of our system, and illustrate its effectiveness in overcoming the resiliency challenges. A link to the demo video can be found at: https://youtu.be/G5nI3V9Umjc
DM8425
Using Large Language Models and Recruiter Expertise for Optimized Multilingual Job Offer – Applicant CV Matching
Hamit Kavas, Marc Serra-Vidal, Leo Wanner
30 min. talk | August 8th at 11:30 | Session: Demos-8-AM2-B2
In the context of the increasingly globalised economy and labour market, recruitment agencies face the challenge of dealing with a magnitude of job offers and job applications written in a variety of languages, formats, and styles. Quite often, this leads to a suboptimal evaluation of the CVs of job seekers with respect to their relevance to a job offer. To address this challenge, we propose an interactive system that follows the "human-in-the-loop" approach, actively involving recruiters in the job offer to applicant CV matching. The system uses a fine-tuned state-of-the-art classification model that aligns job seeker CVs with labels of the European Skills, Competences, Qualifications and Occupations taxonomy to propose an initial match between job offers and the CVs of job candidates. This match is refined in sequential LLM-driven interaction with the recruiter, which culminates in CV relevance scores and reports that justify them.
DM8426
ProMoAI: Process Modeling with Generative AI
Humam Kourani, Alessandro Berti, Daniel Schuster, Wil M.P. van der Aalst
30 min. talk | August 8th at 15:00 | Session: Demos-8-PM-B2
ProMoAI is a novel tool that leverages Large Language Models (LLMs) to automatically generate process models from textual descriptions, incorporating advanced prompt engineering, error handling, and code generation techniques. Beyond automating the generation of complex process models, ProMoAI also supports process model optimization. Users can interact with the tool by providing feedback on the generated model, which is then used for refining the process model. ProMoAI utilizes the capabilities of LLMs to offer a novel, AI-driven approach to process modeling, significantly reducing the barrier to entry for users without deep technical knowledge in process modeling.
DM8427
AESim: A Data-Driven Aircraft Engine Simulator
Abdellah Madane, Florent Forest, Hanane Azzag, Mustapha Lebbah, Jérôme Lacaille
30 min. talk | August 6th at 15:00 | Session: Demos-6-PM-B2
We present AESim, a data-driven Aircraft Engine Simulator developed using transformer-based conditional generative adversarial networks. AESim generates samples of aircraft engine sensor measurements over full flights, conditioned on a given flight mission profile representing the flight conditions. It constitutes an essential tool in aircraft engine digital twins, capable of simulating their performance for different flight missions. It allows for comparison of the behavior of different engines under the same operational conditions, simulation of various scenarios for a given engine, facilitating applications like engine behavior analysis, performance limit identification, and optimization of maintenance schedules within a global Prognostics and Health Management (PHM) strategy. It also allows the imputation of missing flight data and addresses confidentiality concerns by generating synthetic flight datasets that can be shared for public research purposes or data challenges.
DM8428
Artificial Intelligence-Driven Video Indexing for Rapid Surveillance Footage Summarization and Review
Jaemin Jung, Soonyong Park, Harim Kim, Changha Lee, Charmgil Hong
30 min. talk | August 7th at 15:00 | Session: Demos-7-PM-B2
This paper introduces VIDEX, an advanced tool designed to streamline the analysis of surveillance video through a user-friendly interface. VIDEX achieves high development efficiency and maintainability by adopting the Model-View-ViewModel (MVVM) design pattern. Its core feature is footage summarization based on object detection and anomaly detection. Its architecture ensures efficient data management by organizing detected objects and anomalies in an indexed database, enabling a faster review process, and multi-threading further shortens processing time. Using the information stored in the database, VIDEX produces video summaries suited primarily to the criminal investigation stage. Discover more about VIDEX and access its resources at https://github.com/nth221/videx.
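The indexed-database idea behind the rapid review process can be sketched with SQLite. The schema and column names below are illustrative assumptions, not VIDEX's actual layout: the point is that indexed detections let a reviewer query straight to the timestamps of interest instead of scrubbing through footage.

```python
import sqlite3

# Illustrative schema (not VIDEX's actual one): each detection is stored with
# a timestamp and an anomaly flag, indexed for fast label-based lookup.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE detections (
    video_id TEXT, t_sec REAL, label TEXT, anomaly INTEGER)""")
conn.execute("CREATE INDEX idx_label ON detections (label, t_sec)")

rows = [("cam01", 12.5, "person", 0),
        ("cam01", 94.0, "person", 1),   # flagged by the anomaly detector
        ("cam01", 300.2, "car", 0)]
conn.executemany("INSERT INTO detections VALUES (?,?,?,?)", rows)

# "Summary" query: only anomalous person appearances, in temporal order.
hits = conn.execute(
    "SELECT t_sec FROM detections WHERE label='person' AND anomaly=1 ORDER BY t_sec"
).fetchall()
print(hits)   # timestamps the reviewer should inspect first
```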
DM8430
KnowledgeHub: An End-to-End Tool for Assisted Scientific Discovery
Shinnosuke Tanaka, James Barry, Vishnudev Kuruvanthodi, Movina Moses, Maxwell J. Giammona, Nathan Herr, Mohab Elkaref, Geeth de Mel
30 min. talk | August 8th at 15:00 | Session: Demos-8-PM-B1
This paper describes KnowledgeHub, a tool that provides an Information Extraction (IE) and Question Answering (QA) pipeline for scientific literature. It supports the ingestion of PDF documents, which are converted to text and structured representations. An ontology can then be constructed in which a user defines the types of entities and relationships they want to capture. A browser-based annotation tool enables annotating the contents of the PDF documents according to the ontology. Named Entity Recognition (NER) and Relation Classification (RC) models can be trained on the resulting annotations and used to annotate the unannotated portion of the documents. A knowledge graph is constructed from these entity and relation triples and can be queried to obtain insights from the data. Furthermore, we integrate a suite of Large Language Models (LLMs) for QA and summarisation grounded in the ingested documents. KnowledgeHub is a unique tool that supports annotation, IE, and QA, giving the user full insight into the knowledge discovery pipeline.
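The last step of the pipeline — querying a knowledge graph built from extracted triples — can be illustrated with a minimal pattern matcher. The triples and the `query` helper below are invented for illustration; KnowledgeHub's actual graph storage and query interface will differ.

```python
# Illustrative only: entity/relation triples, as produced by NER + RC models,
# queried as a tiny in-memory knowledge graph.
triples = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "inhibits", "COX-1"),
    ("ibuprofen", "treats", "headache"),
]

def query(subject=None, relation=None, obj=None):
    """Pattern-match triples; None acts as a wildcard."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)]

# Insight query: everything known to treat headache.
print(query(relation="treats", obj="headache"))
```

Wildcard triple patterns of this kind are the basic building block of graph query languages such as SPARQL.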
DM8432
NEGOTIATOR: A Comprehensive Framework for Human-Agent Negotiation Integrating Preferences, Interaction, and Emotion
Mehmet Onur Keskin, Berk Buzcu, Berkecan Koçyiğit, Umut Çakan, Anıl Doğru, Reyhan Aydoğan
30 min. talk | August 7th at 15:00 | Session: Demos-7-PM-B1
The paper introduces a comprehensive human-agent negotiation framework designed to facilitate the development and evaluation of human-agent negotiation studies without building each component from scratch. Leveraging the interoperability and reusability of its components, the framework offers various functionalities, including speech-to-text conversion, emotion recognition, a repository of negotiation strategies, and an interaction manager capable of managing gestures designed for Nao, Pepper, and QT, and of coordinating message exchanges in a turn-taking fashion. The framework aims to lower the entry barrier for researchers in human-agent negotiation by providing a versatile platform that supports a wide range of research directions, including affective computing, natural language processing, decision-making, and non-verbal communication.
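The turn-taking coordination handled by the interaction manager can be sketched as an alternating-offers loop. Everything below is a toy illustration, not the framework's API: the linear time-based concession strategy and the scripted human responses are assumptions.

```python
# Toy alternating-offers exchange (illustrative; strategy and values invented):
# the agent concedes over rounds until the human's counter-offer meets its demand.

def agent_offer(round_no):
    """Time-based concession: start at full utility, concede linearly to 0.5."""
    return max(0.5, 1.0 - 0.1 * round_no)

human_offers = [0.2, 0.4, 0.55, 0.7]   # scripted "human" counter-offers
deal = None
for rnd, human in enumerate(human_offers):
    offer = agent_offer(rnd)
    if human >= offer:                  # human meets the agent's current demand
        deal = offer
        break
print(deal)
```

In the real framework the human side arrives via speech-to-text and the agent side draws on the strategy repository, but the turn-taking skeleton is the same.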
DM8433
AADMIP: Adversarial Attacks and Defenses Modeling in Industrial Processes
Vitaliy Pozdnyakov, Aleksandr Kovalenko, Ilya Makarov, Mikhail Drobyshevskiy, Kirill Lukyanov
30 min. talk | August 8th at 10:00 | Session: Demos-8-AM1-B1
The ongoing trend toward smart manufacturing includes the integration of Artificial Intelligence technologies into industrial processes; one example is deep learning models that diagnose the current state of a technological process. Recent studies have demonstrated that small data perturbations, known as adversarial attacks, can significantly degrade the predictions of such models. This is critical in industrial systems, where AI-based decisions may control physical equipment. In this work, we present a system that helps evaluate the robustness of technological process diagnosis models against adversarial attacks and explore protection options. We briefly review the system's modules and discuss some useful applications. Our demo video is available at: http://tinyurl.com/3by9zcj5
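The kind of small perturbation such a system evaluates can be illustrated with the fast gradient sign method (FGSM) on a toy model. This is a sketch only: the "diagnosis model" below is a hand-written linear score (so its gradient is just the weight vector), not one of the deep models the system targets, and all values are invented.

```python
# FGSM sketch on a toy linear fault-diagnosis score: x_adv = x + eps * sign(grad).
# For a linear score w.x, the gradient w.r.t. x is simply w.

def sign(v):
    return (v > 0) - (v < 0)

w = [0.8, -0.5]               # toy model: "fault" if w.x > 0
x = [0.3, 0.9]                # clean sensor reading -> score < 0 ("healthy")
score = sum(wi * xi for wi, xi in zip(w, x))

eps = 0.3                     # perturbation budget (assumed)
x_adv = [xi + eps * sign(wi) for wi, xi in zip(w, x)]
score_adv = sum(wi * xi for wi, xi in zip(w, x_adv))
print(score, score_adv)       # a small per-sensor shift flips the diagnosis
```

Deep models behave analogously, except the gradient must be obtained by backpropagation rather than read off the weights.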
DM8439
An Interactive Human-Machine Learning Interface for Collecting and Learning from Complex Annotations
Jonathan Erskine, Matt Clifford, Alexander Hepburn, Raul Santos Rodriguez
30 min. talk | August 9th at 10:00 | Session: Demos-9-AM1-B2
Human-Computer Interaction has been shown to improve machine learning systems by boosting model performance, accelerating learning, and building user confidence. In this work, we aim to relax the expectation that human annotators must adapt to the constraints imposed by traditional labels, allowing extra flexibility in the form in which supervision information is collected. To this end, we propose a human-machine learning interface for binary classification tasks that enables human annotators to supply counterfactual examples complementing standard binary labels as annotations for a dataset. Finally, we discuss the challenges facing future extensions of this work.
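One way to read "counterfactual examples as annotations" is that the annotator supplies a minimally edited copy of an instance carrying the opposite label, giving the learner two points that straddle the decision boundary. The sketch below illustrates that idea with a 1-nearest-neighbour learner; the interface, data, and learner are all invented for illustration and are not the paper's actual design.

```python
# Illustrative sketch: a counterfactual annotation stores a minimally edited
# instance with the flipped label alongside the original labelled point.

dataset = [((2.0,), 1)]                       # one standard binary annotation

def annotate_counterfactual(x, y, edited_x):
    """Record the annotator's counterfactual with the opposite label."""
    dataset.append((edited_x, 1 - y))

# Annotator: "if this feature were 1.4 instead of 2.0, the label would flip".
annotate_counterfactual((2.0,), 1, (1.4,))

def predict_1nn(x):
    """Nearest-neighbour prediction over the augmented annotations."""
    return min(dataset, key=lambda p: abs(p[0][0] - x[0]))[1]

print(predict_1nn((1.5,)), predict_1nn((1.8,)))
```

The counterfactual pins the boundary between 1.4 and 2.0 — information a single binary label could not convey.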