Publications | Jiashu Xu 徐家澍

2024

NAACL Oral
Instructional Fingerprinting of Large Language Models

Jiashu Xu, Fei Wang* , Mingyu Derek Ma*, Pang Wei Koh, Chaowei Xiao, and Muhao Chen

In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) , 2024

Oral Abs arXiv Bib HTML PDF Code

NAACL 2024 Oral

The exorbitant cost of training Large language models (LLMs) from scratch makes it essential to fingerprint the models to protect intellectual property via ownership authentication and to ensure downstream users and developers comply with their license terms (\eg restricting commercial use). In this study, we present a pilot study on LLM fingerprinting as a form of very lightweight instruction tuning. Model publisher specifies a confidential private key and implants it as an instruction backdoor that causes the LLM to generate specific text when the key is present. Results on 11 popularly-used LLMs showed that this approach is lightweight and does not affect the normal behavior of the model. It also prevents publisher overclaim, maintains robustness against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to MIT License.
@inproceedings{xu2024instructional, title = {Instructional Fingerprinting of Large Language Models}, author = {Xu, Jiashu and Wang, Fei and Ma, Mingyu Derek and Koh, Pang Wei and Xiao, Chaowei and Chen, Muhao}, year = {2024}, booktitle = {Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)}, abbr2 = {Oral}, eprint = {2401.12255}, archiveprefix = {arXiv}, primaryclass = {cs.CR} }
ECCV Under Review
DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models

Brian Nlong Zhao , Yuhang Xiao*, Jiashu Xu*, Xinyang Jiang, Yifan Yang , Dongsheng Li, Laurent Itti, Yunhao Ge, and Vibhav Vineet

In review of The European Conference on Computer Vision (ECCV), 2024

Abs arXiv Bib HTML Code

The popularization of Text-to-Image (T2I) diffusion models enables the generation of high-quality images from text descriptions. However, generating diverse customized images with reference visual attributes remains challenging. This work focuses on personalizing T2I diffusion models at a more abstract concept or category level, adapting commonalities from a set of reference images while creating new instances with sufficient variations. We introduce a solution that allows a pretrained T2I diffusion model to learn a set of soft prompts, enabling the generation of novel images by sampling prompts from the learned distribution. These prompts offer text-guided editing capabilities and additional flexibility in controlling variation and mixing between multiple distributions. We also show the adaptability of the learned prompt distribution to other tasks, such as text-to-3D. Finally we demonstrate effectiveness of our approach through quantitative analysis including automatic evaluation and human assessment.
@article{zhao2023dream, title = {DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models}, author = {Zhao, Brian Nlong and Xiao, Yuhang and Xu, Jiashu and Jiang, Xinyang and Yang, Yifan and Li, Dongsheng and Itti, Laurent and Ge, Yunhao and Vineet, Vibhav}, journal = {In review of The European Conference on Computer Vision (ECCV)}, abbr2 = {Under Review}, year = {2024} }
CVPR Highlight
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Yunhao Ge*, Yihe Tang*, Jiashu Xu*, Cem Gokmen* , Chengshu Li, Wensi Ai, Benjamin Jose Martinez, Arman Aydin, Mona Anvari, Ayush K Chakravarthy, Hong-Xing Yu, Josiah Wong, Sanjana Srivastava, Sharon Lee, Shengxin Zha, Laurent Itti , Yunzhu Li, Roberto Martin-Martin , Miao Liu, Pengchuan Zhang , Ruohan Zhang, Li Fei-Fei, and Jiajun Wu

In Conference on Computer Vision and Pattern Recognition (CVPR) , 2024

Highlight Abs arXiv Bib HTML PDF Code

CVPR 2024 Highlight

The systematic evaluation and understanding of computer vision models under varying conditions require large amounts of data with comprehensive and customized labels, which real-world vision datasets rarely satisfy. While current synthetic data generators offer a promising alternative, particularly for embodied AI tasks, they often fall short for computer vision tasks due to low asset and rendering quality, limited diversity, and unrealistic physical properties. We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models, based on the newly developed embodied AI benchmark, BEHAVIOR-1K. BVS supports a large number of adjustable parameters at the scene level (e.g., lighting, object placement), the object level (e.g., joint configuration, attributes such as "filled" and "folded"), and the camera level (e.g., field of view, focal length). Researchers can arbitrarily vary these parameters during data generation to perform controlled experiments. We showcase three example application scenarios: systematically evaluating the robustness of models across different continuous axes of domain shift, evaluating scene understanding models on the same set of images, and training and evaluating simulation-to-real transfer for a novel vision task: unary and binary state prediction.
@inproceedings{ge2024behavior, title = {BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation}, author = {Ge, Yunhao and Tang, Yihe and Xu, Jiashu and Gokmen, Cem and Li, Chengshu and Ai, Wensi and Martinez, Benjamin Jose and Aydin, Arman and Anvari, Mona and Chakravarthy, Ayush K and Yu, Hong-Xing and Wong, Josiah and Srivastava, Sanjana and Lee, Sharon and Zha, Shengxin and Itti, Laurent and Li, Yunzhu and Martin-Martin, Roberto and Liu, Miao and Zhang, Pengchuan and Zhang, Ruohan and Fei-Fei, Li and Wu, Jiajun}, booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)}, year = {2024}, abbr2 = {Highlight}, }
COLM Under Review
Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations

Wenjie Mo, Jiashu Xu, Qin Liu , Jiongxiao Wang, Jun Yan, Chaowei Xiao, and Muhao Chen

In review of Conference on Language Modeling (COLM), 2024

Abs arXiv Bib

Existing studies in backdoor defense have predominantly focused on the training phase, overlooking the critical aspect of testing time defense. This gap becomes particularly pronounced in the context of Large Language Models (LLMs) deployed as Web Services, which typically offer only black-box access, rendering training-time defenses impractical. To bridge this gap, our work introduces defensive demonstrations, an innovative backdoor defense strategy for blackbox large language models. Our method involves identifying the task and retrieving task-relevant demonstrations from an uncontaminated pool. These demonstrations are then combined with user queries and presented to the model during testing, without requiring any modifications/tuning to the black-box model or insights into its internal mechanisms. Defensive demonstrations are designed to counteract the adverse effects of triggers, aiming to recalibrate and correct the behavior of poisoned models during test-time evaluations. Extensive experiments show that defensive demonstrations are effective in defending both instance-level and instruction-level backdoor attacks, not only rectifying the behavior of poisoned models but also surpassing existing baselines in most scenarios.
@article{mo2023testtime, title = {Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations}, author = {Mo, Wenjie and Xu, Jiashu and Liu, Qin and Wang, Jiongxiao and Yan, Jun and Xiao, Chaowei and Chen, Muhao}, year = {2024}, abbr2 = {Under Review}, journal = {In review of Conference on Language Modeling (COLM)}, }
NAACL
Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models

Jiashu Xu , Mingyu Derek Ma, Fei Wang, Chaowei Xiao, and Muhao Chen

In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) , 2024

Abs arXiv Bib HTML PDF

Instruction-tuned models are trained on crowdsourcing datasets with task instructions to achieve superior performance. However, in this work we raise security concerns about this training paradigm. Our studies demonstrate that an attacker can inject backdoors by issuing very few malicious instructions among thousands of gathered data and control model behavior through data poisoning, without even the need of modifying data instances or labels themselves. Through such instruction attacks, the attacker can achieve over 90% attack success rate across four commonly used NLP datasets, and cause persistent backdoors that are easily transferred to 15 diverse datasets zero-shot. In this way, the attacker can directly apply poisoned instructions designed for one dataset on many other datasets. Moreover, the poisoned model cannot be cured by continual learning. Lastly, instruction attacks show resistance to existing inference-time defense. These findings highlight the need for more robust defenses against data poisoning attacks in instructiontuning models and underscore the importance of ensuring data quality in instruction crowdsourcing.
@inproceedings{xu2023instructions, title = {Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models}, author = {Xu, Jiashu and Ma, Mingyu Derek and Wang, Fei and Xiao, Chaowei and Chen, Muhao}, booktitle = {Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2024}, }

2023

ACL Oral
Can NLI Provide Proper Indirect Supervision for Low-resource Biomedical Relation Extraction?

Jiashu Xu , Mingyu Derek Ma, and Muhao Chen

In Association for Computational Linguistics (ACL) , Jul 2023

Oral Abs arXiv Bib PDF Code

ACL 2023 Oral

Two key obstacles in biomedical relation extraction (RE) are the scarcity of annotations and the prevalence of instances without explicitly pre-defined labels due to low annotation coverage. Existing approaches, which treat biomedical RE as a multi-class classification task, often result in poor generalization in low-resource settings and do not have the ability to make selective prediction on unknown cases but give a guess from seen relations, hindering the applicability of those approaches. We present NBR, which converts biomedical RE as natural language inference formulation through indirect supervision. By converting relations to natural language hypotheses, NBR is capable of exploiting semantic cues to alleviate annotation scarcity. By incorporating a ranking-based loss that implicitly calibrates abstinent instances, NBR learns a clearer decision boundary and is instructed to abstain on uncertain instances. Extensive experiments on three widely-used biomedical RE benchmarks, namely ChemProt, DDI and GAD, verify the effectiveness of NBR in both full-set and low-resource regimes. Our analysis demonstrates that indirect supervision benefits biomedical RE even when a domain gap exists, and combining NLI knowledge with biomedical knowledge leads to the best performance gains.
@inproceedings{xu-etal-2023-nli, title = {Can {NLI} Provide Proper Indirect Supervision for Low-resource Biomedical Relation Extraction?}, author = {Xu, Jiashu and Ma, Mingyu Derek and Chen, Muhao}, booktitle = {Association for Computational Linguistics (ACL)}, month = jul, abbr2 = {Oral}, year = {2023}, doi = {10.18653/v1/2023.acl-long.138}, address = {Toronto, Canada}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.acl-long.138}, pages = {2450--2467}, }
Arxiv Extension
Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation

Yunhao Ge*, Jiashu Xu*, Brian Nlong Zhao, Neel Joshi, Laurent Itti, and Vibhav Vineet

arXiv preprint arXiv:2309.05956, Jul 2023

Abs arXiv Bib Code

We propose a new paradigm to automatically generate training data with accurate labels at scale using the text-to-image synthesis frameworks (e.g., DALL-E, Stable Diffusion, etc.). The proposed approach1 decouples training data generation into foreground object generation, and contextually coherent background generation. To generate foreground objects, we employ a straightforward textual template, incorporating the object class name as input prompts. This is fed into a text-to-image synthesis framework, producing various foreground images set against isolated backgrounds. A foreground-background segmentation algorithm is then used to generate foreground object masks. To generate context images, we begin by creating language descriptions of the context. This is achieved by applying an image captioning method to a small set of images representing the desired context. These textual descriptions are then transformed into a diverse array of context images via a text-to-image synthesis framework. Subsequently, we composite these with the foreground object masks produced in the initial step, utilizing a cut-and-paste method, to formulate the training data. We demonstrate the advantages of our approach on five object detection and segmentation datasets, including Pascal VOC and COCO. We found that detectors trained solely on synthetic data produced by our method achieve performance comparable to those trained on real data (Fig. 1). Moreover, a combination of real and synthetic data yields even much better results. Further analysis indicates that the synthetic data distribution complements the real data distribution effectively. Additionally, we emphasize the compositional nature of our data generation approach in out-of-distribution and zero-shot data generation scenarios. We open-source our code at https://github.com/gyhandy/Text2Image-for-Detection
@article{ge2023beyond, title = {Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation}, author = {Ge, Yunhao and Xu, Jiashu and Zhao, Brian Nlong and Joshi, Neel and Itti, Laurent and Vineet, Vibhav}, journal = {arXiv preprint arXiv:2309.05956}, abbr2 = {Extension}, year = {2023} }

2022

Arxiv
Dall-e for detection: Language-driven context image synthesis for object detection

Yunhao Ge*, Jiashu Xu*, Brian Nlong Zhao, Laurent Itti, and Vibhav Vineet

arXiv preprint, Jul 2022

Abs arXiv Bib Code

We propose a new paradigm to automatically generate training data with accurate labels at scale using the text-toimage synthesis frameworks (e.g., DALL-E, Stable Diffusion, etc.). The proposed approach decouples training data generation into foreground object mask generation and background (context) image generation. For foreground object mask generation, we use a simple textual template with object class name as input to DALL-E to generate a diverse set of foreground images. A foreground-background segmentation algorithm is then used to generate foreground object masks. Next, in order to generate context images, first a language description of the context is generated by applying an image captioning method on a small set of images representing the context. These language descriptions are then used to generate diverse sets of context images using the DALL-E framework. These are then composited with object masks generated in the first step to provide an augmented training set for a classifier. We demonstrate the advantages of our approach on four object detection datasets including on Pascal VOC and COCO object detection tasks. Furthermore, we also highlight the compositional nature of our data generation approach on out-of-distribution and zero-shot data generation scenarios.
@article{ge2022dall, title = {Dall-e for detection: Language-driven context image synthesis for object detection}, author = {Ge, Yunhao and Xu, Jiashu and Zhao, Brian Nlong and Itti, Laurent and Vineet, Vibhav}, journal = {arXiv preprint}, year = {2022}, }
ICMI
X-Norm: Exchanging Normalization Parameters for Bimodal Fusion

Yufeng Yin*, Jiashu Xu*, Tianxin Zu, and Mohammad Soleymani

In Proceedings of the 2022 International Conference on Multimodal Interaction (ICMI) , Jul 2022

Abs Bib PDF

Multimodal learning aims to process and relate information from different modalities to enhance the model’s capacity for perception. Current multimodal fusion mechanisms either do not align the feature spaces closely or are expensive for training and inference. In this paper, we present X-Norm, a novel, simple and efficient method for bimodal fusion that generates and exchanges limited but meaningful normalization parameters between the modalities implicitly aligning the feature spaces. We conduct extensive experiments on two tasks of emotion and action recognition with different architectures including Transformer-based and CNN-based models using IEMOCAP and MSP-IMPROV for emotion recognition and EPIC-KITCHENS for action recognition. The experimental results show that X-Norm achieves comparable or superior performance compared to the existing methods including early and late fusion, Gradient-Blending (G-Blend), Tensor Fusion Network, and Multimodal Transformer, with a relatively low training cost.
@inproceedings{yin2022x, title = {X-Norm: Exchanging Normalization Parameters for Bimodal Fusion}, author = {Yin, Yufeng and Xu, Jiashu and Zu, Tianxin and Soleymani, Mohammad}, booktitle = {Proceedings of the 2022 International Conference on Multimodal Interaction (ICMI)}, pages = {605--614}, year = {2022} }
ECCV
Neural-Sim: Learning to Generate Training Data with NeRF

Yunhao Ge, Harkirat Behl*, Jiashu Xu*, Suriya Gunasekar, Neel Joshi, Yale Song , Xin Wang, Laurent Itti, and Vibhav Vineet

In European Conference on Computer Vision (ECCV) , Jul 2022

Abs arXiv Bib PDF Code

Training computer vision models usually requires collecting and labeling vast amounts of imagery under a diverse set of scene configurations and properties. This process is incredibly time-consuming, and it is challenging to ensure that the captured data distribution maps well to the target domain of an application scenario. Recently, synthetic data has emerged as a way to address both of these issues. However, existing approaches either require human experts to manually tune each scene property or use automatic methods that provide little to no control; this requires rendering large amounts of random data variations, which is slow and is often suboptimal for the target domain. We present the first fully differentiable synthetic data pipeline that uses Neural Radiance Fields (NeRFs) in a closed-loop with a target application’s loss function. Our approach generates data on-demand, with no human labor, to maximize accuracy for a target task. We illustrate the effectiveness of our method on synthetic and real-world object detection tasks. We also introduce a new "YCB-in-the-Wild" dataset and benchmark that provides a test scenario for object detection with varied poses in real-world environments.
@inproceedings{ge2022neural, title = {Neural-Sim: Learning to Generate Training Data with NeRF}, author = {Ge, Yunhao and Behl, Harkirat and Xu, Jiashu and Gunasekar, Suriya and Joshi, Neel and Song, Yale and Wang, Xin and Itti, Laurent and Vineet, Vibhav}, booktitle = {European Conference on Computer Vision (ECCV)}, year = {2022} }
NAACL
Unified Semantic Typing with Meaningful Label Inference

James Y. Huang, Bangzheng Li*, Jiashu Xu*, and Muhao Chen

In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) , Jul 2022

Abs arXiv Bib PDF Code

Semantic typing aims at classifying tokens or spans of interest in a textual context into semantic categories such as relations, entity types, and event types. The inferred labels of semantic categories meaningfully interpret how machines understand components of text. In this paper, we present UniST, a unified framework for semantic typing that captures label semantics by projecting both inputs and labels into a joint semantic embedding space. To formulate different lexical and relational semantic typing tasks as a unified task, we incorporate task descriptions to be jointly encoded with the input, allowing UniST to be adapted to different tasks without introducing task-specific model components. UniST optimizes a margin ranking loss such that the semantic relatedness of the input and labels is reflected from their embedding similarity. Our experiments demonstrate that UniST achieves strong performance across three semantic typing tasks: entity typing, relation classification and event typing. Meanwhile, UniST effectively transfers semantic knowledge of labels and substantially improves generalizability on inferring rarely seen and unseen types. In addition, multiple semantic typing tasks can be jointly trained within the unified framework, leading to a single compact multi-tasking model that performs comparably to dedicated single-task models, while offering even better transferability.
@inproceedings{huang-etal-2022-unified, title = {Unified Semantic Typing with Meaningful Label Inference}, author = {Huang, James Y. and Li, Bangzheng and Xu, Jiashu and Chen, Muhao}, booktitle = {Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2022}, }
NPJ Digit Med
Dissection Gesture Sequence during Nerve Sparing Predicts Erectile Function Recovery after Robot-Assisted Radical Prostatectomy

Runzhuo Ma, Jiashu Xu, Ivan Rodriguez, Gina DeMeo, Aditya Desai, Loc Trinh, Jessica Nguyen, Anima Anandkumar, Jim Hu, and Andrew Hung

NPJ Digit Medicine, Jul 2022

Abs Bib HTML PDF

How well a surgery is performed impacts a patient’s outcomes; however, objective quantification of performance remains an unsolved challenge. Deconstructing a procedure into discrete instrument-tissue “gestures” is a emerging way to understand surgery. To establish this paradigm in a procedure where performance is the most important factor for patient outcomes, we identify 34,323 individual gestures performed in 80 nerve-sparing robot-assisted radical prostatectomies from two international medical centers. Gestures are classified into nine distinct dissection gestures (e.g., hot cut) and four supporting gestures (e.g., retraction). Our primary outcome is to identify factors impacting a patient’s 1-year erectile function (EF) recovery after radical prostatectomy. We find that less use of hot cut and more use of peel/push are statistically associated with better chance of 1-year EF recovery. Our results also show interactions between surgeon experience and gesture types—similar gesture selection resulted in different EF recovery rates dependent on surgeon experience. To further validate this framework, two teams independently constructe distinct machine learning models using gesture sequences vs. traditional clinical features to predict 1-year EF. In both models, gesture sequences are able to better predict 1-year EF (Team 1: AUC 0.77, 95% CI 0.73–0.81; Team 2: AUC 0.68, 95% CI 0.66–0.70) than traditional clinical features (Team 1: AUC 0.69, 95% CI 0.65–0.73; Team 2: AUC 0.65, 95% CI 0.62–0.68). Our results suggest that gestures provide a granular method to objectively indicate surgical performance and outcomes. Application of this methodology to other surgeries may lead to discoveries on methods to improve surgery.
@article{ma2021dss, title = {Dissection Gesture Sequence during Nerve Sparing Predicts Erectile Function Recovery after Robot-Assisted Radical Prostatectomy}, author = {Ma, Runzhuo and Xu, Jiashu and Rodriguez, Ivan and DeMeo, Gina and Desai, Aditya and Trinh, Loc and Nguyen, H., Jessica and Anandkumar, Anima and Hu, C., Jim and Hung, J., Andrew}, journal = {NPJ Digit Medicine}, year = {2022}, }
AUA
Dissection Assessment for Robotic Technique (DART) to Evaluate Nerve-Spare of Robot-Assisted Radical Prostatectomy

Runzhuo Ma, Alvin Hui, Jiashu Xu, Aditya Desai, Michael Tzeng, Emily Cheng, Loc Trinh, Jessica Nguyen, Anima Anandkumar, Jim Hu, and Andrew Hung

American Urological Association Annual Conference (AUA), Jul 2022

Abs Bib HTML PDF

High quality nerve-spare (NS) is essential for the preservation of erectile function (EF) after robot-assisted radical prostatectomy (RARP). In a previous study, we developed an assessment tool for tissue dissection, Dissection Assessment for Robotic Technique (DART). Herein, we further apply DART scores to the NS step and evaluate whether DART can predict 1-year EF recovery after RARP.
@article{ma2021dart, title = {Dissection Assessment for Robotic Technique (DART) to Evaluate Nerve-Spare of Robot-Assisted Radical Prostatectomy}, author = {Ma, Runzhuo and Hui, Alvin and Xu, Jiashu and Desai, Aditya and Tzeng, Michael and Cheng, Emily and Trinh, Loc and Nguyen, H., Jessica and Anandkumar, Anima and Hu, C., Jim and Hung, J., Andrew}, journal = {American Urological Association Annual Conference (AUA)}, year = {2022}, }

2021

NeurIPS
SalKG: Learning From Knowledge Graph Explanations for Commonsense Reasoning

Aaron Chan, Jiashu Xu, Boyuan Long, Soumya Sanyal, Tanishq Gupta, and Xiang Ren

Advances in Neural Information Processing Systems (NeurIPS), Jul 2021

Abs arXiv Bib PDF Code

Augmenting pre-trained language models with knowledge graphs (KGs) has achieved success on various commonsense reasoning tasks. However, for a given task instance, the KG, or certain parts of the KG, may not be useful. Although KG-augmented models often use attention to focus on specific KG components, the KG is still always used, and the attention mechanism is never explicitly taught which KG components should be used. Meanwhile, saliency methods can measure how much a KG feature (e.g., graph, node, path) influences the model to make the correct prediction, thus explaining which KG features are useful. This paper explores how saliency explanations can be used to improve KG-augmented models’ performance. First, we propose to create coarse (Is the KG useful?) and fine (Which nodes/paths in the KG are useful?) saliency explanations. Second, to motivate saliency-based supervision, we analyze oracle KG-augmented models which directly use saliency explanations as extra inputs for guiding their attention. Third, we propose SalKG, a framework for KG-augmented models to learn from coarse and/or fine saliency explanations. Given saliency explanations created from a task’s training set, SalKG jointly trains the model to predict the explanations, then solve the task by attending to KG features highlighted by the predicted explanations. On three commonsense QA benchmarks (CSQA, OBQA, CODAH) and a range of KG-augmented models, we show that SalKG can yield considerable performance gains – up to 2.76% absolute improvement on CSQA.
@article{chan2021salkg, title = {SalKG: Learning From Knowledge Graph Explanations for Commonsense Reasoning}, author = {Chan, Aaron and Xu, Jiashu and Long, Boyuan and Sanyal, Soumya and Gupta, Tanishq and Ren, Xiang}, journal = {Advances in Neural Information Processing Systems (NeurIPS)}, year = {2021}, }