
Information Sciences | Vol. 601 | 2022-07-01 | Pages 71-89


Best-in-class imitation: Non-negative positive-unlabeled imitation learning from imperfect demonstrations

Quan Liu, Fei Zhu, Xinghong Ling, Lin Zhang
Abstract

Although imitation learning can learn an optimal policy from expert demonstrations, it may fail to transfer to practical environments because high-quality demonstrations are difficult to collect; as a result, the learned policy is often not accurate enough and converges slowly. To address this problem, an algorithm that uses Non-negative Positive-Unlabeled learning (nnPU) as a probabilistic classifier to evaluate the quality of demonstrations, referred to as Non-negative Positive-Unlabeled Importance Weighting Imitation Learning (PUIWIL), is proposed to increase the utilization of imperfect demonstrations and improve the performance of imitation learning. PUIWIL introduces confidence scores, computed by the nnPU classifier, for expert demonstrations; each score indicates the probability that the demonstration was generated by an optimal policy, and all expert demonstrations are reweighted according to their confidence scores. In addition, PUIWIL reconstructs the standard GAIL framework so that high-quality demonstrations have a greater impact on imitation learning, a scheme called Best-in-class Imitation. Experiments demonstrate that PUIWIL improves both the performance and robustness of imitation learning from imperfect demonstrations.
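
The abstract names two technical ingredients: an nnPU classifier that assigns each expert demonstration a confidence score (the probability that it came from an optimal policy), and a GAIL-style objective in which expert samples are reweighted by those scores. The sketch below, in PyTorch, illustrates both pieces under stated assumptions; it is not the authors' implementation, and all names (Scorer, nnpu_risk, weighted_discriminator_loss, the class-prior value) are illustrative. The PU risk follows the standard non-negative estimator of Kiryo et al. (2017).

```python
# Hedged sketch, not the PUIWIL implementation: (a) the non-negative PU risk
# used to train a quality classifier, (b) a confidence-weighted GAIL-style
# discriminator loss. All names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Scorer(nn.Module):
    """Small MLP g(s, a); used here both as the nnPU classifier and the discriminator."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def nnpu_risk(g, x_pos, x_unl, prior):
    """Non-negative PU risk (Kiryo et al., 2017) with the logistic loss
    l(z, y) = softplus(-y * z):
        pi_p * R_p^+ + max(0, R_u^- - pi_p * R_p^-)
    where x_pos are transitions assumed optimal and x_unl are unlabeled ones."""
    r_pos = F.softplus(-g(x_pos)).mean()      # positives scored with label +1
    r_pos_neg = F.softplus(g(x_pos)).mean()   # positives scored with label -1
    r_unl_neg = F.softplus(g(x_unl)).mean()   # unlabeled scored with label -1
    return prior * r_pos + torch.clamp(r_unl_neg - prior * r_pos_neg, min=0.0)

def weighted_discriminator_loss(d, expert_x, confidence, policy_x):
    """GAIL-style discriminator loss in which each expert sample's contribution
    is reweighted by its confidence score (probability of coming from an
    optimal policy), so high-quality demonstrations dominate the update."""
    w = confidence / (confidence.sum() + 1e-8)            # normalized weights
    expert_term = (w * F.binary_cross_entropy_with_logits(
        d(expert_x), torch.ones(len(expert_x)), reduction="none")).sum()
    policy_term = F.binary_cross_entropy_with_logits(
        d(policy_x), torch.zeros(len(policy_x)))
    return expert_term + policy_term

# Illustrative usage: train the nnPU classifier on a few labeled-optimal
# transitions plus unlabeled data, then reuse its (detached) sigmoid outputs
# as confidence weights for the discriminator update.
if __name__ == "__main__":
    dim = 8
    clf, disc = Scorer(dim), Scorer(dim)
    expert_x, policy_x = torch.randn(32, dim), torch.randn(32, dim)
    prior = 0.5                                           # assumed class prior pi_p
    pu_loss = nnpu_risk(clf, expert_x[:16], torch.cat([expert_x, policy_x]), prior)
    confidence = torch.sigmoid(clf(expert_x)).detach()
    d_loss = weighted_discriminator_loss(disc, expert_x, confidence, policy_x)
```

Normalizing the confidence weights keeps the expert term on the same scale as the policy term; the exact reweighting and the "Best-in-class Imitation" reconstruction of GAIL in PUIWIL may differ from this sketch.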


Cite this article
Quan Liu, Fei Zhu, Xinghong Ling, & Lin Zhang. (2022). Best-in-class imitation: Non-negative positive-unlabeled imitation learning from imperfect demonstrations. Information Sciences, 601, 71-89.
