<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:bibo="http://purl.org/ontology/bibo/"
         xmlns:fabio="http://purl.org/spar/fabio/"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:event="http://purl.org/NET/c4dm/event.owl#"
         xmlns:ore="http://www.openarchives.org/ore/terms/">

    <rdf:Description rdf:about="https://research-explorer.ista.ac.at/record/7479">
        <ore:isDescribedBy rdf:resource="https://research-explorer.ista.ac.at/record/7479"/>
        <dc:title>Distillation-based training for multi-exit architectures</dc:title>
        <bibo:authorList rdf:parseType="Collection">
            <foaf:Person>
                <foaf:name></foaf:name>
                <foaf:surname></foaf:surname>
                <foaf:givenname></foaf:givenname>
            </foaf:Person>
            <foaf:Person>
                <foaf:name></foaf:name>
                <foaf:surname></foaf:surname>
                <foaf:givenname></foaf:givenname>
            </foaf:Person>
        </bibo:authorList>
        <bibo:abstract>Multi-exit architectures, in which a stack of processing layers is interleaved with early output layers, allow the processing of a test example to stop early and thus save computation time and/or energy. In this work, we propose a new training procedure for multi-exit architectures based on the principle of knowledge distillation. The method encourages early exits to mimic later, more accurate exits, by matching their output probabilities.
Experiments on CIFAR100 and ImageNet show that distillation-based training significantly improves the accuracy of early exits while maintaining state-of-the-art accuracy for late ones. The method is particularly beneficial when training data is limited and it allows a straightforward extension to semi-supervised learning, i.e. making use of unlabeled data at training time. Moreover, it takes only a few lines to implement and incurs almost no computational overhead at training time, and none at all at test time.</bibo:abstract>
        <bibo:volume>2019-October</bibo:volume>
        <bibo:startPage>1355</bibo:startPage>
        <bibo:endPage>1364</bibo:endPage>
        <dc:publisher>IEEE</dc:publisher>
        <dc:format>application/pdf</dc:format>
        <ore:aggregates rdf:resource="https://research-explorer.ista.ac.at/download/7479/7480/main.pdf"/>
        <bibo:doi rdf:resource="10.1109/ICCV.2019.00144" />
        <ore:similarTo rdf:resource="info:doi/10.1109/ICCV.2019.00144"/>
    </rdf:Description>
</rdf:RDF>
