Full record

Title: Efficiently Scheduling Remote and Local Resources in ML Data Input Pipeline
Author: Li, Muyu
Contributor: Klimovic, Ana
Identifier: info:doi:10.3929/ethz-b-000563201
Abstract: Machine learning model training is costly and time-consuming. Recent research shows that the bottleneck of the model training process often lies in the input data processing stages. tf.data.service, together with its extension Cachew, addresses this problem by disaggregating the input pipeline from the model and moving it to the cloud, which removes the input-pipeline bottleneck. However, this approach incurs extra cost from the cloud service and fails to fully utilize the compute resources of the host machine. In this thesis, we discuss two approaches to this problem, utilizing local workers and pipeline splitting, and propose a final policy integrating both of them to minimize the extra cost while keeping the pipeline fast enough. This policy is implemented on top of tf.data v2.8 and evaluated on different input pipelines, achieving 9% to 26% cost savings compared to Cachew's autoscaling policy.
Publisher: ETH Zurich, Department of Computer Science, Systems Group
Date: 2022
Format: application/pdf
Language: en
Rights: info:eu-repo/semantics/openAccess
Rights: http://rightsstatements.org/page/InC-NC/1.0/
Rights: In Copyright - Non-Commercial Use Permitted