
    Performance improvement and reporting techniques using SparklyR and H2O.ai

    View/Open
    msc_pilli_hj_2020.pdf (1.387Mb)
    Author
    Pilli, Happy Justin
    Date
    2020
    Degree
    MSc in Data Analytics
    URI
    https://esource.dbs.ie/handle/10788/4227
    Publisher
    Dublin Business School
    Rights holder
    http://esource.dbs.ie/copyright
    Rights
    Items in eSource are protected by copyright. Previously published items are made available in accordance with the copyright policy of the publisher/copyright holder.
    Abstract
    The aim of this study was to analyse ways of reducing cost and improving performance in machine learning by integrating driverless AI such as H2O with Spark in R, and to generate reports from the results. The research pits the regression models LM, GBM, XGBoost and Random Forest against one another and focuses on identifying the best-performing model in terms of RMSE, execution time and hardware cost. The datasets contained 29 variables and more than 65,000 observations, of which the variables Origin, Dest, UniqueCarrier, FlightNum, Month, DayOfWeek, DayofMonth, Distance, DepDelay, ArrDelay, AirTime, Cancelled, hour and gain were considered. The analyses showed that GBM was the best-performing model at optimal cost, followed by XGBoost, Random Forest and LM. In conclusion, the study showed that integrating H2O with Spark in R makes machine learning cost-effective, and that professional reports can be generated with feedback and test results from Shiny.
    Collections
    • Information & Communications Technology
