Abstract
Healthcare information is usually collected and stored in the form of numbers, text, or images. These data capture important details about patients, such as their visits, symptoms, prescriptions, notes, and vital statistics. Because such records are voluminous and difficult to maintain or access manually, most health institutions keep them as Electronic Health Records (EHR) to reduce manual errors and redundancy. This dissertation applies text mining techniques to textual notes from a real-world EHR database (MIMIC-III) to identify the most effective vectorization technique for retrieving meaningful information. Machine learning models are then compared with a deep learning model, using the novel H2O framework and RapidMiner, to predict the ICD-9 code from the extracted data.