Machine learning is the process of extracting knowledge from data and later using that knowledge to make predictions about new, unseen data.
The main aim of machine learning is to automate decision making from data, without developers manually specifying the rules of the decision-making process, and thereby progress towards Artificial General Intelligence. Manually specifying rules is exhausting, and developers cannot explicitly cover every scenario.
As part of the machine learning process, developers are given data, which is generally historical data, along with the fields for which predictions need to be made in the future. If the historical data is not clean, developers must first spend time cleaning it: deciding how to handle missing values (NAs), removing duplicate entries, and converting the data into a proper format for analysis and for use with a machine learning algorithm/model. Developers then spend time analyzing the data through exploratory data analysis: examining the relationships between the target and the other variables, and creating visualisations that explain those relationships.

Once the initial analysis is done, the data is generally divided into three sets (1. train set, 2. validation set, 3. test set). The developer then spends time finding the right algorithm/model, training it on the train set and testing its performance on the validation set in parallel. Once the algorithm/model performs well on both the train and validation sets, a final round of testing is performed on the test set. If it performs well on the test set too, the developer can be reasonably confident that the model generalizes well. If it does not perform well on the validation or test set, the developer generally modifies a few parameters of the algorithm/model and trains it again. If that does not work either, the developer generally resorts to designing another algorithm/model. The process is repeated, from model design onward, until an algorithm/model that generalizes well is found. Once such an algorithm is found, it is put into production for making predictions on future, unseen data.
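The train/validation/test workflow described above can be sketched with scikit-learn. The synthetic dataset, the Ridge model, and the 60/20/20 split below are illustrative assumptions, not prescriptions:

```python
# A minimal sketch of the train/validation/test workflow on synthetic data.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# First split off the test set, then split the remainder into train/validation.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=0)

model = Ridge(alpha=1.0)
model.fit(X_train, y_train)

# Compare candidate models using the validation score;
# touch the test set only once, at the very end.
val_score = r2_score(y_val, model.predict(X_val))
test_score = r2_score(y_test, model.predict(X_test))
```

If the validation score is poor, this is the point where one would tweak parameters (here, `alpha`) or swap in another model before ever looking at the test score.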
Machine learning tasks are generally divided into the below-mentioned categories:
Supervised Machine Learning : In supervised machine learning, developers are told which target variable they need to predict once the ML algorithm/model is successfully built. The learning task is supervised by telling the algorithm/model what to predict, hence the name supervised machine learning.
Regression : Refers to the type of supervised machine learning where the predicted output variable is continuous.
Examples of regression:
* Based on a person's age, education, and a few other personal attributes, predict his/her salary.
* Based on the attributes of an apartment, predict its expected sale price.
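The apartment example above can be sketched as a toy regression. The synthetic data and the price formula below are purely illustrative assumptions:

```python
# Toy regression: predict a continuous target (sale price) from numeric attributes.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
area = rng.uniform(30, 150, size=200)   # apartment area in square metres
rooms = rng.integers(1, 6, size=200)    # number of rooms
# Synthetic "true" price: linear in area and rooms, plus noise.
price = 2000 * area + 15000 * rooms + rng.normal(0, 5000, size=200)

X = np.column_stack([area, rooms])
model = LinearRegression().fit(X, price)

# Predict the price of a new, unseen apartment (80 m2, 3 rooms).
predicted = model.predict([[80, 3]])[0]
```

The predicted value is a real number on a continuous scale, which is what makes this regression rather than classification.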
Classification : Refers to the type of supervised machine learning where the predicted output variable is discrete/categorical (a nominal variable, not ordinal).
Examples of classification:
* Given a list of images of flowers, determine each flower's name.
* Given data about a few tumor patients, predict whether a future patient has a malignant tumor or not.
* Determine whether a comment on Quora is toxic or not.
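The tumor example above can be sketched with scikit-learn's bundled breast-cancer dataset, which has exactly two discrete classes (malignant/benign). The choice of logistic regression here is an illustrative assumption:

```python
# Toy classification: predict a discrete class (malignant vs. benign).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000)  # high max_iter so the solver converges
clf.fit(X_train, y_train)

# Fraction of held-out patients assigned the correct class label.
accuracy = clf.score(X_test, y_test)
```

Unlike regression, the output here is one of a fixed set of labels, so performance is measured by the fraction of correct labels rather than by distance from a continuous target.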
Unsupervised Machine Learning : In unsupervised machine learning, developers are not given any insight into the data (unlike supervised learning), and they need to design an algorithm/model that can find some insight/structure in the data without any prior knowledge about it.
Clustering : Refers to the type of unsupervised machine learning where the algorithm/model tries to divide the data into clusters, where each cluster contains data of the same type/characteristics.
Examples of clustering:
* Given a mixture of sound from different sources, separate the sounds from the different sources.
* Cluster a bunch of PDFs based on their contents.
* Cluster students based on their performance across various exams, to design a custom coaching plan per performance group.
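The student-performance example above can be sketched with k-means. The synthetic exam scores and the choice of three clusters are illustrative assumptions; note that no labels are given to the algorithm:

```python
# Toy clustering: group students by exam scores without any labels.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three synthetic performance groups: low, medium, and high scorers
# across two exams (30 students each).
scores = np.vstack([
    rng.normal(40, 5, size=(30, 2)),
    rng.normal(65, 5, size=(30, 2)),
    rng.normal(90, 5, size=(30, 2)),
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scores)
labels = kmeans.labels_  # cluster assignment discovered for each student
```

Each student ends up in one of three discovered groups, and a coaching plan could then be designed per group.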
Dimensionality Reduction : Refers to the type of unsupervised machine learning where the algorithm/model tries to represent the original data in a lower dimension, so it takes less space while still retaining the meaning/characteristics of the original data. Also referred to as data compression by some in the ML community.
Examples of dimensionality reduction:
* Data compression whose output is then used by other machine learning models.
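Dimensionality reduction can be sketched with PCA on scikit-learn's bundled digits dataset; the choice of PCA and of two components is an illustrative assumption:

```python
# Toy dimensionality reduction: compress 64-dimensional digit images to 2 features.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # 1797 samples, 64 features each
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)     # same samples, only 2 features each

# Fraction of the original variance retained by the 2 components.
retained = pca.explained_variance_ratio_.sum()
```

The compressed representation takes far less space, and it (or a version with more components) can be fed into downstream models in place of the original data.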