You will see 3 machine learning experiments in your Azure Machine Learning (AML) workspace:

  1. Opportunity Scoring is used for training/retraining. You can pass it new data, and the model will be retrained on your data.
  2. Opportunity scoring [predictive] makes predictions with the model trained by item 1. It is the scoring experiment.
  3. Ablation provides insight into why a certain score was given.
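The page does not say how the Ablation experiment computes its explanation. As a rough illustration of the general idea only, here is a generic leave-one-feature-out sketch: score a record, then re-score it with one feature at a time reset to a baseline value, and treat the drop in score as that feature's contribution. The function names, the linear `toy_score` scorer, the feature names and the all-zero baseline are hypothetical and are not taken from the experiment.

```python
from typing import Callable, Dict

Record = Dict[str, float]


def ablation_contributions(
    score: Callable[[Record], float], record: Record, baseline: Record
) -> Dict[str, float]:
    """Leave-one-feature-out: how much the score drops when a feature is reset to its baseline."""
    full_score = score(record)
    contributions = {}
    for name in record:
        ablated = dict(record)
        ablated[name] = baseline[name]  # "remove" this feature's information
        contributions[name] = full_score - score(ablated)
    return contributions


# Hypothetical linear scorer, used only to illustrate the idea.
def toy_score(r: Record) -> float:
    return 0.5 * r["revenue"] + 0.2 * r["visits"]


print(ablation_contributions(toy_score, {"revenue": 1.0, "visits": 0.5}, {"revenue": 0.0, "visits": 0.0}))
```

With a real model the scorer would be the trained model's predicted probability, and the choice of baseline (for example a median or a "missing" value) strongly affects the resulting contributions.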

Most of the feature engineering steps are common to these experiments.

Training Experiment

The training experiment copied into your workspace does the following.

It has 2 subtrees (paths), one starting from Training Data and one from TestData. The two subtrees are almost identical; the test subtree is used to evaluate the experiment on test data.

The training subtree is used to train the model. It has a web service input, so when you call the retraining web service, this portion of the tree is executed.

Training Path

  1. Mark string features as categorical.
  2. Mark float/numeric features as non-categorical.
  3. Clean Missing Data: replace missing string values with <!missing>.
  4. Clean Missing Data: replace missing float values with the median.
  5. Split the data into 2 branches:
    • Branch1
      • Normalize numeric data with MinMax.
      • Apply feature hashing to the categorical columns. The model uses a 12-bit hashing bitsize.
      • Exclude the label.
    • Branch2
      • Select the string columns and exclude the categorical ones, leaving only the non-categorical string columns.
      • Apply feature hashing to them. The model uses a 10-bit hashing bitsize.
  6. Combine the 2 branches.
  7. Train the model with the Two-Class Boosted Decision Tree algorithm.
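Steps 3 to 6 can be sketched in plain Python to show what each transformation does to a row. This is a minimal sketch under stated assumptions, not the AML modules themselves: the column names and function names are hypothetical, `hashlib.md5` is a stand-in hash (AML's Feature Hashing module uses its own hash function and tokenization), categorical values are hashed as hypothetical `column=value` tokens, and free-text columns are split on whitespace. Steps 1 and 2 correspond to the caller deciding which columns go into `categorical_cols`, `text_cols` and `numeric_cols`.

```python
import hashlib
from statistics import median

MISSING = "<!missing>"  # replacement value for missing strings (step 3)


def clean_missing(rows, string_cols, numeric_cols):
    """Steps 3-4: replace missing strings with <!missing> and missing numbers with the column median."""
    medians = {c: median(r[c] for r in rows if r[c] is not None) for c in numeric_cols}
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not modified
        for c in string_cols:
            if row[c] is None:
                row[c] = MISSING
        for c in numeric_cols:
            if row[c] is None:
                row[c] = medians[c]
        cleaned.append(row)
    return cleaned


def hash_tokens(tokens, bits):
    """Feature hashing: count each token into one of 2**bits buckets (md5 is a stand-in hash)."""
    vector = [0] * (2 ** bits)
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vector[int.from_bytes(digest[:4], "little") % len(vector)] += 1
    return vector


def featurize(rows, categorical_cols, text_cols, numeric_cols):
    """Steps 5-6: build Branch1 and Branch2 features and concatenate them per row.

    The label is excluded simply by not listing it in any column group.
    """
    bounds = {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in numeric_cols}
    features = []
    for row in rows:
        # Branch1: MinMax-normalized numeric columns, scaled to [0, 1] ...
        numeric = [
            (row[c] - lo) / (hi - lo) if hi > lo else 0.0
            for c, (lo, hi) in bounds.items()
        ]
        # ... plus categorical columns hashed with a 12-bit size (4096 buckets).
        categorical = hash_tokens([f"{c}={row[c]}" for c in categorical_cols], bits=12)
        # Branch2: non-categorical string columns, split into lowercase words
        # and hashed with a 10-bit size (1024 buckets).
        words = [w for c in text_cols for w in row[c].lower().split()]
        text = hash_tokens(words, bits=10)
        features.append(numeric + categorical + text)  # step 6: combine branches
    return features
```

For example, with one numeric, one categorical and one free-text column, each row becomes a vector of 1 + 4096 + 1024 values. The bit sizes mainly trade memory against hash collisions: a 12-bit size gives 4096 buckets, a 10-bit size gives 1024.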

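Step 7 trains a Two-Class Boosted Decision Tree, in which each new tree corrects the errors of the trees before it. To show only that core idea, here is a toy gradient-boosting sketch on log-loss that uses one-split trees (stumps) and plain mean-residual leaf values. It is not AML's implementation: the module grows deeper trees and has its own settings, and the function names and the defaults `n_trees=20` and `learning_rate=0.3` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Fit a one-split tree (stump) to the residuals by least squares."""
    mean = sum(residuals) / len(residuals)
    best = (math.inf, 0, math.inf, mean, mean)  # fallback: no split, predict the mean
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]


def train_boosted_stumps(X, y, n_trees=20, learning_rate=0.3):
    """Gradient boosting on log-loss: each stump fits the current residuals y - p."""
    stumps = []
    margins = [0.0] * len(X)
    for _ in range(n_trees):
        residuals = [yi - sigmoid(m) for yi, m in zip(y, margins)]
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        margins = [m + learning_rate * (lv if row[j] <= t else rv) for m, row in zip(margins, X)]
    return {"stumps": stumps, "learning_rate": learning_rate}


def predict_proba(model, row):
    """Probability of the positive class for one feature vector."""
    z = sum(lv if row[j] <= t else rv for j, t, lv, rv in model["stumps"])
    return sigmoid(model["learning_rate"] * z)
```

The residual `y - p` is the negative gradient of log-loss with respect to the model's margin, which is why fitting trees to it and adding them with a small learning rate gradually pushes the predicted probabilities toward the labels.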
Test Path

  1. Score the results. This step is used when you evaluate the experiment in AML.
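When the experiment is evaluated, the scores produced for the test data are compared with the true labels. As a rough stdlib sketch of that comparison for a two-class model, the code below turns scored probabilities into labels and computes accuracy, precision, recall and AUC. The function names, the 0.5 threshold and the choice of metrics are assumptions for illustration; it also assumes both classes occur in the test labels.

```python
def score_labels(probabilities, threshold=0.5):
    """Turn scored probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probabilities]


def evaluate(labels, probabilities, threshold=0.5):
    """Accuracy, precision, recall and AUC for a two-class scorer."""
    predicted = score_labels(probabilities, threshold)
    pairs = list(zip(labels, predicted))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    accuracy = sum(1 for y, p in pairs if y == p) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # AUC = chance that a random positive is scored above a random negative (ties count 1/2).
    pos = [s for y, s in zip(labels, probabilities) if y == 1]
    neg = [s for y, s in zip(labels, probabilities) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "auc": auc}
```

Accuracy, precision and recall depend on the threshold, while AUC only depends on how the scores rank positives against negatives, which is why it is a common headline metric for a scoring model.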

Web Service Output - Model

If you call the retraining web service, its output is the trained model.

 

Last edited Oct 19, 2016 at 7:47 PM by prashdesh, version 3