In this quickstart guide you will learn about APAflow and create your first machine learning model.
Installation and Setup
Minimum Requirements: 64-bit Windows, quad-core 3 GHz or faster processor, 16 GB RAM, 10 GB free disk space.
The first time you run APAflow, Miniconda, H2O, and PyCaret are downloaded and installed. Installation takes 5 to 10 minutes, depending on your internet speed.
Data Science Workbench
The APAflow user interface is a complete visual workbench for designing Data Science projects. It has the following components:
Project Explorer: Used to create and manage your Data Science and machine learning projects. Projects are directories located in the APAflow\workspace directory. To start your first machine learning project, right-click in the Project Explorer and select New (or open the File menu and select New).
Flow: The Flow window is where you create your machine learning models visually. You build Data Science Workflows (Flows) in these steps:
- Insert Operators from the Palette window.
- Connect the Operators using the Connection tool.
- Set up the properties of each Operator.
- Run the Operators and check the Result window.
The buttons in the Flow toolbar control the Flow execution.
Palette: The Palette is the repository of all the operators (icons). An operator is a component that executes a predefined task. Load Dataset, Data Exploration, Classification Predictor, and Save Model are examples of operators. To use an operator, simply drag and drop it from the Palette window into the Flow window. You can create your own operators using the APAflow Plugin Builder for Data Science.
Operators can have different states: Requires Input Data, Not Configured, Ready to Run, and Executed.
Operators can only be connected if the source Operator's output matches the target Operator's input. For example, the Data Exploration operator expects a table as input, so it can be connected to any Operator that provides a table as output.
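Conceptually, this type matching amounts to a simple check. The sketch below is purely illustrative (the Operator class and can_connect function are hypothetical names, not APAflow's actual internal API):

```python
# Hypothetical sketch of operator connection rules; names are
# illustrative and do not reflect APAflow's real internals.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Operator:
    name: str
    input_type: Optional[str]   # what the operator consumes (None = source operator)
    output_type: Optional[str]  # what the operator produces

def can_connect(source: Operator, target: Operator) -> bool:
    # A connection is valid when the source's output matches the target's input.
    return source.output_type is not None and source.output_type == target.input_type

load_dataset = Operator("Load Dataset", input_type=None, output_type="table")
data_exploration = Operator("Data Exploration", input_type="table", output_type="table")

print(can_connect(load_dataset, data_exploration))   # table -> table: valid
print(can_connect(data_exploration, load_dataset))   # Load Dataset takes no input: invalid
```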
Properties: In the Properties window you can inspect an operator's properties along with the Python code that the operator executes.
Console: All steps that take place during the execution of the operators are printed in the Console. You can also use it to monitor the active Python Kernels. (Under the hood, APAflow sends commands to a Python Kernel that executes Python code and returns the results.)
Outline: Shows the Data Science workflow
Build your first Classification Model
You will use the Default of Credit Card Clients Dataset from UCI to create a machine learning classification model to predict default. The target column is default (1=yes, 0=no). This example uses the PyCaret operators.
Step 1: Drag and drop the PyCaret Load Dataset operator into the Flow window. Double-click the operator to select the credit dataset. The Load Dataset operator reads datasets that are available in the PyCaret library. Run the operator using the options in the toolbar or the right-click menu.
Step 2: After the Load Dataset operator has executed, you can see the results via the right-click menu by selecting Results (or using the icons in the toolbar). You can also explore the Flow Outputs, the information that will be passed to the next operator once connected.
Step 3: Drag and drop the Data Exploration Column operator into the Flow. Then use the Connect tool to connect the Load Dataset operator with the Data Exploration Column operator. The Connect tool is available in the toolbar, in the Palette, or under Connect.
Step 4: Double-click the Data Exploration Column operator and set "default" as the Column property (the target for this classification model is the column named default). Then run the operator using the right-click menu and selecting Run and Show Results.
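Exploring the target column in this step essentially means inspecting its class distribution. A rough pandas equivalent, using a tiny made-up sample rather than the real credit data:

```python
import pandas as pd

# Tiny made-up sample standing in for the credit dataset's "default" column.
df = pd.DataFrame({"default": [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]})

# Class counts and proportions for the target column (1 = default, 0 = no default).
counts = df["default"].value_counts()
ratios = df["default"].value_counts(normalize=True)
print(counts)
print(ratios)
```

Checking the class balance like this tells you up front whether accuracy alone is a meaningful metric for the model you are about to build.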
Step 5: Drag and drop the PyCaret Setup Classification operator into the Flow. Connect it to the Load Dataset operator and set default as the target column. The Setup Classification operator performs data preparation and feature engineering automatically.
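"Data preparation" here covers standard steps such as handling missing values and scaling features. A hedged scikit-learn sketch of two such steps on toy data (PyCaret's setup does considerably more, e.g. train/test splitting and categorical encoding):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy numeric features with a missing value, standing in for the credit data.
X = np.array([[25.0, 50000.0],
              [40.0, np.nan],
              [35.0, 80000.0]])

# Impute missing values, then standardize -- two common "setup" steps.
prep = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
X_prepared = prep.fit_transform(X)
print(X_prepared.shape)
```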
Step 6: Drag and drop the PyCaret Classification operator into the Flow. Connect it to the PyCaret Setup Classification operator and set it up to create a Random Forest Classifier. Run it and explore the Results. The PyCaret Classification operator creates a classification model and provides key information such as the AUC plot, Precision-Recall curve, Confusion Matrix, and Feature Importance plot, along with predictions and metrics on the test/hold-out sample.
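At its core, this step fits a random forest and scores it on a hold-out set. A minimal scikit-learn sketch using synthetic data (not the credit dataset; PyCaret wraps this kind of workflow for you):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for the credit dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Hold-out metrics, analogous to what the operator's Results view reports.
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
cm = confusion_matrix(y_test, clf.predict(X_test))
print(f"AUC: {auc:.3f}")
print(cm)
```

This is the same kind of code you will see when you export the Flow as a Jupyter Notebook, just expressed through PyCaret's higher-level API there.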
Step 7: APAflow Designer is not a black box. At any time you can export your Flow's code as a Jupyter Notebook. Click the Options menu, then select Export as Jupyter Notebook. You can use the underlying Python code to rerun the machine learning model in Jupyter or to take the ML model into production.
Step 8: Create documentation automatically that explains the parts of the Data Science Workflow. Click the Options menu, then select Generate Documentation. A new Word document is created that describes the Flow, its operators, the data used, the underlying code, and the results.
Known issues and bugs
Sometimes Python Kernels stop responding. To fix this, restart the Kernel and rerun the Flow.