Plugin Development

Note: Plugins and Operators are the way to extend APAflow. Plugin pages only explain how to create plugins; they do not cover how to use APAflow or how to create machine learning models.

Plugins define operators (icons) that execute tasks. A task can be as simple as a process that shows the contents of a dataframe, or as complex as one that generates and evaluates a machine learning model. Typically, a plugin supports all related tasks that rely on a specific library. For example, the PyCaret plugin supports all operators (icons) that rely on the PyCaret library.

PyCaret Plugin

Plugins are open source and can be found on our website or on GitHub. For a plugin to be distributed with APAflow, you need to submit it by email for review. Plugins can also be private: you can create your own plugins and share the plugin definition file with your team. A plugin definition file is an XML file that describes the inputs, outputs, properties, and Python code of one or more operators.

Plugins Definition files are XML files

In this tutorial we will explain how to create a simple data reader for the data sources that ship with the PyCaret library.

Step 1: Library installation

If you are creating an operator that relies on a specific library that is not distributed with APAflow, you first need to install that library. Please note that this plugin does not include the installation code, since PyCaret is already included with APAflow.

The recommended way to install a library is to run the pip command inside the operator code. That way, if you distribute the plugin on your own, the operator will install the library itself. You can use the following instructions:

import sys
!{sys.executable} -m pip install [Your library name]==[library version]

For PyCaret it will look like:
import sys
!{sys.executable} -m pip install pycaret==2.3.0

As an alternative, you can use conda:
import sys
!conda install --yes --prefix {sys.prefix} [Your library name]=[library version]

APAflow uses the environment apaflow_env located in C:\[installation directory]\APAflow\miniconda3\envs\apaflow_env. If you know how to use miniconda environments, during development you can install the library directly in the environment used by APAflow.

Warning: Installing new libraries can break existing ones if there are conflicting dependencies. For example, suppose library A uses library B version 1.2, and you install library C, which requires library B version 1.3. Library A may break if it does not support library B version 1.3.

Tip: When installing libraries, always pin the version: new versions can break your code if features are removed from the library, or if calls to functions have been modified or deprecated.

Step 2: Understand the components of the code that you want to turn into an operator

Ideally, before you start creating a plugin and an operator, you should have a Jupyter notebook with code that executes the task you want to turn into an operator. Make sure the code runs without any errors.

Start from code that works

From that working code, you should identify the flow input (what the operator receives), the flow output (what the operator produces), and the operator properties (what the user can configure to set up the operator):
- For this data reader there is no input, since the code reads the data using the PyCaret library. Examples of inputs include dataframes and models.
- The user should be able to select the desired dataset from a list of all datasets available in PyCaret. So 'boston' (the dataset name) should be a property of the operator.
- The flow output is a dataframe. Thus pyc_dataset should be the output.
- Additionally, to make a better operator, we can add extra cells that print the types of the dataframe columns and its shape using code like:
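For instance, a minimal sketch of such an inspection cell, using a hypothetical pandas dataframe as a stand-in for the one PyCaret returns:

```python
import pandas as pd

# hypothetical stand-in for the dataframe loaded by PyCaret
pyc_dataset = pd.DataFrame({"crim": [0.006, 0.027], "medv": [24.0, 21.6]})

# print the column types and the shape of the dataframe
print(pyc_dataset.dtypes)
print(pyc_dataset.shape)
```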

Keep in mind that APAflow only displays the cell results; it does not control the format. You control the format by manipulating the code of the Python cell.
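For example, assuming a pandas dataframe, you can control the displayed precision from within the cell code:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "value": [1.234, 5.678]})

# the cell shows whatever the code prints; format floats to one decimal place
print(df.to_string(float_format=lambda x: f"{x:.1f}"))
```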

Step 3: Create the Plugin, the operator inputs, outputs, properties, and enumerations (if any)

In APAflow, go to the Plugin Tools menu and click New APAflow Plugin. Type a name for the plugin definition file. You should see an empty plugin.

Empty Plugin

Under the Operator List box, click the New APAflow Operator icon.

Type the following information:
- For Name, use PyCaret_LoadDataset. It is recommended to use the name format LibraryName_Task.
- For Type, select Source.
- For Category, select Data Ingestion.
- For Description, use "Loads dataset from PyCaret repository". This text will be displayed under the operator help in APAflow.

- You can select a 64×64 png image to be used as the operator icon. If you do not select an image, a default icon is used.
- Under Outputs, click on the icon and create a new output. For the name use pyc_dataset, and for the type use Table. Types on inputs and outputs define the connections that are allowed in and out of an operator. The Table type is used for dataframes.
There are many predefined types, or you can use any of the 40 generic types that are available. If you want two operators to connect, you need to make sure that they use the same types: the source operator's output type should be the same as the target operator's input type.

Metadata Suppliers: If your operator uses columns of the input dataframe as properties, you need to define a metadata supplier. Typically, this is used in operators that let users filter columns or select the target column.
The image below shows an operator used to join two tables. This operator requires the user to select columns from the input tables, so two metadata suppliers are used, one per input table.

The property for the operator you are creating should let the user select a dataset by its name, so you need to create a list of available datasets. This is done with Enumerations.

Click OK to close the operator definition. Then, at the bottom, look for the Enumeration List box and click the Create and Link New Enumerator icon. Type a name and, using the plus icon, add all the dataset names.

Click OK and go back to the operator definition. Double-click the operator name to open the operator definition.
Under Properties, click Create and Link New Property.
Type Dataset for the name, then type "Select a PyCaret dataset" under Description. This text will be used as the property help that is shown as a tooltip.
From Category, select Enumeration. Then, at the bottom, select the enumeration you just created.

Step 4: Create the Python code that supports the operator task

Under Operator Codes, click the Create and Link New Operator Code icon.
Type PyCaret_LoadDatasetCode for the name.

Copy the code from your Jupyter notebook into the Code field. The Code field is a template used by the code generation engine: it defines the names of the inputs and outputs and replaces variables with the user's selections.

In the Code field, add the apaflow_cell tag to identify each cell in your code (each becomes a Jupyter notebook cell when you export the flow as a Jupyter notebook). Add a title for the cell if needed.

Note on cell definitions: There are three ways to define a cell in APAflow:
- apaflow_cell title="Your title goes here" defines a cell with a title.
- apaflow_cell defines a cell without a title.
- apaflow_cell title="Your title goes here" interactive="true" defines a cell that has an interactive plot.
The image below shows the code for the Clustering operator. At the bottom, you can see the definition for an interactive PCA plot, followed by a regular Elbow plot.

Now we need to replace the inputs, outputs, and properties with variables. Select 'boston' and click Insert Property to replace the existing text with a property variable.

[block.Dataset/] means: get the dataset name from the user's selection (the dropdown menu in the operator property).

Select pyc_dataset and click Replace All Instances, then click Insert Output.
pyc_dataset is replaced with the output variable [pyc_dataset/].

Keep in mind that during code generation, an id identifying the operator is appended to the input and output variables. So [pyc_dataset/] may end up in the Jupyter notebook as pyc_dataset_id1 (id1 means operator 1). This convention is required to allow the user to use the same operator more than once in the same flow.
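Putting these pieces together, the Code field for this operator might look roughly like the sketch below. This is a hypothetical reconstruction from the steps above: the exact placement of the apaflow_cell tags may differ in your APAflow version, and the get_data call assumes PyCaret's dataset loader (pycaret.datasets.get_data).

```
apaflow_cell title="Load dataset"
from pycaret.datasets import get_data
[pyc_dataset/] = get_data('[block.Dataset/]')

apaflow_cell title="Inspect dataset"
print([pyc_dataset/].dtypes)
print([pyc_dataset/].shape)
```

During code generation, [block.Dataset/] is replaced with the dataset name the user picked, and [pyc_dataset/] with the id-suffixed output variable (for example, pyc_dataset_id1).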

Click OK. We are done creating our first plugin with one operator.

If you need to install your own library for your operator code to work, add the pip installation code from Step 1 as the first cell of your operator code. (We won't include that code in our plugin since PyCaret is already installed with APAflow.)

Step 5: Generate and install the plugin

A plugin definition file is just an XML file. To install the plugin, APAflow must first compile it. Plugin export is the process of reading the plugin definition file and generating a Java component (a jar file) that APAflow will use. Once you have the exported file, you can install it. To keep the process simple, the recommended way is to use the Export Plugin and Install option, which exports the plugin (creates a jar file) and then installs it (includes the jar in APAflow).

After you click the Export Plugin and Install option, you need to choose where the compiled plugin will be saved. The compilation then takes place and the installation window opens.
Select the plugin, click Next, and then click Finish.

APAflow will restart and the operators that are part of the plugin will be ready to be used.

Please note that APAflow does not allow you to install the current or a previous version of a plugin. You always need to install a newer version.

Step 6: Submit your plugin for distribution

If you want your plugin to be included with APAflow, you need to submit for review the plugin definition file, the related icons as 64×64 png images, and the library installation instructions (only if the library is not already used in APAflow). All plugins are open source and available on our site and on GitHub. Submitting a plugin means that you transfer ownership of the plugin to APAflow, and that the plugin will be open source and distributed for free with APAflow, on the APAflow website, and on GitHub. Please note that some plugins may not be included with APAflow if their libraries break existing plugins.

Click here to download the plugin created in this tutorial, along with other plugins, as a zip file. These are the plugins used in the current APAflow version. If you want to modify and install any of them, you may need to increase the version number so APAflow installs the plugin as an upgrade.

Please contact support at support at APAflow dot com if you have any questions, and send us your feedback so we can improve the way we create plugins!