Scanflow [WORK IN PROGRESS]

Machine Learning (ML) is more than just training models: the whole workflow must be considered. Once deployed, an ML model needs to be continuously supervised and debugged to guarantee its validity and robustness in unexpected situations. Debugging in ML aims to identify (and address) model weaknesses in non-trivial contexts such as classification bias, model decay, adversarial attacks, etc. Yet there is no generic framework that lets practitioners work in a collaborative, modular, portable, and iterative way and, more importantly, that is flexible enough to allow both human and machine tasks to coexist in a multi-agent environment.

Scanflow allows defining and deploying ML workflows in containers, tracking their metadata, checking their behavior in production, and improving the models by using both learned and human-provided knowledge.

Scanflow is a high-level library built on top of MLflow and Docker to manage and supervise workflows efficiently. Its main goals are usability, easy integration for deployment, and real-time checking.

Workflow + Agents

https://drive.google.com/uc?export=view&id=1lJxQ693Rjr7zYiy2MjDi07dug-uBcGed

Features

  • Ease of use and fast prototyping. Write once, use everywhere.
  • Portability and scalability (based on Docker containers).
  • Dynamic, nested, and parallel workflow execution.
  • Workflow tracking (e.g., logs, metrics, settings, results).
  • Workflow checking (e.g., distribution drift, data quality).
  • Orchestrator-agnostic.
  • Model version control.

Getting started

Define your working folder and the executors that compose your workflows.

import os

import scanflow
from scanflow.setup import Setup, Executor, Workflow
from scanflow.special import Tracker, Checker, Improver, Planner
from scanflow.deploy import Deploy

# App folder
base_path = os.path.dirname(os.getcwd())
app_dir = os.path.join(base_path, "examples/demo_leaf/")

# Executors: each one runs a script inside a container
gathering = Executor(name='gathering',
                     file='gathering.py',
                     dockerfile='Dockerfile_gathering')

preprocessing = Executor(name='preprocessing',
                         file='preprocessing.py',
                         requirements='req_preprocessing.txt')

executors = [gathering, preprocessing]
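
As the example above suggests, an Executor can be containerized in two ways: gathering is built from a custom Dockerfile, while preprocessing only declares a requirements file (presumably installed into a default base image). Use whichever fits each step.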

Append your workflows and set a tracker for each one. You can also set whether a workflow's executors run sequentially or in parallel (a sketch follows the next snippet).

# Workflows
workflow1 = Workflow(name='workflow1',
                     executors=executors,
                     tracker=Tracker(port=8001))
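
The sequential/parallel switch is not shown above. A minimal sketch of what it could look like, assuming Workflow accepts a flag for it (the parallel parameter name is an assumption for illustration, not confirmed API):

# Hypothetical: run this workflow's executors in parallel.
# The 'parallel' keyword argument is an assumed name.
workflow2 = Workflow(name='workflow2',
                     executors=executors,
                     tracker=Tracker(port=8002),
                     parallel=True)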

Set up your configuration, then build and start the containers and run each workflow.

setup = Setup(app_dir, workflows=[workflow1],
              verbose=False)

# Define a deployer to build, start and run the workflows
deployer = Deploy(app_dir, setup, verbose=False)

# Build the docker images
deployer.build_workflows()

# Start the containers
deployer.start_workflows()

# Run the user's code on the containers
deployer.run_workflows()
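
Each workflow's metadata (logs, metrics, settings, results) is logged to its Tracker. Since Scanflow is built on top of MLflow, the tracking UI should be reachable at the Tracker's port, e.g. http://localhost:8001 for the setup above (assuming a local run).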

All the containers are shown in the Scanflow UI. To start the server, run:

python cli.py server --server_port 8050
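
The dashboard should then be available at http://localhost:8050 (assuming a local run).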

Dashboard alpha

https://drive.google.com/uc?export=view&id=1ii7wyXqsDA-eiyA5pI3Y1yccWg-p4FFC

Tutorials

Please check the Jupyter notebooks for more examples:

tutorials/

Installation

  • Install Docker.
  • On Linux, allow managing Docker without sudo (see the note below): sudo usermod -aG docker <your-user>
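
Log out and back in (or run newgrp docker) for the group change to take effect, then verify that Docker works:

docker run hello-world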

Using conda

conda create -n scanflow python=3.6
source activate scanflow
git clone https://github.com/gusseppe/scanflow
cd scanflow
pip install -r requirements.txt

Using pip (not yet available)

pip install scanflow