Configuration

pydolphinscheduler has a built-in module that sets the configuration necessary to start and run your workflow code. You can use it directly if you only want a quick start or a simple job such as a POC. But if you want to use pydolphinscheduler in depth, or even in production, you will probably need to modify the built-in configuration.

We have two ways to modify the configuration:

  • Using Environment Variables: The more lightweight way to modify the configuration. It is useful in containerization scenarios, like Docker and k8s, or when you want to temporarily override values from the configuration file.

  • Using Configuration File: The more general way to modify the configuration. It is useful when you want to persist and manage the configuration in a single file.

Using Environment Variables

You can change the configuration by adding or modifying the operating system’s environment variables. Any method works, as long as it successfully modifies the environment variables. We use two common ways, Bash and the Python os module, as examples:

By Bash

Setting environment variables via Bash is the most straightforward way. Here are some examples of how to change them in Bash.

# Modify Java Gateway Address
export PYDS_JAVA_GATEWAY_ADDRESS="192.168.1.1"

# Modify Workflow Default User
export PYDS_WORKFLOW_USER="custom-user"

After executing the commands above, both PYDS_JAVA_GATEWAY_ADDRESS and PYDS_WORKFLOW_USER are set for the current shell session and the processes started from it. The next time you execute and submit your workflow from that session, it is submitted to host 192.168.1.1 with the workflow’s user named custom-user.

By Python OS Module

pydolphinscheduler is a Python API for Apache DolphinScheduler, so you can also modify or add environment variables via the Python os module. In this example, we set the variables to the same values as in the Bash example. The change takes effect the next time you run your workflow, provided you call the workflow's run or submit method after the os.environ statements.

import os
# Modify Java Gateway Address
os.environ["PYDS_JAVA_GATEWAY_ADDRESS"] = "192.168.1.1"

# Modify Workflow Default User
os.environ["PYDS_WORKFLOW_USER"] = "custom-user"
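When the override should only last for one run, the assignments can be wrapped so that the previous values are restored afterwards. The following is a minimal sketch using only the standard library; temporary_env is a hypothetical helper, not part of pydolphinscheduler. It assumes the variables are read when the workflow is run or submitted, as described above. If your version reads them at import time, the import has to happen inside the block as well.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(**overrides):
    """Set environment variables for the duration of a ``with`` block."""
    # Remember the previous values (None means the variable was unset).
    previous = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        # Restore the environment exactly as it was before the block.
        for name, value in previous.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value


with temporary_env(
    PYDS_JAVA_GATEWAY_ADDRESS="192.168.1.1",
    PYDS_WORKFLOW_USER="custom-user",
):
    # Call your workflow's run or submit method here.
    print(os.environ["PYDS_WORKFLOW_USER"])
```

Outside the with block, both variables return to whatever they were before, so other workflows in the same process are not affected.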

All Configurations in Environment Variables

All environment variables are listed below; you can modify their values via Bash or the Python os module.

| Variable Section | Variable Name | Description |
| --- | --- | --- |
| Java Gateway | PYDS_JAVA_GATEWAY_AUTH_TOKEN | Default Java gateway auth token. It should be changed to a custom value when deploying on a public network or in production. |
| Java Gateway | PYDS_JAVA_GATEWAY_ADDRESS | Default Java gateway address; its value is used when it is set. |
| Java Gateway | PYDS_JAVA_GATEWAY_PORT | Default Java gateway port; its value is used when it is set. |
| Java Gateway | PYDS_JAVA_GATEWAY_AUTO_CONVERT | Default boolean for Java gateway auto convert; its value is used when it is set. |
| Default User | PYDS_USER_NAME | Default user name, used when the user’s name is not specified. |
| Default User | PYDS_USER_PASSWORD | Default user password, used when the user’s password is not specified. |
| Default User | PYDS_USER_EMAIL | Default user email, used when the user’s email is not specified. |
| Default User | PYDS_USER_PHONE | Default user phone, used when the user’s phone is not specified. |
| Default User | PYDS_USER_STATE | Default user state, used when the user’s state is not specified. |
| Default Workflow | PYDS_WORKFLOW_PROJECT | Default workflow project name, used when the workflow does not specify the attribute project. |
| Default Workflow | PYDS_WORKFLOW_USER | Default workflow user, used when the workflow does not specify the attribute user. |
| Default Workflow | PYDS_WORKFLOW_QUEUE | Default workflow queue, used when the workflow does not specify the attribute queue. |
| Default Workflow | PYDS_WORKFLOW_WORKER_GROUP | Default workflow worker group, used when the workflow does not specify the attribute worker_group. |
| Default Workflow | PYDS_WORKFLOW_RELEASE_STATE | Default workflow release state, used when the workflow does not specify the attribute release_state. |
| Default Workflow | PYDS_WORKFLOW_TIME_ZONE | Default workflow time zone, used when the workflow does not specify the attribute timezone. |
| Default Workflow | PYDS_WORKFLOW_WARNING_TYPE | Default workflow warning type, used when the workflow does not specify the attribute warning_type. |
| Default Workflow | PYDS_WORKFLOW_EXECUTION_TYPE | Default workflow execution type, used when the workflow does not specify the attribute execution_type. |

Note

Configuration set via environment variables only applies to the workflow being run; it does not change the values in the configuration file. The CLI commands config --get and config --set operate on the values in the configuration file, so config --get may return a different value from what you set in the environment variable, and neither command ever changes your environment variables.
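Because config --get reports the file and not the environment, it can help to inspect the environment directly. The sketch below lists the PYDS_-prefixed variables visible to the current Python process; pyds_environment is a hypothetical helper written for this illustration, and the masking rule for secrets is an assumption based on the variable names in the table above.

```python
import os


def pyds_environment(environ=None):
    """Return the PYDS_-prefixed variables from an environment mapping."""
    environ = os.environ if environ is None else environ
    return {
        name: value
        for name, value in sorted(environ.items())
        if name.startswith("PYDS_")
    }


for name, value in pyds_environment().items():
    # Avoid printing secrets such as the gateway auth token or user password.
    secret = name.endswith("_TOKEN") or name.endswith("_PASSWORD")
    print(f"{name}={'***' if secret else value}")
```

An empty result means no override is active in the environment of this process.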

Using Configuration File

If you want to persist and manage configuration in a file instead of environment variables, or you want to save your configuration file to a version control system such as Git or SVN, changing the configuration by file is the best choice.

Export Configuration File

pydolphinscheduler allows you to change the built-in configuration via the CLI or any editor you like. The built-in configuration ships inside the package, but you can also export it locally with the CLI:

pydolphinscheduler config --init

This creates a new YAML file at ~/pydolphinscheduler/config.yaml by default. If you want to export it to another path, set PYDS_HOME before you run the command pydolphinscheduler config --init.

export PYDS_HOME=<CUSTOM_PATH>
pydolphinscheduler config --init

After that, your configuration file is exported to <CUSTOM_PATH>/config.yaml instead of the default path.
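To check from Python where the exported file is expected to be, the rule above can be sketched as follows. config_file_path is a hypothetical helper, and the way it derives the path (PYDS_HOME replacing the default ~/pydolphinscheduler directory) is an assumption taken from the description above, not from the library's source.

```python
import os
from pathlib import Path


def config_file_path(environ=None):
    """Return the expected config.yaml location for a given environment."""
    environ = os.environ if environ is None else environ
    # Assumption: PYDS_HOME replaces the default ~/pydolphinscheduler directory.
    home = environ.get("PYDS_HOME", "~/pydolphinscheduler")
    return Path(home).expanduser() / "config.yaml"


path = config_file_path()
print(path, "exists" if path.exists() else "does not exist yet")
```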

Change Configuration

In the section Export Configuration File you exported the configuration file locally; as a local file, you can edit it with any editor you like. After you save your changes, the latest configuration is used when you run your workflow code.

You can also query or change the configuration via the CLI with config --get <config> or config --set <config> <val>. Both --get and --set can be given one or more times in a single command. You can only set a leaf node of the configuration, but you can get a parent configuration. Here are some simple examples:

# Get a single configuration in a leaf node.
# The output looks like this:
# java_gateway.address = 127.0.0.1
pydolphinscheduler config --get java_gateway.address

# Get multiple configurations in leaf nodes.
# The output looks like this:
# java_gateway.address = 127.0.0.1
# java_gateway.port = 25333
pydolphinscheduler config --get java_gateway.address --get java_gateway.port

# Get a parent configuration which contains multiple leaf nodes.
# The output looks like this:
# java_gateway = ordereddict([('address', '127.0.0.1'), ('port', 25333), ('auto_convert', True)])
pydolphinscheduler config --get java_gateway

# Set a single configuration.
# The output looks like this:
# Set configuration done.
pydolphinscheduler config --set java_gateway.address 192.168.1.1

# Set multiple configurations.
# The output looks like this:
# Set configuration done.
pydolphinscheduler config --set java_gateway.address 192.168.1.1 --set java_gateway.port 25334

# Setting a configuration that is not a leaf node fails.
# The output looks like this:
# Raise error.
pydolphinscheduler config --set java_gateway 192.168.1.1,25334,True

For more information about our CLI, see the Command Line Interface document.
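If you drive the CLI from a script, the repeated --get and --set options can be assembled programmatically. This is a small sketch; build_config_command is a hypothetical helper, and it only builds the argument list using the options shown in the examples above. Whether --get and --set may be mixed in one invocation is not stated here, so the sketch is best used with one kind at a time.

```python
import shlex


def build_config_command(get=(), set_values=None):
    """Build a ``pydolphinscheduler config`` command line as an argument list."""
    args = ["pydolphinscheduler", "config"]
    for key in get:
        # Each key to query gets its own --get option.
        args += ["--get", key]
    for key, value in (set_values or {}).items():
        # Each leaf key to change gets its own --set option with its value.
        args += ["--set", key, str(value)]
    return args


command = build_config_command(
    set_values={"java_gateway.address": "192.168.1.1", "java_gateway.port": 25334}
)
print(shlex.join(command))
# To execute it, pass the list to subprocess.run(command, check=True).
```

Passing a list to subprocess.run avoids shell quoting issues when a value contains spaces or special characters.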

All Configurations in File

Here are all our configurations for pydolphinscheduler.

# Settings for the Java gateway server
java_gateway:
  # Authentication token for connections from the Python API to the Python gateway server. The default value
  # should be changed when you deploy in a public network.
  auth_token: jwUDzpLsNKEFER4*a8gruBH_GsAurNxU7A@Xc

  # The address the Python gateway server starts on. Set its value to `0.0.0.0` if your Python API runs on a
  # different host from the Python gateway server. It could also be a specific address like `127.0.0.1` or `localhost`
  address: 127.0.0.1

  # The port the Python gateway server starts on. Defines which port you can connect to the Python gateway
  # server on from the Python API models.
  port: 25333

  # Whether to automatically convert Python objects to Java objects. Default value is ``True``. There is some
  # performance loss when set to ``True``, but for now pydolphinscheduler does not handle the conversion between
  # Java and Python itself; this is marked as a TODO item for the future.
  auto_convert: true

# Settings for dolphinscheduler default values. The values set below are used if a property is not set,
# including ``user`` and ``workflow``
default:
  # Default value for dolphinscheduler's user object
  user:
    name: userPythonGateway
    password: userPythonGateway
    email: userPythonGateway@dolphinscheduler.com
    tenant: tenant_pydolphin
    phone: 11111111111
    state: 1
  # Default value for dolphinscheduler's workflow object
  workflow:
    project: project-pydolphin
    user: userPythonGateway
    queue: queuePythonGateway
    worker_group: default
    # Release state of the workflow. The default value ``online`` means the workflow is set online when it is
    # submitted to the Java gateway; if you want to set the workflow offline, set its value to ``offline``
    release_state: online
    time_zone: Asia/Shanghai
    # Warning type of the workflow. The default value ``NONE`` means do not warn users for any workflow state;
    # change to ``FAILURE`` if you want to warn users when the workflow fails. All available enum values are
    # ``NONE``, ``SUCCESS``, ``FAILURE``, ``ALL``
    warning_type: NONE
    # Default execution type for running multiple workflow instances. The default value ``parallel`` means
    # run all workflow instances in parallel; the other values are ``SERIAL_WAIT``, ``SERIAL_DISCARD``, ``SERIAL_PRIORITY``
    execution_type: parallel
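The dotted names accepted by config --get and config --set follow the nesting of this file. The sketch below illustrates that mapping on a plain Python dict holding an excerpt of the values above, including the rule that only leaf nodes can be set. get_by_path and set_leaf are hypothetical helpers written for illustration; they are not the functions the CLI uses internally.

```python
def get_by_path(config, dotted_key):
    """Return the value at a dotted key; a parent key returns its whole subtree."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def set_leaf(config, dotted_key, value):
    """Set the value at a dotted key, refusing to overwrite a parent node."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node[part]
    if isinstance(node[leaf], dict):
        raise ValueError(f"{dotted_key} is not a leaf node")
    node[leaf] = value


# An excerpt of the structure shown above, written as a Python dict.
config = {
    "java_gateway": {"address": "127.0.0.1", "port": 25333, "auto_convert": True},
    "default": {"workflow": {"user": "userPythonGateway"}},
}

print(get_by_path(config, "java_gateway.address"))  # a leaf value
print(get_by_path(config, "java_gateway"))          # a parent returns its subtree
set_leaf(config, "java_gateway.address", "192.168.1.1")
```

Calling set_leaf(config, "java_gateway", ...) raises ValueError, mirroring the failing --set example on a parent node.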

Priority

We have two ways to modify the configuration, and pydolphinscheduler also has a built-in configuration. It is important to understand the priority of these sources when you use them. The configuration priority is:

Environment Variables > Configurations File > Built-in Configurations

This means that your settings in environment variables or the configuration file override the built-in ones. You can also temporarily modify a configuration by setting an environment variable, without modifying the global configuration in the configuration file.
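The lookup order can be sketched as a three-step fallback. This is an illustration of the priority rule only, not pydolphinscheduler's implementation: env_var_name and resolve are hypothetical helpers, the flat dicts stand in for the file and built-in configurations, and the address 10.0.0.5 is made up. The name mapping (drop a leading "default.", join with underscores, add the PYDS_ prefix) is inferred from the variable names and file keys listed earlier.

```python
import os


def env_var_name(dotted_key):
    """Map a dotted config key to its PYDS_ environment variable name."""
    # Inferred rule: default.workflow.user -> PYDS_WORKFLOW_USER.
    parts = dotted_key.split(".")
    if parts[0] == "default":
        parts = parts[1:]
    return "PYDS_" + "_".join(parts).upper()


def resolve(dotted_key, file_config, builtin_config, environ=None):
    """Resolve a key as: environment variable > configuration file > built-in."""
    environ = os.environ if environ is None else environ
    name = env_var_name(dotted_key)
    if name in environ:
        return environ[name]  # note: environment values are always strings
    if dotted_key in file_config:
        return file_config[dotted_key]
    return builtin_config[dotted_key]


builtin = {"java_gateway.address": "127.0.0.1", "java_gateway.port": 25333}
from_file = {"java_gateway.address": "10.0.0.5"}
env = {"PYDS_JAVA_GATEWAY_ADDRESS": "192.168.1.1"}

print(resolve("java_gateway.address", from_file, builtin, env))  # 192.168.1.1
print(resolve("java_gateway.address", from_file, builtin, {}))   # 10.0.0.5
print(resolve("java_gateway.port", from_file, builtin, {}))      # 25333
```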