Configuration
pydolphinscheduler has a built-in module that provides the configuration needed to start and run your workflow code. You can use it directly for a quick start or a simple job such as a proof of concept (POC). If you want to use pydolphinscheduler more extensively, especially in production, you will probably need to change the built-in configuration.
We have two ways to modify the configuration:
Using Environment Variables: The more lightweight way to modify the configuration. It is useful in containerized scenarios, such as Docker and Kubernetes (k8s), or when you want to temporarily override values from the configuration file.
Using Configuration File: The more general way to modify the configuration. It is useful when you want to persist and manage your configuration in a single file.
Using Environment Variables
You can change the configuration by adding or modifying the operating system's environment variables. Any method works as long as it successfully sets the environment variables. Below we use two common ways, Bash and the Python os module, as examples:
By Bash
Setting environment variables via Bash is the most straightforward way. Here are some examples of changing them with Bash:
# Modify Java Gateway Address
export PYDS_JAVA_GATEWAY_ADDRESS="192.168.1.1"
# Modify Workflow Default User
export PYDS_WORKFLOW_USER="custom-user"
After you execute the commands above, both PYDS_JAVA_GATEWAY_ADDRESS and PYDS_WORKFLOW_USER are changed for the current shell session. The next time you execute and submit your workflow from that session, it will be submitted to host 192.168.1.1 with the workflow user custom-user.
By Python OS Module
Because pydolphinscheduler is a Python API for Apache DolphinScheduler, you can also modify or add environment variables with Python's os module. This example sets the variables to the same values as the Bash example. The change takes effect the next time you run your workflow, as long as you call the workflow's run or submit method after the os.environ statements.
import os
# Modify Java Gateway Address
os.environ["PYDS_JAVA_GATEWAY_ADDRESS"] = "192.168.1.1"
# Modify Workflow Default User
os.environ["PYDS_WORKFLOW_USER"] = "custom-user"
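Assigning to os.environ keeps the values for the rest of the process. If you only want the override for part of a script, a small context manager can restore the previous values afterwards. The sketch below is a hypothetical helper: temporary_env is not a pydolphinscheduler API. It assumes pydolphinscheduler reads the variables only after they are set, so you would create and submit your workflow inside the with block.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(**overrides):
    """Set environment variables only for the duration of a ``with`` block.

    Hypothetical helper, not part of pydolphinscheduler. Previous values are
    restored, or the variables removed, on exit, even if an error is raised.
    """
    # Remember what was there before (None means "was not set").
    saved = {name: os.environ.get(name) for name in overrides}
    try:
        os.environ.update(overrides)
        yield
    finally:
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old


if __name__ == "__main__":
    with temporary_env(PYDS_JAVA_GATEWAY_ADDRESS="192.168.1.1",
                       PYDS_WORKFLOW_USER="custom-user"):
        # Create and submit your workflow here (assumption: pydolphinscheduler
        # has not read its configuration before this point).
        print(os.environ["PYDS_WORKFLOW_USER"])  # custom-user
```

Restoring in a finally clause means that a failed submission does not leave the override behind for later code in the same process.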
All Configurations in Environment Variables
The table below lists all environment variables. You can modify their values via Bash or the Python os module.
Variable Section | Variable Name | Description |
---|---|---|
Java Gateway | | Default Java gateway auth token. Change it to a custom value when deploying on a public network or in production. |
Java Gateway | PYDS_JAVA_GATEWAY_ADDRESS | Default Java gateway address; its value is used when it is set. |
Java Gateway | | Default Java gateway port; its value is used when it is set. |
Java Gateway | | Default boolean for Java gateway auto convert; its value is used when it is set. |
Default User | | Default user name; used when the user does not specify it. |
Default User | | Default user password; used when the user does not specify it. |
Default User | | Default user email; used when the user does not specify it. |
Default User | | Default user phone; used when the user does not specify it. |
Default User | | Default user state; used when the user does not specify it. |
Default Workflow | | Default workflow project name; used when the workflow does not specify the attribute. |
Default Workflow | PYDS_WORKFLOW_USER | Default workflow user; used when the workflow does not specify the attribute. |
Default Workflow | | Default workflow queue; used when the workflow does not specify the attribute. |
Default Workflow | | Default workflow worker group; used when the workflow does not specify the attribute. |
Default Workflow | | Default workflow release state; used when the workflow does not specify the attribute. |
Default Workflow | | Default workflow time zone; used when the workflow does not specify the attribute. |
Default Workflow | | Default workflow warning type; used when the workflow does not specify the attribute. |
Default Workflow | | Default workflow execution type; used when the workflow does not specify the attribute. |
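The variables can come from several places, such as a shell profile, a container spec, or os.environ. It can therefore help to check which ones the current process actually sees. The sketch below assumes every pydolphinscheduler variable shares the PYDS_ prefix used in the examples above. pyds_variables and masked are hypothetical helpers. Masking names that contain TOKEN or PASSWORD is only a precaution for the auth token and user password rows.

```python
import os

# The table includes an auth token and a user password; mask anything that
# looks like a secret instead of printing it.
SECRET_MARKERS = ("TOKEN", "PASSWORD")


def pyds_variables(environ=None):
    """Return all environment variables whose names start with ``PYDS_``, sorted by name."""
    environ = os.environ if environ is None else environ
    return {name: value for name, value in sorted(environ.items())
            if name.startswith("PYDS_")}


def masked(name, value):
    """Hide the value of variables whose names suggest a secret."""
    return "****" if any(marker in name for marker in SECRET_MARKERS) else value


if __name__ == "__main__":
    for name, value in pyds_variables().items():
        print(f"{name}={masked(name, value)}")
```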
Note
Configuration set through environment variables applies only to the workflow run that sees those variables. It does not change the values in the configuration file. The CLI commands config --get and config --set operate on the configuration file. As a result, config --get may return a value different from the one you set in an environment variable, and config --set never changes your environment variables.
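To make the scope described in the note concrete, the sketch below passes overrides only to a child process. The parent's environment and the configuration file stay untouched. It is the Python counterpart of prefixing a command in Bash, for example PYDS_WORKFLOW_USER=custom-user python my_workflow.py, where my_workflow.py is a hypothetical workflow script. run_with_overrides is likewise a hypothetical helper, and the probe command only stands in for your workflow.

```python
import os
import subprocess
import sys


def run_with_overrides(args, overrides):
    """Run a command with extra environment variables, leaving ours untouched.

    Hypothetical helper: ``overrides`` only reaches the child process.
    """
    env = {**os.environ, **overrides}  # copy, so os.environ is not modified
    return subprocess.run(args, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in for a workflow script: print what the child process sees.
    probe = [sys.executable, "-c",
             "import os; print(os.environ.get('PYDS_WORKFLOW_USER'))"]
    result = run_with_overrides(probe, {"PYDS_WORKFLOW_USER": "custom-user"})
    print(result.stdout.strip())                 # custom-user
    print(os.environ.get("PYDS_WORKFLOW_USER"))  # unchanged in this process
```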
Using Configuration File
If you want to persist and manage configuration in a file instead of environment variables, or want to save your configuration to a version control system such as Git or SVN, changing the configuration through a file is the best choice.
Export Configuration File
pydolphinscheduler lets you change the built-in configuration via the CLI or with any editor you like. The built-in configuration ships inside the pydolphinscheduler package, but you can also export it locally with the CLI:
pydolphinscheduler config --init
This creates a new YAML file at ~/pydolphinscheduler/config.yaml by default. If you want to export it to another path, set PYDS_HOME before you run pydolphinscheduler config --init:
export PYDS_HOME=<CUSTOM_PATH>
pydolphinscheduler config --init
After that, your configuration file is exported to <CUSTOM_PATH>/config.yaml instead of the default path.
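The location rule above can be written as a small function, for instance to check whether the file has already been exported. This is a sketch of the documented behaviour, not pydolphinscheduler's own code. config_file_path is a hypothetical name, and treating an empty PYDS_HOME as unset and expanding a leading ~ are assumptions.

```python
import os
from pathlib import Path


def config_file_path(environ=None):
    """Return the path ``pydolphinscheduler config --init`` is described as writing to.

    $PYDS_HOME/config.yaml when PYDS_HOME is set, otherwise
    ~/pydolphinscheduler/config.yaml. Treating an empty PYDS_HOME as unset and
    expanding a leading ``~`` are assumptions of this sketch.
    """
    environ = os.environ if environ is None else environ
    home = environ.get("PYDS_HOME")
    base = Path(home).expanduser() if home else Path.home() / "pydolphinscheduler"
    return base / "config.yaml"


if __name__ == "__main__":
    path = config_file_path()
    print(path, "exists" if path.exists() else "not exported yet")
```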
Change Configuration
In the section Export Configuration File, you exported the configuration file locally. Because it is a local file, you can edit it with any editor you like. Once you save your changes, the latest configuration takes effect the next time you run your workflow code.
You can also query or change the configuration via the CLI with config --get <config> or config --set <config> <val>.
Both --get and --set can be used one or more times in a single command. You can only set leaf nodes of the configuration, but you can also get parent nodes. Here are some simple examples:
# Get a single leaf configuration.
# The output looks like this:
# java_gateway.address = 127.0.0.1
pydolphinscheduler config --get java_gateway.address
# Get multiple leaf configurations.
# The output looks like this:
# java_gateway.address = 127.0.0.1
# java_gateway.port = 25333
pydolphinscheduler config --get java_gateway.address --get java_gateway.port
# Get a parent configuration that contains multiple leaf nodes.
# The output looks like this:
# java_gateway = ordereddict([('address', '127.0.0.1'), ('port', 25333), ('auto_convert', True)])
pydolphinscheduler config --get java_gateway
# Set a single configuration.
# The output looks like this:
# Set configuration done.
pydolphinscheduler config --set java_gateway.address 192.168.1.1
# Set multiple configurations.
# The output looks like this:
# Set configuration done.
pydolphinscheduler config --set java_gateway.address 192.168.1.1 --set java_gateway.port 25334
# Setting a configuration that is not a leaf node fails.
# The command raises an error.
pydolphinscheduler config --set java_gateway 192.168.1.1,25334,True
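The get/set rules above can be modelled on nested dictionaries: keys are dotted, parents can be read, and only leaves can be written. get_config and set_config below are hypothetical helpers that illustrate this behaviour. They are not how the pydolphinscheduler CLI is implemented. Unlike the CLI, which receives every value on the command line as a string, they do no type conversion.

```python
def get_config(config, key):
    """Return the node at a dotted key; a parent key returns its whole sub-mapping."""
    node = config
    for part in key.split("."):
        node = node[part]  # raises KeyError for unknown keys
    return node


def set_config(config, key, value):
    """Set an existing leaf node; refuse to overwrite a parent node."""
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        node = node[part]
    if isinstance(node[leaf], dict):  # node[leaf] raises KeyError for unknown keys
        raise ValueError(f"{key!r} is not a leaf node; set its children instead")
    node[leaf] = value


if __name__ == "__main__":
    # Mirrors the java_gateway section of the configuration file.
    config = {"java_gateway": {"address": "127.0.0.1", "port": 25333, "auto_convert": True}}
    set_config(config, "java_gateway.address", "192.168.1.1")
    print(get_config(config, "java_gateway.address"))  # 192.168.1.1
    print(get_config(config, "java_gateway"))          # the whole sub-mapping
    try:
        set_config(config, "java_gateway", "192.168.1.1,25334,True")
    except ValueError as err:
        print(err)
```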
For more information about the CLI, see the Command Line Interface document.
All Configurations in File
Here are all our configurations for pydolphinscheduler.
# Settings for the Java gateway server
java_gateway:
  # Authentication token for connections from the Python API to the Python gateway server. Change the default
  # value when you deploy on a public network.
  auth_token: jwUDzpLsNKEFER4*a8gruBH_GsAurNxU7A@Xc
  # The address the Python gateway server starts on. Set its value to `0.0.0.0` if your Python API runs on a
  # different host from the Python gateway server. It can also be a specific address such as `127.0.0.1` or `localhost`.
  address: 127.0.0.1
  # The port the Python gateway server starts on, i.e. the port the Python API models connect to.
  port: 25333
  # Whether to automatically convert Python objects to Java objects. The default value is ``True``. There is some
  # performance loss when it is set to ``True``, but for now pydolphinscheduler does not handle the conversion
  # between Java and Python itself; this is marked as a TODO item for the future.
  auto_convert: true

# DolphinScheduler default values, used when a property is not set, including ``user`` and ``workflow``
default:
  # Default values for DolphinScheduler's user object
  user:
    name: userPythonGateway
    password: userPythonGateway
    email: userPythonGateway@dolphinscheduler.com
    tenant: tenant_pydolphin
    phone: 11111111111
    state: 1
  # Default values for DolphinScheduler's workflow object
  workflow:
    project: project-pydolphin
    user: userPythonGateway
    queue: queuePythonGateway
    worker_group: default
    # Release state of the workflow. The default value ``online`` sets the workflow online when it is submitted
    # to the Java gateway; set it to ``offline`` if you want the workflow to be offline.
    release_state: online
    time_zone: Asia/Shanghai
    # Warning type of the workflow. The default value ``NONE`` means users are never warned, whatever the workflow
    # state; change it to ``FAILURE`` to warn users when a workflow fails. All available values are
    # ``NONE``, ``SUCCESS``, ``FAILURE``, ``ALL``
    warning_type: NONE
    # Default execution type for running multiple workflow instances. The default value ``parallel`` runs all
    # workflow instances in parallel; the other values are ``SERIAL_WAIT``, ``SERIAL_DISCARD``, ``SERIAL_PRIORITY``
    execution_type: parallel
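Several keys in the file accept only the values listed in their comments. A quick check like the following can catch typos before you submit a workflow. It assumes you have already loaded the default.workflow section into a dict with a YAML parser of your choice, since Python's standard library has none. invalid_workflow_defaults is a hypothetical helper. The case-insensitive comparison is an assumption, made because the file mixes online and parallel with NONE and SERIAL_WAIT.

```python
# Allowed values taken from the comments in the configuration file above.
ALLOWED = {
    "release_state": {"ONLINE", "OFFLINE"},
    "warning_type": {"NONE", "SUCCESS", "FAILURE", "ALL"},
    "execution_type": {"PARALLEL", "SERIAL_WAIT", "SERIAL_DISCARD", "SERIAL_PRIORITY"},
}


def invalid_workflow_defaults(workflow):
    """Return (key, value) pairs whose value is not one of the ALLOWED values.

    Keys that are missing from ``workflow`` are skipped. The comparison is
    case-insensitive, which is an assumption about pydolphinscheduler.
    """
    problems = []
    for key, allowed in ALLOWED.items():
        value = workflow.get(key)
        if value is not None and str(value).upper() not in allowed:
            problems.append((key, value))
    return problems


if __name__ == "__main__":
    # Values copied from the default.workflow section above.
    workflow = {"release_state": "online", "warning_type": "NONE", "execution_type": "parallel"}
    print(invalid_workflow_defaults(workflow))  # []
```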
Priority
There are two ways to modify the configuration, and pydolphinscheduler also has a built-in configuration. It is important to understand the priority among them when you use them. The overall priority is:
Environment Variables > Configuration File > Built-in Configuration
This means that settings in environment variables override those in the configuration file, and both override the built-in configuration. You can therefore temporarily modify the configuration by setting environment variables without changing the global configuration in the configuration file.
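The priority order can be expressed as a lookup that falls through the three layers. This is an illustrative sketch only. BUILT_IN and resolve are hypothetical, and the file configuration is flattened to dotted keys for brevity. The pairing of each environment variable with a file key comes from the examples in this document, not from pydolphinscheduler's source: PYDS_JAVA_GATEWAY_ADDRESS pairs with java_gateway.address, and PYDS_WORKFLOW_USER pairs with default.workflow.user.

```python
import os

# Hypothetical stand-in for the built-in defaults (values from the file above).
BUILT_IN = {
    "java_gateway.address": "127.0.0.1",
    "default.workflow.user": "userPythonGateway",
}


def resolve(key, env_name, file_config, environ=None):
    """Return the effective value: environment variable > configuration file > built-in."""
    environ = os.environ if environ is None else environ
    if env_name in environ:
        return environ[env_name]
    if key in file_config:
        return file_config[key]
    return BUILT_IN[key]


if __name__ == "__main__":
    file_config = {"java_gateway.address": "10.0.0.5"}  # hypothetical edited file
    print(resolve("java_gateway.address", "PYDS_JAVA_GATEWAY_ADDRESS", file_config, {}))  # 10.0.0.5
    print(resolve("java_gateway.address", "PYDS_JAVA_GATEWAY_ADDRESS", file_config,
                  {"PYDS_JAVA_GATEWAY_ADDRESS": "192.168.1.1"}))  # 192.168.1.1
    print(resolve("default.workflow.user", "PYDS_WORKFLOW_USER", file_config, {}))  # userPythonGateway
```

Passing environ explicitly keeps the function easy to test. Leaving it as None reads the real process environment.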