API

Core

Init pydolphinscheduler.core package.

class pydolphinscheduler.core.Database(database_name: str, type_key, database_key, *args, **kwargs)[source]

Bases: dict

Database object, which gets information about a database.

The database_name you provide contains the connection information; it determines which database type and database instance the task runs against.
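
A minimal sketch, assuming a datasource named "meta-mysql" is already registered on the server and a Python gateway is reachable:

    from pydolphinscheduler.core import Database

    # "meta-mysql" is an assumed, pre-existing datasource name; "type" and "datasource"
    # are the dict keys under which the resolved type and id will be stored.
    database = Database("meta-mysql", type_key="type", database_key="datasource")
    print(database.database_type)  # e.g. "MYSQL"
    print(database.database_id)    # numeric id assigned by the server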

clear() None.  Remove all items from D.
copy() a shallow copy of D
fromkeys(iterable, value=None, /)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

get_database_info(name) Dict[source]

Get database info from java gateway, contains database id, type, name.

items() a set-like object providing a view on D's items
keys() a set-like object providing a view on D's keys
pop(k[, d]) v, remove specified key and return the corresponding value.

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) None.  Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]

values() an object providing a view on D's values
property database_id: str

Get database id from java gateway, a wrapper for get_database_info().

property database_type: str

Get database type from java gateway, a wrapper for get_database_info().

class pydolphinscheduler.core.Engine(name: str, task_type: str, main_class: str, main_package: str, program_type: ProgramType | None = 'SCALA', *args, **kwargs)[source]

Bases: Task

Task engine object, declare behavior for engine task to dolphinscheduler.

This is the parent class of spark, flink and mr tasks, and is used to provide the programType, mainClass and mainJar task parameters for reuse.

_get_attr() Set[str]

Get final task task_params attribute.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway generates a new code with version equal to 0; otherwise it returns the existing code and version.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

get_jar_id() int[source]

Get jar id from java gateway, a wrapper for get_resource_info().

get_resource_info(program_type, main_package)[source]

Get resource info from java gateway, contains resource id, name.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_task_custom_attr: set = {}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict

Override Task.task_params for engine child tasks.

Child tasks have some special attributes for task_params, and it would be odd to set them directly as Python properties, so we override Task.task_params here.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.core.ProcessDefinition(name: str, description: str | None = None, schedule: str | None = None, start_time: str | None = None, end_time: str | None = None, timezone: str | None = 'Asia/Shanghai', user: str | None = 'userPythonGateway', project: str | None = 'project-pydolphin', tenant: str | None = 'tenant_pydolphin', worker_group: str | None = 'default', warning_type: str | None = 'NONE', warning_group_id: int | None = 0, timeout: int | None = 0, release_state: str | None = 'online', param: Dict | None = None, resource_list: List[Resource] | None = None)[source]

Bases: Base

Process definition object; it defines the process definition's attributes, tasks, and relations.

TODO: maybe we should rename this class, currently use DS object name.

Parameters:
  • user – The user for the current process definition. A new one will be created if it does not exist. If your parameter project already exists but the project's creator is not user, the project will be granted to user automatically.

  • project – The project for the current process definition. You can see the workflow in this project through the Web UI after you submit() or run() it. A new project belonging to user will be created if it does not exist. When the project exists but its creator is not user, the project will be granted to user automatically.

  • resource_list – Resource files required by the current process definition. You can create and modify resource files from this field. When the process definition is submitted, these resource files are submitted along with it.
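
A minimal workflow sketch, assuming a reachable Python gateway; the tenant, schedule and task values are placeholders:

    from pydolphinscheduler.core import ProcessDefinition
    from pydolphinscheduler.tasks import Http

    # ProcessDefinition can be used as a context manager so that tasks declared
    # inside the block are attached to it automatically.
    with ProcessDefinition(
        name="http-health-check",
        schedule="0 0 0 * * ? *",
        start_time="2022-01-01",
        tenant="tenant_exists",
    ) as pd:
        ping = Http(name="ping", url="https://www.example.com")
        pd.run()  # submit() the definition, then start() an instance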

_ensure_side_model_exists()[source]

Ensure the process definition's side models exist.

For now, the model objects include pydolphinscheduler.models.project.Project, pydolphinscheduler.models.tenant.Tenant and pydolphinscheduler.models.user.User. If these models do not exist, they are created with the default values in pydolphinscheduler.constants.ProcessDefinitionDefault.

_handle_root_relation()[source]

Handle root task property pydolphinscheduler.core.task.TaskRelation.

Root tasks in the DAG do not have an upstream node, but we have to add a default upstream task with task_code equal to 0. This is required by the java gateway interface.

static _parse_datetime(val: Any) Any[source]
_pre_submit_check()[source]

Check that specific conditions are satisfied before submitting.

This method should be called before the process definition is submitted to the java gateway. For now, we have the checker below: * self.param or at least one local param of a task must be set if there is a switch task in this workflow.

add_task(task: Task) None[source]

Add a single task to process definition.

add_tasks(tasks: List[Task]) None[source]

Add a sequence of tasks to the process definition; it is a wrapper of add_task().

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

get_one_task_by_name(name: str) Task[source]

Get exactly one task from the process definition by the given name.

The function always returns one task even if this process definition has more than one task with this name.

get_task(code: str) Task[source]

Get task object from process definition by given code.

get_tasks_by_name(name: str) Set[Task][source]

Get task objects by the given name; it will return all tasks with this name.

run()[source]

Submit and Start ProcessDefinition instance.

Shortcut for submit() and start(). Only manual workflow start is supported for now; scheduled runs are coming soon.

start() None[source]

Create and start ProcessDefinition instance.

which posts a start-process-instance request to the java gateway

submit() int[source]

Submit ProcessDefinition instance to java gateway.

_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'_project', '_tenant', 'description', 'name', 'param', 'release_state', 'resource_list', 'task_definition_json', 'task_relation_json', 'tasks', 'timeout', 'warning_group_id', 'warning_type', 'worker_group'}
_KEY_ATTR: set = {'name', 'param', 'project', 'release_state', 'tenant'}
property end_time: Any

Get attribute end_time.

property param_json: List[Dict] | None

Return param json based on self.param.

property project: Project

Get attribute project.

property release_state: int

Get attribute release_state.

property schedule_json: Dict | None

Get the schedule parameter json object. This is required by the java gateway interface.

property start_time: Any

Get attribute start_time.

property task_definition_json: List[Dict]

Return all task definitions as a list of dicts.

property task_list: List[Task]

Return the list of task objects.

property task_relation_json: List[Dict]

Return all relations between task pairs as a list of dicts.

property tenant: Tenant

Get attribute tenant.

property user: User

Get user object.

For now we only get it from the python models, not from the java gateway models, so it may not be correct.

class pydolphinscheduler.core.Task(name: str, task_type: str, description: str | None = None, flag: str | None = 'YES', task_priority: str | None = 'MEDIUM', worker_group: str | None = 'default', environment_name: str | None = None, delay_time: int | None = 0, fail_retry_times: int | None = 0, fail_retry_interval: int | None = 1, timeout_flag: int | None = 'CLOSE', timeout_notify_strategy: Optional = None, timeout: int | None = 0, process_definition: ProcessDefinition | None = None, local_params: List | None = None, resource_list: List | None = None, dependence: Dict | None = None, wait_start_timeout: Dict | None = None, condition_result: Dict | None = None)[source]

Bases: Base

Task object, the parent class for all concrete task types.
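
Concrete subclasses such as Http or DataX are normally used instead of Task itself; the sketch below only illustrates dependency wiring with set_downstream()/set_upstream(). Note that instantiating a task calls gen_code_and_version(), so a Python gateway must be reachable; the task_type value is purely illustrative.

    from pydolphinscheduler.core import Task

    # two bare tasks for illustration only; real workflows use concrete subclasses
    extract = Task(name="extract", task_type="SHELL")
    load = Task(name="load", task_type="SHELL")

    # run `load` after `extract`; equivalent to load.set_upstream(extract)
    extract.set_downstream(load)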

_get_attr() Set[str][source]

Get final task task_params attribute.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None[source]

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple[source]

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway generates a new code with version equal to 0; otherwise it returns the existing code and version.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None[source]

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None[source]

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_task_custom_attr: set = {}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict | None

Get task parameter object.

The result combines _task_custom_attr and custom_attr.

property user_name: str | None

Return user name of process definition.

Models

Init Models package, keeping objects related to DolphinScheduler converted from the Java Gateway Service.

class pydolphinscheduler.models.Base(name: str, description: str | None = None)[source]

Bases: object

DolphinScheduler Base object.

get_define(camel_attr: bool = True) Dict[source]

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict[source]

Get object definition attribute by given attr set.

_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {}
_KEY_ATTR: set = {'description', 'name'}
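
A small sketch of get_define_custom(); the caller picks the attributes to export, and with camel_attr=True snake_case names are converted to camelCase (an assumption about the conversion):

    from pydolphinscheduler.models import Base

    obj = Base(name="demo", description="demo object")
    # explicitly select which attributes to export
    print(obj.get_define_custom(custom_attr={"name", "description"}))
    # -> {'name': 'demo', 'description': 'demo object'}
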
class pydolphinscheduler.models.BaseSide(name: str, description: str | None = None)[source]

Bases: Base

Base class for model objects; it declares their base behavior.

classmethod create_if_not_exists(user='userPythonGateway')[source]

Create Base if not exists.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {}
_KEY_ATTR: set = {'description', 'name'}
class pydolphinscheduler.models.Project(name: str = 'project-pydolphin', description: str | None = None)[source]

Bases: BaseSide

DolphinScheduler Project object.

create_if_not_exists(user='userPythonGateway') None[source]

Create Project if not exists.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {}
_KEY_ATTR: set = {'description', 'name'}
class pydolphinscheduler.models.Queue(name: str = 'queuePythonGateway', description: str | None = '')[source]

Bases: BaseSide

DolphinScheduler Queue object.

classmethod create_if_not_exists(user='userPythonGateway')

Create Base if not exists.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {}
_KEY_ATTR: set = {'description', 'name'}
class pydolphinscheduler.models.Tenant(name: str = 'tenant_pydolphin', queue: str = 'queuePythonGateway', description: str | None = None)[source]

Bases: BaseSide

DolphinScheduler Tenant object.

create_if_not_exists(queue_name: str, user='userPythonGateway') None[source]

Create Tenant if not exists.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {}
_KEY_ATTR: set = {'description', 'name'}
class pydolphinscheduler.models.User(name: str, password: str | None = 'userPythonGateway', email: str | None = 'userPythonGateway@dolphinscheduler.com', phone: str | None = '11111111111', tenant: str | None = 'tenant_pydolphin', queue: str | None = 'queuePythonGateway', status: int | None = 1)[source]

Bases: BaseSide

DolphinScheduler User object.
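
A short sketch; the values below are the illustrative defaults and the call requires a reachable Python gateway:

    from pydolphinscheduler.models import User

    user = User(name="userPythonGateway", password="userPythonGateway")
    user.create_if_not_exists()  # creates the user on the server if it is missing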

create_if_not_exists(**kwargs)[source]

Create User if not exists.

create_tenant_if_not_exists() None[source]

Create tenant object.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {}
_KEY_ATTR: set = {'email', 'name', 'password', 'phone', 'queue', 'status', 'tenant'}
class pydolphinscheduler.models.WorkerGroup(name: str, address: str, description: str | None = None)[source]

Bases: BaseSide

DolphinScheduler Worker Group object.

classmethod create_if_not_exists(user='userPythonGateway')

Create Base if not exists.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {}
_KEY_ATTR: set = {'description', 'name'}

Tasks

Init pydolphinscheduler.tasks package.

class pydolphinscheduler.tasks.Condition(name: str, condition: ConditionOperator, success_task: Task, failed_task: Task, *args, **kwargs)[source]

Bases: Task

Task condition object, declare behavior for condition task to dolphinscheduler.
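
A hedged sketch of wiring a condition branch; it assumes the And, SUCCESS and FAILURE helpers live in pydolphinscheduler.tasks.condition, and uses Http tasks (documented below) as placeholders for the branches:

    from pydolphinscheduler.tasks import Condition, Http
    from pydolphinscheduler.tasks.condition import And, SUCCESS, FAILURE

    pre_a = Http(name="pre_a", url="https://www.example.com/a")
    pre_b = Http(name="pre_b", url="https://www.example.com/b")
    on_success = Http(name="on_success", url="https://www.example.com/ok")
    on_failure = Http(name="on_failure", url="https://www.example.com/alert")

    # run on_success when pre_a succeeds and pre_b fails, otherwise run on_failure
    branch = Condition(
        name="branch",
        condition=And(SUCCESS(pre_a), FAILURE(pre_b)),
        success_task=on_success,
        failed_task=on_failure,
    )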

_get_attr() Set[str]

Get final task task_params attribute.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_dep() None[source]

Set upstream according to parameter condition.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway generates a new code with version equal to 0; otherwise it returns the existing code and version.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get condition result define for java gateway.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict

Override Task.task_params for Condition task.

The condition task has the special attribute dependence; in most tasks this attribute is None and an empty dict {} is used as the default value. We do not use the class attribute _task_custom_attr here in order to avoid overriding attributes.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.CustomDataX(name: str, json: str, xms: int | None = 1, xmx: int | None = 1, *args, **kwargs)[source]

Bases: Task

Task CustomDatax object, declare behavior for custom DataX task to dolphinscheduler.

You provide a json template for DataX; it synchronizes data according to the template you provided.
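
A minimal sketch; the JSON below is a truncated placeholder rather than a complete DataX job definition:

    from pydolphinscheduler.tasks import CustomDataX

    JSON_TEMPLATE = """
    {
      "job": {
        "content": [{"reader": {"name": "mysqlreader"}, "writer": {"name": "mysqlwriter"}}],
        "setting": {"speed": {"channel": 1}}
      }
    }
    """

    sync = CustomDataX(name="custom_datax", json=JSON_TEMPLATE, xms=2, xmx=4)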

_get_attr() Set[str]

Get final task task_params attribute.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway generates a new code with version equal to 0; otherwise it returns the existing code and version.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

CUSTOM_CONFIG = 1
DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'custom_config', 'json', 'xms', 'xmx'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict | None

Get task parameter object.

The result combines _task_custom_attr and custom_attr.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.DVCDownload(name: str, repository: str, data_path_in_dvc_repository: str, data_path_in_worker: str, version: str, *args, **kwargs)[source]

Bases: BaseDVC

Task DVC Download object, declare behavior for DVC Download task to dolphinscheduler.
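
A sketch with a hypothetical repository and paths:

    from pydolphinscheduler.tasks import DVCDownload

    download = DVCDownload(
        name="download_iris",
        repository="git@github.com:your-org/dvc-data-repository-example.git",  # hypothetical repo
        data_path_in_dvc_repository="iris",
        data_path_in_worker="/tmp/iris",
        version="v1.0.0",
    )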

_get_attr() Set[str]

Get final task task_params attribute.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway generates a new code with version equal to 0; otherwise it returns the existing code and version.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_child_task_dvc_attr = {'dvc_data_location', 'dvc_load_save_data_path', 'dvc_version'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'dvc_repository', 'dvc_task_type'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

dvc_task_type = 'Download'
property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict

Return task params.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.DVCInit(name: str, repository: str, store_url: str, *args, **kwargs)[source]

Bases: BaseDVC

Task DVC Init object, declare behavior for DVC Init task to dolphinscheduler.

_get_attr() Set[str]

Get final task task_params attribute.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway generates a new code with version equal to 0; otherwise it returns the existing code and version.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_child_task_dvc_attr = {'dvc_store_url'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'dvc_repository', 'dvc_task_type'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

dvc_task_type = 'Init DVC'
property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict

Return task params.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.DVCUpload(name: str, repository: str, data_path_in_worker: str, data_path_in_dvc_repository: str, version: str, message: str, *args, **kwargs)[source]

Bases: BaseDVC

Task DVC Upload object, declare behavior for DVC Upload task to dolphinscheduler.

_get_attr() Set[str]

Get final task task_params attribute.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway generates a new code with version equal to 0; otherwise it returns the existing code and version.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_child_task_dvc_attr = {'dvc_data_location', 'dvc_load_save_data_path', 'dvc_message', 'dvc_version'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'dvc_repository', 'dvc_task_type'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

dvc_task_type = 'Upload'
property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict

Return task params.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.DataX(name: str, datasource_name: str, datatarget_name: str, sql: str, target_table: str, job_speed_byte: int | None = 0, job_speed_record: int | None = 1000, pre_statements: List[str] | None = None, post_statements: List[str] | None = None, xms: int | None = 1, xmx: int | None = 1, *args, **kwargs)[source]

Bases: Task

Task DataX object, declare behavior for DataX task to dolphinscheduler.

It runs a database DataX job against multiple SQL engines, such as MySQL, Oracle, PostgreSQL and SQLServer. The datasource_name and datatarget_name you provide contain the connection information; they determine which database type and database instance the data is synchronized between.
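
A minimal sketch using only the documented constructor parameters; the datasource and datatarget names are assumptions and must already be registered in DolphinScheduler:

    from pydolphinscheduler.tasks import DataX

    sync = DataX(
        name="mysql_to_mysql",
        datasource_name="source_mysql",
        datatarget_name="target_mysql",
        sql="SELECT id, name FROM source_table",
        target_table="target_table",
        job_speed_record=2000,
    )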

_get_attr() Set[str]

Get final task task_params attribute.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway generates a new code with version equal to 0; otherwise it returns the existing code and version.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

CUSTOM_CONFIG = 0
DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'custom_config', 'job_speed_byte', 'job_speed_record', 'post_statements', 'pre_statements', 'sql', 'target_table', 'xms', 'xmx'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict

Override Task.task_params for datax task.

The datax task has some special attributes for task_params, and it would be odd to set them directly as Python properties, so we override Task.task_params here.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.Dependent(name: str, dependence: DependentOperator, *args, **kwargs)[source]

Bases: Task

Task dependent object, declare behavior for dependent task to dolphinscheduler.
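
A hedged sketch; it assumes DependentItem and the And operator live in pydolphinscheduler.tasks.dependent, and the project, workflow and task names are placeholders:

    from pydolphinscheduler.tasks import Dependent
    from pydolphinscheduler.tasks.dependent import And, DependentItem

    wait_upstream = Dependent(
        name="wait_upstream",
        dependence=And(
            DependentItem(
                project_name="project-pydolphin",
                process_definition_name="upstream_workflow",
                dependent_task_name="final_task",
            )
        ),
    )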

_get_attr() Set[str]

Get final task task_params attribute.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway generates a new code with version equal to 0; otherwise it returns the existing code and version.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict

Override Task.task_params for dependent task.

The dependent task has the special attribute dependence; in most tasks this attribute is None and an empty dict {} is used as the default value. We do not use the class attribute _task_custom_attr here in order to avoid overriding attributes.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.Flink(…)[source]

Bases: Engine

Task flink object, declare behavior for flink task to dolphinscheduler.

_get_attr() Set[str]

Get final task task_params attribute.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway generates a new code with version equal to 0; otherwise it returns the existing code and version.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

get_jar_id() int

Get jar id from java gateway, a wrapper for get_resource_info().

get_resource_info(program_type, main_package)

Get resource info from java gateway, contains resource id, name.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'app_name', 'deploy_mode', 'flink_version', 'job_manager_memory', 'main_args', 'others', 'parallelism', 'slot', 'task_manager', 'task_manager_memory'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict

Override Task.task_params for engine child tasks.

Child tasks have some special attributes for task_params, and it would be odd to set them directly as Python properties, so we override Task.task_params here.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.Http(name: str, url: str, http_method: str | None = 'GET', http_params: str | None = None, http_check_condition: str | None = 'STATUS_CODE_DEFAULT', condition: str | None = None, connect_timeout: int | None = 60000, socket_timeout: int | None = 60000, *args, **kwargs)[source]

Bases: Task

Task HTTP object, declare behavior for HTTP task to dolphinscheduler.
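
A minimal sketch using only the documented constructor parameters:

    from pydolphinscheduler.tasks import Http

    health_check = Http(
        name="health_check",
        url="https://www.example.com/health",
        http_method="GET",
        http_check_condition="STATUS_CODE_DEFAULT",
        connect_timeout=60000,
        socket_timeout=60000,
    )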

_get_attr() Set[str]

Get final task task_params attribute.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway generates a new code with version equal to 0; otherwise it returns the existing code and version.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'condition', 'connect_timeout', 'http_check_condition', 'http_method', 'http_params', 'socket_timeout', 'url'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict | None

Get task parameter object.

The result combines _task_custom_attr and custom_attr.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.MLFlowProjectsAutoML(name: str, data_path: str, automl_tool: str | None = 'flaml', mlflow_tracking_uri: str | None = 'http://127.0.0.1:5000', experiment_name: str | None = '', model_name: str | None = '', parameters: str | None = '', *args, **kwargs)[source]

Bases: BaseMLflow

Task MLflow projects object, declare behavior for AutoML task to dolphinscheduler.

Parameters:
  • name – task name

  • data_path – Data path of the MLflow Project; supports a git address or a directory on the worker.

  • automl_tool – The AutoML tool used, currently supports autosklearn and flaml.

  • mlflow_tracking_uri – MLflow tracking server uri, default is http://127.0.0.1:5000

  • experiment_name – MLflow experiment name, default is empty

  • model_name – MLflow model name, default is empty

  • parameters – MLflow project parameters, default is empty
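
A sketch using the parameters above; the data path and parameter string are illustrative only:

    from pydolphinscheduler.tasks import MLFlowProjectsAutoML

    automl = MLFlowProjectsAutoML(
        name="automl_flaml",
        data_path="/data/iris.csv",                  # hypothetical path on the worker
        automl_tool="flaml",
        mlflow_tracking_uri="http://127.0.0.1:5000",
        experiment_name="iris_automl",
        model_name="iris_automl_model",
        parameters="time_budget=30",                 # illustrative parameter string
    )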

_get_attr() Set[str]

Get final task task_params attribute.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway generates a new code with version equal to 0; otherwise it returns the existing code and version.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_child_task_mlflow_attr = {'automl_tool', 'data_path', 'experiment_name', 'mlflow_job_type', 'model_name', 'params', 'register_model'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'mlflow_task_type', 'mlflow_tracking_uri'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

mlflow_job_type = 'AutoML'
mlflow_task_type = 'MLflow Projects'
property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict

Return task params.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.MLFlowProjectsBasicAlgorithm(name: str, data_path: str, algorithm: str | None = 'lightgbm', mlflow_tracking_uri: str | None = 'http://127.0.0.1:5000', experiment_name: str | None = '', model_name: str | None = '', parameters: str | None = '', search_params: str | None = '', *args, **kwargs)[source]

Bases: BaseMLflow

Task MLflow projects object, declare behavior for BasicAlgorithm task to dolphinscheduler.

Parameters:
  • name – task name

  • data_path – Data path of the MLflow Project; supports a git address or a directory on the worker.

  • algorithm – The selected algorithm; currently supports LR, SVM, LightGBM and XGBoost in scikit-learn form.

  • mlflow_tracking_uri – MLflow tracking server uri, default is http://127.0.0.1:5000

  • experiment_name – MLflow experiment name, default is empty

  • model_name – MLflow model name, default is empty

  • parameters – MLflow project parameters, default is empty

  • search_params – Whether to search the parameters, default is empty
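
A sketch using the parameters above; the data path and parameter strings are illustrative only:

    from pydolphinscheduler.tasks import MLFlowProjectsBasicAlgorithm

    train = MLFlowProjectsBasicAlgorithm(
        name="train_lightgbm",
        data_path="/data/iris.csv",                   # hypothetical path on the worker
        algorithm="lightgbm",
        mlflow_tracking_uri="http://127.0.0.1:5000",
        experiment_name="iris",
        model_name="iris_model",
        parameters="n_estimators=200",                # illustrative parameter string
        search_params="max_depth=[5, 10]",            # illustrative search space
    )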

_get_attr() Set[str]

Get final task task_params attribute.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway generates a new code with version equal to 0; otherwise it returns the existing code and version.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_child_task_mlflow_attr = {'algorithm', 'data_path', 'experiment_name', 'mlflow_job_type', 'model_name', 'params', 'register_model', 'search_params'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'mlflow_task_type', 'mlflow_tracking_uri'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

mlflow_job_type = 'BasicAlgorithm'
mlflow_task_type = 'MLflow Projects'
property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict

Return task params.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.MLFlowProjectsCustom(name: str, repository: str, mlflow_tracking_uri: str | None = 'http://127.0.0.1:5000', experiment_name: str | None = '', parameters: str | None = '', version: str | None = 'master', *args, **kwargs)[source]

Bases: BaseMLflow

Task MLflow projects object, declare behavior for MLflow Custom projects task to dolphinscheduler.

Parameters:
  • name – task name

  • repository – Repository url of the MLflow Project; supports a git address or a directory on the worker. If it's in a subdirectory, we add # to support this (same as mlflow run), for example https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native.

  • mlflow_tracking_uri – MLflow tracking server uri, default is http://127.0.0.1:5000

  • experiment_name – MLflow experiment name, default is empty

  • parameters – MLflow project parameters, default is empty

  • version – MLflow project version, default is master
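
A sketch using the parameters above, showing the subdirectory form of the repository url; the parameter string is illustrative only:

    from pydolphinscheduler.tasks import MLFlowProjectsCustom

    run_project = MLFlowProjectsCustom(
        name="xgboost_native",
        repository="https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native",
        mlflow_tracking_uri="http://127.0.0.1:5000",
        parameters="-P learning_rate=0.2",            # illustrative parameter string
        version="master",
    )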

_get_attr() Set[str]

Get final task task_params attribute.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway generates a new code with version equal to 0; otherwise it returns the existing code and version.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_child_task_mlflow_attr = {'experiment_name', 'mlflow_job_type', 'mlflow_project_repository', 'mlflow_project_version', 'params'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'mlflow_task_type', 'mlflow_tracking_uri'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

mlflow_job_type = 'CustomProject'
mlflow_task_type = 'MLflow Projects'
property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict

Return task params.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.MLflowModels(name: str, model_uri: str, mlflow_tracking_uri: str | None = 'http://127.0.0.1:5000', deploy_mode: str | None = 'DOCKER', port: int | None = 7000, *args, **kwargs)[source]

Bases: BaseMLflow

Task MLflow models object, declare behavior for MLflow models task to dolphinscheduler.

Deploy machine learning models in diverse serving environments.

Parameters:
  • name – task name

  • model_uri – Model-URI of MLflow; supports the models:/<model_name>/suffix format and the runs:/ format. See https://mlflow.org/docs/latest/tracking.html#artifact-stores

  • mlflow_tracking_uri – MLflow tracking server uri, default is http://127.0.0.1:5000

  • deploy_mode – MLflow deploy mode; supports MLFLOW and DOCKER, default is DOCKER

  • port – deploy port, default is 7000
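
A sketch using the parameters above; the registered model name is hypothetical:

    from pydolphinscheduler.tasks import MLflowModels

    deploy = MLflowModels(
        name="deploy_iris_model",
        model_uri="models:/iris_model/Production",    # hypothetical registered model
        mlflow_tracking_uri="http://127.0.0.1:5000",
        deploy_mode="DOCKER",
        port=7000,
    )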

_get_attr() Set[str]

Get final task task_params attribute.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway generates a new code with version equal to 0; otherwise it returns the existing code and version.

get_define(camel_attr: bool = True) Dict

Get the object definition attributes to communicate to the Java gateway server.

Use self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_child_task_mlflow_attr = {'deploy_model_key', 'deploy_port', 'deploy_type'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'mlflow_task_type', 'mlflow_tracking_uri'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

mlflow_task_type = 'MLflow Models'
property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict

Return task params.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.MR(name: str, main_class: str, main_package: str, program_type: ProgramType | None = 'SCALA', app_name: str | None = None, main_args: str | None = None, others: str | None = None, *args, **kwargs)[source]

Bases: Engine

Task mr object, declare behavior for mr task to dolphinscheduler.
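
A minimal, hypothetical sketch of an MR task is shown below; it assumes the named jar is already available in the DolphinScheduler resource center, and the tenant, class, and paths are placeholders:

from pydolphinscheduler.core.process_definition import ProcessDefinition
from pydolphinscheduler.tasks import MR

with ProcessDefinition(name="mr_example", tenant="tenant_exists") as pd:
    wordcount = MR(
        name="mr_wordcount",
        main_class="org.apache.hadoop.examples.WordCount",
        main_package="hadoop-mapreduce-examples.jar",  # must be uploaded to the resource center
        program_type="JAVA",
        main_args="/input /output",
    )
    pd.submit()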

_get_attr() Set[str]

Get the final task_params attribute set.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway will generate a new code with version 0; otherwise it will return the existing code and version.

get_define(camel_attr: bool = True) Dict

Get object definition attributes to communicate to the Java gateway server.

Use the attribute self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

get_jar_id() int

Get jar id from java gateway, a wrapper for get_resource_info().

get_resource_info(program_type, main_package)

Get resource info from java gateway, contains resource id, name.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'app_name', 'main_args', 'others'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict

Override Task.task_params for engine child tasks.

Child tasks have some special attributes for task_params, and it would be odd to set them directly as Python properties, so we override Task.task_params here.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.OpenMLDB(name, zookeeper, zookeeper_path, execute_mode, sql, *args, **kwargs)[source]

Bases: Task

Task OpenMLDB object, declare behavior for OpenMLDB task to dolphinscheduler.

Parameters:
  • name – task name

  • zookeeper – OpenMLDB cluster zookeeper address, e.g. 127.0.0.1:2181.

  • zookeeper_path – OpenMLDB cluster zookeeper path, e.g. /openmldb.

  • execute_mode – Determine the initial mode, offline or online. You can switch it in the SQL statement itself.

  • sql – SQL statement.
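
A minimal usage sketch follows; the ZooKeeper address, path, and SQL are placeholders and assume a reachable OpenMLDB cluster and DolphinScheduler gateway:

from pydolphinscheduler.core.process_definition import ProcessDefinition
from pydolphinscheduler.tasks import OpenMLDB

sql = "SELECT COUNT(*) FROM demo_table;"

with ProcessDefinition(name="openmldb_example", tenant="tenant_exists") as pd:
    offline_query = OpenMLDB(
        name="openmldb_offline_query",
        zookeeper="127.0.0.1:2181",
        zookeeper_path="/openmldb",
        execute_mode="offline",
        sql=sql,
    )
    pd.submit()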

_get_attr() Set[str]

Get the final task_params attribute set.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway will generate a new code with version 0; otherwise it will return the existing code and version.

get_define(camel_attr: bool = True) Dict

Get object definition attributes to communicate to the Java gateway server.

Use the attribute self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'execute_mode', 'sql', 'zk', 'zk_path'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict | None

Get task parameter object.

Will get result to combine _task_custom_attr and custom_attr.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.Procedure(name: str, datasource_name: str, method: str, *args, **kwargs)[source]

Bases: Task

Task Procedure object, declare behavior for Procedure task to dolphinscheduler.

It should run database procedure jobs in multiple SQL-like engines, such as ClickHouse, DB2, HIVE, MySQL, Oracle, PostgreSQL, Presto, and SQLServer. The datasource_name you provide contains the connection information, and it decides which database type and database instance will run this procedure.
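
A minimal, hypothetical sketch follows; the datasource name and procedure call are placeholders and must already exist on the DolphinScheduler server:

from pydolphinscheduler.core.process_definition import ProcessDefinition
from pydolphinscheduler.tasks import Procedure

with ProcessDefinition(name="procedure_example", tenant="tenant_exists") as pd:
    call_proc = Procedure(
        name="call_procedure",
        datasource_name="first_mysql",  # datasource must already be defined in DolphinScheduler
        method="call demo_procedure()",
    )
    pd.submit()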

_get_attr() Set[str]

Get the final task_params attribute set.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway will generate a new code with version 0; otherwise it will return the existing code and version.

get_define(camel_attr: bool = True) Dict

Get object definition attributes to communicate to the Java gateway server.

Use the attribute self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'method'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict

Override Task.task_params for the procedure task.

The procedure task has some special attributes for task_params, and it would be odd to set them directly as Python properties, so we override Task.task_params here.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.Python(name: str, definition: str | LambdaType, *args, **kwargs)[source]

Bases: Task

Task Python object, declare behavior for Python task to dolphinscheduler.

The Python task supports two types of value for :param:definition, and here is an example:

Using the str type of :param:definition

python_task = Python(name="str_type", definition="print('Hello Python task.')")

Or using a Python callable type of :param:definition

def foo():
    print("Hello Python task.")

python_task = Python(name="callable_type", definition=foo)
Parameters:
  • name – The name of the Python task.

  • definition – The Python script you want to execute, given either as a string or as a Python callable.

_build_exe_str() str[source]

Build executable string from given definition.

Attribute self.definition is usually a function; after converting it to a string we still need to call it. The easiest way to call a function is with the func() syntax, so that call is appended as well.

_get_attr() Set[str]

Get the final task_params attribute set.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway will generate a new code with version 0; otherwise it will return the existing code and version.

get_define(camel_attr: bool = True) Dict

Get object definition attributes to communicate to the Java gateway server.

Use the attribute self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'raw_script'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property raw_script: str

Get python task define attribute raw_script.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict | None

Get task parameter object.

Will get result to combine _task_custom_attr and custom_attr.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.Pytorch(name: str, script: str, script_params: str = '', project_path: str | None = '.', is_create_environment: bool | None = False, python_command: str | None = '${PYTHON_HOME}', python_env_tool: str | None = 'conda', requirements: str | None = 'requirements.txt', conda_python_version: str | None = '3.7', *args, **kwargs)[source]

Bases: Task

Task Pytorch object, declare behavior for Pytorch task to dolphinscheduler.

See also: DolphinScheduler Pytorch Task Plugin

Parameters:
  • name – task name

  • script – Entry to the Python script file that you want to run.

  • script_params – Input parameters at run time.

  • project_path – The path to the project. Default “.” .

  • is_create_environment – whether to create a new Python environment before running. Default False.

  • python_command – The path to the python command. Default “${PYTHON_HOME}”.

  • python_env_tool – The python environment tool. Default “conda”.

  • requirements – The path to the requirements.txt file. Default “requirements.txt”.

  • conda_python_version – The python version of conda environment. Default “3.7”.
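
A minimal, hypothetical sketch follows; the project path, script, and parameters are placeholders, and a reachable DolphinScheduler gateway is assumed:

from pydolphinscheduler.core.process_definition import ProcessDefinition
from pydolphinscheduler.tasks import Pytorch

with ProcessDefinition(name="pytorch_example", tenant="tenant_exists") as pd:
    train = Pytorch(
        name="train_model",
        script="train.py",
        script_params="--epochs 5",
        project_path="/home/dolphinscheduler/pytorch_project",  # placeholder path on the worker
        is_create_environment=True,
        python_env_tool="conda",
        requirements="requirements.txt",
        conda_python_version="3.7",
    )
    pd.submit()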

_get_attr() Set[str]

Get the final task_params attribute set.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway will generate a new code with version 0; otherwise it will return the existing code and version.

get_define(camel_attr: bool = True) Dict

Get object definition attributes to communicate to the Java gateway server.

Use the attribute self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'conda_python_version', 'is_create_environment', 'other_params', 'python_command', 'python_env_tool', 'python_path', 'requirements', 'script', 'script_params'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property other_params

Return other params.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict | None

Get task parameter object.

Will get result to combine _task_custom_attr and custom_attr.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.SageMaker(name: str, sagemaker_request_json: str, *args, **kwargs)[source]

Bases: Task

Task SageMaker object, declare behavior for SageMaker task to dolphinscheduler.

Parameters:
  • name – A unique, meaningful string for the SageMaker task.

  • sagemaker_request_json – Request parameters of StartPipelineExecution; see also the AWS API.
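
A minimal, hypothetical sketch follows; the pipeline name and parameters are placeholders, and valid AWS credentials are assumed to be configured on the worker:

import json

from pydolphinscheduler.core.process_definition import ProcessDefinition
from pydolphinscheduler.tasks import SageMaker

# Placeholder StartPipelineExecution request body.
sagemaker_request = {
    "PipelineName": "DemoPipeline",
    "PipelineExecutionDisplayName": "demo-run",
    "PipelineParameters": [{"Name": "InputData", "Value": "s3://my-bucket/prefix"}],
}

with ProcessDefinition(name="sagemaker_example", tenant="tenant_exists") as pd:
    start_pipeline = SageMaker(
        name="start_pipeline",
        sagemaker_request_json=json.dumps(sagemaker_request, indent=2),
    )
    pd.submit()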

_get_attr() Set[str]

Get the final task_params attribute set.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway will generate a new code with version 0; otherwise it will return the existing code and version.

get_define(camel_attr: bool = True) Dict

Get object definition attributes to communicate to the Java gateway server.

Use the attribute self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'sagemaker_request_json'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict | None

Get task parameter object.

Will get result to combine _task_custom_attr and custom_attr.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.Shell(name: str, command: str, *args, **kwargs)[source]

Bases: Task

Task shell object, declare behavior for shell task to dolphinscheduler.

Parameters:
  • name – A unique, meaningful string for the shell task.

  • command

One or more commands you want to run in this task.

    It could be a simple command:

    Shell(name=..., command="echo task shell")
    

    or several commands combined into a more complex task:

    command = '''echo task shell step 1;
    echo task shell step 2;
    echo task shell step 3
    '''
    
    Shell(name=..., command=command)
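
Beyond the snippets above, a fuller, hypothetical sketch combining two shell tasks with an explicit dependency might look like this (the tenant name is a placeholder, and a reachable DolphinScheduler gateway is assumed):

from pydolphinscheduler.core.process_definition import ProcessDefinition
from pydolphinscheduler.tasks import Shell

with ProcessDefinition(name="shell_example", tenant="tenant_exists") as pd:
    prepare = Shell(name="prepare", command="echo prepare data")
    report = Shell(name="report", command="echo build report")
    # Run "report" only after "prepare" finishes.
    prepare.set_downstream(report)
    pd.submit()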
    

_get_attr() Set[str]

Get the final task_params attribute set.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway will generate a new code with version 0; otherwise it will return the existing code and version.

get_define(camel_attr: bool = True) Dict

Get object definition attributes to communicate to the Java gateway server.

Use the attribute self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'raw_script'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict | None

Get task parameter object.

Will get result to combine _task_custom_attr and custom_attr.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.Spark(name: str, main_class: str, main_package: str, program_type: ProgramType | None = 'SCALA', deploy_mode: DeployMode | None = 'cluster', spark_version: SparkVersion | None = 'SPARK2', app_name: str | None = None, driver_cores: int | None = 1, driver_memory: str | None = '512M', num_executors: int | None = 2, executor_memory: str | None = '2G', executor_cores: int | None = 2, main_args: str | None = None, others: str | None = None, *args, **kwargs)[source]

Bases: Engine

Task spark object, declare behavior for spark task to dolphinscheduler.
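
A minimal, hypothetical sketch follows; it assumes the example jar is already uploaded to the DolphinScheduler resource center, and the tenant and jar version are placeholders:

from pydolphinscheduler.core.process_definition import ProcessDefinition
from pydolphinscheduler.tasks import Spark

with ProcessDefinition(name="spark_example", tenant="tenant_exists") as pd:
    spark_pi = Spark(
        name="spark_pi",
        main_class="org.apache.spark.examples.SparkPi",
        main_package="spark-examples_2.12-3.2.0.jar",  # must be uploaded to the resource center
        program_type="SCALA",
        deploy_mode="cluster",
        driver_cores=1,
        driver_memory="512M",
        num_executors=2,
        executor_memory="2G",
        executor_cores=2,
        main_args="100",
    )
    pd.submit()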

_get_attr() Set[str]

Get the final task_params attribute set.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway will generate a new code with version 0; otherwise it will return the existing code and version.

get_define(camel_attr: bool = True) Dict

Get object definition attributes to communicate to the Java gateway server.

Use the attribute self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

get_jar_id() int

Get jar id from java gateway, a wrapper for get_resource_info().

get_resource_info(program_type, main_package)

Get resource info from java gateway, contains resource id, name.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'app_name', 'deploy_mode', 'driver_cores', 'driver_memory', 'executor_cores', 'executor_memory', 'main_args', 'num_executors', 'others', 'spark_version'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict

Override Task.task_params for engine child tasks.

Child tasks have some special attributes for task_params, and it would be odd to set them directly as Python properties, so we override Task.task_params here.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.Sql(name: str, datasource_name: str, sql: str, sql_type: str | None = None, pre_statements: str | None = None, post_statements: str | None = None, display_rows: int | None = 10, *args, **kwargs)[source]

Bases: Task

Task SQL object, declare behavior for SQL task to dolphinscheduler.

It should run SQL jobs in multiple SQL-like engines, such as ClickHouse, DB2, HIVE, MySQL, Oracle, PostgreSQL, Presto, and SQLServer. The datasource_name you provide contains the connection information, and it decides which database type and database instance will run this SQL.
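
A minimal, hypothetical sketch follows; the datasource name and SQL statements are placeholders, and the datasource must already be defined on the DolphinScheduler server:

from pydolphinscheduler.core.process_definition import ProcessDefinition
from pydolphinscheduler.tasks import Sql

with ProcessDefinition(name="sql_example", tenant="tenant_exists") as pd:
    create = Sql(
        name="create_table",
        datasource_name="first_mysql",
        sql="CREATE TABLE IF NOT EXISTS demo (id INT, name VARCHAR(32))",
    )
    select = Sql(
        name="select_rows",
        datasource_name="first_mysql",
        sql="SELECT * FROM demo",
        display_rows=10,
    )
    # Run the query only after the table has been created.
    create.set_downstream(select)
    pd.submit()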

_get_attr() Set[str]

Get the final task_params attribute set.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway will generate a new code with version 0; otherwise it will return the existing code and version.

get_define(camel_attr: bool = True) Dict

Get object definition attributes to communicate to the Java gateway server.

Use the attribute self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'display_rows', 'post_statements', 'pre_statements', 'sql', 'sql_type'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property sql_type: str

Determine the SQL type; it returns either SELECT or NOT_SELECT.

If the parameter sql_type is not specified, a regular expression is used to detect which type the SQL is. If sql_type is specified, that parameter overrides the regexp-based detection.

property task_params: Dict

Override Task.task_params for the SQL task.

The SQL task has some special attributes for task_params, and it would be odd to set them directly as Python properties, so we override Task.task_params here.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.SubProcess(name: str, process_definition_name: str, *args, **kwargs)[source]

Bases: Task

Task SubProcess object, declare behavior for SubProcess task to dolphinscheduler.
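
A minimal, hypothetical sketch follows; the child process definition name is a placeholder and must already exist on the server:

from pydolphinscheduler.core.process_definition import ProcessDefinition
from pydolphinscheduler.tasks import SubProcess

with ProcessDefinition(name="parent_workflow", tenant="tenant_exists") as pd:
    child = SubProcess(
        name="run_child_workflow",
        # The referenced process definition must already exist on the server.
        process_definition_name="child_workflow",
    )
    pd.submit()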

_get_attr() Set[str]

Get the final task_params attribute set.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway will generate a new code with version 0; otherwise it will return the existing code and version.

get_define(camel_attr: bool = True) Dict

Get object definition attributes to communicate to the Java gateway server.

Use the attribute self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

get_process_definition_info(process_definition_name: str) Dict[source]

Get process definition info from java gateway, contains process definition id, name, code.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {'process_definition_code'}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property process_definition_code: str

Get process definition code, a wrapper for get_process_definition_info().

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict | None

Get task parameter object.

Will get result to combine _task_custom_attr and custom_attr.

property user_name: str | None

Return user name of process definition.

class pydolphinscheduler.tasks.Switch(name: str, condition: SwitchCondition, *args, **kwargs)[source]

Bases: Task

Task switch object, declare behavior for switch task to dolphinscheduler.

A parameter of the process definition, or at least one local parameter of a task, must be set if a switch task is used in this workflow.
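
A minimal, hypothetical sketch follows. It assumes the Branch and Default helpers from pydolphinscheduler.tasks.switch and a workflow-level parameter var (shown here via the param argument, which is an assumption about this version's ProcessDefinition); all names are placeholders:

from pydolphinscheduler.core.process_definition import ProcessDefinition
from pydolphinscheduler.tasks import Shell, Switch
from pydolphinscheduler.tasks.switch import Branch, Default, SwitchCondition

# param below is assumed to define the workflow-level parameter ${var}.
with ProcessDefinition(name="switch_example", tenant="tenant_exists", param={"var": "1"}) as pd:
    parent = Shell(name="parent", command="echo parent")
    branch_a = Shell(name="branch_a", command="echo a")
    branch_b = Shell(name="branch_b", command="echo b")
    condition = SwitchCondition(
        Branch(condition="${var} > 1", task=branch_a),
        Default(task=branch_b),
    )
    switch = Switch(name="switch", condition=condition)
    # The switch runs after its parent task.
    parent.set_downstream(switch)
    pd.submit()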

_get_attr() Set[str]

Get the final task_params attribute set.

Based on _task_default_attr, append attributes from _task_custom_attr and subtract attributes in _task_ignore_attr.

_set_dep() None[source]

Set downstream according to parameter condition.

_set_deps(tasks: Task | Sequence[Task], upstream: bool = True) None

Set parameter tasks as dependencies of the current task.

It is a wrapper for set_upstream() and set_downstream().

gen_code_and_version() Tuple

Generate task code and version from java gateway.

If the task name does not already exist in the process definition, the java gateway will generate a new code with version 0; otherwise it will return the existing code and version.

get_define(camel_attr: bool = True) Dict

Get object definition attributes to communicate to the Java gateway server.

Use the attribute self._DEFINE_ATTR to determine which attributes should be included when the object communicates with the Java gateway server.

get_define_custom(camel_attr: bool = True, custom_attr: set = None) Dict

Get object definition attribute by given attr set.

set_downstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as downstream to current task.

set_upstream(tasks: Task | Sequence[Task]) None

Set parameter tasks as upstream to current task.

DEFAULT_CONDITION_RESULT = {'failedNode': [''], 'successNode': ['']}
_DEFAULT_ATTR: Dict = {}
_DEFINE_ATTR: set = {'code', 'delay_time', 'description', 'environment_code', 'fail_retry_interval', 'fail_retry_times', 'flag', 'name', 'task_params', 'task_priority', 'task_type', 'timeout', 'timeout_flag', 'timeout_notify_strategy', 'version', 'worker_group'}
_KEY_ATTR: set = {'description', 'name'}
_downstream_task_codes: Set[int]
_task_custom_attr: set = {}
_task_default_attr = {'condition_result', 'dependence', 'local_params', 'resource_list', 'wait_start_timeout'}
_task_ignore_attr: set = {'condition_result', 'dependence'}
_task_relation: Set[TaskRelation]
_upstream_task_codes: Set[int]
property condition_result: Dict

Get attribute condition_result.

property environment_code: str

Convert environment name to code.

property process_definition: ProcessDefinition | None

Get attribute process_definition.

property resource_list: List

Get task define attribute resource_list.

property task_params: Dict

Override Task.task_params for the switch task.

The switch task has a special attribute switch; in most tasks this attribute is None and an empty dict {} is used as the default value. We do not use the class attribute _task_custom_attr here, to avoid this attribute being overridden.

property user_name: str | None

Return user name of process definition.

Constants

Constants for pydolphinscheduler.

class pydolphinscheduler.constants.DefaultTaskCodeNum[source]

Bases: str

Constants and default value for default task code number.

DEFAULT = 1
class pydolphinscheduler.constants.Delimiter[source]

Bases: str

Constants for delimiter.

BAR = '-'
COLON = ':'
DASH = '/'
DIRECTION = '->'
UNDERSCORE = '_'
class pydolphinscheduler.constants.JavaGatewayDefault[source]

Bases: str

Constants and default value for java gateway.

RESULT_DATA = 'data'
RESULT_MESSAGE_KEYWORD = 'msg'
RESULT_MESSAGE_SUCCESS = 'success'
RESULT_STATUS_KEYWORD = 'status'
RESULT_STATUS_SUCCESS = 'SUCCESS'
class pydolphinscheduler.constants.ResourceKey[source]

Bases: str

Constants for key of resource.

ID = 'id'
class pydolphinscheduler.constants.TaskFlag[source]

Bases: str

Constants for task flag.

NO = 'NO'
YES = 'YES'
class pydolphinscheduler.constants.TaskPriority[source]

Bases: str

Constants for task priority.

HIGH = 'HIGH'
HIGHEST = 'HIGHEST'
LOW = 'LOW'
LOWEST = 'LOWEST'
MEDIUM = 'MEDIUM'
class pydolphinscheduler.constants.TaskTimeoutFlag[source]

Bases: str

Constants for task timeout flag.

CLOSE = 'CLOSE'
class pydolphinscheduler.constants.TaskType[source]

Bases: str

Constants for task type; it also shows which task types are supported so far.

CONDITIONS = 'CONDITIONS'
DATAX = 'DATAX'
DEPENDENT = 'DEPENDENT'
DVC = 'DVC'
HTTP = 'HTTP'
MLFLOW = 'MLFLOW'
MR = 'MR'
OPENMLDB = 'OPENMLDB'
PROCEDURE = 'PROCEDURE'
PYTHON = 'PYTHON'
PYTORCH = 'PYTORCH'
SAGEMAKER = 'SAGEMAKER'
SHELL = 'SHELL'
SPARK = 'SPARK'
SQL = 'SQL'
SUB_PROCESS = 'SUB_PROCESS'
SWITCH = 'SWITCH'
class pydolphinscheduler.constants.Time[source]

Bases: str

Constants for date.

FMT_DASH_DATE = '%Y/%m/%d'
FMT_NO_COLON_TIME = '%H%M%S'
FMT_SHORT_DATE = '%Y%m%d'
FMT_STD_DATE = '%Y-%m-%d'
FMT_STD_TIME = '%H:%M:%S'
LEN_SHORT_DATE = 8
LEN_STD_DATE = 10

Exceptions

Exceptions for pydolphinscheduler.

exception pydolphinscheduler.exceptions.PyDSBaseException[source]

Bases: Exception

Base exception for pydolphinscheduler.

exception pydolphinscheduler.exceptions.PyDSConfException[source]

Bases: PyDSBaseException

Exception for pydolphinscheduler configuration error.

exception pydolphinscheduler.exceptions.PyDSJavaGatewayException[source]

Bases: PyDSBaseException

Exception for pydolphinscheduler Java gateway error.

exception pydolphinscheduler.exceptions.PyDSParamException[source]

Bases: PyDSBaseException

Exception for pydolphinscheduler parameter verify error.

exception pydolphinscheduler.exceptions.PyDSProcessDefinitionNotAssignException[source]

Bases: PyDSBaseException

Exception for pydolphinscheduler process definition not assigned error.

exception pydolphinscheduler.exceptions.PyDSTaskNoFoundException[source]

Bases: PyDSBaseException

Exception for pydolphinscheduler workflow task not found error.