Map Reduce

A Map Reduce task type’s example and dive into information of PyDolphinScheduler.

Example

"""A example workflow for task mr."""

from pydolphinscheduler.core.engine import ProgramType
from pydolphinscheduler.core.process_definition import ProcessDefinition
from pydolphinscheduler.tasks.map_reduce import MR

with ProcessDefinition(name="task_map_reduce_example", tenant="tenant_exists") as pd:
    task = MR(
        name="task_mr",
        main_class="wordcount",
        main_package="hadoop-mapreduce-examples-3.3.1.jar",
        program_type=ProgramType.JAVA,
        main_args="/dolphinscheduler/tenant_exists/resources/file.txt /output/ds",
    )
    pd.run()

Dive Into

Task MR.

class pydolphinscheduler.tasks.map_reduce.MR(name: str, main_class: str, main_package: str, program_type: ProgramType | None = 'SCALA', app_name: str | None = None, main_args: str | None = None, others: str | None = None, *args, **kwargs)[source]

Bases: Engine

Task mr object, declare behavior for mr task to dolphinscheduler.

_downstream_task_codes: Set[int]

_task_custom_attr: set = {'app_name', 'main_args', 'others'}

_task_relation: Set[TaskRelation]

_upstream_task_codes: Set[int]

YAML file example

# Define the workflow
workflow:
  name: "MapReduce"

# Define the tasks under the workflow
tasks:
  - name: task
    task_type: MR
    main_class: wordcount
    main_package: test_java.jar
    program_type: SCALA
    main_args: /dolphinscheduler/tenant_exists/resources/file.txt /output/ds