Map Reduce
A Map Reduce task type’s example and dive into information of PyDolphinScheduler.
Example
"""A example workflow for task mr."""
from pydolphinscheduler.core.engine import ProgramType
from pydolphinscheduler.core.process_definition import ProcessDefinition
from pydolphinscheduler.tasks.map_reduce import MR
with ProcessDefinition(name="task_map_reduce_example", tenant="tenant_exists") as pd:
task = MR(
name="task_mr",
main_class="wordcount",
main_package="hadoop-mapreduce-examples-3.3.1.jar",
program_type=ProgramType.JAVA,
main_args="/dolphinscheduler/tenant_exists/resources/file.txt /output/ds",
)
pd.run()
Dive Into
Task MR.
- class pydolphinscheduler.tasks.map_reduce.MR(name: str, main_class: str, main_package: str, program_type: ProgramType | None = 'SCALA', app_name: str | None = None, main_args: str | None = None, others: str | None = None, *args, **kwargs)[source]
Bases:
Engine
Task mr object, declare behavior for mr task to dolphinscheduler.
- _downstream_task_codes: Set[int]
- _task_custom_attr: set = {'app_name', 'main_args', 'others'}
- _task_relation: Set[TaskRelation]
- _upstream_task_codes: Set[int]
YAML file example
# Define the workflow
workflow:
name: "MapReduce"
# Define the tasks under the workflow
tasks:
- name: task
task_type: MR
main_class: wordcount
main_package: test_java.jar
program_type: SCALA
main_args: /dolphinscheduler/tenant_exists/resources/file.txt /output/ds