SageMaker Node

Overview

Amazon SageMaker is a fully managed machine learning service. With Amazon SageMaker, data scientists and developers can quickly build and train machine learning models, and then deploy them into a production-ready hosted environment.

Amazon SageMaker Model Building Pipelines is a tool for building machine learning pipelines that take advantage of direct SageMaker integration.

For users who work with both big data and machine learning, the SageMaker task plugin helps connect big data workflows with SageMaker usage scenarios.

The DolphinScheduler SageMaker task plugin provides the following feature:

  • Start a SageMaker pipeline execution, then poll the execution status until the pipeline finishes.
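The start-then-poll behavior described above can be sketched in Python. This is only an illustration of the pattern, not the plugin's actual implementation: the function name `run_pipeline`, the polling interval, and the injected `client` and `sleep` arguments are assumptions made for the example. The `client` is expected to behave like `boto3.client("sagemaker")`, whose `start_pipeline_execution` and `describe_pipeline_execution` calls are used here.

```python
import time

# Terminal states reported by DescribePipelineExecution.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def run_pipeline(client, request, poll_seconds=10.0, sleep=time.sleep):
    """Start a pipeline execution and poll until it reaches a terminal state.

    `client` is assumed to behave like boto3.client("sagemaker").
    `request` holds the StartPipelineExecution parameters as a dict.
    Returns a (execution ARN, final status) tuple.
    """
    arn = client.start_pipeline_execution(**request)["PipelineExecutionArn"]
    while True:
        response = client.describe_pipeline_execution(PipelineExecutionArn=arn)
        status = response["PipelineExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return arn, status
        # Still "Executing" or "Stopping": wait before asking again.
        sleep(poll_seconds)
```

With boto3 installed and credentials configured, a call might look like `run_pipeline(boto3.client("sagemaker"), {"PipelineName": "example-pipeline"})`, where `example-pipeline` is a placeholder name.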

Create Task

  • Click Project Management -> Project Name -> Workflow Definition, and click the "Create Workflow" button to enter the DAG editing page.
  • Drag the SageMaker task node from the toolbar onto the canvas.

Task Example

Here are some specific parameters for the SageMaker plugin:

  • SagemakerRequestJson: Request parameters of StartPipelineExecution; see also the AWS API reference.
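As a rough illustration of what a SagemakerRequestJson value could contain, the sketch below builds a JSON string from fields defined by the StartPipelineExecution API (`PipelineName`, `PipelineExecutionDisplayName`, `PipelineParameters`). The helper name `build_request_json`, the pipeline name, and the parameter values are hypothetical, and which optional fields a given plugin version accepts is an assumption to check against your deployment.

```python
import json


def build_request_json(pipeline_name, parameters=None, display_name=None):
    """Return a JSON string holding StartPipelineExecution request parameters."""
    request = {"PipelineName": pipeline_name}
    if display_name:
        request["PipelineExecutionDisplayName"] = display_name
    if parameters:
        # The API expects a list of {"Name": ..., "Value": ...} objects
        # with string values.
        request["PipelineParameters"] = [
            {"Name": name, "Value": str(value)}
            for name, value in parameters.items()
        ]
    return json.dumps(request, indent=2)


# Placeholder pipeline name and parameter, for illustration only.
print(build_request_json(
    "example-pipeline",
    parameters={"InputDataUrl": "s3://example-bucket/data"},
))
```

The printed JSON can then be pasted into the SagemakerRequestJson field of the task.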

The task plugin is shown as follows:

sagemaker_pipeline

Environment Preparation

Some AWS configuration is required. Set the following fields in the file common.properties:

# The AWS access key. Required if resource.storage.type=S3 or when using an EMR task
resource.aws.access.key.id=<YOUR AWS ACCESS KEY>
# The AWS secret access key. Required if resource.storage.type=S3 or when using an EMR task
resource.aws.secret.access.key=<YOUR AWS SECRET KEY>
# The AWS Region to use. Required if resource.storage.type=S3 or when using an EMR task
resource.aws.region=<AWS REGION>
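
Before running a SageMaker task, it can help to confirm that the three fields above are actually filled in. The following is a minimal sketch of such a check, assuming a simplified `key=value` reading of common.properties (the full Java properties syntax, such as line continuations, is not handled). The helper names `parse_properties` and `missing_aws_keys` are hypothetical and not part of DolphinScheduler.

```python
REQUIRED_KEYS = (
    "resource.aws.access.key.id",
    "resource.aws.secret.access.key",
    "resource.aws.region",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_aws_keys(props):
    """Return required keys that are unset or still hold a <PLACEHOLDER>."""
    missing = []
    for key in REQUIRED_KEYS:
        value = props.get(key, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(key)
    return missing
```

For example, `missing_aws_keys(parse_properties(open("common.properties").read()))` returns an empty list once all three values have been set; the file path here depends on your installation.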