Please refer to Quick Start
The home page contains task status statistics, process status statistics, and workflow definition statistics for all projects of the user.
Click "Project Management" to enter the project management page, click the "Create Project" button, enter the project name, project description, and click "Submit" to create a new project.
Click the project name link on the project management page to enter the project home page. As shown in the figure below, the project home page contains the task status statistics, process status statistics, and workflow definition statistics of the project.
Task status statistics: within the specified time range, count the number of task instances in each state: submitted successfully, running, ready to pause, paused, ready to stop, stopped, failed, succeeded, need fault tolerance, killed, and waiting for thread.
Process status statistics: within the specified time range, count the number of workflow instances in each state: submitted successfully, running, ready to pause, paused, ready to stop, stopped, failed, succeeded, need fault tolerance, killed, and waiting for thread.
Workflow definition statistics: Count the workflow definitions created by this user and the workflow definitions granted to this user by the administrator
Click Project Management -> Workflow -> Workflow Definition to enter the workflow definition page, and click the "Create Workflow" button to enter the workflow DAG edit page, as shown in the following figure:
Drag a Shell task from the toolbar onto the drawing board, as shown in the figure below:
Add parameter settings for this shell task: select the uploaded resource file test.sh under "Resources", and the command to call the resource in the script is `sh test.sh`.
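A minimal sketch of the "Script" field for this task (assuming test.sh has already been uploaded to the Resource Center and selected under "Resources"):

```shell
# the selected resource file test.sh is available in the task's working directory,
# so the script can call it directly:
sh test.sh
```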
Configure the order of task execution: click the icon in the upper right corner to connect the tasks. As shown in the figure below, task 2 and task 3 run in parallel: when task 1 finishes executing, tasks 2 and 3 are executed simultaneously.
Delete dependencies: click the "arrow" icon in the upper right corner, select the connection line, and click the "Delete" icon in the upper right corner to delete the dependency between the tasks.
For other types of tasks, please refer to Task Node Type and Parameter Settings.
Click Project Management -> Workflow -> Workflow Definition to enter the workflow definition page, as shown below:
Click Project Management -> Workflow -> Workflow Definition to enter the workflow definition page, as shown in the figure below, and click the "Go Online" button to bring the workflow online.
Click the "Run" button to pop up the startup parameter setting pop-up box, as shown in the figure below, set the startup parameters, click the "Run" button in the pop-up box, the workflow starts running, and the workflow instance page generates a workflow instance.
* Failure strategy: the strategy applied to other parallel task nodes when a task node fails to execute. "Continue" means that after a task fails, the other task nodes continue to execute normally; "End" means that all tasks being executed are terminated, and the entire process is terminated.
* Notification strategy: when the process ends, a process execution notification email is sent according to the process status; the options are: send none, send on success, send on failure, send on success or failure.
* Process priority: The priority of process operation, divided into five levels: highest (HIGHEST), high (HIGH), medium (MEDIUM), low (LOW), and lowest (LOWEST). When the number of master threads is insufficient, high-level processes will be executed first in the execution queue, and processes with the same priority will be executed in a first-in first-out order.
* Worker group: The process can only be executed in the specified worker machine group. The default is Default, which can be executed on any worker.
* Notification group: when the notification strategy is triggered, a timeout alarm occurs, or fault tolerance occurs, the process information or email is sent to all members of the notification group.
* Recipient: when the notification strategy is triggered, a timeout alarm occurs, or fault tolerance occurs, the process information or alarm email is sent to the recipient list.
* Cc: when the notification strategy is triggered, a timeout alarm occurs, or fault tolerance occurs, the process information or alarm email is copied to the Cc list.
* Startup parameter: Set or overwrite global parameter values when starting a new process instance.
* Complement: Two modes including serial complement and parallel complement. Serial complement: Within the specified time range, the complements are executed from the start date to the end date and N process instances are generated in turn; parallel complement: within the specified time range, multiple days are complemented at the same time to generate N process instances.
Serial mode: The complement is executed sequentially from May 1 to May 10, and ten process instances are generated on the process instance page;
Parallel mode: The tasks from May 1 to May 10 are executed simultaneously, and 10 process instances are generated on the process instance page.
Click Project Management -> Workflow -> Workflow Definition to enter the workflow definition page, and click the "Import Workflow" button to import a local workflow file; the workflow definition list displays the imported workflow, and its status is offline.
Click Project Management -> Workflow -> Workflow Instance to enter the Workflow Instance page, as shown in the figure below:
Kill the worker process, and then execute the `kill -9` operation.
Click Project Management -> Workflow -> Task Instance to enter the task instance page, as shown in the figure below. Click the name of a workflow instance to jump to the workflow instance DAG chart and view the task status.
View log: Click the "View Log" button in the operation column to view the log of the task execution.
conf/common/common.properties
# Users who have permission to create directories under the HDFS root path
hdfs.root.user=hdfs
# data base dir, resource file will store to this hadoop hdfs path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/escheduler" is recommended
data.store2hdfs.basepath=/dolphinscheduler
# resource upload startup type : HDFS,S3,NONE
res.upload.startup.type=HDFS
# whether kerberos starts
hadoop.security.authentication.startup.state=false
# java.security.krb5.conf path
java.security.krb5.conf.path=/opt/krb5.conf
# loginUserFromKeytab user
login.user.keytab.username=hdfs-mycluster@ESZ.COM
# loginUserFromKeytab path
login.user.keytab.path=/opt/hdfs.headless.keytab
conf/common/hadoop.properties
# ha or single namenode,If namenode ha needs to copy core-site.xml and hdfs-site.xml
# to the conf directory,support s3,for example : s3a://dolphinscheduler
fs.defaultFS=hdfs://mycluster:8020
#resourcemanager ha note this need ips , this empty if single
yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx
# If it is a single resourcemanager, you only need to configure one host name. If it is resourcemanager HA, the default configuration is fine
yarn.application.status.address=http://xxxx:8088/ws/v1/cluster/apps/%s
Resource management handles various resource files: you can create basic files such as txt/log/sh/conf/py/java, upload jar packages and other types of files, and edit, rename, download and delete them.
The file format supports the following types: txt, log, sh, conf, cfg, py, java, sql, xml, hql, properties
Upload file: Click the "Upload File" button to upload a file, or drag the file to the upload area; the file name is automatically filled in with the name of the uploaded file.
For the file types that can be viewed, click the file name to view the file details
Click the "Download" button in the file list to download the file or click the "Download" button in the upper right corner of the file details to download the file
File list -> Click the "Delete" button to delete the specified file
UDF resource management is similar to file management. The difference is that resource management stores uploaded UDF functions, while file management stores user programs, scripts and configuration files. Supported operations: rename, download, delete.
Same as uploading files.
Click "Create UDF Function", enter the udf function parameters, select the udf resource, and click "Submit" to create the udf function.
Currently, only temporary UDF functions for HIVE are supported.
Data source center supports MySQL, POSTGRESQL, HIVE/IMPALA, SPARK, CLICKHOUSE, ORACLE, SQLSERVER and other data sources
Click "Data Source Center -> Create Data Source" to create different types of data sources according to requirements.
Data source: select MYSQL
Data source name: enter the name of the data source
Description: Enter a description of the data source
IP/hostname: Enter the IP or hostname used to connect to MySQL
Port: Enter the port to connect to MySQL
Username: Set the username for connecting to MySQL
Password: Set the password for connecting to MySQL
Database name: Enter the name of the database connected to MySQL
Jdbc connection parameters: parameter settings for the MySQL connection, filled in as JSON
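For example, a typical value for the JDBC connection parameters might be the following (a sketch using common MySQL Connector/J options; adjust or omit as your environment requires):

```json
{"useUnicode":"true","characterEncoding":"UTF-8","useSSL":"false"}
```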
Click "Test Connection" to test whether the data source can be successfully connected.
1. Use HiveServer2 to connect
Data source: select HIVE
Data source name: enter the name of the data source
Description: Enter a description of the data source
IP/Host Name: Enter the IP connected to HIVE
Port: Enter the port connected to HIVE
Username: Set the username for connecting to HIVE
Password: Set the password for connecting to HIVE
Database name: Enter the name of the database connected to HIVE
Jdbc connection parameters: parameter settings for the HIVE connection, filled in as JSON
2. Use HiveServer2 HA ZooKeeper to connect
Note: If kerberos is enabled, the Principal needs to be filled in
* Only the administrator account has permission to operate the Security Center, which provides queue management, tenant management, user management, alarm group management, worker group management, token management and other functions. In the user management module, authorization for resources, data sources, projects, etc. can be granted.
* Administrator login, default user name and password: admin/dolphinscheduler123
Users are divided into administrator users and normal users
The administrator enters the Security Center -> User Management page and clicks the "Create User" button to create a user.
Edit user information
Modify user password
The administrator enters the Security Center -> Alarm Group Management page and clicks the "Create Alarm Group" button to create an alarm group.
Since the back-end interfaces require a login check, token management provides a way to perform operations on the system by calling the interfaces directly.
The administrator enters the Security Center -> Token Management page, clicks the "Create Token" button, selects the expiration time and user, clicks the "Generate Token" button, and clicks the "Submit" button, then the selected user's token is created successfully.
Token call example
```java
import java.util.ArrayList;
import java.util.List;

import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

/**
 * test token
 */
public void doPOSTParam() throws Exception {
    // create HttpClient
    CloseableHttpClient httpclient = HttpClients.createDefault();
    // create http post request; the token is passed in the "token" request header
    HttpPost httpPost = new HttpPost("http://127.0.0.1:12345/escheduler/projects/create");
    httpPost.setHeader("token", "123");
    // set form parameters
    List<NameValuePair> parameters = new ArrayList<NameValuePair>();
    parameters.add(new BasicNameValuePair("projectName", "qzw"));
    parameters.add(new BasicNameValuePair("desc", "qzw"));
    UrlEncodedFormEntity formEntity = new UrlEncodedFormEntity(parameters);
    httpPost.setEntity(formEntity);
    CloseableHttpResponse response = null;
    try {
        // execute the request
        response = httpclient.execute(httpPost);
        // response status code 200 indicates success
        if (response.getStatusLine().getStatusCode() == 200) {
            String content = EntityUtils.toString(response.getEntity(), "UTF-8");
            System.out.println(content);
        }
    } finally {
        if (response != null) {
            response.close();
        }
        httpclient.close();
    }
}
```
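For a quick test from the command line, an equivalent call can be made with curl, reusing the placeholder host, token and form parameters from the Java example above:

```shell
# the token is passed in the "token" header; the body is a URL-encoded form
curl -X POST \
     -H "token: 123" \
     --data "projectName=qzw&desc=qzw" \
     http://127.0.0.1:12345/escheduler/projects/create
```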
* Granted permissions include project permissions, resource permissions, data source permissions, UDF function permissions.
* The administrator can authorize the projects, resources, data sources and UDF functions not created by ordinary users. Because the authorization methods for projects, resources, data sources and UDF functions are the same, we take project authorization as an example.
* Note: For projects created by a user, that user already has all permissions, so these projects are not shown in the project list or the selected project list.
Each worker node will belong to its own worker group, and the default group is "default".
When the task is executed, the task can be assigned to the specified worker group, and the task will be executed by the worker node in the group.
Add/Update worker group
Example:
worker.groups=default,test
Shell node: when the worker executes it, a temporary shell script is generated and executed by the Linux user with the same name as the tenant.
Click Project Management-Project Name-Workflow Definition, and click the "Create Workflow" button to enter the DAG editing page.
Drag the SHELL task node from the toolbar to the drawing board, as shown in the figure below:
Node name: The node name in a workflow definition is unique.
Run flag: Identifies whether this node can be scheduled normally, if it does not need to be executed, you can turn on the prohibition switch.
Descriptive information: describe the function of the node.
Task priority: when the number of worker threads is insufficient, tasks are executed in order of priority from high to low; tasks with the same priority are executed on a first-in, first-out basis.
Worker grouping: Tasks are assigned to the machines of the worker group to execute. If Default is selected, a worker machine will be randomly selected for execution.
Number of failed retries: the number of times a failed task is resubmitted; it can be selected from the drop-down list or filled in manually.
Failed retry interval: the time interval before a failed task is resubmitted; it can be selected from the drop-down list or filled in manually.
Timeout alarm: Check the timeout alarm and timeout failure. When the task exceeds the "timeout period", an alarm email will be sent and the task execution will fail.
Script: SHELL program developed by users.
Resource: the list of resource files that the script needs to call, i.e. files uploaded or created in Resource Center -> File Management.
User-defined parameters: local parameters of the SHELL task; they replace the ${variable} placeholders in the script.
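As a minimal sketch, assume a user-defined parameter named output_dir with the value /tmp/out has been added to the task; the Script field can then reference it:

```shell
# ${output_dir} is replaced with the user-defined parameter value (/tmp/out in this sketch)
mkdir -p ${output_dir}
echo "task finished, results in ${output_dir}"
```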
Drag the task node from the toolbar to the drawing board, as shown in the following figure:
Drag the task node from the toolbar to the drawing board, as shown in the following figure:
The dependent node provides a logical judgment function, such as checking whether the B process was successful yesterday, or whether the C process was executed successfully.
For example, process A is a weekly report task, processes B and C are daily tasks, and task A requires tasks B and C to be successfully executed every day of the last week, as shown in the figure:
If the weekly report A also needs to be executed successfully last Tuesday:
Drag the task node from the toolbar to the drawing board, as shown in the following figure:
Tasks are submitted using the `spark-submit` method. Drag the task node from the toolbar to the drawing board, as shown in the following figure:
Note: JAVA and Scala are only used for identification; there is no difference between them. If the Spark job is developed in Python, there is no main function class, and the other settings are the same.
Tasks are submitted using the `hadoop jar` method. Drag the task node from the toolbar to the drawing board, as shown in the following figure:
Tasks are submitted using the `python` command. Drag the task node from the toolbar to the drawing board, as shown in the following figure:
Note: JAVA and Scala are only used for identification; there is no difference between them. If the Flink job is developed in Python, there is no main function class, and the other settings are the same.
Drag the task node from the toolbar to the drawing board.
Custom template: when the custom template switch is turned on, you can customize the content of the json configuration file of the DataX node (applicable when the form configuration does not meet the requirements); a sketch is shown after this parameter list.
Data source: select the data source to extract the data
sql statement: the sql statement used to extract the data; the column names of the sql query are automatically parsed when the node is executed and mapped to the column names of the target table for synchronization. When the source table and target table column names are inconsistent, they can be converted by column aliases (as)
Target library: select the target library for data synchronization
Target table: the name of the target table for data synchronization
Pre-sql: Pre-sql is executed before the sql statement (executed by the target library).
Post-sql: Post-sql is executed after the sql statement (executed by the target library).
json: json configuration file for datax synchronization
Custom parameters: the custom parameter types and data types are the same as for the stored procedure task type. The difference is that custom parameters of the SQL task type replace the ${variable} placeholders in the SQL statement, while the stored procedure uses the order of the custom parameters to set values for the method.
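When the custom template switch is turned on, the json field holds a standard DataX job description. A minimal sketch for a MySQL-to-MySQL synchronization (the connection details, table and column names below are placeholders) might look like:

```json
{
  "job": {
    "setting": { "speed": { "channel": 1 } },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "src_user",
            "password": "src_pwd",
            "connection": [
              {
                "querySql": ["select id, name from src_table"],
                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/src_db"]
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "dst_user",
            "password": "dst_pwd",
            "column": ["id", "name"],
            "connection": [
              {
                "table": ["dst_table"],
                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/dst_db"
              }
            ]
          }
        }
      }
    ]
  }
}
```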
| variable | meaning |
| --- | --- |
| ${system.biz.date} | The day before the schedule time of the daily scheduling instance, in yyyyMMdd format; when data is complemented, the date is +1 |
| ${system.biz.curdate} | The schedule time of the daily scheduling instance, in yyyyMMdd format; when data is complemented, the date is +1 |
| ${system.datetime} | The schedule time of the daily scheduling instance, in yyyyMMddHHmmss format; when data is complemented, the date is +1 |
Custom variable names are supported in the code; they are declared as ${variable name} and can reference the "system parameters" above or specify "constants".
We define this benchmark variable in the $[...] format: $[yyyyMMddHHmmss] can be decomposed and combined arbitrarily, for example $[yyyyMMdd], $[HHmmss], $[yyyy-MM-dd], etc.
The following formats can also be used:
* Next N years: $[add_months(yyyyMMdd,12*N)]
* N years before: $[add_months(yyyyMMdd,-12*N)]
* Next N months: $[add_months(yyyyMMdd,N)]
* N months before: $[add_months(yyyyMMdd,-N)]
* Next N weeks: $[yyyyMMdd+7*N]
* N weeks before: $[yyyyMMdd-7*N]
* Next N days: $[yyyyMMdd+N]
* N days before: $[yyyyMMdd-N]
* Next N hours: $[HHmmss+N/24]
* N hours before: $[HHmmss-N/24]
* Next N minutes: $[HHmmss+N/24/60]
* N minutes before: $[HHmmss-N/24/60]
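As a minimal usage sketch, assume a user-defined parameter named dt is added to a shell task with the value $[yyyyMMdd-1]; the script can then reference it:

```shell
# ${dt} is replaced with the value of $[yyyyMMdd-1], i.e. yesterday's date in yyyyMMdd format
echo "processing data for ${dt}"
```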