Tips: DolphinScheduler itself does not depend on Hadoop, Hive, or Spark; it only uses their clients to run the corresponding tasks.
# Create the deployment directory. Do not choose a high-privilege directory such as /root or /home.
mkdir -p /opt/dolphinscheduler;
cd /opt/dolphinscheduler;
# uncompress
tar -zxvf apache-dolphinscheduler-1.3.9-bin.tar.gz -C /opt/dolphinscheduler;
mv apache-dolphinscheduler-1.3.9-bin dolphinscheduler-bin
# To create a user, you need to log in as root and set the deployment user name. Please modify it yourself. The following uses dolphinscheduler as an example.
useradd dolphinscheduler;
# Set the user password, please modify it yourself. The following takes dolphinscheduler123 as an example.
echo "dolphinscheduler123" | passwd --stdin dolphinscheduler
# Configure sudo passwordless
echo 'dolphinscheduler ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers
sed -i 's/^Defaults[[:space:]]*requiretty/#&/' /etc/sudoers
Notes:
- Because the task execution service switches between different Linux users via `sudo -u {linux-user}` to run jobs in multi-tenant mode, the deployment user needs passwordless sudo permission. First-time learners who don't understand this can ignore it for now.
- If you find a `Defaults requiretty` line in the `/etc/sudoers` file, comment it out as well.
- If you need to use resource upload, grant the deployment user permission to operate on the local file system, HDFS, or MinIO.
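The sudoers entry above can be sanity-checked without touching the live /etc/sudoers. The helper below is a sketch: the `sudoers_ok` function name and the temp-file demo are illustrative, not part of DolphinScheduler.

```shell
# Minimal sketch: check that a sudoers-style file contains the passwordless
# rule for the dolphinscheduler user.
sudoers_ok() {
  # $1: file to check; returns 0 if the NOPASSWD rule is present
  grep -Eq '^dolphinscheduler[[:space:]]+ALL=\(ALL\)[[:space:]]+NOPASSWD:[[:space:]]+ALL$' "$1"
}

# demo against a temporary file standing in for /etc/sudoers
tmp=$(mktemp)
printf 'dolphinscheduler ALL=(ALL) NOPASSWD: ALL\n' > "$tmp"
if sudoers_ok "$tmp"; then
  echo "sudoers rule OK"
else
  echo "sudoers rule missing" >&2
fi
rm -f "$tmp"
```

On a real machine you would run `sudoers_ok /etc/sudoers` as root, or simply run `sudo -n true` as the deployment user: it fails immediately instead of prompting when NOPASSWD is missing.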
Use the first machine (hostname ds1) as the deployment machine. Configure the hosts of all machines to be deployed in /etc/hosts on ds1, and log in as root on ds1.
vi /etc/hosts
#add ip hostname
192.168.xxx.xxx ds1
192.168.xxx.xxx ds2
192.168.xxx.xxx ds3
192.168.xxx.xxx ds4
Note: Please delete or comment out the line containing 127.0.0.1
Sync /etc/hosts on ds1 to all deployment machines
for ip in ds2 ds3; # Please replace ds2 ds3 here with the hostname of machines you want to deploy
do
sudo scp -r /etc/hosts $ip:/etc/ # Need to enter root password during operation
done
Note: you can use `sshpass -p xxx sudo scp -r /etc/hosts $ip:/etc/` to avoid typing the password.
Install sshpass on CentOS:
Install epel
yum install -y epel-release
yum repolist
After installing epel, you can install sshpass
yum install -y sshpass
On ds1, switch to the deployment user and configure ssh passwordless login
su dolphinscheduler;
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Note: If the configuration succeeds, the dolphinscheduler user will not be prompted for a password when running `ssh localhost`
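Beyond trying `ssh localhost` by hand, you can check that the public key actually landed in authorized_keys and that the file has the mode 600 that sshd requires. The helper below is a sketch; the `key_installed` function and its temp-file demo are illustrative:

```shell
# Sketch: verify a public key is listed in an authorized_keys file and that
# the file is mode 600 (sshd rejects overly permissive key files).
key_installed() {
  # $1: public key file, $2: authorized_keys file
  pub=$(cat "$1") || return 1
  grep -qF "$pub" "$2" || return 1
  perms=$(stat -c '%a' "$2" 2>/dev/null || stat -f '%Lp' "$2")
  [ "$perms" = "600" ]
}

# demo with temporary files standing in for ~/.ssh/id_rsa.pub and
# ~/.ssh/authorized_keys
pubfile=$(mktemp); authfile=$(mktemp)
echo "ssh-rsa DUMMYKEY dolphinscheduler@ds1" > "$pubfile"
cat "$pubfile" >> "$authfile"
chmod 600 "$authfile"
if key_installed "$pubfile" "$authfile"; then
  echo "key installed"
fi
rm -f "$pubfile" "$authfile"
```

On ds1 you would run `key_installed ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys`, or more directly `ssh -o BatchMode=yes localhost true`, which fails instead of prompting when passwordless login is not set up.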
On ds1, configure the deployment user dolphinscheduler to ssh to the other machines to be deployed.
su dolphinscheduler;
for ip in ds2 ds3; # Please replace ds2 ds3 here with the hostname of the machine you want to deploy.
do
ssh-copy-id $ip # You need to manually enter the password of the dolphinscheduler user during the operation.
done
# You can use `sshpass -p xxx ssh-copy-id $ip` to avoid typing the password.
On ds1, modify the directory permissions so that the deployment user has operation permissions on the dolphinscheduler-bin directory.
sudo chown -R dolphinscheduler:dolphinscheduler dolphinscheduler-bin
mysql -h192.168.xx.xx -P3306 -uroot -p
mysql> CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'localhost' IDENTIFIED BY '{password}';
mysql> flush privileges;
Create tables and import basic data
vi conf/datasource.properties
#postgre
#spring.datasource.driver-class-name=org.postgresql.Driver
#spring.datasource.url=jdbc:postgresql://localhost:5432/dolphinscheduler
# mysql
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
# Replace xxx below with the correct IP address
spring.datasource.url=jdbc:mysql://xxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&allowMultiQueries=true
# Replace xxx with the correct {user} value
spring.datasource.username=xxx
# Replace xxx with the correct {password} value
spring.datasource.password=xxx
sh script/create-dolphinscheduler.sh
Note: If executing the above script reports a "/bin/java: No such file or directory" error, configure the JAVA_HOME and PATH variables in /etc/profile
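For reference, a typical /etc/profile addition looks like the following (the /opt/soft/java path matches the example used later in this guide; adjust it to your actual JDK location):

```shell
# append to /etc/profile, then run `source /etc/profile`
export JAVA_HOME=/opt/soft/java
export PATH=$JAVA_HOME/bin:$PATH
```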
Modify the environment variables in the dolphinscheduler_env.sh file in the conf/env directory (taking software installed under /opt/soft as an example)
export HADOOP_HOME=/opt/soft/hadoop
export HADOOP_CONF_DIR=/opt/soft/hadoop/etc/hadoop
#export SPARK_HOME1=/opt/soft/spark1
export SPARK_HOME2=/opt/soft/spark2
export PYTHON_HOME=/opt/soft/python
export JAVA_HOME=/opt/soft/java
export HIVE_HOME=/opt/soft/hive
export FLINK_HOME=/opt/soft/flink
export DATAX_HOME=/opt/soft/datax/bin/datax.py
export PATH=$HADOOP_HOME/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME:$PATH
`Note: This step is very important; JAVA_HOME and PATH must be configured. Variables that are not used can be commented out or ignored.`
Create a soft link from the JDK to /usr/bin/java (still taking JAVA_HOME=/opt/soft/java as an example)
sudo ln -s /opt/soft/java/bin/java /usr/bin/java
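To confirm a link like this resolves correctly, a small check helps. The `link_ok` helper below is an illustrative sketch, demonstrated against temporary files rather than the real /usr/bin/java:

```shell
# Sketch: check that a path is a symlink resolving to an executable file.
link_ok() {
  [ -L "$1" ] && [ -x "$(readlink -f "$1")" ]
}

# demo with a temp dir standing in for /opt/soft/java/bin and /usr/bin
dir=$(mktemp -d)
printf '#!/bin/sh\necho fake-java\n' > "$dir/java"
chmod +x "$dir/java"
ln -s "$dir/java" "$dir/java-link"
if link_ok "$dir/java-link"; then
  echo "link OK"
fi
rm -rf "$dir"
```

On the deployment machine, `link_ok /usr/bin/java && java -version` verifies both the link and the JDK itself.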
Modify the parameters in the one-click deployment config file conf/config/install_config.conf, paying special attention to the following parameters.
# choose mysql or postgresql
dbtype="mysql"
# Database connection address and port
dbhost="192.168.xx.xx:3306"
# database name
dbname="dolphinscheduler"
# database username
username="xxx"
# database password
# NOTICE: if there are special characters, use \ to escape them; for example, `[` escapes to `\[`
password="xxx"
#Zookeeper cluster
zkQuorum="192.168.xx.xx:2181,192.168.xx.xx:2181,192.168.xx.xx:2181"
# Note: the target installation path for dolphinscheduler; do not configure it to be the same as the current path (pwd)
installPath="/opt/soft/dolphinscheduler"
# deployment user
# Note: the deployment user needs sudo privileges and permission to operate HDFS. If HDFS is enabled, create the root directory yourself.
deployUser="dolphinscheduler"
# alert config,take QQ email for example
# mail protocol
mailProtocol="SMTP"
# mail server host
mailServerHost="smtp.qq.com"
# mail server port
# note: Different protocols and encryption methods correspond to different ports, when SSL/TLS is enabled, make sure the port is correct.
mailServerPort="25"
# mail sender
mailSender="xxx@qq.com"
# mail user
mailUser="xxx@qq.com"
# mail sender password
# note: mailPassword is the email service authorization code, not the email login password.
mailPassword="xxx"
# Whether TLS mail protocol is supported, true is supported and false is not supported
starttlsEnable="true"
# Whether the SSL mail protocol is supported, true is supported and false is not supported.
# note: only one of TLS and SSL can be in the true state.
sslEnable="false"
# note: sslTrust is the same as mailServerHost
sslTrust="smtp.qq.com"
# resource storage type:HDFS,S3,NONE
resourceStorageType="HDFS"
# If resourceStorageType = HDFS and your Hadoop cluster NameNode has HA enabled, put core-site.xml and hdfs-site.xml in the installPath/conf directory (in this example /opt/soft/dolphinscheduler/conf) and configure defaultFS with the NameNode cluster name; if the NameNode is not HA, set it to a specific IP or hostname.
# If S3, write the S3 address, for example: s3a://dolphinscheduler
# Note: for S3, be sure to create the root directory /dolphinscheduler
defaultFS="hdfs://mycluster:8020"
# If you do not use the Hadoop ResourceManager, keep the default value; if ResourceManager HA is enabled, enter the HA IPs; if ResourceManager is single-node, leave this value empty
yarnHaIps="192.168.xx.xx,192.168.xx.xx"
# If ResourceManager HA is enabled or you do not use ResourceManager, skip this setting; if ResourceManager is single-node, replace yarnIp1 with the actual ResourceManager hostname.
singleYarnIp="yarnIp1"
# Resource storage path on HDFS/S3. Resource files will be stored under this path; make sure the directory exists on HDFS and has read-write permissions. /dolphinscheduler is recommended
resourceUploadPath="/dolphinscheduler"
# who have permissions to create directory under HDFS/S3 root path
# Note: if kerberos is enabled, please config hdfsRootUser=
hdfsRootUser="hdfs"
# install hosts
# Note: hostname list of the machines to install on. For a pseudo-distributed setup, just write one hostname
ips="ds1,ds2,ds3,ds4"
# ssh port, default 22
# Note: if ssh port is not default, modify here
sshPort="22"
# run master machine
# Note: list of hosts hostname for deploying master
masters="ds1,ds2"
# run worker machine
# note: need to write the worker group name of each worker, the default value is "default"
workers="ds3:default,ds4:default"
# run alert machine
# note: list of machine hostnames for deploying alert server
alertServer="ds2"
# run api machine
# note: list of machine hostnames for deploying api server
apiServers="ds1"
Switch to the deployment user and execute the one-click deployment script
sh install.sh
Note:
For the first deployment, the following message may appear during step `3, stop server`; it can be ignored.
sh: bin/dolphinscheduler-daemon.sh: No such file or directory
After the script completes, the following 5 services will be started. Use the jps command (included with the JDK) to check whether the services are running:
MasterServer ----- master service
WorkerServer ----- worker service
LoggerServer ----- logger service
ApiApplicationServer ----- api service
AlertServer ----- alert service
If the above services are started normally, the automatic deployment is successful.
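The five-service check can be scripted. The helper below is a sketch that parses jps-style output; the `missing_services` function and the canned demo input are illustrative, not part of the distribution:

```shell
# Sketch: given `jps` output, report any of the five DolphinScheduler
# services that are not running.
missing_services() {
  # $1: text of `jps` output
  for svc in MasterServer WorkerServer LoggerServer ApiApplicationServer AlertServer; do
    case "$1" in
      *"$svc"*) ;;          # found in the jps output
      *) echo "$svc" ;;     # missing
    esac
  done
}

# demo against canned output; on a real node use: missing_services "$(jps)"
out="12345 MasterServer
12346 WorkerServer
12347 LoggerServer
12348 ApiApplicationServer
12349 AlertServer"
if [ -z "$(missing_services "$out")" ]; then
  echo "all services running"
fi
```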
After the deployment is successful, you can view the logs. The logs are stored in the logs folder.
logs/
├── dolphinscheduler-alert-server.log
├── dolphinscheduler-master-server.log
├── dolphinscheduler-worker-server.log
├── dolphinscheduler-api-server.log
└── dolphinscheduler-logger-server.log
Access the front-end page at http://localhost:12345/dolphinscheduler, replacing localhost with the actual interface IP.
Stop all services
sh ./bin/stop-all.sh
Start all services
sh ./bin/start-all.sh
Start and stop individual services (master, worker, api, logger, alert)
sh ./bin/dolphinscheduler-daemon.sh start master-server
sh ./bin/dolphinscheduler-daemon.sh stop master-server
sh ./bin/dolphinscheduler-daemon.sh start worker-server
sh ./bin/dolphinscheduler-daemon.sh stop worker-server
sh ./bin/dolphinscheduler-daemon.sh start api-server
sh ./bin/dolphinscheduler-daemon.sh stop api-server
sh ./bin/dolphinscheduler-daemon.sh start logger-server
sh ./bin/dolphinscheduler-daemon.sh stop logger-server
sh ./bin/dolphinscheduler-daemon.sh start alert-server
sh ./bin/dolphinscheduler-daemon.sh stop alert-server
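The per-service commands above all follow one pattern, so they can be generated in a loop. The sketch below only echoes the commands as a dry run; drop the `echo` to execute them against a real installation:

```shell
# Sketch: stop every DolphinScheduler service via the daemon script.
# `echo` keeps this a dry run; remove it to actually stop the services.
for svc in master-server worker-server api-server logger-server alert-server; do
  echo sh ./bin/dolphinscheduler-daemon.sh stop "$svc"
done
```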
Note: Please refer to the "Architecture Design" section for how each service is used.