Azkaban Installation and Deployment



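Azkaban is a batch workflow job scheduler open-sourced by LinkedIn. It runs a set of jobs and processes in a specific order within a workflow. Azkaban defines a key-value file format for declaring dependencies between jobs, and provides an easy-to-use web UI for maintaining and tracking your workflows.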

1. Introduction

Official site: https://azkaban.github.io/

Download: https://github.com/azkaban/azkaban/releases

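1.1 Architecture

1.2 Roles of the three key components

  • Relational Database: stores metadata such as project names, project descriptions, project permissions, job status, and SLA rules.
  • AzkabanWebServer: project management, authorization, job scheduling, and executor monitoring.
  • AzkabanExecutorServer: the server that executes job flows.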

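1.3 Three deployment modes

1) solo-server mode

The database is an embedded H2 instance, and the Web Server and Executor Server run in the same process. This mode includes every Azkaban feature, but it is generally used only for learning and testing.

2) two-server mode

The database is MySQL, which supports a master-slave setup. The Web Server and Executor Server run in separate processes.

3) Distributed multiple-executor mode

The database is MySQL, which supports a master-slave setup. The Web Server and Executor Server run on different machines, and there are multiple Executor Servers.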

2. Download and installation

2.1 Build environment

# JDK 1.8 must already be installed; git is also needed
yum install -y git
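
Before building, it can help to confirm which JDK the shell actually picks up. The following is a minimal sketch, not part of the original procedure: the helper name `java_major_from_string` is hypothetical, and it assumes `java -version` prints a quoted version string such as "1.8.0_231".

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce a Java version string to its major version.
# "1.8.0_231" -> 8 (legacy 1.x scheme), "11.0.2" -> 11.
java_major_from_string() {
  local v="$1"
  v="${v#1.}"            # drop the legacy "1." prefix if present
  echo "${v%%[._-]*}"    # keep everything before the first '.', '_' or '-'
}

# Only query the JDK when a java binary is actually on the PATH.
if command -v java >/dev/null 2>&1; then
  ver="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
  echo "Detected Java major version: $(java_major_from_string "$ver")"
else
  echo "java not found on PATH; install JDK 1.8 before building"
fi
```

A result of 8 corresponds to the JDK 1.8 requirement stated above.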

2.2 Download and build from source

wget https://github.com/azkaban/azkaban/archive/3.38.0.tar.gz -O azkaban-3.38.0.tar.gz
tar -zxf azkaban-3.38.0.tar.gz

cd azkaban-3.38.0
./gradlew clean build -x test
# ./gradlew build -x test

2.3 Build artifacts

# Package path for solo-server mode
[root@localhost azkaban-3.38.0]# ll azkaban-solo-server/build/distributions/
-rw-r--r--. 1 root root 24166624 Nov 21 15:27 azkaban-solo-server-0.1.0-SNAPSHOT.tar.gz
-rw-r--r--. 1 root root 24303950 Nov 21 15:27 azkaban-solo-server-0.1.0-SNAPSHOT.zip
# web-server package path for two-server and multiple-executor modes
[root@localhost azkaban-3.38.0]# ll azkaban-web-server/build/distributions/
-rw-r--r--. 1 root root 20120241 Nov 21 15:21 azkaban-web-server-0.1.0-SNAPSHOT.tar.gz
-rw-r--r--. 1 root root 20246140 Nov 21 15:21 azkaban-web-server-0.1.0-SNAPSHOT.zip
# exec-server package path for two-server and multiple-executor modes
[root@localhost azkaban-3.38.0]# ll azkaban-exec-server/build/distributions/
-rw-r--r--. 1 root root 16055070 Nov 21 15:27 azkaban-exec-server-0.1.0-SNAPSHOT.tar.gz
-rw-r--r--. 1 root root 16060378 Nov 21 15:27 azkaban-exec-server-0.1.0-SNAPSHOT.zip
# Table-creation SQL
[root@localhost azkaban-3.38.0]# ll azkaban-db/build/sql/
-rw-r--r--. 1 root root 13887 Nov 21 15:13 create-all-sql-0.1.0-SNAPSHOT.sql

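3. Setting up a multiple-executor cluster

1. The WebServer manages jobs, which are stored in MySQL.

2. MySQL stores the jobs.

3. The ExecutorServer reads jobs from MySQL and executes them; it must be deployed on the machines that run the scheduled jobs.

If the Azkaban WebServer goes down, jobs that have already been submitted still run; what is lost is the ability to view, manage, and track jobs through the web UI.
For this architecture, the main concerns are therefore MySQL HA and ExecutorServer HA. ExecutorServer HA is supported officially, so only MySQL HA needs to be set up.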

mkdir -p /data/azkaban
tar -zxf azkaban-db/build/distributions/azkaban-db-0.1.0-SNAPSHOT.tar.gz -C /data/azkaban
tar -zxf azkaban-web-server/build/distributions/azkaban-web-server-0.1.0-SNAPSHOT.tar.gz -C /data/azkaban
tar -zxf azkaban-exec-server/build/distributions/azkaban-exec-server-0.1.0-SNAPSHOT.tar.gz -C /data/azkaban
tar -zxf azkaban-solo-server/build/distributions/azkaban-solo-server-0.1.0-SNAPSHOT.tar.gz -C /data/azkaban
# Result
[root@localhost azkaban]# pwd
/data/azkaban
[root@localhost azkaban]# ll
total 4
drwxr-xr-x. 2 root root 4096 Nov 21 14:50 azkaban-db-0.1.0-SNAPSHOT
drwxr-xr-x. 6 root root 55 Nov 21 15:27 azkaban-exec-server-0.1.0-SNAPSHOT
drwxr-xr-x. 8 root root 77 Nov 21 15:27 azkaban-solo-server-0.1.0-SNAPSHOT
drwxr-xr-x. 6 root root 51 Nov 21 15:21 azkaban-web-server-0.1.0-SNAPSHOT

mv azkaban-web-server-0.1.0-SNAPSHOT/ webserver
mv azkaban-exec-server-0.1.0-SNAPSHOT exec-server
mv azkaban-solo-server-0.1.0-SNAPSHOT soloserver

[root@localhost azkaban]# ll
drwxr-xr-x. 2 root root 4096 Nov 21 14:50 azkaban-db-0.1.0-SNAPSHOT
drwxr-xr-x. 10 root root 202 Nov 22 10:20 exec-server
drwxr-xr-x. 8 root root 77 Nov 21 15:27 soloserver
drwxr-xr-x. 9 root root 196 Nov 22 10:07 webserver
[root@localhost azkaban]#
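
The later steps rely on the renamed directories, so a quick check of the layout can catch a missed `mv`. This is an illustrative sketch only: `check_layout` is a hypothetical helper, and the demo runs against a throwaway directory rather than the real /data/azkaban.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which expected directories are missing under a base path.
# Directory names follow the renames performed above.
check_layout() {
  local base="$1" missing=0 d
  for d in azkaban-db-0.1.0-SNAPSHOT webserver exec-server soloserver; do
    if [ ! -d "$base/$d" ]; then
      echo "missing: $base/$d"
      missing=1
    fi
  done
  return "$missing"
}

# Example run against a throwaway directory that lacks two of the four entries.
demo="$(mktemp -d)"
mkdir -p "$demo/webserver" "$demo/exec-server"
check_layout "$demo" || echo "layout incomplete"
rm -rf "$demo"
```

On the real host the call would be `check_layout /data/azkaban`, which prints nothing and returns 0 when all four directories exist.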

3.1 Create the Azkaban database

mysqladmin create db_azkaban -h 172.18.1.51 -P 3306 -uroot -p123456
mysql -h 172.18.1.51 -P 3306 -uroot -p123456 db_azkaban < /data/azkaban/azkaban-db-0.1.0-SNAPSHOT/create-all-sql-0.1.0-SNAPSHOT.sql
# Error: ERROR 1071 (42000) at line 64: Specified key was too long; max key length is 767 bytes
# Changing the database character set to latin1 resolves it
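
The 767-byte figure is an index key length limit in older InnoDB configurations. With a multi-byte character set such as utf8, an indexed VARCHAR column can exceed it, whereas latin1 uses one byte per character. One way to apply the workaround is to create the database with latin1 up front instead of using `mysqladmin create`. The sketch below assumes that approach; `create_db_sql` is a hypothetical helper that only prints the statement so it can be reviewed before being piped to `mysql`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a CREATE DATABASE statement with an explicit character set.
create_db_sql() {
  local db="$1" charset="$2"
  printf 'CREATE DATABASE IF NOT EXISTS `%s` DEFAULT CHARACTER SET %s;\n' "$db" "$charset"
}

# Print the statement for the database used in this guide.
create_db_sql db_azkaban latin1

# To apply it (connection details as in the commands above):
#   create_db_sql db_azkaban latin1 | mysql -h 172.18.1.51 -P 3306 -uroot -p
```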

3.2 Generate the SSL keystore

[root@localhost azkaban]# cd ~
[root@localhost ~]# keytool -keystore keystore -alias jetty -genkey -keyalg RSA
Enter keystore password: 123456
Re-enter new password: 123456
What is your first and last name?
[Unknown]: (press Enter)
What is the name of your organizational unit?
[Unknown]: (press Enter)
What is the name of your organization?
[Unknown]: (press Enter)
What is the name of your City or Locality?
[Unknown]: (press Enter)
What is the name of your State or Province?
[Unknown]: (press Enter)
What is the two-letter country code for this unit?
[Unknown]: (press Enter)
Is CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown correct?
[no]: yes
Enter key password for <jetty>
(RETURN if same as keystore password): 123456
Re-enter new password: 123456
[root@localhost ~]#
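
The same keystore can probably be produced without the interactive prompts by passing the answers as `keytool` options. This is a sketch under that assumption, not the procedure used above: `generate_keystore` and the `KEYTOOL` override are hypothetical names, and the distinguished name mirrors the all-Unknown answers from the session.

```shell
#!/usr/bin/env bash
# KEYTOOL can be overridden (for example with "echo") to preview the command.
KEYTOOL="${KEYTOOL:-keytool}"

# Hypothetical wrapper: $1 = keystore file, $2 = password used for both store and key.
generate_keystore() {
  "$KEYTOOL" -genkeypair -keystore "$1" -alias jetty -keyalg RSA \
    -storepass "$2" -keypass "$2" \
    -dname "CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown"
}

# Dry run in a subshell: print the arguments instead of creating a keystore.
( KEYTOOL=echo; generate_keystore keystore 123456 )

# To create the keystore for real:
#   generate_keystore keystore 123456
```

Passing passwords on the command line exposes them in the process list and shell history, so the interactive form remains the safer choice on shared machines.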

3.3 Configure the ExecutorServer

vi /data/azkaban/exec-server/conf/azkaban.properties

# Azkaban Personalization Settings
azkaban.name=AzkabanExecutor
azkaban.label=AzkabanExecutor
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=web/
default.timezone.id=Asia/Shanghai
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=false
jetty.maxThreads=25
jetty.port=8181
# Where the Azkaban web server is located
azkaban.webserver.url=http://localhost:8081
# mail settings
mail.sender=iceleader@126.com
# mail.host is the SMTP server hostname, not an email address
mail.host=smtp.126.com
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=iceleader@126.com
job.success.email=iceleader@126.com
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban plugin settings
azkaban.jobtype.plugin.dir=plugins/jobtypes
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=172.18.1.51
mysql.database=db_azkaban
mysql.user=root
mysql.password=123456
mysql.numconnections=100
# Azkaban Executor settings
executor.maxThreads=50
executor.flow.threads=30

Settings to adjust:

  • default.timezone.id: time zone
  • user.manager.xml.file: login user configuration
  • azkaban.webserver.url: points to the webserver
  • mysql.*: MySQL connection settings
  • mail.sender, mail.host, job.failure.email, job.success.email: email settings
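
After editing, the settings listed above can be read back from the file to confirm they took effect. A minimal sketch follows, assuming plain `key=value` lines with no spaces around `=`; `get_prop` is a hypothetical helper, and the demo uses a temporary file rather than the real azkaban.properties.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one key from a .properties file.
# Comment lines are skipped; the last assignment wins if a key repeats.
get_prop() {
  local file="$1" key="$2"
  awk -F'=' -v k="$key" '
    /^[[:space:]]*#/ { next }
    $1 == k { sub(/^[^=]*=/, ""); v = $0; found = 1 }
    END { if (found) print v }
  ' "$file"
}

# Demo on a temporary file holding a few lines from the configuration above.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
# Azkaban mysql settings
default.timezone.id=Asia/Shanghai
azkaban.webserver.url=http://localhost:8081
mysql.host=172.18.1.51
EOF
for key in default.timezone.id azkaban.webserver.url mysql.host; do
  echo "$key -> $(get_prop "$conf" "$key")"
done
rm -f "$conf"
```

Against the real file the call would look like `get_prop /data/azkaban/exec-server/conf/azkaban.properties mysql.host`.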

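Start: /data/azkaban/exec-server/bin/start-exec.sh

Then activate the executor. In this setup it was listening on port 41096:

curl "http://172.18.1.51:41096/executor?action=activate"

The general form is:

curl "http://${executorHost}:${executorPort}/executor?action=activate"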
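
In multiple-executor mode each executor has to be activated, so the URLs can be generated from a list of host:port pairs. This is only a sketch: `activate_urls` is a hypothetical helper, and the single pair shown is the executor from this guide; any further entries would be placeholders for your own machines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one activation URL per "host:port" argument.
activate_urls() {
  local hp
  for hp in "$@"; do
    printf 'http://%s/executor?action=activate\n' "$hp"
  done
}

# The executor used in this guide.
activate_urls 172.18.1.51:41096

# To send the requests:
#   activate_urls 172.18.1.51:41096 | while read -r url; do curl "$url"; done
```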

3.4 Configure the WebServer

cd /data/azkaban/webserver/conf/
vi /data/azkaban/webserver/conf/azkaban.properties

# Set the time zone
default.timezone.id=Asia/Shanghai
# Email settings
job.failure.email=iceleader@126.com
job.success.email=iceleader@126.com
# MySQL settings
database.type=mysql
mysql.port=3306
mysql.host=172.18.1.51
mysql.database=db_azkaban
mysql.user=root
mysql.password=123456
mysql.numconnections=100
# SSL password and keystore file path

vi /data/azkaban/webserver/conf/azkaban-users.xml

<azkaban-users>
<user groups="azkaban" password="azkaban" roles="admin" username="azkaban"/>
<user password="metrics" roles="metrics" username="metrics"/>
<user username="admin" password="admin" roles="admin" />

<role name="admin" permissions="ADMIN"/>
<role name="metrics" permissions="METRICS"/>
</azkaban-users>

mkdir -p /data/azkaban/webserver/plugins/jobtypes

vi /data/azkaban/webserver/plugins/jobtypes/commonprivate.properties

azkaban.native.lib=false
execute.as.user=false

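Start the server:

sh /data/azkaban/webserver/bin/start-web.sh

Stop the server:

sh /data/azkaban/webserver/bin/shutdown-web.sh

Note: if no ExecutorServer is running, the WebServer exits at startup. The executors table lists the registered executors, and an executor can also be marked active directly in the database:

select * from db_azkaban.executors;

update db_azkaban.executors set active = 1 where id = 1;

Open in a browser: http://172.18.1.51:8081/index

Log in with admin/admin.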

3.5 Configure a job

1) Create a project

2) Define the jobs and upload them

Each job file has the following format:

type=command
dependencies=dailysummary,advisor
command=echo dailysettlement

When done, package the job files as dailysettlement.zip and upload the archive to the project dailysettlement.
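
The packaging step can be scripted. In the sketch below, `write_job` is a hypothetical helper, and the two upstream jobs (dailysummary and advisor) are assumed to be simple echo commands, since only dailysettlement is shown above. The files are written to a temporary directory, and the archive is created only if `zip` is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one command-type job file.
# $1 = directory, $2 = job name, $3 = comma-separated dependencies (may be empty)
write_job() {
  local dir="$1" name="$2" deps="$3"
  {
    echo "type=command"
    if [ -n "$deps" ]; then
      echo "dependencies=$deps"
    fi
    echo "command=echo $name"
  } > "$dir/$name.job"
}

work="$(mktemp -d)"
# Assumed upstream jobs; replace their commands with the real ones.
write_job "$work" dailysummary ""
write_job "$work" advisor ""
write_job "$work" dailysettlement "dailysummary,advisor"

cat "$work/dailysettlement.job"

# Package the job files if zip is available; the archive is what gets uploaded.
if command -v zip >/dev/null 2>&1; then
  (cd "$work" && zip -q dailysettlement.zip ./*.job)
  echo "created $work/dailysettlement.zip"
fi
```

Each job lives in its own file named after the job with a .job extension, which is how the `dependencies` line resolves dailysummary and advisor.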

3) Set up the schedule

Run once an hour, at minute 30.

4) View the schedule

3.6 Problems when running jobs

1. The cluster is short on memory, so remove the memory check from the webserver configuration.

vi webserver/conf/azkaban.properties

# azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.filters=StaticRemainingFlowSize,CpuStatus

Error seen in the web server log:
ERROR [FlowTriggerScheduler] [Azkaban] Unable to get scheduled flow triggers
java.lang.NullPointerException
at azkaban.flowtrigger.quartz.FlowTriggerScheduler.getScheduledFlowTriggerJobs(FlowTriggerScheduler.java:139)
at azkaban.webapp.servlet.FlowTriggerServlet.ajaxFetchTrigger(FlowTriggerServlet.java:67)
at azkaban.webapp.servlet.FlowTriggerServlet.handleAJAXAction(FlowTriggerServlet.java:107)
at azkaban.webapp.servlet.FlowTriggerServlet.handleGet(FlowTriggerServlet.java:57)
at azkaban.webapp.servlet.LoginAbstractAzkabanServlet.doGet(LoginAbstractAzkabanServlet.java:122)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
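
The configuration change in item 1 amounts to dropping MinimumFreeMemory from the comma-separated filter list. As a small illustration of that edit, the hypothetical helper `remove_filter` below removes one entry from such a list; it works on a string and does not touch azkaban.properties.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop one entry from a comma-separated filter list.
# $1 = list, $2 = entry to remove (exact, whole-entry match)
remove_filter() {
  echo "$1" | tr ',' '\n' | grep -v -x -F -- "$2" | paste -s -d ',' -
}

remove_filter "StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus" MinimumFreeMemory
```

The printed list is the value assigned to azkaban.executorselector.filters in the edited configuration.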

2. The ExecutorServer is not active. Activate the executor manually by calling its URL:

curl "http://${executorHost}:${executorPort}/executor?action=activate"

While the executor is inactive, submitting a flow produces this error in the ExecutorServer log:
2019/11/22 14:25:23.831 +0800 ERROR [ExecutorServlet] [Azkaban] executor became inactive before setting up the flow 4
azkaban.executor.ExecutorManagerException: executor became inactive before setting up the flow 4
at azkaban.execapp.FlowRunnerManager.createFlowRunner(FlowRunnerManager.java:403)
at azkaban.execapp.FlowRunnerManager.submitFlow(FlowRunnerManager.java:347)
at azkaban.execapp.ExecutorServlet.handleAjaxExecute(ExecutorServlet.java:288)
at azkaban.execapp.ExecutorServlet.handleRequest(ExecutorServlet.java:136)
at azkaban.execapp.ExecutorServlet.doPost(ExecutorServlet.java:93)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)