Jump in! Deploying SeaTunnel 2.3.11 with Docker to sync Kafka to Hive/ES
This document walks through deploying SeaTunnel 2.3.11 with Docker, configuring Kafka virtual tables and data sources, and a complete hands-on example of syncing Kafka to Hive and Elasticsearch.

Preparation

Directory layout

    seatunnel-docker/
    ├── docker-compose.yml                    # main compose file
    ├── hive/                                 # Hive configuration
    │   ├── hive-site.xml
    │   └── lib/                              # dependency jars
    │       └── postgresql-42.5.1.jar
    ├── init-sql/                             # initialization SQL
    │   └── seatunnel_server_mysql.sql
    ├── seatunnel/                            # SeaTunnel server configuration
    │   ├── Dockerfile
    │   └── apache-seatunnel-2.3.11/          # extracted binary package
    │       └── lib/                          # dependency jars
    │           ├── hive-exec-3.1.3.jar
    │           ├── hive-metastore-3.1.3.jar
    │           ├── libfb303-0.9.3.jar
    │           ├── mysql-connector-java-8.0.28.jar
    │           └── seatunnel-hadoop3-3.1.4-uber.jar
    └── seatunnel-web/                        # SeaTunnel Web configuration
        ├── Dockerfile
        └── apache-seatunnel-web-1.0.3-bin/   # extracted binary package
            └── libs/                         # dependency jars
                └── mysql-connector-java-8.0.28.jar

Download SeaTunnel

    # seatunnel-2.3.11
    https://dlcdn.apache.org/seatunnel/2.3.11/apache-seatunnel-2.3.11-bin.tar.gz

    # build seatunnel-web-1.0.3 from source
    git clone https://github.com/apache/seatunnel-web.git
    cd seatunnel-web
    sh build.sh code

Download dependency jars

    # the hive-metastore container uses PostgreSQL as the Hive metadata database
    https://jdbc.postgresql.org/download/postgresql-42.5.1.jar

    # Hive sync fails with missing dependencies; in practice, adding the first three jars was enough
    https://repo1.maven.org/maven2/org/apache/hive/hive-exec/3.1.3/hive-exec-3.1.3.jar
    https://repo1.maven.org/maven2/org/apache/hive/hive-metastore/3.1.3/hive-metastore-3.1.3.jar
    https://repo.maven.apache.org/maven2/org/apache/thrift/libfb303/0.9.3/libfb303-0.9.3.jar
    https://repo1.maven.org/maven2/org/apache/thrift/libthrift/0.12.0/libthrift-0.12.0.jar
    https://repo1.maven.org/maven2/org/apache/hive/hive-common/3.1.3/hive-common-3.1.3.jar

Create the project directory

Place the prepared files under the seatunnel-docker directory.

    mkdir seatunnel-docker
    cd seatunnel-docker

Docker deployment

docker-compose.yml configuration

    version: "3.9"

    networks:
      seatunnel-network:
        driver: bridge
        ipam:
          config:
            - subnet: 172.16.0.0/24

    services:
      # Hive services
      hive-metastore-db:
        image: postgres:15
        container_name: hive-metastore-db
        hostname: hive-metastore-db
        environment:
          POSTGRES_DB: metastore_db
          POSTGRES_USER: hive
          POSTGRES_PASSWORD: hive123456
        ports:
          - "5432:5432"
        volumes:
          - ./hive-metastore-db-data:/var/lib/postgresql/data
        networks:
          seatunnel-network:
            ipv4_address: 172.16.0.2
        healthcheck:  # health check
          test: ["CMD-SHELL", "pg_isready -U hive -d metastore_db"]
          interval: 5s
          timeout: 5s
          retries: 10
          start_period: 10s

      hive-metastore:
        image: apache/hive:4.0.0
        container_name: hive-metastore
        hostname: hive-metastore
        depends_on:
          hive-metastore-db:
            condition: service_healthy  # start only after the database is healthy
        environment:
          SERVICE_NAME: metastore
          DB_DRIVER: postgres
          SERVICE_OPTS: >-
            -Djavax.jdo.option.ConnectionDriverName=org.postgresql.Driver
            -Djavax.jdo.option.ConnectionURL=jdbc:postgresql://hive-metastore-db:5432/metastore_db
            -Djavax.jdo.option.ConnectionUserName=hive
            -Djavax.jdo.option.ConnectionPassword=hive123456
        ports:
          - "9083:9083"
        volumes:
          - ./hive/lib/postgresql-42.5.1.jar:/opt/hive/lib/postgresql-42.5.1.jar
          - ./hive/hive-site.xml:/opt/hive/conf/hive-site.xml
          - ./hive-warehouse:/opt/hive/data/warehouse
        networks:
          seatunnel-network:
            ipv4_address: 172.16.0.3

      hive-server2:
        image: apache/hive:4.0.0
        container_name: hive-server2
        hostname: hive-server2
        depends_on:
          - hive-metastore
        environment:
          HIVE_SERVER2_THRIFT_PORT: 10000
          SERVICE_NAME: hiveserver2
          IS_RESUME: "true"
          SERVICE_OPTS: "-Dhive.metastore.uris=thrift://hive-metastore:9083"
        ports:
          - "10000:10000"
          - "10002:10002"
        volumes:
          - ./hive-warehouse:/opt/hive/data/warehouse
        networks:
          seatunnel-network:
            ipv4_address: 172.16.0.4

      # MySQL
      mysql-seatunnel:
        image: mysql:8.0.42
        container_name: mysql-seatunnel
        hostname: mysql-seatunnel
        environment:
          MYSQL_ROOT_PASSWORD: root123456
          MYSQL_DATABASE: seatunnel
          MYSQL_ROOT_HOST: "%"
        ports:
          - "3806:3306"
        volumes:
          - ./mysql_data:/var/lib/mysql
          - ./init-sql:/docker-entrypoint-initdb.d
        networks:
          seatunnel-network:
            ipv4_address: 172.16.0.5
        command: --default-authentication-plugin=mysql_native_password
        healthcheck:
          test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
          interval: 10s
          timeout: 5s
          retries: 5

      # SeaTunnel
      seatunnel-master:
        build:
          context: ./seatunnel
          dockerfile: Dockerfile
        image: seatunnel:2.3.11
        container_name: seatunnel-master
        hostname: seatunnel-master
        extra_hosts:
          - "hive-metastore:172.16.0.3"
          - "hive-metastore-db:172.16.0.2"
        environment:
          - SEATUNNEL_HOME=/opt/seatunnel
        command: ["sh", "-c", "cd /opt/seatunnel && exec bin/seatunnel-cluster.sh -r master"]
        ports:
          - "5801:5801"
        volumes:
          - ./seatunnel/apache-seatunnel-2.3.11/:/opt/seatunnel/
          - ./logs/master:/opt/seatunnel/logs
          # [Change] Mount the Hive warehouse directory so data lands in the shared host directory
          - ./hive-warehouse:/opt/hive/data/warehouse
        networks:
          seatunnel-network:
            ipv4_address: 172.16.0.10

      seatunnel-worker1:
        image: seatunnel:2.3.11
        container_name: seatunnel-worker1
        hostname: seatunnel-worker1
        extra_hosts:
          - "hive-metastore:172.16.0.3"
          - "hive-metastore-db:172.16.0.2"
        environment:
          - SEATUNNEL_HOME=/opt/seatunnel
        command: ["sh", "-c", "cd /opt/seatunnel && exec bin/seatunnel-cluster.sh -r worker"]
        volumes:
          - ./seatunnel/apache-seatunnel-2.3.11/:/opt/seatunnel/
          - ./logs/worker1:/opt/seatunnel/logs
          # [Change] Mount the Hive warehouse directory so data lands in the shared host directory
          - ./hive-warehouse:/opt/hive/data/warehouse
        depends_on:
          - seatunnel-master
        networks:
          seatunnel-network:
            ipv4_address: 172.16.0.11

      seatunnel-worker2:
        image: seatunnel:2.3.11
        container_name: seatunnel-worker2
        hostname: seatunnel-worker2
        extra_hosts:
          - "hive-metastore:172.16.0.3"
          - "hive-metastore-db:172.16.0.2"
        environment:
          - SEATUNNEL_HOME=/opt/seatunnel
        command: ["sh", "-c", "cd /opt/seatunnel && exec bin/seatunnel-cluster.sh -r worker"]
        volumes:
          - ./seatunnel/apache-seatunnel-2.3.11/:/opt/seatunnel/
          - ./logs/worker2:/opt/seatunnel/logs
          # [Change] Mount the Hive
          # warehouse directory so data lands in the shared host directory
          - ./hive-warehouse:/opt/hive/data/warehouse
        depends_on:
          - seatunnel-master
        networks:
          seatunnel-network:
            ipv4_address: 172.16.0.12

      seatunnel-web:
        build:
          context: ./seatunnel-web
          dockerfile: Dockerfile
        image: seatunnel-web:1.0.3
        container_name: seatunnel-web
        hostname: seatunnel-web
        extra_hosts:
          - "hive-metastore:172.16.0.3"
          - "hive-metastore-db:172.16.0.2"
        environment:
          - SEATUNNEL_HOME=/opt/seatunnel
          - SEATUNNEL_WEB_HOME=/opt/seatunnel-web
        ports:
          - "8801:8801"
        volumes:
          - ./seatunnel/apache-seatunnel-2.3.11/:/opt/seatunnel/
          - ./seatunnel-web/apache-seatunnel-web-1.0.3-bin/:/opt/seatunnel-web/
          - ./logs/web:/opt/seatunnel-web/logs
          # [Change] Mount the Hive warehouse directory to keep the environments consistent
          - ./hive-warehouse:/opt/hive/data/warehouse
        depends_on:
          - seatunnel-master
        networks:
          seatunnel-network:
            ipv4_address: 172.16.0.13

SeaTunnel configuration

Dockerfile

    FROM eclipse-temurin:8-jdk-ubi9-minimal

    WORKDIR /opt/seatunnel/

    # Environment variables
    ENV SEATUNNEL_HOME=/opt/seatunnel
    ENV PATH=$PATH:$SEATUNNEL_HOME/bin

    # Exposed port
    EXPOSE 5801

    # Startup command
    CMD ["sh", "bin/seatunnel-cluster.sh", "-r", "master"]

hazelcast-client.yaml client configuration

Edit seatunnel/apache-seatunnel-2.3.11/config/hazelcast-client.yaml:

    hazelcast-client:
      cluster-name: seatunnel
      properties:
        hazelcast.logging.type: log4j2
      connection-strategy:
        connection-retry:
          cluster-connect-timeout-millis: 3000
      network:
        cluster-members:
          - seatunnel-master:5801

hazelcast-master.yaml
Configuration: edit seatunnel/apache-seatunnel-2.3.11/config/hazelcast-master.yaml:

    hazelcast:
      cluster-name: seatunnel
      network:
        rest-api:
          enabled: false
          endpoint-groups:
            CLUSTER_WRITE:
              enabled: true
            DATA:
              enabled: true
        join:
          tcp-ip:
            enabled: true
            member-list:
              - seatunnel-master:5801
              - seatunnel-worker1:5802
              - seatunnel-worker2:5802
        port:
          auto-increment: false
          port: 5801
      properties:
        hazelcast.invocation.max.retry.count: 20
        hazelcast.tcp.join.port.try.count: 30
        hazelcast.logging.type: log4j2
        hazelcast.operation.generic.thread.count: 50
        hazelcast.heartbeat.failuredetector.type: phi-accrual
        hazelcast.heartbeat.interval.seconds: 2
        hazelcast.max.no.heartbeat.seconds: 180
        hazelcast.heartbeat.phiaccrual.failuredetector.threshold: 10
        hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
        hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 100

hazelcast-worker.yaml configuration

Edit seatunnel/apache-seatunnel-2.3.11/config/hazelcast-worker.yaml:

    hazelcast:
      cluster-name: seatunnel
      network:
        join:
          tcp-ip:
            enabled: true
            member-list:
              - seatunnel-master:5801
              - seatunnel-worker1:5802
              - seatunnel-worker2:5802
        port:
          auto-increment: false
          port: 5802
      properties:
        hazelcast.invocation.max.retry.count: 20
        hazelcast.tcp.join.port.try.count: 30
        hazelcast.logging.type: log4j2
        hazelcast.operation.generic.thread.count: 50
        hazelcast.heartbeat.failuredetector.type: phi-accrual
        hazelcast.heartbeat.interval.seconds: 2
        hazelcast.max.no.heartbeat.seconds: 180
        hazelcast.heartbeat.phiaccrual.failuredetector.threshold: 10
        hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
        hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 100

Install connector dependencies

When configuring a sync job, the source-name dropdown of the Source component stays empty until the connector dependencies are installed.

    cd seatunnel/apache-seatunnel-2.3.11/
    sh bin/install-plugin.sh

SeaTunnel Web configuration

Dockerfile configuration

    FROM eclipse-temurin:8-jdk-ubi9-minimal

    WORKDIR /opt/seatunnel-web/

    # Environment variables
    ENV SEATUNNEL_WEB_HOME=/opt/seatunnel-web
    ENV SEATUNNEL_HOME=/opt/seatunnel

    # Exposed port
    EXPOSE 8801

    # Startup command
    CMD ["sh", "bin/seatunnel-backend-daemon.sh", "start"]

application.yml
Configuration: edit seatunnel-web/apache-seatunnel-web-1.0.3-bin/conf/application.yml:

    server:
      port: 8801

    spring:
      main:
        allow-circular-references: true
      application:
        name: seatunnel
      jackson:
        date-format: yyyy-MM-dd HH:mm:ss
      datasource:
        driver-class-name: com.mysql.cj.jdbc.Driver
        url: jdbc:mysql://mysql-seatunnel:3306/seatunnel?useSSL=false&useUnicode=true&characterEncoding=utf-8&allowMultiQueries=true&allowPublicKeyRetrieval=true
        username: root
        password: root123456

    jwt:
      expireTime: 86400
      # please add key when deploy (a token must be configured here, 32 characters)
      secretKey: a3f5c8d2e1b4098765432109abcdef1234567890abcdef
      algorithm: HS256

hazelcast-client.yaml client configuration

Edit seatunnel-web/apache-seatunnel-web-1.0.3-bin/conf/hazelcast-client.yaml:

    hazelcast-client:
      cluster-name: seatunnel
      properties:
        hazelcast.logging.type: log4j2
      connection-strategy:
        connection-retry:
          cluster-connect-timeout-millis: 3000
      network:
        cluster-members:
          - seatunnel-master:5801

seatunnel-backend-daemon.sh

Edit seatunnel-web/apache-seatunnel-web-1.0.3-bin/bin/seatunnel-backend-daemon.sh and turn off background mode: remove nohup and the trailing &, so the Java process stays in the foreground and keeps the container alive.

    $JAVA_HOME/bin/java $JAVA_OPTS \
      -cp "$CLASSPATH" $SPRING_OPTS \
      org.apache.seatunnel.app.SeatunnelApplication \
      > "${LOGDIR}/seatunnel.out" 2>&1
    echo "seatunnel-web started"

plugin-mapping.properties configuration

In practice this step turned out to be optional. Copy seatunnel/apache-seatunnel-2.3.11/connectors/plugin-mapping.properties to seatunnel-web/apache-seatunnel-web-1.0.3-bin/conf/plugin-mapping.properties:

    cd seatunnel-docker
    cp seatunnel/apache-seatunnel-2.3.11/connectors/plugin-mapping.properties seatunnel-web/apache-seatunnel-web-1.0.3-bin/conf/plugin-mapping.properties

Hive configuration

hive-site.xml configuration

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hive-metastore:9083</value>
      </property>
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/opt/hive/data/warehouse</value>
      </property>
      <property>
        <name>metastore.metastore.event.db.notification.api.auth</name>
        <value>false</value>
      </property>
    </configuration>

lib directory: dependency jars

    postgresql-42.5.1.jar

MySQL configuration

init-sql directory: initial SQL script

Copy seatunnel-web/apache-seatunnel-web-1.0.3-bin/script/seatunnel_server_mysql.sql
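to init-sql/seatunnel_server_mysql.sql:

    cd seatunnel-docker
    cp seatunnel-web/apache-seatunnel-web-1.0.3-bin/script/seatunnel_server_mysql.sql init-sql/seatunnel_server_mysql.sql

Starting with Docker

    # Start all services
    docker compose up -d --build

    # Open the web UI (default login: admin / admin)
    open http://localhost:8801

Running the example

Log in and set the language

Set the display language on the login page.

Configure data sources

- Kafka data source
- ES data source
- Hive-metastore: for a local data source, configuring it as thrift://hive-metastore:9083 also works.

Configure virtual tables

Virtual table list

Steps to create a virtual table:

1. Open the "Virtual Table" menu.
2. Click the "Create" button.
3. Select a data source.
4. Fill in the virtual table information.
5. Click "Next".
6. Configure the field mapping.
7. Click "Next".
8. Review the information and save.

Configure sync jobs

Kafka-Hive sync job

- Job component configuration
- Source component configuration
- FieldMapper component configuration
- Model view
- Sink component configuration

Kafka-Elasticsearch sync job

- Job component configuration
- Source component configuration
- FieldMapper component configuration
- Model view
- Sink component configuration

General steps to create a sync job:

1. Go to "Tasks" → "Sync Task Definition".
2. Click the "Create" button.
3. Drag in or select the Source, FieldMapper and Sink components to build the job flow.
4. Double-click the Source component, configure the data source information, and select the Kafka data source configured earlier.
5. Double-click the FieldMapper component and click the "Model" button to configure the field mapping.
6. Double-click the Sink component and configure the target data source (Hive or Elasticsearch).
7. Save and start the job. The job mode must be configured first; otherwise saving fails with the error: job env can't be empty, please change config

Hive operations

Create a table

    # Run beeline inside the HiveServer2 container
    docker exec -it hive-server2 beeline -u jdbc:hive2://localhost:10000 -e "
    CREATE TABLE IF NOT EXISTS default.test_user_data3 (
      user_id STRING,
      type STRING,
      content STRING
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;
    "

    # Or create the table in Parquet format (recommended)
    docker exec -it hive-server2 beeline -u jdbc:hive2://localhost:10000 -e "
    CREATE TABLE IF NOT EXISTS default.test_user_data3 (
      user_id STRING,
      type STRING,
      content STRING
    )
    STORED AS PARQUET;
    "

Inspect the table structure

    docker exec -it hive-server2 beeline -u jdbc:hive2://localhost:10000 -e "
    SHOW TABLES IN default;
    DESCRIBE default.test_user_data3;
    "

Query the table data

    docker exec -it hive-server2 beeline -u jdbc:hive2://localhost:10000 -e "
    SELECT * FROM default.test_user_data3 LIMIT 10;
    "

Notes

The seatunnel-web container exits right after starting

Check whether seatunnel-backend-daemon.sh has been adjusted. Edit seatunnel-web/apache-seatunnel-web-1.0.3-bin/bin/seatunnel-backend-daemon.sh and turn off background mode: remove nohup and the trailing &.

    $JAVA_HOME/bin/java $JAVA_OPTS \
      -cp "$CLASSPATH" $SPRING_OPTS \
      org.apache.seatunnel.app.SeatunnelApplication \
      > "${LOGDIR}/seatunnel.out" 2>&1
    echo "seatunnel-web started"

After seatunnel-web starts, the page reports: Unknown exception.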
secret key byte array cannot be null or empty

Check whether application.yml contains the jwt settings:

    jwt:
      expireTime: 86400
      # please add key when deploy
      secretKey: a3f5c8d2e1b4098765432109abcdef1234567890abcdef
      algorithm: HS256

Hive address resolution error (seatunnel, seatunnel-web)

    ERROR [qtp2135089262-20] [MetaStoreUtils.logAndThrowMetaException():166] - Got exception: java.net.URISyntaxException Illegal character in hostname at index 44: thrift://hive-metastore.seatunnel-docker_seatunnel-network:9083

In docker-compose.yml, add fixed IP bindings to the affected containers:

    extra_hosts:
      - "hive-metastore:172.16.0.3"
      - "hive-metastore-db:172.16.0.2"

Hive sync fails with java.lang.NoClassDefFoundError

Put the dependency jars in seatunnel/apache-seatunnel-2.3.11/lib:

    hive-exec-3.1.3.jar
    hive-metastore-3.1.3.jar
    libfb303-0.9.3.jar

The Hive sync job reports success but no data is written

In docker-compose.yml, mount the local directory that Hive writes to into the affected containers:

    volumes:
      # [Change] Mount the Hive warehouse directory so data lands in the shared host directory
      - ./hive-warehouse:/opt/hive/data/warehouse

Check the job execution log for "will be executed on worker"

Look in ./logs/master/seatunnel-engine-master.log for lines like:

    Task [TaskGroupLocation{jobId=1080750681855361026, pipelineId=1, taskGroupId=2}] will be executed on worker [[seatunnel-worker2]:5801], slotID [2], resourceProfile [ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}], sequence [db6b679c-67cc-43b8-b64a-acaa85c2a4c0], assigned [1080750681855361026]

Author | 云婷
Original link: https://www.cnblogs.com/thao/p/19666609
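For the last note above (finding out which worker a task ran on), a small helper can pull the worker addresses out of the master log instead of reading it by eye. This is only a sketch: the function name assigned_workers is made up for illustration, the default path assumes the ./logs/master volume from docker-compose.yml, and the pattern assumes the worker address is printed as [[host]:port] as in the sample log line, with or without a space before the brackets.

```shell
#!/bin/sh
# Print each distinct worker address (host:port) that the master assigned
# task groups to, based on "will be executed on worker" log lines.
# Assumption: the address is logged as [[host]:port]; optional spaces before
# the brackets are tolerated.
assigned_workers() {
  log_file=$1
  grep -F 'will be executed on worker' "$log_file" |
    sed -n 's/.*will be executed on worker *\[\[\([^]]*\)]:\([0-9]*\)].*/\1:\2/p' |
    sort -u
}

# The default path follows the ./logs/master volume in docker-compose.yml.
LOG_FILE=${1:-./logs/master/seatunnel-engine-master.log}
if [ -f "$LOG_FILE" ]; then
  assigned_workers "$LOG_FILE"
else
  echo "log file not found: $LOG_FILE" >&2
fi
```

Saved under a hypothetical name such as assigned-workers.sh in the seatunnel-docker directory, it can be run as sh assigned-workers.sh, optionally followed by another log path. If a job reports success but nothing shows up in Hive, the listed workers are the containers whose volume mounts are worth checking first.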
