Commit 07db7ad954 ("init") by xueyinfei, 1 year ago, on branch master.
100 changed files (lines added per file):

  1. .asf.yaml (+50)
  2. .dlc.json (+32)
  3. .flake8 (+28)
  4. .gitattributes (+3)
  5. .github/CODEOWNERS (+48)
  6. .github/ISSUE_TEMPLATE/bug-report.yml (+131)
  7. .github/ISSUE_TEMPLATE/config.yml (+22)
  8. .github/ISSUE_TEMPLATE/document.yml (+73)
  9. .github/ISSUE_TEMPLATE/feature-request.yml (+81)
  10. .github/ISSUE_TEMPLATE/improvement-report.yml (+68)
  11. .github/PULL_REQUEST_TEMPLATE.md (+34)
  12. .github/actions/labeler/labeler.yml (+54)
  13. .github/actions/sanity-check/action.yml (+38)
  14. .github/mergeable.yml (+46)
  15. .github/workflows/api-test.yml (+142)
  16. .github/workflows/backend.yml (+138)
  17. .github/workflows/cluster-test/mysql/Dockerfile (+48)
  18. .github/workflows/cluster-test/mysql/deploy.sh (+43)
  19. .github/workflows/cluster-test/mysql/docker-compose-base.yaml (+65)
  20. .github/workflows/cluster-test/mysql/docker-compose-cluster.yaml (+29)
  21. .github/workflows/cluster-test/mysql/dolphinscheduler_env.sh (+47)
  22. .github/workflows/cluster-test/mysql/install_env.sh (+61)
  23. .github/workflows/cluster-test/mysql/running_test.sh (+91)
  24. .github/workflows/cluster-test/mysql/start-job.sh (+33)
  25. .github/workflows/cluster-test/postgresql/Dockerfile (+39)
  26. .github/workflows/cluster-test/postgresql/deploy.sh (+40)
  27. .github/workflows/cluster-test/postgresql/docker-compose-base.yaml (+65)
  28. .github/workflows/cluster-test/postgresql/docker-compose-cluster.yaml (+29)
  29. .github/workflows/cluster-test/postgresql/dolphinscheduler_env.sh (+47)
  30. .github/workflows/cluster-test/postgresql/install_env.sh (+61)
  31. .github/workflows/cluster-test/postgresql/running_test.sh (+91)
  32. .github/workflows/cluster-test/postgresql/start-job.sh (+33)
  33. .github/workflows/docs.yml (+64)
  34. .github/workflows/e2e.yml (+175)
  35. .github/workflows/frontend.yml (+64)
  36. .github/workflows/issue-robot.yml (+47)
  37. .github/workflows/owasp-dependency-check.yaml (+48)
  38. .github/workflows/publish-docker.yaml (+78)
  39. .github/workflows/publish-helm-chart.yaml (+69)
  40. .github/workflows/pull-request-robot.yml (+41)
  41. .github/workflows/stale.yml (+52)
  42. .github/workflows/unit-test.yml (+133)
  43. .gitignore (+52)
  44. .gitmodules (+23)
  45. .licenserc.yaml (+50)
  46. CONTRIBUTING.md (+24)
  47. LICENSE (+222)
  48. NOTICE (+87)
  49. README.md (+83)
  50. README_zh_CN.md (+76)
  51. deploy/README.md (+5)
  52. deploy/docker/.env (+24)
  53. deploy/docker/docker-compose.yml (+154)
  54. deploy/docker/docker-stack.yml (+132)
  55. deploy/kubernetes/dolphinscheduler/Chart.yaml (+62)
  56. deploy/kubernetes/dolphinscheduler/resources/config/common.properties (+21)
  57. deploy/kubernetes/dolphinscheduler/templates/NOTES.txt (+50)
  58. deploy/kubernetes/dolphinscheduler/templates/_helpers.tpl (+245)
  59. deploy/kubernetes/dolphinscheduler/templates/configmap-dolphinscheduler-common.yaml (+29)
  60. deploy/kubernetes/dolphinscheduler/templates/configmap.yaml (+26)
  61. deploy/kubernetes/dolphinscheduler/templates/deployment-dolphinscheduler-alert.yaml (+122)
  62. deploy/kubernetes/dolphinscheduler/templates/deployment-dolphinscheduler-api.yaml (+126)
  63. deploy/kubernetes/dolphinscheduler/templates/ingress.yaml (+60)
  64. deploy/kubernetes/dolphinscheduler/templates/job-dolphinscheduler-schema-initializer.yaml (+51)
  65. deploy/kubernetes/dolphinscheduler/templates/pvc-dolphinscheduler-alert.yaml (+34)
  66. deploy/kubernetes/dolphinscheduler/templates/pvc-dolphinscheduler-api.yaml (+34)
  67. deploy/kubernetes/dolphinscheduler/templates/pvc-dolphinscheduler-fs-file.yaml (+36)
  68. deploy/kubernetes/dolphinscheduler/templates/pvc-dolphinscheduler-shared.yaml (+36)
  69. deploy/kubernetes/dolphinscheduler/templates/rbac.yaml (+53)
  70. deploy/kubernetes/dolphinscheduler/templates/secret-external-database.yaml (+28)
  71. deploy/kubernetes/dolphinscheduler/templates/statefulset-dolphinscheduler-master.yaml (+136)
  72. deploy/kubernetes/dolphinscheduler/templates/statefulset-dolphinscheduler-worker.yaml (+167)
  73. deploy/kubernetes/dolphinscheduler/templates/svc-dolphinscheduler-alert.yaml (+35)
  74. deploy/kubernetes/dolphinscheduler/templates/svc-dolphinscheduler-api.yaml (+61)
  75. deploy/kubernetes/dolphinscheduler/templates/svc-dolphinscheduler-master-headless.yaml (+32)
  76. deploy/kubernetes/dolphinscheduler/templates/svc-dolphinscheduler-worker-headless.yaml (+32)
  77. deploy/kubernetes/dolphinscheduler/values.yaml (+500)
  78. docs/configs/docsdev.js (+1269)
  79. docs/configs/index.md.jsx (+131)
  80. docs/configs/site.js (+288)
  81. docs/docs/en/DSIP.md (+92)
  82. docs/docs/en/about/features.md (+20)
  83. docs/docs/en/about/glossary.md (+75)
  84. docs/docs/en/about/hardware.md (+49)
  85. docs/docs/en/about/introduction.md (+7)
  86. docs/docs/en/architecture/cache.md (+42)
  87. docs/docs/en/architecture/configuration.md (+384)
  88. docs/docs/en/architecture/design.md (+225)
  89. docs/docs/en/architecture/load-balance.md (+60)
  90. docs/docs/en/architecture/metadata.md (+43)
  91. docs/docs/en/architecture/task-structure.md (+1116)
  92. docs/docs/en/contribute/api-standard.md (+131)
  93. docs/docs/en/contribute/api-test.md (+92)
  94. docs/docs/en/contribute/architecture-design.md (+293)
  95. docs/docs/en/contribute/backend/mechanism/global-parameter.md (+62)
  96. docs/docs/en/contribute/backend/mechanism/overview.md (+6)
  97. docs/docs/en/contribute/backend/mechanism/task/switch.md (+9)
  98. docs/docs/en/contribute/backend/spi/alert.md (+108)
  99. docs/docs/en/contribute/backend/spi/datasource.md (+25)
  100. docs/docs/en/contribute/backend/spi/registry.md (+28)

.asf.yaml (+50)

@@ -0,0 +1,50 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
github:
  description: Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
  homepage: https://dolphinscheduler.apache.org/
  labels:
    - airflow
    - schedule
    - job-scheduler
    - oozie
    - task-scheduler
    - azkaban
    - distributed-schedule-system
    - workflow-scheduling-system
    - etl-dependency
    - workflow-platform
    - cronjob-schedule
    - job-schedule
    - task-schedule
    - workflow-schedule
    - data-schedule
  enabled_merge_buttons:
    squash: true
    merge: false
    rebase: false
  protected_branches:
    dev:
      required_status_checks:
        contexts:
          - Build
          - Unit Test
          - E2E
      required_pull_request_reviews:
        dismiss_stale_reviews: true
        required_approving_review_count: 1

.dlc.json (+32)

@@ -0,0 +1,32 @@
{
  "ignorePatterns": [
    {
      "pattern": "^http://localhost"
    },
    {
      "pattern": "^https://img.shields.io/badge"
    },
    {
      "pattern": "/community/community.html$"
    }
  ],
  "replacementPatterns": [
    {
      "pattern": "^/en-us/download/download.html$",
      "replacement": "https://dolphinscheduler.apache.org/en-us/download/download.html"
    },
    {
      "pattern": "^/zh-cn/download/download.html$",
      "replacement": "https://dolphinscheduler.apache.org/zh-cn/download/download.html"
    }
  ],
  "timeout": "10s",
  "retryOn429": true,
  "retryCount": 10,
  "fallbackRetryDelay": "1000s",
  "aliveStatusCodes": [
    200,
    401,
    0
  ]
}
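
This config drives the docs dead-link checker. As a sketch of a local run, assuming the markdown-link-check CLI (whose config schema this file matches) is available via npx, and an arbitrary docs page as the target:

    # Check a single docs page against the repo's link-check config (target path illustrative)
    npx markdown-link-check --config .dlc.json docs/docs/en/about/introduction.md

Treating status 401 and status 0 (no HTTP response) as alive means auth-gated pages and transient network failures are not reported as dead links, and retryOn429 backs off when a host rate-limits the checker.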

.flake8 (+28)

@@ -0,0 +1,28 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
[flake8]
max-line-length = 110
exclude =
    .git,
    dist,
ignore =
    # These are clear and do not need a docstring
    D107, # D107: Don't require docstrings on __init__
    D105, # D105: Missing docstring in magic method
    # Conflicts with Black
    W503  # W503: Line breaks before binary operators
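
The D-codes come from a docstring plugin rather than core flake8. A minimal sketch of a local run, assuming flake8 plus the flake8-docstrings plugin (which defines D105/D107):

    # Install the linter plus the plugin that provides the D1xx docstring codes
    pip install flake8 flake8-docstrings
    # flake8 picks up ./.flake8 automatically when run from the repository root
    flake8 .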

.gitattributes (+3)

@@ -0,0 +1,3 @@
*.js linguist-language=java
*.css linguist-language=java
*.html linguist-language=java
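
These linguist-language overrides affect only GitHub's language statistics, so the bundled JS/CSS/HTML assets count toward Java instead of skewing the repository's language bar. To confirm which attribute git resolves for a file (path illustrative):

    # Print the linguist-language attribute applied to a given file
    git check-attr linguist-language -- dolphinscheduler-ui/src/main.js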

.github/CODEOWNERS (+48)

@@ -0,0 +1,48 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
/.github/ @kezhenxu94 @SbloodyS
/deploy/ @kezhenxu94 @caishunfeng
/dolphinscheduler-alert/ @kezhenxu94 @caishunfeng
/dolphinscheduler-e2e/ @kezhenxu94 @SbloodyS
/dolphinscheduler-api-test/ @SbloodyS
/dolphinscheduler-registry/ @kezhenxu94 @caishunfeng @ruanwenjun
/dolphinscheduler-api/ @caishunfeng @SbloodyS
/dolphinscheduler-dao/ @caishunfeng @SbloodyS
/dolphinscheduler-dao/src/main/resources/sql/ @zhongjiajie
/dolphinscheduler-common/ @caishunfeng
/dolphinscheduler-standalone-server/ @kezhenxu94 @caishunfeng
/dolphinscheduler-datasource-plugin/ @caishunfeng
/dolphinscheduler-dist/ @kezhenxu94 @caishunfeng
/dolphinscheduler-meter/ @caishunfeng @kezhenxu94 @ruanwenjun @EricGao888
/dolphinscheduler-scheduler-plugin/ @caishunfeng
/dolphinscheduler-master/ @caishunfeng @SbloodyS @ruanwenjun
/dolphinscheduler-worker/ @caishunfeng @SbloodyS @ruanwenjun
/dolphinscheduler-service/ @caishunfeng
/dolphinscheduler-remote/ @caishunfeng
/dolphinscheduler-spi/ @caishunfeng
/dolphinscheduler-task-plugin/ @caishunfeng @SbloodyS @zhuangchong
/dolphinscheduler-tools/ @caishunfeng @SbloodyS @zhongjiajie @EricGao888
/script/ @caishunfeng @SbloodyS @zhongjiajie @EricGao888
/dolphinscheduler-ui/ @songjianet @Amy0104
/docs/ @zhongjiajie @EricGao888
/licenses/ @kezhenxu94 @zhongjiajie
/images/ @zhongjiajie @EricGao888
/style/ @caishunfeng
# All pom files
pom.xml @kezhenxu94

.github/ISSUE_TEMPLATE/bug-report.yml (+131)

@@ -0,0 +1,131 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
name: Bug report
title: "[Bug] [Module Name] Bug title"
description: Problems and issues with code of Apache DolphinScheduler
labels: [ "bug", "Waiting for reply" ]
body:
  - type: markdown
    attributes:
      value: >
        Please make sure what you are reporting is indeed a bug with reproducible steps. If you want to ask questions
        or share ideas, head to our
        [Discussions](https://github.com/apache/dolphinscheduler/discussions) tab; you can also
        [join our Slack](https://s.apache.org/dolphinscheduler-slack)
        and send your question to the `#troubleshooting` channel.

        For better global communication, please write in English.
        If you feel the description in English is not clear, you can append a description in Chinese, thanks!
  - type: checkboxes
    attributes:
      label: Search before asking
      description: >
        Please make sure to search in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue)
        first to see whether the same issue was reported already.
      options:
        - label: >
            I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found
            no similar issues.
          required: true
  - type: textarea
    attributes:
      label: What happened
      description: Describe what happened.
      placeholder: >
        Please provide the context in which the problem occurred and explain what happened.
    validations:
      required: true
  - type: textarea
    attributes:
      label: What you expected to happen
      description: What do you think went wrong?
      placeholder: >
        Please explain why you think the behaviour is erroneous. It is extremely helpful if you copy and paste
        the fragment of logs showing the exact error messages or wrong behaviour, and screenshots for
        UI problems. You can include files by dragging and dropping them here.

        **NOTE**: please copy and paste text instead of taking screenshots of it, for easy future search.
    validations:
      required: true
  - type: textarea
    attributes:
      label: How to reproduce
      description: >
        What should we do to reproduce the problem? If you are not able to provide a reproducible case,
        please open a [Discussion](https://github.com/apache/dolphinscheduler/discussions) instead.
      placeholder: >
        Please make sure you provide a reproducible step-by-step case of how to reproduce the problem
        as minimally and precisely as possible. Keep in mind we do not have access to your deployment.
        Remember that non-reproducible issues will be closed! Opening a discussion is recommended as a
        first step.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Anything else
      description: Anything else we need to know?
      placeholder: >
        How often does this problem occur? (Once? Every time? Only when certain conditions are met?)
        Any relevant logs to include? Put them here inside fenced
        ``` ``` blocks or inside a collapsible details tag if they are too long:
        <details><summary>x.log</summary> lots of stuff </details>
  - type: dropdown
    id: version
    attributes:
      label: Version
      description: >
        Which version of Apache DolphinScheduler are you running? We only accept bug reports for the LTS versions.
      options:
        - dev
        - 3.0.0
        - 2.0.6
        - 2.0.5
    validations:
      required: true
  - type: checkboxes
    attributes:
      label: Are you willing to submit PR?
      description: >
        This is absolutely not required, but we are happy to guide you in the contribution process,
        especially if you already have a good understanding of how to implement the fix.
        DolphinScheduler is a totally community-driven project and we love to bring new contributors in.
      options:
        - label: Yes I am willing to submit a PR!
  - type: checkboxes
    attributes:
      label: Code of Conduct
      description: |
        The Code of Conduct helps create a safe space for everyone. We require that everyone agrees to it.
      options:
        - label: >
            I agree to follow this project's
            [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
          required: true
  - type: markdown
    attributes:
      value: "Thanks for completing our form!"

.github/ISSUE_TEMPLATE/config.yml (+22)

@@ -0,0 +1,22 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
blank_issues_enabled: false
contact_links:
  - name: Ask a question or get support
    url: https://github.com/apache/dolphinscheduler/discussions/
    about: Ask a question or request support for using Apache DolphinScheduler

.github/ISSUE_TEMPLATE/document.yml (+73)

@@ -0,0 +1,73 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: Documentation Related
description: Suggest an improvement or report a bug for this project's documentation
title: "[Doc][Module Name] Documentation bug or improvement"
labels: [ "document", "Waiting for reply" ]
body:
  - type: markdown
    attributes:
      value: |
        For better global communication, please write in English.
        If you feel the description in English is not clear, you can append a description in Chinese, thanks!
  - type: checkboxes
    attributes:
      label: Search before asking
      description: >
        Please make sure to search in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) first
        to see whether the same issue was reported already.
      options:
        - label: >
            I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no
            similar issues.
          required: true
  - type: textarea
    attributes:
      label: Description
      description: A short description of what you found in our documentation.
  - type: textarea
    attributes:
      label: Documentation Links
      description: Copy and paste one or more links related to this documentation issue.
  - type: checkboxes
    attributes:
      label: Are you willing to submit a PR?
      description: >
        This is absolutely not required, but we are happy to guide you in the contribution process,
        especially if you already have a good understanding of how to implement the improvement.
        DolphinScheduler is a totally community-driven project and we love to bring new contributors in.
      options:
        - label: Yes I am willing to submit a PR!
  - type: checkboxes
    attributes:
      label: Code of Conduct
      description: |
        The Code of Conduct helps create a safe space for everyone. We require that everyone agrees to it.
      options:
        - label: |
            I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
          required: true
  - type: markdown
    attributes:
      value: "Thanks for completing our form!"

.github/ISSUE_TEMPLATE/feature-request.yml (+81)

@@ -0,0 +1,81 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
name: Feature request
description: Suggest an idea for this project
title: "[Feature][Module Name] Feature title"
labels: [ "new feature", "Waiting for reply" ]
body:
  - type: markdown
    attributes:
      value: |
        For better global communication, please write in English.
        If you feel the description in English is not clear, you can append a description in Chinese, thanks!
  - type: checkboxes
    attributes:
      label: Search before asking
      description: >
        Please make sure to search in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) first
        to see whether the same feature was requested already.
      options:
        - label: >
            I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no
            similar feature requests.
          required: true
  - type: textarea
    attributes:
      label: Description
      description: A short description of your feature
  - type: textarea
    attributes:
      label: Use case
      description: What do you want to happen?
      placeholder: >
        Rather than telling us how you might implement this feature, try to take a
        step back and describe what you are trying to achieve.
  - type: textarea
    attributes:
      label: Related issues
      description: Is there currently another issue associated with this?
  - type: checkboxes
    attributes:
      label: Are you willing to submit a PR?
      description: >
        This is absolutely not required, but we are happy to guide you in the contribution process,
        especially if you already have a good understanding of how to implement the feature.
        DolphinScheduler is a totally community-driven project and we love to bring new contributors in.
      options:
        - label: Yes I am willing to submit a PR!
  - type: checkboxes
    attributes:
      label: Code of Conduct
      description: |
        The Code of Conduct helps create a safe space for everyone. We require that everyone agrees to it.
      options:
        - label: |
            I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
          required: true
  - type: markdown
    attributes:
      value: "Thanks for completing our form!"

.github/ISSUE_TEMPLATE/improvement-report.yml (+68)

@@ -0,0 +1,68 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
name: Improvement request
description: Suggest an improvement for this project
title: "[Improvement][Module Name] Improvement title"
labels: [ "improvement", "Waiting for reply" ]
body:
  - type: markdown
    attributes:
      value: |
        For better global communication, please write in English.
        If you feel the description in English is not clear, you can append a description in Chinese, thanks!
  - type: checkboxes
    attributes:
      label: Search before asking
      description: >
        Please make sure to search in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) first
        to see whether the same improvement was requested already.
      options:
        - label: >
            I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no
            similar improvement requests.
          required: true
  - type: textarea
    attributes:
      label: Description
      description: A short description of why you want this improvement
  - type: checkboxes
    attributes:
      label: Are you willing to submit a PR?
      description: >
        This is absolutely not required, but we are happy to guide you in the contribution process,
        especially if you already have a good understanding of how to implement the improvement.
        DolphinScheduler is a totally community-driven project and we love to bring new contributors in.
      options:
        - label: Yes I am willing to submit a PR!
  - type: checkboxes
    attributes:
      label: Code of Conduct
      description: |
        The Code of Conduct helps create a safe space for everyone. We require that everyone agrees to it.
      options:
        - label: |
            I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
          required: true
  - type: markdown
    attributes:
      value: "Thanks for completing our form!"

.github/PULL_REQUEST_TEMPLATE.md (+34)

@@ -0,0 +1,34 @@
<!--Thanks very much for contributing to Apache DolphinScheduler. Please review https://dolphinscheduler.apache.org/en-us/community/development/pull-request.html before opening a pull request.-->

## Purpose of the pull request

<!--(For example: This pull request adds the checkstyle plugin.)-->

## Brief change log

<!--*(for example:)*
- *Add maven-checkstyle-plugin to root pom.xml*
-->

## Verify this pull request

<!--*(Please pick one of the following options)*-->

This pull request is code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

*(or)*

This change added tests and can be verified as follows:

<!--*(example:)*
- *Added dolphinscheduler-dao tests for end-to-end.*
- *Added CronUtilsTest to verify the change.*
- *Manually verified the change by testing locally.* -->

*(or)*

If your pull request contains an incompatible change, you should also add it to `docs/docs/en/guide/upgrade/incompatible.md`.

.github/actions/labeler/labeler.yml (+54)

@@ -0,0 +1,54 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
backend:
  - 'dolphinscheduler-alert/**/*'
  - 'dolphinscheduler-api/**/*'
  - 'dolphinscheduler-common/**/*'
  - 'dolphinscheduler-dao/**/*'
  - 'dolphinscheduler-data-quality/**/*'
  - 'dolphinscheduler-datasource-plugin/**/*'
  - 'dolphinscheduler-dist/**/*'
  - 'dolphinscheduler-master/**/*'
  - 'dolphinscheduler-registry/**/*'
  - 'dolphinscheduler-remote/**/*'
  - 'dolphinscheduler-scheduler-plugin/**/*'
  - 'dolphinscheduler-service/**/*'
  - 'dolphinscheduler-spi/**/*'
  - 'dolphinscheduler-standalone-server/**/*'
  - 'dolphinscheduler-task-plugin/**/*'
  - 'dolphinscheduler-tools/**/*'
  - 'dolphinscheduler-worker/**/*'
  - 'script/**/*'
document:
  - 'docs/**/*'
CI&CD:
  - any: ['.github/**/*']
docker:
  - any: ['deploy/**/*']
UI:
  - any: ['dolphinscheduler-ui/**/*']
e2e:
  - any: ['dolphinscheduler-e2e/**/*']
test:
  - any: ['dolphinscheduler-api-test/**/*']

.github/actions/sanity-check/action.yml (+38)

@@ -0,0 +1,38 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
name: "Sanity Check"
description: |
Action to perform some very basic lightweight checks, like code styles, license headers, etc.,
and fail fast to avoid wasting resources running heavyweight checks, like unit tests, e2e tests.
inputs:
token:
description: 'The GitHub API token'
required: false
runs:
using: "composite"
steps:
- name: Check License Header
uses: apache/skywalking-eyes@30367d8286e324d5efc58de4c70c37ea3648306d
- shell: bash
run: ./mvnw spotless:check
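
Both steps can be reproduced locally before pushing. A sketch, assuming Docker is installed; the image invocation follows the skywalking-eyes README and may differ slightly from the pinned action revision above:

    # Check license headers with license-eye, the tool behind apache/skywalking-eyes
    docker run --rm -v $(pwd):/github/workspace apache/skywalking-eyes header check
    # Check code style with Spotless, exactly as the composite action does
    ./mvnw spotless:check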

.github/mergeable.yml (+46)

@@ -0,0 +1,46 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
---
version: 2
mergeable:
  # We cannot use `pull_request.*`, which includes the event `pull_request.labeled` (see
  # https://github.com/mergeability/mergeable/issues/643); otherwise mergeable would keep adding and
  # removing the label endlessly. We just need this CI to act like the default GitHub Actions
  # `pull_request` trigger, https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request,
  # which only runs when a pull_request event's activity type is opened, synchronize, or reopened.
  - when: pull_request.opened, pull_request.reopened, pull_request.synchronize
    name: synchronize change for sql files
    validate:
      # SQL files must be changed in sync
      - do: dependent
        files:
          - 'dolphinscheduler-dao/src/main/resources/sql/dolphinscheduler_h2.sql'
          - 'dolphinscheduler-dao/src/main/resources/sql/dolphinscheduler_mysql.sql'
          - 'dolphinscheduler-dao/src/main/resources/sql/dolphinscheduler_postgresql.sql'
        message: 'SQL files are not changed in sync'
    # Add the label 'sql not sync' and a comment for reviewers if the SQL files are not changed in sync
    fail:
      - do: comment
        payload:
          body: >
            :warning: This PR does not change the database DDL files in sync.
        leave_old_comment: false
      - do: labels
        add: 'sql not sync'
    # Remove the label 'sql not sync' if the check passes
    pass:
      - do: labels
        delete: 'sql not sync'
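
In effect, a PR that touches any one of the three schema files must touch all three. A quick self-check before pushing, assuming the PR branches off dev:

    # List changed schema files; either none or all three should appear
    git diff --name-only dev... -- dolphinscheduler-dao/src/main/resources/sql/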

.github/workflows/api-test.yml (+142)

@@ -0,0 +1,142 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
on:
  pull_request:
  push:
    branches:
      - dev

name: API-Test

concurrency:
  group: api-test-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  paths-filter:
    name: API-Test-Path-Filter
    runs-on: ubuntu-latest
    outputs:
      not-ignore: ${{ steps.filter.outputs.not-ignore }}
    steps:
      - uses: actions/checkout@v2
      - uses: dorny/paths-filter@b2feaf19c27470162a626bd6fa8438ae5b263721
        id: filter
        with:
          filters: |
            not-ignore:
              - '!(docs/**)'
  build:
    name: API-Test-Build
    needs: paths-filter
    if: ${{ (needs.paths-filter.outputs.not-ignore == 'true') || (github.event_name == 'push') }}
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v2
        with:
          submodules: true
      - name: Sanity Check
        uses: ./.github/actions/sanity-check
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
      - name: Cache local Maven repository
        uses: actions/cache@v3
        with:
          path: ~/.m2/repository
          key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
          restore-keys: ${{ runner.os }}-maven-
      - name: Build Image
        run: |
          ./mvnw -B clean install \
              -Dmaven.test.skip \
              -Dmaven.javadoc.skip \
              -Dcheckstyle.skip=true \
              -Pdocker,release -Ddocker.tag=ci \
              -pl dolphinscheduler-standalone-server -am
      - name: Export Docker Images
        run: |
          docker save apache/dolphinscheduler-standalone-server:ci -o /tmp/standalone-image.tar \
          && du -sh /tmp/standalone-image.tar
      - uses: actions/upload-artifact@v2
        name: Upload Docker Images
        with:
          name: standalone-image
          path: /tmp/standalone-image.tar
          retention-days: 1
  api-test:
    name: ${{ matrix.case.name }}
    needs: build
    runs-on: ubuntu-latest
    timeout-minutes: 30
    strategy:
      matrix:
        case:
          - name: Tenant
            class: org.apache.dolphinscheduler.api.test.cases.TenantAPITest
    env:
      RECORDING_PATH: /tmp/recording-${{ matrix.case.name }}
    steps:
      - uses: actions/checkout@v2
        with:
          submodules: true
      - name: Cache local Maven repository
        uses: actions/cache@v3
        with:
          path: ~/.m2/repository
          key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
          restore-keys: ${{ runner.os }}-maven-
      - uses: actions/download-artifact@v2
        name: Download Docker Images
        with:
          name: standalone-image
          path: /tmp
      - name: Load Docker Images
        run: |
          docker load -i /tmp/standalone-image.tar
      - name: Run Test
        run: |
          ./mvnw -B -f dolphinscheduler-api-test/pom.xml -am \
              -DfailIfNoTests=false \
              -Dcheckstyle.skip=false \
              -Dtest=${{ matrix.case.class }} test
      - uses: actions/upload-artifact@v2
        if: always()
        name: Upload Recording
        with:
          name: recording-${{ matrix.case.name }}
          path: ${{ env.RECORDING_PATH }}
          retention-days: 1
  result:
    name: API-Test-Result
    runs-on: ubuntu-latest
    timeout-minutes: 30
    needs: [ api-test, paths-filter ]
    if: always()
    steps:
      - name: Status
        run: |
          if [[ ${{ needs.paths-filter.outputs.not-ignore }} == 'false' && ${{ github.event_name }} == 'pull_request' ]]; then
            echo "Skip API Test!"
            exit 0
          fi
          if [[ ${{ needs.api-test.result }} != 'success' ]]; then
            echo "API test Failed!"
            exit -1
          fi
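
The workflow deliberately splits image building from test execution, handing the image between runner VMs as a tar artifact. On a single machine the save/load step is unnecessary; a local sketch reusing the same Maven invocations as above:

    # Build the standalone-server image tagged `ci` (tests and javadoc skipped)
    ./mvnw -B clean install -Dmaven.test.skip -Dmaven.javadoc.skip -Dcheckstyle.skip=true \
        -Pdocker,release -Ddocker.tag=ci -pl dolphinscheduler-standalone-server -am
    # Run one API test case against that image
    ./mvnw -B -f dolphinscheduler-api-test/pom.xml -am -DfailIfNoTests=false \
        -Dtest=org.apache.dolphinscheduler.api.test.cases.TenantAPITest test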

.github/workflows/backend.yml (+138)

@@ -0,0 +1,138 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
name: Backend

on:
  push:
    branches:
      - dev
    paths:
      - '.github/workflows/backend.yml'
      - 'package.xml'
      - 'pom.xml'
      - 'dolphinscheduler-alert/**'
      - 'dolphinscheduler-api/**'
      - 'dolphinscheduler-common/**'
      - 'dolphinscheduler-dao/**'
      - 'dolphinscheduler-rpc/**'
  pull_request:

concurrency:
  group: backend-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  paths-filter:
    name: Backend-Path-Filter
    runs-on: ubuntu-latest
    outputs:
      not-ignore: ${{ steps.filter.outputs.not-ignore }}
    steps:
      - uses: actions/checkout@v2
      - uses: dorny/paths-filter@b2feaf19c27470162a626bd6fa8438ae5b263721
        id: filter
        with:
          filters: |
            not-ignore:
              - '!(docs/**)'
  build:
    name: Backend-Build
    needs: paths-filter
    if: ${{ (needs.paths-filter.outputs.not-ignore == 'true') || (github.event_name == 'push') }}
    runs-on: ubuntu-latest
    strategy:
      matrix:
        java: [ '8', '11' ]
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v2
        with:
          submodules: true
      - name: Set up JDK ${{ matrix.java }}
        uses: actions/setup-java@v2
        with:
          java-version: ${{ matrix.java }}
          distribution: 'adopt'
      - name: Sanity Check
        uses: ./.github/actions/sanity-check
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
      - uses: actions/cache@v3
        with:
          path: ~/.m2/repository
          key: ${{ runner.os }}-maven
      - name: Build and Package on ${{ matrix.java }}
        run: |
          ./mvnw -B clean install \
              -Prelease,docker \
              -Dmaven.test.skip=true \
              -Dcheckstyle.skip=true \
              -Dhttp.keepAlive=false \
              -Dmaven.wagon.http.pool=false \
              -Dmaven.wagon.httpconnectionManager.ttlSeconds=120
      - name: Check dependency license
        run: tools/dependencies/check-LICENSE.sh
      - uses: actions/upload-artifact@v2
        if: ${{ matrix.java == '8' }}
        name: Upload Binary Package
        with:
          name: binary-package-${{ matrix.java }}
          path: ./dolphinscheduler-dist/target/apache-dolphinscheduler-*-SNAPSHOT-bin.tar.gz
          retention-days: 1
  cluster-test:
    name: ${{ matrix.case.name }}
    needs: build
    runs-on: ubuntu-latest
    timeout-minutes: 20
    strategy:
      matrix:
        case:
          - name: cluster-test-mysql
            script: .github/workflows/cluster-test/mysql/start-job.sh
          - name: cluster-test-postgresql
            script: .github/workflows/cluster-test/postgresql/start-job.sh
    steps:
      - uses: actions/checkout@v2
        with:
          submodules: true
      - uses: actions/download-artifact@v2
        name: Download Binary Package
        with:
          # Only run the cluster test on JDK 8
          name: binary-package-8
          path: ./
      - name: Running cluster test
        run: |
          /bin/bash ${{ matrix.case.script }}
  result:
    name: Build
    runs-on: ubuntu-latest
    timeout-minutes: 30
    needs: [ build, paths-filter, cluster-test ]
    if: always()
    steps:
      - name: Status
        run: |
          if [[ ${{ needs.paths-filter.outputs.not-ignore }} == 'false' && ${{ github.event_name }} == 'pull_request' ]]; then
            echo "Skip Build!"
            exit 0
          fi
          if [[ ${{ needs.build.result }} != 'success' || ${{ needs.cluster-test.result }} != 'success' ]]; then
            echo "Build Failed!"
            exit -1
          fi

.github/workflows/cluster-test/mysql/Dockerfile (+48)

@@ -0,0 +1,48 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
FROM eclipse-temurin:8-jre
RUN apt update ; \
apt install -y wget default-mysql-client sudo openssh-server netcat-traditional ;
COPY ./apache-dolphinscheduler-*-SNAPSHOT-bin.tar.gz /root
RUN tar -zxvf /root/apache-dolphinscheduler-*-SNAPSHOT-bin.tar.gz -C ~
RUN mv /root/apache-dolphinscheduler-*-SNAPSHOT-bin /root/apache-dolphinscheduler-test-SNAPSHOT-bin
ENV DOLPHINSCHEDULER_HOME /root/apache-dolphinscheduler-test-SNAPSHOT-bin
# Set up install_env.sh
COPY .github/workflows/cluster-test/mysql/install_env.sh $DOLPHINSCHEDULER_HOME/bin/env/install_env.sh
# Set up dolphinscheduler_env.sh
COPY .github/workflows/cluster-test/mysql/dolphinscheduler_env.sh $DOLPHINSCHEDULER_HOME/bin/env/dolphinscheduler_env.sh
# Download the MySQL JDBC driver and copy it into every server's libs directory
ENV MYSQL_URL "https://repo.maven.apache.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar"
ENV MYSQL_DRIVER "mysql-connector-java-8.0.16.jar"
RUN wget -O $DOLPHINSCHEDULER_HOME/alert-server/libs/$MYSQL_DRIVER $MYSQL_URL ; \
    cp $DOLPHINSCHEDULER_HOME/alert-server/libs/$MYSQL_DRIVER $DOLPHINSCHEDULER_HOME/api-server/libs/$MYSQL_DRIVER ; \
    cp $DOLPHINSCHEDULER_HOME/alert-server/libs/$MYSQL_DRIVER $DOLPHINSCHEDULER_HOME/master-server/libs/$MYSQL_DRIVER ; \
    cp $DOLPHINSCHEDULER_HOME/alert-server/libs/$MYSQL_DRIVER $DOLPHINSCHEDULER_HOME/worker-server/libs/$MYSQL_DRIVER ; \
    cp $DOLPHINSCHEDULER_HOME/alert-server/libs/$MYSQL_DRIVER $DOLPHINSCHEDULER_HOME/tools/libs/$MYSQL_DRIVER
# Deploy
COPY .github/workflows/cluster-test/mysql/deploy.sh /root/deploy.sh
CMD [ "/bin/bash", "/root/deploy.sh" ]

.github/workflows/cluster-test/mysql/deploy.sh (+43)

@@ -0,0 +1,43 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
set -euox pipefail
USER=root

# Create the database
mysql -hmysql -P3306 -uroot -p123456 -e "CREATE DATABASE IF NOT EXISTS dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;"

# Allow passwordless sudo for the deploy user
sed -i '$a'$USER' ALL=(ALL) NOPASSWD: ALL' /etc/sudoers
sed -i 's/Defaults    requiretty/#Defaults    requiretty/g' /etc/sudoers

# Set up passwordless SSH to localhost
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
service ssh start

# Initialize the schema
/bin/bash $DOLPHINSCHEDULER_HOME/tools/bin/upgrade-schema.sh

# Start the cluster
/bin/bash $DOLPHINSCHEDULER_HOME/bin/start-all.sh

# Keep the container running
tail -f /dev/null

.github/workflows/cluster-test/mysql/docker-compose-base.yaml (+65)

@@ -0,0 +1,65 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
version: "3"
services:
mysql:
container_name: mysql
image: mysql:5.7.36
command: --default-authentication-plugin=mysql_native_password
restart: always
environment:
MYSQL_ROOT_PASSWORD: 123456
ports:
- "3306:3306"
healthcheck:
test: mysqladmin ping -h 127.0.0.1 -u root --password=$$MYSQL_ROOT_PASSWORD
interval: 5s
timeout: 60s
retries: 120
zoo1:
image: zookeeper:3.8.0
restart: always
hostname: zoo1
ports:
- "2181:2181"
environment:
ZOO_MY_ID: 1
ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
zoo2:
image: zookeeper:3.8.0
restart: always
hostname: zoo2
ports:
- "2182:2181"
environment:
ZOO_MY_ID: 2
ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
zoo3:
image: zookeeper:3.8.0
restart: always
hostname: zoo3
ports:
- "2183:2181"
environment:
ZOO_MY_ID: 3
ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
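
The ZooKeeper ensemble is published on host ports 2181-2183. Once the stack is up, a rough liveness probe from the host, assuming netcat is installed (`srvr` is in ZooKeeper's default four-letter-word whitelist):

    docker-compose -f .github/workflows/cluster-test/mysql/docker-compose-base.yaml up -d
    # Each node should answer with its version line; mode (leader/follower) appears further down
    for port in 2181 2182 2183; do echo srvr | nc localhost $port | head -1; done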

.github/workflows/cluster-test/mysql/docker-compose-cluster.yaml (+29)

@@ -0,0 +1,29 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
version: "3"
services:
ds:
container_name: ds
image: jdk8:ds_mysql_cluster
restart: always
ports:
- "12345:12345"
- "5679:5679"
- "1235:1235"
- "50053:50053"

.github/workflows/cluster-test/mysql/dolphinscheduler_env.sh (+47)

@@ -0,0 +1,47 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# JAVA_HOME, used to start the DolphinScheduler servers
export JAVA_HOME=${JAVA_HOME:-/opt/java/openjdk}
# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-mysql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:mysql://mysql:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&useSSL=false"
export SPRING_DATASOURCE_USERNAME=root
export SPRING_DATASOURCE_PASSWORD=123456
# DolphinScheduler server related configuration
export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-UTC}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}
# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-zoo1:2181,zoo2:2182,zoo3:2183}
# Task-related configuration; change these values if you use the corresponding tasks.
export HADOOP_HOME=${HADOOP_HOME:-/opt/soft/hadoop}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
export HIVE_HOME=${HIVE_HOME:-/opt/soft/hive}
export FLINK_HOME=${FLINK_HOME:-/opt/soft/flink}
export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH
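
Most values here use the `${VAR:-default}` shell pattern, so anything exported before the services start takes precedence; only the datasource settings are assigned unconditionally. A sketch of overriding the registry address without editing the file (connect string invented):

    # Exported values win over the ${VAR:-default} fallbacks above
    export REGISTRY_ZOOKEEPER_CONNECT_STRING="zk-prod-1:2181,zk-prod-2:2181"
    bash ./bin/start-all.sh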

.github/workflows/cluster-test/mysql/install_env.sh (+61)

@@ -0,0 +1,61 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# ---------------------------------------------------------
# INSTALL MACHINE
# ---------------------------------------------------------
# A comma-separated list of hostnames or IPs of the machines on which DolphinScheduler will be installed,
# including master, worker, api, and alert. If you want to deploy in pseudo-distributed
# mode, just write a single pseudo-distributed hostname.
# Example for hostnames: ips="ds1,ds2,ds3,ds4,ds5", example for IPs: ips="192.168.8.1,192.168.8.2,192.168.8.3,192.168.8.4,192.168.8.5"
ips=${ips:-"localhost"}

# Port of the SSH protocol, default value is 22. For now we only support the same port on all `ips` machines;
# modify it if you use a different SSH port.
sshPort=${sshPort:-"22"}

# A comma-separated list of hostnames or IPs of the machines on which the Master server will be installed; it
# must be a subset of configuration `ips`.
# Example for hostnames: masters="ds1,ds2", example for IPs: masters="192.168.8.1,192.168.8.2"
masters=${masters:-"localhost"}

# A comma-separated list of <hostname>:<workerGroup> or <IP>:<workerGroup> entries. All hostnames or IPs must be a
# subset of configuration `ips`, and workerGroup has the default value `default`, but we recommend you declare it after the hosts.
# Example for hostnames: workers="ds1:default,ds2:default,ds3:default", example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
workers=${workers:-"localhost:default"}

# A comma-separated list of hostnames or IPs of the machines on which the Alert server will be installed; it
# must be a subset of configuration `ips`.
# Example for hostname: alertServer="ds3", example for IP: alertServer="192.168.8.3"
alertServer=${alertServer:-"localhost"}

# A comma-separated list of hostnames or IPs of the machines on which the API server will be installed; it
# must be a subset of configuration `ips`.
# Example for hostname: apiServers="ds1", example for IP: apiServers="192.168.8.1"
apiServers=${apiServers:-"localhost"}

# The directory where DolphinScheduler will be installed on all the machines configured above. It will automatically be created by the `install.sh` script if it does not exist.
# Do not set this configuration to the same value as the current path (pwd).
installPath=${installPath:-"/root/apache-dolphinscheduler-*-SNAPSHOT-bin"}

# The user that deploys DolphinScheduler on all the machines configured above. For now the user must be created by yourself before running the `install.sh`
# script. The user needs to have sudo privileges and permission to operate HDFS. If HDFS is enabled, the root directory needs
# to be created by this user.
deployUser=${deployUser:-"dolphinscheduler"}

# The root node in ZooKeeper; for now the default DolphinScheduler registry server is ZooKeeper.
zkRoot=${zkRoot:-"/dolphinscheduler"}
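
Since every setting falls back through `${var:-default}`, a multi-node layout can be described entirely with environment variables before running the installer. A sketch for a hypothetical three-machine layout (hostnames invented):

    # ds1 runs master+api, ds2 and ds3 run workers, ds3 also runs alert
    export ips="ds1,ds2,ds3" masters="ds1" workers="ds2:default,ds3:default"
    export alertServer="ds3" apiServers="ds1" deployUser="dolphinscheduler"
    bash ./bin/install.sh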

.github/workflows/cluster-test/mysql/running_test.sh (+91)

@@ -0,0 +1,91 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
set -x

API_HEALTHCHECK_COMMAND="curl -I -m 10 -o /dev/null -s -w %{http_code} http://0.0.0.0:12345/dolphinscheduler/actuator/health"
MASTER_HEALTHCHECK_COMMAND="curl -I -m 10 -o /dev/null -s -w %{http_code} http://0.0.0.0:5679/actuator/health"
WORKER_HEALTHCHECK_COMMAND="curl -I -m 10 -o /dev/null -s -w %{http_code} http://0.0.0.0:1235/actuator/health"
ALERT_HEALTHCHECK_COMMAND="curl -I -m 10 -o /dev/null -s -w %{http_code} http://0.0.0.0:50053/actuator/health"

# Cluster start health check
TIMEOUT=120
START_HEALTHCHECK_EXITCODE=0

for ((i=1; i<=TIMEOUT; i++))
do
    MASTER_HTTP_STATUS=$(eval "$MASTER_HEALTHCHECK_COMMAND")
    WORKER_HTTP_STATUS=$(eval "$WORKER_HEALTHCHECK_COMMAND")
    API_HTTP_STATUS=$(eval "$API_HEALTHCHECK_COMMAND")
    ALERT_HTTP_STATUS=$(eval "$ALERT_HEALTHCHECK_COMMAND")
    if [[ $MASTER_HTTP_STATUS -eq 200 && $WORKER_HTTP_STATUS -eq 200 && $API_HTTP_STATUS -eq 200 && $ALERT_HTTP_STATUS -eq 200 ]]; then
        START_HEALTHCHECK_EXITCODE=0
    else
        START_HEALTHCHECK_EXITCODE=2
    fi

    if [[ $START_HEALTHCHECK_EXITCODE -eq 0 ]]; then
        echo "cluster start health check success"
        break
    fi

    if [[ $i -eq $TIMEOUT ]]; then
        docker exec -u root ds bash -c "cat /root/apache-dolphinscheduler-*-SNAPSHOT-bin/master-server/logs/dolphinscheduler-master.log"
        echo "cluster start health check failed"
        exit $START_HEALTHCHECK_EXITCODE
    fi

    sleep 1
done

# Stop the cluster
docker exec -u root ds bash -c "/root/apache-dolphinscheduler-*-SNAPSHOT-bin/bin/stop-all.sh"

# Cluster stop health check
sleep 5
MASTER_HTTP_STATUS=$(eval "$MASTER_HEALTHCHECK_COMMAND")
if [[ $MASTER_HTTP_STATUS -ne 200 ]]; then
    echo "master stop health check success"
else
    echo "master stop health check failed"
    exit 3
fi

WORKER_HTTP_STATUS=$(eval "$WORKER_HEALTHCHECK_COMMAND")
if [[ $WORKER_HTTP_STATUS -ne 200 ]]; then
    echo "worker stop health check success"
else
    echo "worker stop health check failed"
    exit 3
fi

API_HTTP_STATUS=$(eval "$API_HEALTHCHECK_COMMAND")
if [[ $API_HTTP_STATUS -ne 200 ]]; then
    echo "api stop health check success"
else
    echo "api stop health check failed"
    exit 3
fi

ALERT_HTTP_STATUS=$(eval "$ALERT_HEALTHCHECK_COMMAND")
if [[ $ALERT_HTTP_STATUS -ne 200 ]]; then
    echo "alert stop health check success"
else
    echo "alert stop health check failed"
    exit 3
fi
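
Each probe works because `curl -w %{http_code}` writes only the HTTP status code to stdout while `-o /dev/null -s` discards the body and progress output, so the script can compare the result numerically. The same check can be run by hand against a live api-server:

    # Prints 200 when the actuator reports healthy, a non-200 code or 000 otherwise
    curl -I -m 10 -o /dev/null -s -w "%{http_code}" http://localhost:12345/dolphinscheduler/actuator/health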

.github/workflows/cluster-test/mysql/start-job.sh (+33)

@@ -0,0 +1,33 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
set -euox pipefail

# Start the base service containers
docker-compose -f .github/workflows/cluster-test/mysql/docker-compose-base.yaml up -d

# Build the DolphinScheduler MySQL cluster image
docker build -t jdk8:ds_mysql_cluster -f .github/workflows/cluster-test/mysql/Dockerfile .

# Start the DolphinScheduler MySQL cluster container
docker-compose -f .github/workflows/cluster-test/mysql/docker-compose-cluster.yaml up -d

# Run the tests
/bin/bash .github/workflows/cluster-test/mysql/running_test.sh

# Clean up
docker rm -f $(docker ps -aq)

.github/workflows/cluster-test/postgresql/Dockerfile (+39)

@@ -0,0 +1,39 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
FROM eclipse-temurin:8-jre
RUN apt update ; \
apt install -y wget sudo openssh-server netcat-traditional ;
COPY ./apache-dolphinscheduler-*-SNAPSHOT-bin.tar.gz /root
RUN tar -zxvf /root/apache-dolphinscheduler-*-SNAPSHOT-bin.tar.gz -C ~
RUN mv /root/apache-dolphinscheduler-*-SNAPSHOT-bin /root/apache-dolphinscheduler-test-SNAPSHOT-bin
ENV DOLPHINSCHEDULER_HOME /root/apache-dolphinscheduler-test-SNAPSHOT-bin
#Setting install.sh
COPY .github/workflows/cluster-test/postgresql/install_env.sh $DOLPHINSCHEDULER_HOME/bin/env/install_env.sh
#Setting dolphinscheduler_env.sh
COPY .github/workflows/cluster-test/postgresql/dolphinscheduler_env.sh $DOLPHINSCHEDULER_HOME/bin/env/dolphinscheduler_env.sh
#Deploy
COPY .github/workflows/cluster-test/postgresql/deploy.sh /root/deploy.sh
CMD [ "/bin/bash", "/root/deploy.sh" ]
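This image is normally built and started by `start-job.sh` through docker-compose; to poke at it by hand one could build it the same way (a sketch, assuming the binary tarball already sits in the build context):
```bash
docker build -t jdk8:ds_postgresql_cluster -f .github/workflows/cluster-test/postgresql/Dockerfile .
# Note: deploy.sh expects the postgres and zoo* hosts from docker-compose-base.yaml to be
# resolvable, so in practice the container is started via docker-compose-cluster.yaml.
```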

40
.github/workflows/cluster-test/postgresql/deploy.sh

@ -0,0 +1,40 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
set -euox pipefail
USER=root
#Sudo
sed -i '$a'$USER' ALL=(ALL) NOPASSWD: ALL' /etc/sudoers
sed -i 's/Defaults requiretty/#Defaults requiretty/g' /etc/sudoers
#SSH
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
service ssh start
#Init schema
/bin/bash $DOLPHINSCHEDULER_HOME/tools/bin/upgrade-schema.sh
#Start Cluster
/bin/bash $DOLPHINSCHEDULER_HOME/bin/start-all.sh
#Keep running
tail -f /dev/null

65
.github/workflows/cluster-test/postgresql/docker-compose-base.yaml

@ -0,0 +1,65 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
version: "3"
services:
postgres:
container_name: postgres
image: postgres:14.1
restart: always
environment:
POSTGRES_PASSWORD: postgres
POSTGRES_DB: dolphinscheduler
ports:
- "5432:5432"
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 60s
retries: 120
zoo1:
image: zookeeper:3.8.0
restart: always
hostname: zoo1
ports:
- "2181:2181"
environment:
ZOO_MY_ID: 1
ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
zoo2:
image: zookeeper:3.8.0
restart: always
hostname: zoo2
ports:
- "2182:2181"
environment:
ZOO_MY_ID: 2
ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
zoo3:
image: zookeeper:3.8.0
restart: always
hostname: zoo3
ports:
- "2183:2181"
environment:
ZOO_MY_ID: 3
ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
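The `healthcheck` block above is surfaced through the container state, so a script can wait on it; a minimal sketch, assuming the compose file has been started and the container is named `postgres` as configured:
```bash
# Hypothetical helper: block until the postgres service reports healthy,
# mirroring the compose healthcheck above (pg_isready every 5s).
until [ "$(docker inspect -f '{{.State.Health.Status}}' postgres)" = "healthy" ]; do
  echo "waiting for postgres..."; sleep 2
done
```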

29
.github/workflows/cluster-test/postgresql/docker-compose-cluster.yaml

@ -0,0 +1,29 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
version: "3"
services:
ds:
container_name: ds
image: jdk8:ds_postgresql_cluster
restart: always
ports:
- "12345:12345"
- "5679:5679"
- "1235:1235"
- "50053:50053"

47
.github/workflows/cluster-test/postgresql/dolphinscheduler_env.sh

@ -0,0 +1,47 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# JAVA_HOME, used to start the DolphinScheduler servers
export JAVA_HOME=${JAVA_HOME:-/opt/java/openjdk}
# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-postgresql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:postgresql://postgres:5432/dolphinscheduler"
export SPRING_DATASOURCE_USERNAME=postgres
export SPRING_DATASOURCE_PASSWORD=postgres
# DolphinScheduler server related configuration
export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-UTC}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}
# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-zoo1:2181,zoo2:2182,zoo3:2183}
# Task-related configuration; change it if you use the corresponding tasks.
export HADOOP_HOME=${HADOOP_HOME:-/opt/soft/hadoop}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
export HIVE_HOME=${HIVE_HOME:-/opt/soft/hive}
export FLINK_HOME=${FLINK_HOME:-/opt/soft/flink}
export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH
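All of these exports use bash default expansion (`${VAR:-default}`), so each value can be overridden from the caller's environment without editing the file; a small illustration:
```bash
# ${VAR:-default} keeps an existing value and falls back otherwise:
export SPRING_JACKSON_TIME_ZONE=Asia/Shanghai
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-UTC}
echo "$SPRING_JACKSON_TIME_ZONE"   # prints Asia/Shanghai, not UTC
```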

61
.github/workflows/cluster-test/postgresql/install_env.sh

@ -0,0 +1,61 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# ---------------------------------------------------------
# INSTALL MACHINE
# ---------------------------------------------------------
# A comma-separated list of machine hostnames or IPs on which DolphinScheduler will be installed,
# including the master, worker, api and alert servers. If you want to deploy in pseudo-distributed
# mode, just write a single hostname
# Example for hostnames: ips="ds1,ds2,ds3,ds4,ds5", Example for IPs: ips="192.168.8.1,192.168.8.2,192.168.8.3,192.168.8.4,192.168.8.5"
ips=${ips:-"localhost"}
# Port of the SSH protocol, default value is 22. For now we only support the same port on all `ips` machines;
# modify it if you use a different ssh port
sshPort=${sshPort:-"22"}
# A comma-separated list of machine hostnames or IPs on which the Master server will be installed; it
# must be a subset of configuration `ips`.
# Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"
masters=${masters:-"localhost"}
# A comma-separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup> entries. All hostnames or IPs must be a
# subset of configuration `ips`, and workerGroup has the default value `default`; we recommend declaring it after each host
# Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
workers=${workers:-"localhost:default"}
# A comma-separated list of machine hostnames or IPs on which the Alert server will be installed; it
# must be a subset of configuration `ips`.
# Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"
alertServer=${alertServer:-"localhost"}
# A comma-separated list of machine hostnames or IPs on which the API server will be installed; it
# must be a subset of configuration `ips`.
# Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"
apiServers=${apiServers:-"localhost"}
# The directory to install DolphinScheduler into on all machines configured above. It will automatically be created by the `install.sh` script if it does not exist.
# Do not set this configuration to the same value as the current path (pwd)
installPath=${installPath:-"/root/apache-dolphinscheduler-*-SNAPSHOT-bin"}
# The user to deploy DolphinScheduler as on all machines configured above. For now the user must be created manually before running the `install.sh`
# script. The user needs sudo privileges and permission to operate HDFS. If HDFS is enabled, the root directory needs
# to be created by this user
deployUser=${deployUser:-"dolphinscheduler"}
# The zookeeper root node; for now the default DolphinScheduler registry server is zookeeper.
zkRoot=${zkRoot:-"/dolphinscheduler"}
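Since the same `${VAR:-default}` idiom is used here, a multi-node layout can be injected purely through the environment; a sketch with hypothetical hostnames `ds1`-`ds3`:
```bash
# Hypothetical three-node layout; hostnames are illustrative only.
export ips="ds1,ds2,ds3"
export masters="ds1"
export workers="ds2:default,ds3:default"
export alertServer="ds3"
export apiServers="ds1"
/bin/bash ./bin/install.sh   # assuming the standard DolphinScheduler install entrypoint
```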

91
.github/workflows/cluster-test/postgresql/running_test.sh

@ -0,0 +1,91 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
set -x
API_HEALTHCHECK_COMMAND="curl -I -m 10 -o /dev/null -s -w %{http_code} http://0.0.0.0:12345/dolphinscheduler/actuator/health"
MASTER_HEALTHCHECK_COMMAND="curl -I -m 10 -o /dev/null -s -w %{http_code} http://0.0.0.0:5679/actuator/health"
WORKER_HEALTHCHECK_COMMAND="curl -I -m 10 -o /dev/null -s -w %{http_code} http://0.0.0.0:1235/actuator/health"
ALERT_HEALTHCHECK_COMMAND="curl -I -m 10 -o /dev/null -s -w %{http_code} http://0.0.0.0:50053/actuator/health"
#Cluster start health check
TIMEOUT=120
START_HEALTHCHECK_EXITCODE=0
for ((i=1; i<=TIMEOUT; i++))
do
MASTER_HTTP_STATUS=$(eval "$MASTER_HEALTHCHECK_COMMAND")
WORKER_HTTP_STATUS=$(eval "$WORKER_HEALTHCHECK_COMMAND")
API_HTTP_STATUS=$(eval "$API_HEALTHCHECK_COMMAND")
ALERT_HTTP_STATUS=$(eval "$ALERT_HEALTHCHECK_COMMAND")
if [[ $MASTER_HTTP_STATUS -eq 200 && $WORKER_HTTP_STATUS -eq 200 && $API_HTTP_STATUS -eq 200 && $ALERT_HTTP_STATUS -eq 200 ]];then
START_HEALTHCHECK_EXITCODE=0
else
START_HEALTHCHECK_EXITCODE=2
fi
if [[ $START_HEALTHCHECK_EXITCODE -eq 0 ]];then
echo "cluster start health check success"
break
fi
if [[ $i -eq $TIMEOUT ]];then
docker exec -u root ds bash -c "cat /root/apache-dolphinscheduler-*-SNAPSHOT-bin/master-server/logs/dolphinscheduler-master.log"
echo "cluster start health check failed"
exit $START_HEALTHCHECK_EXITCODE
fi
sleep 1
done
#Stop Cluster
docker exec -u root ds bash -c "/root/apache-dolphinscheduler-*-SNAPSHOT-bin/bin/stop-all.sh"
#Cluster stop health check
sleep 5
MASTER_HTTP_STATUS=$(eval "$MASTER_HEALTHCHECK_COMMAND")
if [[ $MASTER_HTTP_STATUS -ne 200 ]];then
echo "master stop health check success"
else
echo "master stop health check failed"
exit 3
fi
WORKER_HTTP_STATUS=$(eval "$WORKER_HEALTHCHECK_COMMAND")
if [[ $WORKER_HTTP_STATUS -ne 200 ]];then
echo "worker stop health check success"
else
echo "worker stop health check failed"
exit 3
fi
API_HTTP_STATUS=$(eval "$API_HEALTHCHECK_COMMAND")
if [[ $API_HTTP_STATUS -ne 200 ]];then
echo "api stop health check success"
else
echo "api stop health check failed"
exit 3
fi
ALERT_HTTP_STATUS=$(eval "$ALERT_HEALTHCHECK_COMMAND")
if [[ $ALERT_HTTP_STATUS -ne 200 ]];then
echo "alert stop health check success"
else
echo "alert stop health check failed"
exit 3
fi

33
.github/workflows/cluster-test/postgresql/start-job.sh

@ -0,0 +1,33 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
set -euox pipefail
#Start base service containers
docker-compose -f .github/workflows/cluster-test/postgresql/docker-compose-base.yaml up -d
#Build ds postgresql cluster image
docker build -t jdk8:ds_postgresql_cluster -f .github/workflows/cluster-test/postgresql/Dockerfile .
#Start ds postgresql cluster container
docker-compose -f .github/workflows/cluster-test/postgresql/docker-compose-cluster.yaml up -d
#Running tests
/bin/bash .github/workflows/cluster-test/postgresql/running_test.sh
#Cleanup
docker rm -f $(docker ps -aq)

64
.github/workflows/docs.yml

@ -0,0 +1,64 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: Docs
on:
pull_request:
paths:
- '.github/workflows/docs.yml'
- '**/*.md'
- 'docs/**'
- '.dlc.json'
schedule:
- cron: '0 18 * * *' # TimeZone: UTC 0
concurrency:
group: doc-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
img-check:
name: Image Check
timeout-minutes: 15
runs-on: ubuntu-latest
defaults:
run:
working-directory: docs
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.9
uses: actions/setup-python@v2
with:
python-version: 3.9
- name: Run Dev Relative Reference
run: python img_utils.py -v dev-syntax
- name: Run Image Check
run: python img_utils.py -v check
dead-link:
name: Dead Link
if: (github.event_name == 'schedule' && github.repository == 'apache/dolphinscheduler') || (github.event_name != 'schedule')
runs-on: ubuntu-latest
timeout-minutes: 30
steps:
- uses: actions/checkout@v2
- run: sudo npm install -g markdown-link-check@3.10.0
# NOTE: Change command from `find . -name "*.md"` to `find . -not -path "*/node_modules/*" -not -path "*/.tox/*" -name "*.md"`
# if you want to run check locally
- run: |
for file in $(find . -name "*.md"); do
markdown-link-check -c .dlc.json -q "$file"
done
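For a local run, the comment above already gives the adjusted `find` expression; putting the pieces together:
```bash
sudo npm install -g markdown-link-check@3.10.0
for file in $(find . -not -path "*/node_modules/*" -not -path "*/.tox/*" -name "*.md"); do
  markdown-link-check -c .dlc.json -q "$file"
done
```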

175
.github/workflows/e2e.yml

@ -0,0 +1,175 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
on:
pull_request:
push:
branches:
- dev
name: E2E
concurrency:
group: e2e-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
paths-filter:
name: E2E-Path-Filter
runs-on: ubuntu-latest
outputs:
not-ignore: ${{ steps.filter.outputs.not-ignore }}
steps:
- uses: actions/checkout@v2
- uses: dorny/paths-filter@b2feaf19c27470162a626bd6fa8438ae5b263721
id: filter
with:
filters: |
not-ignore:
- '!(docs/**)'
build:
name: E2E-Build
needs: paths-filter
if: ${{ (needs.paths-filter.outputs.not-ignore == 'true') || (github.event_name == 'push') }}
runs-on: ubuntu-latest
timeout-minutes: 20
steps:
- uses: actions/checkout@v2
with:
submodules: true
- name: Sanity Check
uses: ./.github/actions/sanity-check
with:
token: ${{ secrets.GITHUB_TOKEN }}
- name: Cache local Maven repository
uses: actions/cache@v3
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
restore-keys: ${{ runner.os }}-maven-
- name: Build Image
run: |
./mvnw -B clean install \
-Dmaven.test.skip \
-Dmaven.javadoc.skip \
-Dcheckstyle.skip=true \
-Pdocker,release -Ddocker.tag=ci \
-pl dolphinscheduler-standalone-server -am
- name: Export Docker Images
run: |
docker save apache/dolphinscheduler-standalone-server:ci -o /tmp/standalone-image.tar \
&& du -sh /tmp/standalone-image.tar
- uses: actions/upload-artifact@v2
name: Upload Docker Images
with:
name: standalone-image
path: /tmp/standalone-image.tar
retention-days: 1
e2e:
name: ${{ matrix.case.name }}
needs: build
runs-on: ubuntu-latest
timeout-minutes: 30
strategy:
matrix:
case:
- name: Tenant
class: org.apache.dolphinscheduler.e2e.cases.TenantE2ETest
- name: User
class: org.apache.dolphinscheduler.e2e.cases.UserE2ETest
- name: WorkerGroup
class: org.apache.dolphinscheduler.e2e.cases.WorkerGroupE2ETest
- name: Project
class: org.apache.dolphinscheduler.e2e.cases.ProjectE2ETest
- name: Queue
class: org.apache.dolphinscheduler.e2e.cases.QueueE2ETest
- name: Environment
class: org.apache.dolphinscheduler.e2e.cases.EnvironmentE2ETest
- name: Cluster
class: org.apache.dolphinscheduler.e2e.cases.ClusterE2ETest
- name: Token
class: org.apache.dolphinscheduler.e2e.cases.TokenE2ETest
- name: Workflow
class: org.apache.dolphinscheduler.e2e.cases.WorkflowE2ETest
# - name: WorkflowForSwitch
# class: org.apache.dolphinscheduler.e2e.cases.WorkflowSwitchE2ETest
- name: FileManage
class: org.apache.dolphinscheduler.e2e.cases.FileManageE2ETest
- name: UdfManage
class: org.apache.dolphinscheduler.e2e.cases.UdfManageE2ETest
- name: FunctionManage
class: org.apache.dolphinscheduler.e2e.cases.FunctionManageE2ETest
- name: MysqlDataSource
class: org.apache.dolphinscheduler.e2e.cases.MysqlDataSourceE2ETest
- name: ClickhouseDataSource
class: org.apache.dolphinscheduler.e2e.cases.ClickhouseDataSourceE2ETest
- name: PostgresDataSource
class: org.apache.dolphinscheduler.e2e.cases.PostgresDataSourceE2ETest
- name: SqlServerDataSource
class: org.apache.dolphinscheduler.e2e.cases.SqlServerDataSourceE2ETest
- name: HiveDataSource
class: org.apache.dolphinscheduler.e2e.cases.HiveDataSourceE2ETest
env:
RECORDING_PATH: /tmp/recording-${{ matrix.case.name }}
steps:
- uses: actions/checkout@v2
with:
submodules: true
- name: Cache local Maven repository
uses: actions/cache@v3
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
restore-keys: ${{ runner.os }}-maven-
- uses: actions/download-artifact@v2
name: Download Docker Images
with:
name: standalone-image
path: /tmp
- name: Load Docker Images
run: |
docker load -i /tmp/standalone-image.tar
- name: Run Test
run: |
./mvnw -B -f dolphinscheduler-e2e/pom.xml -am \
-DfailIfNoTests=false \
-Dtest=${{ matrix.case.class }} test
- uses: actions/upload-artifact@v2
if: always()
name: Upload Recording
with:
name: recording-${{ matrix.case.name }}
path: ${{ env.RECORDING_PATH }}
retention-days: 1
result:
name: E2E
runs-on: ubuntu-latest
timeout-minutes: 30
needs: [ e2e, paths-filter ]
if: always()
steps:
- name: Status
run: |
if [[ ${{ needs.paths-filter.outputs.not-ignore }} == 'false' && ${{ github.event_name }} == 'pull_request' ]]; then
echo "Skip E2E!"
exit 0
fi
if [[ ${{ needs.e2e.result }} != 'success' ]]; then
echo "E2E Failed!"
exit 1
fi
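To reproduce one matrix entry locally, the same Maven invocation can be pointed at a single test class; a sketch, assuming the standalone image has already been built and loaded as in the E2E-Build job:
```bash
./mvnw -B -f dolphinscheduler-e2e/pom.xml -am \
  -DfailIfNoTests=false \
  -Dtest=org.apache.dolphinscheduler.e2e.cases.TenantE2ETest test
```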

64
.github/workflows/frontend.yml

@ -0,0 +1,64 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
name: Frontend
on:
push:
branches:
- dev
paths:
- '.github/workflows/frontend.yml'
- 'dolphinscheduler-ui/**'
pull_request:
paths:
- '.github/workflows/frontend.yml'
- 'dolphinscheduler-ui/**'
defaults:
run:
working-directory: dolphinscheduler-ui
concurrency:
group: frontend-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
build:
name: Build
runs-on: ${{ matrix.os }}
timeout-minutes: 20
strategy:
matrix:
os: [ ubuntu-latest, macos-latest ]
steps:
- uses: actions/checkout@v2
with:
submodules: true
- if: matrix.os == 'ubuntu-latest'
name: Sanity Check
uses: ./.github/actions/sanity-check
- name: Set up Node.js
uses: actions/setup-node@v2
with:
node-version: 16
- name: Compile and Build
run: |
npm install pnpm -g
pnpm install
pnpm run lint
pnpm run build:prod

47
.github/workflows/issue-robot.yml

@ -0,0 +1,47 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
name: issue-robot
on:
issues:
types: [opened]
jobs:
issueRobot:
runs-on: ubuntu-latest
steps:
- name: "Checkout ${{ github.ref }}"
uses: actions/checkout@v2
with:
persist-credentials: false
submodules: true
- name: "Translation into English in issue"
uses: ./.github/actions/translate-on-issue
with:
translate-title: true
translate-body: true
- name: "Comment in issue"
uses: ./.github/actions/comment-on-issue
with:
message: |
Thank you for your feedback; we have received your issue. Please wait patiently for a reply.
* To help us understand your request as quickly as possible, please provide detailed information, versions or pictures.
* If you haven't received a reply for a long time, you can [join our slack](https://s.apache.org/dolphinscheduler-slack) and send your question to channel `#troubleshooting`
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

48
.github/workflows/owasp-dependency-check.yaml

@ -0,0 +1,48 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
name: OWASP Dependency Check
on:
push:
pull_request:
paths:
- '**/pom.xml'
env:
MAVEN_OPTS: -Dmaven.wagon.httpconnectionManager.ttlSeconds=25 -Dmaven.wagon.http.retryHandler.count=3
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
submodules: true
- name: Set up JDK 8
uses: actions/setup-java@v2
with:
java-version: 8
distribution: 'adopt'
- name: Run OWASP Dependency Check
run: ./mvnw -B clean install verify dependency-check:check -DskipDepCheck=false -Dmaven.test.skip=true -Dcheckstyle.skip=true
- name: Upload report
uses: actions/upload-artifact@v3
if: ${{ cancelled() || failure() }}
continue-on-error: true
with:
name: dependency report
path: target/dependency-check-report.html

78
.github/workflows/publish-docker.yaml

@ -0,0 +1,78 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: publish-docker
on:
push:
branches:
- dev
release:
types:
- released
jobs:
build:
if: github.repository == 'apache/dolphinscheduler'
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
timeout-minutes: 30
steps:
- uses: actions/checkout@v2
- name: Cache local Maven repository
uses: actions/cache@v3
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
restore-keys: |
${{ runner.os }}-maven-
- name: Set environment variables
run: |
if [[ ${{ github.event_name }} == "release" ]]; then
echo "DOCKER_REGISTRY=docker.io" >> $GITHUB_ENV
echo "DOCKER_USERNAME=${{ secrets.DOCKERHUB_USER }}" >> $GITHUB_ENV
echo "DOCKER_PASSWORD=${{ secrets.DOCKERHUB_TOKEN }}" >> $GITHUB_ENV
echo "HUB=apache" >> $GITHUB_ENV
echo "DOCKER_TAG=${{ github.event.release.tag_name }}" >> $GITHUB_ENV
else
echo "DOCKER_REGISTRY=ghcr.io/apache/dolphinscheduler" >> $GITHUB_ENV
echo "DOCKER_USERNAME=${{ github.actor }}" >> $GITHUB_ENV
echo "DOCKER_PASSWORD=${{ secrets.GITHUB_TOKEN }}" >> $GITHUB_ENV
echo "HUB=ghcr.io/apache/dolphinscheduler" >> $GITHUB_ENV
echo "DOCKER_TAG=${{ github.sha }}" >> $GITHUB_ENV
fi
- name: Log in to the Container registry
uses: docker/login-action@v2
with:
registry: ${{ env.DOCKER_REGISTRY }}
username: ${{ env.DOCKER_USERNAME }}
password: ${{ env.DOCKER_PASSWORD }}
- name: Set up QEMU
uses: docker/setup-qemu-action@v2
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Build and push docker images
run: |
./mvnw -B clean deploy \
-Dmaven.test.skip \
-Dmaven.javadoc.skip \
-Dcheckstyle.skip=true \
-Dmaven.deploy.skip \
-Ddocker.tag=${{ env.DOCKER_TAG }} \
-Ddocker.hub=${{ env.HUB }} \
-Pdocker,release
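On `dev` pushes the images land under ghcr.io tagged with the commit SHA; the exact repository names come from the Maven docker profile, so the name below is an assumption:
```bash
# The repository/image name is an assumption based on the standalone image used in e2e.yml.
docker pull ghcr.io/apache/dolphinscheduler/dolphinscheduler-standalone-server:<commit-sha>
```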

69
.github/workflows/publish-helm-chart.yaml

@ -0,0 +1,69 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: publish-helm-chart
on:
push:
branches:
- dev
release:
types:
- released
env:
HUB: ghcr.io/apache/dolphinscheduler
jobs:
build:
if: github.repository == 'apache/dolphinscheduler'
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
timeout-minutes: 30
steps:
- uses: actions/checkout@v2
- name: Set environment variables
run: |
# TODO
if [[ ${{ github.event_name }} == "release" ]]; then
echo "HUB=registry-1.docker.io/apache" >> $GITHUB_ENV
echo "DOCKER_REGISTRY=docker.io" >> $GITHUB_ENV
echo "DOCKER_USERNAME=${{ secrets.DOCKERHUB_USER }}" >> $GITHUB_ENV
echo "DOCKER_PASSWORD=${{ secrets.DOCKERHUB_TOKEN }}" >> $GITHUB_ENV
else
echo "HUB=ghcr.io/apache/dolphinscheduler" >> $GITHUB_ENV
echo "DOCKER_REGISTRY=c" >> $GITHUB_ENV
echo "DOCKER_USERNAME=${{ github.actor }}" >> $GITHUB_ENV
echo "DOCKER_PASSWORD=${{ secrets.GITHUB_TOKEN }}" >> $GITHUB_ENV
fi
- name: Log in to the Container registry
uses: docker/login-action@v2
with:
registry: ${{ env.DOCKER_REGISTRY }}
username: ${{ env.DOCKER_USERNAME }}
password: ${{ env.DOCKER_PASSWORD }}
- name: Publish Helm Chart
working-directory: deploy/kubernetes
run: |
if [[ ${{ env.HUB }} == "ghcr.io/apache/dolphinscheduler" ]]; then
VERSION=0.0.0-$(git rev-parse --short HEAD)
sed -i "s/^version: .*/version: $VERSION/" dolphinscheduler/Chart.yaml
fi
helm dep up dolphinscheduler
helm package dolphinscheduler
helm push dolphinscheduler-helm-*.tgz oci://${{ env.HUB }}
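Once pushed, the chart is an OCI artifact, so with Helm >= 3.8 it can be installed straight from the registry; a sketch, where the version placeholder stands for the `0.0.0-<short-sha>` dev versions generated above:
```bash
helm install dolphinscheduler oci://ghcr.io/apache/dolphinscheduler/dolphinscheduler-helm \
  --version 0.0.0-<short-sha>
```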

41
.github/workflows/pull-request-robot.yml

@ -0,0 +1,41 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
name: "pull-request-robot"
on:
pull_request_target:
jobs:
labelRobot:
permissions:
contents: read
pull-requests: write
runs-on: ubuntu-latest
steps:
- name: "Checkout ${{ github.ref }}"
uses: actions/checkout@v2
with:
persist-credentials: false
submodules: true
- name: "Label in pull request"
uses: actions/labeler@v4
with:
repo-token: "${{ secrets.GITHUB_TOKEN }}"
configuration-path: .github/actions/labeler/labeler.yml
sync-labels: true

52
.github/workflows/stale.yml

@ -0,0 +1,52 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# https://github.com/actions/stale
name: 'Close stale issues and PRs'
on:
schedule:
- cron: '0 0 * * *'
permissions:
# Stale recommended permissions
pull-requests: write
issues: write
jobs:
stale:
runs-on: ubuntu-latest
steps:
- uses: actions/stale@v4
with:
# Stale Issues
days-before-issue-stale: 30
days-before-issue-close: 7
# We do not stale Issues with labels `Waiting for reply`, `new feature`, `DSIP` and `security`
exempt-issue-labels: 'Waiting for reply,new feature,DSIP,security'
stale-issue-message: >
This issue has been automatically marked as stale because it has not had recent activity
for 30 days. It will be closed in the next 7 days if no further activity occurs.
close-issue-message: >
This issue has been closed because it has not received a response for a long time. You can
reopen it if you encounter similar problems in the future.
# Stale PRs
days-before-pr-stale: 120
days-before-pr-close: 7
stale-pr-message: >
This pull request has been automatically marked as stale because it has not had recent
activity for 120 days. It will be closed in 7 days if no further activity occurs.
close-pr-message: >
This pull request has been closed because it has not had recent activity. You can reopen it
if you want to continue your work, and anyone who is interested in it is encouraged to continue
working on it.

133
.github/workflows/unit-test.yml

@ -0,0 +1,133 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
name: Test
on:
pull_request:
push:
paths-ignore:
- '**/*.md'
- 'dolphinscheduler-ui'
branches:
- dev
env:
LOG_DIR: /tmp/dolphinscheduler
concurrency:
group: unit-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
paths-filter:
name: Unit-Test-Path-Filter
runs-on: ubuntu-latest
outputs:
not-ignore: ${{ steps.filter.outputs.not-ignore }}
steps:
- uses: actions/checkout@v2
- uses: dorny/paths-filter@b2feaf19c27470162a626bd6fa8438ae5b263721
id: filter
with:
filters: |
not-ignore:
- '!(docs/**)'
unit-test:
name: Unit-Test
needs: paths-filter
if: ${{ (needs.paths-filter.outputs.not-ignore == 'true') || (github.event_name == 'push') }}
runs-on: ubuntu-latest
strategy:
matrix:
java: ['8', '11']
timeout-minutes: 45
steps:
- uses: actions/checkout@v2
with:
submodules: true
- name: Sanity Check
uses: ./.github/actions/sanity-check
with:
token: ${{ secrets.GITHUB_TOKEN }}
- name: Set up JDK ${{ matrix.java }}
uses: actions/setup-java@v2
with:
java-version: ${{ matrix.java }}
distribution: 'adopt'
- uses: actions/cache@v3
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven
- name: Run Unit tests
run: ./mvnw clean verify -B -Dmaven.test.skip=false -Dcheckstyle.skip=true
- name: Upload coverage report to codecov
run: CODECOV_TOKEN="09c2663f-b091-4258-8a47-c981827eb29a" bash <(curl -s https://codecov.io/bash)
# Set up JDK 11 for SonarCloud.
- name: Set up JDK 11
uses: actions/setup-java@v2
with:
java-version: 11
distribution: 'adopt'
- name: Run SonarCloud Analysis
run: >
./mvnw --batch-mode verify sonar:sonar
-Dsonar.coverage.jacoco.xmlReportPaths=target/site/jacoco/jacoco.xml
-Dmaven.test.skip=true
-Dcheckstyle.skip=true
-Dsonar.host.url=https://sonarcloud.io
-Dsonar.organization=apache
-Dsonar.core.codeCoveragePlugin=jacoco
-Dsonar.projectKey=apache-dolphinscheduler
-Dsonar.login=e4058004bc6be89decf558ac819aa1ecbee57682
-Dsonar.exclusions=dolphinscheduler-ui/src/**/i18n/locale/*.js,dolphinscheduler-microbench/src/**/*
-Dhttp.keepAlive=false -Dmaven.wagon.http.pool=false -Dmaven.wagon.httpconnectionManager.ttlSeconds=120
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
- name: Collect logs
continue-on-error: true
run: |
mkdir -p ${LOG_DIR}
docker-compose -f $(pwd)/docker/docker-swarm/docker-compose.yml logs dolphinscheduler-postgresql > ${LOG_DIR}/db.txt
- name: Upload logs
uses: actions/upload-artifact@v2
continue-on-error: true
with:
name: unit-test-logs
path: ${{ env.LOG_DIR }}
result:
name: Unit Test
runs-on: ubuntu-latest
timeout-minutes: 30
needs: [ unit-test, paths-filter ]
if: always()
steps:
- name: Status
run: |
if [[ ${{ needs.paths-filter.outputs.not-ignore }} == 'false' && ${{ github.event_name }} == 'pull_request' ]]; then
echo "Skip Unit Test!"
exit 0
fi
if [[ ${{ needs.unit-test.result }} != 'success' ]]; then
echo "Unit Test Failed!"
exit 1
fi

52
.gitignore

@ -0,0 +1,52 @@
.git
.svn
.hg
.zip
.gz
.DS_Store
.target
.idea/
.run/
target/
dist/
all-dependencies.txt
self-modules.txt
third-party-dependencies.txt
.settings
.nbproject
.classpath
.project
*.iml
*.ipr
*.iws
*.tgz
.*.swp
.vim
.tmp
node_modules
npm-debug.log
.vscode
logs/*
.mvn/
.www
t.*
.factorypath
Chart.lock
yarn.lock
package-lock.json
config.gypi
test/coverage
/docs/zh_CN/介绍
/docs/zh_CN/贡献代码.md
dolphinscheduler-common/src/main/resources/zookeeper.properties
dolphinscheduler-dao/src/main/resources/dao/data_source.properties
dolphinscheduler-alert/logs/
dolphinscheduler-alert/src/main/resources/alert.properties_bak
dolphinscheduler-alert/src/main/resources/logback.xml
dolphinscheduler-ui/dist
dolphinscheduler-ui/node
dolphinscheduler-common/sql
dolphinscheduler-common/test
dolphinscheduler-worker/logs
dolphinscheduler-master/logs
dolphinscheduler-api/logs

23
.gitmodules

@ -0,0 +1,23 @@
# Licensed to Apache Software Foundation (ASF) under one or more contributor
# license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright
# ownership. Apache Software Foundation (ASF) licenses this file to you under
# the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
[submodule ".github/actions/comment-on-issue"]
path = .github/actions/comment-on-issue
url = https://github.com/xingchun-chen/actions-comment-on-issue
[submodule ".github/actions/translate-on-issue"]
path = .github/actions/translate-on-issue
url = https://github.com/xingchun-chen/translation-helper

50
.licenserc.yaml

@ -0,0 +1,50 @@
# Licensed to Apache Software Foundation (ASF) under one or more contributor
# license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright
# ownership. Apache Software Foundation (ASF) licenses this file to you under
# the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
header:
license:
spdx-id: Apache-2.0
copyright-owner: Apache Software Foundation
paths-ignore:
- dist
- NOTICE
- LICENSE
- DISCLAIMER
- dolphinscheduler-common/src/main/java/org/apache/dolphinscheduler/common/utils/ScriptRunner.java
- dolphinscheduler-common/src/main/java/org/apache/dolphinscheduler/common/utils/CodeGenerateUtils.java
- mvnw.cmd
- dolphinscheduler-dao/src/main/resources/sql/soft_version
- .mvn
- .gitattributes
- '**/licenses/**/LICENSE-*'
- '**/*.md'
- '**/*.svg'
- '**/*.json'
- '**/*.iml'
- '**/*.ini'
- '**/.babelrc'
- '**/.eslintignore'
- '**/.gitignore'
- '**/LICENSE'
- '**/NOTICE'
- '**/node_modules/**'
- '.github/actions/comment-on-issue/**'
- '.github/actions/translate-on-issue/**'
- '**/.gitkeep'
comment: on-failure

24
CONTRIBUTING.md

@ -0,0 +1,24 @@
Please refer to the contribution document [How to contribute](docs/docs/en/contribute/join/contribute.md)
## How to Build
```bash
./mvnw clean install -Prelease
```
### Build with different Zookeeper versions
The default Zookeeper Server version supported is 3.8.0.
```bash
# Default Zookeeper 3.8.0
./mvnw clean install -Prelease
# Support Zookeeper 3.4.6+
./mvnw clean install -Prelease -Dzk-3.4
```
Artifact:
```
dolphinscheduler-dist/target/apache-dolphinscheduler-${latest.release.version}-bin.tar.gz: Binary package of DolphinScheduler
dolphinscheduler-dist/target/apache-dolphinscheduler-${latest.release.version}-src.tar.gz: Source code package of DolphinScheduler
```
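To inspect the binary distribution after a successful build (version placeholder as above):
```bash
tar -zxvf dolphinscheduler-dist/target/apache-dolphinscheduler-*-bin.tar.gz -C /tmp
ls /tmp/apache-dolphinscheduler-*-bin/bin   # start-all.sh, stop-all.sh, ...
```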

222
LICENSE

@ -0,0 +1,222 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
=======================================================================
Apache DolphinScheduler Subcomponents:
The Apache DolphinScheduler project contains subcomponents with separate copyright
notices and license terms. Your use of the source code for these
subcomponents is subject to the terms and conditions of the following
licenses.
========================================================================
Apache 2.0 licenses
========================================================================
The following components are provided under the Apache License. See project link for details.
The text of each license is the standard Apache 2.0 license.
ScriptRunner from https://github.com/mybatis/mybatis-3 Apache 2.0
mvnw files from https://github.com/apache/maven-wrapper Apache 2.0
PropertyPlaceholderHelper from https://github.com/spring-projects/spring-framework Apache 2.0
DolphinPluginClassLoader from https://github.com/prestosql/presto Apache 2.0
DolphinPluginDiscovery from https://github.com/prestosql/presto Apache 2.0
DolphinPluginLoader from https://github.com/prestosql/presto Apache 2.0
CodeGenerateUtils from https://github.com/twitter-archive/snowflake/tree/snowflake-2010 Apache 2.0

87
NOTICE

@ -0,0 +1,87 @@
Apache DolphinScheduler
Copyright 2019-2023 The Apache Software Foundation
This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).
mybatis-3
iBATIS
This product includes software developed by
The Apache Software Foundation (http://www.apache.org/).
Copyright 2010 The Apache Software Foundation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
OGNL
//--------------------------------------------------------------------------
// Copyright (c) 2004, Drew Davidson and Luke Blanshard
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimer.
// Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
// Neither the name of the Drew Davidson nor the names of its contributors
// may be used to endorse or promote products derived from this software
// without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
// COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
// OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
// AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
// OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
// THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
// DAMAGE.
//--------------------------------------------------------------------------
Refactored SqlBuilder class (SQL, AbstractSQL)
This product includes software developed by
Adam Gent (https://gist.github.com/3650165)
Copyright 2010 Adam Gent
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Spring Framework ${version}
Copyright (c) 2002-${copyright} Pivotal, Inc.
This product is licensed to you under the Apache License, Version 2.0
(the "License"). You may not use this product except in compliance with
the License.
This product may include a number of subcomponents with separate
copyright notices and license terms. Your use of the source code for
these subcomponents is subject to the terms and conditions of the
subcomponent's license, as noted in the license.txt file.

83
README.md

@ -0,0 +1,83 @@
# Apache DolphinScheduler
[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)
![codecov](https://codecov.io/gh/apache/dolphinscheduler/branch/dev/graph/badge.svg)
[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=apache-dolphinscheduler&metric=alert_status)](https://sonarcloud.io/dashboard?id=apache-dolphinscheduler)
[![Twitter Follow](https://img.shields.io/twitter/follow/dolphinschedule.svg?style=social&label=Follow)](https://twitter.com/dolphinschedule)
[![Slack Status](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](https://s.apache.org/dolphinscheduler-slack)
[![CN doc](https://img.shields.io/badge/文档-中文版-blue.svg)](README_zh_CN.md)
## About
Apache DolphinScheduler is a modern data orchestration platform, agile in creating high-performance workflows with low code. It also provides a powerful user interface,
dedicated to solving complex task dependencies in the data pipeline and offering various types of jobs **out of the box**
The key features of DolphinScheduler are as follows:
- Easy to deploy: four deployment modes are available, including Standalone, Cluster, Docker, and Kubernetes.
- Easy to use: workflows can be created and managed in four ways, including the Web UI, [Python SDK](https://dolphinscheduler.apache.org/python/main/index.html), YAML files, and the Open API
- Highly reliable and highly available: a decentralized multi-master, multi-worker architecture that natively supports horizontal scaling.
- High performance: markedly faster than comparable orchestration platforms, able to support tens of millions of tasks per day
- Cloud native: DolphinScheduler supports orchestrating workflows across multiple clouds and data centers, and supports custom task types
- Versioning of both workflows and workflow instances (including tasks)
- Rich state control of workflows and tasks, with support for pausing, stopping, and recovering them at any time
- Multi-tenancy support
- Other features such as backfill (native in the Web UI) and permission control covering projects, resources, and data sources
## QuickStart
- For a quick trial (see the sketch below)
- Want to [start with Standalone](https://dolphinscheduler.apache.org/en-us/docs/3.1.5/guide/installation/standalone)
- Want to [start with Docker](https://dolphinscheduler.apache.org/en-us/docs/3.1.5/guide/start/docker)
- For Kubernetes
- [Start with Kubernetes](https://dolphinscheduler.apache.org/en-us/docs/3.1.5/guide/installation/kubernetes)
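The standalone guide above boils down to a one-container start. A minimal sketch, assuming Docker is installed locally and using the `apache/dolphinscheduler-standalone-server` image (the tag is an example; pick the release you want):

```shell
# Run the all-in-one standalone server and expose the UI and Python gateway ports.
docker run --name dolphinscheduler-standalone \
  -p 12345:12345 -p 25333:25333 \
  -d apache/dolphinscheduler-standalone-server:3.1.9
# The UI should then be reachable at http://localhost:12345/dolphinscheduler/ui
```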
| Stability | Accessibility | Features | Scalability |
|-----------|---------------|----------|-------------|
| Decentralized multi-master and multi-worker architecture | Visualization of key workflow information, such as task status, task type, retry count, the machine a task ran on, and visible variables, all at a glance | Supports pause and recover operations | Supports customized task types |
| HA support | Visualization of all workflow operations: drag tasks to draw DAGs and configure data sources and resources; an API mode is also provided for third-party systems | Users can map DolphinScheduler tenants to Hadoop users many-to-one or one-to-one, which is very important for scheduling big data jobs | Distributed scheduling whose overall capacity grows linearly with cluster size; masters and workers can be adjusted dynamically |
| Overload processing: a task-queue mechanism makes the number of schedulable tasks on a single machine configurable, avoiding machine stalls and tolerating large numbers of queued tasks | One-click deployment | Supports traditional shell tasks as well as big-data task scheduling: MR, Spark, SQL (MySQL, PostgreSQL, Hive, Spark SQL), Python, Procedure, Sub_Process | |
## User Interface Screenshots
* **Homepage:** Project and workflow overview, including the latest workflow instance and task instance status statistics.
![home](images/home.png)
* **Workflow Definition:** Create and manage workflows by drag and drop, making it easy to build and maintain complex workflows, with support for a [large set of tasks](https://dolphinscheduler.apache.org/en-us/docs/3.1.5/introduction-to-functions_menu/task_menu) out of the box.
![workflow-definition](images/workflow-definition.png)
* **Workflow Tree View:** An abstract tree structure gives a clearer understanding of the relationships between tasks.
![workflow-tree](images/workflow-tree.png)
* **Data source:** Manage multiple external data sources such as MySQL, PostgreSQL, Hive, and Trino, providing unified data access capabilities.
![data-source](images/data-source.png)
* **Monitor:** View the status of the masters, workers, and database in real time, including server resource usage and load, and run quick health checks without logging in to the servers.
![monitor](images/monitor.png)
## Suggestions & Bug Reports
Follow [this guide](https://github.com/apache/dolphinscheduler/issues/new/choose) to report your suggestions or bugs.
## Contributing
The community welcomes everyone to contribute; please refer to this page to find out more: [How to contribute](docs/docs/en/contribute/join/contribute.md).
If you are new to DolphinScheduler, you can find good first issues [here](https://github.com/apache/dolphinscheduler/contribute).
## Community
Welcome to join the Apache DolphinScheduler community by:
- Join the [DolphinScheduler Slack](https://s.apache.org/dolphinscheduler-slack) to keep in touch with the community
- Follow the [DolphinScheduler Twitter](https://twitter.com/dolphinschedule) and get the latest news
- Subscribe to the DolphinScheduler mailing lists: users@dolphinscheduler.apache.org for users and dev@dolphinscheduler.apache.org for developers
# Landscapes
<p align="center">
<br/><br/>
<img src="https://landscape.cncf.io/images/left-logo.svg" width="150"/>&nbsp;&nbsp;<img src="https://landscape.cncf.io/images/right-logo.svg" width="200"/>
<br/><br/>
DolphinScheduler enriches the <a href="https://landscape.cncf.io/?landscape=observability-and-analysis&license=apache-license-2-0">CNCF CLOUD NATIVE Landscape.</a>
</p>

76
README_zh_CN.md

@ -0,0 +1,76 @@
# Apache DolphinScheduler
[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)
[![codecov](https://codecov.io/gh/apache/dolphinscheduler/branch/dev/graph/badge.svg)](https://codecov.io/gh/apache/dolphinscheduler/branch/dev)
[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=apache-dolphinscheduler&metric=alert_status)](https://sonarcloud.io/dashboard?id=apache-dolphinscheduler)
[![Twitter Follow](https://img.shields.io/twitter/follow/dolphinschedule.svg?style=social&label=Follow)](https://twitter.com/dolphinschedule)
[![Slack Status](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](https://s.apache.org/dolphinscheduler-slack)
[![EN doc](https://img.shields.io/badge/document-English-blue.svg)](README.md)
## About
A distributed, easily extensible visual DAG workflow task scheduling system, dedicated to solving the tangled dependencies of data processing pipelines so that the scheduling system works `out of the box`.
The key features of DolphinScheduler are as follows:
- Easy to deploy: four deployment modes are available, including Standalone, Cluster, Docker, and Kubernetes.
- Easy to use: workflows can be created and managed in four ways, including the Web UI, [Python SDK](https://dolphinscheduler.apache.org/python/main/index.html), YAML files, and the Open API
- Highly reliable and highly available: a decentralized multi-master, multi-worker architecture that natively supports horizontal scaling.
- High performance: markedly faster than comparable orchestration platforms, able to support tens of millions of tasks per day
- Cloud native: DolphinScheduler supports orchestrating workflows across multiple clouds and data centers, and supports custom task types
- Versioning of both workflows and workflow instances (including tasks)
- Rich state control of workflows and tasks, with support for pausing, stopping, and recovering them at any time
- Multi-tenancy support
- Other features such as backfill (native in the Web UI) and permission control covering projects, resources, and data sources
## QuickStart
- To try it out
- [Start with Standalone](https://dolphinscheduler.apache.org/zh-cn/docs/3.1.5/guide/installation/standalone)
- [Start with Docker](https://dolphinscheduler.apache.org/zh-cn/docs/3.1.5/guide/start/docker)
- To deploy on Kubernetes
- [Start with Kubernetes](https://dolphinscheduler.apache.org/zh-cn/docs/3.1.5/guide/installation/kubernetes)
## User Interface Screenshots
* **Homepage**: Project and workflow overview, including the latest workflow instance and task instance status statistics.
![home](images/home.png)
* **Workflow Definition**: Create and manage workflows by drag and drop, making it easy to build and maintain complex workflows.
![workflow-definition](images/workflow-definition.png)
* **Workflow Tree View**: An abstract tree structure gives a clearer understanding of the relationships between tasks.
![workflow-tree](images/workflow-tree.png)
* **Data Source**: Manage multiple external data sources such as MySQL, PostgreSQL, Hive, and Trino, providing unified data access capabilities.
![data-source](images/data-source.png)
* **Monitor**: View the status of masters, workers, and the database in real time, including server resource usage and load, and run quick health checks without logging in to the servers.
![monitor](images/monitor.png)
## Suggestions & Bug Reports
Follow [this guide](https://github.com/apache/dolphinscheduler/issues/new/choose) to report bugs or submit suggestions.
## Contributing
The community welcomes everyone to contribute; please refer to this page to find out more: [How to contribute](docs/docs/zh/contribute/join/contribute.md). You can find good first issues [here](https://github.com/apache/dolphinscheduler/contribute)
if you are contributing to DolphinScheduler for the first time.
## Community
Welcome to join the Apache DolphinScheduler community by:
- Joining the [DolphinScheduler Slack](https://s.apache.org/dolphinscheduler-slack)
- Following [DolphinScheduler Twitter](https://twitter.com/dolphinschedule) for the latest news
- Subscribing to the DolphinScheduler mailing lists: users@dolphinscheduler.apache.org for users and dev@dolphinscheduler.apache.org for developers
# Landscapes
<p align="center">
<br/><br/>
<img src="https://landscape.cncf.io/images/left-logo.svg" width="150"/>&nbsp;&nbsp;<img src="https://landscape.cncf.io/images/right-logo.svg" width="200"/>
<br/><br/>
DolphinScheduler enriches the <a href="https://landscape.cncf.io/?landscape=observability-and-analysis&license=apache-license-2-0">CNCF CLOUD NATIVE Landscape.</a>
</p>

5
deploy/README.md

@ -0,0 +1,5 @@
# DolphinScheduler for Docker and Kubernetes
* [Start Up DolphinScheduler with Docker](https://dolphinscheduler.apache.org/en-us/docs/3.1.2/guide/start/docker)
* [Start Up DolphinScheduler with Kubernetes](https://dolphinscheduler.apache.org/en-us/docs/3.1.2/guide/installation/kubernetes)

24
deploy/docker/.env

@ -0,0 +1,24 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
HUB=apache
TAG=3.1.9
TZ=Asia/Shanghai
DATABASE=postgresql
SPRING_DATASOURCE_URL=jdbc:postgresql://dolphinscheduler-postgresql:5432/dolphinscheduler
REGISTRY_ZOOKEEPER_CONNECT_STRING=dolphinscheduler-zookeeper:2181
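These variables feed the Compose files below through `env_file: .env` and `${...}` interpolation. As a quick sanity check, the fully resolved configuration can be printed before starting anything (a sketch, assuming the Docker Compose v2 CLI and running from `deploy/docker/`):

```shell
# Show the merged configuration with .env applied; the grep filter is illustrative.
docker compose --env-file .env config | grep -E 'image:|SPRING_DATASOURCE_URL'
```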

154
deploy/docker/docker-compose.yml

@ -0,0 +1,154 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
version: "3.8"
services:
dolphinscheduler-postgresql:
image: bitnami/postgresql:11.11.0
ports:
- "5432:5432"
profiles: ["all", "schema"]
environment:
POSTGRESQL_USERNAME: root
POSTGRESQL_PASSWORD: root
POSTGRESQL_DATABASE: dolphinscheduler
volumes:
- dolphinscheduler-postgresql:/bitnami/postgresql
healthcheck:
test: ["CMD", "bash", "-c", "cat < /dev/null > /dev/tcp/127.0.0.1/5432"]
interval: 5s
timeout: 60s
retries: 120
networks:
- dolphinscheduler
dolphinscheduler-zookeeper:
image: bitnami/zookeeper:3.6.2
profiles: ["all"]
environment:
ALLOW_ANONYMOUS_LOGIN: "yes"
ZOO_4LW_COMMANDS_WHITELIST: srvr,ruok,wchs,cons
volumes:
- dolphinscheduler-zookeeper:/bitnami/zookeeper
healthcheck:
test: ["CMD", "bash", "-c", "cat < /dev/null > /dev/tcp/127.0.0.1/2181"]
interval: 5s
timeout: 60s
retries: 120
networks:
- dolphinscheduler
dolphinscheduler-schema-initializer:
image: ${HUB}/dolphinscheduler-tools:${TAG}
env_file: .env
profiles: ["schema"]
command: [ tools/bin/upgrade-schema.sh ]
depends_on:
dolphinscheduler-postgresql:
condition: service_healthy
volumes:
- dolphinscheduler-logs:/opt/dolphinscheduler/logs
- dolphinscheduler-shared-local:/opt/soft
- dolphinscheduler-resource-local:/dolphinscheduler
networks:
- dolphinscheduler
dolphinscheduler-api:
image: ${HUB}/dolphinscheduler-api:${TAG}
ports:
- "12345:12345"
- "25333:25333"
profiles: ["all"]
env_file: .env
healthcheck:
test: [ "CMD", "curl", "http://localhost:12345/dolphinscheduler/actuator/health" ]
interval: 30s
timeout: 5s
retries: 3
depends_on:
dolphinscheduler-zookeeper:
condition: service_healthy
volumes:
- dolphinscheduler-logs:/opt/dolphinscheduler/logs
- dolphinscheduler-shared-local:/opt/soft
- dolphinscheduler-resource-local:/dolphinscheduler
networks:
- dolphinscheduler
dolphinscheduler-alert:
image: ${HUB}/dolphinscheduler-alert-server:${TAG}
profiles: ["all"]
env_file: .env
healthcheck:
test: [ "CMD", "curl", "http://localhost:50053/actuator/health" ]
interval: 30s
timeout: 5s
retries: 3
volumes:
- dolphinscheduler-logs:/opt/dolphinscheduler/logs
networks:
- dolphinscheduler
dolphinscheduler-master:
image: ${HUB}/dolphinscheduler-master:${TAG}
profiles: ["all"]
env_file: .env
healthcheck:
test: [ "CMD", "curl", "http://localhost:5679/actuator/health" ]
interval: 30s
timeout: 5s
retries: 3
depends_on:
dolphinscheduler-zookeeper:
condition: service_healthy
volumes:
- dolphinscheduler-logs:/opt/dolphinscheduler/logs
- dolphinscheduler-shared-local:/opt/soft
networks:
- dolphinscheduler
dolphinscheduler-worker:
image: ${HUB}/dolphinscheduler-worker:${TAG}
profiles: ["all"]
env_file: .env
healthcheck:
test: [ "CMD", "curl", "http://localhost:1235/actuator/health" ]
interval: 30s
timeout: 5s
retries: 3
depends_on:
dolphinscheduler-zookeeper:
condition: service_healthy
volumes:
- dolphinscheduler-worker-data:/tmp/dolphinscheduler
- dolphinscheduler-logs:/opt/dolphinscheduler/logs
- dolphinscheduler-shared-local:/opt/soft
- dolphinscheduler-resource-local:/dolphinscheduler
networks:
- dolphinscheduler
networks:
dolphinscheduler:
driver: bridge
volumes:
dolphinscheduler-postgresql:
dolphinscheduler-zookeeper:
dolphinscheduler-worker-data:
dolphinscheduler-logs:
dolphinscheduler-shared-local:
dolphinscheduler-resource-local:
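The `schema` and `all` profiles above imply a two-step start-up: initialize the database schema first, then bring up the services. A sketch, assuming the Docker Compose v2 CLI and running from `deploy/docker/`:

```shell
docker compose --profile schema up -d   # start PostgreSQL and run tools/bin/upgrade-schema.sh
docker compose --profile all up -d      # start ZooKeeper, api, master, worker, and alert
docker compose --profile all ps         # confirm the healthchecks report healthy
```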

132
deploy/docker/docker-stack.yml

@ -0,0 +1,132 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
version: "3.1"
services:
dolphinscheduler-postgresql:
image: bitnami/postgresql:11.11.0
environment:
TZ: Asia/Shanghai
POSTGRESQL_USERNAME: root
POSTGRESQL_PASSWORD: root
POSTGRESQL_DATABASE: dolphinscheduler
volumes:
- dolphinscheduler-postgresql:/bitnami/postgresql
networks:
- dolphinscheduler
deploy:
mode: replicated
replicas: 1
dolphinscheduler-zookeeper:
image: bitnami/zookeeper:3.6.2
environment:
TZ: Asia/Shanghai
ALLOW_ANONYMOUS_LOGIN: "yes"
ZOO_4LW_COMMANDS_WHITELIST: srvr,ruok,wchs,cons
volumes:
- dolphinscheduler-zookeeper:/bitnami/zookeeper
networks:
- dolphinscheduler
deploy:
mode: replicated
replicas: 1
dolphinscheduler-api:
image: apache/dolphinscheduler-api
ports:
- 12345:12345
- 25333:25333
env_file: .env
healthcheck:
test: [ "CMD", "curl", "http://localhost:12345/actuator/health" ]
interval: 30s
timeout: 5s
retries: 3
volumes:
- dolphinscheduler-logs:/opt/dolphinscheduler/logs
- dolphinscheduler-shared-local:/opt/soft
- dolphinscheduler-resource-local:/dolphinscheduler
networks:
- dolphinscheduler
deploy:
mode: replicated
replicas: 1
dolphinscheduler-alert:
image: apache/dolphinscheduler-alert-server
env_file: .env
healthcheck:
test: [ "CMD", "curl", "http://localhost:50053/actuator/health" ]
interval: 30s
timeout: 5s
retries: 3
volumes:
- dolphinscheduler-logs:/opt/dolphinscheduler/logs
networks:
- dolphinscheduler
deploy:
mode: replicated
replicas: 1
dolphinscheduler-master:
image: apache/dolphinscheduler-master
env_file: .env
healthcheck:
test: [ "CMD", "curl", "http://localhost:50053/actuator/health" ]
interval: 30s
timeout: 5s
retries: 3
volumes:
- dolphinscheduler-logs:/opt/dolphinscheduler/logs
- dolphinscheduler-shared-local:/opt/soft
networks:
- dolphinscheduler
deploy:
mode: replicated
replicas: 1
dolphinscheduler-worker:
image: apache/dolphinscheduler-worker
env_file: .env
healthcheck:
test: [ "CMD", "curl", "http://localhost:50053/actuator/health" ]
interval: 30s
timeout: 5s
retries: 3
volumes:
- dolphinscheduler-worker-data:/tmp/dolphinscheduler
- dolphinscheduler-logs:/opt/dolphinscheduler/logs
- dolphinscheduler-shared-local:/opt/soft
- dolphinscheduler-resource-local:/dolphinscheduler
networks:
- dolphinscheduler
deploy:
mode: replicated
replicas: 1
networks:
dolphinscheduler:
driver: overlay
volumes:
dolphinscheduler-postgresql:
dolphinscheduler-zookeeper:
dolphinscheduler-worker-data:
dolphinscheduler-logs:
dolphinscheduler-shared-local:
dolphinscheduler-resource-local:
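Because this file carries `deploy:` sections and an overlay network, it targets Docker Swarm and is deployed with `docker stack` rather than `docker compose`. A hedged sketch:

```shell
docker swarm init                                        # once, if the node is not already a swarm manager
docker stack deploy --compose-file docker-stack.yml dolphinscheduler
docker stack services dolphinscheduler                   # watch the replica counts converge to 1/1
```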

62
deploy/kubernetes/dolphinscheduler/Chart.yaml

@ -0,0 +1,62 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: v2
name: dolphinscheduler-helm
description: Apache DolphinScheduler is a distributed, easily extensible visual DAG workflow scheduling system, dedicated to solving the complex dependencies in data processing and making the scheduling system work out of the box.
home: https://dolphinscheduler.apache.org
icon: https://dolphinscheduler.apache.org/img/hlogo_colorful.svg
keywords:
- dolphinscheduler
- scheduler
# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
version: 3.1.9
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application.
appVersion: 3.1.9
dependencies:
- name: postgresql
version: 10.3.18
# Due to a change in the Bitnami repo, https://charts.bitnami.com/bitnami was truncated only
# containing entries for the latest 6 months (from January 2022 on).
# This URL: https://raw.githubusercontent.com/bitnami/charts/archive-full-index/bitnami
# contains the full 'index.yaml'.
# See detail here: https://github.com/bitnami/charts/issues/10833
repository: https://raw.githubusercontent.com/bitnami/charts/archive-full-index/bitnami
condition: postgresql.enabled
- name: zookeeper
version: 6.5.3
# Same as above.
repository: https://raw.githubusercontent.com/bitnami/charts/archive-full-index/bitnami
condition: zookeeper.enabled
- name: mysql
version: 9.4.1
repository: https://raw.githubusercontent.com/bitnami/charts/archive-full-index/bitnami
condition: mysql.enabled
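With the dependencies pinned to the archived Bitnami index as noted above, a minimal install sketch (run from `deploy/kubernetes/dolphinscheduler/`; the release and namespace names are placeholders):

```shell
helm dependency update   # fetch the postgresql, zookeeper, and mysql subcharts
helm install dolphinscheduler . --namespace dolphinscheduler --create-namespace
```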

21
deploy/kubernetes/dolphinscheduler/resources/config/common.properties

@ -0,0 +1,21 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
{{- if index .Values.conf "common" }}
{{- range $key, $value := index .Values.conf "common" }}
{{ $key }}={{ $value }}
{{- end }}
{{- end }}
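Any key/value pairs placed under `conf.common` in values.yaml are rendered into `common.properties` by the loop above. An illustrative check (the key is only an example; literal dots in a `--set` key must be backslash-escaped):

```shell
helm template dolphinscheduler . \
  --set 'conf.common.resource\.storage\.type=S3' \
  --show-only templates/configmap.yaml
```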

50
deploy/kubernetes/dolphinscheduler/templates/NOTES.txt

@ -0,0 +1,50 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
** Please be patient while the chart DolphinScheduler {{ .Chart.AppVersion }} is being deployed **
Access DolphinScheduler UI URL by:
{{- if .Values.ingress.enabled }}
DolphinScheduler UI URL: http{{ if .Values.ingress.tls.enabled }}s{{ end }}://{{ .Values.ingress.host }}/dolphinscheduler
{{- else if eq .Values.api.service.type "ClusterIP" }}
kubectl port-forward -n {{ .Release.Namespace }} svc/{{ template "dolphinscheduler.fullname" . }}-api 12345:12345
DolphinScheduler UI URL: http://127.0.0.1:12345/dolphinscheduler
{{- else if eq .Values.api.service.type "NodePort" }}
NODE_IP=$(kubectl get no -n {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}")
NODE_PORT=$(kubectl get svc {{ template "dolphinscheduler.fullname" . }}-api -n {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}")
echo http://$NODE_IP:$NODE_PORT/dolphinscheduler
DolphinScheduler UI URL: http://$NODE_IP:$NODE_PORT/dolphinscheduler
{{- else if eq .Values.api.service.type "LoadBalancer" }}
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch its status by running 'kubectl get svc {{ template "dolphinscheduler.fullname" . }}-api -n {{ .Release.Namespace }} -w'
SERVICE_IP=$(kubectl get svc {{ template "dolphinscheduler.fullname" . }}-api -n {{ .Release.Namespace }} -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo http://$SERVICE_IP:12345/dolphinscheduler
DolphinScheduler UI URL: http://$SERVICE_IP:12345/dolphinscheduler
{{- end }}

245
deploy/kubernetes/dolphinscheduler/templates/_helpers.tpl

@ -0,0 +1,245 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
{{/* vim: set filetype=mustache: */}}
{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "dolphinscheduler.fullname" -}}
{{- .Release.Name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{/*
Create the default full names of the Docker images.
*/}}
{{- define "dolphinscheduler.image.fullname.master" -}}
{{- .Values.image.registry }}/{{ .Values.image.master }}:{{ .Values.image.tag | default .Chart.AppVersion -}}
{{- end -}}
{{- define "dolphinscheduler.image.fullname.worker" -}}
{{- .Values.image.registry }}/{{ .Values.image.worker }}:{{ .Values.image.tag | default .Chart.AppVersion -}}
{{- end -}}
{{- define "dolphinscheduler.image.fullname.api" -}}
{{- .Values.image.registry }}/{{ .Values.image.api }}:{{ .Values.image.tag | default .Chart.AppVersion -}}
{{- end -}}
{{- define "dolphinscheduler.image.fullname.alert" -}}
{{- .Values.image.registry }}/{{ .Values.image.alert }}:{{ .Values.image.tag | default .Chart.AppVersion -}}
{{- end -}}
{{- define "dolphinscheduler.image.fullname.tools" -}}
{{- .Values.image.registry }}/{{ .Values.image.tools }}:{{ .Values.image.tag | default .Chart.AppVersion -}}
{{- end -}}
{{/*
Create the default common labels.
*/}}
{{- define "dolphinscheduler.common.labels" -}}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
app.kubernetes.io/version: {{ .Chart.AppVersion }}
{{- end -}}
{{/*
Create the master labels.
*/}}
{{- define "dolphinscheduler.master.labels" -}}
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-master
app.kubernetes.io/component: master
{{ include "dolphinscheduler.common.labels" . }}
{{- end -}}
{{/*
Create the worker labels.
*/}}
{{- define "dolphinscheduler.worker.labels" -}}
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-worker
app.kubernetes.io/component: worker
{{ include "dolphinscheduler.common.labels" . }}
{{- end -}}
{{/*
Create the alert labels.
*/}}
{{- define "dolphinscheduler.alert.labels" -}}
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-alert
app.kubernetes.io/component: alert
{{ include "dolphinscheduler.common.labels" . }}
{{- end -}}
{{/*
Create the api labels.
*/}}
{{- define "dolphinscheduler.api.labels" -}}
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-api
app.kubernetes.io/component: api
{{ include "dolphinscheduler.common.labels" . }}
{{- end -}}
{{/*
Create a default fully qualified postgresql name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
*/}}
{{- define "dolphinscheduler.postgresql.fullname" -}}
{{- $name := default "postgresql" .Values.postgresql.nameOverride -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{/*
Create a default fully qualified mysql name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
*/}}
{{- define "dolphinscheduler.mysql.fullname" -}}
{{- $name := default "mysql" .Values.mysql.nameOverride -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{/*
Create a default fully qualified zookeeper name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
*/}}
{{- define "dolphinscheduler.zookeeper.fullname" -}}
{{- $name := default "zookeeper" .Values.zookeeper.nameOverride -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{/*
Create a default fully qualified zookeeper quorum.
*/}}
{{- define "dolphinscheduler.zookeeper.quorum" -}}
{{- $port := default "2181" .Values.zookeeper.service.port | toString -}}
{{- printf "%s:%s" (include "dolphinscheduler.zookeeper.fullname" .) $port -}}
{{- end -}}
{{/*
Create the database environment variables.
*/}}
{{- define "dolphinscheduler.database.env_vars" -}}
- name: DATABASE
{{- if .Values.postgresql.enabled }}
value: "postgresql"
{{- else if .Values.mysql.enabled }}
value: "mysql"
{{- else }}
value: {{ .Values.externalDatabase.type | quote }}
{{- end }}
- name: SPRING_DATASOURCE_URL
{{- if .Values.postgresql.enabled }}
value: jdbc:postgresql://{{ template "dolphinscheduler.postgresql.fullname" . }}:5432/{{ .Values.postgresql.postgresqlDatabase }}?{{ .Values.postgresql.params }}
{{- else if .Values.mysql.enabled }}
value: jdbc:mysql://{{ template "dolphinscheduler.mysql.fullname" . }}:3306/{{ .Values.mysql.auth.database }}?{{ .Values.mysql.auth.params }}
{{- else }}
value: jdbc:{{ .Values.externalDatabase.type }}://{{ .Values.externalDatabase.host }}:{{ .Values.externalDatabase.port }}/{{ .Values.externalDatabase.database }}?{{ .Values.externalDatabase.params }}
{{- end }}
- name: SPRING_DATASOURCE_USERNAME
{{- if .Values.postgresql.enabled }}
value: {{ .Values.postgresql.postgresqlUsername }}
{{- else if .Values.mysql.enabled }}
value: {{ .Values.mysql.auth.username }}
{{- else }}
value: {{ .Values.externalDatabase.username | quote }}
{{- end }}
- name: SPRING_DATASOURCE_PASSWORD
valueFrom:
secretKeyRef:
{{- if .Values.postgresql.enabled }}
name: {{ template "dolphinscheduler.postgresql.fullname" . }}
key: postgresql-password
{{- else if .Values.mysql.enabled }}
name: {{ template "dolphinscheduler.mysql.fullname" . }}
key: mysql-password
{{- else }}
name: {{ include "dolphinscheduler.fullname" . }}-externaldb
key: database-password
{{- end }}
{{- end -}}
{{/*
Wait for database to be ready.
*/}}
{{- define "dolphinscheduler.database.wait-for-ready" -}}
- name: wait-for-database
image: busybox:1.30
imagePullPolicy: IfNotPresent
{{- if .Values.postgresql.enabled }}
command: ['sh', '-xc', 'for i in $(seq 1 180); do nc -z -w3 {{ template "dolphinscheduler.postgresql.fullname" . }} 5432 && exit 0 || sleep 5; done; exit 1']
{{- else if .Values.mysql.enabled }}
command: ['sh', '-xc', 'for i in $(seq 1 180); do nc -z -w3 {{ template "dolphinscheduler.mysql.fullname" . }} 3306 && exit 0 || sleep 5; done; exit 1']
{{- else }}
command: ['sh', '-xc', 'for i in $(seq 1 180); do nc -z -w3 {{ .Values.externalDatabase.host }} {{ .Values.externalDatabase.port }} && exit 0 || sleep 5; done; exit 1']
{{- end }}
{{- end -}}
{{/*
Create the registry environment variables.
*/}}
{{- define "dolphinscheduler.registry.env_vars" -}}
- name: REGISTRY_TYPE
{{- if .Values.zookeeper.enabled }}
value: "zookeeper"
{{- else }}
value: {{ .Values.externalRegistry.registryPluginName }}
{{- end }}
- name: REGISTRY_ZOOKEEPER_CONNECT_STRING
{{- if .Values.zookeeper.enabled }}
value: {{ template "dolphinscheduler.zookeeper.quorum" . }}
{{- else }}
value: {{ .Values.externalRegistry.registryServers }}
{{- end }}
{{- end -}}
{{/*
Create a sharedStoragePersistence volume.
*/}}
{{- define "dolphinscheduler.sharedStorage.volume" -}}
{{- if .Values.common.sharedStoragePersistence.enabled -}}
- name: {{ include "dolphinscheduler.fullname" . }}-shared
persistentVolumeClaim:
claimName: {{ include "dolphinscheduler.fullname" . }}-shared
{{- end -}}
{{- end -}}
{{/*
Create a sharedStoragePersistence volumeMount.
*/}}
{{- define "dolphinscheduler.sharedStorage.volumeMount" -}}
{{- if .Values.common.sharedStoragePersistence.enabled -}}
- mountPath: {{ .Values.common.sharedStoragePersistence.mountPath | quote }}
name: {{ include "dolphinscheduler.fullname" . }}-shared
{{- end -}}
{{- end -}}
{{/*
Create a fsFileResourcePersistence volume.
*/}}
{{- define "dolphinscheduler.fsFileResource.volume" -}}
{{- if .Values.common.fsFileResourcePersistence.enabled -}}
- name: {{ include "dolphinscheduler.fullname" . }}-fs-file
persistentVolumeClaim:
claimName: {{ include "dolphinscheduler.fullname" . }}-fs-file
{{- end -}}
{{- end -}}
{{/*
Create a fsFileResourcePersistence volumeMount.
*/}}
{{- define "dolphinscheduler.fsFileResource.volumeMount" -}}
{{- if .Values.common.fsFileResourcePersistence.enabled -}}
- mountPath: {{ default "/dolphinscheduler" .Values.common.configmap.RESOURCE_UPLOAD_PATH | quote }}
name: {{ include "dolphinscheduler.fullname" . }}-fs-file
{{- end -}}
{{- end -}}
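One way to see which branch of `dolphinscheduler.database.env_vars` a given configuration selects is to render the chart offline (the release name `ds` is a placeholder):

```shell
helm template ds . --set postgresql.enabled=false --set mysql.enabled=true \
  | grep -A1 'SPRING_DATASOURCE_URL'   # expect a jdbc:mysql:// URL here
```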

29
deploy/kubernetes/dolphinscheduler/templates/configmap-dolphinscheduler-common.yaml

@ -0,0 +1,29 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
{{- if .Values.common.configmap }}
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "dolphinscheduler.fullname" . }}-common
labels:
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-common
{{- include "dolphinscheduler.common.labels" . | nindent 4 }}
data:
{{- range $key, $value := .Values.common.configmap }}
{{ $key }}: {{ $value | quote }}
{{- end }}
{{- end }}

26
deploy/kubernetes/dolphinscheduler/templates/configmap.yaml

@ -0,0 +1,26 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "dolphinscheduler.fullname" . }}-configs
labels:
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-common
{{- include "dolphinscheduler.common.labels" . | nindent 4 }}
data:
common_properties: |-
{{ tpl (.Files.Get "resources/config/common.properties") . | indent 4 }}

122
deploy/kubernetes/dolphinscheduler/templates/deployment-dolphinscheduler-alert.yaml

@ -0,0 +1,122 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "dolphinscheduler.fullname" . }}-alert
labels:
{{- include "dolphinscheduler.alert.labels" . | nindent 4 }}
spec:
replicas: {{ .Values.alert.replicas }}
selector:
matchLabels:
{{- include "dolphinscheduler.alert.labels" . | nindent 6 }}
strategy:
type: {{ .Values.alert.strategy.type | quote }}
rollingUpdate:
maxSurge: {{ .Values.alert.strategy.rollingUpdate.maxSurge | quote }}
maxUnavailable: {{ .Values.alert.strategy.rollingUpdate.maxUnavailable | quote }}
template:
metadata:
labels:
{{- include "dolphinscheduler.alert.labels" . | nindent 8 }}
{{- if .Values.alert.annotations }}
annotations:
{{- toYaml .Values.alert.annotations | nindent 8 }}
{{- end }}
spec:
serviceAccountName: {{ template "dolphinscheduler.fullname" . }}
{{- if .Values.alert.affinity }}
affinity:
{{- toYaml .Values.alert.affinity | nindent 8 }}
{{- end }}
{{- if .Values.alert.nodeSelector }}
nodeSelector:
{{- toYaml .Values.alert.nodeSelector | nindent 8 }}
{{- end }}
{{- if .Values.alert.tolerations }}
tolerations:
{{- toYaml .Values.alert.tolerations | nindent 8 }}
{{- end }}
{{- if .Values.image.pullSecret }}
imagePullSecrets:
- name: {{ .Values.image.pullSecret }}
{{- end }}
containers:
- name: {{ include "dolphinscheduler.fullname" . }}-alert
image: {{ include "dolphinscheduler.image.fullname.alert" . }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- containerPort: 50052
name: "alert-port"
- containerPort: 50053
name: "actuator-port"
env:
- name: TZ
value: {{ .Values.timezone }}
- name: SPRING_JACKSON_TIME_ZONE
value: {{ .Values.timezone }}
{{- include "dolphinscheduler.database.env_vars" . | nindent 12 }}
{{- include "dolphinscheduler.registry.env_vars" . | nindent 12 }}
{{ range $key, $value := .Values.alert.env }}
- name: {{ $key }}
value: {{ $value | quote }}
{{ end }}
envFrom:
- configMapRef:
name: {{ include "dolphinscheduler.fullname" . }}-common
{{- if .Values.alert.resources }}
resources:
{{- toYaml .Values.alert.resources | nindent 12 }}
{{- end }}
{{- if .Values.alert.livenessProbe.enabled }}
livenessProbe:
exec:
command: ["curl", "-s", "http://localhost:50053/actuator/health/liveness"]
initialDelaySeconds: {{ .Values.alert.livenessProbe.initialDelaySeconds }}
periodSeconds: {{ .Values.alert.livenessProbe.periodSeconds }}
timeoutSeconds: {{ .Values.alert.livenessProbe.timeoutSeconds }}
successThreshold: {{ .Values.alert.livenessProbe.successThreshold }}
failureThreshold: {{ .Values.alert.livenessProbe.failureThreshold }}
{{- end }}
{{- if .Values.alert.readinessProbe.enabled }}
readinessProbe:
exec:
command: ["curl", "-s", "http://localhost:50053/actuator/health/readiness"]
initialDelaySeconds: {{ .Values.alert.readinessProbe.initialDelaySeconds }}
periodSeconds: {{ .Values.alert.readinessProbe.periodSeconds }}
timeoutSeconds: {{ .Values.alert.readinessProbe.timeoutSeconds }}
successThreshold: {{ .Values.alert.readinessProbe.successThreshold }}
failureThreshold: {{ .Values.alert.readinessProbe.failureThreshold }}
{{- end }}
volumeMounts:
- mountPath: "/opt/dolphinscheduler/logs"
name: {{ include "dolphinscheduler.fullname" . }}-alert
- name: config-volume
mountPath: /opt/dolphinscheduler/conf/common.properties
subPath: common_properties
volumes:
- name: {{ include "dolphinscheduler.fullname" . }}-alert
{{- if .Values.alert.persistentVolumeClaim.enabled }}
persistentVolumeClaim:
claimName: {{ include "dolphinscheduler.fullname" . }}-alert
{{- else }}
emptyDir: {}
{{- end }}
- name: config-volume
configMap:
name: {{ include "dolphinscheduler.fullname" . }}-configs

126
deploy/kubernetes/dolphinscheduler/templates/deployment-dolphinscheduler-api.yaml

@ -0,0 +1,126 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "dolphinscheduler.fullname" . }}-api
labels:
{{- include "dolphinscheduler.api.labels" . | nindent 4 }}
spec:
replicas: {{ .Values.api.replicas }}
selector:
matchLabels:
{{- include "dolphinscheduler.api.labels" . | nindent 6 }}
strategy:
type: {{ .Values.api.strategy.type | quote }}
rollingUpdate:
maxSurge: {{ .Values.api.strategy.rollingUpdate.maxSurge | quote }}
maxUnavailable: {{ .Values.api.strategy.rollingUpdate.maxUnavailable | quote }}
template:
metadata:
labels:
{{- include "dolphinscheduler.api.labels" . | nindent 8 }}
{{- if .Values.api.annotations }}
annotations:
{{- toYaml .Values.api.annotations | nindent 8 }}
{{- end }}
spec:
serviceAccountName: {{ template "dolphinscheduler.fullname" . }}
{{- if .Values.api.affinity }}
affinity:
{{- toYaml .Values.api.affinity | nindent 8 }}
{{- end }}
{{- if .Values.api.nodeSelector }}
nodeSelector:
{{- toYaml .Values.api.nodeSelector | nindent 8 }}
{{- end }}
{{- if .Values.api.tolerations }}
tolerations:
{{- toYaml .Values.api.tolerations | nindent 8 }}
{{- end }}
{{- if .Values.image.pullSecret }}
imagePullSecrets:
- name: {{ .Values.image.pullSecret }}
{{- end }}
containers:
- name: {{ include "dolphinscheduler.fullname" . }}-api
image: {{ include "dolphinscheduler.image.fullname.api" . }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- containerPort: 12345
name: "api-port"
- containerPort: 25333
name: "python-api-port"
env:
- name: TZ
value: {{ .Values.timezone }}
- name: SPRING_JACKSON_TIME_ZONE
value: {{ .Values.timezone }}
{{- include "dolphinscheduler.database.env_vars" . | nindent 12 }}
{{- include "dolphinscheduler.registry.env_vars" . | nindent 12 }}
{{ range $key, $value := .Values.api.env }}
- name: {{ $key }}
value: {{ $value | quote }}
{{ end }}
envFrom:
- configMapRef:
name: {{ include "dolphinscheduler.fullname" . }}-common
{{- if .Values.api.resources }}
resources:
{{- toYaml .Values.api.resources | nindent 12 }}
{{- end }}
{{- if .Values.api.livenessProbe.enabled }}
livenessProbe:
exec:
command: ["curl", "-s", "http://localhost:12345/dolphinscheduler/actuator/health/liveness"]
initialDelaySeconds: {{ .Values.api.livenessProbe.initialDelaySeconds }}
periodSeconds: {{ .Values.api.livenessProbe.periodSeconds }}
timeoutSeconds: {{ .Values.api.livenessProbe.timeoutSeconds }}
successThreshold: {{ .Values.api.livenessProbe.successThreshold }}
failureThreshold: {{ .Values.api.livenessProbe.failureThreshold }}
{{- end }}
{{- if .Values.api.readinessProbe.enabled }}
readinessProbe:
exec:
command: ["curl", "-s", "http://localhost:12345/dolphinscheduler/actuator/health/readiness"]
initialDelaySeconds: {{ .Values.api.readinessProbe.initialDelaySeconds }}
periodSeconds: {{ .Values.api.readinessProbe.periodSeconds }}
timeoutSeconds: {{ .Values.api.readinessProbe.timeoutSeconds }}
successThreshold: {{ .Values.api.readinessProbe.successThreshold }}
failureThreshold: {{ .Values.api.readinessProbe.failureThreshold }}
{{- end }}
volumeMounts:
- mountPath: "/opt/dolphinscheduler/logs"
name: {{ include "dolphinscheduler.fullname" . }}-api
- name: config-volume
mountPath: /opt/dolphinscheduler/conf/common.properties
subPath: common_properties
{{- include "dolphinscheduler.sharedStorage.volumeMount" . | nindent 12 }}
{{- include "dolphinscheduler.fsFileResource.volumeMount" . | nindent 12 }}
volumes:
- name: {{ include "dolphinscheduler.fullname" . }}-api
{{- if .Values.api.persistentVolumeClaim.enabled }}
persistentVolumeClaim:
claimName: {{ include "dolphinscheduler.fullname" . }}-api
{{- else }}
emptyDir: {}
{{- end }}
- name: config-volume
configMap:
name: {{ include "dolphinscheduler.fullname" . }}-configs
{{- include "dolphinscheduler.sharedStorage.volume" . | nindent 8 }}
{{- include "dolphinscheduler.fsFileResource.volume" . | nindent 8 }}

60
deploy/kubernetes/dolphinscheduler/templates/ingress.yaml

@ -0,0 +1,60 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
{{- if .Values.ingress.enabled }}
{{- if .Capabilities.APIVersions.Has "networking.k8s.io/v1/Ingress" }}
apiVersion: networking.k8s.io/v1
{{- else if .Capabilities.APIVersions.Has "networking.k8s.io/v1beta1/Ingress" }}
apiVersion: networking.k8s.io/v1beta1
{{- else }}
apiVersion: extensions/v1beta1
{{- end }}
kind: Ingress
metadata:
name: {{ include "dolphinscheduler.fullname" . }}
labels:
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}
{{- include "dolphinscheduler.common.labels" . | nindent 4 }}
{{- with .Values.ingress.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
rules:
- host: "{{ .Values.ingress.host }}"
http:
paths:
- path: {{ .Values.ingress.path }}
backend:
{{- if .Capabilities.APIVersions.Has "networking.k8s.io/v1/Ingress" }}
service:
name: {{ include "dolphinscheduler.fullname" . }}-api
port:
name: api-port
{{- else }}
serviceName: {{ include "dolphinscheduler.fullname" . }}-api
servicePort: api-port
{{- end }}
{{- if .Capabilities.APIVersions.Has "networking.k8s.io/v1/Ingress" }}
pathType: Prefix
{{- end }}
{{- if .Values.ingress.tls.enabled }}
tls:
- hosts:
- {{ .Values.ingress.host }}
secretName: {{ .Values.ingress.tls.secretName }}
{{- end }}
{{- end }}
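A hedged example of enabling this ingress at install time; the host and TLS secret names are placeholders:

```shell
helm upgrade --install dolphinscheduler . \
  --set ingress.enabled=true \
  --set ingress.host=dolphinscheduler.example.com \
  --set ingress.path=/dolphinscheduler \
  --set ingress.tls.enabled=true \
  --set ingress.tls.secretName=dolphinscheduler-tls
```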

51
deploy/kubernetes/dolphinscheduler/templates/job-dolphinscheduler-schema-initializer.yaml

@ -0,0 +1,51 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: batch/v1
kind: Job
metadata:
name: {{ include "dolphinscheduler.fullname" . }}-db-init-job
labels:
{{- include "dolphinscheduler.common.labels" . | nindent 4 }}
spec:
template:
metadata:
labels:
{{- include "dolphinscheduler.common.labels" . | nindent 8 }}
spec:
{{- if .Values.image.pullSecret }}
imagePullSecrets:
- name: {{ .Values.image.pullSecret }}
{{- end }}
restartPolicy: Never
initContainers:
{{- include "dolphinscheduler.database.wait-for-ready" . | nindent 6 }}
containers:
- name: {{ include "dolphinscheduler.fullname" . }}-db-init-job
image: {{ include "dolphinscheduler.image.fullname.tools" . }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
args:
- tools/bin/upgrade-schema.sh
env:
- name: TZ
value: {{ .Values.timezone }}
- name: SPRING_JACKSON_TIME_ZONE
value: {{ .Values.timezone }}
{{- include "dolphinscheduler.database.env_vars" . | nindent 12 }}
{{- include "dolphinscheduler.registry.env_vars" . | nindent 12 }}
envFrom:
- configMapRef:
name: {{ include "dolphinscheduler.fullname" . }}-common

34
deploy/kubernetes/dolphinscheduler/templates/pvc-dolphinscheduler-alert.yaml

@ -0,0 +1,34 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
{{- if .Values.alert.persistentVolumeClaim.enabled }}
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: {{ include "dolphinscheduler.fullname" . }}-alert
labels:
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-alert
{{- include "dolphinscheduler.common.labels" . | nindent 4 }}
spec:
accessModes:
{{- range .Values.alert.persistentVolumeClaim.accessModes }}
- {{ . | quote }}
{{- end }}
storageClassName: {{ .Values.alert.persistentVolumeClaim.storageClassName | quote }}
resources:
requests:
storage: {{ .Values.alert.persistentVolumeClaim.storage | quote }}
{{- end }}

34
deploy/kubernetes/dolphinscheduler/templates/pvc-dolphinscheduler-api.yaml

@ -0,0 +1,34 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
{{- if .Values.api.persistentVolumeClaim.enabled }}
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: {{ include "dolphinscheduler.fullname" . }}-api
labels:
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-api
{{- include "dolphinscheduler.common.labels" . | nindent 4 }}
spec:
accessModes:
{{- range .Values.api.persistentVolumeClaim.accessModes }}
- {{ . | quote }}
{{- end }}
storageClassName: {{ .Values.api.persistentVolumeClaim.storageClassName | quote }}
resources:
requests:
storage: {{ .Values.api.persistentVolumeClaim.storage | quote }}
{{- end }}

36
deploy/kubernetes/dolphinscheduler/templates/pvc-dolphinscheduler-fs-file.yaml

@ -0,0 +1,36 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
{{- if .Values.common.fsFileResourcePersistence.enabled }}
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: {{ include "dolphinscheduler.fullname" . }}-fs-file
labels:
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-fs-file
{{- include "dolphinscheduler.common.labels" . | nindent 4 }}
annotations:
"helm.sh/resource-policy": keep
spec:
accessModes:
{{- range .Values.common.fsFileResourcePersistence.accessModes }}
- {{ . | quote }}
{{- end }}
storageClassName: {{ .Values.common.fsFileResourcePersistence.storageClassName | quote }}
resources:
requests:
storage: {{ .Values.common.fsFileResourcePersistence.storage | quote }}
{{- end }}

36
deploy/kubernetes/dolphinscheduler/templates/pvc-dolphinscheduler-shared.yaml

@ -0,0 +1,36 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
{{- if .Values.common.sharedStoragePersistence.enabled }}
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: {{ include "dolphinscheduler.fullname" . }}-shared
labels:
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-shared
{{- include "dolphinscheduler.common.labels" . | nindent 4 }}
annotations:
"helm.sh/resource-policy": keep
spec:
accessModes:
{{- range .Values.common.sharedStoragePersistence.accessModes }}
- {{ . | quote }}
{{- end }}
storageClassName: {{ .Values.common.sharedStoragePersistence.storageClassName | quote }}
resources:
requests:
storage: {{ .Values.common.sharedStoragePersistence.storage | quote }}
{{- end }}

53
deploy/kubernetes/dolphinscheduler/templates/rbac.yaml

@@ -0,0 +1,53 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
app: {{ template "dolphinscheduler.fullname" . }}
chart: {{ .Chart.Name }}-{{ .Chart.Version }}
release: {{ .Release.Name }}
name: {{ template "dolphinscheduler.fullname" . }}
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: {{ template "dolphinscheduler.fullname" . }}
labels:
app: {{ template "dolphinscheduler.fullname" . }}
chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
release: "{{ .Release.Name }}"
rules:
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: {{ template "dolphinscheduler.fullname" . }}
labels:
app: {{ template "dolphinscheduler.fullname" . }}
chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
release: "{{ .Release.Name }}"
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: {{ template "dolphinscheduler.fullname" . }}
subjects:
- kind: ServiceAccount
name: {{ template "dolphinscheduler.fullname" . }}
namespace: {{ .Release.Namespace }}

28
deploy/kubernetes/dolphinscheduler/templates/secret-external-database.yaml

@@ -0,0 +1,28 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
{{- if not .Values.postgresql.enabled }}
apiVersion: v1
kind: Secret
metadata:
name: {{ include "dolphinscheduler.fullname" . }}-externaldb
labels:
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-externaldb
{{- include "dolphinscheduler.common.labels" . | nindent 4 }}
type: Opaque
data:
database-password: {{ .Values.externalDatabase.password | b64enc | quote }}
{{- end }}

136
deploy/kubernetes/dolphinscheduler/templates/statefulset-dolphinscheduler-master.yaml

@@ -0,0 +1,136 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: {{ include "dolphinscheduler.fullname" . }}-master
labels:
{{- include "dolphinscheduler.master.labels" . | nindent 4 }}
spec:
podManagementPolicy: {{ .Values.master.podManagementPolicy }}
replicas: {{ .Values.master.replicas }}
selector:
matchLabels:
{{- include "dolphinscheduler.master.labels" . | nindent 6 }}
serviceName: {{ template "dolphinscheduler.fullname" . }}-master-headless
template:
metadata:
labels:
{{- include "dolphinscheduler.master.labels" . | nindent 8 }}
{{- if .Values.master.annotations }}
annotations:
{{- toYaml .Values.master.annotations | nindent 8 }}
{{- end }}
spec:
serviceAccountName: {{ template "dolphinscheduler.fullname" . }}
{{- if .Values.master.affinity }}
affinity:
{{- toYaml .Values.master.affinity | nindent 8 }}
{{- end }}
{{- if .Values.master.nodeSelector }}
nodeSelector:
{{- toYaml .Values.master.nodeSelector | nindent 8 }}
{{- end }}
{{- if .Values.master.tolerations }}
tolerations:
{{- toYaml .Values.master.tolerations | nindent 8 }}
{{- end }}
{{- if .Values.image.pullSecret }}
imagePullSecrets:
- name: {{ .Values.image.pullSecret }}
{{- end }}
containers:
- name: {{ include "dolphinscheduler.fullname" . }}-master
image: {{ include "dolphinscheduler.image.fullname.master" . }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- containerPort: 5678
name: "master-port"
env:
- name: TZ
value: {{ .Values.timezone }}
- name: SPRING_JACKSON_TIME_ZONE
value: {{ .Values.timezone }}
{{- include "dolphinscheduler.database.env_vars" . | nindent 12 }}
{{- include "dolphinscheduler.registry.env_vars" . | nindent 12 }}
{{ range $key, $value := .Values.master.env }}
- name: {{ $key }}
value: {{ $value | quote }}
{{ end }}
envFrom:
- configMapRef:
name: {{ include "dolphinscheduler.fullname" . }}-common
{{- if .Values.master.resources }}
resources:
{{- toYaml .Values.master.resources | nindent 12 }}
{{- end }}
{{- if .Values.master.livenessProbe.enabled }}
livenessProbe:
exec:
command: ["curl", "-s", "http://localhost:5679/actuator/health/liveness"]
initialDelaySeconds: {{ .Values.master.livenessProbe.initialDelaySeconds }}
periodSeconds: {{ .Values.master.livenessProbe.periodSeconds }}
timeoutSeconds: {{ .Values.master.livenessProbe.timeoutSeconds }}
successThreshold: {{ .Values.master.livenessProbe.successThreshold }}
failureThreshold: {{ .Values.master.livenessProbe.failureThreshold }}
{{- end }}
{{- if .Values.master.readinessProbe.enabled }}
readinessProbe:
exec:
command: ["curl", "-s", "http://localhost:5679/actuator/health/readiness"]
initialDelaySeconds: {{ .Values.master.readinessProbe.initialDelaySeconds }}
periodSeconds: {{ .Values.master.readinessProbe.periodSeconds }}
timeoutSeconds: {{ .Values.master.readinessProbe.timeoutSeconds }}
successThreshold: {{ .Values.master.readinessProbe.successThreshold }}
failureThreshold: {{ .Values.master.readinessProbe.failureThreshold }}
{{- end }}
volumeMounts:
- mountPath: "/opt/dolphinscheduler/logs"
name: {{ include "dolphinscheduler.fullname" . }}-master
{{- include "dolphinscheduler.sharedStorage.volumeMount" . | nindent 12 }}
- name: config-volume
mountPath: /opt/dolphinscheduler/conf/common.properties
subPath: common_properties
volumes:
- name: {{ include "dolphinscheduler.fullname" . }}-master
{{- if .Values.master.persistentVolumeClaim.enabled }}
persistentVolumeClaim:
claimName: {{ include "dolphinscheduler.fullname" . }}-master
{{- else }}
emptyDir: {}
{{- end }}
{{- include "dolphinscheduler.sharedStorage.volume" . | nindent 8 }}
- name: config-volume
configMap:
name: {{ include "dolphinscheduler.fullname" . }}-configs
{{- if .Values.master.persistentVolumeClaim.enabled }}
volumeClaimTemplates:
- metadata:
name: {{ include "dolphinscheduler.fullname" . }}-master
labels:
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-master
{{- include "dolphinscheduler.common.labels" . | nindent 10 }}
spec:
accessModes:
{{- range .Values.master.persistentVolumeClaim.accessModes }}
- {{ . | quote }}
{{- end }}
storageClassName: {{ .Values.master.persistentVolumeClaim.storageClassName | quote }}
resources:
requests:
storage: {{ .Values.master.persistentVolumeClaim.storage | quote }}
{{- end }}

167
deploy/kubernetes/dolphinscheduler/templates/statefulset-dolphinscheduler-worker.yaml

@@ -0,0 +1,167 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: {{ include "dolphinscheduler.fullname" . }}-worker
labels:
{{- include "dolphinscheduler.worker.labels" . | nindent 4 }}
spec:
podManagementPolicy: {{ .Values.worker.podManagementPolicy }}
replicas: {{ .Values.worker.replicas }}
selector:
matchLabels:
{{- include "dolphinscheduler.worker.labels" . | nindent 6 }}
serviceName: {{ template "dolphinscheduler.fullname" . }}-worker-headless
template:
metadata:
labels:
{{- include "dolphinscheduler.worker.labels" . | nindent 8 }}
{{- if .Values.worker.annotations }}
annotations:
{{- toYaml .Values.worker.annotations | nindent 8 }}
{{- end }}
spec:
serviceAccountName: {{ template "dolphinscheduler.fullname" . }}
{{- if .Values.worker.affinity }}
affinity:
{{- toYaml .Values.worker.affinity | nindent 8 }}
{{- end }}
{{- if .Values.worker.nodeSelector }}
nodeSelector:
{{- toYaml .Values.worker.nodeSelector | nindent 8 }}
{{- end }}
{{- if .Values.worker.tolerations }}
tolerations:
{{- toYaml .Values.worker.tolerations | nindent 8 }}
{{- end }}
{{- if .Values.image.pullSecret }}
imagePullSecrets:
- name: {{ .Values.image.pullSecret }}
{{- end }}
containers:
- name: {{ include "dolphinscheduler.fullname" . }}-worker
image: {{ include "dolphinscheduler.image.fullname.worker" . }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- containerPort: 1234
name: "worker-port"
env:
- name: TZ
value: {{ .Values.timezone }}
- name: SPRING_JACKSON_TIME_ZONE
value: {{ .Values.timezone }}
- name: WORKER_ALERT_LISTEN_HOST
value: {{ include "dolphinscheduler.fullname" . }}-alert
{{- include "dolphinscheduler.database.env_vars" . | nindent 12 }}
{{- include "dolphinscheduler.registry.env_vars" . | nindent 12 }}
{{ range $key, $value := .Values.worker.env }}
- name: {{ $key }}
value: {{ $value | quote }}
{{ end }}
envFrom:
- configMapRef:
name: {{ include "dolphinscheduler.fullname" . }}-common
{{- if .Values.worker.resources }}
resources:
{{- toYaml .Values.worker.resources | nindent 12 }}
{{- end }}
{{- if .Values.worker.livenessProbe.enabled }}
livenessProbe:
exec:
command: ["curl", "-s", "http://localhost:1235/actuator/health/liveness"]
initialDelaySeconds: {{ .Values.worker.livenessProbe.initialDelaySeconds }}
periodSeconds: {{ .Values.worker.livenessProbe.periodSeconds }}
timeoutSeconds: {{ .Values.worker.livenessProbe.timeoutSeconds }}
successThreshold: {{ .Values.worker.livenessProbe.successThreshold }}
failureThreshold: {{ .Values.worker.livenessProbe.failureThreshold }}
{{- end }}
{{- if .Values.worker.readinessProbe.enabled }}
readinessProbe:
exec:
command: ["curl", "-s", "http://localhost:1235/actuator/health/readiness"]
initialDelaySeconds: {{ .Values.worker.readinessProbe.initialDelaySeconds }}
periodSeconds: {{ .Values.worker.readinessProbe.periodSeconds }}
timeoutSeconds: {{ .Values.worker.readinessProbe.timeoutSeconds }}
successThreshold: {{ .Values.worker.readinessProbe.successThreshold }}
failureThreshold: {{ .Values.worker.readinessProbe.failureThreshold }}
{{- end }}
volumeMounts:
- mountPath: {{ default "/tmp/dolphinscheduler" .Values.common.configmap.DATA_BASEDIR_PATH | quote }}
name: {{ include "dolphinscheduler.fullname" . }}-worker-data
- mountPath: "/opt/dolphinscheduler/logs"
name: {{ include "dolphinscheduler.fullname" . }}-worker-logs
- name: config-volume
mountPath: /opt/dolphinscheduler/conf/common.properties
subPath: common_properties
{{- include "dolphinscheduler.sharedStorage.volumeMount" . | nindent 12 }}
{{- include "dolphinscheduler.fsFileResource.volumeMount" . | nindent 12 }}
volumes:
- name: {{ include "dolphinscheduler.fullname" . }}-worker-data
{{- if .Values.worker.persistentVolumeClaim.dataPersistentVolume.enabled }}
persistentVolumeClaim:
claimName: {{ include "dolphinscheduler.fullname" . }}-worker-data
{{- else }}
emptyDir: {}
{{- end }}
- name: {{ include "dolphinscheduler.fullname" . }}-worker-logs
{{- if .Values.worker.persistentVolumeClaim.logsPersistentVolume.enabled }}
persistentVolumeClaim:
claimName: {{ include "dolphinscheduler.fullname" . }}-worker-logs
{{- else }}
emptyDir: {}
{{- end }}
- name: config-volume
configMap:
name: {{ include "dolphinscheduler.fullname" . }}-configs
{{- include "dolphinscheduler.sharedStorage.volume" . | nindent 8 }}
{{- include "dolphinscheduler.fsFileResource.volume" . | nindent 8 }}
{{- if .Values.worker.persistentVolumeClaim.enabled }}
volumeClaimTemplates:
{{- if .Values.worker.persistentVolumeClaim.dataPersistentVolume.enabled }}
- metadata:
name: {{ include "dolphinscheduler.fullname" . }}-worker-data
labels:
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-worker-data
{{- include "dolphinscheduler.common.labels" . | nindent 10 }}
spec:
accessModes:
{{- range .Values.worker.persistentVolumeClaim.dataPersistentVolume.accessModes }}
- {{ . | quote }}
{{- end }}
storageClassName: {{ .Values.worker.persistentVolumeClaim.dataPersistentVolume.storageClassName | quote }}
resources:
requests:
storage: {{ .Values.worker.persistentVolumeClaim.dataPersistentVolume.storage | quote }}
{{- end }}
{{- if .Values.worker.persistentVolumeClaim.logsPersistentVolume.enabled }}
- metadata:
name: {{ include "dolphinscheduler.fullname" . }}-worker-logs
labels:
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-worker-logs
{{- include "dolphinscheduler.common.labels" . | nindent 10 }}
spec:
accessModes:
{{- range .Values.worker.persistentVolumeClaim.logsPersistentVolume.accessModes }}
- {{ . | quote }}
{{- end }}
storageClassName: {{ .Values.worker.persistentVolumeClaim.logsPersistentVolume.storageClassName | quote }}
resources:
requests:
storage: {{ .Values.worker.persistentVolumeClaim.logsPersistentVolume.storage | quote }}
{{- end }}
{{- end }}

35
deploy/kubernetes/dolphinscheduler/templates/svc-dolphinscheduler-alert.yaml

@@ -0,0 +1,35 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: v1
kind: Service
metadata:
name: {{ include "dolphinscheduler.fullname" . }}-alert
labels:
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-alert
{{- include "dolphinscheduler.common.labels" . | nindent 4 }}
spec:
ports:
- port: 50052
targetPort: alert-port
protocol: TCP
name: alert-port
- port: 50053
targetPort: actuator-port
protocol: TCP
name: actuator-port
selector:
{{- include "dolphinscheduler.alert.labels" . | nindent 4 }}

61
deploy/kubernetes/dolphinscheduler/templates/svc-dolphinscheduler-api.yaml

@@ -0,0 +1,61 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: v1
kind: Service
metadata:
name: {{ include "dolphinscheduler.fullname" . }}-api
labels:
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-api
{{- include "dolphinscheduler.common.labels" . | nindent 4 }}
{{- if and (eq .Values.api.service.type "LoadBalancer") .Values.api.service.annotations }}
annotations:
{{- range $key, $value := .Values.api.service.annotations }}
{{ $key }}: {{ $value | quote }}
{{- end }}
{{- end }}
spec:
type: {{ .Values.api.service.type }}
{{- if and (eq .Values.api.service.type "ClusterIP") .Values.api.service.clusterIP }}
clusterIP: {{ .Values.api.service.clusterIP }}
{{- end }}
ports:
- port: 12345
targetPort: api-port
{{- if and (eq .Values.api.service.type "NodePort") .Values.api.service.nodePort }}
nodePort: {{ .Values.api.service.nodePort }}
{{- end }}
protocol: TCP
name: api-port
- port: 25333
targetPort: python-api-port
{{- if and (eq .Values.api.service.type "NodePort") .Values.api.service.pythonNodePort }}
nodePort: {{ .Values.api.service.pythonNodePort }}
{{- end }}
protocol: TCP
name: python-api-port
{{- if .Values.api.service.externalIPs }}
externalIPs:
{{- toYaml .Values.api.service.externalIPs | nindent 4 }}
{{- end }}
{{- if and (eq .Values.api.service.type "ExternalName") .Values.api.service.externalName }}
externalName: {{ .Values.api.service.externalName }}
{{- end }}
{{- if and (eq .Values.api.service.type "LoadBalancer") .Values.api.service.loadBalancerIP }}
loadBalancerIP: {{ .Values.api.service.loadBalancerIP }}
{{- end }}
selector:
{{- include "dolphinscheduler.api.labels" . | nindent 4 }}

32
deploy/kubernetes/dolphinscheduler/templates/svc-dolphinscheduler-master-headless.yaml

@@ -0,0 +1,32 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: v1
kind: Service
metadata:
name: {{ include "dolphinscheduler.fullname" . }}-master-headless
labels:
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-master-headless
{{- include "dolphinscheduler.common.labels" . | nindent 4 }}
spec:
clusterIP: "None"
ports:
- port: 5678
targetPort: master-port
protocol: TCP
name: master-port
selector:
{{- include "dolphinscheduler.master.labels" . | nindent 4 }}

32
deploy/kubernetes/dolphinscheduler/templates/svc-dolphinscheduler-worker-headless.yaml

@@ -0,0 +1,32 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: v1
kind: Service
metadata:
name: {{ include "dolphinscheduler.fullname" . }}-worker-headless
labels:
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}-worker-headless
{{- include "dolphinscheduler.common.labels" . | nindent 4 }}
spec:
clusterIP: "None"
ports:
- port: 1234
targetPort: worker-port
protocol: TCP
name: worker-port
selector:
{{- include "dolphinscheduler.worker.labels" . | nindent 4 }}

500
deploy/kubernetes/dolphinscheduler/values.yaml

@@ -0,0 +1,500 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Default values for dolphinscheduler-chart.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
timezone: "Asia/Shanghai"
image:
registry: "dolphinscheduler.docker.scarf.sh/apache"
tag: "3.1.9"
pullPolicy: "IfNotPresent"
pullSecret: ""
master: dolphinscheduler-master
worker: dolphinscheduler-worker
api: dolphinscheduler-api
alert: dolphinscheduler-alert-server
tools: dolphinscheduler-tools
## If no external database exists, DolphinScheduler uses this bundled PostgreSQL by default.
postgresql:
enabled: true
postgresqlUsername: "root"
postgresqlPassword: "root"
postgresqlDatabase: "dolphinscheduler"
params: "characterEncoding=utf8"
persistence:
enabled: false
size: "20Gi"
storageClass: "-"
mysql:
enabled: false
auth:
username: "ds"
password: "ds"
database: "dolphinscheduler"
params: "characterEncoding=utf8"
primary:
persistence:
enabled: false
size: "20Gi"
storageClass: "-"
## If an external database exists and postgresql.enabled is set to false,
## the external database will be used; otherwise the bundled database above is used.
externalDatabase:
type: "postgresql"
host: "localhost"
port: "5432"
username: "root"
password: "root"
database: "dolphinscheduler"
params: "characterEncoding=utf8"
## If no external registry exists, the bundled ZooKeeper registry is used by default.
zookeeper:
enabled: true
service:
port: 2181
fourlwCommandsWhitelist: "srvr,ruok,wchs,cons"
persistence:
enabled: false
size: "20Gi"
storageClass: "-"
## If an external registry exists and zookeeper.enabled is set to false, the external registry will be used.
externalRegistry:
registryPluginDir: "lib/plugin/registry"
registryPluginName: "zookeeper"
registryServers: "127.0.0.1:2181"
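## For example, a sketch of switching to an external ZooKeeper ensemble instead of the bundled
## one (the server list below is an illustrative placeholder):
## zookeeper:
##   enabled: false
## externalRegistry:
##   registryPluginName: "zookeeper"
##   registryServers: "zk-0.example.com:2181,zk-1.example.com:2181,zk-2.example.com:2181"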
conf:
common:
# user data local directory path; please make sure the directory exists and has read/write permissions
data.basedir.path: /tmp/dolphinscheduler
# resource storage type: HDFS, S3, NONE
resource.storage.type: HDFS
# resource storage base path on HDFS/S3; resource files will be stored under this path. Please make sure the directory exists on HDFS and has read/write permissions. "/dolphinscheduler" is recommended
resource.storage.upload.base.path: /dolphinscheduler
# whether to startup kerberos
hadoop.security.authentication.startup.state: false
# java.security.krb5.conf path
java.security.krb5.conf.path: /opt/krb5.conf
# login user from keytab username
login.user.keytab.username: hdfs-mycluster@ESZ.COM
# login user from keytab path
login.user.keytab.path: /opt/hdfs.headless.keytab
# kerberos expire time, the unit is hour
kerberos.expire.time: 2
# resource view suffixes
#resource.view.suffixs: txt,log,sh,bat,conf,cfg,py,java,sql,xml,hql,properties,json,yml,yaml,ini,js
# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
resource.hdfs.root.user: hdfs
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
resource.hdfs.fs.defaultFS: hdfs://mycluster:8020
# The AWS access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.access.key.id: minioadmin
# The AWS secret access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.secret.access.key: minioadmin
# The AWS Region to use. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.region: cn-north-1
# The name of the bucket. You need to create it yourself; otherwise, the system cannot start. All buckets in Amazon S3 share a single namespace; ensure the bucket is given a unique name.
resource.aws.s3.bucket.name: dolphinscheduler
# Set this parameter when using a private-cloud S3. If S3 is on a public cloud, you only need to set resource.aws.region, or set the endpoint to the public-cloud endpoint such as s3.cn-north-1.amazonaws.com.cn
resource.aws.s3.endpoint: http://localhost:9000
# resourcemanager port, the default value is 8088 if not specified
resource.manager.httpaddress.port: 8088
# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
yarn.resourcemanager.ha.rm.ids: 192.168.xx.xx,192.168.xx.xx
# if resourcemanager HA is enabled or resourcemanager is not used, keep the default value; if resourcemanager is single, just replace ds1 with the actual resourcemanager hostname
yarn.application.status.address: http://ds1:%s/ws/v1/cluster/apps/%s
# job history status url when the application number threshold is reached (default 10000; it may have been set to 1000)
yarn.job.history.status.address: http://ds1:19888/ws/v1/history/mapreduce/jobs/%s
# datasource encryption enable
datasource.encryption.enable: false
# datasource encryption salt
datasource.encryption.salt: '!@#$%^&*'
# data quality option
data-quality.jar.name: dolphinscheduler-data-quality-dev-SNAPSHOT.jar
#data-quality.error.output.path: /tmp/data-quality-error-data
# Whether hive SQL is executed in the same session
support.hive.oneSession: false
# use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions; if set false, executing user is the deploy user and doesn't need sudo permissions
sudo.enable: true
# network interface preferred like eth0, default: empty
#dolphin.scheduler.network.interface.preferred:
# network IP gets priority, default: inner outer
#dolphin.scheduler.network.priority.strategy: default
# system env path
#dolphinscheduler.env.path: dolphinscheduler_env.sh
# development state
development.state: false
# rpc port
alert.rpc.port: 50052
# Url endpoint for zeppelin RESTful API
zeppelin.rest.url: http://localhost:8080
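## For example, a sketch of switching resource storage from HDFS to S3-compatible storage using
## the keys above (bucket, region, endpoint, and credentials are illustrative placeholders):
## conf:
##   common:
##     resource.storage.type: S3
##     resource.aws.access.key.id: <access-key>
##     resource.aws.secret.access.key: <secret-key>
##     resource.aws.region: us-east-1
##     resource.aws.s3.bucket.name: my-dolphinscheduler-bucket
##     resource.aws.s3.endpoint: http://minio.example.com:9000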
common:
## Configmap
configmap:
DOLPHINSCHEDULER_OPTS: ""
DATA_BASEDIR_PATH: "/tmp/dolphinscheduler"
RESOURCE_UPLOAD_PATH: "/dolphinscheduler"
# dolphinscheduler env
HADOOP_HOME: "/opt/soft/hadoop"
HADOOP_CONF_DIR: "/opt/soft/hadoop/etc/hadoop"
SPARK_HOME1: "/opt/soft/spark1"
SPARK_HOME2: "/opt/soft/spark2"
PYTHON_HOME: "/usr/bin/python"
JAVA_HOME: "/opt/java/openjdk"
HIVE_HOME: "/opt/soft/hive"
FLINK_HOME: "/opt/soft/flink"
DATAX_HOME: "/opt/soft/datax"
## Shared storage persistence mounted into api, master and worker, such as Hadoop, Spark, Flink and DataX binary package
sharedStoragePersistence:
enabled: false
mountPath: "/opt/soft"
accessModes:
- "ReadWriteMany"
## storageClassName must support the access mode: ReadWriteMany
storageClassName: "-"
storage: "20Gi"
## If RESOURCE_STORAGE_TYPE is HDFS and FS_DEFAULT_FS is file:///, fsFileResourcePersistence should be enabled for resource storage
fsFileResourcePersistence:
enabled: false
accessModes:
- "ReadWriteMany"
## storageClassName must support the access mode: ReadWriteMany
storageClassName: "-"
storage: "20Gi"
master:
## PodManagementPolicy controls how pods are created during initial scale up, when replacing pods on nodes, or when scaling down.
podManagementPolicy: "Parallel"
## Replicas is the desired number of replicas of the given Template.
replicas: "3"
## You can use annotations to attach arbitrary non-identifying metadata to objects.
## Clients such as tools and libraries can retrieve this metadata.
annotations: {}
## Affinity is a group of affinity scheduling rules. If specified, the pod's scheduling constraints.
## More info: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#affinity-v1-core
affinity: {}
## NodeSelector is a selector which must be true for the pod to fit on a node.
## Selector which must match a node's labels for the pod to be scheduled on that node.
## More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
nodeSelector: {}
## Tolerations are appended (excluding duplicates) to pods running with this RuntimeClass during admission,
## effectively unioning the set of nodes tolerated by the pod and the RuntimeClass.
tolerations: []
## Compute Resources required by this container. Cannot be updated.
## More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container
resources: {}
# resources:
# limits:
# memory: "8Gi"
# cpu: "4"
# requests:
# memory: "2Gi"
# cpu: "500m"
## Periodic probe of container liveness. Container will be restarted if the probe fails. Cannot be updated.
## More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
livenessProbe:
enabled: true
initialDelaySeconds: "30"
periodSeconds: "30"
timeoutSeconds: "5"
failureThreshold: "3"
successThreshold: "1"
## Periodic probe of container service readiness. Container will be removed from service endpoints if the probe fails. Cannot be updated.
## More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
readinessProbe:
enabled: true
initialDelaySeconds: "30"
periodSeconds: "30"
timeoutSeconds: "5"
failureThreshold: "3"
successThreshold: "1"
## PersistentVolumeClaim represents a reference to a PersistentVolumeClaim in the same namespace.
## The StatefulSet controller is responsible for mapping network identities to claims in a way that maintains the identity of a pod.
## Every claim in this list must have at least one matching (by name) volumeMount in one container in the template.
## A claim in this list takes precedence over any volumes in the template, with the same name.
persistentVolumeClaim:
enabled: false
accessModes:
- "ReadWriteOnce"
storageClassName: "-"
storage: "20Gi"
env:
JAVA_OPTS: "-Xms1g -Xmx1g -Xmn512m"
MASTER_EXEC_THREADS: "100"
MASTER_EXEC_TASK_NUM: "20"
MASTER_DISPATCH_TASK_NUM: "3"
MASTER_HOST_SELECTOR: "LowerWeight"
MASTER_HEARTBEAT_INTERVAL: "10s"
MASTER_HEARTBEAT_ERROR_THRESHOLD: "5"
MASTER_TASK_COMMIT_RETRYTIMES: "5"
MASTER_TASK_COMMIT_INTERVAL: "1s"
MASTER_STATE_WHEEL_INTERVAL: "5s"
MASTER_MAX_CPU_LOAD_AVG: "-1"
MASTER_RESERVED_MEMORY: "0.3"
MASTER_FAILOVER_INTERVAL: "10m"
MASTER_KILL_YARN_JOB_WHEN_HANDLE_FAILOVER: "true"
worker:
## PodManagementPolicy controls how pods are created during initial scale up, when replacing pods on nodes, or when scaling down.
podManagementPolicy: "Parallel"
## Replicas is the desired number of replicas of the given Template.
replicas: "3"
## You can use annotations to attach arbitrary non-identifying metadata to objects.
## Clients such as tools and libraries can retrieve this metadata.
annotations: {}
## Affinity is a group of affinity scheduling rules. If specified, the pod's scheduling constraints.
## More info: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#affinity-v1-core
affinity: {}
## NodeSelector is a selector which must be true for the pod to fit on a node.
## Selector which must match a node's labels for the pod to be scheduled on that node.
## More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
nodeSelector: {}
## Tolerations are appended (excluding duplicates) to pods running with this RuntimeClass during admission,
## effectively unioning the set of nodes tolerated by the pod and the RuntimeClass.
tolerations: []
## Compute Resources required by this container. Cannot be updated.
## More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container
resources: {}
# resources:
# limits:
# memory: "8Gi"
# cpu: "4"
# requests:
# memory: "2Gi"
# cpu: "500m"
## Periodic probe of container liveness. Container will be restarted if the probe fails. Cannot be updated.
## More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
livenessProbe:
enabled: true
initialDelaySeconds: "30"
periodSeconds: "30"
timeoutSeconds: "5"
failureThreshold: "3"
successThreshold: "1"
## Periodic probe of container service readiness. Container will be removed from service endpoints if the probe fails. Cannot be updated.
## More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
readinessProbe:
enabled: true
initialDelaySeconds: "30"
periodSeconds: "30"
timeoutSeconds: "5"
failureThreshold: "3"
successThreshold: "1"
## PersistentVolumeClaim represents a reference to a PersistentVolumeClaim in the same namespace.
## The StatefulSet controller is responsible for mapping network identities to claims in a way that maintains the identity of a pod.
## Every claim in this list must have at least one matching (by name) volumeMount in one container in the template.
## A claim in this list takes precedence over any volumes in the template, with the same name.
persistentVolumeClaim:
enabled: false
## dolphinscheduler data volume
dataPersistentVolume:
enabled: false
accessModes:
- "ReadWriteOnce"
storageClassName: "-"
storage: "20Gi"
## dolphinscheduler logs volume
logsPersistentVolume:
enabled: false
accessModes:
- "ReadWriteOnce"
storageClassName: "-"
storage: "20Gi"
env:
WORKER_MAX_CPU_LOAD_AVG: "-1"
WORKER_RESERVED_MEMORY: "0.3"
WORKER_EXEC_THREADS: "100"
WORKER_HEARTBEAT_INTERVAL: "10s"
WORKER_HEART_ERROR_THRESHOLD: "5"
WORKER_HOST_WEIGHT: "100"
alert:
## Number of desired pods. This is a pointer to distinguish between explicit zero and not specified. Defaults to 1.
replicas: 1
## The deployment strategy to use to replace existing pods with new ones.
strategy:
type: "RollingUpdate"
rollingUpdate:
maxSurge: "25%"
maxUnavailable: "25%"
## You can use annotations to attach arbitrary non-identifying metadata to objects.
## Clients such as tools and libraries can retrieve this metadata.
annotations: {}
## Affinity is a group of affinity scheduling rules. If specified, the pod's scheduling constraints.
## More info: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#affinity-v1-core
affinity: {}
## NodeSelector is a selector which must be true for the pod to fit on a node.
## Selector which must match a node's labels for the pod to be scheduled on that node.
## More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
nodeSelector: {}
## Tolerations are appended (excluding duplicates) to pods running with this RuntimeClass during admission,
## effectively unioning the set of nodes tolerated by the pod and the RuntimeClass.
tolerations: []
## Compute Resources required by this container. Cannot be updated.
## More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container
resources: {}
# resources:
# limits:
# memory: "2Gi"
# cpu: "1"
# requests:
# memory: "1Gi"
# cpu: "500m"
## Periodic probe of container liveness. Container will be restarted if the probe fails. Cannot be updated.
## More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
livenessProbe:
enabled: true
initialDelaySeconds: "30"
periodSeconds: "30"
timeoutSeconds: "5"
failureThreshold: "3"
successThreshold: "1"
## Periodic probe of container service readiness. Container will be removed from service endpoints if the probe fails. Cannot be updated.
## More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
readinessProbe:
enabled: true
initialDelaySeconds: "30"
periodSeconds: "30"
timeoutSeconds: "5"
failureThreshold: "3"
successThreshold: "1"
## PersistentVolumeClaim represents a reference to a PersistentVolumeClaim in the same namespace.
## More info: https://kubernetes.io/docs/concepts/storage/persistent-volumes#persistentvolumeclaims
persistentVolumeClaim:
enabled: false
accessModes:
- "ReadWriteOnce"
storageClassName: "-"
storage: "20Gi"
env:
JAVA_OPTS: "-Xms512m -Xmx512m -Xmn256m"
api:
## Number of desired pods. This is a pointer to distinguish between explicit zero and not specified. Defaults to 1.
replicas: "1"
## The deployment strategy to use to replace existing pods with new ones.
strategy:
type: "RollingUpdate"
rollingUpdate:
maxSurge: "25%"
maxUnavailable: "25%"
## You can use annotations to attach arbitrary non-identifying metadata to objects.
## Clients such as tools and libraries can retrieve this metadata.
annotations: {}
## Affinity is a group of affinity scheduling rules. If specified, the pod's scheduling constraints.
## More info: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#affinity-v1-core
affinity: {}
## NodeSelector is a selector which must be true for the pod to fit on a node.
## Selector which must match a node's labels for the pod to be scheduled on that node.
## More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
nodeSelector: {}
## Tolerations are appended (excluding duplicates) to pods running with this RuntimeClass during admission,
## effectively unioning the set of nodes tolerated by the pod and the RuntimeClass.
tolerations: []
## Compute Resources required by this container. Cannot be updated.
## More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container
resources: {}
# resources:
# limits:
# memory: "2Gi"
# cpu: "1"
# requests:
# memory: "1Gi"
# cpu: "500m"
## Periodic probe of container liveness. Container will be restarted if the probe fails. Cannot be updated.
## More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
livenessProbe:
enabled: true
initialDelaySeconds: "30"
periodSeconds: "30"
timeoutSeconds: "5"
failureThreshold: "3"
successThreshold: "1"
## Periodic probe of container service readiness. Container will be removed from service endpoints if the probe fails. Cannot be updated.
## More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
readinessProbe:
enabled: true
initialDelaySeconds: "30"
periodSeconds: "30"
timeoutSeconds: "5"
failureThreshold: "3"
successThreshold: "1"
## PersistentVolumeClaim represents a reference to a PersistentVolumeClaim in the same namespace.
## More info: https://kubernetes.io/docs/concepts/storage/persistent-volumes#persistentvolumeclaims
persistentVolumeClaim:
enabled: false
accessModes:
- "ReadWriteOnce"
storageClassName: "-"
storage: "20Gi"
service:
## type determines how the Service is exposed. Defaults to ClusterIP. Valid options are ExternalName, ClusterIP, NodePort, and LoadBalancer
type: "ClusterIP"
## clusterIP is the IP address of the service and is usually assigned randomly by the master
clusterIP: ""
## nodePort is the port on each node on which this api service is exposed when type=NodePort
nodePort: ""
## pythonNodePort is the port on each node on which this python api service is exposed when type=NodePort
pythonNodePort: ""
## externalIPs is a list of IP addresses for which nodes in the cluster will also accept traffic for this service
externalIPs: []
## externalName is the external reference that kubedns or equivalent will return as a CNAME record for this service, requires Type to be ExternalName
externalName: ""
## loadBalancerIP when service.type is LoadBalancer. LoadBalancer will get created with the IP specified in this field
loadBalancerIP: ""
## annotations may need to be set when service.type is LoadBalancer
## service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-east-1:EXAMPLE_CERT
annotations: {}
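## For example, a sketch of exposing the API service on fixed node ports
## (the port numbers are illustrative placeholders):
## api:
##   service:
##     type: "NodePort"
##     nodePort: "30038"
##     pythonNodePort: "30039"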
env:
JAVA_OPTS: "-Xms512m -Xmx512m -Xmn256m"
ingress:
enabled: false
host: "dolphinscheduler.org"
path: "/dolphinscheduler"
annotations: {}
tls:
enabled: false
secretName: "dolphinscheduler-tls"
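## For example, a sketch of exposing the web UI through an ingress with TLS
## (the host and secret name are illustrative placeholders):
## ingress:
##   enabled: true
##   host: "dolphinscheduler.example.com"
##   path: "/dolphinscheduler"
##   tls:
##     enabled: true
##     secretName: "dolphinscheduler-tls"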

1269
docs/configs/docsdev.js

File diff suppressed because it is too large

131
docs/configs/index.md.jsx

@@ -0,0 +1,131 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*
*/
import React from 'react';
import ReactDOM from 'react-dom';
import cookie from 'js-cookie';
import Language from '../../components/language';
import Header from '../../components/header';
import Footer from '../../components/footer';
import Md2Html from '../../components/md2html';
import Sidemenu from '../../components/sidemenu';
import siteConfig from '../../../site_config/site';
import docs120Config from '../../../site_config/docs1-2-0';
import docs121Config from '../../../site_config/docs1-2-1';
import docs131Config from '../../../site_config/docs1-3-1';
import docs132Config from '../../../site_config/docs1-3-2';
import docs133Config from '../../../site_config/docs1-3-3';
import docs134Config from '../../../site_config/docs1-3-4';
import docs135Config from '../../../site_config/docs1-3-5';
import docs136Config from '../../../site_config/docs1-3-6';
import docs138Config from '../../../site_config/docs1-3-8';
import docs139Config from '../../../site_config/docs1-3-9';
import docs200Config from '../../../site_config/docs2-0-0';
import docs201Config from '../../../site_config/docs2-0-1';
import docs202Config from '../../../site_config/docs2-0-2';
import docs203Config from '../../../site_config/docs2-0-3';
import docs205Config from '../../../site_config/docs2-0-5';
import docs206Config from '../../../site_config/docs2-0-6';
import docs300Config from '../../../site_config/docs3-0-0';
import docsDevConfig from '../../../site_config/docsdev';
const docsSource = {
'1.2.0': docs120Config,
'1.2.1': docs121Config,
'1.3.1': docs131Config,
'1.3.2': docs132Config,
'1.3.3': docs133Config,
'1.3.4': docs134Config,
'1.3.5': docs135Config,
'1.3.6': docs136Config,
'1.3.8': docs138Config,
'1.3.9': docs139Config,
'2.0.0': docs200Config,
'2.0.1': docs201Config,
'2.0.2': docs202Config,
'2.0.3': docs203Config,
'2.0.5': docs205Config,
'2.0.6': docs206Config,
'3.0.0': docs300Config,
dev: docsDevConfig,
};
const isValidVersion = version => version && docsSource.hasOwnProperty(version);
class Docs extends Md2Html(Language) {
render() {
const language = this.getLanguage();
let dataSource = {};
// from location path
let version = window.location.pathname.split('/')[3];
if (isValidVersion(version) || version === 'latest') {
cookie.set('docs_version', version);
}
// from rendering html
if (!version && this.props.subdir) {
version = this.props.subdir.split('/')[0];
}
if (isValidVersion(version)) {
dataSource = docsSource[version][language];
} else if (isValidVersion(cookie.get('docs_version'))) {
dataSource = docsSource[cookie.get('docs_version')][language];
} else if (isValidVersion(siteConfig.docsLatest)) {
dataSource = docsSource[siteConfig.docsLatest][language];
dataSource.sidemenu.forEach((menu) => {
menu.children.forEach((submenu) => {
if (!submenu.children) {
submenu.link = submenu.link.replace(`docs/${siteConfig.docsLatest}`, 'docs/latest');
} else {
submenu.children.forEach((menuLevel3) => {
menuLevel3.link = menuLevel3.link.replace(`docs/${siteConfig.docsLatest}`, 'docs/latest');
});
}
});
});
} else {
return null;
}
const __html = this.props.__html || this.state.__html;
return (
<div className="md2html docs-page">
<Header
currentKey="docs"
type="dark"
logo="/img/hlogo_white.svg"
language={language}
onLanguageChange={this.onLanguageChange}
/>
<section className="content-section">
<Sidemenu dataSource={dataSource.sidemenu} />
<div
className="doc-content markdown-body"
ref={(node) => { this.markdownContainer = node; }}
dangerouslySetInnerHTML={{ __html }}
/>
</section>
<Footer logo="/img/ds_gray.svg" language={language} />
</div>
);
}
}
document.getElementById('root') && ReactDOM.render(<Docs />, document.getElementById('root'));
export default Docs;

288
docs/configs/site.js

@@ -0,0 +1,288 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*
*/
// Global site configuration
export default {
rootPath: '',
port: 8080,
domain: 'dolphinscheduler.apache.org',
copyToDist: ['asset', 'img', 'file', '.asf.yaml', 'sitemap.xml', '.nojekyll', '.htaccess', 'googled0df7b96f277a143.html'],
docsLatest: '3.0.0',
defaultSearch: 'google', // default search engine
defaultLanguage: 'en-us',
'en-us': {
banner: {
text: '🤔 Have questions about Apache DolphinScheduler? Join the Slack channel to discuss them ',
link: 'https://join.slack.com/t/asf-dolphinscheduler/shared_invite/zt-1e36toy4n-5n9U2R__FDM05R~MJFFVBg'
},
pageMenu: [
{
key: 'home',
text: 'HOME',
link: '/en-us/index.html',
},
{
key: 'docs',
text: 'DOCS',
link: '/en-us/docs/latest/user_doc/about/introduction.html',
children: [
{
key: 'docs0',
text: 'latest(3.1.9)',
link: '/en-us/docs/latest/user_doc/about/introduction.html',
},
{
key: 'docs1',
text: '3.0.1',
link: '/en-us/docs/3.0.1/user_doc/about/introduction.html',
},
{
key: 'docs2',
text: '2.0.6',
link: '/en-us/docs/2.0.6/user_doc/guide/quick-start.html',
},
{
key: 'docsHistory',
text: 'Older Versions',
link: '/en-us/docs/release/history-versions.html',
}
],
},
{
key: 'download',
text: 'DOWNLOAD',
link: '/en-us/download/download.html',
},
{ key: 'blog',
text: 'BLOG',
link: '/en-us/blog/index.html',
},
{
key: 'community',
text: 'COMMUNITY',
link: '/en-us/community/community.html',
},
{
key: 'ASF',
text: 'ASF',
target: '_blank',
link: 'https://www.apache.org/',
children: [
{
key: 'Foundation',
text: 'Foundation',
target: '_blank',
link: 'https://www.apache.org/',
},
{
key: 'License',
text: 'License',
target: '_blank',
link: 'https://www.apache.org/licenses/',
},
{
key: 'Events',
text: 'Events',
target: '_blank',
link: 'https://www.apache.org/events/current-event',
},
{
key: 'Security',
text: 'Security',
target: '_blank',
link: 'https://www.apache.org/security/',
},
{
key: 'Sponsorship',
text: 'Sponsorship',
target: '_blank',
link: 'https://www.apache.org/foundation/sponsorship.html',
},
{
key: 'Thanks',
text: 'Thanks',
target: '_blank',
link: 'https://www.apache.org/foundation/thanks.html',
},
],
},
{
key: 'user',
text: 'USER',
link: '/en-us/user/index.html',
},
],
contact: {
title: 'About us',
content: 'Do you need feedback? Please contact us through the following ways.',
list: [
{
name: 'Slack',
img1: '/img/slack.png',
img2: '/img/slack-selected.png',
link: 'https://join.slack.com/t/asf-dolphinscheduler/shared_invite/zt-1e36toy4n-5n9U2R__FDM05R~MJFFVBg',
},
{
name: 'Email List',
img1: '/img/emailgray.png',
img2: '/img/emailblue.png',
link: '/en-us/docs/latest/user_doc/contribute/join/subscribe.html',
},
{
name: 'Twitter',
img1: '/img/twittergray.png',
img2: '/img/twitterblue.png',
link: 'https://twitter.com/dolphinschedule',
},
],
},
copyright: 'Copyright © 2019-2022 The Apache Software Foundation. Apache DolphinScheduler, DolphinScheduler, and its feather logo are trademarks of The Apache Software Foundation.',
},
'zh-cn': {
banner: {
text: '🤔 有关于 Apache DolphinScheduler 的疑问,加入 Slack 频道来讨论他们 ',
link: 'https://join.slack.com/t/asf-dolphinscheduler/shared_invite/zt-1e36toy4n-5n9U2R__FDM05R~MJFFVBg'
},
pageMenu: [
{
key: 'home',
text: '首页',
link: '/zh-cn/index.html',
},
{
key: 'docs',
text: '文档',
link: '/zh-cn/docs/latest/user_doc/about/introduction.html',
children: [
{
key: 'docs0',
text: '最新版本latest(3.1.9)',
link: '/zh-cn/docs/latest/user_doc/about/introduction.html',
},
{
key: 'docs1',
text: '3.0.1',
link: '/zh-cn/docs/3.0.1/user_doc/about/introduction.html',
},
{
key: 'docs2',
text: '2.0.6',
link: '/zh-cn/docs/2.0.6/user_doc/guide/quick-start.html',
},
{
key: 'docsHistory',
text: '历史版本',
link: '/zh-cn/docs/release/history-versions.html',
}
],
},
{
key: 'download',
text: '下载',
link: '/zh-cn/download/download.html',
},
{
key: 'blog',
text: '博客',
link: '/zh-cn/blog/index.html',
},
{
key: 'community',
text: '社区',
link: '/zh-cn/community/community.html',
},
{
key: 'ASF',
text: 'ASF',
target: '_blank',
link: 'https://www.apache.org/',
children: [
{
key: 'Foundation',
text: 'Foundation',
target: '_blank',
link: 'https://www.apache.org/',
},
{
key: 'License',
text: 'License',
target: '_blank',
link: 'https://www.apache.org/licenses/',
},
{
key: 'Events',
text: 'Events',
target: '_blank',
link: 'https://www.apache.org/events/current-event',
},
{
key: 'Security',
text: 'Security',
target: '_blank',
link: 'https://www.apache.org/security/',
},
{
key: 'Sponsorship',
text: 'Sponsorship',
target: '_blank',
link: 'https://www.apache.org/foundation/sponsorship.html',
},
{
key: 'Thanks',
text: 'Thanks',
target: '_blank',
link: 'https://www.apache.org/foundation/thanks.html',
},
],
},
{
key: 'user',
text: '用户',
// link: '',
link: '/zh-cn/user/index.html',
},
],
contact: {
title: '联系我们',
content: '有问题需要反馈?请通过以下方式联系我们。',
list: [
{
name: 'Slack',
img1: '/img/slack.png',
img2: '/img/slack-selected.png',
link: 'https://join.slack.com/t/asf-dolphinscheduler/shared_invite/zt-1e36toy4n-5n9U2R__FDM05R~MJFFVBg',
},
{
name: '邮件列表',
img1: '/img/emailgray.png',
img2: '/img/emailblue.png',
link: '/zh-cn/docs/latest/user_doc/contribute/join/subscribe.html',
},
{
name: 'Twitter',
img1: '/img/twittergray.png',
img2: '/img/twitterblue.png',
link: 'https://twitter.com/dolphinschedule',
},
],
},
copyright: 'Copyright © 2019-2022 The Apache Software Foundation. Apache DolphinScheduler, DolphinScheduler, and its feather logo are trademarks of The Apache Software Foundation.',
},
};

92
docs/docs/en/DSIP.md

@@ -0,0 +1,92 @@
# DSIP
A DolphinScheduler Improvement Proposal (DSIP) introduces a major improvement to the Apache DolphinScheduler codebase. It is
not for small incremental improvements; the purpose of a DSIP is to announce to the community the finished or upcoming
major features for Apache DolphinScheduler.
## What is considered as DSIP
- Any major new feature, major improvement, or the introduction or removal of a component
- Any major change to public interfaces, such as API endpoints or large web UI changes
When a change is in doubt and any committer thinks it should be a DSIP, it is treated as one.
We use GitHub Issues and Apache mail threads to record and track DSIPs; for more detail, see the sections
[current DSIPs](#current-dsips) and [past DSIPs](#past-dsips).
As a DSIP, it should:
- Have a mail thread whose title starts with `[DISCUSS]` in [dev@dolphinscheduler.apache.org][mail-to-dev]
- Have a GitHub Issue labeled `DSIP` that includes the mail thread link in its description.
### Current DSIPs
Current DSIPs include all DSIPs that are still in progress; see [current DSIPs][current-DSIPs].
### Past DSIPs
Past DSIPs include all DSIPs that are already done or were retired for some reason; see [past DSIPs][past-DSIPs].
## DSIP Process
### Create GitHub Issue
Every DSIP should start with a GitHub Issue:
- If you are fairly sure your issue is a DSIP, click and choose "DSIP" in
[GitHub Issue][github-issue-choose]
- If you are not sure whether your issue is a DSIP, click and choose "Feature request" in
[GitHub Issue][github-issue-choose]. The DolphinScheduler maintainer team will add the label `DSIP`, mention you in the
issue, and point you to this document when they think it should be a DSIP.
You should add the special prefix `[DSIP-XXX]` to the title, where `XXX` stands for the DSIP id. The id auto-increments,
and you can find the next integer in the [All DSIPs][all-DSIPs] issues.
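For instance, a hypothetical issue title (the number and subject below are purely illustrative) might look like:
```text
[DSIP-42][Feature] Add a hypothetical new task plugin
```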
### Send Discuss Mail
After the issue is labeled `DSIP`, you should send an email to [dev@dolphinscheduler.apache.org][mail-to-dev]
describing the purpose and the draft design of your idea.
Here is the mail template:
- Title: `[DISCUSS][DSIP-XXX] <CHANGE-TO-YOUR-LOVELY-PROPOSAL-TITLE>`, where `XXX` is the integer you used in your
[GitHub Issue](#create-github-issue); also change the proposal title.
- Content:
```text
Hi community,
<CHANGE-TO-YOUR-PROPOSAL-DETAIL>
I have already created a GitHub Issue for my proposal, which you can see at <CHANGE-TO-YOUR-GITHUB-ISSUE-LINK>.
Looking forward to any feedback on this thread.
```
After the community discusses it and agrees it is worth being a DSIP, you can [work on it](#work-on-it-or-create-subtask-for-it).
But if the community thinks it should not be a DSIP, or that the change should not be included in DolphinScheduler at all, the maintainers
terminate the mail thread and remove the `DSIP` label from the GitHub Issue, or even close the issue if nothing should change.
### Work On It, Or Create Subtask For It
When your proposal passes in the mail thread, you can get your hands dirty and start the work. You can submit the related
pull request in GitHub if the change fits in a single commit. If the proposal is too big for a single commit, you
can create subtasks in the GitHub Issue, as in [DSIP-1][DSIP-1], and split the work into multiple commits.
### Close After It Done
When the DSIP is finished and all related PRs are merged, you should reply to the mail thread you created in
[step two](#send-discuss-mail) to notify the community of the result of the DSIP. After that, the DSIP GitHub Issue is
closed and moved from [current DSIPs][current-DSIPs] to [past DSIPs][past-DSIPs], but you can still find it in [All DSIPs][all-DSIPs].
## An Example For DSIP
* [[DSIP-1][Feature][Parent] Add Python API for DolphinScheduler][DSIP-1]: has multiple subtasks and projects under it.
[all-DSIPs]: https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue+label%3A%22DSIP%22+
[current-DSIPs]: https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue+is%3Aopen+label%3A%22DSIP%22
[past-DSIPs]: https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue+is%3Aclosed+label%3A%22DSIP%22+
[github-issue-choose]: https://github.com/apache/dolphinscheduler/issues/new/choose
[mail-to-dev]: mailto:dev@dolphinscheduler.apache.org
[DSIP-1]: https://github.com/apache/dolphinscheduler/issues/6407

20
docs/docs/en/about/features.md

@@ -0,0 +1,20 @@
# Features
## Simple to Use
- **Visual DAG**: User-friendly drag-and-drop workflow definition and facility for run-time control.
- **Modular Operation**: Modularity facilitates easy customization and maintenance.
## Rich Scenarios
- **Multiple Task Type Support**: Supports more than 10 task types, such as Shell, MR, Spark, and SQL, with cross-language support that makes it easy to extend.
- **Workflow Ops**: Workflows can be scheduled, paused, resumed, and stopped, enabling easy maintenance and control of global and local parameters.
## High Reliability
- **Reliability**: A decentralized design ensures stability, and a self-supporting HA task queue avoids overload. With this fault-tolerant capability, DolphinScheduler provides a highly robust environment.
## High Scalability
- **Scalability**: Supports multitenancy and online resource management. Stable operation of 100,000 data tasks per day is supported.

75
docs/docs/en/about/glossary.md

@@ -0,0 +1,75 @@
## System Architecture Design
Before explaining the architecture of the scheduling system, let's first understand the commonly used terms of the
scheduling system
### 1. Glossary
**DAG:** The full name is Directed Acyclic Graph, abbreviated as DAG. Tasks in a workflow are assembled in the
form of a directed acyclic graph, and topological traversal is performed from nodes with zero in-degree until
there are no subsequent nodes. An example is as follows:
![about-glossary](../../../img/new_ui/dev/about/glossary.png)
**Process definition**: The visual **DAG** formed by dragging task nodes and establishing associations between them
**Process instance**: A process instance is an instantiation of a process definition, generated either by
a manual start or by timed scheduling. Each run of a process definition produces a process instance
**Task instance**: A task instance is an instantiation of a task node in a process definition, identifying
a specific task
**Task type**: Currently supports SHELL, SQL, SUB_PROCESS (sub-process), PROCEDURE, MR, SPARK, PYTHON, and DEPENDENT, with
plans to support dynamic plug-in expansion. Note: a **SUB_PROCESS** task references another workflow definition, which is itself a separate process
definition that can be started and executed on its own
**Scheduling method**: The system supports timed scheduling based on cron expressions as well as manual scheduling. Supported command
types: start workflow, start execution from the current node, resume a fault-tolerant workflow, resume a paused process,
start execution from a failed node, complement, timing, rerun, pause, stop, and resume waiting thread. Among them, **resume
fault-tolerant workflow** and **resume waiting thread** are command types used internally by the
scheduler and cannot be called from outside
**Scheduled**: The system adopts the **Quartz** distributed scheduler and supports visual generation of cron expressions
**Dependency**: The system not only supports simple dependencies between predecessor and successor nodes of the **DAG**, but also
provides **task dependent** nodes, supporting dependencies **between processes**
**Priority**: Supports priorities for process instances and task instances; if no priority is set, the
default is first-in-first-out
**Email alert**: Supports emailing **SQL task** query results, process instance result alerts, and fault
tolerance alert notifications
**Failure strategy**: For tasks running in parallel, if one task fails, two failure strategies are
provided. **Continue** means that, regardless of the status of the parallel tasks, the process runs until its end.
**End** means that once a failed task is found, the running parallel tasks are killed at the same time and the
process fails and ends
**Complement**: Supplements historical data; supports two complement modes, **interval parallel** and **serial**, and two types of date selection, **date range** and **date enumeration**.
### 2. Module Introduction
- dolphinscheduler-master: the master module, provides workflow management and orchestration.
- dolphinscheduler-worker: the worker module, provides task execution management.
- dolphinscheduler-alert: the alert module, provides the AlertServer service.
- dolphinscheduler-api: the web application module, provides the ApiServer service.
- dolphinscheduler-common: general constant enumerations, utility classes, data structures, and base classes.
- dolphinscheduler-dao: provides operations such as database access.
- dolphinscheduler-remote: client and server based on Netty.
- dolphinscheduler-service: the service module, including Quartz, ZooKeeper, and log client access services, easy to call from the server
module and api module.
- dolphinscheduler-ui: the front-end module.
### Sum Up
From the perspective of scheduling, this article preliminarily introduces the architecture principles and implementation
ideas of the big data distributed workflow scheduling system DolphinScheduler. To be continued.

49
docs/docs/en/about/hardware.md

@@ -0,0 +1,49 @@
# Hardware Environment
This section describes the hardware requirements for DolphinScheduler. DolphinScheduler works as an open-source distributed workflow task scheduling system. It can be deployed and run smoothly in Intel-architecture server environments and mainstream virtualization environments, and it also supports mainstream Linux operating systems and the ARM architecture.
## Linux Operating System Version Requirements
The Linux operating systems specified below can run on physical servers and mainstream virtualization environments such as VMware, KVM, and XEN.
| Operating System | Version |
|:-------------------------|:---------------:|
| Red Hat Enterprise Linux | 7.0 and above |
| CentOS | 7.0 and above |
| Oracle Enterprise Linux | 7.0 and above |
| Ubuntu LTS | 16.04 and above |
> **Note:**
> The above Linux operating systems can run on physical servers and mainstream virtualization environments such as VMware, KVM, and XEN.
## Server Configuration
DolphinScheduler supports 64-bit hardware platforms with Intel x86-64 architecture. The following table shows the recommended server requirements in a production environment:
### Production Environment
| **CPU** | **MEM** | **HD** | **NIC** | **Num** |
|---------|---------|--------|---------|---------|
| 4 core+ | 8 GB+ | SAS | GbE | 1+ |
> **Note:**
> - The above recommended configuration is the minimum configuration for deploying DolphinScheduler. Higher configuration is strongly recommended for production environments.
> - The recommended hard disk size is more than 50GB and separate the system disk and data disk.
## Network Requirements
DolphinScheduler provides the following network port configurations for normal operation:
| Server | Port | Desc |
|----------------------|-------|----------------------------------------------------------------------|
| MasterServer | 5678 | not the communication port; only requires that the local ports do not conflict |
| WorkerServer | 1234 | not the communication port; only requires that the local ports do not conflict |
| ApiApplicationServer | 12345 | backend communication port |
> **Note:**
> - MasterServer and WorkerServer do not need to enable communication between networks, as long as their local ports do not conflict.
> - Administrators can adjust the relevant ports on the network side and host side according to the deployment plan of the DolphinScheduler components in the actual environment.
## Browser Requirements
The minimum supported version of Google Chrome is version 85, but version 90 or above is recommended.

7
docs/docs/en/about/introduction.md

@@ -0,0 +1,7 @@
# About DolphinScheduler
Apache DolphinScheduler provides a distributed, easily extensible, visual workflow task scheduling open-source platform. It is suitable for enterprise-level scenarios and provides a solution to visualize operation tasks, workflows, and entire data processing procedures.
Apache DolphinScheduler aims to solve complex big data task dependencies and trigger relationships in data OPS orchestration for various big data applications. It solves the intricate dependencies of data R&D ETL and the inability to monitor the health status of tasks. DolphinScheduler assembles tasks in Directed Acyclic Graph (DAG) streaming mode, can monitor the execution status of tasks in time, and supports operations such as retrying, recovering failures from specified nodes, pausing, resuming, and killing tasks.
![Apache DolphinScheduler](../../../img/introduction_ui.png)

42
docs/docs/en/architecture/cache.md

@@ -0,0 +1,42 @@
# Cache
## Purpose
The master-server scheduling process performs a large number of database reads, for example on tables like `tenant`, `user`, `processDefinition`, etc. These operations put read pressure on the DB and slow down the entire core scheduling process.
Since this business data is a high-read, low-write scenario, a cache module is introduced to reduce the DB read pressure and speed up the core scheduling process.
## Cache Settings
```yaml
spring:
  cache:
    # default disable cache, you can enable by `type: caffeine`
    type: none
    cache-names:
      - tenant
      - user
      - processDefinition
      - processTaskRelation
      - taskDefinition
    caffeine:
      spec: maximumSize=100,expireAfterWrite=300s,recordStats
```
The cache module uses [spring-cache](https://spring.io/guides/gs/caching/), so you can set cache options, such as whether to enable the cache (`none` disables it by default) and the cache type, directly in the Spring `application.yaml`.
Currently it implements the [caffeine](https://github.com/ben-manes/caffeine) configuration, so you can assign cache settings like cache size, expire time, etc.
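As a minimal sketch, assuming your deployment sources `bin/env/dolphinscheduler_env.sh` (which exports `SPRING_CACHE_TYPE`, see the dolphinscheduler_env.sh section of the configuration docs), you could also enable the caffeine cache without editing `application.yaml`:
```bash
# enable the caffeine cache via Spring Boot relaxed binding;
# equivalent to setting `spring.cache.type: caffeine` in application.yaml
export SPRING_CACHE_TYPE=caffeine
```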
## Cache Read
The cache module adopts the `@Cacheable` annotation from spring-cache, which you can apply in the related mapper layer; refer to `TenantMapper`.
## Cache Evict
Business data updates come from the api-server, while the cache sits in the master-server, so the api-server's data updates must be intercepted (with the aspect-oriented `@CacheEvict` annotation) and a `cacheEvictCommand` sent to notify the master-server when a cache eviction is needed.
Note: the final cache-update behavior also depends on the expiration settings configured in caffeine, so configure those according to your business scenario.
The sequence diagram is shown below:
<img src="../../../img/cache-evict.png" alt="cache-evict" style="zoom: 67%;" />

384
docs/docs/en/architecture/configuration.md

@@ -0,0 +1,384 @@
<!-- markdown-link-check-disable -->
# Configuration
## Preface
This document explains the DolphinScheduler application configurations.
## Directory Structure
The directory structure of DolphinScheduler is as follows:
```
├── LICENSE
├── NOTICE
├── licenses directory of licenses
├── bin directory of DolphinScheduler application commands, configuration scripts
│   ├── dolphinscheduler-daemon.sh script to start or shut down DolphinScheduler application
│   ├── env directory of scripts to load environment variables
│   │   ├── dolphinscheduler_env.sh script to export environment variables [eg: JAVA_HOME, HADOOP_HOME, HIVE_HOME ...] when you start or stop service using script `dolphinscheduler-daemon.sh`
│   │   └── install_env.sh script to export environment variables for DolphinScheduler installation when you use scripts `install.sh` `start-all.sh` `stop-all.sh` `status-all.sh`
│   ├── install.sh script to auto-setup services when you deploy DolphinScheduler in `pseudo-cluster` mode or `cluster` mode
│   ├── remove-zk-node.sh script to clean up ZooKeeper caches
│   ├── scp-hosts.sh script to copy installation files to target hosts
│   ├── start-all.sh script to start all services when you deploy DolphinScheduler in `pseudo-cluster` mode or `cluster` mode
│   ├── status-all.sh script to check the status of all services when you deploy DolphinScheduler in `pseudo-cluster` mode or `cluster` mode
│   └── stop-all.sh script to shut down all services when you deploy DolphinScheduler in `pseudo-cluster` mode or `cluster` mode
├── alert-server directory of DolphinScheduler alert-server commands, configuration scripts and libs
│   ├── bin
│   │   └── start.sh script to start DolphinScheduler alert-server
│   ├── conf
│   │   ├── application.yaml configurations of alert-server
│   │   ├── bootstrap.yaml configurations for Spring Cloud bootstrap, mostly you don't need to modify this
│   │   ├── common.properties configurations of common-service like storage, credentials, etc.
│   │   ├── dolphinscheduler_env.sh script to load environment variables for alert-server
│   │   └── logback-spring.xml configurations of alert-service log
│   └── libs directory of alert-server libs
├── api-server directory of DolphinScheduler api-server commands, configuration scripts and libs
│   ├── bin
│   │   └── start.sh script to start DolphinScheduler api-server
│   ├── conf
│   │   ├── application.yaml configurations of api-server
│   │   ├── bootstrap.yaml configurations for Spring Cloud bootstrap, mostly you don't need to modify this
│   │   ├── common.properties configurations of common-service like storage, credentials, etc.
│   │   ├── dolphinscheduler_env.sh script to load environment variables for api-server
│   │   └── logback-spring.xml configurations of api-service log
│   ├── libs directory of api-server libs
│   └── ui directory of api-server related front-end web resources
├── master-server directory of DolphinScheduler master-server commands, configuration scripts and libs
│   ├── bin
│   │   └── start.sh script to start DolphinScheduler master-server
│   ├── conf
│   │   ├── application.yaml configurations of master-server
│   │   ├── bootstrap.yaml configurations for Spring Cloud bootstrap, mostly you don't need to modify this
│   │   ├── common.properties configurations of common-service like storage, credentials, etc.
│   │   ├── dolphinscheduler_env.sh script to load environment variables for master-server
│   │   └── logback-spring.xml configurations of master-service log
│   └── libs directory of master-server libs
├── standalone-server directory of DolphinScheduler standalone-server commands, configuration scripts and libs
│   ├── bin
│   │   └── start.sh script to start DolphinScheduler standalone-server
│   ├── conf
│   │   ├── application.yaml configurations of standalone-server
│   │   ├── bootstrap.yaml configurations for Spring Cloud bootstrap, mostly you don't need to modify this
│   │   ├── common.properties configurations of common-service like storage, credentials, etc.
│   │   ├── dolphinscheduler_env.sh script to load environment variables for standalone-server
│   │   ├── logback-spring.xml configurations of standalone-service log
│   │   └── sql .sql files to create or upgrade DolphinScheduler metadata
│   ├── libs directory of standalone-server libs
│   └── ui directory of standalone-server related front-end web resources
│  
├── tools directory of DolphinScheduler metadata tools commands, configuration scripts and libs
│   ├── bin
│   │   └── upgrade-schema.sh script to initialize or upgrade DolphinScheduler metadata
│   ├── conf
│   │   ├── application.yaml configurations of tools
│   │   └── common.properties configurations of common-service like storage, credentials, etc.
│   ├── libs directory of tool libs
│   └── sql .sql files to create or upgrade DolphinScheduler metadata
│  
├── worker-server directory of DolphinScheduler worker-server commands, configuration scripts and libs
│ ├── bin
│ │   └── start.sh script to start DolphinScheduler worker-server
│ ├── conf
│ │   ├── application.yaml configurations of worker-server
│ │   ├── bootstrap.yaml configurations for Spring Cloud bootstrap, mostly you don't need to modify this
│ │   ├── common.properties configurations of common-service like storage, credentials, etc.
│ │   ├── dolphinscheduler_env.sh script to load environment variables for worker-server
│ │   └── logback-spring.xml configurations of worker-service log
│ └── libs directory of worker-server libs
└── ui directory of front-end web resources
```
## Configurations in Details
### dolphinscheduler-daemon.sh [startup or shutdown DolphinScheduler application]
dolphinscheduler-daemon.sh is responsible for DolphinScheduler startup and shutdown.
Essentially, start-all.sh and stop-all.sh start up and shut down the cluster via dolphinscheduler-daemon.sh.
Currently, DolphinScheduler provides only a basic configuration; remember to tune the JVM options further based on your available resources.
Default simplified parameters are:
```bash
export DOLPHINSCHEDULER_OPTS="
-server
-Xmx16g
-Xms1g
-Xss512k
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:+UseFastAccessorMethods
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
"
```
> "-XX:DisableExplicitGC" is not recommended due to may lead to memory link (DolphinScheduler dependent on Netty to communicate).
### Database connection related configuration
DolphinScheduler uses Spring's HikariCP to manage database connections. Configuration file location:
|Service| Configuration file |
|--|--|
|Master Server | `master-server/conf/application.yaml`|
|Api Server| `api-server/conf/application.yaml`|
|Worker Server| `worker-server/conf/application.yaml`|
|Alert Server| `alert-server/conf/application.yaml`|
The default configuration is as follows:
|Parameters | Default value| Description|
|--|--|--|
|spring.datasource.driver-class-name| org.postgresql.Driver |datasource driver|
|spring.datasource.url| jdbc:postgresql://127.0.0.1:5432/dolphinscheduler |datasource connection url|
|spring.datasource.username|root|datasource username|
|spring.datasource.password|root|datasource password|
|spring.datasource.hikari.connection-test-query|select 1|validate connection by running the SQL|
|spring.datasource.hikari.minimum-idle| 5| minimum connection pool size number|
|spring.datasource.hikari.auto-commit|true|whether auto commit|
|spring.datasource.hikari.pool-name|DolphinScheduler|name of the connection pool|
|spring.datasource.hikari.maximum-pool-size|50| maximum connection pool size number|
|spring.datasource.hikari.connection-timeout|30000|connection timeout|
|spring.datasource.hikari.idle-timeout|600000|maximum idle connection survival time|
|spring.datasource.hikari.leak-detection-threshold|0|connection leak detection threshold|
|spring.datasource.hikari.initialization-fail-timeout|1|connection pool initialization failure timeout|
Note that DolphinScheduler also supports database configuration through `bin/env/dolphinscheduler_env.sh`.
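For example, here is a minimal sketch of switching the metadata database to MySQL through `bin/env/dolphinscheduler_env.sh`; the JDBC URL and credentials are placeholders, and it assumes the MySQL JDBC driver is already on the classpath:
```bash
# select the mysql Spring profile and point the datasource at a local MySQL;
# replace the URL, username, and password with your own values
export DATABASE=mysql
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8"
export SPRING_DATASOURCE_USERNAME="ds_user"
export SPRING_DATASOURCE_PASSWORD="ds_password"
```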
### Zookeeper related configuration
DolphinScheduler uses Zookeeper for cluster management, fault tolerance, event monitoring and other functions. Configuration file location:
|Service| Configuration file |
|--|--|
|Master Server | `master-server/conf/application.yaml`|
|Api Server| `api-server/conf/application.yaml`|
|Worker Server| `worker-server/conf/application.yaml`|
The default configuration is as follows:
|Parameters | Default value| Description|
|--|--|--|
|registry.zookeeper.namespace|dolphinscheduler|namespace of zookeeper|
|registry.zookeeper.connect-string|localhost:2181| the connection string of zookeeper|
|registry.zookeeper.retry-policy.base-sleep-time|60ms|time to wait between subsequent retries|
|registry.zookeeper.retry-policy.max-sleep|300ms|maximum time to wait between subsequent retries|
|registry.zookeeper.retry-policy.max-retries|5|maximum retry times|
|registry.zookeeper.session-timeout|30s|session timeout|
|registry.zookeeper.connection-timeout|30s|connection timeout|
|registry.zookeeper.block-until-connected|600ms|waiting time to block until the connection succeeds|
|registry.zookeeper.digest|{username}:{password}|digest of zookeeper to access znode, works only when acl is enabled; for more details please check the [Apache Zookeeper doc](https://zookeeper.apache.org/doc/r3.4.14/zookeeperAdmin.html)|
Note that DolphinScheduler also supports zookeeper related configuration through `bin/env/dolphinscheduler_env.sh`.
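Likewise, here is a minimal sketch of pointing the registry at an external ZooKeeper ensemble via `bin/env/dolphinscheduler_env.sh`; the host names are placeholders:
```bash
# use an external three-node ZooKeeper ensemble as the registry center
export REGISTRY_TYPE=zookeeper
export REGISTRY_ZOOKEEPER_CONNECT_STRING="zk1:2181,zk2:2181,zk3:2181"
```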
### common.properties [hadoop, s3, yarn config properties]
Currently, common.properties mainly configures Hadoop- and S3a-related settings. Configuration file location:
|Service| Configuration file |
|--|--|
|Master Server | `master-server/conf/common.properties`|
|Api Server| `api-server/conf/common.properties`|
|Worker Server| `worker-server/conf/common.properties`|
|Alert Server| `alert-server/conf/common.properties`|
The default configuration is as follows:
| Parameters | Default value | Description |
|--|--|--|
|data.basedir.path | /tmp/dolphinscheduler | local directory used to store temp files|
|resource.storage.type | NONE | type of resource files: HDFS, S3, NONE|
|resource.upload.path | /dolphinscheduler | storage path of resource files|
|aws.access.key.id | minioadmin | access key id of S3|
|aws.secret.access.key | minioadmin | secret access key of S3|
|aws.region | us-east-1 | region of S3|
|aws.s3.endpoint | http://minio:9000 | endpoint of S3|
|hdfs.root.user | hdfs | configure users with corresponding permissions if storage type is HDFS|
|fs.defaultFS | hdfs://mycluster:8020 | If resource.storage.type=S3, then the request url would be similar to 's3a://dolphinscheduler'. Otherwise if resource.storage.type=HDFS and hadoop supports HA, copy core-site.xml and hdfs-site.xml into 'conf' directory|
|hadoop.security.authentication.startup.state | false | whether hadoop grant kerberos permission|
|java.security.krb5.conf.path | /opt/krb5.conf | kerberos config directory|
|login.user.keytab.username | hdfs-mycluster@ESZ.COM | kerberos username|
|login.user.keytab.path | /opt/hdfs.headless.keytab | kerberos user keytab|
|kerberos.expire.time | 2 | kerberos expire time, integer, the unit is hour|
|yarn.resourcemanager.ha.rm.ids | 192.168.xx.xx,192.168.xx.xx | specify the yarn resourcemanager url. If resourcemanager supports HA, input the HA IP addresses (separated by commas); leave it empty for standalone mode|
|yarn.application.status.address | http://ds1:8088/ws/v1/cluster/apps/%s | keep default if ResourceManager supports HA or not use ResourceManager, or replace ds1 with corresponding hostname if ResourceManager in standalone mode|
|development.state | false | specify whether in development state|
|dolphin.scheduler.network.interface.preferred | NONE | display name of the network card|
|dolphin.scheduler.network.priority.strategy | default | IP acquisition strategy, giving priority to either the internal or the external network|
|resource.manager.httpaddress.port | 8088 | the port of resource manager|
|yarn.job.history.status.address | http://ds1:19888/ws/v1/history/mapreduce/jobs/%s | job history status url of yarn|
|datasource.encryption.enable | false | whether to enable datasource encryption|
|datasource.encryption.salt | !@#$%^&* | the salt of the datasource encryption|
|data-quality.jar.name | dolphinscheduler-data-quality-dev-SNAPSHOT.jar | the jar of data quality|
|support.hive.oneSession | false | specify whether hive SQL is executed in the same session|
|sudo.enable | true | whether to enable sudo|
|alert.rpc.port | 50052 | the RPC port of Alert Server|
|zeppelin.rest.url | http://localhost:8080 | the RESTful API url of zeppelin|
### Api-server related configuration
Location: `api-server/conf/application.yaml`
|Parameters | Default value| Description|
|--|--|--|
|server.port|12345|api service communication port|
|server.servlet.session.timeout|120m|session timeout|
|server.servlet.context-path|/dolphinscheduler/ |request path|
|spring.servlet.multipart.max-file-size|1024MB|maximum file size|
|spring.servlet.multipart.max-request-size|1024MB|maximum request size|
|server.jetty.max-http-post-size|5000000|jetty maximum post size|
|spring.banner.charset|UTF-8|message encoding|
|spring.jackson.time-zone|UTC|time zone|
|spring.jackson.date-format|"yyyy-MM-dd HH:mm:ss"|time format|
|spring.messages.basename|i18n/messages|i18n config|
|security.authentication.type|PASSWORD|authentication type|
|security.authentication.ldap.user.admin|read-only-admin|admin user account when you log-in with LDAP|
|security.authentication.ldap.urls|ldap://ldap.forumsys.com:389/|LDAP urls|
|security.authentication.ldap.base.dn|dc=example,dc=com|LDAP base dn|
|security.authentication.ldap.username|cn=read-only-admin,dc=example,dc=com|LDAP username|
|security.authentication.ldap.password|password|LDAP password|
|security.authentication.ldap.user.identity.attribute|uid|LDAP user identity attribute|
|security.authentication.ldap.user.email.attribute|mail|LDAP user email attribute|
|traffic.control.global.switch|false|traffic control global switch|
|traffic.control.max-global-qps-rate|300|global max request number per second|
|traffic.control.tenant-switch|false|traffic control tenant switch|
|traffic.control.default-tenant-qps-rate|10|default tenant max request number per second|
|traffic.control.customize-tenant-qps-rate||customize tenant max request number per second|
### Master Server related configuration
Location: `master-server/conf/application.yaml`
|Parameters | Default value| Description|
|--|--|--|
|master.listen-port|5678|master listen port|
|master.fetch-command-num|10|the number of commands fetched by master|
|master.pre-exec-threads|10|master prepare execute thread number to limit handle commands in parallel|
|master.exec-threads|100|master execute thread number to limit process instances in parallel|
|master.dispatch-task-number|3|master dispatch task number per batch|
|master.host-selector|lower_weight|master host selector to select a suitable worker, default value: LowerWeight. Optional values include random, round_robin, lower_weight|
|master.heartbeat-interval|10|master heartbeat interval, the unit is second|
|master.task-commit-retry-times|5|master commit task retry times|
|master.task-commit-interval|1000|master commit task interval, the unit is millisecond|
|master.state-wheel-interval|5|time to check status|
|master.max-cpu-load-avg|-1|master max CPU load avg, only higher than the system CPU load average, master server can schedule. default value -1: the number of CPU cores * 2|
|master.reserved-memory|0.3|master reserved memory, only lower than system available memory, master server can schedule. default value 0.3, the unit is G|
|master.failover-interval|10|failover interval, the unit is minute|
|master.kill-yarn-job-when-task-failover|true|whether to kill yarn job when failover taskInstance|
|master.registry-disconnect-strategy.strategy|stop|Used when the master disconnects from the registry. Default value: stop. Optional values include stop, waiting|
|master.registry-disconnect-strategy.max-waiting-time|100s|Used when the master disconnects from the registry and the disconnect strategy is waiting; the master will wait to reconnect to the registry within the given time, and after that, if it still cannot connect, it will stop itself. If the value is 0s, the master will wait infinitely|
|master.worker-group-refresh-interval|10s|The interval to refresh worker group from db to memory|
### Worker Server related configuration
Location: `worker-server/conf/application.yaml`
|Parameters | Default value| Description|
|--|--|--|
|worker.listen-port|1234|worker-service listen port|
|worker.exec-threads|100|worker-service execute thread number, used to limit the number of task instances in parallel|
|worker.heartbeat-interval|10|worker-service heartbeat interval, the unit is second|
|worker.host-weight|100|worker host weight to dispatch tasks|
|worker.tenant-auto-create|true|the tenant corresponds to a user of the system, which the worker uses to submit jobs. If the system does not have this user, it is created automatically when worker.tenant-auto-create is true.|
|worker.max-cpu-load-avg|-1|worker max CPU load avg, only higher than the system CPU load average, worker server can be dispatched tasks. default value -1: the number of CPU cores * 2|
|worker.reserved-memory|0.3|worker reserved memory, only lower than system available memory, worker server can be dispatched tasks. default value 0.3, the unit is G|
|worker.alert-listen-host|localhost|the alert listen host of worker|
|worker.alert-listen-port|50052|the alert listen port of worker|
|worker.registry-disconnect-strategy.strategy|stop|Used when the worker disconnects from the registry. Default value: stop. Optional values include stop, waiting|
|worker.registry-disconnect-strategy.max-waiting-time|100s|Used when the worker disconnects from the registry and the disconnect strategy is waiting; the worker will wait to reconnect to the registry within the given time, and after that, if it still cannot connect, it will stop itself. If the value is 0s, the worker will wait infinitely|
|worker.task-execute-threads-full-policy|REJECT|If REJECT, when the task waiting in the worker reaches exec-threads, it will reject the received task and the Master will redispatch it; If CONTINUE, it will put the task into the worker's execution queue and wait for a free thread to start execution|
### Alert Server related configuration
Location: `alert-server/conf/application.yaml`
|Parameters | Default value| Description|
|--|--|--|
|server.port|50053|the port of Alert Server|
|alert.port|50052|the port of alert|
### Quartz related configuration
This part describes the Quartz configs; configure them based on your actual situation and resources.
|Service| Configuration file |
|--|--|
|Master Server | `master-server/conf/application.yaml`|
|Api Server| `api-server/conf/application.yaml`|
The default configuration is as follows:
|Parameters | Default value|
|--|--|
|spring.quartz.properties.org.quartz.threadPool.threadPriority | 5|
|spring.quartz.properties.org.quartz.jobStore.isClustered | true|
|spring.quartz.properties.org.quartz.jobStore.class | org.quartz.impl.jdbcjobstore.JobStoreTX|
|spring.quartz.properties.org.quartz.scheduler.instanceId | AUTO|
|spring.quartz.properties.org.quartz.jobStore.tablePrefix | QRTZ_|
|spring.quartz.properties.org.quartz.jobStore.acquireTriggersWithinLock|true|
|spring.quartz.properties.org.quartz.scheduler.instanceName | DolphinScheduler|
|spring.quartz.properties.org.quartz.threadPool.class | org.quartz.simpl.SimpleThreadPool|
|spring.quartz.properties.org.quartz.jobStore.useProperties | false|
|spring.quartz.properties.org.quartz.threadPool.makeThreadsDaemons | true|
|spring.quartz.properties.org.quartz.threadPool.threadCount | 25|
|spring.quartz.properties.org.quartz.jobStore.misfireThreshold | 60000|
|spring.quartz.properties.org.quartz.scheduler.makeSchedulerThreadDaemon | true|
|spring.quartz.properties.org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate|
|spring.quartz.properties.org.quartz.jobStore.clusterCheckinInterval | 5000|
### dolphinscheduler_env.sh [load environment variables configs]
When submitting tasks via shell, DolphinScheduler will export environment variables from `bin/env/dolphinscheduler_env.sh`. The
main configuration includes `JAVA_HOME`, the metadata database, the registry center, and task-related settings.
```bash
# JAVA_HOME, will use it to start DolphinScheduler server
export JAVA_HOME=${JAVA_HOME:-/opt/soft/java}
# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-postgresql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL
export SPRING_DATASOURCE_USERNAME
export SPRING_DATASOURCE_PASSWORD
# DolphinScheduler server related configuration
export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-UTC}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}
# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-localhost:2181}
# Tasks related configurations, need to change the configuration if you use the related tasks.
export HADOOP_HOME=${HADOOP_HOME:-/opt/soft/hadoop}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
export HIVE_HOME=${HIVE_HOME:-/opt/soft/hive}
export FLINK_HOME=${FLINK_HOME:-/opt/soft/flink}
export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH
```
### Log related configuration
|Service| Configuration file |
|--|--|
|Master Server | `master-server/conf/logback-spring.xml`|
|Api Server| `api-server/conf/logback-spring.xml`|
|Worker Server| `worker-server/conf/logback-spring.xml`|
|Alert Server| `alert-server/conf/logback-spring.xml`|

225
docs/docs/en/architecture/design.md

@@ -0,0 +1,225 @@
# System Architecture Design
## System Structure
### System Architecture Diagram
<p align="center">
<img src="../../../img/architecture-1.3.0.jpg" alt="System architecture diagram" width="70%" />
<p align="center">
<em>System architecture diagram</em>
</p>
</p>
### Start Process Activity Diagram
<p align="center">
<img src="../../../img/process-start-flow-1.3.0.png" alt="Start process activity diagram" width="70%" />
<p align="center">
<em>Start process activity diagram</em>
</p>
</p>
### Architecture Description
* **MasterServer**
MasterServer adopts a distributed, decentralized design. The MasterServer is mainly responsible for DAG task segmentation and task submission monitoring, while also monitoring the health status of the other MasterServers and WorkerServers.
When the MasterServer service starts, it registers a temporary node with ZooKeeper and performs fault tolerance by monitoring changes to the temporary ZooKeeper nodes.
MasterServer provides monitoring services based on Netty.
#### The Service Mainly Includes:
- **DistributedQuartz** distributed scheduling component, mainly responsible for starting and stopping scheduled tasks. When Quartz starts a task, a thread pool inside the Master is responsible for the subsequent processing of the task;
- **MasterSchedulerService** is a scanning thread that regularly scans the `t_ds_command` table in the database, runs different business operations according to different **command types**;
- **WorkflowExecuteRunnable** is mainly responsible for DAG task segmentation, task submission monitoring, and logical processing of different event types;
- **TaskExecuteRunnable** is mainly responsible for the processing and persistence of tasks, and generates task events and submits them to the event queue of the process instance;
- **EventExecuteService** is mainly responsible for the polling of the event queue of the process instances;
- **StateWheelExecuteThread** is mainly responsible for process instance and task timeout, task retry, task-dependent polling, and generates the corresponding process instance or task event and submits it to the event queue of the process instance;
- **FailoverExecuteThread** is mainly responsible for the logic of Master fault tolerance and Worker fault tolerance;
* **WorkerServer**
WorkerServer also adopts a distributed, decentralized design. The WorkerServer is mainly responsible for task execution and providing log services.
When the WorkerServer service starts, it registers a temporary node with ZooKeeper and maintains a heartbeat.
WorkerServer provides monitoring services based on Netty.
#### The Service Mainly Includes:
- **WorkerManagerThread** is mainly responsible for managing the task queue: it continuously receives tasks from the task queue and submits them to the thread pool for processing;
- **TaskExecuteThread** is mainly responsible for the process of task execution, and the actual processing of tasks according to different task types;
- **RetryReportTaskStatusThread** is mainly responsible for regularly polling to report the task status to the Master until the Master replies to the status ack to avoid the loss of the task status;
* **ZooKeeper**
ZooKeeper service, MasterServer and WorkerServer nodes in the system all use ZooKeeper for cluster management and fault tolerance. In addition, the system implements event monitoring and distributed locks based on ZooKeeper.
We have also implemented queues based on Redis, but we hope DolphinScheduler depends on as few components as possible, so we finally removed the Redis implementation.
* **AlertServer**
Provides alarm services, and implements rich alarm methods through alarm plugins.
* **API**
The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service uniformly provides RESTful APIs to external callers.
* **UI**
The front-end page of the system provides various visual operation interfaces of the system, see more at [Introduction to Functions](../guide/homepage.md) section.
### Architecture Design Ideas
#### Decentralization VS Centralization
##### Centralized Thinking
The centralized design concept is relatively simple. The nodes in the distributed cluster are roughly divided into two roles according to responsibilities:
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/master_slave.png" alt="master-slave character" width="50%" />
</p>
- The master is mainly responsible for task distribution and for monitoring the health status of the slaves; it can dynamically balance tasks across slaves so that slave nodes are neither "busy to death" nor "idle to death".
- The worker is mainly responsible for task execution and for maintaining a heartbeat with the master, so that the master can assign tasks to the slaves.
Problems in centralized thought design:
- Once there is a problem with the Master, the cluster is left without a commander and collapses entirely. To solve this problem, most Master/Slave architectures adopt an active/standby Master design, which can be hot or cold standby, with automatic or manual switching. More and more new systems are gaining the ability to automatically elect and switch the Master to improve system availability.
- Another problem is that if the Scheduler is on the Master, although it can support different tasks in a DAG running on different machines, it may overload the Master. If the Scheduler is on the Slave, all tasks in a DAG can only submit jobs from a single machine, so when there are many parallel tasks, the pressure on that Slave may be high.
##### Decentralized
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/decentralization.png" alt="Decentralization" width="50%" />
</p>
- In a decentralized design, there is usually no concept of Master or Slave: all roles are the same and of equal status. The global Internet is a typical decentralized distributed system; any node connected to the network going down only affects a small range of functions.
- The core of decentralized design is that there is no distinct "manager" different from the other nodes in the distributed system, so there is no single point of failure. However, because there is no "manager" node, each node needs to communicate with other nodes to obtain the necessary machine information, and the unreliability of distributed communication greatly increases the difficulty of implementing the above functions.
- In fact, truly decentralized distributed systems are rare. Instead, dynamically centralized distributed systems keep emerging. In this architecture, the managers in the cluster are dynamically elected rather than preset, and when the cluster fails, the nodes automatically hold "meetings" to elect a new "manager" to preside over the work. Typical examples are ZooKeeper and Etcd, which is implemented in the Go language.
- The decentralization of DolphinScheduler means that Masters and Workers register in ZooKeeper, implementing center-less Master and Worker clusters. A ZooKeeper distributed lock is used to elect one Master or Worker as a "manager" to perform a given task.
#### Fault-Tolerant Design
Fault tolerance is divided into service downtime fault tolerance and task retry; service downtime fault tolerance is further divided into master fault tolerance and worker fault tolerance.
##### Downtime Fault Tolerance
The service fault-tolerance design relies on ZooKeeper's Watcher mechanism, and the implementation principle shows in the figure:
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant.png" alt="DolphinScheduler fault-tolerant design" width="40%" />
</p>
Among them, the Master monitors the directories of the other Masters and Workers. If a remove event is triggered, it performs fault tolerance on the process instances or task instances according to the specific business logic.
- Master fault tolerance:
<p align="center">
<img src="../../../img/failover-master.jpg" alt="failover-master" width="50%" />
</p>
Fault tolerance range: from the perspective of hosts, the fault tolerance range of the Master includes its own host and node hosts that no longer exist in the registry, and the entire fault-tolerance process is locked;
Fault-tolerant content: the Master's fault-tolerant content includes fault-tolerant process instances and task instances. Before fault tolerance, it compares the start time of the instance with the server start-up time and skips fault tolerance for instances started after the server start time;
Fault-tolerant post-processing: after ZooKeeper Master fault tolerance completes, the Scheduler thread in DolphinScheduler re-schedules, traversing the DAG to find the "running" and "submitted successfully" tasks. It monitors the status of task instances for "running" tasks; for "submitted successfully" tasks, it checks whether the task already exists in the task queue. If it exists, it monitors the status of the task instance; otherwise, it resubmits the task instance.
- Worker fault tolerance:
<p align="center">
<img src="../../../img/failover-worker.jpg" alt="failover-worker" width="50%" />
</p>
Fault tolerance range: from the perspective of process instances, each Master is only responsible for fault tolerance of its own process instances; it acquires a lock only in `handleDeadServer`;
Fault-tolerant content: when a Worker node's remove event is received, the Master only handles fault tolerance for task instances. Before fault tolerance, it compares the start time of the instance with the server start-up time and skips fault tolerance for instances started after the server start time;
Fault-tolerant post-processing: once the Master Scheduler thread finds a task instance in the "fault-tolerant" state, it takes over the task and resubmits it.
Note: due to "network jitter", a node may briefly lose its heartbeat with ZooKeeper, triggering a remove event for the node. For this situation, we use the simplest approach: once a connection timeout between a node and ZooKeeper occurs, the Master or Worker service stops itself directly.
##### Task Failed and Try Again
Here we must first distinguish the concepts of task failure retry, process failure recovery, and process failure re-run:
- Task failure retry happens at the task level and is performed automatically by the scheduling system. For example, if a Shell task is set to retry 3 times, the system will try to run it again up to 3 times after it fails.
- Process failure recovery happens at the process level and is performed manually. Recovery can only be performed **from the failed node** or **from the current node**.
- Process failure re-run also happens at the process level and is performed manually; a re-run starts from the beginning node.
Back to the main point: we divide the task nodes in the workflow into two types.
- One is a business task, which corresponds to an actual script or process command, such as a Shell task, SQL task, or Spark task.
- The other is a logical task, which does not run an actual script or process command but only performs logical processing on the overall process flow, such as a sub-process task or dependent task.
A **business node** can be configured with a number of failed retries. When the task node fails, it automatically retries until it succeeds or exceeds the configured retry count. **Logical nodes** do not support failure retry.
If a task in the workflow fails and reaches its maximum retry count, the workflow fails and stops; the failed workflow can then be re-run manually or recovered.
#### Task Priority Design
In the early scheduler design, with no priority design and fair scheduling, a task submitted first might complete at the same time as a task submitted later, invalidating any notion of process or task priority. So we redesigned this, and the following is our current design:
- Submission order runs from highest to lowest: **the priority of different process instances** takes precedence over **the priority within the same process instance**, which takes precedence over **the priority of tasks within the same process**, which takes precedence over **the submission order of tasks within the same process**.
- The specific implementation is to parse the priority from the JSON of the task instance and save the **process instance priority_process instance id_task priority_task id** string to the ZooKeeper task queue. When taking from the task queue, the highest-priority task is obtained by string comparison (see the sketch after the figures below).
- Process definition priority accounts for the fact that some processes need to be handled before others; it can be configured when the process is started or scheduled. There are 5 levels in total: HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below:
<p align="center">
<img src="https://user-images.githubusercontent.com/10797147/146744784-eb351b14-c94a-4ed6-8ba4-5132c2a3d116.png" alt="Process priority configuration" width="40%" />
</p>
- Task priority is also divided into 5 levels: HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below:
<p align="center">
<img src="https://user-images.githubusercontent.com/10797147/146744830-5eac611f-5933-4f53-a0c6-31613c283708.png" alt="Task priority configuration" width="35%" />
</p>
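The string comparison mentioned above can be illustrated with a toy example (not project code); the ids are zero-padded here so that lexicographic order matches numeric order, and it assumes a smaller leading digit means a higher process instance priority:
```bash
# queue keys laid out as {processInstancePriority}_{processInstanceId}_{taskPriority}_{taskId};
# sorting the keys and taking the first yields the highest-priority task
printf '%s\n' \
  "1_00000100_2_00002001" \
  "0_00000101_3_00003001" \
  "1_00000100_1_00002000" | sort | head -n 1
# prints 0_00000101_3_00003001: the higher-priority process instance wins
```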
#### Logback and Netty Implement Log Access
- Since the web UI and Worker are not always on the same machine, viewing a log cannot be like reading a local file. There are two options:
  - Put the logs on the ES search engine.
  - Obtain remote log information through Netty communication.
- To keep DolphinScheduler as lightweight as possible, gRPC was chosen to achieve remote access to log information.
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/grpc.png" alt="grpc remote access" width="50%" />
</p>
- For details, please refer to the logback configuration of Master and Worker, as shown in the following example:
```xml
<conversionRule conversionWord="messsage" converterClass="org.apache.dolphinscheduler.service.log.SensitiveDataConverter"/>
<appender name="TASKLOGFILE" class="ch.qos.logback.classic.sift.SiftingAppender">
<filter class="org.apache.dolphinscheduler.service.log.TaskLogFilter"/>
<Discriminator class="org.apache.dolphinscheduler.service.log.TaskLogDiscriminator">
<key>taskAppId</key>
<logBase>${log.base}</logBase>
</Discriminator>
<sift>
<appender name="FILE-${taskAppId}" class="ch.qos.logback.core.FileAppender">
<file>${log.base}/${taskAppId}.log</file>
<encoder>
<pattern>
[%level] %date{yyyy-MM-dd HH:mm:ss.SSS Z} [%thread] %logger{96}:[%line] - %messsage%n
</pattern>
<charset>UTF-8</charset>
</encoder>
<append>true</append>
</appender>
</sift>
</appender>
```
## Sum Up
From the perspective of scheduling, this article preliminarily introduces the architecture principles and implementation ideas of the big data distributed workflow scheduling system: DolphinScheduler. To be continued.

60
docs/docs/en/architecture/load-balance.md

@@ -0,0 +1,60 @@
# Load Balance
Load balancing refers to the reasonable allocation of server pressure through routing algorithms (usually in cluster environments) to make the best use of server performance.
## DolphinScheduler-Worker Load Balancing Algorithms
DolphinScheduler-Master allocates tasks to workers and provides three algorithms by default:
- Weighted random (random)
- Smooth polling (round-robin)
- Linear load (lower weight)
The default configuration is the linear load.
Since routing happens on the client side, i.e. in the master service, you can change `master.host-selector` in `master-server/conf/application.yaml` to configure the algorithm,
e.g. `master.host-selector: random` (case-insensitive).
## Worker Load Balancing Configuration
The configuration file is `worker-server/conf/application.yaml`.
### Weight
All of the load algorithms above are weighted, and the weights affect the routing outcome. You can set different weights for different machines by modifying the `worker.host-weight` value.
### Preheating
Because of JIT optimization, a worker runs at low power for a period of time after startup so that it can gradually reach its optimal state; we call this process preheating. If you are interested, you can read some articles about JIT.
So a worker gradually reaches its maximum weight over time after it starts up (ten minutes by default; there is no configuration option for the preheating duration, so feel free to submit a PR if you need to change it).
## Load Balancing Algorithm in Details
### Random (Weighted)
This algorithm is relatively simple: select a worker at random (the weight affects the probability of selection).
### Smoothed Polling (Weighted)
An obvious drawback of the weighted polling algorithm is that, under particular weight distributions, it generates an imbalanced sequence of instances; this unsmooth load may cause some instances to experience transient high load, risking a system crash. To address this flaw, we provide a smooth weighted polling algorithm.
Each worker has two weight parameters: weight (which remains constant after warm-up completes) and current_weight (which changes dynamically). For every route, each worker's current_weight is increased by its weight, the weights of all workers are summed as total_weight, and the worker with the largest current_weight is selected for the task; that worker's current_weight is then decreased by total_weight.
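The following is a minimal bash simulation of smooth weighted polling under the rules just described, with three hypothetical workers of weights 5, 1, and 1; it is an illustration, not the project's actual implementation:
```bash
#!/usr/bin/env bash
# smooth weighted round-robin: each round every worker gains its static weight,
# the worker with the largest current_weight is picked, then pays back total_weight
weights=(5 1 1)
current=(0 0 0)
total=7  # sum of all weights

for round in 1 2 3 4 5 6 7; do
  best=-1
  best_cw=-999999
  for i in 0 1 2; do
    current[$i]=$(( current[i] + weights[i] ))
    if (( current[i] > best_cw )); then
      best_cw=${current[$i]}
      best=$i
    fi
  done
  current[$best]=$(( current[best] - total ))
  echo "round $round -> worker $best"
done
# produces the smooth sequence 0 0 1 0 2 0 0 instead of 0 0 0 0 0 1 2
```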
### Linear Weighting (Default Algorithm)
With this algorithm, each worker reports its own load information to the registry at regular intervals. The dispatch decision is made on two main pieces of information:
- load average (default threshold is the number of CPU cores * 2)
- available physical memory (default threshold is 0.3, in G)
If either threshold is violated, the worker will not participate in load balancing (no traffic will be allocated to it).
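As a rough sketch, the eligibility check a worker performs can be mimicked on Linux with standard tools; the thresholds below mirror the defaults (CPU cores * 2 and 0.3 G reserved memory), and the `/proc` file paths assume a Linux host:
```bash
# read the 1-minute load average and available memory, then decide whether
# this host would accept dispatched tasks under the default thresholds
max_load=$(( $(nproc) * 2 ))
load=$(cut -d ' ' -f1 /proc/loadavg)
avail_gb=$(awk '/MemAvailable/ {printf "%.2f", $2/1024/1024}' /proc/meminfo)

if awk -v l="$load" -v m="$max_load" 'BEGIN {exit !(l < m)}' &&
   awk -v a="$avail_gb" 'BEGIN {exit !(a > 0.3)}'; then
  echo "worker would participate in dispatch (load=$load, available=${avail_gb}G)"
else
  echo "worker would be excluded from dispatch (load=$load, available=${avail_gb}G)"
fi
```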
You can customize the configuration by changing the following properties in `worker-server/conf/application.yaml`:
- worker.max-cpu-load-avg=-1 (worker max cpuload avg; the worker can only be dispatched tasks when the system CPU load average is lower than this value. Default value -1: the number of CPU cores * 2)
- worker.reserved-memory=0.3 (worker reserved memory; the worker can only be dispatched tasks when the system available memory is higher than this value. Default value 0.3, the unit is G)

43
docs/docs/en/architecture/metadata.md

@@ -0,0 +1,43 @@
# MetaData
## Table Schema
see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql`
---
## E-R Diagram
### User Queue DataSource
![image.png](../../../img/metadata-erd/user-queue-datasource.png)
- One tenant can own multiple users.
- The queue field in the `t_ds_user` table stores the `queue_name` information in the `t_ds_queue` table, `t_ds_tenant` stores queue information using `queue_id` column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
- The `user_id` field in the `t_ds_datasource` table shows the user who created the data source. The `user_id` in `t_ds_relation_datasource_user` shows a user who has permission to use the data source.
### Project Resource Alert
![image.png](../../../img/metadata-erd/project-resource-alert.png)
- User can have multiple projects, user project authorization completes the relationship binding using `project_id` and `user_id` in `t_ds_relation_project_user` table.
- The `user_id` in the `t_ds_project` table represents the user who created the project, and the `user_id` in the `t_ds_relation_project_user` table represents a user who has permission to the project.
- The `user_id` in the `t_ds_resources` table represents the user who created the resource, and the `user_id` in `t_ds_relation_resources_user` represents a user who has permission to the resource.
- The `user_id` in the `t_ds_udfs` table represents the user who created the UDF, and the `user_id` in the `t_ds_relation_udfs_user` table represents a user who has permission to the UDF.
### Project - Tenant - ProcessDefinition - Schedule
![image.png](../../../img/metadata-erd/project_tenant_process_definition_schedule.png)
- A project can have multiple process definitions, and each process definition belongs to only one project.
- A tenant can be used by multiple process definitions, and each process definition must select only one tenant.
- A workflow definition can have one or more schedules.
### Process Definition Execution
![image.png](../../../img/metadata-erd/process_definition.png)
- A process definition corresponds to multiple task definitions, which are associated through `t_ds_process_task_relation` and the associated key is `code + version`. When the pre-task of the task is empty, the corresponding `pre_task_node` and `pre_task_version` are 0.
- A process definition can have multiple process instances `t_ds_process_instance`, one process instance corresponds to one or more task instances `t_ds_task_instance`.
- The data stored in the `t_ds_relation_process_instance` table is used to handle the case that the process definition contains sub-processes. `parent_process_instance_id` represents the id of the main process instance containing the sub-process, `process_instance_id` represents the id of the sub-process instance, `parent_task_instance_id` represents the task instance id of the sub-process node. The process instance table and the task instance table correspond to the `t_ds_process_instance` table and the `t_ds_task_instance` table, respectively.

1116
docs/docs/en/architecture/task-structure.md

File diff suppressed because it is too large

131
docs/docs/en/contribute/api-standard.md

@@ -0,0 +1,131 @@
# API design standard
A standardized and unified API is the cornerstone of project design. The DolphinScheduler API follows the RESTful standard. RESTful is currently the most popular Internet software architecture: it has a clear structure, conforms to standards, and is easy to understand and extend.
This article uses the DolphinScheduler API as an example to explain how to construct a RESTful API.
## 1. URI design
REST is "Representational State Transfer".The design of Restful URI is based on resources.The resource corresponds to an entity on the network, for example: a piece of text, a picture, and a service. And each resource corresponds to a URI.
+ One Kind of Resource: expressed in the plural, such as `task-instances`、`groups` ;
+ A Resource: expressed in the singular, or use the ID to represent the corresponding resource, such as `group`、`groups/{groupId}`;
+ Sub Resources: Resources under a certain resource, such as `/instances/{instanceId}/tasks`;
+ A Sub Resource:`/instances/{instanceId}/tasks/{taskId}`;
## 2. Method design
We need to locate a certain resource by URI, and then use Method or declare actions in the path suffix to reflect the operation of the resource.
### ① Query - GET
Use URI to locate the resource, and use GET to indicate query.
+ When the URI refers to a type of resource, it means querying that type of resource. For example, the following example indicates a paged query of `alert-groups`.
```
Method: GET
/dolphinscheduler/alert-groups
```
+ When the URI refers to a single resource, it means querying this resource. For example, the following example means querying the specified `alert-group`.
```
Method: GET
/dolphinscheduler/alert-groups/{id}
```
+ In addition, we can also query sub-resources based on the URI, as follows:
```
Method: GET
/dolphinscheduler/projects/{projectId}/tasks
```
**The above examples all represent paging queries. If we need to query all data, add `/list` after the URI to distinguish it. Do not use the same API for both paged and unpaged queries.**
```
Method: GET
/dolphinscheduler/alert-groups/list
```
### ② Create - POST
Use URI to locate the resource, use POST to indicate creation, and return the created id to the requester.
+ create an `alert-group`:
```
Method: POST
/dolphinscheduler/alert-groups
```
+ creating sub-resources works the same way:
```
Method: POST
/dolphinscheduler/alert-groups/{alertGroupId}/tasks
```
### ③ Modify - PUT
Use URI to locate the resource, use PUT to indicate modification.
+ modify an `alert-group`:
```
Method: PUT
/dolphinscheduler/alert-groups/{alertGroupId}
```
### ④ Delete - DELETE
Use URI to locate the resource, use DELETE to indicate deletion.
+ delete an `alert-group`:
```
Method: DELETE
/dolphinscheduler/alert-groups/{alertGroupId}
```
+ batch deletion: to batch delete an array of ids, use POST. **(Do not use the DELETE method, because the body of a DELETE request has no semantic meaning, and some gateways, proxies, and firewalls may strip the request body off a DELETE request.)**
```
Method: POST
/dolphinscheduler/alert-groups/batch-delete
```
### ⑤ Partial Modification - PATCH
Use URI to locate the resource, use PATCH to indicate partial modification.
```
Method: PATCH
/dolphinscheduler/alert-groups/{alertGroupId}
```
### ⑥ Others
In addition to creating, deleting, modifying and querying, we can also locate the corresponding resource through the URI and then append an operation to the path, such as:
```
/dolphinscheduler/alert-groups/verify-name
/dolphinscheduler/projects/{projectCode}/process-instances/{code}/view-gantt
```
## 3. Parameter design
There are two types of parameters: request parameters and path parameters. Parameter names must use lower camel case.
In the case of paging, if the page number entered by the user is less than 1, the front end should automatically turn it into 1, requesting the first page; if the backend finds that the requested page number is greater than the total number of pages, it should return the last page directly.
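For example, a paging query might pass its request parameters in lower camel case, as in the following sketch (the parameter names `pageNo` and `pageSize` are illustrative):
```
Method: GET
/dolphinscheduler/alert-groups?pageNo=1&pageSize=10
```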
## 4. Other designs
### Base URL
The URIs of the project need to use `/<project_name>` as the base path, to identify that these APIs belong to this project.
```
/dolphinscheduler
```

92
docs/docs/en/contribute/api-test.md

@ -0,0 +1,92 @@
# DolphinScheduler API Automation Test
## Preparatory knowledge
### The difference between API Test and Unit Test
An API test imitates a user calling the API: it starts from a certain entry point and performs operations step by step until a piece of work is completed. A unit test, in contrast, usually needs to test parameters, parameter types, parameter values, parameter counts, return values, thrown errors, etc., in order to ensure that a specific function works stably and reliably in any case. Unit tests assume that the entire product will work as long as all functions work properly.
In contrast, API testing focuses on whether a complete operation chain can be completed.
For example, the API test of the tenant management interface focuses on whether users can log in normally; if the login fails, whether the error message is displayed correctly; and, after logging in, whether tenant management operations can be performed with the carried session ID.
## API Test
### API-Pages
DolphinScheduler's API tests are deployed using docker-compose. The current tests run in standalone mode and are mainly used to check basic functions such as add, delete, update and query. For further cluster validation, such as collaboration between services or inter-service communication mechanisms, refer to `deploy/docker/docker-compose.yml` for configuration.
For API tests, the [page model](https://www.selenium.dev/documentation/guidelines/page_object_models/) form is used: a corresponding model is created for each page. The following is an example of a login page.
```java
package org.apache.dolphinscheduler.api.test.pages;

import org.apache.dolphinscheduler.api.test.entity.HttpResponse;
import org.apache.dolphinscheduler.api.test.utils.RequestClient;

import java.util.HashMap;
import java.util.Map;

public final class LoginPage {

    public HttpResponse login(String username, String password) {
        Map<String, Object> params = new HashMap<>();
        params.put("userName", username);
        params.put("userPassword", password);

        RequestClient requestClient = new RequestClient();
        return requestClient.post("/login", null, params);
    }
}
```
During the test process, we only test the interfaces we need to focus on, not all interfaces on the page. Therefore, we only declare the user name, password and interface path on the login page.
In addition, interfaces are not requested directly during testing. The general choice is to wrap the corresponding methods for reuse. For example, to log in, you pass the username and password into the `login()` method of `LoginPage`, which performs the request for you; when the method returns, the user is logged in.
On the login page, only the input parameter specification of the interface request is defined. For the output, only the unified basic response structure is defined; the data actually returned by the interface is asserted in the concrete test case. What is mainly tested is whether the input and output of the interfaces under test meet the requirements of the test cases.
### API-Cases
The following is an example of a tenant management test. As explained earlier, we use docker-compose for deployment, so for each test case we need to import the corresponding file in the form of an annotation.
The interfaces are requested through the `RequestClient` wrapper shown above. Before each test case starts, some preparation work needs to be done, for example logging in the user (depending on the specific test case).
```java
@BeforeAll
public static void setup() {
    LoginPage loginPage = new LoginPage();
    HttpResponse loginHttpResponse = loginPage.login(user, password);
    sessionId = JSONUtils.convertValue(loginHttpResponse.body().data(), LoginResponseData.class).sessionId();
}
```
When the preparation is complete, it is time to write the formal test cases. We use the `@Order()` annotation to make the execution order of the tests explicit. After a test has run, assertions are used to determine whether it was successful; if the assertion returns true, the tenant creation succeeded. The following code can be used as a reference:
```java
@Test
@Order(1)
public void testCreateTenant() {
    TenantPage tenantPage = new TenantPage();
    HttpResponse createTenantHttpResponse = tenantPage.createTenant(sessionId, tenant, 1, "");
    Assertions.assertTrue(createTenantHttpResponse.body().success());
}
```
The rest are similar cases and can be understood by referring to the specific source code.
https://github.com/apache/dolphinscheduler/tree/dev/dolphinscheduler-api-test/dolphinscheduler-api-test-case/src/test/java/org/apache/dolphinscheduler/api.test/cases
## Supplements
When running API tests locally, you first need to start the local service; you can refer to this page:
[development-environment-setup](./development-environment-setup.md)
When running API tests locally, the `-Dlocal=true` parameter can be configured to connect to the locally started service, which makes debugging changes easier.
The current default request timeout is 10 seconds. This value should not be modified without special requirements.

293
docs/docs/en/contribute/architecture-design.md

@ -0,0 +1,293 @@
## Architecture Design
Before explaining the architecture of the scheduling system, let us first understand the common terms used by the scheduling system.
### 1. Terminology
**DAG:** Full name Directed Acyclic Graph, abbreviated DAG. Tasks in a workflow are assembled as a directed acyclic graph, which is traversed topologically from the nodes with zero in-degree until there are no successor nodes. For example, see the following picture:
<p align="center">
<img src="../../../img/architecture-design/dag_examples.png" alt="dag example" width="80%" />
<p align="center">
<em>dag example</em>
</p>
</p>
**Process definition**: A visual **DAG** formed by dragging task nodes and establishing their associations
**Process instance**: An instantiation of a process definition, generated by manual start or by scheduling; each run of a process definition produces a new process instance
**Task instance**: The instantiation of a specific task node when a process instance runs; it indicates the concrete task execution status
**Task type**: Currently supports SHELL, SQL, SUB_PROCESS (sub-process), PROCEDURE, MR, SPARK, PYTHON, DEPENDENT (dependency), with dynamic plug-in extension planned. Note: a **SUB_PROCESS** is itself a separate process definition that can be started on its own
**Schedule mode**: The system supports timed scheduling based on cron expressions as well as manual scheduling. Supported command types: start workflow, start execution from current node, resume fault-tolerant workflow, resume paused process, start execution from failed node, complement, timer, rerun, pause, stop, resume waiting thread. The command types **resume fault-tolerant workflow** and **resume waiting thread** are used internally by the scheduler and cannot be called externally
**Timed schedule**: The system uses the **quartz** distributed scheduler and supports visual generation of cron expressions
**Dependency**: The system supports not only simple **DAG** dependencies between predecessor and successor nodes, but also **task dependency** nodes, allowing **custom task dependencies between processes**
**Priority**: Supports priorities for process instances and task instances. If no priority is set, the default is first in, first out
**Mail alert**: Supports sending **SQL task** query results by email, as well as email alerts for process instance run results and fault-tolerance events
**Failure policy**: For tasks running in parallel, if a task fails, two failure policies are provided. **Continue** means the remaining parallel tasks keep running until the process ends, after which the process is marked failed; **End** means that once a failed task is found, the running parallel tasks are killed and the process ends
**Complement**: Backfills historical data, supporting two complement modes over the interval: **parallel and serial**
### 2. System architecture
#### 2.1 System Architecture Diagram
<p align="center">
<img src="../../../img/architecture.jpg" alt="System Architecture Diagram" />
<p align="center">
<em>System Architecture Diagram</em>
</p>
</p>
#### 2.2 Architectural description
* **MasterServer**
MasterServer adopts a distributed, non-central design. It is mainly responsible for DAG task splitting, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer instances.
When the MasterServer service starts, it registers a temporary node with ZooKeeper and performs fault tolerance by listening for state changes of the ZooKeeper temporary nodes.
##### The service mainly contains:
- **Distributed Quartz**, the distributed scheduling component, is mainly responsible for starting and stopping scheduled tasks. When quartz picks up a task, a thread pool inside the Master handles the task's subsequent operations
- **MasterSchedulerThread** is a scan thread that periodically scans the **command** table in the database and performs different business operations based on the **command type**
- **MasterExecThread** is mainly responsible for DAG task segmentation, task submission monitoring, and the logic of the various command types
- **MasterTaskExecThread** is mainly responsible for task persistence
* **WorkerServer**
- WorkerServer also adopts a distributed, non-central design. It is mainly responsible for task execution and providing log services. When the WorkerServer service starts, it registers a temporary node with ZooKeeper and maintains a heartbeat.
##### This service contains:
- **FetchTaskThread** is mainly responsible for continuously fetching tasks from the **Task Queue** and, depending on the task type, invoking the corresponding executor via **TaskScheduleThread**.
- **ZooKeeper**
The MasterServer and WorkerServer nodes in the system all use the ZooKeeper service for cluster management and fault tolerance. In addition, the system performs event monitoring and distributed locking based on ZooKeeper.
We have also implemented queues based on Redis, but we hope that DolphinScheduler relies on as few components as possible, so we finally removed the Redis implementation.
- **Task Queue**
Provides task queue operations. Currently the queue is also implemented on ZooKeeper. Since the queue stores little information per entry, there is no need to worry about too much data in the queue; in fact, we have stress-tested a queue with millions of entries, with no impact on system stability or performance.
- **Alert**
Provides alarm-related interfaces, mainly covering the storage, query, and notification of the two types of alarm data. The notification function has two types: **mail notification** and **SNMP (not yet implemented)**.
- **API**
The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service provides RESTful APIs to serve external requests.
Interfaces include workflow creation, definition, query, modification, release, offline, manual start, stop, pause, resume, start execution from this node, and more.
- **UI**
The front-end page of the system provides various visual operation interfaces of the system. For details, see the [quick start](https://dolphinscheduler.apache.org/en-us/docs/3.1.2/about/introduction) section.
#### 2.3 Architectural Design Ideas
##### I. Decentralization vs. centralization
###### Centralized thinking
The centralized design concept is relatively simple: the nodes of the distributed cluster are divided into two roles:
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/master_slave.png" alt="master-slave role" width="50%" />
</p>
- The Master is mainly responsible for distributing tasks and supervising the health of the Slaves, and can dynamically balance tasks across Slaves so that no Slave is overloaded while others sit idle.
- The Worker is mainly responsible for executing tasks and maintains a heartbeat with the Master so that the Master can assign it tasks.
Problems with the centralized design:
- Once the Master has a problem, the group is leaderless and the entire cluster collapses. To solve this, most Master/Slave architectures adopt a primary/standby Master design, which can be hot or cold standby, switched automatically or manually; more and more new systems can automatically elect and switch the Master to improve availability.
- Another problem is that if the Scheduler runs on the Master, then although different tasks of one DAG can run on different machines, the Master can become overloaded. If the Scheduler runs on the Slave, all tasks of a DAG can only be submitted from one machine; when there are many parallel tasks, the pressure on that Slave may be high.
###### Decentralization
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/decentralization.png" alt="decentralized" width="50%" />
</p>
- In a decentralized design there is usually no Master/Slave concept: all roles are the same and their status is equal. The global Internet is a typical decentralized distributed system; any node going down affects only a small range of features.
- The core of decentralized design is that there is no "manager" distinct from the other nodes in the entire distributed system, so there is no single point of failure. However, since there is no "manager" node, each node needs to communicate with other nodes to get the necessary machine information, and the unreliability of distributed communication greatly increases the difficulty of implementing the above functions.
- In fact, truly decentralized distributed systems are rare. Instead, dynamically centralized distributed systems keep emerging. Under this architecture, the managers in the cluster are dynamically elected rather than preset, and when the cluster fails, the nodes spontaneously hold "meetings" to elect a new "manager" to preside over the work. The most typical cases are ZooKeeper and Etcd, which is implemented in Go.
- The decentralization of DolphinScheduler means that Masters and Workers register with ZooKeeper. The Master cluster and Worker cluster have no center, and a ZooKeeper distributed lock is used to elect one Master or Worker as the "manager" to perform a given task.
##### II. Distributed lock practice
DolphinScheduler uses ZooKeeper distributed locks to ensure that only one Master executes the Scheduler at a time, and that only one Worker performs a given task submission at a time.
1. The core process algorithm for obtaining distributed locks is as follows
<p align="center">
<img src="../../../img/architecture-design/distributed_lock.png" alt="Get Distributed Lock Process" width="70%" />
</p>
2. Scheduler thread distributed lock implementation flow chart in DolphinScheduler:
<p align="center">
<img src="../../../img/architecture-design/distributed_lock_procss.png" alt="Get Distributed Lock Process" />
</p>
##### III. The insufficient-thread loop-waiting problem
- If a DAG has no sub-processes and the number of Commands exceeds the threshold set for the thread pool, the process waits or fails directly.
- If many sub-processes are nested in a large DAG, the situation in the following figure results in a "deadlocked" state:
<p align="center">
<img src="../../../img/architecture-design/lack_thread.png" alt="Thread is not enough to wait for loop" width="70%" />
</p>
In the figure above, MainFlowThread waits for SubFlowThread1 to end, SubFlowThread1 waits for SubFlowThread2 to end, SubFlowThread2 waits for SubFlowThread3 to end, and SubFlowThread3 waits for a new thread from the thread pool; the whole DAG can therefore never finish, so no thread is ever released. The child and parent processes end up waiting for each other in a loop, and the scheduling cluster is no longer usable unless a new Master is started to add threads and break the logjam.
Starting a new Master to break the deadlock seems unsatisfactory, so we proposed the following three options to reduce this risk:
1. Calculate the total number of threads across all Masters, and then calculate the number of threads each DAG needs before it executes, i.e. pre-calculate. Because the thread pools are spread over multiple Masters, the total number of threads is unlikely to be obtainable in real time.
2. Check the single-Master thread pool: if the pool is full, let the thread fail directly.
3. Add a Command type for insufficient resources: if the thread pool is insufficient, suspend the main process. Once the thread pool has a new thread available, a process suspended for lack of resources can be woken up to run again.
Note: The Master Scheduler thread fetches Commands in FIFO order.
So we chose the third way to solve the problem of insufficient threads.
##### IV. Fault Tolerant Design
Fault tolerance is divided into service fault tolerance and task retry. Service fault tolerance is divided into two types: Master Fault Tolerance and Worker Fault Tolerance.
###### 1. Downtime fault tolerance
Service fault tolerance design relies on ZooKeeper's Watcher mechanism. The implementation principle is as follows:
<p align="center">
<img src="../../../img/architecture-design/fault-tolerant.png" alt="DolphinScheduler Fault Tolerant Design" width="70%" />
</p>
The Master monitors the directories of other Masters and Workers. If a remove event is detected, process-instance or task-instance fault tolerance is performed according to the specific business logic.
- Master fault tolerance flow chart:
<p align="center">
<img src="../../../img/architecture-design/fault-tolerant_master.png" alt="Master Fault Tolerance Flowchart" width="70%" />
</p>
After Master fault tolerance is triggered via ZooKeeper, the Scheduler thread in DolphinScheduler reschedules the affected process instances: it traverses the DAG to find the "Running" and "Submitted Successfully" tasks. For "Running" tasks it monitors the status of their task instances; for "Submitted Successfully" tasks it checks whether the task already exists in the Task Queue: if it exists, the task instance's status is likewise monitored; if not, the task instance is resubmitted.
- Worker fault tolerance flow chart:
<p align="center">
<img src="../../../img/architecture-design/fault-tolerant_worker.png" alt="Worker Fault Tolerance Flowchart" width="70%" />
</p>
Once the Master Scheduler thread finds a task instance marked as "needing fault tolerance", it takes over the task and resubmits it.
Note: Because "network jitter" may cause a node to briefly lose its ZooKeeper heartbeat, producing a remove event for the node, we take the simplest approach: once a node's connection to ZooKeeper times out, the Master or Worker service on it is stopped directly.
###### 2. Task failure retry
Here we must first distinguish the concepts of task failure retry, process failure recovery, and process failure rerun:
- Task failure retry is at the task level and is performed automatically by the scheduling system. For example, if a shell task is configured with 3 retries, it will be retried up to 3 times after a failed run
- Process failure recovery is at the process level and is done manually; recovery can only start **from the failed node** or **from the current node**
- Process failure rerun is also at the process level and is done manually; a rerun starts from the start node
Back to the topic: we divide the task nodes in a workflow into two types.
- One is the business node, which corresponds to an actual script or processing statement, such as a Shell node, an MR node, a Spark node, or a dependent node.
- The other is the logical node, which does no actual script or statement processing but handles the logic of the overall process flow, such as the sub-process node.
Each **business node** can be configured with a number of failure retries. When the task node fails, it is automatically retried until it succeeds or the configured number of retries is exceeded. A **logical node** does not support failure retry, but the tasks inside a logical node do.
If a task in the workflow fails and reaches its maximum number of retries, the workflow fails and stops; the failed workflow can then be rerun manually or recovered from the failure.
##### V. Task priority design
In the early scheduling design, with neither a priority design nor fair scheduling, a task submitted first could finish at the same time as a task submitted later, and neither process nor task priority could be set. We have redesigned this as follows:
- Tasks are processed in order of **process instance priority** first, then **task priority within the same process instance**, then **submission order within the same process**, from high to low (see the sketch after this list).
- The concrete implementation is to parse the priority from the json of the task instance and then store the **process instance priority_process instance id_task priority_task id** string in the ZooKeeper task queue; when fetching from the queue, a plain string comparison yields the task that should execute first.
- The priority of a process definition reflects that some processes need to be handled before others, and can be configured when the process is started or scheduled. There are 5 levels: HIGHEST, HIGH, MEDIUM, LOW, and LOWEST, as shown below
<p align="center">
<img src="../../../img/architecture-design/process_priority.png" alt="Process Priority Configuration" width="40%" />
</p>
- The priority of a task is also divided into 5 levels: HIGHEST, HIGH, MEDIUM, LOW, and LOWEST, as shown below
<p align="center">
<img src="../../../img/architecture-design/task_priority.png" alt="task priority configuration" width="35%" />
</p>
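As a rough illustration of the string-comparison scheme above, consider the following minimal sketch (the entry format follows the description; the id values are invented, and the real queue logic lives in DolphinScheduler itself):
```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TaskPriorityQueueSketch {
    public static void main(String[] args) {
        // Entries: {processInstancePriority}_{processInstanceId}_{taskPriority}_{taskId};
        // smaller numbers mean higher priority (e.g. HIGHEST=0 ... LOWEST=4).
        List<String> queue = new ArrayList<>(List.of("1_1002_0_2002", "0_1001_2_2001"));
        Collections.sort(queue); // plain string comparison, as described above
        System.out.println(queue.get(0)); // "0_1001_2_2001": highest process instance priority wins
    }
}
```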
##### VI. Logback and gRPC implement log access
- Since the Web (UI) and the Worker are not necessarily on the same machine, viewing logs is not like reading local files. There are two options:
- Put the logs on the ES search engine
- Obtain remote log information through gRPC communication
- To keep DolphinScheduler as lightweight as possible, gRPC was chosen for remote access to log information.
<p align="center">
<img src="../../../img/architecture-design/grpc.png" alt="grpc remote access" width="50%" />
</p>
- We use a custom Logback FileAppender and Filter to generate a separate log file for each task instance.
- The main implementation of FileAppender is as follows:
```java
/**
 * task log appender
 */
public class TaskLogAppender extends FileAppender<ILoggingEvent> {

    ...

    @Override
    protected void append(ILoggingEvent event) {
        if (currentlyActiveFile == null) {
            currentlyActiveFile = getFile();
        }
        String activeFile = currentlyActiveFile;
        // thread name: taskThreadName-processDefineId_processInstanceId_taskInstanceId
        String threadName = event.getThreadName();
        String[] threadNameArr = threadName.split("-");
        // logId = processDefineId_processInstanceId_taskInstanceId
        String logId = threadNameArr[1];
        ...
        super.subAppend(event);
    }
}
```
A log is generated per task instance in the form `/process definition id/process instance id/task instance id.log`
- The Filter matches thread names starting with TaskLogInfo:
- TaskLogFilter is implemented as follows:
```java
/**
 * task log filter
 */
public class TaskLogFilter extends Filter<ILoggingEvent> {

    @Override
    public FilterReply decide(ILoggingEvent event) {
        if (event.getThreadName().startsWith("TaskLogInfo-")) {
            return FilterReply.ACCEPT;
        }
        return FilterReply.DENY;
    }
}
```
### Summary
Starting from scheduling, this article has introduced the architecture principles and implementation ideas of DolphinScheduler, a distributed workflow scheduling system for big data. To be continued.

62
docs/docs/en/contribute/backend/mechanism/global-parameter.md

@ -0,0 +1,62 @@
# Global Parameter development document
After the user defines a parameter with direction OUT, it is saved in the localParam of the task.
## Usage of parameters
Get the direct predecessor nodes `preTasks` of the current `taskInstance` to be created from the DAG, get the `varPool` of each of the `preTasks`, and merge these varPools (Lists) into one `varPool`. During the merge, if parameters with the same name are found, they are handled according to the following rules:
* If all the values are null, the merged value is null
* If one and only one value is non-null, the merged value is that non-null value
* If all the values are non-null, the value from the task instance with the earliest end time is taken
The direction of all the merged properties is updated to IN during the merge process.
The result of the merge is saved in taskInstance.varPool.
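A minimal sketch of these merge rules, assuming a simplified `Property(name, value, taskEndTime)` shape (the real Property class in DolphinScheduler carries more fields, such as the direction):
```java
// Simplified stand-in for the real Property class (illustrative only).
record Property(String name, String value, long taskEndTime) {}

class VarPoolMergeSketch {
    // Merge two same-named properties coming from different predecessors' varPools.
    static Property merge(Property a, Property b) {
        if (a.value() == null && b.value() == null) {
            return a;                                      // all values null -> merged value is null
        }
        if (a.value() == null) {
            return b;                                      // the single non-null value wins
        }
        if (b.value() == null) {
            return a;
        }
        return a.taskEndTime() <= b.taskEndTime() ? a : b; // earliest task end time wins
    }
}
```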
The worker receives and parses the varPool into the format of `Map<String,Property>`, where the key of the map is property.prop, which is the parameter name.
When the processor processes the parameters, it merges the varPool, localParam and globalParam parameter sets. If duplicate names occur during the merge, they are resolved by the following priorities, the higher priority being retained and the lower replaced (see the sketch after this list):
* globalParam: high
* varPool: middle
* localParam: low
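A minimal sketch of this precedence, treating each parameter set as a plain name-to-value map (simplified from the real `Map<String,Property>`):
```java
import java.util.HashMap;
import java.util.Map;

public class ParamPrecedenceSketch {
    // Later puts overwrite earlier ones, so insert from lowest to highest priority.
    static Map<String, String> resolve(Map<String, String> globalParam,
                                       Map<String, String> varPool,
                                       Map<String, String> localParam) {
        Map<String, String> merged = new HashMap<>(localParam); // low
        merged.putAll(varPool);                                 // middle
        merged.putAll(globalParam);                             // high priority is retained
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of("k", "global"), Map.of("k", "pool"), Map.of("k", "local")));
        // prints {k=global}: the globalParam value wins
    }
}
```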
Before the node content is executed, occurrences of `${parameter name}` are replaced with the corresponding values using regular expressions.
## Parameter setting
Currently, only SQL and SHELL nodes support getting parameters.
Get the parameters with direction OUT from localParam, and process them as follows depending on the node type.
### SQL node
The structure returned by the query is `List<Map<String,String>>`, where each element of the List is a row of data; the key of the Map is the column name and the value is the corresponding column value.
* If the SQL statement returns one row of data, the column names are matched against the OUT parameter names the user defined when defining the task; columns that do not match are discarded.
* If the SQL statement returns multiple rows, the column names are matched against the OUT parameter names of type LIST the user defined when defining the task. All rows of the matching column are converted to `List<String>` as the value of that parameter; columns that do not match are discarded.
### SHELL node
The result of the processor execution is returned as `Map<String,String>`.
The user needs to define `${setValue(key=value)}` in the output when defining the shell script.
When processing the parameter, remove the `${setValue()}` wrapper and split by "=": index 0 is the key and index 1 is the value.
Then match the key against the OUT parameter names the user defined when defining the task, and use the value as the value of that parameter.
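A small sketch of this parsing step (the sample output line and the wrapper-stripping are simplified for illustration):
```java
public class SetValueParseSketch {
    public static void main(String[] args) {
        String line = "${setValue(trade_date=20220812)}"; // illustrative shell task output
        String inner = line.substring("${setValue(".length(), line.length() - ")}".length());
        String[] kv = inner.split("=");
        System.out.println(kv[0]); // key:   trade_date
        System.out.println(kv[1]); // value: 20220812
    }
}
```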
## Return parameter processing
* The result acquired from the Processor is a String.
* Determine whether the processor is empty or not, and exit if it is empty.
* Determine whether the localParam is empty or not, and exit if it is empty.
* Get the parameter of localParam which is OUT, and exit if it is empty.
* Format the String into the format described above (`List<Map<String,String>>` for SQL, `Map<String,String>` for SHELL).
* Assign the parameters with matching values to the varPool (a List, which also contains the original IN parameters)
* Format the varPool as json and pass it to the master.
* After the master receives the varPool, the parameters whose direction is OUT are written into the localParam.

6
docs/docs/en/contribute/backend/mechanism/overview.md

@ -0,0 +1,6 @@
# Overview
<!-- TODO Since the side menu does not support multiple levels, add new page to keep all sub page here -->
* [Global Parameter](global-parameter.md)
* [Switch Task type](task/switch.md)

9
docs/docs/en/contribute/backend/mechanism/task/switch.md

@ -0,0 +1,9 @@
# SWITCH Task development
The switch task works as follows:
* User-defined expressions and branch information are stored in `taskParams` in the `taskdefinition`. When the switch is executed, it is formatted as `SwitchParameters`
* `SwitchTaskExecThread` processes the expressions defined in the `switch` from top to bottom, obtains variable values from `varPool`, and evaluates each expression with `javascript`. If an expression returns true, it stops checking and records that expression's position, here called resultConditionLocation; the work of SwitchTaskExecThread is then done (see the sketch after this list)
* After the `switch` task runs, if there was no error (commonly, a user-defined expression out of specification or a problem with a parameter name), `MasterExecThread.submitPostNode` obtains the downstream nodes of the `DAG` to continue execution.
* If `DagHelper.parsePostNodes` finds that the current node (the one that just finished) is a `switch` node, `resultConditionLocation` is read, and all branches in the SwitchParameters except `resultConditionLocation` are skipped. In this way, only the branches that need to be executed remain
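A rough sketch of the top-to-bottom evaluation described above (the expression syntax and variable wiring are illustrative, and a JavaScript ScriptEngine such as Nashorn must be available on the JDK in use):
```java
import java.util.List;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class SwitchBranchSketch {
    public static void main(String[] args) throws Exception {
        // Branch expressions are checked from top to bottom; the first one returning true wins.
        List<String> expressions = List.of("flag == 0", "flag == 1");
        ScriptEngine js = new ScriptEngineManager().getEngineByName("javascript");
        js.put("flag", 1); // in the real implementation the value comes from varPool
        int resultConditionLocation = -1;
        for (int i = 0; i < expressions.size(); i++) {
            if (Boolean.TRUE.equals(js.eval(expressions.get(i)))) {
                resultConditionLocation = i; // record the branch position and stop checking
                break;
            }
        }
        System.out.println(resultConditionLocation); // 1 -> only that branch is kept downstream
    }
}
```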

108
docs/docs/en/contribute/backend/spi/alert.md

@ -0,0 +1,108 @@
### DolphinScheduler Alert SPI main design
#### DolphinScheduler SPI Design
DolphinScheduler is undergoing a change to a microkernel + plug-in architecture. All core capabilities, such as tasks, resource storage and registration centers, will be designed as extension points. We hope to use SPI to improve DolphinScheduler's flexibility and extensibility.
For alarm-related code, please refer to the `dolphinscheduler-alert-api` module. This module defines the extension interface of the alarm plugin and some base code; when you need to implement a plugin for related functionality, it is recommended to read this code first. Of course, reading the documentation is also recommended and will save a lot of time, but documentation lags to some degree; where it is missing, take the source code as the standard (if you are interested, we also welcome documentation contributions). In addition, we will rarely change the extension interfaces (new additions aside) unless a major structural adjustment forces an incompatible upgrade, so the existing documentation should generally suffice.
We use native Java SPI. When you extend, you actually only need to pay attention to implementing the `org.apache.dolphinscheduler.alert.api.AlertChannelFactory` interface; the underlying logic, such as plugin loading, is already implemented by the kernel, which makes development more focused and simple.
In addition, `AlertChannelFactory` extends `PrioritySPI`, which means you can set the plugin priority. When two plugins have the same name, you can customize the priority by overriding the `getIdentify` method. The higher-priority plugin will be loaded; but if two plugins have the same name and the same priority, the server will throw an `IllegalArgumentException` when loading them.
By the way, we have adopted the excellent front-end component library form-create, which supports generating front-end UI components based on JSON. If plugin development involves the front end, we use JSON to generate the related UI components: the plugin's parameters are encapsulated in `org.apache.dolphinscheduler.spi.params`, which converts all relevant parameters into the corresponding JSON. This means you can draw the front-end components purely in Java code (this mainly concerns forms; we only care about the data exchanged between the front and back ends).
This article mainly focuses on the design and development of Alert.
#### Main Modules
If you don't care about its internal design, but simply want to know how to develop your own alarm plug-in, you can skip this content.
* dolphinscheduler-alert-api
This module is the core module of ALERT SPI. This module defines the interface of the alarm plug-in extension and some basic codes. The extension plug-in must implement the interface defined by this module: `org.apache.dolphinscheduler.alert.api.AlertChannelFactory`
* dolphinscheduler-alert-plugins
This module contains the plugins we currently provide; dozens are supported, such as Email, DingTalk, Script, etc.
#### Alert SPI Main class information.
AlertChannelFactory
The alarm plugin factory interface. All alarm plugins need to implement this interface. It defines the name of the alarm plugin and the required parameters, and its create method builds a concrete alarm plugin instance (see the sketch after this list).
AlertChannel
The interface of the alert plugin. An alert plugin needs to implement this interface. It has only one method, process; the upper-level alert system calls this method and obtains the alert's return information through the AlertResult it returns.
AlertData
Alarm content information, including id, title, content, log.
AlertInfo
Alarm-related information: when the upper-level system calls an alarm plugin instance, an instance of this class is passed to the concrete plugin through the process method. It contains the alert content AlertData and the parameter information filled in on the front end for the alert plugin instance being called.
AlertResult
The return information of the alarm plugin's alert-sending operation.
org.apache.dolphinscheduler.spi.params
This package defines the plugin parameters. Our front end uses the form-create library (http://www.form-create.com), which can dynamically generate the front-end UI based on the parameter-list JSON returned by the plugin definition, so we do not need to care about the front end when doing SPI plugin development.
Under this package we currently only encapsulate RadioParam, TextParam and PasswordParam, which define text-type, radio and password-type parameters, respectively.
AbsPluginParams is the base class of all parameters; RadioParam and the other parameter classes inherit from it. Each DS alert plugin returns a list of AbsPluginParams from its AlertChannelFactory implementation.
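Putting the classes above together, a custom plugin boils down to one factory plus one channel. The following is a minimal, illustrative sketch only; check the exact interface signatures against `dolphinscheduler-alert-api` before relying on them (only name, params, create and process are described above, and the AlertResult construction shown here is schematic):
```java
import java.util.Collections;
import java.util.List;

import org.apache.dolphinscheduler.alert.api.AlertChannel;
import org.apache.dolphinscheduler.alert.api.AlertChannelFactory;
import org.apache.dolphinscheduler.alert.api.AlertInfo;
import org.apache.dolphinscheduler.alert.api.AlertResult;
import org.apache.dolphinscheduler.spi.params.base.PluginParams;

// Hypothetical "log only" alert plugin, for illustration.
public final class LogAlertChannelFactory implements AlertChannelFactory {

    @Override
    public String name() {
        return "LogAlert"; // the plugin name shown to users
    }

    @Override
    public List<PluginParams> params() {
        return Collections.emptyList(); // parameters rendered by form-create on the front end
    }

    @Override
    public AlertChannel create() {
        return new LogAlertChannel();
    }

    private static final class LogAlertChannel implements AlertChannel {
        @Override
        public AlertResult process(AlertInfo info) {
            // A real plugin would send the alert here, using the AlertData carried in AlertInfo
            // plus the front-end parameters; this sketch just reports success.
            return new AlertResult("true", "alert written to log");
        }
    }
}
```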
The specific design of alert_spi can be seen in the issue: [Alert Plugin Design](https://github.com/apache/incubator-dolphinscheduler/issues/3049)
#### Alert SPI built-in implementation
* Email
Email alert notification
* DingTalk
Alert for DingTalk group chat bots
Related parameter configuration can refer to the DingTalk robot document.
* EnterpriseWeChat
EnterpriseWeChat alert notifications
Related parameter configuration can refer to the EnterpriseWeChat robot document.
* Script
We have implemented a shell script for alerting. We will pass the relevant alert parameters to the script and you can implement your alert logic in the shell. This is a good way to interface with internal alerting applications.
* SMS
SMS alerts
* FeiShu
FeiShu alert notification
* Slack
Slack alert notification
* PagerDuty
PagerDuty alert notification
* WebexTeams
WebexTeams alert notification
Related parameter configuration can refer to the WebexTeams document.
* Telegram
Telegram alert notification
Related parameter configuration can refer to the Telegram document.
* Http
We have implemented an Http plugin for alerting. Since calling most alerting services ultimately amounts to an Http request, if we do not support your alerting system yet, you can use Http to implement your alert logic. You are also welcome to contribute your common plugins to the community :)

25
docs/docs/en/contribute/backend/spi/datasource.md

@ -0,0 +1,25 @@
## DolphinScheduler Datasource SPI main design
#### How do I use data sources?
The data source center supports POSTGRESQL, HIVE/IMPALA, SPARK, CLICKHOUSE, SQLSERVER data sources by default.
If you are using a MySQL or ORACLE data source, you need to place the corresponding driver package in the lib directory.
#### How to do Datasource plugin development?
org.apache.dolphinscheduler.spi.datasource.DataSourceChannel
org.apache.dolphinscheduler.spi.datasource.DataSourceChannelFactory
org.apache.dolphinscheduler.plugin.datasource.api.client.CommonDataSourceClient
1. First, implement the above interfaces in the data source plugin and inherit the general client. For details, refer to the implementation of data source plugins such as sqlserver and mysql; all RDBMS plugins are added the same way.
2. Add the driver configuration to the data source plugin's pom.xml.
We provide APIs for external access to all data sources in the DolphinScheduler datasource API module.
In addition, `DataSourceChannelFactory` extends `PrioritySPI`, which means you can set the plugin priority. When two plugins have the same name, you can customize the priority by overriding the `getIdentify` method. The higher-priority plugin will be loaded; but if two plugins have the same name and the same priority, the server will throw an `IllegalArgumentException` when loading them.
#### **Future plan**
Support data sources such as kafka, http, files, sparkSQL, FlinkSQL, etc.

28
docs/docs/en/contribute/backend/spi/registry.md

@ -0,0 +1,28 @@
### DolphinScheduler Registry SPI Extension
#### How to use?
Make the following configuration (taking ZooKeeper as an example):
* Registry plug-in configuration, take Zookeeper as an example (registry.properties)
dolphinscheduler-service/src/main/resources/registry.properties
```registry.properties
registry.plugin.name=zookeeper
registry.servers=127.0.0.1:2181
```
For specific configuration information, please refer to the parameter information provided by the specific plug-in, for example zk: `org/apache/dolphinscheduler/plugin/registry/zookeeper/ZookeeperConfiguration.java`
All configuration keys need the `registry.` prefix. For example, base.sleep.time.ms should be configured as registry.base.sleep.time.ms=100, as in the example below.
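For example, tuning the ZooKeeper client extends the snippet above like this:
```registry.properties
registry.plugin.name=zookeeper
registry.servers=127.0.0.1:2181
registry.base.sleep.time.ms=100
```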
#### How to expand
`dolphinscheduler-registry-api` defines the standard for implementing plugins. When you need to extend plugins, you only need to implement `org.apache.dolphinscheduler.registry.api.RegistryFactory`.
Under the `dolphinscheduler-registry-plugin` module is the registry plugin we currently provide.
#### FAQ
1: registry connection timeout
You can increase the relevant timeout parameters.

Some files were not shown because too many files changed in this diff
