Wednesday, May 5, 2021

Setup of Scala/Flink project using Bazel

I am trying to set up a simple Flink application from scratch using Bazel. I've bootstrapped the project by running

sbt new tillrohrmann/flink-project.g8  

and after that I added some files so that Bazel takes over the build (i.e., to migrate from sbt). This is what the WORKSPACE looks like:

# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

skylib_version = "1.0.3"
http_archive(
    name = "bazel_skylib",
    sha256 = "1c531376ac7e5a180e0237938a2536de0c54d93f5c278634818e0efc952dd56c",
    type = "tar.gz",
    url = "https://mirror.bazel.build/github.com/bazelbuild/bazel-skylib/releases/download/{}/bazel-skylib-{}.tar.gz".format(skylib_version, skylib_version),
)

rules_scala_version = "5df8033f752be64fbe2cedfd1bdbad56e2033b15"

http_archive(
    name = "io_bazel_rules_scala",
    sha256 = "b7fa29db72408a972e6b6685d1bc17465b3108b620cb56d9b1700cf6f70f624a",
    strip_prefix = "rules_scala-%s" % rules_scala_version,
    type = "zip",
    url = "https://github.com/bazelbuild/rules_scala/archive/%s.zip" % rules_scala_version,
)

# Stores Scala version and other configuration
# 2.12 is a default version, other versions can be used by passing them explicitly:
load("@io_bazel_rules_scala//:scala_config.bzl", "scala_config")
scala_config(scala_version = "2.12.11")

load("@io_bazel_rules_scala//scala:scala.bzl", "scala_repositories")
scala_repositories()

load("@io_bazel_rules_scala//scala:toolchains.bzl", "scala_register_toolchains")
scala_register_toolchains()

load("@io_bazel_rules_scala//scala:scala.bzl", "scala_library", "scala_binary", "scala_test")

# optional: setup ScalaTest toolchain and dependencies
load("@io_bazel_rules_scala//testing:scalatest.bzl", "scalatest_repositories", "scalatest_toolchain")
scalatest_repositories()
scalatest_toolchain()

load("//vendor:workspace.bzl", "maven_dependencies")
maven_dependencies()

load("//vendor:target_file.bzl", "build_external_workspace")
build_external_workspace(name = "vendor")

and this is the BUILD file

package(default_visibility = ["//visibility:public"])

load("@io_bazel_rules_scala//scala:scala.bzl", "scala_library", "scala_test")

scala_library(
    name = "job",
    srcs = glob(["src/main/scala/**/*.scala"]),
    deps = [
        "@vendor//vendor/org/apache/flink:flink_clients",
        "@vendor//vendor/org/apache/flink:flink_scala",
        "@vendor//vendor/org/apache/flink:flink_streaming_scala",
    ]
)

I'm using bazel-deps to vendor the dependencies (placed in the vendor folder). I have this in my dependencies.yaml file:

options:
  buildHeader: [
      "load(\"@io_bazel_rules_scala//scala:scala_import.bzl\", \"scala_import\")",
      "load(\"@io_bazel_rules_scala//scala:scala.bzl\", \"scala_library\", \"scala_binary\", \"scala_test\")",
  ]
  languages: [ "java", "scala:2.12.11" ]
  resolverType: "coursier"
  thirdPartyDirectory: "vendor"
  resolvers:
    - id: "mavencentral"
      type: "default"
      url: https://repo.maven.apache.org/maven2/
  strictVisibility: true
  transitivity: runtime_deps
  versionConflictPolicy: highest

dependencies:
  org.apache.flink:
    flink:
      lang: scala
      version: "1.11.2"
      modules: [clients, scala, streaming-scala] # provided
    flink-connector-kafka:
      lang: java
      version: "0.10.2"
    flink-test-utils:
      lang: java
      version: "0.10.2"

For downloading the dependencies, I'm running

bazel run //:parse generate -- --repo-root ~/Projects/bazel-flink-scala --sha-file vendor/workspace.bzl --target-file vendor/target_file.bzl --deps dependencies.yaml  

This runs just fine, but when I then try to build the project

bazel build //:job  

I'm getting this error

Starting local Bazel server and connecting to it...
ERROR: Traceback (most recent call last):
    File "/Users/salvalcantara/Projects/me/bazel-flink-scala/WORKSPACE", line 44, column 25, in <toplevel>
        build_external_workspace(name = "vendor")
    File "/Users/salvalcantara/Projects/me/bazel-flink-scala/vendor/target_file.bzl", line 258, column 91, in build_external_workspace
        return build_external_workspace_from_opts(name = name, target_configs = list_target_data(), separator = list_target_data_separator(), build_header = build_header())
    File "/Users/salvalcantara/Projects/me/bazel-flink-scala/vendor/target_file.bzl", line 251, column 40, in list_target_data
        "vendor/org/apache/flink:flink_clients": ["lang||||||scala:2.12.11","name||||||//vendor/org/apache/flink:flink_clients","visibility||||||//visibility:public","kind||||||import","deps|||L|||","jars|||L|||//external:jar/org/apache/flink/flink_clients_2_12","sources|||L|||","exports|||L|||","runtimeDeps|||L|||//vendor/commons_cli:commons_cli|||//vendor/org/slf4j:slf4j_api|||//vendor/org/apache/flink:force_shading|||//vendor/com/google/code/findbugs:jsr305|||//vendor/org/apache/flink:flink_streaming_java_2_12|||//vendor/org/apache/flink:flink_core|||//vendor/org/apache/flink:flink_java|||//vendor/org/apache/flink:flink_runtime_2_12|||//vendor/org/apache/flink:flink_optimizer_2_12","processorClasses|||L|||","generatesApi|||B|||false","licenses|||L|||","generateNeverlink|||B|||false"],
Error: dictionary expression has duplicate key: "vendor/org/apache/flink:flink_clients"
ERROR: error loading package 'external': Package 'external' contains errors
INFO: Elapsed time: 3.644s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
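Bazel stops at the first duplicate key, so it is not obvious from the error whether flink_clients is the only repeated entry. To list every repeated key in the generated file, a small script along these lines might help. It is only a sketch: the function names duplicate_keys and duplicate_keys_in_file are made up for illustration, and it assumes each entry in vendor/target_file.bzl starts a line with a quoted key followed by ": [", as in the line quoted in the traceback.

```python
import re
from collections import Counter

# Assumption: each dict entry in the generated file starts a line with a
# quoted key followed by ": [", like the entry shown in the traceback.
KEY_PATTERN = re.compile(r'^\s*"([^"]+)"\s*:\s*\[', re.MULTILINE)


def duplicate_keys(bzl_text):
    """Return the keys that appear more than once, in first-seen order."""
    counts = Counter(KEY_PATTERN.findall(bzl_text))
    return [key for key, count in counts.items() if count > 1]


def duplicate_keys_in_file(path):
    """Read a generated .bzl file and return its duplicated dict keys."""
    with open(path, encoding="utf-8") as handle:
        return duplicate_keys(handle.read())
```

Calling duplicate_keys_in_file("vendor/target_file.bzl") from the repo root should then return the keys that bazel-deps emitted more than once, which narrows down which entries in dependencies.yaml collide.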

Why is that? Can anyone help? It would be great to have detailed instructions and project templates for Flink/Scala applications using Bazel. I've put everything together in the following repo: https://github.com/salvalcantara/bazel-flink-scala. Feel free to send a PR.

https://stackoverflow.com/questions/67331792/setup-of-scala-flink-project-using-bazel April 30, 2021 at 05:51PM
