Tuesday, December 22, 2020

Number of Parallel Tasks in Spark Streaming and Kafka Integration

I am very new to Spark Streaming and have some basic doubts. Can someone please help me clarify the following:

  1. My message size is standard: 1 KB per message.

  2. The topic has 30 partitions, and I am using the DStream approach to consume messages from Kafka.

  3. Cores given to the Spark job (see the sketch after this list):

    spark.cores.max=6, spark.executor.cores=2

  4. As I understand it, the number of Kafka partitions equals the number of RDD partitions. In the DStream approach:

     dstream.foreachRDD(rdd -> {
         rdd.foreachPartition(records -> { /* process records */ });
     });

     **Question**: Will this foreachPartition body execute 30 times per batch, since there are 30 Kafka partitions? (See the runnable sketch after this list.)

  5. Also, since I have given the job 6 cores, how many partitions will be consumed from Kafka in parallel?

    **Questions**: Is it 6 partitions at a time, or 30/6 = 5 partitions at a time? Can someone give a little more detail on how exactly this works in the DStream approach? (See the note after this list.)
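For reference, here is a minimal, self-contained sketch of the setup in points 3 and 4. It assumes the spark-streaming-kafka-0-10 direct-stream API; the broker address, group id, topic name, and batch interval are placeholders rather than values from the question:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaDStreamParallelism {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf()
                .setAppName("kafka-dstream-parallelism")
                .set("spark.cores.max", "6")        // total cores for the app
                .set("spark.executor.cores", "2");  // per executor -> 3 executors

            JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092"); // placeholder
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "demo-group");            // placeholder
            kafkaParams.put("auto.offset.reset", "latest");

            // Direct stream: one Kafka partition maps to one RDD partition, so
            // a 30-partition topic yields RDDs with 30 partitions per batch.
            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("my-topic"), kafkaParams));

            stream.foreachRDD(rdd -> {
                // Spark launches one task per RDD partition, so this body is
                // entered 30 times per batch -- though not all at once.
                rdd.foreachPartition(records -> {
                    while (records.hasNext()) {
                        ConsumerRecord<String, String> record = records.next();
                        // process record.value() here
                    }
                });
            });

            jssc.start();
            jssc.awaitTermination();
        }
    }

Offset handling and error handling are omitted; the sketch only illustrates how partitions map to tasks.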
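On point 5, the relevant scheduling facts are: Spark creates one task per RDD partition (30 tasks per batch here), and it runs as many tasks concurrently as it has task slots (executors × executor cores = 3 × 2 = 6 here), so a batch completes in roughly 30 / 6 = 5 waves of 6 partitions each. A quick way to confirm the partition count empirically, reusing the `stream` variable from the sketch above:

    stream.foreachRDD(rdd ->
        // With a 30-partition topic this prints 30 for every batch; the
        // scheduler then runs those 30 tasks six at a time on 6 cores.
        System.out.println("RDD partitions: " + rdd.getNumPartitions()));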

https://stackoverflow.com/questions/65418503/number-of-parallel-task-in-spark-streaming-and-kafka-integration December 23, 2020 at 11:05AM
