Tuesday, December 22, 2020

Number of Parallel Tasks in Spark Streaming and Kafka Integration

I am very new to Spark Streaming and have some basic doubts. Can someone please help me clarify the following:

  1. My message size is standard: 1 KB per message.

  2. The topic has 30 partitions, and I am using the DStream approach to consume messages from Kafka.

  3. Cores given to the Spark job (see the sketch after this list):

    spark.cores.max=6, spark.executor.cores=2

  4. As I understand it, the number of Kafka partitions equals the number of RDD partitions. In the DStream approach:

     dstream.foreachRDD(rdd -> {
         rdd.foreachPartition(records -> { /* process records */ });
     });

     **Question**: Will this foreachPartition body execute 30 times per batch, since there are 30 Kafka partitions? (See the runnable sketch after this list.)

  5. Also, since I have given the job 6 cores, how many partitions will be consumed from Kafka in parallel?

    **Questions**: Is it 6 partitions at a time, or 30/6 = 5 partitions at a time? Can someone give a little more detail on how exactly this works in the DStream approach? (See the note after this list.)
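For reference, here is a minimal, self-contained sketch of the setup in points 3 and 4. It assumes the spark-streaming-kafka-0-10 direct-stream API; the broker address, group id, topic name, and batch interval are placeholders rather than values from the question:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaDStreamParallelism {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf()
                .setAppName("kafka-dstream-parallelism")
                .set("spark.cores.max", "6")        // total cores for the app
                .set("spark.executor.cores", "2");  // per executor -> 3 executors

            JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092"); // placeholder
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "demo-group");            // placeholder
            kafkaParams.put("auto.offset.reset", "latest");

            // Direct stream: one Kafka partition maps to one RDD partition, so
            // a 30-partition topic yields RDDs with 30 partitions per batch.
            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("my-topic"), kafkaParams));

            stream.foreachRDD(rdd -> {
                // Spark launches one task per RDD partition, so this body is
                // entered 30 times per batch -- though not all at once.
                rdd.foreachPartition(records -> {
                    while (records.hasNext()) {
                        ConsumerRecord<String, String> record = records.next();
                        // process record.value() here
                    }
                });
            });

            jssc.start();
            jssc.awaitTermination();
        }
    }

Offset handling and error handling are omitted; the sketch only illustrates how partitions map to tasks.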
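On point 5, the relevant scheduling facts are: Spark creates one task per RDD partition (30 tasks per batch here), and it runs as many tasks concurrently as it has task slots (executors × executor cores = 3 × 2 = 6 here), so a batch completes in roughly 30 / 6 = 5 waves of 6 partitions each. A quick way to confirm the partition count empirically, reusing the `stream` variable from the sketch above:

    stream.foreachRDD(rdd ->
        // With a 30-partition topic this prints 30 for every batch; the
        // scheduler then runs those 30 tasks six at a time on 6 cores.
        System.out.println("RDD partitions: " + rdd.getNumPartitions()));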

https://stackoverflow.com/questions/65418503/number-of-parallel-task-in-spark-streaming-and-kafka-integration December 23, 2020 at 11:05AM
