I am very new to Spark Streaming and have some basic doubts. Can someone please help me clarify the following?
- My message size is standard: 1 KB each message.
- The topic has 30 partitions, and I am using the DStream approach to consume messages from Kafka.
- Cores given to the Spark job: `spark.cores.max=6`, `spark.executor.cores=2`.
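
Concretely, I submit the job like this (the class and jar names are placeholders):

```
spark-submit \
  --conf spark.cores.max=6 \
  --conf spark.executor.cores=2 \
  --class com.example.KafkaDStreamJob \
  kafka-dstream-job.jar
```

With 6 total cores and 2 cores per executor, my understanding is that this gives 3 executors.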
As I understand it, the number of Kafka partitions equals the number of RDD partitions. In the DStream approach:

```java
dstream.foreachRDD(rdd -> {
    rdd.foreachPartition(partition -> {
        // ...
    });
});
```

**Question**: Will this `foreachPartition` closure execute 30 times, since there are 30 Kafka partitions?
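
For context, here is a minimal sketch of my consumer, assuming the spark-streaming-kafka-0-10 integration; the broker address, group id, topic name, and batch interval are placeholders:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaDStreamJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
                .setAppName("kafka-dstream-job")   // placeholder app name
                .set("spark.cores.max", "6")
                .set("spark.executor.cores", "2");

        // 10-second batches; the interval is a placeholder.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker:9092");  // placeholder
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "my-consumer-group");     // placeholder
        kafkaParams.put("auto.offset.reset", "latest");
        kafkaParams.put("enable.auto.commit", false);

        Collection<String> topics = Arrays.asList("my-topic"); // topic with 30 partitions

        JavaInputDStream<ConsumerRecord<String, String>> dstream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        dstream.foreachRDD(rdd -> {
            rdd.foreachPartition(records -> {
                // Runs once per RDD partition of this batch.
                while (records.hasNext()) {
                    ConsumerRecord<String, String> record = records.next();
                    // process the ~1 KB message here
                }
            });
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```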
Also, since I have given the job 6 cores, how many partitions will be consumed from Kafka in parallel?

**Question**: Is it 6 partitions at a time, or 30 / 6 = 5 partitions at a time? Can someone please explain in a little detail how this works in the DStream approach?
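
To check the partition count per batch myself, I added this logging inside the same job (`getNumPartitions` is the standard RDD API, though I am not sure it tells me anything about how many partitions are processed in parallel):

```java
// Added before jssc.start():
dstream.foreachRDD(rdd -> {
    // I expect 30 here if RDD partitions mirror the Kafka partitions.
    System.out.println("RDD partitions in this batch: " + rdd.getNumPartitions());
});
```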