I'm trying to define a function that returns the cartesian product of a given RDD with itself; however, I need to filter out the elements whose two components are the same pair.
For example: taking the cartesian product of rdd below and filtering out the results ((1,0),(1,0)), ((2,0),(2,0)) and ((3,0),(3,0)).

    rdd = sc.parallelize([(1,0), (2,0), (3,0)])

    def get_cart(rdd):
        a = sorted(rdd.cartesian(rdd).collect())
        aRDD = sc.parallelize(a)
        return aRDD

I'm expecting to get the output:

    [((1, 0), (2, 0)), ((1, 0), (3, 0)), ((2, 0), (1, 0)), ((2, 0), (3, 0)), ((3, 0), (1, 0)), ((3, 0), (2, 0))]

Instead I'm getting:

    [((1, 0), (1, 0)),
     ((1, 0), (2, 0)),
     ((1, 0), (3, 0)),
     ((2, 0), (1, 0)),
     ((2, 0), (2, 0)),
     ((2, 0), (3, 0)),
     ((3, 0), (1, 0)),
     ((3, 0), (2, 0)),
     ((3, 0), (3, 0))]
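
One possible way to drop the self-pairs is to apply a filter to the cartesian product before anything is collected. The sketch below is only an illustration under a few assumptions: `is_distinct_pair` is a hypothetical helper name (not part of PySpark), `rdd` is assumed to be an RDD created from an existing SparkContext `sc`, and the comparison is by value, so duplicate elements in the input would also be treated as self-pairs. The last lines check the same predicate on a plain Python list, which needs no Spark installation.

```python
from itertools import product


def is_distinct_pair(pair):
    """Return True when the two components of a product pair differ."""
    left, right = pair
    return left != right


def get_cart(rdd):
    """Cartesian product of rdd with itself, without (x, x) pairs.

    Assumes rdd is a PySpark RDD; the result stays an RDD because
    nothing is collected to the driver here.
    """
    return rdd.cartesian(rdd).filter(is_distinct_pair)


# Plain-Python check of the same predicate on the sample data.
data = [(1, 0), (2, 0), (3, 0)]
local_result = sorted(p for p in product(data, repeat=2) if is_distinct_pair(p))
print(local_result)
```

With a running SparkContext, something like `get_cart(rdd).sortBy(lambda p: p).collect()` should give the pairs in sorted order without the collect-and-re-parallelize round trip.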