I have a PySpark job that reads from a persistent store (HDFS) and creates a Spark DataFrame in memory. I believe this is called caching.
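Roughly, the job does something like this (the HDFS path and view name below are just placeholders, not my actual values):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("nightly-cache-refresh").getOrCreate()

# Read from the persistent store (HDFS) -- placeholder path.
df = spark.read.parquet("hdfs:///data/my_table")

# Keep the DataFrame in memory so later queries in this application
# don't have to go back to HDFS.
df.cache()
df.count()  # force materialization of the cache

# Placeholder name for querying it with Spark SQL within this session.
df.createOrReplaceTempView("my_table_cached")
```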
What I need is this: every night this PySpark job should run and refresh the cache, so that other PySpark scripts can read directly from the cache without going back to the persistent store.
I understand one can use Redis to do this, but what are some other options? Kafka?