Thursday, April 22, 2021

What is the difference between PySpark and Spark?

I am asking a question very similar to this SO question on pyspark and spark. That answer explains that the pyspark installation already contains Spark. What happens when I install it through Anaconda? And is there another way to run it in PyCharm? My Jupyter notebooks run fine with this setup.

I am very confused about Spark and PySpark, starting right from the installation.

I understand that PySpark is a wrapper for writing scalable Spark scripts in Python. All I did was install it through Anaconda:

    conda install pyspark

After that, I could import it in a script.
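For context, here is a minimal sketch of what that conda-installed package should allow, assuming the answer linked above is right that it bundles its own Spark runtime: a local session starts without any separate Spark download. The app name and sample data are arbitrary, for illustration only.

    # Minimal check that the Spark runtime bundled with pyspark works locally.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")        # run Spark locally on all available cores
        .appName("pyspark-check")  # arbitrary name, illustration only
        .getOrCreate()
    )

    # A tiny DataFrame confirms the session works end to end.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
    df.show()

    print(spark.version)  # the Spark version bundled with the pyspark package
    spark.stop()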

But when I try to run scripts through PyCharm, this warning comes up and the code just sits there; it is not stopped, but nothing happens either.

Missing Python executable 'C:\Users\user\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Python 3.9', defaulting to 'C:\Users\user\AppData\Local\Programs\Python\Python39\Lib\site-packages\pyspark\bin\..' for SPARK_HOME environment variable. Please install Python or specify the correct Python executable in PYSPARK_DRIVER_PYTHON or PYSPARK_PYTHON environment variable to detect SPARK_HOME safely.

It clearly says that these environment variables need to be set.
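One way to address this, a sketch rather than a definitive fix, is to point both variables at the interpreter that is actually running the script before the session is created. Using sys.executable avoids hard-coding a path like the one in the warning:

    import os
    import sys

    # Point both variables at the interpreter running this script, so
    # pyspark does not have to guess (the guess produced the warning above).
    os.environ["PYSPARK_PYTHON"] = sys.executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()

The same two variables can also be set in PyCharm's run configuration (the Environment variables field) instead of in code.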

There are a lot of resources on installing Spark, and I went through many of them and followed one.

I just don't understand how all of this fits together. This may be a very trivial question, but I am feeling helpless.

Thanks.

https://stackoverflow.com/questions/67222999/what-is-the-difference-between-pyspark-and-spark April 23, 2021 at 09:56AM
