0. Install PyCharm and Spark
Download PyCharm: http://www.jetbrains.com/pycharm/
Download Spark: http://spark.apache.org/
PS: the system needs a Java environment before you install PyCharm. Spark also runs on the JVM, so it needs Java as well.
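A quick way to confirm that Java is available before going further is a small check script like the one below. It is only a sketch: it looks for a `java` executable on `PATH` and for a `JAVA_HOME` variable, which are the usual places a Java install shows up; `java_status` is a hypothetical helper name.

```python
import os
import shutil


def java_status():
    """Report where (if anywhere) a Java runtime can be found."""
    return {
        # Full path to the java executable on PATH, or None if it is not found
        "java_on_path": shutil.which("java"),
        # JAVA_HOME may be unset even when java is on PATH
        "JAVA_HOME": os.environ.get("JAVA_HOME"),
    }


if __name__ == "__main__":
    status = java_status()
    if status["java_on_path"] is None and status["JAVA_HOME"] is None:
        print("No Java found: install a JDK first")
    else:
        print(status)
```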
1. Install py4j
$ sudo pip install py4j
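`sudo pip` installs into whichever Python that `pip` belongs to, which is not necessarily the interpreter PyCharm uses for the project. A minimal check, assuming you run it with PyCharm's project interpreter, is to ask that interpreter whether it can find the module (`is_installed` is a hypothetical helper):

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run this with the same interpreter PyCharm is configured to use
    print("py4j installed:", is_installed("py4j"))
```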
2. Configure PyCharm
Set up Run/Debug Configurations as shown in the figure below.
After that, you can run PySpark programs in PyCharm.
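The figure is not reproduced here. A common way to get the same effect (an assumption, not necessarily what the figure shows) is to make the Spark distribution's `python` directory and its bundled py4j zip importable, for example by setting `SPARK_HOME` and `PYTHONPATH` as environment variables in the Run/Debug configuration. The sketch below does the equivalent from inside the script. `pyspark_paths` and `add_pyspark_to_path` are hypothetical helpers, and the `python/lib/py4j-*-src.zip` layout is assumed from Spark's binary distributions.

```python
import glob
import os
import sys


def pyspark_paths(spark_home):
    """Return the entries to put on sys.path so `import pyspark` works.

    Assumes the layout of a Spark binary distribution:
    <spark_home>/python and <spark_home>/python/lib/py4j-*-src.zip
    """
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_pyspark_to_path(spark_home=None):
    """Prepend PySpark's paths to sys.path and export SPARK_HOME."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError("SPARK_HOME is not set and no path was given")
    for p in pyspark_paths(spark_home):
        if p not in sys.path:
            sys.path.insert(0, p)
    os.environ["SPARK_HOME"] = spark_home
    return spark_home
```

Usage: call `add_pyspark_to_path("/path/to/spark")` (a placeholder for your unpacked Spark directory) before `from pyspark import SparkContext`. Since py4j was already installed with pip in step 1, the bundled zip is not strictly needed, but adding it keeps the py4j version matched to that Spark release.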
A quick test:
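    from pyspark import SparkContext

    # Run locally; "README.md" is resolved against the working directory of the Run/Debug configuration
    sc = SparkContext("local", "Simple App")
    logData = sc.textFile("README.md").cache()
    # Count the lines that contain the letters 'a' and 'b'
    numAs = logData.filter(lambda s: 'a' in s).count()
    numBs = logData.filter(lambda s: 'b' in s).count()
    print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
    sc.stop()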
Run result:
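To double-check the counts the Spark program prints, the same computation can be done in plain Python without Spark. This is a sketch for verification only: `count_lines_with` and `report` are hypothetical helpers, and it assumes the file is UTF-8 text. Because the program above counts lines (records from `textFile`), not characters, the plain-Python version also counts lines.

```python
def count_lines_with(path, substring):
    """Count lines of a text file that contain `substring` (plain Python, no Spark)."""
    with open(path, encoding="utf-8") as f:
        # One iteration per line, matching one record per line from sc.textFile
        return sum(1 for line in f if substring in line)


def report(path):
    """Build the same message the Spark test program prints."""
    num_as = count_lines_with(path, "a")
    num_bs = count_lines_with(path, "b")
    return "Lines with a: %i, lines with b: %i" % (num_as, num_bs)
```

Running `print(report("README.md"))` from the same working directory should print the same numbers as the Spark version.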