Installing PySpark on Colab



1. Install PySpark and Java

!pip install pyspark
!pip install -U -q PyDrive
!apt install openjdk-8-jdk-headless -qq
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
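
Before starting Spark, it can help to confirm that `JAVA_HOME` actually points at the JDK installed above. A minimal sketch, assuming the path used by the `openjdk-8-jdk-headless` package:

```python
import os

# Point JAVA_HOME at the JDK installed above (path assumed from the openjdk-8 apt package)
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"

# Sanity check before starting Spark: the java binary should live under JAVA_HOME
java_bin = os.path.join(os.environ["JAVA_HOME"], "bin", "java")
print("java found:", os.path.exists(java_bin))
```

If this prints `False`, re-run the apt install step and check its output for errors.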

2. Initialize a SparkSession

from pyspark.sql import SparkSession
from pyspark import SparkContext, SparkConf



# configure Spark: pin the web UI to port 4050
conf = SparkConf().set("spark.ui.port", "4050")

# create the context, then the session
sc = SparkContext(conf=conf)
spark = SparkSession.builder.getOrCreate()

3. Use the SparkSession and SparkContext

rdd = sc.parallelize([("LiLei",15,88),("HanMeiMei",16,90),("DaChui",17,60)])
df = rdd.toDF(["name","age","score"])
df.show()