Wednesday, July 10, 2019

Google BigQuery Connection From a Spark-Shell Environment


Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to the JSON credential file of your service account:

export GOOGLE_APPLICATION_CREDENTIALS="<path>/bigquery-auth.json"
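The setting can be sanity-checked before launching Spark. A minimal sketch; the path below is a placeholder for wherever the key file was actually saved:

```shell
# Placeholder path; substitute the directory where the key was downloaded.
export GOOGLE_APPLICATION_CREDENTIALS="/tmp/bigquery-auth.json"

# Confirm the variable the connector will read is populated.
echo "Using credentials: $GOOGLE_APPLICATION_CREDENTIALS"
```

Client libraries that implement Application Default Credentials read this variable at startup, so it must be exported in the same shell session that launches spark-shell.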

The latest version of the connector jar (spark-bigquery-latest.jar) is available in Google Cloud Storage.
Download it and include it in the spark-shell classpath.

Launch the spark-shell with the jar and try a few commands as shown below.

spark-shell --master local[*] --jars <path>/spark-bigquery-latest.jar

scala> import com.google.cloud.spark.bigquery._

scala> val df = spark.read.bigquery("<project_id>:<dataset_id>.<table_name>")

scala> df.count

scala> df.write.mode("overwrite").format("com.databricks.spark.avro").save("adl:///testpath/bigquerydata")
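Beyond a straight count and dump, the connector pushes column projections and simple filters down to BigQuery, so narrowing the DataFrame before acting on it reduces the bytes scanned (and billed). A sketch in the same session, assuming hypothetical columns name and state in the table:

```
scala> // Projection and filter are pushed down to BigQuery by the connector,
scala> // so only the selected columns and matching rows are read.
scala> val slim = df.select("name", "state").where("state = 'CA'")

scala> slim.show(5)
```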


The link below has a few more details about setting up the connection:
https://github.com/GoogleCloudPlatform/spark-bigquery-connector
