Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point at the JSON credential (service account key) file, as shown below.
export GOOGLE_APPLICATION_CREDENTIALS="<path>/bigquery-auth.json"
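Before launching Spark, it is worth confirming the variable actually points at a readable file. A minimal sketch (the /tmp path and the touch are only stand-ins for your real key file):

```shell
# Stand-in for the real key file path (hypothetical; use your own path).
export GOOGLE_APPLICATION_CREDENTIALS="/tmp/bigquery-auth.json"
touch "$GOOGLE_APPLICATION_CREDENTIALS"   # stand-in; your real key file already exists

# Sanity check: the variable is set and the file is readable.
if [ -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
  echo "credentials file found"
else
  echo "credentials file missing" >&2
fi
```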
The latest version of the connector jar (spark-bigquery-latest.jar) is available in Google Cloud Storage. Download it and include it when launching the Spark shell.
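As a sketch, the jar can be fetched from the public spark-lib bucket (the bucket path is an assumption based on the connector's README; verify it against the current release, and note the download line is commented out since it needs network access):

```shell
# Assumed public GCS location of the connector jar (check the
# connector README for the current release path).
JAR_URL="https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-latest.jar"

# Uncomment to actually download (requires network access):
# curl -sLO "$JAR_URL"

echo "connector jar URL: $JAR_URL"
```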
Launch spark-shell and try a few commands, as shown below.
spark-shell --master local[*] --jars <path>/spark-bigquery-latest.jar
scala> import com.google.cloud.spark.bigquery._
scala> val df = spark.read.bigquery("<project_id>:<dataset_id>.<table_name>")
scala> df.count
scala> df.write.mode("Overwrite").format("com.databricks.spark.avro").save("adl:///testpath/bigquerydata")
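The same session can also be run non-interactively by saving the commands to a script file and passing it to spark-shell with -i. A sketch (the file name bigquery_demo.scala is hypothetical, and the launch line is commented out since it needs a local Spark install plus the connector jar):

```shell
# Write the REPL commands from the session above into a script file.
cat > bigquery_demo.scala <<'EOF'
import com.google.cloud.spark.bigquery._
val df = spark.read.bigquery("<project_id>:<dataset_id>.<table_name>")
println(df.count)
sys.exit(0)
EOF

# Uncomment to run non-interactively (requires Spark and the connector jar):
# spark-shell --master 'local[*]' --jars spark-bigquery-latest.jar -i bigquery_demo.scala

echo "script ready: bigquery_demo.scala"
```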
The link below has a few more details on setting up the connector:
https://github.com/GoogleCloudPlatform/spark-bigquery-connector