Wednesday, August 14, 2019

Connect to Oracle DB Using Spark/Scala

Form a JDBC connection string
val jdbcHostname = "10.21.31.41"
val jdbcPort = 1521
val jdbcDatabase = "dbname" 
val jdbcUsername = "user1"
val jdbcPassword = "password1"
val driverClass = "oracle.jdbc.driver.OracleDriver"
val jdbcUrl = s"jdbc:oracle:thin:@$jdbcHostname:$jdbcPort/$jdbcDatabase"
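The URL above uses the host:port/name form, where the last part is read as a service name. Older databases are often addressed by SID instead, which uses a colon before the name. A minimal sketch of both shapes follows; the object and method names are hypothetical and only the URL formats come from the Oracle thin driver.

```scala
// Hypothetical helper, for illustration only.
object OracleUrl {
  // Service-name form: jdbc:oracle:thin:@//host:port/service
  def forService(host: String, port: Int, service: String): String =
    s"jdbc:oracle:thin:@//$host:$port/$service"

  // SID form: jdbc:oracle:thin:@host:port:SID
  def forSid(host: String, port: Int, sid: String): String =
    s"jdbc:oracle:thin:@$host:$port:$sid"
}
```

If the connection fails with a listener error about an unknown service or SID, switching between the two forms is usually the first thing to try.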

Set up connection properties
import java.util.Properties
val connectionProperties = new Properties()
connectionProperties.setProperty("user", jdbcUsername)
connectionProperties.setProperty("password", jdbcPassword)
connectionProperties.setProperty("driver", driverClass)
// Number of rows fetched per round trip to the database
connectionProperties.setProperty("fetchsize", "100000")

Form a query and create a Spark DataFrame
val query = "(select * from table1)"
val df = spark.read.jdbc(jdbcUrl, query, connectionProperties)
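The query above is wrapped in parentheses because the second argument of spark.read.jdbc is a table name, and Spark places that value in the FROM clause of the SQL it generates. Anything valid in a FROM clause works, including a subquery. A small sketch of a wrapper, assuming a hypothetical helper named asSubquery:

```scala
// Hypothetical helper, for illustration only.
object JdbcQuery {
  // Wraps a SELECT so it can stand where a table name is expected.
  // Oracle does not accept the AS keyword before a table alias,
  // so the alias follows the closing parenthesis directly.
  def asSubquery(sql: String, alias: String = "t"): String =
    s"(${sql.trim.stripSuffix(";").trim}) $alias"
}
```

The result would be passed in place of the hand-written string, e.g. spark.read.jdbc(jdbcUrl, JdbcQuery.asSubquery("select * from table1"), connectionProperties). Putting a WHERE clause inside the subquery lets Oracle do the filtering before any rows reach Spark.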

Note:

  1. Ensure that the Oracle JDBC driver jar is passed to spark-shell or spark-submit (for example with the --jars option) so that the driverClass can be loaded to make the connection.
  2. For more details on the fetchsize parameter, which improves data retrieval performance, see https://docs.oracle.com/cd/A87860_01/doc/java.817/a83724/resltse5.htm
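To see why fetchsize matters, it helps to estimate how many round trips a read needs. The Oracle JDBC driver fetches 10 rows per round trip by default, so large reads spend most of their time waiting on the network. The sketch below is a rough estimate only; the function name and the row count are assumptions for illustration.

```scala
// Hypothetical helper, for illustration only.
object FetchSizeEstimate {
  // Approximate number of fetch round trips: ceil(totalRows / fetchSize).
  def roundTrips(totalRows: Long, fetchSize: Int): Long = {
    require(fetchSize > 0, "fetchSize must be positive")
    (totalRows + fetchSize - 1) / fetchSize
  }
}

// For an assumed table of 1,000,000 rows:
val withDefault = FetchSizeEstimate.roundTrips(1000000L, 10)     // 100000
val withTuned   = FetchSizeEstimate.roundTrips(1000000L, 100000) // 10
```

A larger fetchsize means fewer round trips but more memory held per fetch on the executor, so the value is worth tuning against row width rather than simply maximising.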

