Tuesday, June 9, 2020

Send email using an SMTP host

Code snippet to send email via an SMTP host - run in a Unix terminal

{ sleep 5; 
  echo 'mail from:kiran@abc.com'; 
  sleep 3; 
  echo 'rcpt to:kiran.sumanam@abc.com'; 
  sleep 3; 
  echo 'data'; 
  sleep 3; 
  echo 'Subject: File Pull'; 
  sleep 2; 
  echo 'File Pull Completed'; 
  sleep 2; 
  echo '.'; 
  sleep 3; 
  echo 'quit'; 
} | telnet 10.2.3.4 25
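The same exchange can be sketched with Python's standard smtplib, which waits for each server response instead of relying on sleeps (the host and addresses are the placeholders from the snippet above):

```python
import smtplib
from email.message import EmailMessage

def build_notification() -> EmailMessage:
    # Compose the same message the telnet session types by hand
    msg = EmailMessage()
    msg["From"] = "kiran@abc.com"
    msg["To"] = "kiran.sumanam@abc.com"
    msg["Subject"] = "File Pull"
    msg.set_content("File Pull Completed")
    return msg

def send_notification(host: str = "10.2.3.4", port: int = 25) -> None:
    # smtplib handles HELO / MAIL FROM / RCPT TO / DATA and checks each reply code
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.send_message(build_notification())

if __name__ == "__main__":
    print(build_notification())  # nothing is sent unless send_notification() is called
```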

Connect to SFTP Server - Python Solution

SFTP Connection Notes:

  • The code below connects to an SFTP server. 
  • It needs a private key, which must be in PEM format. 
  • If the key is in PPK format, convert it to PEM using the PuTTY tool (puttygen).
  • The private key has a passphrase associated with it. 
  • In some cases, two-level authentication with a password plus the private key (and its passphrase) may be required. 
  • It was observed that, even though no password is given, the Python code authenticates successfully using just the private key and passphrase.
  • With WinSCP, the two-level authentication succeeded only when both the passphrase and the password were given. Why the Python script is able to make a successful connection with only the private key and passphrase is a surprise and is still to be researched.

Windows based code:
import pysftp as sftp

cnopts = sftp.CnOpts()
cnopts.hostkeys = None  # skip host key verification (not recommended for production)

myHostname = "secureftp.abc.com"
myUsername = "username"
myPassword = "password"
myPort = 22
myPriKey = r"C:\test\Privatekey.pem"  # raw string so \t is not read as a tab
myPriKeyPass = "passphrase"

s = sftp.Connection(host=myHostname, username=myUsername, port=myPort,
                    private_key=myPriKey, private_key_pass=myPriKeyPass, cnopts=cnopts)

remotedir = "./remote_dir_name/"
localdir = "C:\\Users\\Test\\Downloads\\"
# listdir does not expand wildcards, so filter the entries instead
files = [f for f in s.listdir(remotedir) if f.endswith("csv")]
for a in files:
    print(a)
    s.get(remotedir + a, localdir + a)
s.close()

Unix based code:
#Code to pull the files from zobble sftp to edgenode local directory
import datetime
def dt_tm():
    dttm=datetime.datetime.now().strftime("%x %X")
    return dttm

import pysftp as sftp
cnopts = sftp.CnOpts()
cnopts.hostkeys = None

myHostname = "secureftp.abc.com"
myUsername = "username"
myPassword = "password"
myPort = 22
myPriKey="/home/tes/keys/privatekey.pem"
myPriKeyPass="passphrase"

s = sftp.Connection(host=myHostname, username=myUsername, port=myPort, private_key=myPriKey, private_key_pass=myPriKeyPass,cnopts=cnopts)

remotedir="./remote_dir_name/"
myLocalDir="/home/AZFRK/ksumanam/indiasales/ZobbleFiles/"
# listdir does not expand wildcards, so filter the entries instead
files = [f for f in s.listdir(remotedir) if f.endswith("csv")]
i = 0
for a in files:
    print(a)
    i += 1
    s.get(remotedir + a, myLocalDir + a)
print(dt_tm(), 'copied files from sftp server. Number of files copied :',i)

s.close()

Wednesday, May 20, 2020

Connect to SFTP server - Scala Solution


spark-shell --packages com.springml:spark-sftp_2.11:1.1.3

import org.apache.commons.net.ftp.FTPClient
import org.apache.commons.net.ftp.FTPReply
import org.apache.commons.net.ftp.FTPSClient
import com.jcraft.jsch._

val jsch = new JSch()
var session = jsch.getSession("test","securetransfertest.com",22)
session.setPassword("test123")

var config = new java.util.Properties()
config.put("StrictHostKeyChecking", "no")
session.setConfig(config)

session.connect()

var channel = session.openChannel("sftp").asInstanceOf[ChannelSftp]
channel.connect()
println("Directory:" + channel.pwd())  // This prints the present working directory

session.disconnect()

//jsch.addIdentity(privatekeyfile)  // call before session.connect() to authenticate with a private key instead of a password

Further research needs to be done on putting/getting files from the SFTP server (e.g. via channel.put and channel.get).

Monday, May 11, 2020

Connect to Hive with Kerberos Set up from a Local PC


The approach below can be used to connect to a Hive instance with Kerberos authentication enabled, using tools like DbVisualizer, DBeaver, etc.

Check Java Version:-

C:\Program Files\DbVisualizer\jre\bin>.\java.exe -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode)

Use "kinit" to generate a Kerberos ticket

C:\Program Files\DbVisualizer\jre\bin>.\kinit ksumanam@ABC.COM
Password for ksumanam@ABC.COM:
New ticket is stored in cache file C:\Users\ksumanam\krb5cc_ksumanam

Use "klist" to check the ticket validity
C:\Program Files\DbVisualizer\jre\bin>klist
Credentials cache: C:\Users\ksumanam\krb5cc_ksumanam
Default principal: ksumanam@ABC.COM, 1 entry found.

[1]  Service Principal:  krbtgt/ABC.COM@ABC.COM
     Valid starting:     May 11, 2020 09:53:43
     Expires:            May 11, 2020 19:53:43

If using DbVisualizer, add the below (for other tools, add these settings in the relevant .ini files).

Click Tools - Tool Properties and, in the box "Specify overridden Java VM properties here", add:
-Dsun.security.krb5.debug=true
-Djavax.security.auth.useSubjectCredsOnly=false
-Djava.security.krb5.conf=C:\Users\ksumanam\krb5.ini

Sample krb5.ini file, referenced by the last property above (this is the same krb5.conf file copied from Unix)

[libdefaults]
 renew_lifetime = 7d
 forwardable = true
 default_realm = ABC.COM
 ticket_lifetime = 24h
 dns_lookup_realm = false
 dns_lookup_kdc = false
 udp_preference_limit = 1
#default_tgs_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5
#default_tkt_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5

[domain_realm]
abc.com = ABC.COM
.abc.com = ABC.COM

[logging]
default = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
kdc = FILE:/var/log/krb5kdc.log

[realms]
ABC.COM = {
  admin_server = abc.com
  kdc = abc.com
}

Create a new connection in DbVisualizer (one might need the below details)
Database server = hn1-ahd701.abc.com
Database Port = 10001
Database = default;principal=hive/_HOST@ABC.COM; transportMode=http

JDBC String:
jdbc:hive2://hn1-ahd703.abc.com:10001/dl_explr_india;principal=hive/_HOST@ABC.COM;transportMode=http

Example beeline jdbc URL
beeline -u 'jdbc:hive2://hn1-ahd701.abc.com:10001/default;principal=hive/_HOST@ABC.COM;auth=kerberos;transportMode=http;httpPath=cliservice'


Thursday, February 6, 2020

Convert xlsx to csv - Python Solutions

csvkit tool

csvkit is a Python library optimized for working with CSV files. It is a nice tool to manipulate, organize, and analyze data in CSV format, and it is light and fast. It is used from the terminal through its in2csv command, which converts a variety of common file formats, including xls, xlsx, and fixed-width, into CSV.

# pip install --trusted-host pypi.python.org --trusted-host files.pythonhosted.org --trusted-host pypi.org csvkit

Collecting csvkit
 Using cached csvkit-1.0.2.tar.gz
Collecting agate>=1.6.0 (from csvkit)
Now you can convert as below:

# in2csv Classeur2.xlsx > book3.csv

Shell script to convert all Excel files in a directory

for i in *.xlsx;
 do
  filename=$(basename "$i" .xlsx | tr '[:upper:]' '[:lower:]')
  outext=".csv"
  in2csv "$i" > "$filename$outext"
  if [ $? -ne 0 ]
   then echo "$i"  # report files that failed to convert
  fi
done
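A rough Python equivalent of the loop above, assuming in2csv is on the PATH (target_name is a hypothetical helper that mirrors the basename/tr step):

```python
import subprocess
from pathlib import Path

def target_name(xlsx_name: str) -> str:
    # mirror `basename $i .xlsx | tr '[:upper:]' '[:lower:]'`, then append .csv
    return Path(xlsx_name).stem.lower() + ".csv"

def convert_all(directory: str = ".") -> None:
    for src in Path(directory).glob("*.xlsx"):
        result = subprocess.run(["in2csv", str(src)], capture_output=True, text=True)
        if result.returncode != 0:
            print(src)  # report files that failed to convert
        else:
            (Path(directory) / target_name(src.name)).write_text(result.stdout)

if __name__ == "__main__":
    convert_all()
```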


XLRD

Note: xlrd 2.0+ dropped .xlsx support, so for .xlsx files pin an older xlrd (<2.0) or use openpyxl.


pip install xlrd --trusted-host pypi.python.org --trusted-host files.pythonhosted.org --trusted-host pypi.org

import xlrd
import csv

def csv_from_excel():
    wb = xlrd.open_workbook('excel.xlsx')
    sh = wb.sheet_by_name('Sheet1')
    # newline='' prevents blank lines between rows on Windows
    with open('your_csv_file.csv', 'w', newline='') as your_csv_file:
        wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
        for rownum in range(sh.nrows):
            wr.writerow(sh.row_values(rownum))

# run the csv_from_excel function:
csv_from_excel()


xlsx2csv 

# pip install xlsx2csv --trusted-host pypi.python.org --trusted-host files.pythonhosted.org --trusted-host pypi.org
from xlsx2csv import Xlsx2csv
# note: the class name is Xlsx2csv (capital X)
Xlsx2csv(r"C:\Users\ksumanam\Downloads\AMS_AssesmentQOptions\UserDetails.xlsx", outputencoding="utf-8").convert(r"C:\Users\ksumanam\Downloads\AMS_AssesmentQOptions\UserDetails1.csv")

Wednesday, August 14, 2019

Connect to Oracle DB Using Spark/Scala

Form a jdbc connection string
val jdbcHostname = "10.21.31.41"
val jdbcPort = 1521
val jdbcDatabase = "dbname" 
val jdbcUsername = "user1"
val jdbcPassword = "password1"
val driverClass = "oracle.jdbc.driver.OracleDriver"
val jdbcUrl = s"jdbc:oracle:thin:@$jdbcHostname:$jdbcPort/$jdbcDatabase"

Set Up connection properties
import java.util.Properties
val connectionProperties = new Properties()
connectionProperties.put("user", s"$jdbcUsername")
connectionProperties.put("password", s"$jdbcPassword")
connectionProperties.setProperty("driver", driverClass)
connectionProperties.put("fetchsize", "100000")

Form a query and create a spark data frame
val query = "(select * from table1)"
val df = spark.read.jdbc(jdbcUrl, query, connectionProperties)

Note: 


  1. Ensure that the Oracle JDBC jar is included in the spark-shell or spark-submit command so that the driver class can be loaded to make the connection.
  2. More details on the fetchsize parameter, which improves data-retrieval performance: https://docs.oracle.com/cd/A87860_01/doc/java.817/a83724/resltse5.htm

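For reference, a minimal PySpark sketch of the same read (the hostname, credentials, and query are the placeholder values from the Scala snippet above; the Spark part only runs when the script is executed directly):

```python
# Placeholder connection details, copied from the Scala snippet
jdbc_hostname = "10.21.31.41"
jdbc_port = 1521
jdbc_database = "dbname"
jdbc_url = f"jdbc:oracle:thin:@{jdbc_hostname}:{jdbc_port}/{jdbc_database}"

connection_properties = {
    "user": "user1",
    "password": "password1",
    "driver": "oracle.jdbc.driver.OracleDriver",
    "fetchsize": "100000",  # larger fetch size improves retrieval performance
}

if __name__ == "__main__":
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.jdbc(url=jdbc_url,
                         table="(select * from table1) t",
                         properties=connection_properties)
    df.show()
```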

Tuesday, August 13, 2019

Use JCEKS - Avoid Clear text passwords

The code snippets below create a JCEKS credential store where passwords can be saved; the credential store can then be referenced in Sqoop jobs and Hadoop/Spark code instead of clear-text passwords.

Create a Credential store

:~$ hadoop credential create user1.dbname.alias -value password1 -provider jceks:///expl/dl_explr/lib/dbname.jceks
user1.dbname.alias has been successfully created.
org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.

:~$ hadoop credential create user2.dbname.alias -value password2 -provider jceks:///expl/dl_explr/lib/dbname.jceks

user2.dbname.alias has been successfully created.
org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.

List entries in the credential store. It will display the alias names (as opposed to the actual passwords).

:~$ hadoop credential list -provider jceks:///expl/dl_explr/lib/dbname.jceks
Listing aliases for CredentialProvider: jceks:///expl/dl_explr/lib/dbname.jceks
user2.dbname.alias
user1.dbname.alias

One can delete entries from the credential store.

:~$ hadoop credential delete user1.dbname.alias -provider jceks:///expl/dl_explr/lib/dbname.jceks
You are about to DELETE the credential user1.dbname.alias from CredentialProvider jceks:///expl/dl_explr/lib/dbname.jceks. Continue?  (Y or N) Y
Deleting credential: user1.dbname.alias from CredentialProvider: jceks:///expl/dl_explr/lib/dbname.jceks
user1.dbname.alias has been successfully deleted.
org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.

Read Values from credential store

scala> import org.apache.hadoop.security.alias.CredentialProviderFactory
import org.apache.hadoop.security.alias.CredentialProviderFactory

scala> val conf = new org.apache.hadoop.conf.Configuration()

conf: org.apache.hadoop.conf.Configuration = Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml

scala> val alias = "user1.dbname.alias"

alias: String = user1.dbname.alias

scala> val jceksPath = "jceks:///expl/dl_explr/lib/dbname.jceks"

jceksPath: String = jceks:///expl/dl_explr/lib/dbname.jceks

scala> conf.set(CredentialProviderFactory.CREDENTIAL_PROVIDER_PATH, jceksPath)


//getPassword Returns Array[Char]

scala> val password = conf.getPassword(alias).mkString
password: String = password1