Hive Queries using Spark

Hi guys,

I am working on geolocation.

I am joining two Hive tables and trying to save the result into another Hive table using Spark. It creates the table but does not load any data.

Please refer to the code below:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.hive.HiveContext

// sc is the SparkContext already provided by the shell / notebook
val sqlContext = new HiveContext(sc)
// Import the implicits only after sqlContext is defined, so $"col" resolves against it
import sqlContext.implicits._

val ipsdf1 = sqlContext.sql("select LogID,Date_key,SourceIP from ods.biods_fnc_auditlog where date_key = 20150528 ")
val countriesWithIp1 = sqlContext.sql("select ip_from, ip_to, country_code from breytendb.test3")

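def ipToLongUDF = udf(
  (ip: String) => {
    // Accept dotted IPv4 strings, optionally surrounded by whitespace
    val patternIPv4 = """\s*\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s*""".r
    ip match {
      // The last octet gets weight 256^0, the first gets 256^3.
      // trim first, so that surrounding whitespace does not break toInt.
      case patternIPv4() => ip.trim.split("\\.").reverse.zipWithIndex.map(
        a => a._1.toInt * math.pow(256, a._2).toLong
      ).sum
      // Anything that does not look like IPv4 maps to -1
      case _ => -1L
    }
  }
)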

val df11 = ipsdf1.withColumn("IPinteger",ipToLongUDF($"SourceIP"))
val df21 = countriesWithIp1.withColumn("ipint_from",ipToLongUDF($"ip_from"))
val df31 = df21.withColumn("ipint_to",ipToLongUDF($"ip_to"))

df11.write.mode("overwrite").saveAsTable("mamtadb.temp_table5")

df31.write.mode("overwrite").saveAsTable("mamtadb.temp_table6")


val op = sqlContext.sql("""
  create table mamtadb.temp_table7
  row format delimited fields terminated by ','
  stored as textfile
  as
  select t.Date_key, t.IPinteger, t1.ipint_from, t1.ipint_to, t.LogID, t1.country_code
  from mamtadb.temp_table5 t
  left join mamtadb.temp_table6 t1
  where t.IPinteger >= t1.ipint_from and t.IPinteger <= t1.ipint_to
""")
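
For reference, here is a rough plain-Scala sketch (no Spark needed) of what the UDF and the range condition in the final query are meant to compute for a single row. The object name, the helper names (`ipToLong`, `IpRange`, `countryFor`) and the sample ranges with country codes "AA" and "BB" are made up for illustration and are not part of the tables above. Like the UDF, it does not check that each octet is at most 255.

```scala
// Plain-Scala illustration only; all names and sample data here are hypothetical.
object IpRangeSketch {
  // Dotted IPv4 string, optionally surrounded by whitespace; one group per octet.
  private val Ipv4 = """\s*(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\s*""".r

  // Dotted IPv4 string -> Long, or -1 when the input does not look like IPv4
  // (the same fallback value the UDF above uses).
  def ipToLong(ip: String): Long = ip match {
    case Ipv4(a, b, c, d) =>
      // Shift the running total by one octet (x 256) and add the next octet.
      Seq(a, b, c, d).foldLeft(0L)((acc, octet) => acc * 256 + octet.toLong)
    case _ => -1L
  }

  // One row of a lookup table: an inclusive integer range plus a country code.
  final case class IpRange(from: Long, to: Long, country: String)

  // The per-row check behind the query condition: from <= ip <= to.
  def countryFor(ip: Long, ranges: Seq[IpRange]): Option[String] =
    ranges.find(r => ip >= r.from && ip <= r.to).map(_.country)

  def main(args: Array[String]): Unit = {
    val ranges = Seq(
      IpRange(ipToLong("10.0.0.0"), ipToLong("10.255.255.255"), "AA"),
      IpRange(ipToLong("192.168.0.0"), ipToLong("192.168.255.255"), "BB")
    )
    println(ipToLong("192.168.1.1"))                       // 3232235777
    println(countryFor(ipToLong("192.168.1.1"), ranges))   // Some(BB)
    println(countryFor(ipToLong("8.8.8.8"), ranges))       // None
    println(ipToLong("not an ip"))                         // -1
  }
}
```

Because every IP has to be compared against every range, a check like this grows with (number of log rows) x (number of ranges) when it is done as a join on large tables.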

2 REPLIES

Re: Hive Queries using Spark

@Mamta Singh Do you see any errors after running the CTAS statement?

Re: Hive Queries using Spark

No, it's not giving me any error. It just keeps running. It creates temp_table5 and temp_table6 and loads their data properly. The join also returns results in Zeppelin, but the problem starts when I try to save that result into a Hive table: it creates the table but does not load any data.