Member since: 08-29-2016
Posts: 64
Kudos Received: 8
Solutions: 0
12-16-2018
05:01 PM
I have a DataFrame loaded as below:

df = spark.read.json("myjson.json", multiLine=True)
df.show()

I am getting the output below:

Body | offset
{'name':'swathi'} | 2345

How can I extract the JSON column from this DataFrame and make it another DataFrame, like below? I need the results like this in PySpark:

name | offset
swathi | 2345

Please help me out with how to do this.
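Here is a minimal PySpark sketch of one approach, assuming the Body column comes back as a JSON string (if read.json already parsed it into a struct, a plain df.select("Body.name", "offset") is enough). The field name used in the schema follows the sample row above.

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

df = spark.read.json("myjson.json", multiLine=True)   # columns: Body (JSON string), offset

# Declare the schema of the JSON held inside the Body column
body_schema = StructType([StructField("name", StringType())])

# Parse the string into a struct, then pull the field out as a top-level column
result = (df
          .withColumn("body_parsed", from_json(col("Body"), body_schema))
          .select(col("body_parsed.name").alias("name"), col("offset")))

result.show()   # expected: name=swathi, offset=2345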
11-24-2018
09:02 AM
Hi SOURYGNA, I am using the Sqoop parallel import script published on GitHub. With that script I am able to pass only three parameters: the source server (-o), the Hive database (-H), and the source database (-d), plus the table-names txt file. If I pass the extra parameters -u <username> and -pp <password>, it only prints the usage help (as with -h). Can you help me with this?

This wiki page describes the script sqoopTables.sh (GitHub: https://github.com/sourygnahtw/hadoopUtils/blob/master/scripts/sqoop/sqoopTables.sh). As per the GitHub code, the command below works:

bash sqoop2.sh -d dholding -H mydatbase -o 102.05.61.106:0006 -p 1 -q etl /home/trainer1/703226740/tablenames.txt
[Fri Nov 23 02:29:11 EST 2018] Creating the table sourcedatabse.employee from the SQLserver table sourcedatbase.employee

But if I pass the username and password variables as options, it throws the error below. Please help me resolve it.

bash sqoop2.sh -u trainer1 -pp hsgh21234 -d sourcedatabse -H mydatbase -o 102.05.01.146:0006 -p 1 -q etl /home/trainer1/703226740/tablenames.txt
usage: sqoop2.sh [-b <report directory>] [-c <directory for java code>] [-u <username>] [-pp <password>] [-d <source database] [-H <hive database>] [-o <source server>] [-p <parallelism>] [-q <queue>] <fileName>
usage: sqoop2.sh -h
the file must contain on each line the name of a table you want to sqoop. If you want to sqoop 5 tables, you need 5 lines in your file

Thanks in advance,
Swathi.T
11-23-2018
08:11 AM
This wiki page describes the script sqoopTables.sh (GitHub: https://github.com/sourygnahtw/hadoopUtils/blob/master/scripts/sqoop/sqoopTables.sh). As per the GitHub code, if I execute the command below, it works:

bash sqoop2.sh -d dholding -H mydatbase -o 102.05.61.106:0006 -p 1 -q etl /home/trainer1/703226740/tablenames.txt
[Fri Nov 23 02:29:11 EST 2018] Creating the table sourcedatabse.employee from the SQLserver table sourcedatbase.employee

But if I pass the username and password variables as options, it throws the error below. Please help me resolve it.

bash sqoop2.sh -u trainer1 -pp hsgh21234 -d sourcedatabse -H mydatbase -o 102.05.01.146:0006 -p 1 -q etl /home/trainer1/703226740/tablenames.txt
usage: sqoop2.sh [-b <report directory>] [-c <directory for java code>] [-u <username>] [-pp <password>] [-d <source database] [-H <hive database>] [-o <source server>] [-p <parallelism>] [-q <queue>] <fileName>
usage: sqoop2.sh -h
the file must contain on each line the name of a table you want to sqoop. If you want to sqoop 5 tables, you need 5 lines in your file

Thanks in advance,
Swathi.T
11-23-2018
07:36 AM
If I execute the command below, it works:

bash sqoop2.sh -d dholding -H mydatbase -o 102.05.61.106:0006 -p 1 -q etl /home/trainer1/703226740/tablenames.txt
[Fri Nov 23 02:29:11 EST 2018] Creating the table sourcedatabse.employee from the SQLserver table sourcedatbase.employee

But if I pass the username and password variables as options, it throws the error below. Please help me resolve it.

bash sqoop2.sh -u trainer1 -pp hsgh21234 -d sourcedatabse -H mydatbase -o 102.05.01.146:0006 -p 1 -q etl /home/trainer1/703226740/tablenames.txt
usage: sqoop2.sh [-b <report directory>] [-c <directory for java code>] [-u <username>] [-pp <password>] [-d <source database] [-H <hive database>] [-o <source server>] [-p <parallelism>] [-q <queue>] <fileName>
usage: sqoop2.sh -h
the file must contain on each line the name of a table you want to sqoop. If you want to sqoop 5 tables, you need 5 lines in your file
11-15-2018
03:15 PM
Hi all, please kindly help me, with an example, on how to create a dynamic folder for each and every load in Azure Data Factory. Thanks in advance, Swathi.T
05-24-2018
07:33 AM
I have table partitions as shown below:

data/fact_table_name/2017/09
data/fact_table_name/2017/9

I need to combine the 9 partition with the 09 partition, so that the whole data ends up in the single partition data/fact_table_name/2017/09, using Hive. Please help me out with how to resolve this. Thanks in advance, Swathi.T
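Here is a hedged sketch of one approach, run from PySpark against the Hive metastore: copy the rows from the "9" partition into the "09" partition, then drop the old one. The partition column names (year, month) are assumptions based on the directory layout above.

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Copy the rows that landed in the '9' partition into the '09' partition
spark.sql("""
    INSERT INTO TABLE fact_table_name PARTITION (year='2017', month='09')
    SELECT col1, col2, col3          -- list the non-partition columns of your table here
    FROM fact_table_name
    WHERE year='2017' AND month='9'
""")

# Once the copy is verified, drop the old partition
spark.sql("ALTER TABLE fact_table_name DROP PARTITION (year='2017', month='9')")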
04-03-2018
02:51 AM
Can you provide example code for this? To repeat: I need to store the same data, in the same format, split country-wise, with each country's rows stored in a separate country folder, using Spark Scala.
03-31-2018
03:18 AM
I have data like the below in a semicolon-delimited file, and I cannot use the CSV or Databricks packages:

Lghhjj^country:US;name:swathi;age:;Dept:engineer
Ghahjah^country:India;name:shshsh;age:48;Dept:management

How do I read this type of file? I also need to find the distinct countries, collect all the rows for each country, and save each country's rows into a different folder or file: the US rows into one folder or file, the India rows into another, and so on. The output must be saved in the same format (the ';'-separated key:value pairs) without any modification to the data. Please help me out with how to do this.
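Here is a minimal PySpark sketch of one possible approach (the same RDD calls exist in the Scala API, which is what I am using). The input path and the country-extraction logic are assumptions based on the sample rows above.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("input.txt")

def country_of(line):
    # After the '^' the record is ';'-separated key:value pairs; find the country pair
    body = line.split("^", 1)[-1]
    for pair in body.split(";"):
        if pair.startswith("country:"):
            return pair.split(":", 1)[1]
    return "UNKNOWN"

countries = lines.map(country_of).distinct().collect()

# Write each country's rows, unmodified, into its own folder
for c in countries:
    (lines.filter(lambda l, c=c: country_of(l) == c)
          .saveAsTextFile("/output/{}".format(c)))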
03-29-2018
08:40 PM
Without using CSV packages, I need to save a file with both the data and a header.
With my code below, it is saved without the header.
Can you help me out with this issue?
import sqlContext._
import org.apache.spark.sql._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// schema (a StructType of 143 StringType fields) is assumed to be defined earlier
val rdd1withheader = sc.textFile("klksdk.txt")
val headerColumns = rdd1withheader.first().split("\\|", -1)

// drop the header line, then split each record on '|' keeping trailing empty fields
val rdd2 = rdd1withheader.mapPartitionsWithIndex { (idx, iter) => if (idx == 0) iter.drop(1) else iter }
val rdd3 = rdd2.map(_.split("\\|", -1))
val rdd4 = rdd3.map(p => Row.fromSeq(p))

val df1 = sqlContext.createDataFrame(rdd4, schema)
df1.registerTempTable("liveramp_brandcode")
val brandcode1 = sqlContext
  .sql("select * from liveramp_brandcode where brand_code='OAP'")
  .toDF(headerColumns: _*)

// saveAsTextFile writes only the data rows, which is why the header line is missing
brandcode1.map(_.mkString("|")).coalesce(1).saveAsTextFile("/user_match/Lbnnnn_OAP.txt")
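Here is a hedged PySpark sketch of one way to keep a header line in text output written with saveAsTextFile (the same union call exists on Scala RDDs). The tiny DataFrame and the output path are illustrative only.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

df = spark.createDataFrame([("OAP", "US"), ("OAP", "UK")], ["brand_code", "country"])

header_rdd = sc.parallelize(["|".join(df.columns)], 1)                    # one-line header RDD
data_rdd = df.rdd.map(lambda r: "|".join("" if v is None else str(v) for v in r))

# union keeps the header partition first; coalesce(1) gives a single part file
header_rdd.union(data_rdd).coalesce(1).saveAsTextFile("/tmp/with_header_output")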
Thanks in advance
swathi.T
03-29-2018
04:44 AM
Input data: input.txt
03-29-2018
04:36 AM
input.data ef47cd52f7ed4044148ab7b1cc897f55|TEST_F1|TEST_L1|7109 Romford Way||North Richland Hills|TX|76182-5027|5027|test1498@yahoo.com|||||MNY|USA|1989||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||N|N|N|N|N|N||||||||||||||||||||||||||||||||||||||||||||||||||||||| 556510f9cea2e32260eb913e976b7ef0|TEST_F2|TEST_L2|11 South Rd||Chester|NJ|07930-2739|2739|test@embarqmail.com|||||OAP|USA|1964|||||Female||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 91daac14d56047c45cb27227b46b8074|TEST_F3|TEST_L3|1724 Holly Ln||Pampa|TX|79065-4536|4536|test@sbcglobal.net|||||OAP|USA|1941|||||Female|||||SKINTONE_LIGHT|||||||||||||||||||||EYECOLOR_BLUE|||||HAIRCOLOR_AUBURN|||||||||||||||||||||||||||EN|||N|Y|N|N|N||||INT_HAIR_GREY_COVERAGE,INT_HAIR_TRENDS||||||||||||||||||||||||||||||||||||||||||||||||||||4536|4536|test@sbcglobal.net|||||OAP|USA|1941|||||Female|||||SKINTONE_LIGHT|||||||||||||||||||||EYECOLOR_BLUE|||||HAIRCOLOR_AUBURN|||||||||||||||||||||||||||EN|||N|Y|N|N|N||||INT_HAIR_GREY_COVERAGE,INT_HAIR_TRENDS||||||||||||||||||||||||||||||||||||||||||||||||||||
03-29-2018
04:34 AM
val rdd2 = rddwithheader.mapPartitionsWithIndex { (idx, iter) => if (idx == 0) iter.drop(1) else iter }

val rowrdd3 = rdd2.map(_.split("\\|")).map(p => schema(p(0), p(1), p(2), p(3), p(4), p(5), p(6), p(7), p(8), p(9), p(10), p(11), p(12),
  p(13), p(14), p(15), p(16), p(17), p(18), p(19), p(20), p(21), p(22), p(23), p(24), p(25),
  p(26), p(27), p(28), p(29), p(30), p(31), p(32), p(33), p(34), p(35), p(36), p(37), p(38),
  p(39), p(40), p(41), p(42), p(43), p(44), p(45), p(46), p(47), p(48), p(49), p(50), p(51),
  p(52), p(53), p(54), p(55), p(56), p(57), p(58), p(59), p(60), p(61), p(62), p(63), p(64),
  p(65), p(66), p(67), p(68), p(69), p(70), p(71), p(72), p(73), p(74), p(75), p(76), p(77),
  p(78), p(79), p(80), p(81), p(82), p(83), p(84), p(85), p(86), p(87), p(88), p(89), p(90),
  p(91), p(92), p(93), p(94), p(95), p(96), p(97), p(98), p(99), p(100), p(101), p(102), p(103),
  p(104), p(105), p(106), p(107), p(108), p(109), p(110), p(111), p(112), p(113), p(114), p(115), p(116),
  p(117), p(118), p(119), p(120), p(121), p(122), p(123), p(124), p(125), p(126), p(127), p(128), p(129),
  p(130), p(131), p(132), p(133), p(134), p(135), p(136), p(137), p(138), p(139), p(140), p(141), p(142)))

This fails with:

error: overloaded method value apply with alternatives:
  (fieldIndex: Int)org.apache.spark.sql.types.StructField <and>
  (names: Set[String])org.apache.spark.sql.types.StructType <and>
  (name: String)org.apache.spark.sql.types.StructField

The other code you gave, Shu, throws the same "overloaded method value apply with alternatives" exception. How do I handle this? Kindly help with this. Thanks in advance.
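A hedged PySpark sketch of the usual pattern is below: build Row objects from the split fields and pass the schema to createDataFrame, rather than applying the StructType to the fields (the StructType's apply only looks up fields, as the listed alternatives in the error show; in Scala the equivalent is rdd.map(p => Row.fromSeq(p))). The 3-column schema here is a stand-in for the real 143-column one.

from pyspark.sql import SparkSession, Row
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

schema = StructType([StructField(name, StringType()) for name in ["id", "first", "last"]])

lines = sc.parallelize(["1|TEST_F1|TEST_L1", "2|TEST_F2|TEST_L2"])
rows = lines.map(lambda l: Row(*l.split("|")))    # build Row objects from the split fields

df = spark.createDataFrame(rows, schema)          # the schema is passed here, not applied per row
df.show()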
03-29-2018
02:28 AM
I did, but if my data has null (empty) values, then while loading the data into an RDD I get an ArrayIndexOutOfBoundsException: 88. The data has 142 fields, and some of them are empty inside the file. How can I handle this?
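The usual cause is that Scala/Java String.split drops trailing empty fields, so a record whose last columns are empty yields fewer than 142 elements; passing -1 as the limit (.split("\\|", -1)) keeps them. Below is a hedged PySpark sketch of a defensive parse along the same lines (in Python, str.split already keeps trailing empties, so the sketch just pads short records); the path and field count follow the description above.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

EXPECTED_FIELDS = 142

def parse(line):
    fields = line.split("|")                           # keeps trailing empty strings
    fields += [""] * (EXPECTED_FIELDS - len(fields))   # pad short records instead of failing
    return fields[:EXPECTED_FIELDS]

rows = sc.textFile("input.data").map(parse)
print(rows.first())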
03-28-2018
09:47 AM
val rdd2 = rdd1.map(_.split("^"))
rdd2.collect
res16: Array[Array[String]] = Array(Array(OAP^US^xxahggv), Array(MNY^US^sfskdgsjkg), Array(ESS^US^fxjshgg))

The issue is that it is not splitting at all, and I am not getting why. Can you show me the correct syntax? I am not able to find it. Thanks in advance.
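In Scala, String.split takes a regular expression, and ^ is the start-of-line anchor, so split("^") matches nothing; escaping it (_.split("\\^")) or using the Char overload (_.split('^')) splits on the literal caret. Here is a small PySpark sketch of the same parse for comparison (Python's str.split treats the caret literally):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

records = sc.parallelize(["OAP^US^xxahggv", "MNY^US^sfskdgsjkg", "ESS^US^fxjshgg"])
print(records.map(lambda r: r.split("^")).collect())
# [['OAP', 'US', 'xxahggv'], ['MNY', 'US', 'sfskdgsjkg'], ['ESS', 'US', 'fxjshgg']]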
03-27-2018
08:14 AM
Please help me out with how to solve this problem using Spark Scala; this task has been assigned to me. Thanks in advance, Swathi.T
03-27-2018
08:11 AM
1 Kudo
I have a CSV file with a schema (header), for example test.csv:

name,age,state
swathi,23,us
srivani,24,UK
ram,25,London
sravan,30,UK

We need to split it into different files according to state, so that each state's rows are written out with the schema, for example:

/user/data/US.txt
name,age,state
swathi,23,us

/user/data/UK
name,age,state
srivani,24,UK
sravan,30,UK

/user/data/London
name,age,state
ram,25,London
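Here is a minimal PySpark sketch of one approach (the same DataFrame calls exist in the Scala API). Input and output paths are illustrative.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.option("header", True).csv("test.csv")    # columns: name, age, state

for state in [r["state"] for r in df.select("state").distinct().collect()]:
    (df.filter(df["state"] == state)
       .coalesce(1)                                        # one output file per state
       .write.option("header", True)
       .csv("/user/data/{}".format(state)))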
11-09-2017
02:48 PM
The events CSV file looks like this:

display_id | uuid | document_id | timestamp | platformgeo_location
1 | cb8c55702adb93 | 379743 | 61 | 3
2 | 79a85fa78311b9 | 1794259 | 81 | 2
3 | 822932ce3d8757 | 1179111 | 182 | 2
4 | 85281d0a49f7ac | 1777797 | 234 | 2

This is my Spark Scala code:

import org.joda.time._

case class flight(display_id: Int, uuid: String, document_id: Int, timestamp: String, platformgeo_location: String)

val streamdf = sc.textFile("/FileStore/tables/y6ak4fzq1504260076447/events.csv")
  .map(_.split(","))
  .map(x => flight(x(0).toInt, x(1).toString, x(2).toInt, x(3).toString, x(4).toString))
  .toDF()
streamdf.show()
streamdf.registerTempTable("event1")
val result = sqlContext.sql("select * from event1 limit 10")

val addP = (p: Int) => udf((x: Int) => x + p)
val stamp = streamdf.withColumn("timestamp", addP(1465876799)($"timestamp")).toDF()
stamp.show()
stamp.registerTempTable("stamp")

new org.joda.time.DateTime(1465876799 * 1000)

val df = sqlContext.sql("select from_unixtime(timestamp,'YYYY-MM-dd') as 'ts' from stamp")

When I execute that last command (the val df line), I get a type mismatch error. How can I resolve this problem? Please help me out. Thanks in advance, Swathi.T
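Here is a hedged PySpark sketch of the date-conversion step (the code above is Scala; the same functions exist in the Scala API). The tiny stand-in DataFrame and the base epoch value 1465876799 follow the code above; note from_unixtime expects epoch seconds, lowercase 'yyyy' should be used (uppercase 'YYYY' means week-year), and the alias should be a plain identifier rather than a quoted string literal.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Tiny stand-in for the events DataFrame built above (illustrative values only)
events = spark.createDataFrame(
    [(1, "cb8c55702adb93", 379743, 61, "3")],
    ["display_id", "uuid", "document_id", "timestamp", "platformgeo_location"])

# Add the base epoch, then format as a date
stamp = events.withColumn("timestamp", F.col("timestamp").cast("long") + 1465876799)
df = stamp.select(F.from_unixtime("timestamp", "yyyy-MM-dd").alias("ts"))
df.show()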
09-18-2017
11:31 AM
I have CSV file data like the example shown below:

1,"Air Transport
International, LLC",example,city

I have to load this data into Hive so that it comes out like this:

1,Air Transport InternationalLLC,example,city

but what I am actually getting is this:

1,Air Transport International, LLC,example,city

How do I solve this problem? Please give me a solution. Thanks with regards, Swathi.T
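Here is a hedged PySpark sketch of one way to load this file so that the quoted field (which contains a comma and a line break) stays in a single column before it goes into Hive; the file path, table name, and column names are assumptions.

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

df = (spark.read
      .option("quote", '"')         # treat "..." as one field even if it contains commas
      .option("multiLine", True)    # allow the quoted field to span a line break
      .csv("/data/companies.csv")
      .toDF("id", "company", "col3", "col4"))

df.write.mode("overwrite").saveAsTable("companies")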
- Tags:
- Data Processing
- Hive
07-25-2017
01:34 PM
Explanation: my code:

import org.joda.time.{DateTimeZone}
import org.joda.time.format.DateTimeFormat

case class datecheck(Date: String, ID: Int)

val daterdd = sc.textFile("/FileStore/tables/m67ki3931500986208915/Datetest.csv")
  .map(_.split(","))
  .map(x => datecheck(x(0), x(1).toInt))
  .toDF()
daterdd.registerTempTable("datecheck")

val datedf = sqlContext.sql("select ID, (from_unixtime(Date, 'dd-MM-yyyy')) as dateformat from datecheck order by ID desc")
datedf.show()

Problem statement: after performing the date format operation in Spark SQL, the date column in the output DataFrame comes back as null. How can I get the output as a date? Please kindly help me solve this problem. Thanks in advance, Swathi.T
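A likely cause is that from_unixtime expects epoch seconds, so feeding it a date string yields null; parsing the string first with unix_timestamp (or to_date) works. Here is a hedged PySpark sketch (the same SQL functions apply in Scala); the input pattern 'dd-MM-yyyy' is an assumption about how the dates look in the CSV.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("25-07-2017", 1), ("24-07-2017", 2)], ["Date", "ID"])

parsed = df.select(
    "ID",
    F.from_unixtime(F.unix_timestamp("Date", "dd-MM-yyyy"), "dd-MM-yyyy").alias("dateformat"),
)
parsed.orderBy(F.desc("ID")).show()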
07-24-2017
12:10 PM
Step 1: I have one CSV file with the schema/columns customer_id, reviews, date, product_id, which I load into a Pig variable.

Step 2: product_id is there, but product category is not present in the CSV file. How can I create a product category based on the reviews column that already exists in the CSV?

Problem statement: for example, the reviews column contains text like "product is dog biscuits was so nice my dog was healthy looking so good", which belongs to the "pet food" product category. Please send me sample example code showing how to create the product category from the review text. Thanks in advance, Swathi.T
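The question asks for Pig; as a language-neutral illustration of the keyword-matching idea, here is a short Python sketch. The keyword-to-category mapping is invented purely for illustration.

# Illustrative keyword-to-category rules (invented for this example)
CATEGORY_KEYWORDS = {
    "pet food": ["dog", "biscuits", "cat", "puppy"],
    "electronics": ["battery", "charger", "screen"],
}

def categorize(review_text):
    """Return the first category whose keywords appear in the review, else 'unknown'."""
    words = review_text.lower().split()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in words for k in keywords):
            return category
    return "unknown"

print(categorize("product is dog biscuits was so nice my dog was healthy"))   # pet food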
- Tags:
- Pig
07-11-2017
05:24 AM
{((["plan_type":"RATEPLAN_TYPE"],["plan_code":"RATEPLAN_CODE"],["market_code":"RATEPLAN_MARKETCODE"]))}
b = FOREACH a GENERATE TOBAG(f1,f2,f3);
DUMP b;
output:
({("plan_type":"rateplantype"),("plan_code":"ratecode"),("market_code":"mrktcode")})
but I need the output below: all of the multiple maps combined into a single map, placed into a single bag
[{"plan_type":"rateplantype","plan_code":"ratecode","market_code":"mrktcode"}]
Can you share sample code?
Thanks in advance
swathi.T
07-10-2017
10:42 AM
Hi friends,
In Python, how do I control configuration files? I have two configuration files: one is the default configuration, and the other is a real-time config file. The real-time config file has some object details, and some values are defined as empty. If a defined value is empty, it should be taken from the default config JSON file; otherwise, if it is not empty, it should be taken from the current config JSON file.

Step 1: I have one config JSON file as shown below:

[{
"parser_config": {
"vista": {
"Parser_info": {
"module_name": "A",
"class_name": "a",
"standardformat1": {
"transaction": "filename",
"terminal": "filenme2",
"session": "filename3"
}
}
},
"Agilis": {
"Parser_info": {
"module_name": "B",
"class_name": "b",
"standardformat1": {
"transaction": "filename",
"terminal": "filenme2",
"session": "filename3"
}
}
}
}, "merger_processor_config": {
"Merger": {
"standardformat1": {
"vista": {
"file_names": ["transaction", "terminal"],
"parser_output_join_columns": ["TERMINALID"]
},
"commander": {
"file_names": ["transaction", "terminal"],
"parser_output_join_columns": ["TERMINALID"]
}
},
"source_merge_join_columns": [
"TERMINALID",
"BUSINESSDATE",
"TXNSEQNO"
],
"input_path": "/tenantShortName/yyyyMMdd/",
"primary_datatype": "Vista",
"module_name": "A",
"class_name": "a",
"merged_output": "true",
"merger_output_filepath": "/customer1/MergerOutput/"
},
"Processor": {
"terminal_transaction_fact_processor": {
"module_name": "nme1",
"class_name": "nme2",
"usecase_processor": {
"usecase1": {
"module_name": "nme1",
"class_name": "nme2"
}
}
}
}
},
"path_details": {
"parent_path": "wasbs://XXXX@XXXXXstorage.blob.core.windows.net/tenantShortName/"
},
"db_details": {
"datawarehouse_url": "",
"datawarehouse_username": "",
"datawarehouse_password": ""
}
}]

Step 2: I am reading that configuration file's objects like this:

import json
from pprint import pprint

with open('Config.json') as data_file:
    data = json.load(data_file)

# val = data["parser_config"]
for val in data:
    print "parser_config =====", val["parser_config"]["vista"]

Step 3: I have another swathi_configure JSON file, in which only a few of the objects' values are present, as shown below:

[{
"parser_config": {
"vista": {
"Parser_info": {
"module_name": "A",
"class_name": "a",
"standardformat1": {
"transaction": "filename",
"terminal": "filenme2",
"session": "filename3"
}
}
}, "path_details": { "parent_path": "wasbs://XXXX@XXXXXstorage.blob.core.windows.net/tenantShortName/" }, "db_details": { "datawarehouse_url": "", "datawarehouse_username": "", "datawarehouse_password": "" } }]
Step 4: The current config file (swathi_configure.json) has only three objects: parser_config, path_details, and db_details; the remaining objects are not present. Fields that are not present should be taken from the default config file: if an object does not exist in the current config file, the code should fall back to the default config file and collect it from there. If the current config file does have the object, the value should be taken from the current config JSON file. How do I write code that controls both Config.json and swathi_configure.json in this way? Please help; I need to complete this task today. Please tell me how to solve this problem.
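Here is a hedged Python sketch of one way to do the fallback: recursively merge the current config over the default one, taking the default value wherever the current file's value is missing or empty. The file names follow the ones mentioned above; both files are assumed to hold a single object inside a list, as in the samples.

import json

def merge_with_defaults(current, default):
    """Return current, but fill any missing or empty values from default (recursively)."""
    if isinstance(default, dict) and isinstance(current, dict):
        merged = {}
        for key, default_value in default.items():
            if key not in current or current[key] in ("", None, {}, []):
                merged[key] = default_value
            else:
                merged[key] = merge_with_defaults(current[key], default_value)
        # keep any keys that exist only in the current config
        for key in current:
            if key not in merged:
                merged[key] = current[key]
        return merged
    return default if current in ("", None) else current

with open("Config.json") as f:
    default_config = json.load(f)[0]
with open("swathi_configure.json") as f:
    current_config = json.load(f)[0]

effective_config = merge_with_defaults(current_config, default_config)
print(json.dumps(effective_config, indent=2))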
Thanks in advance swathi.tukkaraju
- Tags:
- python
07-10-2017
06:09 AM
1 Kudo
Hi friends, I have Pig output like the one shown below.
Actual output:
{((["plan_type":"RATEPLAN_TYPE"],["plan_code":"RATEPLAN_CODE"],["market_code":"RATEPLAN_MARKETCODE"]))}
I want the output as below:
[{"plan_type":"rateplantype","plan_code":"ratecode","market_code":"mrktcode"}]
How can I get the above output? Please help me solve this problem. Thanks in advance, Swathi.Tukkaraju
- Tags:
- Pig
04-05-2017
01:07 PM
sqlDF = spark.sql("SELECT * FROM people")
sqlDF.show()
Before executing this query, I have to wrap it in a try/catch block: if the table doesn't exist, it should throw an exception when the Spark action executes.
Please help me with how to do this.
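Here is a hedged PySpark sketch of one way to do this: a missing table surfaces as an AnalysisException when spark.sql() analyses the query, so the try/except can wrap both the query and the action.

from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

try:
    sqlDF = spark.sql("SELECT * FROM people")   # fails here if the table does not exist
    sqlDF.show()
except AnalysisException as err:
    raise RuntimeError("Table 'people' does not exist or cannot be read: {}".format(err))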
Thanks in advance,
swathi
03-31-2017
09:55 AM
Please help me out with this problem. Thanks in advance, Swathi Tukkaraju
01-05-2017
06:47 PM
Thanks clukasik, I solved it by using the where-clause and --query approach, e.g.:

sqoop import --connect $connectors//$origServer/$origDatabase --driver $drivers --username $username --password $password --query "select a.* from $origTable a where CAST(ts as DATE)>='$startdate' and CAST(ts as DATE)<='$enddate' AND \$CONDITIONS" --hive-import --hive-database $hiveDatabase --hive-table $myTable -m 1 --fields-terminated-by '\t' --incremental lastmodified --check-column ts --merge-key id --target-dir $targetTmpHdfsDir/$myTable

Thanks with regards,
Swathi.T
01-02-2017
07:14 PM
Please help me with how to access HDFS and Hive tables in a (Hortonworks) Spark environment using Python code, and how to load those data sets into Spark. I need the steps in detail. Please help me with this scenario; I need to finish this task by tomorrow. Thanks with regards, Swathi.T
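Here is a minimal PySpark sketch of reading from HDFS and from Hive (paths, database, and table names are illustrative; on Spark 1.6 the Hive part goes through HiveContext instead). Run it with pyspark or spark-submit on the cluster so the Hive configuration is picked up.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hdfs-and-hive-access")
         .enableHiveSupport()           # lets spark.sql() see the Hive metastore
         .getOrCreate())

# Read a file from HDFS into a DataFrame
hdfs_df = spark.read.option("header", True).csv("hdfs:///user/swathi/input/data.csv")
hdfs_df.show(5)

# Query an existing Hive table
hive_df = spark.sql("SELECT * FROM default.my_hive_table LIMIT 10")
hive_df.show()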
12-14-2016
09:45 AM
To load data from the Finacle (Infosys) database into an HDP 2.4 Hive / HBase database.