Member since
03-23-2017
14
Posts
3
Kudos Received
0
Solutions
04-11-2019
02:45 AM
Dear Folks, Note: It's an urgent requirement, your suggestions will be appreciated. Here my requirement, it's a batch job. For every run, I have to load five hive tables. I created separate dataframe objects for all five tables and calling it inside the main function. Here am using flags to get the user input and starts running the job. Below code works fine for two tables. If I pass the argument "all", five tables got loaded without issue. sometimes on an ad-hoc basis, I may need to load two tables or three tables based on requirement. How can I achieve in my code? Eg: For today run's, I need to load only three tables. I passed the table names as arguments while submitting the job spark-submit --class .. --master yarn eimreporting tableA tableB tableC code: ----
object Medinsight_Main {
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setAppName("Eim_Reporting")
val sc = new SparkContext(conf)
sc.setLogLevel("WARN")
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
try {
if (args(0).toLowerCase() == "eimreporting" && args(1).toLowerCase() == "all") {
eiminsight_claim_agg.Transform(sqlContext) --> calling TableA object
eiminsight_member.Transform(sqlContext) ---> calling TableB object
}
else if (args(0).toLowerCase() == "eimreporting" && args(1).toLowerCase() == "tableA") {
eiminsight_claim_agg.Transform(sqlContext) ---> calling TableA object
}
else if (args(0).toLowerCase() == "eimreporting" && args(1).toLowerCase() == "tableB") {
eiminsight_member.Transform(sqlContext) ---> calling TableB object
}
else {
System.out.println("No argumnts"
}
}
catch{
exception
}
finally{
sc.stop()
}
... View more
Labels:
03-08-2018
02:16 PM
Dear Forum Folks, Need help to parse the Nested JSON in spark Dataframe. Here am pasting the sample JSON file. Your help would be appreciated. Please give an idea to parse the JSON file. { “meta” : { “view” : { “id” : “4mse-ku6q”, “name” : “Traffic Violations”, “averageRating” : 0, “category” : “Public Safety”, “createdAt” : 1403103517, “description” : “This dataset contains traffic violation information from all electronic traffic violations issued in the County. Any information that can be used to uniquely identify the vehicle, the vehicle owner or the officer issuing the violation will not be published.\r\n\r\nUpdate Frequency: Daily”, “displayType” : “table”, “downloadCount” : 85018, “hideFromCatalog” : false, “hideFromDataJson” : false, “iconUrl” : “fileId:r41tDc239M1FL75LFwXFKzFCWqr8mzMeMTYXiA24USM”, “indexUpdatedAt” : 1519815131, “newBackend” : false, “numberOfComments” : 0, “oid” : 8890705, “provenance” : “official”, “publicationAppendEnabled” : false, “publicationDate” : 1411040702, “publicationGroup” : 1620779, “publicationStage” : “published”, “rowClass” : “”, “rowsUpdatedAt” : 1519813971, “rowsUpdatedBy” : “ajn4-zy65”, “tableId” : 1722160, “totalTimesRated” : 0, “viewCount” : 26908, “viewLastModified” : 1494270268, “viewType” : “tabular”, “columns” : [ { “id” : -1, “name” : “sid”, “dataTypeName” : “meta_data”, “fieldName” : “:sid”, “position” : 0, “renderTypeName” : “meta_data”, “format” : { }, “flags” : [ “hidden” ] }, { “id” : -1, “name” : “id”, “dataTypeName” : “meta_data”, “fieldName” : “:id”, “position” : 0, “renderTypeName” : “meta_data”, “format” : { }, “flags” : [ “hidden” ] }, { “id” : -1, “name” : “position”, “dataTypeName” : “meta_data”, “fieldName” : “:position”, “position” : 0, “renderTypeName” : “meta_data”, “format” : { }, “flags” : [ “hidden” ] }, { “id” : -1, “name” : “created_at”, “dataTypeName” : “meta_data”, “fieldName” : “:created_at”, “position” : 0, “renderTypeName” : “meta_data”, “format” : { }, “flags” : [ “hidden” ] }, { “id” : -1, “name” : “created_meta”, “dataTypeName” : “meta_data”, “fieldName” : “:created_meta”, “position” : 0, “renderTypeName” : “meta_data”, “format” : { }, “flags” : [ “hidden” ] }, { “id” : -1, “name” : “updated_at”, “dataTypeName” : “meta_data”, “fieldName” : “:updated_at”, “position” : 0, “renderTypeName” : “meta_data”, “format” : { }, “flags” : [ “hidden” ] }, { “id” : -1, “name” : “updated_meta”, “dataTypeName” : “meta_data”, “fieldName” : “:updated_meta”, “position” : 0, “renderTypeName” : “meta_data”, “format” : { }, “flags” : [ “hidden” ] }, { “id” : -1, “name” : “meta”, “dataTypeName” : “meta_data”, “fieldName” : “:meta”, “position” : 0, “renderTypeName” : “meta_data”, “format” : { }, “flags” : [ “hidden” ] }, { “id” : 167291006, “name” : “Date Of Stop”, “dataTypeName” : “calendar_date”, “description” : “Date of the traffic violation.”, “fieldName” : “date_of_stop”, “position” : 2, “renderTypeName” : “calendar_date”, “tableColumnId” : 20333361, “width” : 97, “cachedContents” : { “largest” : “2018-02-27T00:00:00”, “non_null” : 1251972, “null” : 0, “top” : [ { “item” : “2018-01-15T00:00:00”, “count” : 20 }, { “item” : “2018-01-16T00:00:00”, “count” : 19 }, { “item” : “2018-01-18T00:00:00”, “count” : 18 }, { “item” : “2018-01-19T00:00:00”, “count” : 17 }, { “item” : “2018-01-21T00:00:00”, “count” : 16 }, { “item” : “2018-01-22T00:00:00”, “count” : 15 }, { “item” : “2018-01-23T00:00:00”, “count” : 14 }, { “item” : “2018-01-24T00:00:00”, “count” : 13 }, { “item” : “2018-01-25T00:00:00”, “count” : 12 }, { “item” : “2018-01-26T00:00:00”, “count” : 11 }, { “item” : “2018-01-27T00:00:00”, “count” : 10 }, { “item” : “2018-01-29T00:00:00”, “count” : 9 }, { “item” : “2018-01-30T00:00:00”, “count” : 8 }, { “item” : “2018-01-31T00:00:00”, “count” : 7 }, { “item” : “2018-02-01T00:00:00”, “count” : 6 }, { “item” : “2018-02-02T00:00:00”, “count” : 5 }, { “item” : “2018-02-03T00:00:00”, “count” : 4 }, { “item” : “2018-02-05T00:00:00”, “count” : 3 }, { “item” : “2018-02-06T00:00:00”, “count” : 2 }, { “item” : “2018-02-07T00:00:00”, “count” : 1 } ], “smallest” : “2012-01-01T00:00:00” }, “format” : { “view” : “date”, “align” : “left” } }, { “id” : 167291007, “name” : “Time Of Stop”, “dataTypeName” : “text”, “description” : “Time of the traffic violation.”, “fieldName” : “time_of_stop”, “position” : 3, “renderTypeName” : “text”, “tableColumnId” : 20333363, “width” : 89, “cachedContents” : { “largest” : “23:59:00”, “non_null” : 1251972, “null” : 0, “top” : [ { “item” : “07:28:00”, “count” : 20 }, { “item” : “00:21:00”, “count” : 19 }, { “item” : “01:56:00”, “count” : 18 }, { “item” : “01:41:00”, “count” : 17 }, { “item” : “23:54:00”, “count” : 16 }, { “item” : “02:48:00”, “count” : 15 }, { “item” : “23:30:00”, “count” : 14 }, { “item” : “06:58:00”, “count” : 13 }, { “item” : “07:10:00”, “count” : 12 }, { “item” : “22:52:00”, “count” : 11 }, { “item” : “01:55:00”, “count” : 10 }, { “item” : “01:10:00”, “count” : 9 }, { “item” : “23:26:00”, “count” : 8 }, { “item” : “23:06:00”, “count” : 7 }, { “item” : “23:25:00”, “count” : 6 }, { “item” : “23:35:00”, “count” : 5 }, { “item” : “00:32:00”, “count” : 4 }, { “item” : “23:43:00”, “count” : 3 }, { “item” : “23:49:00”, “count” : 2 }, { “item” : “07:07:00”, “count” : 1 } ], “smallest” : “00:00:00” }, “format” : { “align” : “left” } }, { “id” : 167291040, “name” : “Geolocation”, “dataTypeName” : “location”, “description” : “Geo-coded location information.”, “fieldName” : “geolocation”, “position” : 36, “renderTypeName” : “location”, “tableColumnId” : 22014969, “width” : 100, “cachedContents” : { “largest” : { “latitude” : “39.0835433333333”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.152665” }, “non_null” : 1158178, “null” : 93794, “top” : [ { “item” : { “latitude” : “39.008325”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.049165” }, “count” : 20 }, { “item” : { “latitude” : “39.1400066666667”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.2062733333333” }, “count” : 19 }, { “item” : { “latitude” : “39.1453866666667”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.151475” }, “count” : 18 }, { “item” : { “latitude” : “39.0584283333333”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.0480016666667” }, “count” : 17 }, { “item” : { “latitude” : “39.064025”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.0948166666667” }, “count” : 16 }, { “item” : { “latitude” : “38.9868028333333”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.105421” }, “count” : 15 }, { “item” : { “latitude” : “39.041725”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.058525” }, “count” : 14 }, { “item” : { “latitude” : “39.1450033333333”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.2026783333333” }, “count” : 13 }, { “item” : { “latitude” : “39.15924”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.2197733333333” }, “count” : 12 }, { “item” : { “latitude” : “39.1497083333333”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.2087416666667” }, “count” : 11 }, { “item” : { “latitude” : “39.066465”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.062715” }, “count” : 10 }, { “item” : { “latitude” : “39.00684”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.04998” }, “count” : 9 }, { “item” : { “latitude” : “39.0072766666667”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.0492216666667” }, “count” : 8 }, { “item” : { “latitude” : “39.148245”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.2371266666667” }, “count” : 7 }, { “item” : { “latitude” : “39.12278”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.1640333333333” }, “count” : 6 }, { “item” : { “latitude” : “39.004515”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-76.980765” }, “count” : 5 }, { “item” : { “latitude” : “39.0051116666667”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-76.9967766666667” }, “count” : 4 }, { “item” : { “latitude” : “39.0934533333333”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.13202” }, “count” : 3 }, { “item” : { “latitude” : “39.09237”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.131295” }, “count” : 2 }, { “item” : { “latitude” : “39.1007”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.1412233333333” }, “count” : 1 } ], “smallest” : { “latitude” : “39.0835433333333”, “human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}", “longitude” : “-77.152665” } }, “format” : { “view” : “address_coords”, “align” : “left” }, “subColumnTypes” : [ “human_address”, “latitude”, “longitude”, “machine_address”, “needs_recoding” ] } ], “disabledFeatureFlags” : [ “allow_comments” ], “grants” : [ { “inherited” : false, “type” : “viewer”, “flags” : [ “public” ] } ], “metadata” : { “jsonQuery” : { “order” : [ { “ascending” : false, “columnFieldName” : “date_of_stop” } ] }, “rdfSubject” : “0”, “rdfClass” : “”, “custom_fields” : { “Dataset Information” : { “Departments” : “Police, Department of”, “Update Frequency” : “Daily” } }, “rowIdentifier” : 167291005, “availableDisplayTypes” : [ “table”, “fatrow”, “page” ], “rowLabel” : “”, “renderTypeConfig” : { “visible” : { “table” : true } } }, “owner” : { “id” : “ajn4-zy65”, “displayName” : “MCG ESB Service”, “screenName” : “MCG ESB Service”, “type” : “interactive”, “flags” : [ “organizationMember” ] }, “query” : { “orderBys” : [ { “ascending” : false, “expression” : { “columnId” : 167291006, “type” : “column” } } ] }, “rights” : [ “read” ], “tableAuthor” : { “id” : “ajn4-zy65”, “displayName” : “MCG ESB Service”, “screenName” : “MCG ESB Service”, “type” : “interactive”, “flags” : [ “organizationMember” ] }, “tags” : [ “traffic”, “stop”, “violations”, “electronic issued.” ], “flags” : [ “default”, “restorable”, “restorePossibleForType” ] } }, “data” : [ [ 2118167, “EE8BC302-660F-48C4-B422-17427ECE821F”, 2118167, 1482239054, “498050”, 1482239054, “498050”, null, “2013-09-24T00:00:00”, “17:11:00”, “MCP”, “3rd district, Silver Spring”, “DRIVING VEHICLE ON HIGHWAY WITH SUSPENDED REGISTRATION”, “8804 FLOWER AVE”, null, null, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “2008”, “FORD”, “4S”, “BLACK”, “Citation”, “13-401(h)”, “Transportation Article”, “No”, “BLACK”, “M”, “TAKOMA PARK”, “MD”, “MD”, “A - Marked Patrol”, [ null, null, null, null, null ] ] , [ 3064529, “9763533B-48A3-480A-BB54-1EF8C60849AF”, 3064529, 1504085194, “498050”, 1504085194, “498050”, null, “2017-08-29T00:00:00”, “10:19:00”, “MCP”, “2nd district, Bethesda”, “DRIVER FAILURE TO OBEY PROPERLY PLACED TRAFFIC CONTROL DEVICE INSTRUCTIONS”, “WISCONSIN AVE@ ELM ST”, “38.981725”, “-77.0927566666667”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “VA”, “02 - Automobile”, “2001”, “TOYOTA”, “COROLLA”, “GREEN”, “Citation”, “21-201(a1)”, “Transportation Article”, “No”, “WHITE”, “F”, “FAIRFAX STATION”, “VA”, “VA”, “A - Marked Patrol”, [ “{“address”:”",“city”:"",“state”:"",“zip”:""}", “38.981725”, “-77.0927566666667”, null, false ] ] , [ 2118171, “938F1DC7-DE76-45E8-AF25-361CBCC8507C”, 2118171, 1482239054, “498050”, 1482239054, “498050”, null, “2014-12-01T00:00:00”, “12:52:00”, “MCP”, “6th district, Gaithersburg / Montgomery Village”, “FAILURE STOP AND YIELD AT THRU HWY”, “CHRISTOPHER AVE/MONTGOMERY VILLAGE AVE”, “39.1628883333333”, “-77.2290883333333”, “No”, “No”, “No”, “Yes”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “2001”, “HONDA”, “ACCORD”, “SILVER”, “Citation”, “21-403(b)”, “Transportation Article”, “No”, “BLACK”, “F”, “UPPER MARLBORO”, “MD”, “MD”, “A - Marked Patrol”, [ “{“address”:”",“city”:"",“state”:"",“zip”:""}", “39.1628883333333”, “-77.2290883333333”, null, false ] ] , [ 3064530, “75858866-32C1-4E6B-8C01-322CD15F33EB”, 3064530, 1504085194, “498050”, 1504085194, “498050”, null, “2017-08-29T00:00:00”, “09:22:00”, “MCP”, “3rd district, Silver Spring”, “FAILURE YIELD RIGHT OF WAY ON U TURN”, “CHERRY HILL RD./CALVERTON BLVD.”, “39.056975”, “-76.9546333333333”, “No”, “No”, “No”, “Yes”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “1998”, “DODG”, “DAKOTA”, “WHITE”, “Citation”, “21-402(b)”, “Transportation Article”, “No”, “BLACK”, “M”, “FORT WASHINGTON”, “MD”, “MD”, “A - Marked Patrol”, [ “{“address”:”",“city”:"",“state”:"",“zip”:""}", “39.056975”, “-76.9546333333333”, null, false ] ] , [ 3064531, “077389FD-023F-420B-909C-85034959FEF5”, 3064531, 1504085194, “498050”, 1504085194, “498050”, null, “2017-08-28T00:00:00”, “23:41:00”, “MCP”, “6th district, Gaithersburg / Montgomery Village”, “FAILURE OF DR. TO MAKE LANE CHANGE TO AVAIL. LANE NOT IMMED. ADJ. TO STOPPED EMERG. VEH,”, “355 @ SOUTH WESTLAND DRIVE”, null, null, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “2015”, “MINI COOPER”, “2S”, “WHITE”, “Citation”, “21-405(e1)”, “Transportation Article”, “No”, “WHITE”, “M”, “GAITHERSBURG”, “MD”, “MD”, “A - Marked Patrol”, [ null, null, null, null, null ] ] , [ 2118177, “730662D9-7CC1-4815-BA0B-C28CD61BEC56”, 2118177, 1482239054, “498050”, 1482239054, “498050”, null, “2013-08-27T00:00:00”, “00:55:00”, “MCP”, “2nd district, Bethesda”, “NEGLIGENT DRIVING VEHICLE IN CARELESS AND IMPRUDENT MANNER ENDANGERING PROPERTY, LIFE AND PERSON”, “CONNECTICUT/CHEVY CHASE LAKE”, null, null, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “2013”, “HYUNDAI”, “ELANTRA”, “GRAY”, “Citation”, “21-901.1(b)”, “Transportation Article”, “No”, “WHITE”, “F”, “SILVER SPRING”, “MD”, “MD”, “A - Marked Patrol”, [ null, null, null, null, null ] ] , [ 2118178, “82062BF4-BFCB-4A0D-8416-22D262569EBE”, 2118178, 1482239054, “498050”, 1482239054, “498050”, null, “2013-10-08T00:00:00”, “13:23:00”, “MCP”, “4th district, Wheaton”, “DRIVING VEHICLE ON HIGHWAY WITH SUSPENDED REGISTRATION”, “GEORGIA AVE / BEL PRE RD”, “39.0933833333333”, “-77.0795516666667”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “1993”, “FORD”, “PICKUP”, “BLACK”, “Citation”, “13-401(h)”, “Transportation Article”, “No”, “HISPANIC”, “M”, “BELTSVILLE”, “MD”, “MD”, “A - Marked Patrol”, [ “{“address”:”",“city”:"",“state”:"",“zip”:""}", “39.0933833333333”, “-77.0795516666667”, null, false ] ] , [ 2118179, “E3BB6016-5938-41B4-9C3B-7A1DEE7FFBB6”, 2118179, 1482239054, “498050”, 1482239054, “498050”, null, “2015-04-24T00:00:00”, “00:38:00”, “MCP”, “1st district, Rockville”, “DRIVER FAIL TO STOP AT FLASHING RED TRAFFIC SIGNAL STOP LINE”, “EB MONTROSE PKWY/EAST JEFFERSON ST”, null, null, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “VA”, “02 - Automobile”, “2003”, “DODGE”, “SPRINTER”, “WHITE”, “Citation”, “21-204(b)”, “Transportation Article”, “No”, “HISPANIC”, “M”, “SILVER SPRING”, “MD”, “MD”, “A - Marked Patrol”, [ null, null, null, null, null ] ] , [ 3064532, “0D05BB47-567E-4521-B85C-1BD77C0ADDED”, 3064532, 1504085194, “498050”, 1504085194, “498050”, null, “2017-08-28T00:00:00”, “23:41:00”, “MCP”, “6th district, Gaithersburg / Montgomery Village”, “FAILURE OF INDIVIDUAL DRIVING ON HIGHWAY TO DISPLAY LICENSE TO UNIFORMED POLICE ON DEMAND”, “355 @ SOUTH WESTLAND DRIVE”, null, null, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “2015”, “MINI COOPER”, “2S”, “WHITE”, “Citation”, “16-112©”, “Transportation Article”, “No”, “WHITE”, “M”, “GAITHERSBURG”, “MD”, “MD”, “A - Marked Patrol”, [ null, null, null, null, null ] ] , [ 3064533, “39689BC8-0E7E-4525-AC40-C469AE08F4B0”, 3064533, 1504085194, “498050”, 1504085194, “498050”, null, “2017-08-28T00:00:00”, “23:41:00”, “MCP”, “6th district, Gaithersburg / Montgomery Village”, “DRIVING VEHICLE ON HIGHWAY WITH AN EXPIRED LICENSE”, “355 @ SOUTH WESTLAND DRIVE”, null, null, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “2015”, “MINI COOPER”, “2S”, “WHITE”, “Citation”, “16-115(g)”, “Transportation Article”, “No”, “WHITE”, “M”, “GAITHERSBURG”, “MD”, “MD”, “A - Marked Patrol”, [ null, null, null, null, null ] ] , [ 2118181, “7980162D-D02E-4288-943B-52150C154DF1”, 2118181, 1482239054, “498050”, 1502443108, “498050”, null, “2014-02-14T00:00:00”, “20:10:00”, “MCP”, “1st district, Rockville”, “FAILURE TO DRIVE ON RIGHT HAND ROADWAY OF DIVIDED HWY”, “GATEWAY CENTER DR @ CLARKSBURG RD”, “39.2348434333333”, “-77.28153995”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “2005”, “CADI”, “STS”, “BLACK”, “Citation”, “21-311(1)”, “Transportation Article”, “No”, “WHITE”, “M”, “POINT OF ROCK”, “MD”, “WV”, “A - Marked Patrol”, [ “{“address”:”",“city”:"",“state”:"",“zip”:""}", “39.2348434333333”, “-77.28153995”, null, false ] ] }
... View more
Labels:
03-01-2018
12:14 PM
In below XML raw table author name was split by comma. But I need to insert a single value in the Main table. XML Raw table (Staging table) CREATE TABLE xml_raw(line string); LOAD DATA LOCAL INPATH ‘/home/xxxxx/book_simple.txt’ INTO TABLE xml_raw; Eg: SELECT XPATH_STRING(line,‘catalog/book/author’) FROM xml_raw; Gambardella, Matthew Ralls, Kim Corets, Eva Main Table CREATE TABLE books (author string, title string, genre string, price double, publish_date string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n’ STORED AS TEXTFILE; Below is the insert command to load the main table, but the result comes as not expected. INSERT OVERWRITE TABLE books SELECT XPATH_STRING(line,‘catalog/book/author’), XPATH_STRING(line,‘catalog/book/title’), XPATH_STRING(line,‘catalog/book/genre’), XPATH_DOUBLE(line,‘catalog/book/price’), XPATH_STRING(line,‘catalog/book/publish_date’) FROM xml_raw; Result SELECT * FROM books ; OK books.author books.title books.genre books.price books.publish_date Gambardella Matthew XML Developer’s Guide NULL 44.95 Ralls Kim Midnight Rain NULL 5.95 Corets Eva Maeve Ascendant NULL 5.95 But required result to be like books.author books.title books.genre books.price books.publish_date Corets,Eva Maeve Ascendant Fantasy 5.95 2000-11-17 I need author name to be in the single column. Awaiting your response.
... View more
- Tags:
- Data Processing
- Hive
Labels:
01-26-2017
02:28 PM
@Kashif, shall I have any sample query how to use instr and substr
... View more
01-26-2017
02:05 PM
Dear Members, I have a table "Processor types" with column name processor. Query SELECT processor, SUM(processor) AS Total from processor-types where (processor like '%i210%')
and ddate like '20161009' group by processor output
Processor
Total
i210at
4
i210-at
6
i210 2016
8
intel+ethernet+connection+i210
1
wgi210it+s+ljxt
4
wgi210is_sljxx
3
i210-t1
4
ethernet i210-t1
5
intelâ® ethernet controller i210-at
5
wgi210is
2
But I need output in summarized form of the total and generic name in processor column as i210. It mean whatever the string in column it has to be replaced by 'i210'. Eg: (ethernet i210-t1) string to replaced by i210
Processor
Total
i210
42
will you all please suggest me how to do. If so send me sample query.
... View more
Labels:
05-15-2016
04:16 PM
Plz find the RM UI logs: Log Type: directory.info Log Upload Time: Sun May 15 16:14:57 +0000 2016 Log Length: 23827 Showing 4096 bytes of 23827 total. Click here for the full log. -r-xr-xr-x 1 yarn hadoop 33361 Feb 10 06:42 ./mr-framework/hadoop/include/hdfs.h
789714 8 -r-xr-xr-x 1 yarn hadoop 6330 Feb 10 06:42 ./mr-framework/hadoop/include/Pipes.hh
789729 4 drwxr-xr-x 4 yarn hadoop 4096 Feb 10 06:42 ./mr-framework/hadoop/share
790085 4 drwxr-xr-x 3 yarn hadoop 4096 Feb 10 06:42 ./mr-framework/hadoop/share/doc
926452 4 drwxr-xr-x 6 yarn hadoop 4096 Feb 10 06:42 ./mr-framework/hadoop/share/doc/hadoop
789730 4 drwxr-xr-x 9 yarn hadoop 4096 Feb 10 06:42 ./mr-framework/hadoop/share/hadoop
926043 4 drwxr-xr-x 6 yarn hadoop 4096 Feb 10 06:42 ./mr-framework/hadoop/share/hadoop/common
926399 4 drwxr-xr-x 5 yarn hadoop 4096 Feb 10 06:42 ./mr-framework/hadoop/share/hadoop/mapreduce
926279 4 drwxr-xr-x 5 yarn hadoop 4096 Feb 10 06:42 ./mr-framework/hadoop/share/hadoop/tools
789973 4 drwxr-xr-x 3 yarn hadoop 4096 Feb 10 06:42 ./mr-framework/hadoop/share/hadoop/httpfs
926140 4 drwxr-xr-x 7 yarn hadoop 4096 Feb 10 06:42 ./mr-framework/hadoop/share/hadoop/hdfs
789842 4 drwxr-xr-x 3 yarn hadoop 4096 Feb 10 06:42 ./mr-framework/hadoop/share/hadoop/kms
789731 4 drwxr-xr-x 5 yarn hadoop 4096 Feb 10 06:42 ./mr-framework/hadoop/share/hadoop/yarn
790086 4 -r-xr-xr-x 1 yarn hadoop 1366 Feb 10 06:42 ./mr-framework/hadoop/README.txt
789695 4 drwxr-xr-x 2 yarn hadoop 4096 Feb 10 06:42 ./mr-framework/hadoop/bin
789698 4 -r-xr-xr-x 1 yarn hadoop 830 Feb 10 06:42 ./mr-framework/hadoop/bin/kill-name-node
789706 12 -r-xr-xr-x 1 yarn hadoop 8782 Feb 10 06:42 ./mr-framework/hadoop/bin/hdfs.cmd
789702 8 -r-xr-xr-x 1 yarn hadoop 6594 Feb 10 06:42 ./mr-framework/hadoop/bin/hadoop
789699 128 -r-xr-xr-x 1 yarn hadoop 129158 Feb 10 06:42 ./mr-framework/hadoop/bin/test-container-executor
789700 12 -r-xr-xr-x 1 yarn hadoop 12069 Feb 10 06:42 ./mr-framework/hadoop/bin/yarn.cmd
789697 12 -r-xr-xr-x 1 yarn hadoop 8786 Feb 10 06:42 ./mr-framework/hadoop/bin/hadoop.cmd
789705 16 -r-xr-xr-x 1 yarn hadoop 12329 Feb 10 06:42 ./mr-framework/hadoop/bin/hdfs
789703 8 -r-xr-xr-x 1 yarn hadoop 6059 Feb 10 06:42 ./mr-framework/hadoop/bin/mapred
789708 4 -r-xr-xr-x 1 yarn hadoop 1857 Feb 10 06:42 ./mr-framework/hadoop/bin/rcc
789704 112 -r-xr-xr-x 1 yarn hadoop 114438 Feb 10 06:42 ./mr-framework/hadoop/bin/container-executor
789701 4 -r-xr-xr-x 1 yarn hadoop 888 Feb 10 06:42 ./mr-framework/hadoop/bin/kill-secondary-name-node
789707 8 -r-xr-xr-x 1 yarn hadoop 7465 Feb 10 06:42 ./mr-framework/hadoop/bin/mapred.cmd
789696 16 -r-xr-xr-x 1 yarn hadoop 14452 Feb 10 06:42 ./mr-framework/hadoop/bin/yarn
789715 16 -r-xr-xr-x 1 yarn hadoop 15429 Feb 10 06:42 ./mr-framework/hadoop/LICENSE.txt
1580343 60 -r-x------ 1 yarn hadoop 60530 May 15 16:14 ./commons-logging-1.1.1.jar
1580373 152 -r-x------ 1 yarn hadoop 152255 May 15 16:14 ./parquet-hadoop-1.4.1.jar
1580225 12 -r-x------ 1 yarn hadoop 10378 May 15 16:14 ./parquet-generator-1.4.1.jar
1580397 20 -r-x------ 1 yarn hadoop 19816 May 15 16:14 ./opencsv-2.3.jar
1580210 392 -r-x------ 1 yarn hadoop 400090 May 15 16:14 ./avro-1.7.5.jar
1317565 4 drwxr-xr-x 2 yarn hadoop 4096 May 15 16:14 ./jobSubmitDir
1055507 4 -r-x------ 1 yarn hadoop 488 May 15 16:14 ./jobSubmitDir/job.split
1055504 4 -r-x------ 1 yarn hadoop 25 May 15 16:14 ./jobSubmitDir/job.splitmetainfo
1577395 1736 -r-x------ 1 yarn hadoop 1774917 May 15 16:14 ./kite-data-hive-1.0.0.jar
1055515 4 -rwx------ 1 yarn hadoop 707 May 15 16:14 ./default_container_executor.sh
1577378 1704 -r-x------ 1 yarn hadoop 1741973 May 15 16:14 ./kite-data-mapreduce-1.0.0.jar
... View more
05-15-2016
03:38 PM
Getting error in sqoop import from Mysql into hive as Parquet Format. But table is created but no data's in it. Command: sqoop import --connect jdbc:mysql://sandbox.hortonworks.com:3306/retail_db --username=retail_dba -P --table departments --hive-import --hive-home="/apps/hive/warehouse" --create-hive-table --hive-table departments --hive-database retail_edw --fields-terminated-by '|' --lines-terminated-by '\n' --as-parquetfile -m 4 Error as Below 16/05/15 15:31:07 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/05/15 15:31:07 INFO tool.CodeGenTool: Beginning code generation
16/05/15 15:31:07 INFO tool.CodeGenTool: Will generate java class as codegen_departments
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.0.0-169/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.0.0-169/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/05/15 15:31:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
16/05/15 15:31:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
16/05/15 15:31:08 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.0.0-169/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/f26881f08d02b67aebb72e231c44fa1a/codegen_departments.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/05/15 15:31:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f26881f08d02b67aebb72e231c44fa1a/codegen_departments.jar
16/05/15 15:31:11 WARN manager.MySQLManager: It looks like you are importing from mysql.
16/05/15 15:31:11 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
16/05/15 15:31:11 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
16/05/15 15:31:11 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
16/05/15 15:31:11 INFO mapreduce.ImportJobBase: Beginning import of departments
16/05/15 15:31:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
16/05/15 15:31:15 INFO hive.metastore: Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
16/05/15 15:31:15 INFO hive.metastore: Connected to metastore.
16/05/15 15:31:15 INFO hive.metastore: Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
16/05/15 15:31:15 INFO hive.metastore: Connected to metastore.
16/05/15 15:31:15 INFO hive.HiveManagedMetadataProvider: Creating a managed Hive table named: departments
16/05/15 15:31:16 INFO hive.metastore: Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
16/05/15 15:31:16 INFO hive.metastore: Connected to metastore.
16/05/15 15:31:20 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/05/15 15:31:20 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/192.168.183.132:8050
16/05/15 15:31:21 INFO hive.metastore: Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
16/05/15 15:31:21 INFO hive.metastore: Connected to metastore.
16/05/15 15:31:24 INFO db.DBInputFormat: Using read commited transaction isolation
16/05/15 15:31:24 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`department_id`), MAX(`department_id`) FROM `departments`
16/05/15 15:31:25 INFO mapreduce.JobSubmitter: number of splits:4
16/05/15 15:31:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1463318852105_0012
16/05/15 15:31:27 INFO impl.YarnClientImpl: Submitted application application_1463318852105_0012
16/05/15 15:31:27 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1463318852105_0012/
16/05/15 15:31:27 INFO mapreduce.Job: Running job: job_1463318852105_0012
16/05/15 15:31:41 INFO mapreduce.Job: Job job_1463318852105_0012 running in uber mode : false
16/05/15 15:31:41 INFO mapreduce.Job: map 0% reduce 0%
16/05/15 15:31:41 INFO mapreduce.Job: Job job_1463318852105_0012 failed with state FAILED due to: Application application_1463318852105_0012 failed 2 times due to AM Container for appattempt_1463318852105_0012_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://sandbox.hortonworks.com:8088/cluster/app/application_1463318852105_0012Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e28_1463318852105_0012_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
at org.apache.hadoop.util.Shell.run(Shell.java:487)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
16/05/15 15:31:41 INFO mapreduce.Job: Counters: 0
16/05/15 15:31:41 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
16/05/15 15:31:41 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 22.2023 seconds (0 bytes/sec)
16/05/15 15:31:41 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
16/05/15 15:31:41 INFO mapreduce.ImportJobBase: Retrieved 0 records.
16/05/15 15:31:41 ERROR tool.ImportTool: Error during import: Import job failed!
... View more
Labels:
05-02-2016
02:04 PM
Not able to view history logs in application master after running hive query in tez engine. Getting error as Tez UI Url is not defined. To enable tracking url pointing to Tez UI, set the config tez.tez-ui.history-url.base in the tez-site.xml Guys, am new to hadoop help me to sort out the problem
... View more
- Tags:
- Data Processing
- Hive
Labels:
04-22-2016
05:12 PM
I sort out the issue manish, thanks for your support, if i update entry of same ip in C:\Windows\System32\drivers\etc\hosts, am able to view the history logs through Application master itself
... View more
04-22-2016
05:05 PM
Thanks Manish, It works now but , when i click the job_id from from Application master to view history logs it fails and error am getting stated as above , may i know why anything i need to change, please help me
... View more
04-22-2016
04:41 PM
Plz find the detail eth1 Link encap:Ethernet HWaddr 00:0C:29:5C:4C:C8
inet addr:192.168.183.132 Bcast:192.168.183.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1634697 errors:0 dropped:0 overruns:0 frame:0
TX packets:40239 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:250967654 (239.3 MiB) TX bytes:20970278 (19.9 MiB) lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:1542905 errors:0 dropped:0 overruns:0 frame:0
TX packets:1542905 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1378392579 (1.2 GiB) TX bytes:1378392579 (1.2 GiB)
... View more
04-22-2016
03:22 PM
Thanks Manish When i check ifconfig no IP address assigned to etho. Getting error as [root@sandbox ~]# ifconfig eth0 | grep "inet addr"
eth0: error fetching interface information: Device not found
... View more
04-22-2016
03:02 PM
Thanks kuldeep, I tried but the same error persists. Anything else? My sandbox running in vmware workstation and my host os is windows10.
... View more
04-22-2016
02:41 PM
3 Kudos
Dear Hortonworks,
Am using hdp2.4 sandbox, once my job finished (eg. sqoop import job) not able to view the history logs through resource manager UI. If i click history am getting error as below. All the quick links in MR2 getting the same even if history server is started and running fine. http://sandbox.hortonworks.com:19888/ –> Job history server UI This site can’t be reached sandbox.hortonworks.com’s server DNS address could not be found. Looking yours support!
... View more