Member since
12-18-2018
10
Posts
1
Kudos Received
0
Solutions
01-18-2019
05:32 AM
@Shu Hi Shu, Thanks for reply !! No its actually not possible for me to store this JSON in separate hive table using json serde. Actually there are 360 different type of events with underlying schema changes and I need to process them dynamically on fly. So what I have done is stored the datamap(which is one of the key in entire json event) into a hive column table and then I am using get_json_object to fetch the json elements as and when required. But the datamap(stored as a string column) itself has nested complex structure so I want to unnest that.
... View more
01-16-2019
03:14 PM
1 Kudo
Hi,
I have one column in hive table wherein I have stored entire json data map as string. I am using get_json_object to fetch each element of json. However I have one element which is array of structs. I need to explode that array of structs.
Event Sample:
{"evtDataMap":{"ucmEvt":{"rscDrvdStateEntMap":[{"prov_Id":"OCP","acct_Id":"ACCOUNT_0","chanlTypeId":"Chat","derivedAvlFlg":"TRUE","activeWrkCnt":"0","avlState":"READY","workLimit":"2"},{"providerId":"OCP Email","accountId":"ACCOUNT_0","channelTypeId":"Email","derivedAvailableFlag":"TRUE","activeWorkCount":"0","availabilityState":"READY","workLimit":"2"},{"providerId":"OCP ShortMessageService","accountId":"ACCOUNT_0","channelTypeId":"ShortMessageService","derivedAvailableFlag":"TRUE","activeWorkCount":"0","availabilityState":"READY","workLimit":"2"}]}}}
Entire evtDataMap is stored in a hive column and I want the output like
prov_id
acct_Id
chanlTypeId
derivedAvlFlg
activeWrkCnt
avlState
workLimit
OCP
ACCOUNT_0
Chat
TRUE
0
READY
2
OCP Email
ACCOUNT_0
Email
TRUE
0
READY
2
OCP ShortMessageService
ACCOUNT_0
hortMessageServic
TRUE
0
READY
2
Basically I want to flatten the array of structs. I am using query like this with temp as (
select evName,get_json_object(evtDataMap,'$.ucmEvt.rscDrvdStateEntMap') as mapp from avaya.jmsrec_temp where evtName ='USER')
Select evtName, a.prov_id,a.acct_Id,a.chanlTypeId ,a.derivedAvlFlg,a.activeWrkCnt,a.avlState,a.workLimit from temp lateral view inline(temp.mapp) a as prov_id,acct_Id,chanlTypeId ,derivedAvlFlg,activeWrkCnt,avlState,workLimit eventName is a hive table column. I am getting an error as "FAILED: UDFArgumentException Top level object must be an array but was string" Tried array over get_json_object but still I am getting an error array(get_json_object(eventDataMap,'$.ucmEvent.user.resourceDerivedStateEntryMap')) as mapp Error FAILED: UDFArgumentException The sub element must be struct, but was string I am stuck with this issue since 3 days, any help is much much appreciated..
... View more
Labels:
- Labels:
-
Apache Hive
12-21-2018
02:30 PM
@jbarnett Thanks for your response !! I have already implemented json serde over hive tables where the underlying json schema was fixed and not changing in other scenario And TRANSFORM function helps in applying some transformation or logic over fixed set of fields in SELECT statement of hive. Definitely we can write python code and call some fields in TRANSFORM function but that fieldset needs to be fixed. Correct me if I am wrong ? I have pasted some sample events below. Consider 250+ type of these events where in each event has different underlying schema(different nested structures, no of fields, different data type formats ) and any event type can come on fly. The event file land in raw zone of my data lake which consists of 1000 events(batch defined) and then in processed zone, I want to process these events and store in some structured way. So that's the solution I am searching for. May be I can categorize the problem in below steps: 1. How shall I store these events in processed zone ? 2. How shall I process the events in some structured form, considering the dynamic nature of underlying json schema of each event ? Maybe there can more steps but as of now I can think of these two only. I am open with an tool/utility/technology to process these events. Sample Events: **************************************************************************************************************************************************** **************************************************************************************************************************************************** {"eventFamily":"CCELITE_RESD","eventName":"TIME22","eventVersion":"8.0","eventTimestamp":"1158511980000","snapinTimestamp":"1184729563007","sourceType":"CCElite","sourceId":"CM70","sourceVersion":"7.0","snapinId":"analytCollectorElite","snapinVersion":"3.2.0.0.1097","sequenceNumber":337,"eventDataMap":{"TZOffsetHours":"6","Year":"16","Hour":"13","CMTime":"1","TimeTag":"1158511980000","Minute":"53","DaylightSavings":"1","TZOffset":"0","TZOffsetMinutes":"0","DayOfYear":"81"},"collectedTime":1184729563007,"publishedTime":1184729563015} {"eventFamily":"CCELITE_RESD","eventName":"24ACW","eventVersion":"9.0","eventTimestamp":"1158590110000","snapinTimestamp":"1184729573010","sourceType":"CCElite","sourceId":"CM70","sourceVersion":"7.0","snapinId":"analytCollectorElite","snapinVersion":"3.2.0.1097","sequenceNumber":348,"eventDataMap":{"Hold":"0","Reconnect":"0","CMTime":"1","CMAgent":"5550181","UCIDPresent":"0","Direction":"0","WaitObserve":"0","TimeTag":"1158590110000","UCIDTimestamp":"0","Position":"5550181","UCIDCallSeqNo":"0","MeasuredInternal":"0","ExternalCall":"0","KeyboardDialed":"0","UCIDNetworkID":"0"},"collectedTime":1184729573010,"publishedTime":1184729573016} {"eventFamily":"UCA_MIN_EVENTS_PR","eventName":"USERP","eventCategory":"PUMP_UPSI","eventAction":"PUMP_UPSI","eventVersion":null,"sourceType":"UCA","sourceId":"UCA","sourceVersion":"EDD.wf.auras.com","snapinId":"AdminDataCollector","snapinVersion":"3.3.0.701301","sequenceNumber":null,"eventTimestamp":"1532080250298","matchEventField":null,"siteId":null,"siteName":null,"eventDataMap":{"ucaEvent":{"subscriptionId":"CeanaAdcQueue_10_134_44_196","user":{"userId":"analyt","tenant":null,"userName":"analyt","firstName":"analyt","lastName":"analyt","displayName":"analyt","attributes":{"Language":["English","French"],"Channel":["Voice"],"Service":["CRTAccnt"],"Location":["Inhouse"]},"role":"SUPERVISOR","supervisorId":null,"supervisorFirstName":null,"supervisorLastName":null,"supervisorDisplayName":null,"title":null}}}} {"eventFamily":"UCA_MIN_EVENTS_PR","eventName":"GROUP_MEMBER","eventCategory":"PUMP_UPSI","eventAction":"PUMP_UPSI","eventVersion":null,"sourceType":"UCA","sourceId":"UCA","sourceVersion":"rre4481.sazab.com","snapinId":"AdminDataCollector","snapinVersion":"3.2.1.61101","sequenceNumber":null,"eventTimestamp":null,"matchEventField":null,"siteId":"SITEIDXYZ","siteName":"SITENAMEDUB","eventDataMap":{"ucaEvent":{"subscriptionId":"CeanaAdcQueue_10_134_44_93","group":{"tenant":null,"groupId":"10020","name":"groupROI","type":"USER"}}}} {"eventFamily":"WORK_EVENTS_PR","eventName":"CONVERSATION_INTERACTION","eventVersion":"1","eventCategory":"REALTIME","eventAction":"WRITE","sourceType":"UCM","sourceId":"WFEDP42118V","sourceVersion":"3.4","snapinId":"UIDataCollector","snapinVersion":"3.4","eventTimestamp":"1532080290098","eventDataMap":{"ucmEvent":{"subscriptionId":"EANA_LIVE","notificationType":"CONVERSATION_INTERACTION","delta":[],"WORK":{"Id":"WORK_ID_0","CONTACT":{"Id":"CONTACT_ID_A_0","conversationInteraction":{"Id":"RESOURCE_ID_B_0","providerId":"CM3456","resourceId":"RESOURCE_ID_B_0","channelTypeId":"Voice","accountId":"ACCOUNT_0","conversationId":"CONTACT_ID_A_0","activityCode":"null","transferredServiceId":"null","routePointId":"8344568","activityCodeEffectiveDT":"null","previousActivityCode":"null","previousActivityCodeEffectiveDT":"null","afterContactWorkEnabled":"null","offeredService":{"serviceName":"Ceana Service","attributes":{"Language":["French"],"Channel":["Voice"],"Service":["CRTAccnt"],"Location":["Inhouse"]},"priority":5,"serviceID":"Channel.Voice|Language.French|Location.Inhouse|Service.CRTAccnt"},"requestedService":null,"interactionType":"POCALLED","interactionTypeDT":"1792080290098","previousInteractionType":"null","previousInteractionTypeDT":"null","interactionTypeEndDT":"null","version":"1","createDT":"1532080290098","state":"ALERTNG","stateEffectiveDT":"1534580290098","previousState":"DEF","previousStateEffectiveDT":"null","stateReason":"DEF","stateReasonEffectiveDT":"1534580290098","previousStateReason":"DEF","previousStateReasonEffectiveDT":"null","isRoutedCall":"true","isExternal":"false"}}}}}} Thanks Anish
... View more
12-19-2018
01:34 PM
We have a file in hdfs which contains multiple json events all with different schema and we want to batch process that file. The underlying schema of those events is different, means one event can 10 fields, other event can have 8 fields with nested structure, another event can have 5 nested fields with all underlying individual fields. I mean to say schema is not fixed. What is the best strategy to process events in such scenario. We are open with any tool like Spark, Hive etc for batch processing of events. The end result is to give the structured format to these events so that we can analyse these by combining with other hive tables/datasets.
... View more
Labels:
- Labels:
-
Apache Hive
-
Apache Spark
12-11-2018
09:57 AM
Anyone who has encountered a similar problem. I noticed that Nifi is also not able to read data also(through GetHDFS). The problem which I am anticipating is Nifi Client is able to connect to NameNode but its not able to connect to Data Node since NameNode sends it a invalid private address (I am using using Hortonworks Sandbox) and that's why NiFi client is unable to read/write data from HDP
... View more
12-07-2018
01:15 PM
Hi, I am struggling with an error. I am trying to write a file on HDFS from NiFi but its failing with error below. I am using Hortonworks Sandbox(HDP) and NiFi(HDF) both hosted on ESXi server. Caused by: org.apache.hadoop.ipc.RemoteException: File /user/nifi/.temp1.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation. Checklist: 1. Lot of space available in DFS, means HDFS health is in good state. I am able to execute hadoop fs -put <filename> /tmp and write considerable file of good size onto HDFS. 2. Property set to true <property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property> 3. hdfs-site.xml,core-site.xml copied to HDF 4. HDP hostname entry copied to /etc/hosts on NiFi Machine
... View more
Labels:
- Labels:
-
Apache Hadoop
-
Apache NiFi