Member since
07-25-2018
174
Posts
29
Kudos Received
5
Solutions
My Accepted Solutions
Views | Posted
---|---
2541 | 03-19-2020 03:18 AM
1655 | 01-31-2020 01:08 AM
752 | 01-30-2020 05:45 AM
1252 | 06-01-2016 12:56 PM
1392 | 05-23-2016 08:46 AM
05-01-2020
07:25 AM
Let me try with a checkpoint. Thanks for your reply, @graghu
03-25-2020
11:18 PM
Yes, I am also facing the same issue. Is there any update on it, or a workaround?
03-19-2020
03:18 AM
Initially, my query was not converted into one line, due to which I was facing a syntax error in Beeline itself. I converted the complete big query into one line and it is working now.
03-18-2020
11:02 PM
@EricL Thanks for your reply. I am getting the below error in the HiveServer log: 10:20:02 START Executing current statement for: 'SCIP_Test_Env' [Hive] 10:20:03 FAILED [SELECT - 0 rows, 0.574 secs] [Code: 0, SQL State: 08S01] org.apache.hive.org.apache.thrift.transport.TTransportException: HTTP Response code: 500
03-18-2020
06:46 AM
Hi All, I have implemented a Hive query of almost 300 lines. The query template is as below (a rough sketch follows): INSERT INTO targetTable SELECT col1, col2, ..., colN (almost 300 lines in the SELECT, with lots of CASE WHEN statements) FROM sourceTable. My observation is that it works without any error in Apache Zeppelin, but when I try to execute it through Beeline it fails with a transport exception, response code 500. Another observation: when I remove a few columns from the SELECT, it works fine. I do not understand why the query is failing; I need to know the reason behind the failure. Thanks in advance.
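For illustration only, a minimal HiveQL sketch of the kind of template described above; the table and column names are placeholders, not the actual query:

```sql
-- Hypothetical shape of the statement: a wide INSERT ... SELECT with many CASE WHEN expressions.
INSERT INTO targetTable
SELECT
    col1,
    col2,
    CASE WHEN col3 = 'A'  THEN 1          ELSE 0    END AS col3_flag,
    CASE WHEN col4 IS NULL THEN 'unknown' ELSE col4 END AS col4_value
    -- ... roughly 300 such select expressions in the real query ...
FROM sourceTable;
```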
Labels:
- Apache Hive
02-13-2020
03:06 AM
Accidentally, I marked this answer as resolved. @rajkumar_singh I am getting the below output after executing the "hdfs groups <username>" command: <username>@<kerberos principal> : domain users dev_sudo. As I am not very aware of the cluster configuration, could you please help me understand the output of this command?
02-11-2020
08:26 PM
Hi Everyone, We have enabled Hive compaction on a transactional table in order to improve fetch query performance. Somehow it's not working at my end; I am facing the attached error, which I observed in "hivemetastore.log". Can somebody please let us know how this issue can be resolved?
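For context, a minimal sketch of how compaction is typically enabled and checked on a transactional table; the table name and columns below are placeholders, not our actual setup:

```sql
-- Compaction only applies to ACID (transactional) tables, which must be stored as ORC.
-- (Older Hive versions additionally require CLUSTERED BY ... INTO n BUCKETS.)
CREATE TABLE demo_txn (id INT, val STRING)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Request a major compaction explicitly instead of waiting for the automatic trigger.
ALTER TABLE demo_txn COMPACT 'major';

-- Show queued and running compactions and their state.
SHOW COMPACTIONS;
```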
Labels:
- Apache Hive
02-09-2020
07:48 AM
I am also not able to understand whether compaction is enabled on my table or not. Does anyone know how to filter the output of the SHOW COMPACTIONS query?
02-04-2020
09:18 AM
Hi all, the above solution fails in one scenario. Scenario: if multiple flowfiles are processed at a time and land in the NiFi queue that follows the update-query processor (i.e. the PutHiveQL processor that increments processed_file_cnt by one for every flowfile), then there is a chance of triggering the next flow multiple times, which is wrong, because we first select processed_file_cnt and only then compare processed_file_cnt with input_file_cnt.
01-31-2020
08:49 PM
Hi, my assumption was wrong; the PutSQL processor does execute the update query per flowfile.
01-31-2020
01:08 AM
1 Kudo
I have resolved this problem statement with the below approach (a SQL sketch of the table and statements follows):
1. Count the number of input files (fragments.count gives me the result).
2. Prepare a transactional table with the schema (primary_key_id, processed_file_cnt).
3. Add an initial record at the beginning of the NiFi pipeline using an INSERT INTO command, e.g. INSERT INTO <tableName> VALUES (23, 0).
4. Keep updating processed_file_cnt by one for each batch using an update query.
5. Select the processed_file_cnt column and compare it with input_file_count.
6. If they match, kick off the next flow; otherwise route the Unmatched relationship back to the same RouteOnAttribute processor.
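A minimal sketch of steps 2-5, assuming a Hive ACID table; the table name, columns, and key value are placeholders, not the production definitions:

```sql
-- Step 2: counter table keyed by a pipeline run id (placeholder schema).
CREATE TABLE file_progress (
    primary_key_id     INT,
    processed_file_cnt INT
)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Step 3: seed one record at the start of the pipeline run.
INSERT INTO file_progress VALUES (23, 0);

-- Step 4: executed once per processed batch (e.g. from a PutHiveQL processor).
UPDATE file_progress
SET processed_file_cnt = processed_file_cnt + 1
WHERE primary_key_id = 23;

-- Step 5: read the counter back and compare it with the expected input file count.
SELECT processed_file_cnt FROM file_progress WHERE primary_key_id = 23;
```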
01-30-2020
05:45 AM
Thanks for reaching out. It was my mistake; I was connecting the wrong output port to pg2_in. It's resolved now.
01-29-2020
09:28 AM
Hi All,
I have used three process groups in NiFi (pg1, pg2, pg3). There is an output port called pg1_out coming out of pg1. This port is connected to the pg2_in and pg3_in input ports.
pg1_out------>pg2_in
|
|
|
pg3_in
Problem: when I trigger these process groups, the flowfiles are passed only to pg3_in and not to pg2_in. Ideally, the flowfiles should be passed to both input ports.
I do not understand why I am seeing this behaviour in NiFi.
Has anyone faced this problem before? Please share your thoughts on it. I would really appreciate your help.
Labels:
- Apache NiFi
01-21-2020
09:52 AM
Hi, thanks for your reply. Yes, we can use the Wait and Notify processors, but how would I ensure that the previous layer has completed or failed? What is that criterion or condition? Let me explain the current NiFi flow: we have 3 layers, and the in/out for each layer always requires 4 flowfiles (the number of flowfiles depends on the input parameters). How can I integrate the Wait/Notify pattern keeping this in mind? Do you have any NiFi example where it's already implemented? Could you please share the NiFi template XML with me?
01-20-2020
08:42 PM
Hi @DennisJaheruddi Thank you for writing to me. There are some places in the NiFi flow where I need to execute a SQL INSERT for each flowfile, and at some places I need to execute the SQL INSERT only once, irrespective of the number of flowfiles. Could you please point me in the right direction for the same?
01-20-2020
08:33 PM
Hi @DennisJaheruddi Thanks for your reply. Don't you think that if we use the Wait and Notify processors, the pipeline will proceed to the next execution once the Wait processor's Expiration Duration value expires? A situation may occur where execution of the previous layer is not complete, but the NiFi flow gets triggered due to the expiration of the above attribute. That seems like wrong behavior to me.
01-19-2020
05:36 AM
Hi All,
I am trying to execute an update query through the PutSQL processor. I want to update the processed file count column of one of the transactional tables in MySQL.
The issue is that whenever multiple flowfiles come in as input to the PutSQL processor at the same time, the processed file count gets incremented and updated only once. Ideally, the behaviour should be to increment the counter one by one for each flowfile and then update the processed file count column in the MySQL table.
NiFi flow:
Fetch the processed file count using a select query -> UpdateAttribute: processed file count + 1 -> PutSQL: update query (a rough SQL sketch follows below).
Thanks in advance.
Please share your response on it.
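For illustration only, the kind of statements this flow would run per flowfile, against a hypothetical MySQL table (names and values are placeholders); one possible reading of the symptom above is that the increment is computed in the flow rather than in the database:

```sql
-- Step 1: read the current counter (the "fetch processed file count" step).
SELECT processed_file_cnt FROM file_progress WHERE primary_key_id = 23;

-- Step 2: UpdateAttribute adds 1 to the fetched value, and PutSQL writes the absolute
-- result back. If several flowfiles read the same starting value, they all write the
-- same incremented value, so the counter effectively moves only once.
UPDATE file_progress SET processed_file_cnt = 5 WHERE primary_key_id = 23;

-- A relative increment computed by the database avoids that race:
UPDATE file_progress
SET processed_file_cnt = processed_file_cnt + 1
WHERE primary_key_id = 23;
```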
- Tags:
- apache nifi
Labels:
- Apache NiFi
01-16-2020
09:48 AM
Hello all,
Problem statement: We have 3 layers in our project, and for each layer we have a NiFi flow. Currently, the NiFi flow for each layer is executed one after another (sequentially).
Execution flow: layer1 -> layer2 -> layer3 -> and so on.
The trigger point, or input parameter, to the flow is simply the number of source files, which are then processed in every layer one by one.
I am looking for a solution where the NiFi flow for the next layer does not get triggered until the previous layer has completed for all files.
Only if the current layer succeeded for all the files should the next one kick off.
Please share your thoughts on the same. I would really appreciate your response.
Thank you in advance.
- Tags:
- apache nifi
- NiFi
Labels:
- Apache NiFi
06-08-2019
09:27 AM
Hi All, I want to create a custom HashMap accumulator in Spark for one of my use cases. I have already referred to and implemented an accumulator as per the code given at the link below, but I have not found an end-to-end example for it. I am trying to do it in Java. https://www.brosinski.com/post/extending-spark-accumulators/ Problem: I want to know how I can increment the value for each key in the same accumulator within foreach().
- Tags:
- Data Processing
- Spark
Labels:
- Apache Spark
02-03-2017
09:34 AM
Hi, our team is working on the Hadoop framework/technologies with Hive, Ambari, and Ranger, and we are planning to create a dashboard that provides the following information:
- Execution time of a Hive query.
- Size of the data generated by the created table.
- Frequency of use of each Hive schema, table, and each column of the respective table.
- User / application name or ID firing any query.
- Resource usage of each application / user.
For now, we are trying to use the Hive Metastore and Ranger audit logs to access the above-mentioned information. Is there any better way to fetch this information? Kindly let me know if I need to provide any more information.
- Tags:
- Data Processing
- Upgrade to HDP 2.5.3 : ConcurrentModificationException When Executing Insert Overwrite : Hive
01-07-2017
03:22 PM
Thank you Rguruvannagari. This solution really worked for me.
01-07-2017
11:00 AM
Hi, I have installed the below Hadoop services and all services are running fine (services.png). The problem is that when I execute the below query from the Hive CLI, I get the following exception in the terminal. Query: select count(*) from mytables; Exception (short description): Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://<nanodeHostIp>:8020/hdp/apps/2.2.9.0-3393/mapreduce/mapreduce.tar.gz)'. Please find the attached exception file, exception.txt. I have also checked the path "/hdp" on HDFS, which is not completely present on the file system. So my question is: as I have done a fresh Hadoop installation through Apache Ambari, why did Ambari not create the files or directories under the "/hdp" directory automatically? What is the solution to the above problem? Thanks in advance.
Labels:
- Apache Hadoop
01-03-2017
12:05 PM
Hi Ayub, here is the link for the same question above: https://community.hortonworks.com/questions/75818/issue-regarding-apache-atlas-rest-api-to-create-hi.html
01-03-2017
10:53 AM
Hi Ayub, we have two tables, i.e. table4 and table5, with columns Id, Name, Age. We inserted the entity metadata and lineage metadata of both tables into Atlas and were able to see the schema and lineage graph in Atlas. After that, I deleted the entity metadata of table5 and reinserted it. Next, I inserted the same lineage metadata (the earlier lineage JSON metadata) of both tables into Atlas; however, I am not able to see the lineage graph of the two tables and am getting the below response message from the Atlas server: {"requestId":"qtp662559856-30620 - 26dbb640-9629-4c29-b209-32331e52962e","entities":{}} Please find the lineage JSON metadata attached (lineagejson.txt) and let me know the mistake I have made. So, after deleting the hive_table entity and reinserting the same metadata (i.e. the hive_table entity JSON data), we are unable to see the lineage in Atlas. What could be the reason behind this? What mistake am I making in the lineage JSON the second time, because of which I am not getting the lineage? Do I need to change the value of the process id or process name in the JSON?
01-03-2017
07:01 AM
Hi Ayub, the above issue is solved. Actually, there was a mistake in the JSON. Whenever we have multiple columns in a table, we must provide a different random GUID (a long number, not the same for all columns), even though it is only a negative number; only then can Apache Atlas distinguish the column names, otherwise we get the same name for all columns in the Apache Atlas UI. To make this work, I just provided a different random id for each column. Please find attached the correct JSON file:
01-03-2017
05:58 AM
Thank you Ayub,
it's working fine for me.
I have one query here. I have one table, table5, which has columns id, name, and age. After inserting the table5 metadata into Atlas, I am getting only the repeated column, i.e. name, and am not getting metadata for the id and age columns. Please find the table5 JSON below and let me know if there is any mistake in it. Please also find attached the Atlas image; I am getting output as shown in the image (atlas-snapshot.png). [
{
"traits":{
},
"traitNames":[
],
"values":{
"ownerType":2,
"owner":"root",
"qualifiedName":"default@Sandbox",
"clusterName":"Sandbox",
"name":"default",
"description":"emr hive database",
"location":"hdfs:\/\/sandbox.hortonworks.com:8020\/apps\/hive\/\/warehouse",
"parameters":{
}
},
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"typeName":"hive_db",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_db",
"id":"-11893021824425525",
"state":"ACTIVE",
"version":0
}
},
{
"traits":{
},
"traitNames":[
],
"values":{
"owner":"root",
"temporary":false,
"lastAccessTime":"2017-01-03T11:02:53.000Z",
"qualifiedName":"default.table5@Sandbox",
"columns":[
{
"traits":{
},
"traitNames":[
],
"values":{
"owner":"root",
"qualifiedName":"default.table5.name@Sandbox",
"name":"name",
"type":"string",
"table":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_table",
"id":"-11893021824425524",
"state":"ACTIVE",
"version":0
}
},
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"typeName":"hive_column",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_column",
"id":"-11893021824425522",
"state":"ACTIVE",
"version":0
}
},
{
"traits":{
},
"traitNames":[
],
"values":{
"owner":"root",
"qualifiedName":"default.table5.id@Sandbox",
"name":"id",
"type":"int",
"table":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_table",
"id":"-11893021824425524",
"state":"ACTIVE",
"version":0
}
},
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"typeName":"hive_column",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_column",
"id":"-11893021824425522",
"state":"ACTIVE",
"version":0
}
},
{
"traits":{
},
"traitNames":[
],
"values":{
"owner":"root",
"qualifiedName":"default.table5.age@Sandbox",
"name":"age",
"type":"int",
"table":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_table",
"id":"-11893021824425524",
"state":"ACTIVE",
"version":0
}
},
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"typeName":"hive_column",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_column",
"id":"-11893021824425522",
"state":"ACTIVE",
"version":0
}
}
],
"tableType":"MANAGED_TABLE",
"sd":{
"traits":{
},
"traitNames":[
],
"values":{
"qualifiedName":"default.table5@Sandbox_storage",
"storedAsSubDirectories":false,
"location":"hdfs:\/\/sandbox.hortonworks.com:8020\/apps\/hive\/warehouse\/table5",
"compressed":false,
"inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
"outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"parameters":{
},
"serdeInfo":{
"values":{
"serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"parameters":{
"serialization.format":"1"
}
},
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
"typeName":"hive_serde"
},
"table":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_table",
"id":"-11893021824425524",
"state":"ACTIVE",
"version":0
},
"numBuckets":-1
},
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"typeName":"hive_storagedesc",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_storagedesc",
"id":"-11893021824425523",
"state":"ACTIVE",
"version":0
}
},
"createTime":"2017-01-03T11:02:53.000Z",
"name":"table5",
"partitionKeys":[
],
"parameters":{
"totalSize":"0",
"rawDataSize":"0",
"numRows":"0",
"COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
"numFiles":"0",
"transient_lastDdlTime":"1482917693"
},
"db":{
"traits":{
},
"traitNames":[
],
"values":{
"ownerType":2,
"owner":"root",
"qualifiedName":"default@Sandbox",
"clusterName":"Sandbox",
"name":"default",
"description":"emr hive database",
"location":"hdfs:\/\/sandbox.hortonworks.com:8020\/apps\/hive\/\/warehouse",
"parameters":{
}
},
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"typeName":"hive_db",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_db",
"id":"-11893021824425525",
"state":"ACTIVE",
"version":0
}
},
"retention":0
},
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"typeName":"hive_table",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_table",
"id":"-11893021824425524",
"state":"ACTIVE",
"version":0
}
}
]
12-29-2016
12:27 PM
Thanks, good to see you again Ayub. Could you please post what changes you made in the above JSON? Did you change a GUID somewhere to link to the dataset, or something else?
12-29-2016
11:38 AM
Hi Guys, I am able to create lineage (i.e. a hive_process) between two datasets in Apache Atlas; I referred to the link below to complete this task. Link: https://community.hortonworks.com/questions/74875/how-to-create-hive-table-entity-in-apache-atlas-us.html#comment-75132 I am able to set lineage between table1 and table2 successfully, but now my requirement is as follows: consider that I have already created a Hive table using a Hive query and its metadata is also present in Atlas, and I want to link or create lineage between this already created table and the one I am going to create using the REST API. To do this, what changes do I need to make in the JSON file we are using to create the hive_process? Which property did you set in the JSON file that lets us link table1 and table2?
Labels:
- Apache Atlas
12-29-2016
11:36 AM
Hi Ayub, I am able to set lineage between table1 and table2 successfully, but now my requirement is as follows: consider that I have already created a Hive table using a Hive query and its metadata is also present in Atlas, and I want to link or create lineage between this already created table and the one I am going to create using the REST API. To do this, what changes do I need to make in the JSON file we are using to create the hive_process? Which property did you set in the JSON file that lets us link table1 and table2?
12-29-2016
11:16 AM
Hi Ayub, we have created two dataset entities and also set the lineage between them. Consider that I have already created a Hive table (i.e. patient_raw_info) and its metadata is also present in Atlas, and now I want to create lineage between the already existing dataset (i.e. patient_raw_info) and the one I am going to create using your REST API (i.e. patient_validated_dataset). So my question is: how can I create a hive_process between the already existing dataset and the other one? What changes do I need to make in the JSON file we are using to create the hive_process (i.e. the lineage)? I can create the third table (i.e. the Hive entity) using the same JSON file, that is fine, but what about the JSON data for the lineage? How can I link them: patient_raw_info ---> patient_validated_dataset?