Member since: 07-13-2020
Posts: 58
Kudos Received: 2
Solutions: 10
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1216 | 09-04-2020 12:33 AM
 | 7760 | 08-25-2020 12:39 AM
 | 2425 | 08-24-2020 02:40 AM
 | 2159 | 08-21-2020 01:06 AM
 | 1151 | 08-20-2020 02:46 AM
08-24-2020
02:42 AM
I searched a little for a Java implementation but didn't find any. But can't you use the approach I mentioned above? Use the Groovy script to add the PG id as an attribute, and then in your custom processor just read the attribute containing the PG id and do whatever your goal is.
08-24-2020
02:40 AM
If your partitions are not big, say under a couple of million rows each (which is your case: 10,000 partitions over 1 billion rows works out to roughly 100,000 rows per partition), then it's fine to create a single bucket. Also, as long as each file is larger than the HDFS block size, having multiple files doesn't degrade performance; too many small files below the block size is the real concern. You should use compaction, since it makes it easier for Hive to skip a partition altogether. As I said earlier, there is no single best solution. You need to understand how the ad hoc queries are fired and what the common use case is. Only after that can you commit to a specific layout, and you might want to run a small POC to do some statistical analysis. Hope this helps.
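For illustration, a minimal Hive DDL sketch of the layout discussed above — the table, column, and partition names are hypothetical, and the compaction statement applies only if the table is transactional (ACID):

```sql
-- Hypothetical table: date-partitioned, a single bucket per partition.
CREATE TABLE events (
  event_id BIGINT,
  payload  STRING
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (event_id) INTO 1 BUCKETS
STORED AS ORC;

-- For ACID tables, major compaction merges the small delta files
-- of one partition into a single base, reducing the small-file count:
ALTER TABLE events PARTITION (event_date = '2020-08-24') COMPACT 'major';
```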
08-21-2020
01:20 AM
Hi. You can use the datediff function like this: WHERE datediff(clinic_discharge_dt, clinic_admit_dt) <= 30 Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
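For context, a full query sketch using that predicate — the table name is hypothetical, only the two column names come from the question:

```sql
-- Rows where the stay lasted 30 days or fewer.
-- datediff(enddate, startdate) returns the difference in days.
SELECT *
FROM clinic_visits   -- hypothetical table name
WHERE datediff(clinic_discharge_dt, clinic_admit_dt) <= 30;
```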
08-21-2020
01:06 AM
The error is here: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.DoubleWritable This states that you are trying to cast Text to Double, which is not possible. Are you sure that the column is marked as STRING? Casting STRING to DOUBLE is always possible, so please double-check. Another solution is to drop and recreate the table with the correct data types. Since it's an external table, there is no data loss anyway. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
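A sketch of the drop-and-recreate approach — table name, columns, delimiter, and location are hypothetical, since the original schema isn't shown:

```sql
-- Dropping an EXTERNAL table removes only the Hive metadata;
-- the underlying files stay in place at the LOCATION.
DROP TABLE IF EXISTS my_ext_table;       -- hypothetical name

CREATE EXTERNAL TABLE my_ext_table (
  id     STRING,
  amount DOUBLE                          -- the corrected column type
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/my_ext_table';           -- hypothetical HDFS path
```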
08-21-2020
12:56 AM
If you are using Kerberos for authentication, then when a job is submitted the user's permissions are evaluated first by Ranger; only after the authorization succeeds is the Kerberos ticket delegated to the hive user, and the hive user starts the execution. So, as long as the user submitting the job has a policy in Ranger, it should work as expected. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-20-2020
02:46 AM
You need to provide more information here. Is the "Updatedate" processor an UpdateAttribute processor? What does the InvokeScriptedProcessor do? Are you storing an attribute called "state", or are you using the processor's state? Assuming it's just an attribute on the flowfile, your attempt 2 should work. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-20-2020
02:22 AM
Assuming that you will create the table only once, it is very difficult, since you need to parse the columns and data types; the amount of effort is not worth a one-time thing. So I would suggest you create the table manually. If you are going to create the table dynamically every time, I would be interested in how you approach it, and if you have made some progress I would be happy to help. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-10-2020
01:02 AM
Can you provide the errors from the HDFS log files? It looks like a TLS handshake problem to me, but the logs would give more insight. You should double-check the keystore and truststore configuration (I am not sure of the exact Cloudera Manager settings) to verify they are set up correctly.
08-07-2020
09:31 PM
Thanks for your point, and if you have time, please read the solution that I found somewhere on the internet.
08-06-2020
07:49 AM
2 Kudos
I finally found the correct way to do this. I used Ambari to create a new configuration group that includes only the new hosts, and then added the extra disk paths to the dfs.datanode.data.dir parameter in that configuration group only. That integrates the extra disks on the new nodes into HDFS, while the older nodes are not impacted by the parameter change. Reference: https://docs.cloudera.com/HDPDocuments/Ambari-2.7.5.0/managing-and-monitoring-ambari/content/amb_managing_host_configuration_groups.html
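For illustration, the resulting hdfs-site property in the new configuration group might look like this — the mount paths below are hypothetical examples, not the actual cluster layout:

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- existing data dirs, plus the extra disks present on the new hosts only -->
  <value>/hadoop/hdfs/data,/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data</value>
</property>
```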