Member since: 07-13-2020
Posts: 58
Kudos Received: 2
Solutions: 10
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1216 | 09-04-2020 12:33 AM
 | 7760 | 08-25-2020 12:39 AM
 | 2425 | 08-24-2020 02:40 AM
 | 2159 | 08-21-2020 01:06 AM
 | 1151 | 08-20-2020 02:46 AM
08-24-2020
02:42 AM
I searched a little for a Java implementation but didn't find any. But can't you use the approach I mentioned above? Use the Groovy script to add the PG id as an attribute, and then in your custom processor just read the attribute containing the PG id and do whatever your goal is.
08-24-2020
02:40 AM
If your partitions are not big, say under a couple of million rows each (which is your case: 10,000 partitions over 1 billion rows works out to roughly 100,000 rows per partition), then it's fine to create a single bucket. Also, as long as each file is larger than the HDFS block size, having multiple files doesn't degrade performance; too many small files below the block size is the real concern. You should use compaction, since it makes it easier for Hive to skip a partition altogether. As I said earlier, there is no single best solution. You need to understand how the ad hoc queries are fired and what the common use case is. Only after that can you commit to a specific layout, and you might want to run a small POC to do some statistical analysis. Hope this helps.
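For illustration, a minimal Hive DDL sketch of the layout discussed above — the table, column, and partition names are hypothetical, and the compaction statement applies only if the table is transactional (ACID):

```sql
-- Hypothetical table: date-partitioned, a single bucket per partition.
CREATE TABLE events (
  event_id BIGINT,
  payload  STRING
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (event_id) INTO 1 BUCKETS
STORED AS ORC;

-- For ACID tables, major compaction merges the small delta files
-- of one partition into a single base, reducing the small-file count:
ALTER TABLE events PARTITION (event_date = '2020-08-24') COMPACT 'major';
```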
08-21-2020
01:20 AM
Hi. You can use the datediff function like this: WHERE datediff(clinic_discharge_dt, clinic_admit_dt) <= 30 Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
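For context, a full query sketch using that predicate — the table name is hypothetical, only the two column names come from the question:

```sql
-- Rows where the stay lasted 30 days or fewer.
-- datediff(enddate, startdate) returns the difference in days.
SELECT *
FROM clinic_visits   -- hypothetical table name
WHERE datediff(clinic_discharge_dt, clinic_admit_dt) <= 30;
```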
08-21-2020
01:06 AM
The error is here: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.DoubleWritable This states that you are trying to cast Text to Double, which is not possible. Are you sure that the column is marked as STRING? Casting STRING to DOUBLE is always possible, so please double-check. Another solution is to drop and recreate the table with the correct data types. Since it's an external table, there is no data loss anyway. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
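A sketch of the drop-and-recreate approach — table name, columns, delimiter, and location are hypothetical, since the original schema isn't shown:

```sql
-- Dropping an EXTERNAL table removes only the Hive metadata;
-- the underlying files stay in place at the LOCATION.
DROP TABLE IF EXISTS my_ext_table;       -- hypothetical name

CREATE EXTERNAL TABLE my_ext_table (
  id     STRING,
  amount DOUBLE                          -- the corrected column type
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/my_ext_table';           -- hypothetical HDFS path
```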
08-21-2020
12:56 AM
If you are using Kerberos for authentication, then when a job is submitted the user's permissions are evaluated first by Ranger; only after the authorization succeeds is the Kerberos ticket delegated to the hive user, and the hive user starts the execution. So, as long as the user submitting the job has a policy in Ranger, it should work as expected. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-20-2020
02:46 AM
You need to provide more information here. Is the "Updatedate" processor an UpdateAttribute processor? What does the InvokeScriptedProcessor do? Are you storing an attribute called "state", or are you using the processor's state? Assuming it's just an attribute on the flowfile, your attempt 2 should work. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-20-2020
02:22 AM
Assuming that you will create the table only once, it is very difficult, since you need to parse the columns and data types; the amount of effort is not worth a one-time thing. So I would suggest you create the table manually. If you are going to create the table dynamically every time, I would be interested in how you approach it, and if you have made some progress I would be happy to help. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-10-2020
01:02 AM
Can you provide the errors from the HDFS log files? It looks like a TLS handshake problem to me, but the logs would give more insight. You should double-check the keystore and truststore configuration (I am not sure of the exact Cloudera Manager settings) to verify they are set up correctly.
08-07-2020
09:31 PM
Thanks for your point, and if you have time, please read the solution that I found somewhere on the internet.
08-06-2020
07:49 AM
2 Kudos
I finally found the correct way to do this. I used Ambari to create a new configuration group that includes only the new hosts, and then added the extra disk paths to the dfs.datanode.data.dir parameter in that configuration group only. That integrates the extra disks on the new nodes into HDFS, while the older nodes are not impacted by the parameter change. Reference: https://docs.cloudera.com/HDPDocuments/Ambari-2.7.5.0/managing-and-monitoring-ambari/content/amb_managing_host_configuration_groups.html
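For illustration, the resulting hdfs-site property in the new configuration group might look like this — the mount paths below are hypothetical examples, not the actual cluster layout:

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- existing data dirs, plus the extra disks present on the new hosts only -->
  <value>/hadoop/hdfs/data,/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data</value>
</property>
```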