Member since: 07-13-2020
Posts: 58
Kudos Received: 2
Solutions: 10
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1216 | 09-04-2020 12:33 AM |
| | 7747 | 08-25-2020 12:39 AM |
| | 2417 | 08-24-2020 02:40 AM |
| | 2155 | 08-21-2020 01:06 AM |
| | 1151 | 08-20-2020 02:46 AM |
08-25-2020
04:01 AM
Can you provide more details on how you are trying to view the Avro file? It would also help to share the stack trace. I am assuming you are using Hive, since you mentioned a column. Hive is schema-on-read, so it only evaluates the data when it is read, not when it is written to HDFS. It is also worth checking the timezone used for the timestamp format. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
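As a rough illustration of the timezone point (the table and column names here are hypothetical, not from the original question), you can compare the raw stored value against an explicit timezone conversion in Hive:

```sql
-- Hypothetical table/column names; adjust to your schema.
-- from_utc_timestamp interprets the stored value as UTC and shifts it
-- to the given zone, which quickly surfaces timezone-related surprises.
SELECT ts_col,
       from_utc_timestamp(ts_col, 'America/New_York') AS ts_local
FROM my_avro_table
LIMIT 10;
```

If the two columns differ by exactly your local UTC offset, the writer and the reader are most likely disagreeing about the timezone.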
08-25-2020
03:58 AM
Please double-check the command: cd sandbox.repo /tmp. This will not work, so if you are getting "no such directory", that is expected. Please share the link you referred to while trying to resolve the problem. The error here states that it cannot connect to the repository. If you are using the public repository, make sure you can reach the URL and that no firewall is blocking your connection. If you are using a local repo, check the firewall and verify that the user you are using to connect to the repository has sufficient privileges. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
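A quick reachability sketch (the URL below is a placeholder, not the real repo; copy the actual baseurl from your .repo file):

```shell
# Placeholder URL - copy the real baseurl from /etc/yum.repos.d/<repo>.repo
curl -sI --connect-timeout 5 http://your.repo.host/path/ | head -1

# If curl times out, check whether a local firewall is interfering
systemctl status firewalld
```

An HTTP 200 (or a directory listing) means the repo is reachable and the problem is elsewhere, such as the repo definition or user privileges.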
08-25-2020
03:54 AM
Hi, you need to check whether the service started correctly and is listening on port 50070. If it is fine, then check for a firewall (I see it is a sandbox, so an external firewall may not be the cause, but verify that the machine's firewall is stopped and disabled). It is also worth checking that SELinux is disabled. If none of this works, try restarting the host machine on which the sandbox is running. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
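The checks above can be sketched as a few commands run on the sandbox host (sandbox only; disabling the firewall and SELinux like this is not something to do in production):

```shell
# Is anything listening on the NameNode UI port (50070 on the HDP sandbox)?
ss -ltn | grep ':50070'

# Stop and disable the local firewall (sandbox only)
systemctl stop firewalld && systemctl disable firewalld

# SELinux should report Disabled (or at least Permissive)
getenforce
```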
08-25-2020
03:50 AM
Hi, I am extremely sorry for my earlier comment. I thought it would work without trying it out first. I looked at it again, and it seems it is not possible to do this without a script.
08-25-2020
03:46 AM
Hi, you should be able to see the Avro content in NiFi. When you open the file, there is a "View as" option at the top left. It defaults to "Original"; change it to "Formatted" and you should be able to see the Avro correctly. Hope this helps.
08-25-2020
12:39 AM
Where are you viewing the data? Is it in NiFi? Avro is a binary format, so it is not readable in a plain editor. If you are viewing it in NiFi, check whether you embedded the schema. If the schema is embedded, then something in the flow is wrong; if not, I suppose it can look the way it does. A quick test is to dump the file in HDFS and create an external Hive table to check whether the data looks correct. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
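To sketch that quick test (the table name, columns, and HDFS path here are all hypothetical; substitute your own):

```sql
-- Point an external table at the dumped files. STORED AS AVRO uses the
-- AvroSerDe (Hive 0.14+); the column list must match your Avro schema.
CREATE EXTERNAL TABLE avro_check (
  id   INT,
  name STRING
)
STORED AS AVRO
LOCATION '/tmp/avro_check';

-- If the rows render correctly here, the files themselves are fine
-- and the problem is on the viewing side.
SELECT * FROM avro_check LIMIT 10;
```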
08-24-2020
02:42 AM
I tried a little to find a Java implementation, but I did not find any. But can't you use the approach I mentioned above? Use the Groovy script to add the process group (PG) id as an attribute, and then in your custom processor just read the attribute containing the PG id and do whatever your goal is.
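The custom-processor side of that can be as small as reading the attribute back. This is only a sketch of the relevant lines inside onTrigger, not a full processor, and the attribute name "pg.id" is an assumption (use whatever name your Groovy script sets):

```java
// Sketch only - assumes the usual NiFi onTrigger(context, session) signature.
// "pg.id" is the attribute name the upstream Groovy script is assumed to set.
FlowFile flowFile = session.get();
if (flowFile == null) {
    return;
}
String pgId = flowFile.getAttribute("pg.id");
// ... use pgId for whatever your goal is ...
session.transfer(flowFile, REL_SUCCESS);
```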
08-24-2020
02:40 AM
If your partitions are not big, then it is OK to create a single bucket. That is what I see here: with 10,000 partitions over 1 billion rows, each partition holds roughly 100,000 rows. Also, as long as the file size is greater than the HDFS block size, having multiple files does not degrade performance; too many small files below the block size is the real concern. You should use compaction, since it makes it easier for Hive to skip a partition altogether. As I said earlier, there is no single best solution. You need to understand how the ad hoc queries are fired and what the common use case is. Only after that can you take a specific path, and you might want to run a small POC to do some statistical analysis. Hope this helps.
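As a hypothetical sketch of what a partitioned table with a single bucket per partition looks like (all names and types here are made up for illustration):

```sql
-- Hypothetical schema: one bucket means each partition is written as a
-- single bucket file, avoiding a spray of small files per partition.
CREATE TABLE events (
  event_id BIGINT,
  payload  STRING
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (event_id) INTO 1 BUCKETS
STORED AS ORC;
```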
08-21-2020
01:20 AM
Hi, you can use the datediff function like this: WHERE datediff(clinic_discharge_dt, clinic_admit_dt) <= 30. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
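In context the filter looks like this (the table name is hypothetical; note that Hive's datediff takes the end date first and returns end minus start in days):

```sql
-- datediff(enddate, startdate) returns the number of days between them,
-- so this keeps visits where discharge is within 30 days of admission.
SELECT *
FROM clinic_visits
WHERE datediff(clinic_discharge_dt, clinic_admit_dt) <= 30;
```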
08-21-2020
01:06 AM
The error is here: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.DoubleWritable. This states that Hive is trying to cast Text to DoubleWritable, which is not possible. Are you sure the column is marked as STRING? Casting STRING to DOUBLE is always possible, so please double-check. Another solution is to drop and recreate the table with the correct data types; since it is an external table, there is no data loss anyway. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
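The drop-and-recreate route is safe for an external table because only the metadata is removed and the files in HDFS stay in place. A sketch, with made-up table, columns, and location:

```sql
-- Dropping an EXTERNAL table removes only the metastore entry,
-- not the underlying files.
DROP TABLE IF EXISTS my_table;

-- Recreate with the column declared as DOUBLE instead of STRING.
CREATE EXTERNAL TABLE my_table (
  id     INT,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/data/my_table';
```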