Member since: 07-13-2020
Posts: 58
Kudos Received: 2
Solutions: 10
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1216 | 09-04-2020 12:33 AM |
| | 7747 | 08-25-2020 12:39 AM |
| | 2417 | 08-24-2020 02:40 AM |
| | 2155 | 08-21-2020 01:06 AM |
| | 1151 | 08-20-2020 02:46 AM |
08-25-2020
04:01 AM
Can you provide more details on how you are trying to view the Avro file? It would also help to share the stack trace. I am assuming you are using Hive, since you mentioned a column. Hive is schema-on-read, so it only evaluates the data when it is read, not when it is written to HDFS. It is also worth checking the timezone used for the timestamp format. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
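As a rough illustration of the timezone point (the table and column names here are hypothetical, not from the original question), you can compare the raw stored value against an explicit timezone conversion in Hive:

```sql
-- Hypothetical table/column names; adjust to your schema.
-- from_utc_timestamp interprets the stored value as UTC and shifts it
-- to the given zone, which quickly surfaces timezone-related surprises.
SELECT ts_col,
       from_utc_timestamp(ts_col, 'America/New_York') AS ts_local
FROM my_avro_table
LIMIT 10;
```

If the two columns differ by exactly your local UTC offset, the writer and the reader are most likely disagreeing about the timezone.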
08-25-2020
03:58 AM
Please double-check the command: cd sandbox.repo /tmp. This will not work, so if you are getting "no such directory", that is expected. Please share the link you referred to while trying to resolve the problem. The error here states that it cannot connect to the repository. If you are using the public repository, make sure you can reach the URL and that no firewall is blocking your connection. If you are using a local repo, check the firewall and verify that the user you are using to connect to the repository has sufficient privileges. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
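A quick reachability sketch (the URL below is a placeholder, not the real repo; copy the actual baseurl from your .repo file):

```shell
# Placeholder URL - copy the real baseurl from /etc/yum.repos.d/<repo>.repo
curl -sI --connect-timeout 5 http://your.repo.host/path/ | head -1

# If curl times out, check whether a local firewall is interfering
systemctl status firewalld
```

An HTTP 200 (or a directory listing) means the repo is reachable and the problem is elsewhere, such as the repo definition or user privileges.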
08-25-2020
03:54 AM
Hi, you need to check whether the service started correctly and is listening on port 50070. If it is fine, then check for a firewall (I see it is a sandbox, so an external firewall may not be the cause, but verify that the machine's firewall is stopped and disabled). It is also worth checking that SELinux is disabled. If none of this works, try restarting the host machine on which the sandbox is running. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
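The checks above can be sketched as a few commands run on the sandbox host (sandbox only; disabling the firewall and SELinux like this is not something to do in production):

```shell
# Is anything listening on the NameNode UI port (50070 on the HDP sandbox)?
ss -ltn | grep ':50070'

# Stop and disable the local firewall (sandbox only)
systemctl stop firewalld && systemctl disable firewalld

# SELinux should report Disabled (or at least Permissive)
getenforce
```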
08-25-2020
03:50 AM
Hi, I am extremely sorry for my earlier comment. I thought it would work without trying it out first. I looked at it again, and it seems it is not possible to do this without a script.
08-25-2020
03:46 AM
Hi, you should be able to see the Avro content in NiFi. When you open the file, there is a "View as" option at the top left. It defaults to "Original"; change it to "Formatted" and you should be able to see the Avro correctly. Hope this helps.
08-25-2020
12:39 AM
Where are you viewing the data? Is it in NiFi? Avro is a binary format, so it is not readable in a plain editor. If you are viewing it in NiFi, check whether you embedded the schema. If the schema is embedded, then something in the flow is wrong; if not, I suppose it can look the way it does. A quick test is to dump the file in HDFS and create an external Hive table to check whether the data looks correct. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
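To sketch that quick test (the table name, columns, and HDFS path here are all hypothetical; substitute your own):

```sql
-- Point an external table at the dumped files. STORED AS AVRO uses the
-- AvroSerDe (Hive 0.14+); the column list must match your Avro schema.
CREATE EXTERNAL TABLE avro_check (
  id   INT,
  name STRING
)
STORED AS AVRO
LOCATION '/tmp/avro_check';

-- If the rows render correctly here, the files themselves are fine
-- and the problem is on the viewing side.
SELECT * FROM avro_check LIMIT 10;
```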
08-24-2020
02:42 AM
I tried a little to find a Java implementation, but I did not find any. But can't you use the approach I mentioned above? Use the Groovy script to add the process group (PG) id as an attribute, and then in your custom processor just read the attribute containing the PG id and do whatever your goal is.
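The custom-processor side of that can be as small as reading the attribute back. This is only a sketch of the relevant lines inside onTrigger, not a full processor, and the attribute name "pg.id" is an assumption (use whatever name your Groovy script sets):

```java
// Sketch only - assumes the usual NiFi onTrigger(context, session) signature.
// "pg.id" is the attribute name the upstream Groovy script is assumed to set.
FlowFile flowFile = session.get();
if (flowFile == null) {
    return;
}
String pgId = flowFile.getAttribute("pg.id");
// ... use pgId for whatever your goal is ...
session.transfer(flowFile, REL_SUCCESS);
```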
08-24-2020
02:40 AM
If your partitions are not big, then it is OK to create a single bucket. That is what I see here: with 10,000 partitions over 1 billion rows, each partition holds roughly 100,000 rows. Also, as long as the file size is greater than the HDFS block size, having multiple files does not degrade performance; too many small files below the block size is the real concern. You should use compaction, since it makes it easier for Hive to skip a partition altogether. As I said earlier, there is no single best solution. You need to understand how the ad hoc queries are fired and what the common use case is. Only after that can you take a specific path, and you might want to run a small POC to do some statistical analysis. Hope this helps.
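As a hypothetical sketch of what a partitioned table with a single bucket per partition looks like (all names and types here are made up for illustration):

```sql
-- Hypothetical schema: one bucket means each partition is written as a
-- single bucket file, avoiding a spray of small files per partition.
CREATE TABLE events (
  event_id BIGINT,
  payload  STRING
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (event_id) INTO 1 BUCKETS
STORED AS ORC;
```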
08-21-2020
01:20 AM
Hi, you can use the datediff function like this: WHERE datediff(clinic_discharge_dt, clinic_admit_dt) <= 30. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
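In context the filter looks like this (the table name is hypothetical; note that Hive's datediff takes the end date first and returns end minus start in days):

```sql
-- datediff(enddate, startdate) returns the number of days between them,
-- so this keeps visits where discharge is within 30 days of admission.
SELECT *
FROM clinic_visits
WHERE datediff(clinic_discharge_dt, clinic_admit_dt) <= 30;
```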
08-21-2020
01:06 AM
The error is here: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.DoubleWritable. This states that Hive is trying to cast Text to DoubleWritable, which is not possible. Are you sure the column is marked as STRING? Casting STRING to DOUBLE is always possible, so please double-check. Another solution is to drop and recreate the table with the correct data types; since it is an external table, there is no data loss anyway. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
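The drop-and-recreate route is safe for an external table because only the metadata is removed and the files in HDFS stay in place. A sketch, with made-up table, columns, and location:

```sql
-- Dropping an EXTERNAL table removes only the metastore entry,
-- not the underlying files.
DROP TABLE IF EXISTS my_table;

-- Recreate with the column declared as DOUBLE instead of STRING.
CREATE EXTERNAL TABLE my_table (
  id     INT,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/data/my_table';
```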