Member since 07-13-2020
Posts: 58
Kudos received: 2
Solutions: 10
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 1943 | 09-04-2020 12:33 AM |
|  | 10918 | 08-25-2020 12:39 AM |
|  | 4083 | 08-24-2020 02:40 AM |
|  | 2953 | 08-21-2020 01:06 AM |
|  | 1805 | 08-20-2020 02:46 AM |
08-25-2020
03:46 AM
Hi, you should be able to view the Avro content in NiFi. When you open the content, there is a "View as" option at the top left. It defaults to "original"; change it to "formatted" and the Avro data should display correctly. Hope this helps.
08-25-2020
12:39 AM
Where are you viewing the data? Is it in NiFi? Avro is a binary format, so it is not readable in a plain text editor. If you are viewing it in NiFi, check whether you embedded the schema. If the schema is embedded, then something in the flow is wrong. If not, I suppose it can look the way it does. A quick test is to write the data to HDFS and create an external Hive table over it to check whether the data looks correct. Hope this helps. If this comment helps you find a solution or move forward, please accept it as a solution for other community members.
08-24-2020
02:40 AM
If your partitions are not big, say up to a couple of million rows each (with 10,000 partitions over 1 billion rows, you have roughly 100,000 rows per partition), then it is OK to create a single bucket. Also, as long as each file is larger than the HDFS block size, having multiple files does not degrade performance; the concern is too many small files below the block size. You should use compaction, since it makes it easier for Hive to skip a partition altogether. As I said earlier, there is no single best solution. You need to understand how the ad hoc queries are fired and what the common use case is. Only after that can you take a specific path, and you might need to run a small POC to do some statistical analysis. Hope this helps.
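The sizing reasoning above can be turned into quick arithmetic. This Python sketch is a rule of thumb, not a Hive API: it assumes an average row size you would have to measure yourself and a 128 MiB HDFS block size (a common default; check `dfs.blocksize` on your cluster), and it suggests about one bucket per block's worth of data so that bucket files do not fall below block size.

```python
MIB = 1024 * 1024


def rows_per_partition(total_rows: int, partitions: int) -> int:
    """Average number of rows that land in each partition."""
    return total_rows // partitions


def suggested_buckets(rows: int, avg_row_bytes: int,
                      block_size: int = 128 * MIB) -> int:
    """Rule of thumb (an assumption, not a Hive rule): keep each bucket file
    at or above one HDFS block, and never go below one bucket."""
    partition_bytes = rows * avg_row_bytes
    return max(1, partition_bytes // block_size)


rows = rows_per_partition(1_000_000_000, 10_000)
# 200 bytes/row is a hypothetical average; measure your own data.
buckets = suggested_buckets(rows, 200)
print(rows, buckets)
```

With 100,000 rows per partition and a hypothetical 200-byte row, each partition holds about 19 MiB, well under one block, which is why a single bucket plus compaction of small files is reasonable in this case.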
08-21-2020
01:06 AM
The error is here: `org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.DoubleWritable`. This states that Hive is trying to cast a Text object to a DoubleWritable, which is not possible. Are you sure the column is defined as STRING? Casting a string to a double in a query is always possible. Please double check. Another solution is to drop and recreate the table with the correct data types. Since it is an external table, dropping it does not delete the data. Hope this helps. If this comment helps you find a solution or move forward, please accept it as a solution for other community members.
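Before recreating the table with a DOUBLE column, it can help to confirm that the raw text values actually parse as numbers. A minimal Python sketch, assuming you have pulled a sample of the column's raw values (for example from the files under the external table's location); `non_numeric_values` is a hypothetical helper, and Python's `float()` parsing rules are not guaranteed to match Hive's exactly.

```python
from typing import Iterable, List


def non_numeric_values(values: Iterable[str]) -> List[str]:
    """Return the values that cannot be parsed as a floating-point number.

    Empty strings and \\N (Hive's default NULL marker in text files) are
    skipped, since they represent missing data rather than a type problem.
    """
    bad = []
    for value in values:
        text = value.strip()
        if not text or text == r"\N":
            continue
        try:
            float(text)
        except ValueError:
            bad.append(value)
    return bad


sample = ["12.5", "7", " 3.0e2 ", "", "N/A", "1,234.5", r"\N"]
print(non_numeric_values(sample))
```

Any value reported here (such as `N/A` or a number with a thousands separator) would need cleaning before the column can safely be declared as DOUBLE.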
08-21-2020
12:37 AM
Partitioning and bucketing are ways to improve Hive performance. Neither is mandatory, but both are good to have. How to partition and bucket depends a lot on what the table looks like. If the table has millions or billions of rows, or is very wide with hundreds of columns, query performance is impacted greatly. To answer your question, the only effect I see is performance degradation. But again, if the table is small (my assumption: ~10-15 million rows), then one bucket versus more than one bucket will not bring a significant improvement. With many more rows than that, it is good to bucket, so that a query filtering on the bucketing column is evaluated only on the rows within one or two buckets, which improves performance. When the table has billions of rows and is wide as well, it is ideally both partitioned and bucketed. There is no perfect solution; it always differs depending on the scenario. Hope this helps. If this comment helps you find a solution or move forward, please accept it as a solution for other community members.
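To illustrate why bucketing lets a query touch only one or two buckets: each row is assigned to a bucket by hashing the bucketing column and taking the result modulo the number of buckets, so an equality filter on that column maps to a single bucket. This Python sketch is a simplified model, not Hive's implementation: it uses a Java-style 32-bit string hash, loosely modeled on older Hive bucketing (newer bucketing versions use a different hash), and `bucket_for` is a hypothetical name.

```python
def java_string_hash(s: str) -> int:
    """Java-style 32-bit String.hashCode (BMP characters only in this sketch)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret as a signed 32-bit integer, as Java would.
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_for(key, num_buckets: int) -> int:
    """Simplified bucket assignment: non-negative hash modulo bucket count."""
    h = key if isinstance(key, int) else java_string_hash(str(key))
    return (h & 0x7FFFFFFF) % num_buckets


# Rows with the same key always land in the same bucket, so a filter such as
# customer_id = 'c42' only needs to read that one bucket's file.
for key in ["c1", "c2", "c42", "c42", "c7"]:
    print(key, bucket_for(key, 4))
```

Because the mapping is deterministic, both duplicate `c42` rows print the same bucket number.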
08-20-2020
02:46 AM
You need to provide more information here. Is the Updatedate processor an UpdateAttribute processor? What does the invokescriptor processor do? Are you storing an attribute called state, or are you using the processor's state? Assuming it is just an attribute on the flowfile, your attempt 2 should work. Hope this helps. If this comment helps you find a solution or move forward, please accept it as a solution for other community members.
08-07-2020
02:25 AM
Can you try removing `force_https_protocol=PROTOCOL_TLSv1_2`? This setting was needed for previous versions of Ambari but probably not for the newer versions. Another option is to try without SSL, to find out whether the communication itself is broken or the SSL configuration is the problem. Hope this helps.
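To separate a broken connection from an SSL configuration problem, one option is to probe the Ambari server port from the agent host, once with the default protocol negotiation and once with TLS 1.2 pinned. This is a diagnostic sketch using Python's standard `ssl` module; the host name and the port (8440, commonly the Ambari agent registration port) are assumptions to adjust for your setup, and certificate verification is disabled because the probe only checks the handshake.

```python
import socket
import ssl


def make_context(force_tls12: bool) -> ssl.SSLContext:
    """Client context for a handshake probe (no certificate verification)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Diagnostic only: Ambari often uses self-signed certificates, so skip
    # verification; never reuse this context for real traffic.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    if force_tls12:
        # Roughly what force_https_protocol=PROTOCOL_TLSv1_2 asks for.
        ctx.minimum_version = ssl.TLSVersion.TLSv1_2
        ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def probe(host: str, port: int, force_tls12: bool, timeout: float = 5.0) -> str:
    """Return the negotiated TLS version, or a short description of the failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with make_context(force_tls12).wrap_socket(sock, server_hostname=host) as tls:
                return tls.version() or "unknown"
    except OSError as exc:  # socket errors and ssl.SSLError are both OSError
        return f"failed: {exc!r}"


if __name__ == "__main__":
    host = "ambari-server.example.com"  # hypothetical host name
    for forced in (False, True):
        label = "TLS 1.2 forced" if forced else "default"
        print(label, "->", probe(host, 8440, forced))
```

If the TCP connection itself fails, the problem is networking rather than SSL; if only the pinned TLS 1.2 handshake fails, the protocol setting is the likely culprit.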
08-04-2020
02:20 AM
Hi @asra, we have a workaround in place for now. I am in touch with our security team to understand the root cause but haven't figured it out yet. All I did was deploy a machine with a UI (Windows or Ubuntu, since our CentOS / Red Hat machines are command line only) within the same network as the Ambari server. This bypasses any proxy / firewall / group policy settings. If that does not work, please provide detailed info about your setup and I will try to point out things you can try.
07-28-2020
11:39 PM
Ahh, OK... I didn't check the documentation, my bad. But the question remains whether it will ignore the directory on all nodes or only on the old nodes. I am interested in how this turns out. Maybe you can do a quick trial? I don't have a dev environment to try it at the moment.
07-24-2020
03:32 AM
The policy is synced to all the nodes. You can check that in Ranger -> Audit -> Plugins. If it is not, then you should check whether you have an access policy for the node identities.