Member since: 05-18-2016
Posts: 71
Kudos Received: 39
Solutions: 6
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1211 | 12-16-2016 06:12 PM |
|  | 456 | 11-02-2016 05:35 PM |
|  | 2968 | 10-06-2016 04:32 PM |
|  | 737 | 10-06-2016 04:21 PM |
|  | 851 | 09-12-2016 05:16 PM |
07-13-2020
12:54 AM
1 Kudo
Hello @VidyaSargur, thanks for your answer. You are totally right; I only realized that this is an older thread after I had already posted. I have therefore created a new thread (https://community.cloudera.com/t5/Support-Questions/Permanently-store-sqoop-map-column-hive-mapping-for-DB2/td-p/299556). Regards
01-23-2020
11:46 AM
Okay, so I wrote an example NiFi process to do it: https://www.datainmotion.dev/2020/01/flank-stack-nifi-processor-for-kafka.html
10-21-2019
03:25 PM
I could run the Map phase on the ORC file, but Reduce fails with a "Can't input data ORC[]" error. Do you have any official documentation confirming that ORC files do not work with incremental lastmodified imports?
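For context, a sketch of the kind of incremental import in question; the connection string, table, and column names below are placeholders, not my actual setup:

```shell
sqoop import \
  --connect "jdbc:oracle:thin:@dbhost:1521/MYDB" \
  --username myuser -P \
  --table ORDERS \
  --incremental lastmodified \
  --check-column LAST_MODIFIED \
  --last-value "2019-10-01 00:00:00" \
  --merge-key ID \
  --target-dir /user/me/orders
```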
04-25-2019
12:41 PM
I have used localhost as the address, assuming you are using a standalone configuration; otherwise you need to find out the hostname of the NameNode. Also, it might just need hdfs instead of webhdfs as the prefix of the address.
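If you want to verify the endpoint quickly, a plain REST call against WebHDFS works; the host and port here are assumptions (localhost and 50070 are the Hadoop 2.x defaults for a standalone sandbox):

```shell
# List /tmp via the WebHDFS REST API; swap in your NameNode host if not standalone
curl -i "http://localhost:50070/webhdfs/v1/tmp?op=LISTSTATUS"
```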
10-27-2017
03:37 PM
1 Kudo
Matt, thanks for helping me with this. The real problem was with the InferAvroSchema processor, as it uses Kite to determine the data type of each record. If you have nulls or zeros as a record value, InferAvroSchema is not consistent, and during a merge, if a bin consists of some records with double or float values and some with zeros, ConvertJSONToAvro fails because the inferred schema is incorrect. It would be wise to configure the schema manually in ConvertJSONToAvro instead of using InferAvroSchema, if that makes sense.
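For example, a hand-written schema for ConvertJSONToAvro where the numeric field is pinned to a nullable double, so zeros and nulls cannot flip the inferred type; the record and field names here are made up:

```json
{
  "type": "record",
  "name": "Reading",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "value", "type": ["null", "double"], "default": null}
  ]
}
```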
02-03-2017
07:19 PM
1 Kudo
@Aditya, update yarn-site.xml. Find the relevant parameters in the documentation, update them to increase your resources, and restart YARN and its affected components.
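For illustration, these are the usual resource knobs in yarn-site.xml; the values below are placeholders that should be sized to your hardware:

```xml
<!-- yarn-site.xml: values are placeholders, size them to your nodes -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>16384</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
```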
12-19-2016
01:25 PM
Back at it this morning and, while I don't quite get what's happening, I consider this resolved. This morning I did the following:
- removed a prior install of Apache Zeppelin after I realized that, even after a reboot, it still responded on localhost:8080
- confirmed it was indeed gone
- started up VirtualBox and started the sandbox
- found that Zeppelin still responded on localhost:8080, which really confused me
- tried localhost:9995, to which a different Zeppelin page responded, so that was a good thing
- then, remembering something from a previous experience, tried 127.0.0.1:8080, and Ambari responded with its login page

This is now the second time I have seen localhost and 127.0.0.1 treated differently; one of these days I'll have to figure out why. But for now, I'm back in business and continuing the tutorial. Thanks everyone for your help! Cecil
12-16-2016
03:41 AM
Thanks for the workaround. This has solved my group-access problem.
10-26-2016
01:39 AM
Did you install the Vagrant plugin "vagrant-hostmanager"? It is listed as a requirement at the top of the tutorial.
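If not, it is a one-liner:

```shell
vagrant plugin install vagrant-hostmanager
```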
10-16-2018
02:04 PM
Hi! I tried to use this setup for MariaDB, without success; my attempt already failed at the CDC processor (with a dockerized NiFi and the org.mariadb.jdbc.Driver). Is MariaDB known to not work? PS: without the DistributedMapCacheClient it works (though of course I don't get the table and column names, which I guess would be more than just nice). With the DMCC I get a JDBC error "creating binlog enrichment".
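For context, these are the binlog settings a MySQL-style CDC setup needs on the MariaDB side; the server_id value here is arbitrary:

```ini
# my.cnf on the MariaDB host: row-based binlog is required for CDC
[mysqld]
server_id     = 1
log_bin       = mysql-bin
binlog_format = ROW
```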
08-11-2016
03:38 PM
1 Kudo
This is a great article for anyone looking to ingest data quickly and store it in compressed formats. This will work very well for POC, testing, and sandbox types of activities. I used something like this and made it production-grade at a client by automating some of the jobs using Oozie. Once the data was loaded, we also had verification scripts that would audit what came in and what got dropped. We also had cleanup scripts that would remove all the raw data from HDFS once the data was set in Hive in ORC format, compressed and partitioned. With the advent of NiFi and Spark, it's worth looking at building a NiFi processor in conjunction with Spark jobs to load the data seamlessly into Hive/HBase in compressed formats as it is being loaded.
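As a sketch of that compressed-and-partitioned step, the table and column names here are illustrative, not from the article:

```sql
-- Compressed, partitioned ORC target table (illustrative names)
CREATE TABLE events_orc (
  event_id BIGINT,
  payload  STRING
)
PARTITIONED BY (event_date STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY");

-- Dynamic-partition load from the raw staging table; once this lands,
-- the raw HDFS files can be cleaned up
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE events_orc PARTITION (event_date)
SELECT event_id, payload, event_date
FROM events_raw;
```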
03-24-2017
03:26 PM
@Eric Hanson I don't have an official opinion on this. It really depends on the available resources. If the cluster is really large, then it may be beneficial to put the KDC on its own VM; but for a small cluster (<15 hosts), that may be a bit of overkill, and the least utilized host may be sufficient for the KDC. That said, the workload could be spread out by placing one or more slave KDCs around the cluster. There is also the option to separate the kadmin and krb5kdc processes onto different hosts, though this is more for security concerns than for performance or resource concerns.

One thing to keep in mind: for Ambari server versions 2.5.0 and below, it appears that the cluster performs an abnormal number of kinits. This is currently being looked into. So far, it is unclear whether this is a bug, expected behavior, or something in between. The effect of this issue on a small cluster is minimal and not noticeable over a short period of time. On a large cluster (say 900 nodes), the Kerberos log files tend to get large quickly. Performance of the KDC on such a cluster, even when the KDC exists on a host with Hadoop services, does not appear to be affected; the main issue is merely log file size. However, if an issue is found and fixed, fewer kinits couldn't hurt. 🙂
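For illustration, spreading the load across slave KDCs amounts to listing them in krb5.conf on the cluster hosts; the realm and hostnames below are placeholders:

```ini
[realms]
  EXAMPLE.COM = {
    # clients try the KDCs in order, so slaves share the kinit load
    kdc = kdc-master.example.com
    kdc = kdc-slave1.example.com
    # kadmind (principal administration) runs only on the master
    admin_server = kdc-master.example.com
  }
```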
12-12-2018
04:29 PM
It's a good approach, but the one disadvantage I could find is the multiple hops needed to achieve the desired result. Instead of performing joins, we can apply a windowing function to achieve the same in a single hop, assuming you have a unique-value column and a last-modified date in your scenario.
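Something along these lines; the table and column names are assumed from the scenario:

```sql
-- Latest row per key in a single pass, instead of a join back to a MAX() subquery
SELECT *
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY unique_id
                            ORDER BY last_modified DESC) AS rn
  FROM my_table t
) ranked
WHERE rn = 1;
```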