About omaritec

omaritec · ‎07-18-2018

Your point is flawless, I think the issue here (at least at my side) is that the workbench (which I tested in a bootcamp run by Cloudera an year ago) is pretty good, but isn't cheap also. For labs, developments and all that stuff it is not affordable for a small Company. In my case, my Company (consultancy) need to be able to develop a new product or service that makes use of ML techniques and would be best developed in a "shared notebook" fashion. The result would be probably sell to the customer together with the workbench, but of course we need to develop it first, with no guarantee of success. Although we are Cloudera resellers, there's no guarantee the Customer also wants to buy the CDSW license (maybe a "developer license" would cover this gap). That's why we need to switch to inexpensive software like Zeppelin and Livy to get the job done, at least in alpha stage. This is my point of view. Take care, O.

omaritec · ‎07-18-2018

Ok I understand your point but what if mappers are failing ? Yarn already sets up as many mappers as files number, should I increase this more ? Since only a minority of my jobs are failing, how can I tune yarn to use more mappers for these particular jobs?

omaritec · ‎05-02-2018

Hi, getting back on this old topic to have more answers in this subject. I have errors with mappers and reducers falling short on memory. Of course increasing the memory fix the issue, but as already menthioned I am wasting memory for jobs that doesn't need it. Plus, I was thinking that this stuff was made to scale, so it would handle a particularly great job just buy splitting it. In other words, I don't want to change memory values every time a new application fails due to memory limits. What is the best practice in this case? Thanks O.

omaritec · ‎07-21-2017

Please don't consider my previous message. While the aforementhioned error IS showing, it is not a blocking issue. I had issues with firewall. Bye

omaritec · ‎07-21-2017

Hey, just to point out that this issue arise also when following path B. Steps to reproduce (Centos 7.3, Manager version 5.12.0-1): 1. Manual install JDK on nodes 2. Grab cloudera-manager.repo file 3. Install via yum yum install cloudera-manager-daemons cloudera-manager-server 4. Change db.properties to fit external mySQL databases 5. Start Cloudera manager systemctl start cloudera-scm-server. And then it hangs with this error: ERROR ParcelUpdateService:com.cloudera.parcel.components.ParcelDownloaderImpl: Failed to download manifest. Status code: 404 URI: https://www.cloudera.com/downloads/manifest.json So it is automatically pointing to this location, I haven't changed the parcels location yet. AFAIK you must change this default location, otherwise it will not work. Question is: how do I do that ???

omaritec · ‎01-26-2017

Thanks, I have the same problem. Could anyone explain if it is correct to manually install new packages on every node ? Is this a functionality you aquire with commercial Anaconda only? Thanks

omaritec · ‎05-30-2016

Thank you Romain, didn't notice it. Bye

omaritec · ‎05-10-2016

Hello, I have created a dashboard on Hue with twitter-demo collection on Cloudera Search. I am experimenting to see if I can segregate access to collections as per user name. I am able to create dashboards and in fact I see that Hue proxies the user to Solr, but on Hue I can access all the dashboard I create. Is it possible to limit access to users, provided their username or access level ? I want to find out if Hue+Search can be used for self-service BI, but I need to be able to differentiate access level. Thanks, bye Omar

omaritec · ‎11-30-2015

I resolved the problem on my own, I just want to point out that this strange behaviour was due to some incorrectness on data. At some point in time, partitioned data went from "table_folder/one_partition/another_partition" to "table_foldere/another_partition/one_partition" This caused the msck repair command to fail, only aligning metastore data to the latter partition type. At the moment I don't know what caused the inversion, I asked the dev team and they also don't know. By the way, fixing this problem (by recreating the table with the partition order in the correct way) let msck repair to work correctly. Bye Omar

omaritec · ‎11-05-2015

Hello, My client is asking me a way to backup hive tables on tape. I know, this is not "big-data style". This is mandatory for them so I need to accomodate. I found out a way to do this, but the procedure implies, when restoring, this procedure: - create the table using the DDL previously backed up via "show create table" statement; - mv the files to the warehouse dir/db/table just created; - run msck repair table on that table. The command works without error, however I found out that the original table has got about 111 million records, and the target only has got 37 millions. I compared the hdfs size of the folder and they are the same. I compared the number of partitions of the table and they are the same. I tried to run msck repair once again (just in case), but the result doesn't change. So I think the problem must be in the msck command: files are in place, but somehow it skips some in fixing. What do you think ? Bye Omar

Member Since	‎11-03-2015 05:59 AM
Last Visited
Posts	32

Cloudera Community

Re: msck repair table bad behaviour

Re: Cloudera and Notebooks (Zeppelin/HUE/Jupyter)

Re: Map and Reduce Error: Java heap space

Re: Map and Reduce Error: Java heap space

Re: manifest.json failed to download

Re: manifest.json failed to download

Re: Anaconda parcels - installing new package, usi...

Re: [Hue] Solr dashboard with autentication

[Hue] Solr dashboard with autentication

Re: msck repair table bad behaviour

msck repair table bad behaviour