Member since: 11-03-2015
Posts: 32
Kudos Received: 0
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 12721 | 11-30-2015 03:20 AM |
07-18-2018 02:56 AM
Your point is flawless. I think the issue here (at least on my side) is that the workbench (which I tested in a bootcamp run by Cloudera a year ago) is pretty good, but it isn't cheap either. For labs, development and all that stuff it is not affordable for a small company. In my case, my company (a consultancy) needs to be able to develop a new product or service that makes use of ML techniques and would be best developed in a "shared notebook" fashion. The result would probably be sold to the customer together with the workbench, but of course we need to develop it first, with no guarantee of success. Although we are Cloudera resellers, there's no guarantee the customer also wants to buy the CDSW license (maybe a "developer license" would cover this gap). That's why we need to switch to inexpensive software like Zeppelin and Livy to get the job done, at least in the alpha stage. This is my point of view. Take care, O.
07-18-2018 01:26 AM
Ok, I understand your point, but what if mappers are failing? YARN already sets up as many mappers as there are input files; should I increase this further? Since only a minority of my jobs are failing, how can I tune YARN to use more mappers for these particular jobs?
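A minimal sketch of one way to get more map tasks for a single job, by lowering the maximum input split size on the command line instead of changing cluster defaults. It assumes the job is driven by ToolRunner so the -D options are picked up; the jar, class and paths are placeholders.

```
# Hypothetical invocation: a smaller max split size means more input splits,
# hence more map tasks, for this job only (jar, class and paths are placeholders).
hadoop jar my-app.jar com.example.MyJob \
  -Dmapreduce.input.fileinputformat.split.maxsize=67108864 \
  /input/path /output/path
```

Note that the number of map tasks is driven by the input splits rather than by YARN itself; YARN only hands out the containers the job asks for.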
05-02-2018 03:39 AM
Hi, getting back to this old topic to get more answers on this subject. I have errors with mappers and reducers falling short on memory. Of course increasing the memory fixes the issue, but as already mentioned I am wasting memory on jobs that don't need it. Plus, I was thinking that this stuff was made to scale, so it would handle a particularly big job just by splitting it. In other words, I don't want to change memory values every time a new application fails due to memory limits. What is the best practice in this case? Thanks, O.
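One option (a sketch, not necessarily the best practice being asked about) is to keep modest cluster-wide defaults and raise container memory per job, only for the jobs that need it. It assumes the job uses ToolRunner so the -D options are applied; the jar, class, paths and sizes are placeholders.

```
# Hypothetical per-job override: raise map/reduce container memory and the
# corresponding JVM heaps for this submission only, leaving cluster defaults alone.
hadoop jar my-app.jar com.example.MyJob \
  -Dmapreduce.map.memory.mb=4096 \
  -Dmapreduce.map.java.opts=-Xmx3276m \
  -Dmapreduce.reduce.memory.mb=8192 \
  -Dmapreduce.reduce.java.opts=-Xmx6553m \
  /input/path /output/path
```

The JVM heap (-Xmx in java.opts) is usually kept somewhat below the container size so that JVM overhead still fits inside the YARN limit.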
07-21-2017 07:37 AM
Please don't consider my previous message. While the aforementioned error IS showing, it is not a blocking issue; I had issues with the firewall. Bye
07-21-2017 02:38 AM
Hey, just to point out that this issue also arises when following path B. Steps to reproduce (CentOS 7.3, Manager version 5.12.0-1):
1. Manually install the JDK on the nodes
2. Grab the cloudera-manager.repo file
3. Install via yum: yum install cloudera-manager-daemons cloudera-manager-server
4. Change db.properties to point at the external MySQL databases
5. Start Cloudera Manager: systemctl start cloudera-scm-server

And then it hangs with this error:
ERROR ParcelUpdateService:com.cloudera.parcel.components.ParcelDownloaderImpl: Failed to download manifest. Status code: 404 URI: https://www.cloudera.com/downloads/manifest.json

So it is automatically pointing to this location; I haven't changed the parcels location yet. AFAIK you must change this default location, otherwise it will not work. The question is: how do I do that?
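For step 4, a minimal sketch of what /etc/cloudera-scm-server/db.properties could look like with an external MySQL database; the host, database name, user and password are placeholders, and it assumes the scm database and user were already prepared beforehand.

```
# Hypothetical db.properties for an external MySQL database; host, database
# name, user and password are placeholders.
cat > /etc/cloudera-scm-server/db.properties <<'EOF'
com.cloudera.cmf.db.type=mysql
com.cloudera.cmf.db.host=mysql-host.example.com
com.cloudera.cmf.db.name=scm
com.cloudera.cmf.db.user=scm
com.cloudera.cmf.db.password=changeme
com.cloudera.cmf.db.setupType=EXTERNAL
EOF
```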
01-26-2017 06:38 AM
Thanks, I have the same problem. Could anyone explain whether it is correct to manually install new packages on every node? Is this a functionality you acquire with commercial Anaconda only? Thanks
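For what it's worth, a minimal sketch of the "manual" route, assuming passwordless ssh to the workers and a pip available on each node; the host names and the package are placeholders.

```
# Hypothetical loop: install the same Python package on every worker node.
# Assumes passwordless ssh and that pip is on the PATH on each host.
for host in worker01 worker02 worker03; do
  ssh "$host" "sudo pip install pandas"
done
```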
05-10-2016 01:22 AM
Hello, I have created a dashboard in Hue with the twitter-demo collection on Cloudera Search. I am experimenting to see if I can segregate access to collections per user name. I am able to create dashboards, and in fact I see that Hue proxies the user to Solr, but in Hue I can access all the dashboards I create. Is it possible to limit access for users, based on their username or access level? I want to find out whether Hue + Search can be used for self-service BI, but I need to be able to differentiate access levels. Thanks, bye, Omar
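One approach that may fit here (an assumption on my part, not something confirmed in this thread) is collection-level authorization with Apache Sentry for Solr: groups are mapped to roles, and roles are granted per-collection privileges, so each user can only query the collections their group is allowed to see. A rough sketch of a file-based policy, with placeholder group and role names; only the collection name comes from the post.

```
# Hypothetical Sentry policy file for Solr (file-based provider); the group
# and role names are placeholders.
cat > sentry-provider.ini <<'EOF'
[groups]
analysts = analyst_role

[roles]
analyst_role = collection=twitter-demo->action=query
EOF
```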
Labels:
- Apache Solr
- Cloudera Hue
- Cloudera Search
11-30-2015 03:20 AM
I resolved the problem on my own; I just want to point out that this strange behaviour was due to an inconsistency in the data. At some point in time, partitioned data went from "table_folder/one_partition/another_partition" to "table_folder/another_partition/one_partition". This caused the msck repair command to fail, only aligning the metastore to the latter partition layout. At the moment I don't know what caused the inversion; I asked the dev team and they don't know either. In any case, fixing this problem (by recreating the table with the partitions in the correct order) let msck repair work correctly. Bye, Omar
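A minimal sketch of that fix in HiveQL, with hypothetical table, column and location names (the real DDL would come from the original table definition): declare the partition columns in the same order as the directory layout, then re-run MSCK REPAIR.

```
# Hypothetical fix: recreate the table with the partition columns declared in
# the same order as the directories on HDFS, then re-register the partitions.
hive -e "
CREATE EXTERNAL TABLE my_table (id BIGINT, payload STRING)
PARTITIONED BY (one_partition STRING, another_partition STRING)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/my_db.db/table_folder';

MSCK REPAIR TABLE my_table;
"
```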
11-05-2015 08:12 AM
Hello, my client is asking me for a way to back up Hive tables on tape. I know, this is not "big-data style", but it is mandatory for them, so I need to accommodate it. I found a way to do this, but it implies the following procedure when restoring:
- create the table using the DDL previously backed up via the "show create table" statement;
- mv the files to the warehouse dir/db/table just created;
- run msck repair table on that table.

The command works without error; however, I found that the original table has about 111 million records and the target only has about 37 million. I compared the HDFS size of the folders and they are the same. I compared the number of partitions of the tables and they are the same. I tried to run msck repair once again (just in case), but the result doesn't change. So I think the problem must be in the msck command: the files are in place, but somehow it skips some while fixing. What do you think? Bye, Omar
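A minimal sketch of that restore sequence, with hypothetical database, table and path names, plus a couple of checks afterwards (partition count and row count) to compare against the source table.

```
# Hypothetical restore sequence; database, table and paths are placeholders.
# 1. Recreate the table from the DDL saved with "show create table".
hive -f my_table_ddl.sql

# 2. Move the restored files back under the table's warehouse directory.
hdfs dfs -mv /restore/staging/my_table/* /user/hive/warehouse/my_db.db/my_table/

# 3. Re-register the partitions in the metastore.
hive -e "MSCK REPAIR TABLE my_db.my_table;"

# 4. Sanity checks: partition count and row count, to compare with the source.
hive -e "SHOW PARTITIONS my_db.my_table;" | wc -l
hive -e "SELECT COUNT(*) FROM my_db.my_table;"
```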
Labels:
- Apache Hive
- HDFS