Member since: 02-19-2018
Posts: 29
Kudos Received: 0
Solutions: 0
11-28-2018
02:48 AM
Tim Armstrong wrote: "Also, a more general tip is that you can set a default value for *any* query option via the dynamic resource pool interface."
That is really helpful. Thanks!
11-28-2018
12:43 AM
Sorry Tim. Setting maximum memory limits in resource pools is not an option for us. They are based upon the estimated memory consumption, and the estimates are sometimes wildly inaccurate. This has resulted in valid production queries being blocked from running.
11-27-2018
07:58 AM
EricL: Can we do the equivalent of "SET MEM_LIMIT=100g;" in a cluster-wide config? I.e., can we enforce this so that no single Impala query can consume all the memory on the Impala service?
11-27-2018
07:49 AM
Did this get sorted? I am still having to discuss with colleagues when you have to manually run "INVALIDATE METADATA <tablename>" after a BDR run.
11-23-2018
09:05 AM
We currently use SSSD on all of our boxes to provide group information about users on our secure cluster. It fetches users' groups from an Active Directory/LDAP server which is far away and therefore slow. I can cache the results for some time, but that still results in an initial slow request the first time, and we of course have lots of machines. SSSD doesn't share its cache with other SSSD systems. I have heard that it is possible for Sentry to connect to AD and read all the user groups in an LDAP domain. I could then use this information in authorization requests instead of SSSD. However, I can't find any documentation for this. Is this a valid deployment option? What do I need to read to get Sentry to pre-load an AD domain? Does it filter on users who are in a specific group, or does it fetch everything? Thanks! Alex
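For context, this is roughly the SSSD-side caching we rely on today (a sketch only: the domain name and all values are illustrative, option names are from sssd.conf(5)):

```ini
# Illustrative values only -- the caching knobs we tune in /etc/sssd/sssd.conf
[domain/example.com]
entry_cache_timeout = 5400          # seconds a cached user/group entry is served
refresh_expired_interval = 4050     # refresh expiring entries in the background
cache_credentials = True            # allow logins while AD/LDAP is unreachable
```

Even with this, the very first lookup on each box still pays the round trip to the remote AD server, which is why I am looking at Sentry pre-loading instead.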
08-29-2018
02:40 AM
Thanks for the information!
08-08-2018
08:36 AM
But with Informatica BDM you also have Blaze and Spark as alternative engines to Hive. The original post was quite vague and may not have been BDM-related at all.
08-08-2018
08:27 AM
" UTC timezone conversion issue going on with only Parquet backed tables." > how do Cloudera Customers deal with this issue? I fear the first solution is to have all your servers use the same (UTC) timezone. We also have this flag setconvert_legacy_hive_parquet_utc_timestamps=true and hope to get rid of it once we move everything to UTC.
08-01-2018
02:11 AM
Hello bgooley, thanks for your suggestion. I am re-reading the docs and I still think they tell me to add a CM peer any time I want to do a BDR replication, but I can accept that maybe my reading is wrong. I have retried my tests without a peered CM but was not able to improve the situation. In the meantime we have taken a different track and started to use a new cluster as the target, with a new Cloudera Manager, and BDR seems to be working for that.
08-01-2018
02:06 AM
Hello Jim, that seems to have been the problem. Although the krb5.conf files were effectively identical, the two Cloudera Managers had been configured differently: one specified the KDC by name and the other by IP address. We now have BDR working between two different Cloudera Managers, but not between two clusters under the same Cloudera Manager.
07-24-2018
04:54 AM
Are you using the BDR tool available with a Cloudera Enterprise license? If so, then you should probably be using that tool and two separate Hive Metastores; no DB copying required. (I am trying this now; I am no expert yet 🙂) If you don't, then you might consider having a shared metastore. Does that work for you? Finally, if all you are doing is creating a Disaster Recovery type backup, then I would assume you need all the tables in the Hive Metastore. But that is a guess.
07-24-2018
04:10 AM
Has anyone seen this error when trying to set up Cloudera Manager BDR peering between two Kerberized clusters? (I think they are both 5.13.3.)
Source and target realms are same but have different KDC. Please ensure those KDCs are on unified realm.
As far as I can see, I am using the same KDC on both clusters. Can anyone suggest how I can check? I have looked in /etc/krb5.conf on both; they seem the same. I have done an nslookup on what it says is the KDC server; it is the same.
Any more tips for things to look for?
07-24-2018
04:06 AM
Did you get this sorted? I thought the idea was that they were supposed to be in the same realm. Personally I am being told by the Cloudera Backup BDR tool that I have two different KDCs when I don't, AFAIK. You do!
07-23-2018
03:18 AM
> you are replicating from one Hive Service to another on the same cluster.
No, not the same cluster. Two different (but similarly named) clusters, one Cloudera Manager.
07-20-2018
03:14 AM
I can see records in the COMMANDS table with NAME HiveReplicationCommand appearing with STATE STARTED, and then STATE immediately changes to FINISHED, but I cannot see why Hive Replication sees this as not itself. E.g. the COMMANDS schema:
COMMAND_ID bigint(20) NO PRI NULL
NAME varchar(255) NO NULL
STATE varchar(255) YES MUL NULL
START_INSTANT bigint(20) YES MUL NULL
END_INSTANT bigint(20) YES NULL
ACTIVE int(11) YES MUL NULL
RESULT_MESSAGE longtext YES NULL
RESULT_DATA mediumblob YES NULL
RESULT_DATA_MIME_TYPE varchar(255) YES NULL
RESULT_DATA_FILENAME varchar(255) YES NULL
SUCCESS bit(1) YES NULL
SERVICE_ID bigint(20) YES MUL NULL
ROLE_ID bigint(20) YES MUL NULL
PARENT_ID bigint(20) YES MUL NULL
HOST_ID bigint(20) YES MUL NULL
RESULT_DATA_PATH varchar(255) YES NULL
RESULT_DATA_REAPED bit(1) YES b'0'
CLUSTER_ID bigint(20) YES MUL NULL
OPTIMISTIC_LOCK_VERSION bigint(20) NO 0
SCHEDULE_ID bigint(20) YES MUL NULL
ARGUMENTS longtext YES NULL
AUDITED bit(1) NO b'0'
FIRST_UPDATED_INSTANT bigint(20) YES NULL
CREATION_INSTANT bigint(20) YES NULL
And a sample row:
290498 HiveReplicationCommand STARTED 1532081148707 NULL 1 NULL NULL application/json summary.json \0 96 NULL NULL NULL /var/lib/cloudera-scm-server/commands/290498/summary9115221277369688254.json \0 NULL 9 1291
{"@class":"com.cloudera.cmf.service.hive.HiveReplicationCmdArgs","replicateData":true,"hdfsArguments":{"@class":"com.cloudera.cmf.service.hdfs.DistCpCommand$DistCpCommandArgs","alertConfig":null,"args":[],"atomic":false,"bandwidth":100,"copyListingOnSource":null,"delete":false,"destinationPath":"/bdr_test/hive_warehouse","diffRenameDeletePath":null,"dryRun":false,"exclusionFilters":[],"ignoreFailures":true,"ignoreSnapshotDiff":true,"log":null,"mapreduceServiceName":"CD-YARN-FKZFETrq","mrSchedulerPoolNameProperty":null,"numConcurrentMaps":20,"overwrite":false,"poolName":null,"preserve":"rbugpa","proxyUser":"tsk-xeu-cdl-smoke","rebase":false,"replaceNameservice":null,"scheduleId":1291,"scheduleName":"hive_test01","scheduledTime":null,"sequenceFilePath":null,"skipCrcCheck":true,"skipTrash":false,"snapshotPrefix":null,"sourceCluster":"EMILY_DEV_SQL","sourcePaths":null,"sourcePeer":"EMILY_DEV_SQL_SRC","sourceProxyUser":"tsk-xeu-cdl-smoke","sourceService":"CD-HDFS-YcXwCCHo","strategy":"DYNAMIC","summaryFile":null,"targetRoleIds":[],"update":true,"useSnapshots":null,"useSnapshotsDiff":null,"useWebHdfsForSource":null},"alertConfig":{"alertOnAbort":false,"alertOnFail":false,"alertOnStart":false,"alertOnSuccess":false},"allowColumnStats":true,"allowHiveFunctions":true,"args":[],"dryRun":false,"exportDir":null,"exportFile":"/user/hdfs/.cm/hive-staging/2018-07-20-10-05-48-290498/export.json","exportToHdfs":true,"lastSuccessfulEventId":null,"localExportOnly":false,"mappings":{},"overwrite":false,"replicateImpalaMetadata":false,"replicateImpalaMetadataUserOption":false,"runInvalidateMetadata":true,"scheduleId":1291,"scheduleName":"hive_test01","scheduledTime":null,"skipExportToTarget":true,"sourceCluster":"EMILY_DEV_SQL","sourcePeer":"EMILY_DEV_SQL_SRC","sourceService":"CD-HIVE-SqlQgVBT","tables":{"dbb_dst_scratch":[".*"]},"targetClientConfig":null,"targetRoleIds":[],"update":[]} 1532081149130 1532081148707
07-20-2018
02:10 AM
> If this does not help, then let us know...
Cloudera Manager has been restarted, without success.
> it is possible to mimic the database query that is responsible for detecting active commands.
It would be excellent if you could give me that query; then I could see what is causing it to report incorrectly. Thanks!
07-20-2018
01:50 AM
> we need to execute INVALIDATE METADATA;
Oh - I thought that was done for you if you select the "Invalidate Impala Metadata on Destination" option. Are you invalidating all metadata across the board, or just the tables that you know you update? In any case, *something* has to reload/recalculate that metadata. If you don't want it to be the first real user, then I would try running some sort of query on the table yourself; that way you take the hit of the initial latency. I hope that helps. I am currently learning how to use BDR, so I am no expert.
07-19-2018
01:21 AM
> " The remote command failed with error message " indicates that the Hive Export command failed on the source Cloudera Manager server. yes. OP said: I happen to be copying betwee two clusters within the same Cloudera Manager I can clearly see that there are no running Hive Replications. I am the only person who has tried BDR in the entire company. The source Cloudera Manager is the same as the target Cloudera Manager. Only the cluster is different. There are no working/running Hive Replications.
07-18-2018
02:27 AM
I am trying to get Hive replication working and am not yet fully sure I understand the options. I can see that if you specify an individual database then you can leave the table specification blank, or specify a regular expression for the tables. I found that "*" was not an acceptable regular expression, so I wonder what rules they are using for that. However, I really need wildcards to specify databases as well as tables. Is this possible? For instance, imagine that I have 100 databases called area1_something_db and another 100 each called area2_something_db, area3_something_db, area4_something_db, and area5_something_db. My choices right now are to replicate all of them at once, or one database at a time, which is a nightmare given the large number of databases. Ideally I want a replication job which covers one specific area and which I can schedule according to some business decision. Am I right in thinking that I cannot have multiple Hive replications going on at the same time, even if they are for totally different databases?
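To illustrate (with invented database names) the kind of wildcard selection I would like for databases: the table field accepts a regular expression, which is why ".*" works where a shell-style "*" does not (a bare "*" is not a valid regex).

```python
import re

# Hypothetical database names following the area<N>_something_db convention.
databases = ["area1_sales_db", "area1_hr_db", "area2_sales_db", "misc_db"]

def select_area(dbs, area):
    """Return the databases belonging to one area, using a regex the way
    BDR's table field does -- ".*" is the regex spelling of the "*" wildcard."""
    pattern = re.compile(rf"area{area}_.*_db")
    return [d for d in dbs if pattern.fullmatch(d)]

print(select_area(databases, 1))  # ['area1_sales_db', 'area1_hr_db']
```

A per-area pattern like this is what I want to put in the *database* field of a replication schedule, so each area gets its own job.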
07-18-2018
01:57 AM
Is there a good place to discuss Disaster Recovery, or do people just go to the relevant forum for each component and ask there?
I am currently trying out BDR and have it working for HDFS (but not Hive).
What seems to be missing is "best-practice" advice. For instance, do people set up a BDR replication for the whole of the "/user" directory? If they have users on the target/backup/DR cluster, are those stored in a different directory, e.g. "/non-prod/user"?
If the two clusters work in an active/active fashion, then how do you move jobs/coordinators between Oozie instances?
Do people do DR of Kudu? So far the only options I can see are:
a) dual ingest so that we update the primary and DR versions of Kudu at the same time
b) periodically dump all the data stored in Kudu into parquet and then load the parquet into the DR Kudu.
etc etc etc
Thanks
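For what it's worth, option (b) could be as simple as a scheduled Impala CTAS (a sketch only; the table and database names are hypothetical):

```sql
-- Hedged sketch of option (b): periodically snapshot a Kudu table into
-- Parquet so the DR cluster can reload it.
CREATE TABLE backups.events_20180718 STORED AS PARQUET
AS SELECT * FROM kudu_db.events;
```

The Parquet files can then be moved with ordinary HDFS replication, which is exactly the part of BDR I already have working.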
07-18-2018
01:29 AM
I am testing BDR functionality and have not managed to create a working Hive replication job yet. Currently when running it I am getting: Message: The remote command failed with error message: Another Hive replication command is already running for Database: MY_TABLE_NAME Table: . on service HIVE-2. I previously had Hive replication failing immediately because I had not specified the port (443) for the CM peering. What is causing this to fail immediately? I cannot see any logs apart from the above error message. I happen to be copying between two clusters within the same Cloudera Manager - but won't always be. Any ideas? Thanks.
04-25-2018
02:46 AM
Hello, Sorry to revive an old thread but I would like to know if it is still true. I too am hit by this problem and, as described above, we have removed the S3 file browser for everyone. However I am thinking of upgrading my version of Hue as part of a move to a more recent CDH. Is this issue fixed in any more advanced versions of Hue? Do they talk to Hadoop for access permissions - and thus Sentry? Thanks
04-11-2018
04:58 AM
Let's go to the official docs: https://www.cloudera.com/documentation/enterprise/latest/topics/install_upgrade_to_cdh5x_parcels.html "The minor version of Cloudera Manager you use to perform the upgrade must be equal to or greater than the CDH minor version. To upgrade Cloudera Manager, see Overview of Upgrading Cloudera Manager." That final link is https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ag_upgrading_cm.html#concept_xkm_f4q_tw Good luck! I hope that is your problem. I see from the community that one or two other people have had problems finding the CDH 5.13 parcels, but this is definitely something to try first. I have to say that upgrading Cloudera Manager separately is confusing - but it makes sense from a practical point of view.
04-11-2018
03:09 AM
I am sorry that this is not more helpful, but you may want to know that there is a 5.13.3 which has a few more bug fixes than 5.13.2: https://community.cloudera.com/t5/Community-News-Release/ANNOUNCE-Cloudera-Enterprise-5-13-3-Released/m-p/66079 I think that you need to upgrade your Cloudera Manager first, before you upgrade your cluster; I do not believe that upgrade works with parcels. Have you upgraded Cloudera Manager? I don't believe Cloudera Manager will let you install CDH versions newer than itself. Personally, on a CDH 5.10.1 cluster's Cloudera Manager I am seeing:
CDH 5 5.13.3-1.cdh5.13.3.p0.2 Available Remotely [Download]
Error for parcel CDH-5.13.3-1.cdh5.13.3.p0.2-el7.parcel: Parcel version 5.13.3-1.cdh5.13.3.p0.2 is not supported by this Cloudera Manager. Upgrade Cloudera Manager to at least 5.13.0 before using this parcel version.
Is that what you are seeing? PS: Saying "It is urgent" to the community just winds people up.
04-11-2018
02:58 AM
Am I right in reading the 5.13.3 and 5.14.2 tarball components? They both use Hive 1.1.0 and not Hive 1.2.0. I have a very similar problem - I want to switch off Hive using the trash for certain tables.
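For reference, this is what I mean by switching off the trash for certain tables -- assuming DROP TABLE ... PURGE and the auto.purge table property are supported in this Hive build, which is worth verifying given the 1.1.0 base (table names hypothetical):

```sql
-- Skip the trash for one drop:
DROP TABLE scratch_db.tmp_results PURGE;

-- Ask Hive to always bypass the trash for this table's data:
ALTER TABLE scratch_db.tmp_results SET TBLPROPERTIES ('auto.purge' = 'true');
```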
04-09-2018
01:24 AM
Thanks for this info, Bill. I was wondering about it and was about to post a similar question. Does Cloudera have any guidelines on using multiple AZs (in one region)? My current thinking is:
1) Two (or more) separate clusters, one in each AZ
2) A single Cloudera Manager in one AZ which controls all
3) A single Cloudera Director in one AZ which creates all instances
4) A single Cloudera Navigator in one AZ which monitors/audits all
If we can turn some or all of (2)-(4) into a multi-AZ HA setup, that would be really cool. I am a bit concerned about the cross-AZ network traffic - but perhaps I should save that for my own thread.
03-29-2018
03:34 AM
Hello MSharma. I hope you have solved your problem by now, but here are some thoughts. Is your data entirely HBase? I think that makes things more difficult, and it is outside my expertise; I think you need to look into procedures for backing up an HBase database. It is almost irrelevant that you are using S3 - the problem you face would be the same no matter what the backup medium is. Normally, for most files and Hive tables, I would lift and shift: read from HDFS and copy to S3. If you have "at rest" encryption, then I would expect the reading process to decrypt the encrypted HDFS blocks, and you could use server-side encryption on the S3 bucket instead. (Test this out first so you are comfortable with it.) You would keep the data files but lose any HDFS block information; restoring those files would mean writing them into your cluster again as if they were brand new. If this data is not being updated, though, you might consider keeping it in S3 and reading it with fs.s3a. I hope that helps, but I am sorry I don't know how to back up HBase.
03-29-2018
03:16 AM
Sorry to drag up an old topic, but I am hoping there is now some official guidance from Cloudera - or perhaps a reference architecture. As soon as an organisation builds a Cloudera EDH cluster in one Amazon AWS region (or Azure, or Google Cloud, or whatever), they soon realise they might want another cluster in a different region or a different Availability Zone. Personally I am happy sticking with one region but multiple AZs - so I am thinking that I want one Cloudera Director, one Cloudera Manager, and separate clusters in each AZ.
02-19-2018
09:33 AM
I am wondering about running CDH clusters in an Amazon VPC. If I need to make my nodes larger, I might move up an instance type, which doubles the number of CPUs and the amount of memory on the box. Is there a simple way I can change my YARN and Impala config to make proper use of this extra memory and compute? Is the correct thing to go through the sizing process as if I were setting up a brand new cluster? I can't find up-to-date documentation for this. I am happy if you tell me to RTFM, if you can also point me to what I need to read. In case it matters, my clusters make heavy use of Impala. Thanks
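To make the question concrete, these are the sorts of settings I assume need revisiting after doubling a node's resources (property names from the YARN/Impala docs; values purely illustrative, not recommendations):

```
# yarn-site.xml equivalents (set via Cloudera Manager):
yarn.nodemanager.resource.memory-mb = 98304     # container memory per node
yarn.nodemanager.resource.cpu-vcores = 32       # container vcores per node
yarn.scheduler.maximum-allocation-mb = 98304    # largest single container

# Impala Daemon "Memory Limit" in Cloudera Manager:
mem_limit = 96g
```

What I don't know is whether hand-editing a list like this is the right approach, or whether I should redo the whole sizing exercise.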