Member since: 07-31-2013
Posts: 1924
Kudos Received: 459
Solutions: 311
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 841 | 07-09-2019 12:53 AM |
 | 3111 | 06-23-2019 08:37 PM |
 | 4472 | 06-18-2019 11:28 PM |
 | 4220 | 05-23-2019 08:46 PM |
 | 1688 | 05-20-2019 01:14 AM |
03-12-2019
07:36 PM
As far as the Thrift Server role goes, this will likely resolve itself when you enable Kerberos, as that introduces an auth negotiation protocol layer which rejects badly formed requests automatically. That is assuming bad requests are what is causing the frequent OOMEs despite the heap increases, and not actual usage of the Thrift service. For the Failover Controllers, NameNodes and other roles, this theory does not apply directly. Those may be the result of some other ongoing or one-off issue - worth investigating separately (in a different thread here, if needed).
03-11-2019
07:21 PM
1 Kudo
The concerning bit is this part of the message:

> The reported blocks 1 needs additional 1393 blocks to reach the threshold 0.9990 of total blocks 1396.

This indicates that while your DataNodes have come up and begun reporting in, they are not finding any of their locally stored block files to send in as part of the reports. The NameNode waits for enough (99.9%) of the data to be available before it opens itself for full access, but it is stuck in a never-ending loop because no DNs are reporting availability of those blocks. The overall number of blocks seems low - is this a test/demo setup? If yes, was the block data on the DNs ever wiped or removed as part of the upgrade/install attempts? Or perhaps were all DNs replaced with new ones at some point in the test? If the data is not of concern at this stage (and ONLY if so), you can force your NameNode out of safemode manually via the 'hdfs dfsadmin -safemode leave' command (as the 'hdfs' user or any granted HDFS superuser). If you'd like to investigate the blocks' disappearance further, check the DataNode logs on the hosts where these blocks resided in the past.
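For reference, a minimal sketch of the commands involved (run as the 'hdfs' user or another HDFS superuser; the fsck step is optional):

```bash
# Check whether the NameNode is still in safemode and how the DNs are reporting
hdfs dfsadmin -safemode get
hdfs dfsadmin -report | head -n 40

# ONLY if the missing block data is confirmed to be expendable:
hdfs dfsadmin -safemode leave

# Afterwards, list any files that still reference missing/corrupt blocks
hdfs fsck / -list-corruptfileblocks
```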
03-11-2019
07:12 PM
If your HBase Thrift Server is not running under a secured cluster, there's a good chance it is crashing with spurious OutOfMemoryError aborts. Part of the problem is that the Thrift RPC layer does not check incoming request packets for validity, which ends up allowing things such as HTTP requests or random protocol scans from security scanner software (Qualys, etc.) through to the RPC layer. These are then at times misinterpreted as a very large allocation request, causing an OutOfMemoryError in Java due to the size the server thinks the RPC request is attempting to send, based on its first few bytes. You can confirm whether this is the case by checking the stdout of your failed former Thrift Server processes. If you cannot spot that in the UI, visit the host that runs it; there should be lower-numbered directories for the THRIFTSERVER role type under /var/run/cloudera-scm-agent/process/ which should still have the past logs/stdout.log files within them. Within the log file you should see a message such as the below, which can help confirm this theory:

# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="/usr/lib64/cmf/service/common/killparent.sh"
# Executing /bin/sh -c "/usr/lib64/cmf/service/common/killparent.sh"...

One way to prevent this from recurring is to switch on the framed transport mode. This may break some clients if you do have active users of the HBase Thrift Server. To enable it, turn on the flag under HBase - Configuration - "Enable HBase Thrift Server Framed Transport"
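As a rough sketch, something like the below can be run on the Thrift Server host to locate those older process directories and the abort message (paths assume a typical CM agent layout):

```bash
# Past HBase Thrift Server process directories kept by the CM agent
ls -d /var/run/cloudera-scm-agent/process/*THRIFTSERVER* 2>/dev/null

# Which of their stdout logs carry the OutOfMemoryError / killparent.sh marker
grep -l "java.lang.OutOfMemoryError" \
    /var/run/cloudera-scm-agent/process/*THRIFTSERVER*/logs/stdout.log 2>/dev/null
```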
03-07-2019
09:08 PM
It appears that you're trying to use Sqoop's internal handling of the DATE/TIMESTAMP data types, instead of the Strings that the Oracle connector converts them to by default. Have you tried the option specified at https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_java_sql_timestamp? -Doraoop.timestamp.string=false You shouldn't need to map the column types manually with this approach.
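A hedged sketch of where that generic argument sits on the Sqoop command line (connection string, credentials and table names are placeholders, and it assumes the Oracle direct connector is in use):

```bash
sqoop import \
  -Doraoop.timestamp.string=false \
  --direct \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username SCOTT -P \
  --table MY_SCHEMA.MY_TABLE \
  --target-dir /user/example/my_table
```

Note that -D arguments must appear immediately after the 'import' tool name, before the other options.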
03-07-2019
08:42 PM
Yes, the individual components (such as Apache NiFi) are free to use and provided under an open-source license (APLv2). Are you asking specifically about its deployment integration with Cloudera Manager (Express)?
03-07-2019
08:36 PM
Sqoop's import into Hive is an extension of its import into HDFS (i.e. the Hive part is done after its regular HDFS import work), so if your formats are already acceptable and do not need further transformation, you can do it as part of the Sqoop step directly. Sqoop also supports (via its HCatalog options) inserting into partitioned tables via dynamic partitioning, if you require that.
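As an illustration only (all names are placeholders; exact options depend on your Sqoop version), a direct import into a partitioned Hive table via the HCatalog options could look like:

```bash
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl -P \
  --table orders \
  --hcatalog-database default \
  --hcatalog-table orders_partitioned \
  -m 4
```

With the partition key columns present in the source data, HCatalog handles the dynamic partitioning on insert.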
03-07-2019
06:31 PM
Two things here: First, yes, Sqoop will copy only the data that comes through the connection and the query, and will not duplicate data as part of its import process. The divided tasks are each fully re-done, with no partial results kept around, if there is a failure/retry/speculative execution during the run. However, keep in mind that Hive has no such constraint in its own architecture (no concept of a primary key). So after your import, it is up to your use of the table and the updates you make to it to maintain that 'effect'. You can consider using Kudu + Impala instead of Hive if the notion of primary key(s) is important to your use case, although Sqoop doesn't offer a way to import data directly into it (you'll need to insert out of the Hive table into the Kudu one via Impala, after the Sqoop import to Hive is done).
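A rough sketch of that last hop through impala-shell (table names, columns and partitioning below are placeholders, not a prescribed layout):

```bash
# Create a Kudu-backed table with an explicit primary key
impala-shell -q "CREATE TABLE orders_kudu (
    order_id BIGINT,
    customer STRING,
    amount DOUBLE,
    PRIMARY KEY (order_id))
  PARTITION BY HASH (order_id) PARTITIONS 4
  STORED AS KUDU"

# Copy the Sqoop-imported Hive table's rows into it
impala-shell -q "INSERT INTO orders_kudu SELECT order_id, customer, amount FROM orders_hive"
```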
03-07-2019
06:23 PM
You'll need to use lsof with a pid specifier (lsof -p PID). The PID must be that of your target RegionServer's Java process (find it via 'ps aux | grep REGIONSERVER' or similar). In the output, you should be able to classify the items as network (sockets) / filesystem (files) / etc., and the interest would be in whatever holds the highest share. For example, if you see a lot of sockets hanging around, check their state (CLOSE_WAIT, etc.). Or if it is mostly local filesystem files, investigate whether those files appear relevant. If you can pastebin your lsof result somewhere, I can take a look.
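A quick way to get that breakdown, as a sketch (the pgrep pattern assumes the standard RegionServer main class):

```bash
# Find the RegionServer's Java process id
RS_PID=$(pgrep -f 'org.apache.hadoop.hbase.regionserver.HRegionServer' | head -n 1)

# Count open descriptors by type (REG files, IPv4/IPv6 sockets, FIFOs, ...)
lsof -p "$RS_PID" | awk 'NR > 1 {print $5}' | sort | uniq -c | sort -rn

# If sockets dominate, check their TCP states (CLOSE_WAIT, ESTABLISHED, ...)
lsof -p "$RS_PID" -a -i | awk 'NR > 1 {print $NF}' | sort | uniq -c | sort -rn
```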
03-07-2019
01:07 AM
1 Kudo
The heartbeat messages just signify that the action is waiting for something within it to complete. In your log's case, Sqoop is awaiting completion of the job it was able to launch: job_1551703829290_0013. Please check the status and errors of job job_1551703829290_0013 to see why it may have taken so long. If this is a small cluster, there's also a good chance that your configured NodeManager resources (memory/CPU) are inadequate to run two or more parallel jobs (an Oozie action is one job, but it submits another and waits for the submitted one to complete, so each action is roughly 2 concurrent job executions). This can be fixed by adding more NM hosts, raising the resources on existing NM hosts, or configuring job resource demands to be lower than their current values.
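To dig into that launched job, a couple of standard commands (the application id is simply the job id with its prefix swapped):

```bash
# Status/progress/counters of the job the Oozie action launched
mapred job -status job_1551703829290_0013

# Aggregated container logs, once the application has finished
yarn logs -applicationId application_1551703829290_0013
```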
03-07-2019
12:47 AM
Dynamic Allocation [1] controls the number of executor containers running in parallel on YARN, not the number of CPU vcores allocated to a single executor container. [1] - https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
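To illustrate the distinction, a hedged spark-submit sketch - dynamic allocation varies how many executors run, while the vcores per executor stay fixed by --executor-cores:

```bash
spark-submit \
  --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --executor-cores 2 \
  --executor-memory 4g \
  my_job.py
```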
03-07-2019
12:00 AM
Is there a chance the data was on an ephemeral device that has since been wiped, or that something external ran a deletion command over its contents? From the error messages and your description it appears as if all the metadata (and perhaps data) content has been wiped clean, but nothing within the HDFS software does this unless explicitly asked to (such as via a NameNode format request). Perhaps begin with the logs and command histories to see if anything was accidentally invoked by an external user?
03-06-2019
11:56 PM
YARN in secure mode requires locally available user accounts to fully isolate the task containers: https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_sg_other_hadoop_security.html#topic_18_3 You'll need to make these accounts visible to your Linux hosts via SSSD or similar software.
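As a quick sanity check on each NodeManager host (the user name 'alice' is a placeholder):

```bash
# The job-submitting user must resolve locally, via SSSD/LDAP/NIS or /etc/passwd
id alice
getent passwd alice
```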
03-06-2019
11:54 PM
The Python and cURL requests should both be hitting the same end-points. Is the same schedule ID used in all three of your tests? It's also possible that the failure is unrelated to the mode of invocation. Could you share more logs of the failed step here?
03-06-2019
11:52 PM
We do not appear to have a document on this, but speaking from experience, almost all of the ports follow a request-response style between clients and servers, with the former making the connections and never the opposite way.
03-06-2019
11:42 PM
1 Kudo
MapReduce jobs can be submitted with ease, as all they mostly require is the correct config on the classpath (such as under src/main/resources for Maven projects). Spark/PySpark relies heavily on its script tooling to submit to a remote cluster, so it is a little more involved to achieve this. IntelliJ IDEA has a remote execution option in its run targets that can be configured to copy over the built jar and invoke an arbitrary command on an edge host. This can perhaps be combined with remote debugging to get an experience comparable to MR. Another option is to use a web-based editor such as CDSW.
03-06-2019
11:34 PM
1 Kudo
The central problem is this:

> 14:51:17.697 [main] WARN org.apache.hadoop.hive.common.LogUtils - hive-site.xml not found on CLASSPATH

For Sqoop to discover your Hive MetaStore service or DB, it needs to be supplied the appropriate configuration. Please try adding your client hive-site.xml to the Oozie workflow lib to allow Sqoop's Hive invocation to discover your existing metastore correctly.
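A minimal sketch of doing that (the workflow application path is a placeholder; your client config location may differ):

```bash
# Copy the client Hive configuration into the workflow's lib/ directory on HDFS
hdfs dfs -put -f /etc/hive/conf/hive-site.xml \
    /user/myuser/workflows/sqoop-hive-wf/lib/hive-site.xml
```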
03-06-2019
09:14 PM
Looks like you identified an environment issue over on the duplicate thread at https://community.cloudera.com/t5/Cloudera-Manager-Installation/SSL-incorrect-Message-Authentication-Code-Error/td-p/86564
03-06-2019
09:10 PM
If the 'bad node' in question has no running agent and has no roles assigned to it, then this API call will help: https://archive.cloudera.com/cm6/6.1.0/generic/jar/cm_api/swagger-html-sdk-docs/python/docs/HostsResourceApi.html#delete_host

Otherwise the process, via APIs, is this (a rough curl sketch follows below):
- Decommission the host and wait for decommission to complete (alternatively, when applicable, just stop all roles on the host directly)
- Delete each of the stopped roles that exist on the host from the CM API, by listing all service roles and filtering by the host reference data in each role
- Use direct/indirect SSH scripting to stop the CM agent process on the host (this is outside of CM API control)
- Delete the host from the CM API
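A very rough curl sketch of those steps (the API version, cluster/service names, role name and host id below are placeholders/assumptions - check the API docs for your CM release):

```bash
CM="https://cm-host.example.com:7183/api/v19"   # adjust version to your CM release
AUTH="admin:admin"                              # placeholder credentials

# 1. Decommission the host (poll the returned command for completion)
curl -u "$AUTH" -X POST -H "Content-Type: application/json" \
     -d '{"items": ["badnode.example.com"]}' "$CM/cm/commands/hostsDecommission"

# 2. List a service's roles; note those whose hostRef matches the bad host
curl -u "$AUTH" "$CM/clusters/Cluster1/services/hdfs/roles"

# 3. Delete each such role
curl -u "$AUTH" -X DELETE "$CM/clusters/Cluster1/services/hdfs/roles/hdfs-DATANODE-abc123"

# 4. Finally delete the host itself (the hostId comes from GET /hosts)
curl -u "$AUTH" -X DELETE "$CM/hosts/badnode-host-id"
```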
03-06-2019
07:52 PM
Were stats available for your source table before you performed the transforming insert? It may be a good idea to run a "COMPUTE STATS default.csv_table;" so that the memory estimates are precise. P.S. Within CM, you can reconfigure the memory limit via the Impala - Configuration - "Impala Daemon Memory Limit" field (search for mem_limit).
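For reference, the stats run through impala-shell (the table name is taken from your question):

```bash
# Recompute table/column stats so the planner's memory estimates are accurate
impala-shell -q "COMPUTE STATS default.csv_table"
```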
03-06-2019
07:30 PM
1 Kudo
> can we deploy the HttpFS role on more than one node? is it best practice?

Yes. The HttpFS service is an end-point for REST API access to HDFS, so you can deploy multiple instances and also consider load balancing them (this might need sticky sessions for data-read paging).

> we can see that new logs are created on opt/hadoop/dfs/nn/current on the actine namenode on node01 but no new files . on the standby namenode no node02 - is it OK ??

Yes, this is normal. The new edit logs are redundantly written into that local directory only by the active NameNode. At all times, the edits are primarily written into the JournalNode directories.
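A sketch of hitting an HttpFS end-point directly (hostname, the default port 14000 and the user are assumptions):

```bash
# List the HDFS root through HttpFS' WebHDFS-compatible REST API
curl "http://httpfs-host.example.com:14000/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"
```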
03-06-2019
06:33 PM
It is not normal to see the file descriptor limit run out, or come close to the limit, unless you have an overload problem of some form. I'd recommend checking via 'lsof' what the major contributor towards the FD count of your RegionServer process is - chances are it is avoidable (a bug, a flawed client, etc.). The number should be proportional to your total region store file count and the number of connecting clients. While the article at https://blog.cloudera.com/blog/2012/03/hbase-hadoop-xceivers/ focuses on DN data transceiver threads in particular, the formulae at the end can be applied similarly to file descriptors in general.
03-06-2019
05:20 PM
The issue appears to crop up when distributing certain configuration files to prepare for installing packages. Could you check or share what the failure is via the log files present under /tmp/scm_prepare_node.*/*?
03-06-2019
04:56 PM
We have a consolidated view of product compatibility/support matrix at https://www.cloudera.com/documentation/enterprise/release-notes/topics/rn_consolidated_pcm.html (the component named sections are what you are after)
03-06-2019
04:51 PM
Adding a new/replacement ZooKeeper Quorum Peer will require you to (roughly, and with downtime involved for simplicity):
- Stop the ZooKeeper service across all hosts
- Install the ZooKeeper packages of the same/similar version as the rest of the peers on the new host
- Reconfigure all ZooKeeper hosts' /etc/zookeeper/conf/zoo.cfg to replace and point at the new member hostname (if there is a change)
- Create an appropriate myid file under the ZK storage directory (typically /var/lib/zookeeper/myid) that matches the ID specified in the config file from the previous step (see the sketch below)
- Restart the ZooKeeper service on all hosts
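A sketch of the zoo.cfg and myid steps (IDs, hostnames and paths are placeholders; the ports shown are the common defaults):

```bash
# On every ZooKeeper host: example ensemble entries for zoo.cfg
# (remove/replace any stale member lines by hand before appending)
cat >> /etc/zookeeper/conf/zoo.cfg <<'EOF'
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3-new.example.com:2888:3888
EOF

# On the new host only: myid must match its server.N entry above
echo 3 > /var/lib/zookeeper/myid
chown zookeeper:zookeeper /var/lib/zookeeper/myid
```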
03-06-2019
04:47 PM
You can build out multiple clusters sharing the same KDC and Realm, as long as their machine hostnames are distinct. A service principal takes the form of USER/HOST@REALM, so this will avoid conflicts. This is also practiced in many environments. In this approach however, users on one cluster will immediately have authentication access to the other cluster, because the KDC Realm is common between the two. If that is not desirable, you'll need to run separate KDCs with distinct Realm names. In the former case (same Realm, multiple clusters), DNS discovery of the Realm would not be a problem as only a single one exists. In the latter case (one Realm per cluster), you'll likely need to make use of explicit [domain_realm] section specifiers in krb5.conf to direct clients to the right KDC for each cluster's service hostnames.
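For the separate-Realm case, a sketch of that krb5.conf mapping (domains and realm names are placeholders; merge by hand if a [domain_realm] section already exists):

```bash
cat >> /etc/krb5.conf <<'EOF'

[domain_realm]
  .cluster-a.example.com = CLUSTERA.EXAMPLE.COM
  .cluster-b.example.com = CLUSTERB.EXAMPLE.COM
EOF
```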
03-06-2019
04:42 PM
Double-check that the variant of OpenJDK you are trying to install is of the x86_64 arch and not i686. Sometimes it is as easy as explicitly specifying it in the install request, like so: "yum install java-1.8.0-openjdk-devel.x86_64", but it depends on your repo vendor.
03-06-2019
04:39 PM
Are you using Cloudera Manager? If you are, do not manually customize contents under the /etc/hive/conf path, as this is a symlink to a command-generated directory that can be redeployed at any point (from UI or API actions on the cluster). Try storing your keytab at a different path. P.S. If you are using CM, it should be managing your keytabs for you, so you can avoid such steps. Is this configuration used to customize the entries of the keytab in a way that CM cannot?
03-06-2019
04:36 PM
The Java-provided 'keytool' utility helps you generate a key/certificate pair and also store it into a Java KeyStore (JKS) container, which is the format the JVMs expect it to be in. This is one reason why the documentation suggests using it, as it reduces steps. You can certainly generate your certificate pair without Java's 'keytool' utility (such as via openssl commands, etc.), and just use the utility to copy the existing certificates into a JKS container file for the JVMs to use. This is equally acceptable. The 'root access' part of your question is a little confusing, so perhaps I've not understood your problem correctly. You do not normally require root-level privileges to create a certificate (although you may need them to alter existing, OS-supplied stores).
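A sketch of that openssl-then-keytool flow (aliases, file names and passwords are placeholders):

```bash
# Generate a self-signed key/certificate pair with openssl (no root needed)
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout server.key -out server.crt \
    -subj "/CN=myhost.example.com"

# Bundle key + cert into a PKCS#12 file, then convert it to a JKS for the JVM
openssl pkcs12 -export -in server.crt -inkey server.key \
    -name myhost -out server.p12 -passout pass:changeit

keytool -importkeystore \
    -srckeystore server.p12 -srcstoretype PKCS12 -srcstorepass changeit \
    -destkeystore server.jks -deststorepass changeit
```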
03-06-2019
04:30 PM
Currently Hive's connections to LDAP do not support the StartTLS extension [1]. This does make sense as a feature request, however - could you log your request over at https://issues.apache.org/jira/projects/HIVE please? [1] - https://github.com/apache/hive/blob/master/service/src/java/org/apache/hive/service/auth/ldap/LdapSearchFactory.java#L52-L62
03-05-2019
09:46 PM
1 Kudo
> Clear Cache
> This is the one I am not too sure what happens

It appears to clear the cached entries within the Hue frontend, so the metadata for the assist and views is loaded again from its source (Impala, etc.). I don't see it calling a refresh on the tables, but it is possible I missed some implicit action.

> Perform Incremental metadata Update
> I assume this issues a refresh command for all tables within the current database which is been viewed? If no database is veiwed does it do it for everything?

This will compare the HMS listing against Impala's for the DB in context and run a specific "INVALIDATE METADATA [[db][.table]];" for the ones missing in Impala. Yes, if no DB is in the context, it will equate to running a global "INVALIDATE METADATA;"

> Invalidate All metadata and rebuild index

This runs a plain "INVALIDATE METADATA;"