About Arun-

Arun- · 11-01-2018

Below are some FAQs to help you quickly find important information for a DPS-DLM deployment.

Prerequisites for DPS and DLM
DB version: PostgreSQL 9.3 to 9.6
OS: RHEL 7.0 and above
Ambari: 2.6.2
HDP: 2.6.5
DistCp should work between the source and target clusters.
The beacon user must be created in AD; the DLM 1.1 release has no option to use a custom user.
The onboarded service user for your application must exist in AD and must resolve (id <username>) on both the source and target clusters.
Docker: check the installed version with docker version.
The required ports must be open between the source and target clusters, and for access to the DPS UI.

Where do I install the DLM and DPS software components?
The DLM Engine must be installed as an Ambari management pack (mpack) on both clusters, using Ambari Server.
The DLM App is a dockerized container and must be installed on the DPS host.

Which URL should be given to register a cluster in the DPS UI?
The Knox-integrated Ambari URL: http://<>:8443

I'm unable to see the DLM icon in the DPS UI after enabling the DLM component in DPS.
The user needs to be part of the Infra-admin role.

Verify the DLM Engine install
Verify that beacon was added as a user to the HDFS superuser group:
hdfs groups beacon
The output should display hdfs (or the value of the dfs.permissions.superusergroup config) as one of the groups.
The beacon user should also be part of the Ranger policies.
https://docs.hortonworks.com/HDPDocuments/DLM1/DLM-1.2.0/installation/content/dlm_verify_the_dlm_engine_installation.html

Most-used commands for troubleshooting
On the DPS host, use the following commands:
docker ps   (check ports, containers and uptime)
docker images
docker exec -it <docker-name> <command>
docker exec -it 029ec380bb3d /bin/ls -alrt /usr/dp-app/
docker logs --follow dp-app
docker exec -it d6390b6c0c50 /bin/ls -alrt /usr/dp-app/

Required machine configuration for DPS and DLM
DPS runs on a separate machine, which runs all the Docker containers. A master-node configuration is recommended for this host, with at least 64 GB of memory. If you are using an external database on the same host, consider more memory and CPU.

For Hive replication, Beacon auto-creates a deny policy in Ranger on the target cluster. Is this expected behavior or a bug in DLM 1.1?
It is expected. The deny policy prevents any writes from happening outside of replication to the target database, and it applies only to the replication target database.

For Hive replication, can we schedule a job per table?
No. In the current DLM 1.1 release, only database-level replication is supported.

Please upvote if it's helpful.
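The "Verify the DLM Engine install" check can be scripted. The Python sketch below parses the output of hdfs groups beacon and reports whether beacon is in the HDFS superuser group. It assumes the usual "user : group1 group2" output line and the default superuser group name hdfs; pass the value of dfs.permissions.superusergroup if your cluster uses a different one. The function names are hypothetical.

```python
import subprocess


def parse_hdfs_groups(output):
    """Map each user to its groups from `hdfs groups` output.

    Assumes lines shaped like 'beacon : hadoop hdfs' (user, ' :', groups).
    Lines without ' :' (for example log warnings) are ignored.
    """
    result = {}
    for line in output.splitlines():
        user, sep, groups = line.partition(" :")
        user = user.strip()
        if sep and user and " " not in user:
            result[user] = groups.split()
    return result


def in_superuser_group(output, user="beacon", superusergroup="hdfs"):
    """True if `user` lists `superusergroup` among its groups."""
    return superusergroup in parse_hdfs_groups(output).get(user, [])


def check_beacon(superusergroup="hdfs"):
    """Run `hdfs groups beacon` (needs a configured HDFS client on PATH)."""
    out = subprocess.run(["hdfs", "groups", "beacon"], capture_output=True,
                         text=True, check=True).stdout
    return in_superuser_group(out, "beacon", superusergroup)
```

check_beacon() has to run on a cluster node with the HDFS client configured; the parsing functions can be exercised offline with captured output.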

Arun- · 08-27-2018

Q1. Does Hive LLAP support stored procedures?
UDFs: https://community.hortonworks.com/articles/117833/creating-custom-udf-and-adding-udf-jar-to-hive-lla.html

Question on handling the small files problem
If ACID tables are not used, how do we handle the small files problem in Hive? Is there an archival process to follow, such as creating HAR files?
Alter Table/Partition Concatenate
Version information: In Hive 0.8.0, RCFile added support for fast block-level merging of small RCFiles using the CONCATENATE command. In Hive 0.14.0, ORC files added support for fast stripe-level merging of small ORC files using the CONCATENATE command.
ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] CONCATENATE;
If the table or partition contains many small RCFiles or ORC files, the above command merges them into larger files. For RCFile the merge happens at block level, whereas for ORC files it happens at stripe level, thereby avoiding the overhead of decompressing and decoding the data.

Question on mutations
If we need to apply a thousand mutations, would this be a thousand operations rather than one bulk operation?
Please refer to the lock manager section: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_data-access/content/lock-manager.html

Question on cache eviction
Can LLAP be used to read more data than can fit into memory?
Yes. LLAP has an eviction policy, and it stores the data in compressed format.

Question on data transfer
Specific example: if a "select *" is performed on a very large table, can the application receive that data as a stream, or does some component (LLAP, HiveServer2, etc.) need to hold the entire dataset in memory?
All the results are written to HDFS and streamed from there, so there is no memory constraint.

Question on query results
Related: does LLAP send results back as they become available (like HBase scan results) or only once the query completes?
It returns the results once the SQL query completes.

Question on compaction
We may benefit from Hive's ACID feature to handle "deltas". The advantages seem to be:
• It would allow the updated data to be available in queries before a compaction has taken place.
  "You can update the data. Compaction should be transparent."
• A compaction implementation already exists, so there is no need for a bespoke implementation.
  Hive has a built-in compaction technique (major and minor).

Question on Spark and Hive LLAP integration
• Can LLAP be leveraged to serve data to Spark jobs efficiently? That is, can LLAP inform Spark about the partitioning of the data it will provide, or is it a very coarse, plain JDBC interface?
LLAP Spark Context is in Technical Preview (TP).

Question on the cache eviction algorithm
It seems the Hive Metastore does not cache much data, which means each metadata query, including statistics, goes through the DataNucleus ORM layer. Is this correct?
LLAP has a metadata cache.
Caching: The daemon caches metadata for input files, as well as the data. The metadata and index information can be cached even for data that is not currently cached. Metadata is stored in process in Java objects; cached data is stored in the format described in the I/O section and kept off-heap (see Resource management).
Eviction policy: The eviction policy is tuned for analytical workloads with frequent (partial) table scans. Initially, a simple policy like LRFU is used. The policy is pluggable.
Caching granularity: Column chunks are the unit of data in the cache. This achieves a compromise between low-overhead processing and storage efficiency. The granularity of the chunks depends on the particular file format and execution engine (Vectorized Row Batch size, ORC stripe, etc.).
A bloom filter is automatically created to provide Dynamic Runtime Filtering.

Question on running Hive LLAP on specific nodes
In Ambari, how do I specify that the LLAP daemons run on specific nodes?
Run Hive LLAP on specific nodes using YARN node labels: https://community.hortonworks.com/content/kbentry/170868/running-llap-on-specific-nodes-using-yarn-node-lab.html

How quickly will I know when LLAP query execution fails?
If hive.llap.execution.mode=only is set, a query that cannot run in LLAP fails immediately, before it is submitted to LLAP.

How do I cancel LLAP queries that are in RUNNING state?
Check YARN and see whether LLAP has enough containers allocated:
1. yarn top
2. yarn application -kill <appid>

How do I modify the LLAP log options?
There is a flag in the Hive config section of the Ambari UI.
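The caching notes above say LLAP's pluggable eviction policy starts with something like LRFU (Least Recently/Frequently Used). To show what that policy means, here is a minimal, generic Python sketch of the classic LRFU scheme. It is not LLAP's implementation (LLAP caches off-heap column chunks in Java); the class name, the `lam` parameter name, and the logical clock are assumptions made for illustration.

```python
class LRFUCache:
    """Generic LRFU cache sketch.

    Each key keeps a CRF (combined recency and frequency) score. On every
    access the old score is decayed by F(age) = 0.5 ** (lam * age) and 1 is
    added. The key with the lowest decayed score is evicted. lam = 0 gives
    LFU behavior; lam = 1 gives LRU behavior.
    """

    def __init__(self, capacity, lam=0.5):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.lam = lam
        self.clock = 0    # logical time, advanced on every access
        self.values = {}
        self.crf = {}     # CRF score as of the key's last access
        self.last = {}    # logical time of the key's last access

    def _decayed(self, key):
        """CRF of `key` decayed to the current logical time."""
        age = self.clock - self.last[key]
        return self.crf[key] * 0.5 ** (self.lam * age)

    def _touch(self, key):
        self.clock += 1
        previous = self._decayed(key) if key in self.crf else 0.0
        self.crf[key] = 1.0 + previous
        self.last[key] = self.clock

    def get(self, key):
        if key not in self.values:
            return None
        self._touch(key)
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            victim = min(self.values, key=self._decayed)
            for table in (self.values, self.crf, self.last):
                del table[victim]
        self.values[key] = value
        self._touch(key)
```

With a capacity of 2, reading "a" three times and then inserting "b" and "c" evicts "b" when lam=0.0 (frequency wins) but evicts "a" when lam=1.0 (recency wins). Values in between trade off the two, which is why LRFU suits repeated partial scans better than plain LRU.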

Arun- · 08-27-2018

HIVE LLAP - a one-page architecture overview
https://community.hortonworks.com/articles/149894/llap-a-one-page-architecture-overview.html

Hive - Understanding concurrent sessions + queue allocation + preemption
https://community.hortonworks.com/articles/56636/hive-understanding-concurrent-sessions-queue-alloc.html

Hive LLAP Dashboards
https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-operations/content/grafana_hive_llap_dashboards.html

Hive LLAP Logs info
https://community.hortonworks.com/articles/149896/llap-debugging-overview-logs-uis-etc.html

Monitoring LLAP metrics
http://www.kartikramalingam.com/hive-llap/

Debugging Hive LLAP Query
https://community.hortonworks.com/articles/149896/llap-debugging-overview-logs-uis-etc.html

Question on Hive LLAP benchmarks: are there any published Hive LLAP benchmarks?
https://hortonworks.com/blog/3x-faster-interactive-query-hive-llap/

LLAP Tuning: here is an excellent article on LLAP sizing and setup.
https://community.hortonworks.com/articles/149486/llap-sizing-and-setup.html

Arun- · 10-12-2017

GetTCP: Connects over TCP to the provided endpoint(s). Received data is written as content to the FlowFile.
ListenTCP: Listens for incoming TCP connections and reads data from each connection, using a line separator as the message demarcator.
Ref: https://nifi.apache.org/docs.html
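The key difference is direction: GetTCP opens the connection to a remote endpoint, while ListenTCP accepts connections and splits each connection's byte stream into messages at a line separator. The Python sketch below illustrates only that demarcation idea on the receiving side of one connection. It is not NiFi code; the function names are hypothetical, and emitting trailing bytes that have no separator as a final message is a choice of this sketch, not documented ListenTCP behavior.

```python
import socket


def demarcate_lines(chunks, separator=b"\n"):
    """Yield one message per separator-terminated record in a byte stream.

    `chunks` is any iterable of bytes (e.g. successive recv() results), and a
    record may be split across chunks. Bytes left after the last separator
    are yielded as a final message when the stream ends (sketch's choice).
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while True:
            index = buffer.find(separator)
            if index < 0:
                break
            yield buffer[:index]
            buffer = buffer[index + len(separator):]
    if buffer:
        yield buffer


def recv_chunks(sock, size=4):
    """Read from a connected socket until the peer closes it.

    A tiny read size is used on purpose so records get split across reads.
    """
    while True:
        data = sock.recv(size)
        if not data:
            return
        yield data


if __name__ == "__main__":
    # A connected socket pair stands in for one accepted TCP connection.
    receiver, sender = socket.socketpair()
    sender.sendall(b"event-1\nevent-2\nevent-3\n")
    sender.close()
    for message in demarcate_lines(recv_chunks(receiver)):
        print(message.decode())
    receiver.close()
```

Buffering across reads is the important part: TCP delivers a byte stream, not messages, so a record can arrive in several recv() calls and must be reassembled before the separator is found.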

Arun- · 09-30-2017

Thanks to @Matt Clarke for resolving this major issue.

In a typical customer environment, deploying an HDF cluster and enabling LDAPS authentication can be challenging because of username case.

In Active Directory the user ID exists in upper case (for example, employee ID X1122), but when I imported users into Ranger with lowercase=true, all imported users were displayed in lower case (x1122).

I created all the required policies for Kafka and NiFi. The Kafka smoke tests PASSED, but the NiFi smoke tests FAILED, because NiFi uses the value returned by AD (X1122) as-is and has no built-in case conversion. All the NiFi Ranger policies have the lowercase user ID (x1122), so they did not apply in this scenario, the Ranger NiFi plugin authorization did not work correctly, and NiFi Ranger authorization failed for the "View the NiFi UI" (/flow) Ranger policy.

NiFi does not have an option to change the case of the returned results, but the ldap-provider has two configuration options for "Identity Strategy":
1. USE_DN (default): uses the user's complete DN returned by LDAP upon successful authentication for authorization.
2. USE_USERNAME: uses the username as typed on the login screen for authorization upon successful authentication with LDAP.

Whichever method of authentication is used, the value selected above is passed through the identity mapping patterns configured in NiFi, and the result is sent to the configured authorizer, which in this case is Ranger.

We resolved this issue by using "USE_USERNAME", so as long as the user logs in using all lowercase, it works.

We also changed the user search filter to the following (in the XML file, & is written as &amp;):
<property name="User Search Filter">(&amp;(sAMAccountName={0})(memberOf=CN=hwx,OU=Groups,OU=Global,OU=XX,DC=XX,DC=XX))</property>
and set the proper search base:
<property name="User Search Base">OU=Users,OU=XX,DC=XX,DC=XX</property>
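To make the two moving parts concrete, the Python sketch below shows (a) how a {0} placeholder in a search filter like the one above is filled with the login name, escaping the characters RFC 4515 reserves in filter values, and (b) which identity is handed on under each Identity Strategy. It is an illustration under assumptions, not NiFi's implementation; the function names and the sample DN are hypothetical.

```python
# Characters RFC 4515 requires to be escaped inside a filter value.
_RFC4515_SPECIAL = {"*", "(", ")", "\\", "\x00"}


def ldap_escape(value):
    """Escape a value for use inside an LDAP search filter (RFC 4515)."""
    return "".join("\\%02x" % ord(ch) if ch in _RFC4515_SPECIAL else ch
                   for ch in value)


def build_user_filter(template, username):
    """Fill the {0} placeholder of a filter template with the escaped login."""
    return template.replace("{0}", ldap_escape(username))


def identity_for_authorizer(strategy, typed_username, user_dn):
    """Identity passed on after a successful LDAP bind, before any NiFi
    identity mapping patterns are applied."""
    if strategy == "USE_USERNAME":
        return typed_username  # exactly as typed on the login screen
    if strategy == "USE_DN":
        return user_dn         # full DN returned by LDAP
    raise ValueError("unknown Identity Strategy: " + strategy)


if __name__ == "__main__":
    template = ("(&(sAMAccountName={0})(memberOf=CN=hwx,OU=Groups,"
                "OU=Global,OU=XX,DC=XX,DC=XX))")
    print(build_user_filter(template, "x1122"))
    dn = "CN=X1122,OU=Users,OU=XX,DC=XX,DC=XX"  # hypothetical DN
    for strategy in ("USE_DN", "USE_USERNAME"):
        print(strategy, "->", identity_for_authorizer(strategy, "x1122", dn))
```

This is why the lowercase login works: Active Directory matches sAMAccountName case-insensitively, so "x1122" still finds the X1122 account, and with USE_USERNAME the lowercase string typed by the user is what reaches Ranger, matching the lowercased Ranger users.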

Arun- · 09-16-2017

HIVE Beeline:
============
Binary mode
!connect 'jdbc:hive2://prod07.app.hwx.com:10000/;transportMode=binary'

HTTP mode
beeline -u 'jdbc:hive2://prod07.app.hwx.com:10001/;transportMode=http;httpPath=cliservice'

HS2 HA environment with ZooKeeper auto-discovery mode
!connect jdbc:hive2://prod09.app.hwx.com:2181,prod10.app.hwx.com:2181,prod11.app.hwx.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2

Kerberos environment
!connect 'jdbc:hive2://prod07.app.hwx.com:10001/default;principal=hive/prod07.app.hwx.com@EXAMPLE.COM;transportMode=http;httpPath=cliservice'

Knox with Beeline
!connect jdbc:hive2://knox101.app.hwx.com:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true

Knox with WebHDFS
curl -iku raj_ops -X GET 'https://knox101.app.hwx.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS'

If this article helps you, please upvote it.
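When scripting these smoke tests, it can help to assemble the URLs from parts. The Python sketch below follows the jdbc:hive2://host:port[,host:port...]/database;key=value;... shape used in the URLs above and shell-quotes the result for beeline -u, since an unquoted ; would end the shell command. The function names are hypothetical; the host names are the ones from this post.

```python
import shlex


def hive_jdbc_url(hosts, database="", **params):
    """Build a HiveServer2 JDBC URL such as
    jdbc:hive2://h1:2181,h2:2181/db;key1=value1;key2=value2
    Parameters are appended in the order they are passed."""
    if not hosts:
        raise ValueError("at least one host:port is required")
    url = "jdbc:hive2://" + ",".join(hosts) + "/" + database
    for key, value in params.items():
        url += ";{}={}".format(key, value)
    return url


def beeline_command(url):
    """Shell-quote the URL so the ';' separators reach beeline intact."""
    return "beeline -u " + shlex.quote(url)


if __name__ == "__main__":
    zk = ["prod09.app.hwx.com:2181", "prod10.app.hwx.com:2181",
          "prod11.app.hwx.com:2181"]
    print(beeline_command(hive_jdbc_url(
        zk, serviceDiscoveryMode="zooKeeper", zooKeeperNamespace="hiveserver2")))
    print(beeline_command(hive_jdbc_url(
        ["prod07.app.hwx.com:10001"], "default",
        principal="hive/prod07.app.hwx.com@EXAMPLE.COM",
        transportMode="http", httpPath="cliservice")))
```

Values are passed as strings (for example ssl="true") so they appear in the URL exactly as Hive expects, rather than as Python's True.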

Arun- · 04-27-2017

Below are the implementation steps for integrating Knox with a load balancer, assuming your load balancer is already set up.

SSL termination: the SSL connection terminates at the load balancer, which internally re-encrypts the traffic and forwards it to one of the Knox servers.
Sticky sessions should be enabled.

JKS file creation
Open the load balancer URL in Internet Explorer, for example https://hadoop-knox.dev.XXXXXX.com/
Click the lock symbol, click View certificates, and open the Certification Path tab.
Choose the root CA, click View Certificate > Details > Copy to File, select Base-64 encoded X.509 format, and save it as a .pem file.
Choose the intermediate issuer CA, click View Certificate > Details > Copy to File, select Base-64 encoded X.509 format, and save it as a .pem file.
Choose the load balancer certificate, click Details > Copy to File, select Base-64 encoded X.509 format, and save it as a .pem file.
Copy these 3 files to the Knox edge node. I copied them to a certfiles folder:
/tmp/knoxhacerts/new/certfiles/lb-rootca.pem
/tmp/knoxhacerts/new/certfiles/lb-intermediate-issuer.pem
/tmp/knoxhacerts/new/certfiles/hadoop-knox-dev-lb.pem

Create a new JKS file from the Knox gateway keystore:
cp /usr/hdp/current/knox-server/data/security/keystores/gateway.jks /tmp/knoxhacerts/dev-knox-test-1.jks
keytool -storepasswd -keystore /tmp/knoxhacerts/dev-knox-test-1.jks
Enter the current password (the Knox master secret), then set a new password.

Import all the PEM-encoded files into this JKS file:
keytool -import -alias rootca-lb -keystore /tmp/knoxhacerts/dev-knox-test-1.jks -file /tmp/knoxhacerts/new/certfiles/lb-rootca.pem
keytool -import -alias intca-lb -keystore /tmp/knoxhacerts/dev-knox-test-1.jks -file /tmp/knoxhacerts/new/certfiles/lb-intermediate-issuer.pem
keytool -import -alias dev-lb -keystore /tmp/knoxhacerts/dev-knox-test-1.jks -file /tmp/knoxhacerts/new/certfiles/hadoop-knox-dev-lb.pem

CA cert chain for ODBC
Open each of the files below in a text editor and copy their contents into one merged chain file (merge-cacertchain.crt):
/tmp/knoxhacerts/new/certfiles/lb-rootca.pem
/tmp/knoxhacerts/new/certfiles/lb-intermediate-issuer.pem
/tmp/knoxhacerts/new/certfiles/hadoop-knox-dev-lb.pem

Verification step
Use SSLPoke to verify connectivity. The Java class SSLPoke shows whether your truststore contains the right certificates: it connects to an SSL service, sends a byte of input, and reports the result.
Download SSLPoke.class (https://confluence.atlassian.com/kb/files/779355358/779355357/1/1441897666313/SSLPoke.class). If you have the SSLPoke.java source instead, compile it with javac SSLPoke.java.
Run the class as shown below, changing the host and port appropriately:
<JAVA_HOME>/bin/java SSLPoke jira.example.com 443

Failed scenario: a failed connection produces output like this:
/usr/bin/java SSLPoke jira.example.com 443
sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

Happy path:
devenap02.dev.abc.net# java -Djavax.net.ssl.trustStore=/tmp/knoxhacerts/dev-knox-test-1.jks SSLPoke hadoop-knox.dev.XXXXXX.com 443
Successfully connected

Please upvote if this article helps.
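The manual copy-and-paste into merge-cacertchain.crt can also be done with a short script. The Python sketch below concatenates the PEM certificate blocks from the three files in the order listed above and fails if a file contains no certificate. It reads the files in text mode, whose universal-newline handling turns CRLF endings from the Windows export into LF. The function name and the output location (the current directory) are assumptions; check the certificate order your ODBC driver expects.

```python
import re
from pathlib import Path

# One PEM certificate block (non-greedy, spans lines).
CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL)


def merge_pem_chain(inputs, output):
    """Write every certificate block from `inputs`, in order, to `output`.

    Returns the number of certificates written.
    """
    blocks = []
    for path in inputs:
        # read_text() uses universal newlines, so CRLF input becomes LF.
        found = CERT_RE.findall(Path(path).read_text())
        if not found:
            raise ValueError("{}: no PEM certificate found".format(path))
        blocks.extend(found)
    Path(output).write_text("\n".join(blocks) + "\n")
    return len(blocks)


if __name__ == "__main__":
    base = "/tmp/knoxhacerts/new/certfiles/"
    files = [base + "lb-rootca.pem", base + "lb-intermediate-issuer.pem",
             base + "hadoop-knox-dev-lb.pem"]
    if all(Path(f).exists() for f in files):
        count = merge_pem_chain(files, "merge-cacertchain.crt")
        print("merged", count, "certificates into merge-cacertchain.crt")
```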

Arun- · 04-27-2017

To verify the certificate contents of the Knox server, run:
openssl s_client -showcerts -connect 127.0.0.1:8443

If developers want to connect to Knox with SSL enabled, copy the certificate contents from the output of the command above into a knox.crt file and import it into a truststore:
keytool -import -keystore myLocalTrustStore.jks -file knox.crt

Developers can then connect as follows:
beeline> !connect "jdbc:hive2://hadoop-knox.dev.XXXX.com:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/tmp/knoxhacerts/new/myLocalTrustStore.jks;trustStorePassword=knoxdev"

Hive JDBC URL template:
jdbc:hive2://{gateway-host}:{gateway-port}/;ssl=true;sslTrustStore={gateway-trust-store-path};trustStorePassword={gateway-trust-store-password};transportMode=http;httpPath={gateway-path}/{cluster-name}/hive

To list the certificates imported into a JKS file, run:
keytool -v -list -keystore gateway.jks

To create a new truststore, myNewTrustStore.jks:
keytool -import -alias knox -keystore ./myNewTrustStore.jks -file ./knox-cert.pem
Here knox-cert.pem is the knox.crt certificate saved in PEM format.

To change the SSL certificate for Knox, see:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_Security_Guide/content/knox_ca_signed_certificates_production.html

Please upvote if this article helps.
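Copying the certificate out of the openssl s_client -showcerts output by hand is error-prone. The Python sketch below pulls every PEM block from saved output and writes the first one, which is the certificate the server presents for itself, to knox.crt. It assumes you saved the output first, for example with openssl s_client -showcerts -connect 127.0.0.1:8443 < /dev/null > showcerts.txt; the file names and function names are hypothetical.

```python
import re
import sys
from pathlib import Path

# A PEM certificate block as printed by openssl (non-greedy, spans lines).
CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL)


def certs_from_showcerts(output):
    """Return the PEM certificates in the order openssl printed them.

    TLS servers send their own certificate first, so index 0 is the
    Knox server certificate and later entries are issuing CAs (if sent).
    """
    return [m.group(0) + "\n" for m in CERT_RE.finditer(output)]


def main(showcerts_file, out_file="knox.crt"):
    certs = certs_from_showcerts(Path(showcerts_file).read_text())
    if not certs:
        sys.exit("no certificate found in " + showcerts_file)
    Path(out_file).write_text(certs[0])
    print("wrote {} ({} certificate(s) in the output)".format(out_file, len(certs)))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Run it as python extract_knox_cert.py showcerts.txt, then import the resulting knox.crt with the keytool -import command above.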

Arun- · 03-13-2017

@suresh krish Follow the steps mentioned here:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_security/content/_optional_install_a_new_mit_kdc.html
Please also refer to this article:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_security/content/_kerberos_overview.html

Arun- · 02-03-2017

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_Security_Guide/content/ranger_rest_api_create_policy.html

Last Visited	08-31-2023 12:09 PM

Member Since	07-24-2019 09:28 AM
Posts	46
Kudos received	31

Cloudera Community

Re: Error Trying to get Basic Pig Syntax Running

Re: Hive Query does not run

Re: How to Set JAVA System Properties in OOZIE JAV...

Re: ClassNotFoundException: org.apache.oozie.clien...

Re: DISTCP fails from CHD4.2(Non HA+Non-secure) t...

DPS DLM FAQ's

Common LLAP questions answered

LLAP Cheat Sheet

Re: not able to understand difference between GetT...

How to handle username case conversion while Integ...

Smoke Tests for all HIVE connection modes

Integrating KNOX with LoadBalancer

KNOX Troubleshooting

Re: kerbores not started

Re: Is there a way to export ranger policies from ...