Member since 07-24-2019. 46 Posts, 31 Kudos Received, 5 Solutions.
11-01-2018
09:02 PM
2 Kudos
Below are some FAQs to help you quickly find key information for a DPS-DLM deployment.

Prerequisites for DPS and DLM
DB version - Postgres 9.3 to 9.6
OS - RHEL 7.0 and above
Ambari - 2.6.2
HDP - 2.6.5
DistCp should work between the source and target clusters.
The beacon user must be created in AD; this DLM 1.1 release does not support using a custom user.
The onboarded service user for your application must exist in AD and must resolve (id <username>) on both the source and target clusters.
docker version
Required ports must be open between the source and target clusters, and also for access to the DPS UI.
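The port prerequisite above can be checked before installation. Below is a minimal sketch, assuming plain TCP reachability is a good enough signal; the function names are illustrative, and the hosts and ports you pass in must come from your own environment, since this post does not list them.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_ports(hosts, ports, timeout: float = 3.0):
    """Map every (host, port) pair to a reachability flag."""
    return {(h, p): port_is_open(h, p, timeout) for h in hosts for p in ports}
```

Run it from the source cluster against the target cluster hosts (and the other way round), and from a client machine against the DPS host, then look for any pair mapped to False.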

Where should the DLM and DPS software components be installed?
The DLM Engine must be installed as a management pack (mpack) on both clusters using the Ambari server.
The DLM app is a Dockerized container that must be installed on the DPS host.

Which URL must be given to register a cluster in the DPS UI?
The Ambari URL integrated with Knox: http://<>:8443

I'm unable to see the DLM icon in the DPS UI after enabling the DLM component in DPS.
The user must be part of the Infra-admin role.
Verify the DLM Engine installation.
Verify that beacon was added as a user to the HDFS superuser group:
hdfs groups beacon
The output should display hdfs (or the value of the dfs.permissions.superusergroup config) as one of the groups.
The beacon user should be part of the Ranger policies.
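
The superuser-group check above can be scripted. This is a minimal sketch that parses captured output of `hdfs groups beacon`, assuming the output has the form `user : group1 group2 ...`; the function name and the default group value `hdfs` are illustrative, so pass your own value of dfs.permissions.superusergroup if it differs.

```python
def beacon_in_superuser_group(groups_output: str,
                              user: str = "beacon",
                              superuser_group: str = "hdfs") -> bool:
    """Check whether `user` is listed in `superuser_group`.

    `groups_output` is the text printed by `hdfs groups <user>`,
    assumed to look like "beacon : beacon hdfs hadoop".
    """
    for line in groups_output.splitlines():
        name, sep, groups = line.partition(":")
        if sep and name.strip() == user:
            return superuser_group in groups.split()
    return False
```

A False result means the beacon user still needs to be added to the superuser group before the DLM icon check is repeated.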
DLM Engine verification reference: https://docs.hortonworks.com/HDPDocuments/DLM1/DLM-1.2.0/installation/content/dlm_verify_the_dlm_engine_installation.html

Commonly used commands for troubleshooting
On the DPS host, use the commands below:
docker ps    # check ports, containers, and uptime
docker images
docker exec -it <docker-name> <command>
docker exec -it 029ec380bb3d /bin/ls -alrt /usr/dp-app/
docker logs --follow dp-app
docker exec -it d6390b6c0c50 /bin/ls -alrt /usr/dp-app/

Required machine config for DPS and DLM
DPS runs on a separate machine, which runs all the Docker containers.
A master-node configuration is recommended for this host, with at least 64 GB of memory.
If you are using an external database on the same host, consider more memory and CPU.

For Hive replication, Beacon automatically creates a deny policy in Ranger on the target cluster. Is this expected behavior or a bug in DLM 1.1?
This is to prevent any writes to the target database from happening outside of replication.
The deny policy applies only to the replication target database.

For Hive replication, can we schedule jobs on a per-table basis?
No, in the current DLM 1.1 release only database-level replication is supported.

Please upvote if this is helpful.
08-27-2018
07:46 PM
6 Kudos
Q1. Does Hive LLAP support stored procedures?
UDFs: https://community.hortonworks.com/articles/117833/creating-custom-udf-and-adding-udf-jar-to-hive-lla.html

Question on handling the small files problem
If ACID tables are not used, how do we handle the small files problem in Hive? Is there an archival process to follow, such as creating HAR files?
Alter Table/Partition Concatenate
Version information
In Hive release 0.8.0, RCFile added support for fast block-level merging of small RCFiles using the concatenate command. In Hive release 0.14.0, ORC files added support for fast stripe-level merging of small ORC files using the concatenate command.
ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] CONCATENATE;
If the table or partition contains many small RCFiles or ORC files, the above command merges them into larger files. For RCFile the merge happens at block level, whereas for ORC files it happens at stripe level, thereby avoiding the overhead of decompressing and decoding the data.

Question on mutations
So if we need to apply a thousand mutations, this would be a thousand operations rather than one bulk operation.
Please refer to the lock section: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_data-access/content/lock-manager.html

Question on cache eviction
Can LLAP be used to read more data than can fit into memory?
Yes, it has an eviction policy and stores the data in compressed format.

Question on data transfer
As a specific example, if a “select *” is performed on a very large table, can the application receive that data as a “stream”, or does some component (LLAP, HiveServer2, etc.) need to hold the entire dataset in memory?
All the results are streamed to HDFS and are then streamed from there, so there is no memory constraint.

Question on query result
Related: does LLAP send results back as they become available (like HBase scan results), or only once the query completes?
It returns the results once the SQL completes.

Question on compaction
We may benefit from Hive’s ACID feature to handle “deltas”. The advantages seem to be:
• It would allow the updated data to be available in queries before a compaction has taken place. “You can update the data. Compaction should be transparent.”
• A compaction implementation already exists, so there is no need for a bespoke implementation. Hive has a built-in compaction technique [major and minor].

Question on Spark and Hive LLAP integration
• Can LLAP be leveraged to serve data to Spark jobs efficiently? I.e., can LLAP inform Spark about the partitioning of the data it will provide? Or is it a very coarse, plain JDBC interface?
The LLAP Spark context is in Tech Preview (TP).

Question on cache eviction algorithm
It seems the Hive Metastore does not cache much data, which means each query for metadata, including statistics, goes through the “datanucleus” ORM layer. Is this correct?
LLAP has a metadata cache.

Caching. The daemon caches metadata for input files, as well as the data. The metadata and index information can be cached even for data that is not currently cached. Metadata is stored in-process in Java objects; cached data is stored in the format described in the I/O section, and kept off-heap (see Resource management).
Eviction policy. The eviction policy is tuned for analytical workloads with frequent (partial) table scans. Initially, a simple policy like LRFU is used. The policy is pluggable.
Caching granularity. Column chunks are the unit of data in the cache. This achieves a compromise between low-overhead processing and storage efficiency. The granularity of the chunks depends on the particular file format and execution engine (Vectorized Row Batch size, ORC stripe, etc.).
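
To make the LRFU idea mentioned above concrete, here is a small generic sketch of an LRFU (Least Recently/Frequently Used) cache. It is not LLAP's implementation; the class name, the logical clock, and the `lam` parameter are illustrative assumptions. Each entry keeps a combined recency-and-frequency score that grows by 1 on every access and decays by a factor of 0.5 ** (lam * age) as it ages; `lam = 0` behaves like LFU, while a large `lam` behaves like LRU.

```python
class LRFUCache:
    """Toy LRFU cache: evicts the entry with the lowest decayed score."""

    def __init__(self, capacity: int, lam: float = 0.1):
        self.capacity = capacity
        self.lam = lam          # 0 -> pure frequency (LFU); large -> recency (LRU)
        self.clock = 0          # logical time, advanced on every get/put
        self.entries = {}       # key -> [value, score, last_access_time]

    def _decay(self, age: int) -> float:
        return 0.5 ** (self.lam * age)

    def _score_now(self, key) -> float:
        _, score, last = self.entries[key]
        return score * self._decay(self.clock - last)

    def _touch(self, entry) -> None:
        # New score = 1 for this access + the old score decayed to the present.
        entry[1] = 1.0 + entry[1] * self._decay(self.clock - entry[2])
        entry[2] = self.clock

    def get(self, key):
        self.clock += 1
        entry = self.entries.get(key)
        if entry is None:
            return None
        self._touch(entry)
        return entry[0]

    def put(self, key, value) -> None:
        self.clock += 1
        if key in self.entries:
            entry = self.entries[key]
            entry[0] = value
            self._touch(entry)
            return
        if len(self.entries) >= self.capacity:
            victim = min(self.entries, key=self._score_now)
            del self.entries[victim]
        self.entries[key] = [value, 1.0, self.clock]
```

For scan-heavy analytical workloads, blending frequency into the score keeps a single large scan from flushing chunks that are read repeatedly, which is harder to achieve with plain LRU.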
A bloom filter is automatically created to provide dynamic runtime filtering.

Question on running Hive LLAP on specific nodes
In Ambari, how do we specify that the LLAP daemon should run on specific nodes?
Running Hive LLAP on specific nodes using YARN node labels:
https://community.hortonworks.com/content/kbentry/170868/running-llap-on-specific-nodes-using-yarn-node-lab.html

How quickly will I know that an LLAP query execution will fail?
If the execution mode is set to hive.llap.execution.mode=only, the query fails immediately, before it is submitted to LLAP.

How do we cancel LLAP queries that are in the RUNNING state?
Check YARN and see whether LLAP has enough containers allocated.
1. yarn top
2. yarn application -kill <appid>

How do we modify LLAP log options?
There is a flag in the Hive config section of the Ambari UI.
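
For the small-files question earlier in this post, the ALTER TABLE ... CONCATENATE statement can be generated per table or per partition. This is a minimal sketch that only builds the statement text from the syntax quoted in that answer; the function name is illustrative, and it assumes trusted table and partition names (no escaping is done).

```python
def concatenate_statement(table: str, partition: dict = None) -> str:
    """Build an ALTER TABLE ... CONCATENATE statement.

    `partition` maps partition keys to values, e.g. {"ds": "2018-08-27"}.
    Inputs are assumed to be trusted identifiers and literals.
    """
    stmt = f"ALTER TABLE {table}"
    if partition:
        spec = ", ".join(f"{key} = '{value}'" for key, value in partition.items())
        stmt += f" PARTITION ({spec})"
    return stmt + " CONCATENATE;"
```

The resulting strings can be written to a file and run with beeline -f, one statement per partition that has accumulated small files.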
08-27-2018
07:36 PM
4 Kudos
Hive LLAP - a one-page architecture overview
https://community.hortonworks.com/articles/149894/llap-a-one-page-architecture-overview.html

Hive - Understanding concurrent sessions + queue allocation + preemption
https://community.hortonworks.com/articles/56636/hive-understanding-concurrent-sessions-queue-alloc.html

Hive LLAP dashboards
https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-operations/content/grafana_hive_llap_dashboards.html

Hive LLAP logs info
https://community.hortonworks.com/articles/149896/llap-debugging-overview-logs-uis-etc.html

Monitoring LLAP metrics
http://www.kartikramalingam.com/hive-llap/

Debugging Hive LLAP queries
https://community.hortonworks.com/articles/149896/llap-debugging-overview-logs-uis-etc.html

Question on Hive LLAP benchmarks
Are there any Hive LLAP benchmarks you can share?
https://hortonworks.com/blog/3x-faster-interactive-query-hive-llap/

LLAP tuning
Here is an excellent article on LLAP tuning:
https://community.hortonworks.com/articles/149486/llap-sizing-and-setup.html
10-12-2017
12:19 AM
1 Kudo
GetTCP - Connects over TCP to the provided endpoint(s). Received data is written as content to the FlowFile.
ListenTCP - Listens for incoming TCP connections and reads data from each connection, using a line separator as the message demarcator.
Ref: https://nifi.apache.org/docs.html
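
To illustrate what "using a line separator as the message demarcator" means in practice, here is a minimal sketch of the buffering logic. It is not NiFi code; the class name is illustrative. Bytes arrive from a connection in arbitrary chunks, and a message is emitted only once its trailing separator has been seen.

```python
class LineDemarcator:
    """Split a stream of byte chunks into separator-delimited messages."""

    def __init__(self, separator: bytes = b"\n"):
        self.separator = separator
        self.buffer = b""   # bytes received after the last complete message

    def feed(self, chunk: bytes):
        """Add a chunk and return the list of messages completed by it."""
        self.buffer += chunk
        # Everything before the last separator is complete; the tail is kept.
        *messages, self.buffer = self.buffer.split(self.separator)
        return messages
```

A message split across two TCP reads is held in the buffer until the rest of the line arrives, so each emitted message corresponds to one full line from the sender.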
09-30-2017
10:18 AM
2 Kudos
Thanks to @Matt Clarke for resolving this major issue.

In a typical customer environment, deploying an HDF cluster and enabling LDAPS authentication can be a challenge because of username case. In Active Directory the user ID exists in upper case (for example, employee ID X1122), but when I imported users into Ranger with lowercase=true, all imported users were displayed in lower case (x1122). I created all the required policies for Kafka and NiFi, and verified that the smoke tests for Kafka passed.
The smoke tests for NiFi failed, however, because NiFi respects only the AD value (X1122) and has no built-in logic to do a case conversion.
All the NiFi Ranger policies have the user ID as x1122, so the Ranger NiFi policies do not apply in this scenario and the Ranger NiFi plugin authorization does not work correctly.
As a result, NiFi Ranger authorization failed to grant access to view the NiFi UI under the /flow Ranger policy.

NiFi does not have an option to change the case of the returned results, but the ldap-provider has two configuration options for "Identity Strategy":
1. (default) USE_DN --> This strategy uses the user's complete DN returned by LDAP upon successful authentication for authorization.
2. USE_USERNAME --> This strategy uses the username as typed in the login screen for authorization upon successful authentication with LDAP.
No matter which method of authentication is used, the value chosen above is passed through any identity mapping patterns configured in NiFi, and the result is sent to the configured authorizer. In your case that authorizer is Ranger.

We resolved this issue by using "USE_USERNAME".
So as long as the user logs in with an all-lowercase username, it will work.

We also changed the user search filter to:
<property name="User Search Filter">(&amp;(sAMAccountName={0})(memberOf=CN=hwx,OU=Groups,OU=Global,OU=XX,DC=XX,DC=XX))</property>
and the proper search base needed to be:
<property name="User Search Base">OU=Users,OU=XX,DC=XX,DC=XX</property>
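
The case mismatch described above comes down to an exact string comparison between the identity NiFi sends and the user names stored in the Ranger policies. The sketch below is a simplified model of that behavior, not NiFi or Ranger code; the function names and the sample DN are hypothetical.

```python
def identity_for_authorization(strategy: str, typed_username: str, ldap_dn: str) -> str:
    """Pick the identity passed to the authorizer for a given Identity Strategy."""
    if strategy == "USE_DN":
        return ldap_dn            # full DN as returned by LDAP
    if strategy == "USE_USERNAME":
        return typed_username     # exactly what the user typed at login
    raise ValueError(f"unknown identity strategy: {strategy}")


def is_authorized(identity: str, policy_users: set) -> bool:
    """Model of a case-sensitive policy lookup: no case conversion is applied."""
    return identity in policy_users


# Ranger imported the users with lowercase=true, so policies hold "x1122".
policy_users = {"x1122"}
sample_dn = "CN=X1122,OU=Users,OU=XX,DC=XX,DC=XX"   # hypothetical DN

use_dn = identity_for_authorization("USE_DN", "x1122", sample_dn)
use_username = identity_for_authorization("USE_USERNAME", "x1122", sample_dn)
```

With USE_DN the identity never matches the lowercase policy entry, while with USE_USERNAME it matches only when the user types the name in lowercase, which is why the login convention matters.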
09-16-2017
11:52 PM
2 Kudos
HIVE Beeline:
============
Binary mode
> !connect 'jdbc:hive2://prod07.app.hwx.com:10000/;transportMode=binary'

HTTP mode
beeline -u 'jdbc:hive2://prod07.app.hwx.com:10001/;transportMode=http;httpPath=cliservice'

In an HS2 HA environment with ZooKeeper auto-discovery mode
!connect jdbc:hive2://prod09.app.hwx.com:2181,prod10.app.hwx.com:2181,prod11.app.hwx.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2

In a Kerberos environment - Hive Beeline command
!connect 'jdbc:hive2://prod07.app.hwx.com:10001/default;principal=hive/prod07.app.hwx.com@EXAMPLE.COM;transportMode=http;httpPath=cliservice'

Knox with Beeline
!connect jdbc:hive2://knox101.app.hwx.com:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true
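
All of the connection strings above share one shape: a comma-separated host:port list, an optional database, and semicolon-separated key=value settings. A minimal sketch of a helper that assembles them, assuming the settings are appended in the order given (the function name is illustrative):

```python
def hive_jdbc_url(hosts, database: str = "", **settings) -> str:
    """Build a jdbc:hive2 URL.

    hosts    -- iterable of (hostname, port) pairs; several pairs are used
                for ZooKeeper service discovery
    database -- optional database name placed after the slash
    settings -- key=value pairs appended after ';' in the order given
    """
    authority = ",".join(f"{host}:{port}" for host, port in hosts)
    url = f"jdbc:hive2://{authority}/{database}"
    for key, value in settings.items():
        url += f";{key}={value}"
    return url
```

For example, passing the three ZooKeeper hosts with serviceDiscoveryMode="zooKeeper" and zooKeeperNamespace="hiveserver2" reproduces the HA string shown above. Quote the result when using it on a shell command line, since it contains semicolons.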

Knox with WebHDFS
curl -iku raj_ops -X GET 'https://knox101.app.hwx.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS'

If this article helps you, please upvote it.
04-27-2017
06:49 PM
1 Kudo
Below are the implementation steps for integrating Knox with a load balancer, assuming your load balancer is ready.
On the load balancer side, the SSL connection should terminate on the Knox servers, and sticky sessions should be enabled. Currently the SSL connection terminates at the load balancer, which then encrypts internally and loops through one of the Knox channels.

JKS file creation
Open the load balancer URL in the IE browser.
Example: https://hadoop-knox.dev.XXXXXX.com/
Click the lock symbol, click View certificates, and open the Certification Path tab.
Choose the root, then click View Certificate > Details > Copy to File > Base-64 encoded X.509 format, and save it as a .pem file.
Choose the intermediate issuer CA, then click View Certificate > Details > Copy to File > Base-64 encoded X.509 format, and save it as a .pem file.
Choose the load balancer certificate, then click Details > Copy to File > Base-64 encoded X.509 format, and save it as a .pem file.
Copy these 3 files to the Knox edge node. I copied them to the certfiles folder:
/tmp/knoxhacerts/new/certfiles/lb-rootca.pem
/tmp/knoxhacerts/new/certfiles/lb-intermediate-issuer.pem
/tmp/knoxhacerts/new/certfiles/hadoop-knox-dev-lb.pem

Create a new JKS file as below:
cp /usr/hdp/current/knox-server/data/security/keystores/gateway.jks /tmp/knoxhacerts/dev-knox-test-1.jks
keytool -storepasswd -keystore /tmp/knoxhacerts/dev-knox-test-1.jks
Enter the current master secret password, then change it to a new password.

Import all the PEM-encoded files into this JKS file:
keytool -import -alias rootca-lb -keystore dev-knox-test-1.jks -file /tmp/knoxhacerts/new/certfiles/lb-rootca.pem
keytool -import -alias intca-lb -keystore dev-knox-test-1.jks -file /tmp/knoxhacerts/new/certfiles/lb-intermediate-issuer.pem
keytool -import -alias dev-lb -keystore dev-knox-test-1.jks -file /tmp/knoxhacerts/new/certfiles/hadoop-knox-dev-lb.pem

CA cert chain for ODBC:
Open each of the files below in a text editor such as Notepad and copy their contents into a single merged chain file (merge-cacertchain.crt):
/tmp/knoxhacerts/new/certfiles/lb-rootca.pem
/tmp/knoxhacerts/new/certfiles/lb-intermediate-issuer.pem
/tmp/knoxhacerts/new/certfiles/hadoop-knox-dev-lb.pem

Verification step:
Use SSLPoke to verify connectivity. Try the Java class SSLPoke to see whether your truststore contains the right certificates. It lets you connect to an SSL service, send a byte of input, and watch the output.
Download SSLPoke.class (https://confluence.atlassian.com/kb/files/779355358/779355357/1/1441897666313/SSLPoke.class)
Compile: javac SSLPoke.java
Execute the class as shown below, changing the host and port appropriately.
<JAVA_HOME>/bin/java SSLPoke jira.example.com 443

Failed scenario: a failed connection produces output like the below:
/usr/bin/java SSLPoke jira.example.com 443
sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

Happy path:
devenap02.dev.abc.net# java -Djavax.net.ssl.trustStore=/tmp/knoxhacerts/new/dev-knox-test-1.jks SSLPoke hadoop-knox.dev.XXXXXX.com 443
Successfully connected
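
For the ODBC CA cert chain step above, the manual copy and paste in a text editor can be replaced by a small script. This is a minimal sketch, assuming each input is a text file holding at least one PEM certificate block; the function name is illustrative, and the certificates are written in the order the paths are given.

```python
from pathlib import Path

PEM_HEADER = "-----BEGIN CERTIFICATE-----"


def merge_pem_files(pem_paths, output_path) -> int:
    """Concatenate PEM certificate files into one chain file.

    Returns the number of files merged. Raises ValueError if an input
    does not contain a PEM certificate header.
    """
    parts = []
    for path in pem_paths:
        text = Path(path).read_text().strip()
        if PEM_HEADER not in text:
            raise ValueError(f"{path} does not look like a PEM certificate")
        parts.append(text)
    # One certificate block per section, each ending with a newline.
    Path(output_path).write_text("\n".join(parts) + "\n")
    return len(parts)
```

Called with the three paths under /tmp/knoxhacerts/new/certfiles/ and an output name of merge-cacertchain.crt, it produces the same merged file as the manual step.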

Please upvote if this article helps.
04-27-2017
06:45 PM
1 Kudo
To verify the certificate contents of the Knox server, execute the command below:
openssl s_client -showcerts -connect 127.0.0.1:8443

If developers want to connect to Knox with SSL enabled, copy the certificate contents from the output of the above command into a knox.crt file and import it into a keystore by executing the command below:
keytool -import -keystore myLocalTrustStore.jks -file knox.crt

Developers can then connect as below:
beeline> !connect "jdbc:hive2://hadoop-knox.dev.XXXX.com:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/tmp/knoxhacerts/new/myLocalTrustStore.jks;trustStorePassword=knoxdev"

Hive JDBC
jdbc:hive2://{gateway-host}:{gateway-port}/;ssl=true;sslTrustStore={gateway-trust-store-path};trustStorePassword={gateway-trust-store-password};transportMode=http;httpPath={gateway-path}/{cluster-name}/hive

To list the imported certificates in a JKS file, execute the command below:
keytool -v -list -keystore gateway.jks

Command to create a new truststore, myNewTrustStore.jks:
keytool -import -alias knox -keystore ./myNewTrustStore.jks -file ./knox-cert.pem
Here knox-cert.pem is the knox.crt certificate saved in PEM format.

To change the SSL certificate for Knox, see:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_Security_Guide/content/knox_ca_signed_certificates_production.html

Please upvote if this article helps.
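
The step earlier in this post that copies certificate contents out of the openssl s_client -showcerts output can also be done programmatically. This is a minimal sketch, assuming the command output has been captured as text; it pulls out every block between the BEGIN CERTIFICATE and END CERTIFICATE markers, and the function name is illustrative.

```python
import re

# Non-greedy match so each certificate block is captured separately.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def extract_certificates(s_client_output: str):
    """Return the PEM certificate blocks found in the given text, in order."""
    return PEM_BLOCK.findall(s_client_output)
```

Each returned block can be written to its own file, such as knox.crt, and then imported with the keytool -import command shown earlier in this post.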
03-13-2017
09:59 PM
@suresh krish Follow the steps mentioned here:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_security/content/_optional_install_a_new_mit_kdc.html
Please also refer to this article:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_security/content/_kerberos_overview.html
02-03-2017
07:49 PM
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_Security_Guide/content/ranger_rest_api_create_policy.html