Member since: 04-13-2016
Posts: 422
Kudos Received: 150
Solutions: 55
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 677 | 05-23-2018 05:29 AM |
| | 2570 | 05-08-2018 03:06 AM |
| | 625 | 02-09-2018 02:22 AM |
| | 1429 | 01-24-2018 08:37 PM |
| | 3311 | 01-24-2018 05:43 PM |
01-09-2020
02:45 PM
@Anibal_Linares Can you please execute the below command manually and see what exact issue it is facing?

export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf' ; hive --hiveconf hive.metastore.uris=thrift://p-ods-admin-02.transbank.local:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;'

I think the issue is with the Hive client; try reinstalling it on the node where this command is executed.
... View more
01-09-2020
02:31 PM
@Selene Having a separate account for each service is more secure because each account has its own Unix groups and privileges. Think of it this way: if one shared account has an issue, the complete ecosystem is in trouble. Also, account IDs like hdfs and yarn have their own permissions to execute certain commands, and those can't be shared. Think of how today's modernized applications each run as microservices. Technically speaking we can do it, but you would need to rewrite a lot of code if you prefer to. My suggestion is to go with the service accounts.
... View more
01-09-2020
02:18 PM
@Sai2222 Yes, it can be changed globally by adding it to the Oozie properties: export PIG_HEAPSIZE=2096. That said, my suggestion is that making a global change because of one job failure is not a good idea, because it unnecessarily takes up memory even in executions that don't need it.
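If it helps, here is a minimal sketch of the per-run alternative; the script name is a placeholder and it assumes the failing job can be reproduced from an edge node:

```bash
# Set the Pig launcher heap (in MB) for a single run instead of globally
export PIG_HEAPSIZE=2096      # read by the bin/pig wrapper script
pig -x tez my_script.pig      # my_script.pig is a placeholder for the failing script
unset PIG_HEAPSIZE            # avoid leaking the setting into other runs
```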
... View more
01-03-2020
02:40 PM
@kumar993498 Amazon Athena uses Presto and supports a wide variety of data formats such as CSV, TSV, JSON, and text files, as well as open-source columnar formats such as Apache ORC and Apache Parquet. Athena also supports compressed data in Snappy, Zlib, LZO, and GZIP formats. By compressing, partitioning, and using columnar formats you can improve performance and reduce your costs. Athena uses the following SerDes:
- Apache web logs: org.apache.hadoop.hive.serde2.RegexSerDe
- CSV: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
- TSV: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
- Custom delimiters: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
- Parquet: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
- ORC: org.apache.hadoop.hive.ql.io.orc.OrcSerde
- JSON: org.apache.hive.hcatalog.data.JsonSerDe or org.openx.data.jsonserde.JsonSerDe
Hope this helps you.
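As a hedged illustration of the SerDes above (bucket, database, and table names are placeholders, and the AWS CLI is assumed to be configured), a JSON-backed Athena table could be created like this:

```bash
# Create an Athena table over JSON data using the OpenX JSON SerDe
aws athena start-query-execution \
  --query-execution-context Database=default \
  --result-configuration OutputLocation=s3://my-athena-query-results/ \
  --query-string "CREATE EXTERNAL TABLE IF NOT EXISTS web_events (id string, ts string, payload string)
                  ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
                  LOCATION 's3://my-data-bucket/web_events/'"
```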
... View more
01-03-2020
11:05 AM
@Jason4Ever Please check whether your server is able to connect to the internet by running a few ping commands.
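For example (the mirror host is only an illustration; substitute whatever endpoint your cluster needs to reach):

```bash
ping -c 3 archive.cloudera.com                      # basic ICMP reachability
curl -sI https://archive.cloudera.com | head -n 1   # HTTP(S) check in case ICMP is blocked
```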
... View more
12-18-2019
01:16 PM
1 Kudo
@crisbcw I can think of one option which I used around 15 months ago: if you are using Ranger, you can get the tables that were accessed over the last n days from the Ranger audit logs and compare that list against the existing tables.
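A rough sketch of that approach, assuming Ranger writes its Hive audits as JSON to the default HDFS destination (/ranger/audit/hiveServer2/<yyyymmdd>/); the date glob and output file are examples:

```bash
# Collect the distinct resources (db/table) touched in the chosen period
hdfs dfs -cat '/ranger/audit/hiveServer2/201912*/*' \
  | grep -o '"resource":"[^"]*"' \
  | sort -u > tables_accessed_last_n_days.txt
# Then diff this list against the full table list, e.g. from: hive -e "show tables in mydb;"
```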
... View more
12-17-2019
09:10 AM
@HadoopHelp The links below will help you achieve this; please check them and make the appropriate configuration changes. https://community.cloudera.com/t5/Community-Articles/How-to-Configure-Authentication-with-WASB/ta-p/246004 https://hadoop.apache.org/docs/stable/hadoop-azure/index.html#Protecting_the_Azure_Credentials_for_WASB_with_Credential_Providers If you want to move data with HDFS instead of using blob storage, you can configure cross-cluster access and provide network access on the VNet.
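A short sketch based on the second link above (account name, container, and JCEKS path are placeholders):

```bash
# Store the Azure storage key in a Hadoop credential provider instead of core-site.xml
hadoop credential create fs.azure.account.key.myaccount.blob.core.windows.net \
  -provider jceks://hdfs/user/hdfs/wasb.jceks        # prompts for the storage account key

# Reference the provider when running jobs against the blob store
hadoop distcp \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/hdfs/wasb.jceks \
  hdfs:///data/source wasb://mycontainer@myaccount.blob.core.windows.net/target
```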
... View more
12-12-2019
04:14 PM
@mdh_raghavendra I don't think there is a way to change the length of an existing key, because the data encrypted with the 128-bit key would be invalidated. The best practice is to create a new key with a 256-bit length and copy the data over so that it uses the newly created 256-bit key. Hope this helps you. Thanks, Sridhar.
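A minimal sketch of that approach (key name and paths are placeholders, and it assumes the user has access to both keys):

```bash
hadoop key create mykey256 -size 256                    # create the new 256-bit key
hdfs dfs -mkdir /data/warehouse_256                     # new directory for the re-encrypted copy
hdfs crypto -createZone -keyName mykey256 -path /data/warehouse_256
hadoop distcp /data/warehouse_128 /data/warehouse_256   # data is re-encrypted with the new key on write
```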
... View more
12-12-2019
03:12 PM
@PentaReddy I guess the table is not getting dropped. Please run a repair statement and then run the select statement to retrieve the data: MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS];
... View more
04-04-2019
05:10 AM
@Bharath Kumar Yes, you can create them as no-login accounts in AD. Technically, they should be login accounts if you are planning to run a service with them; that may vary based on the scenario.
... View more
06-21-2018
04:22 PM
@vishal dutt Yup, you need to create them on all the nodes. If you are using LDAP, please bind those nodes to LDAP so that when you run id janu on any of the nodes you are able to see her ID.
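A quick way to verify (the host list is an example; pdsh is optional):

```bash
id janu                          # run on each node; the uid/gid should resolve consistently
pdsh -w worker[01-10] id janu    # or check all nodes at once if pdsh is available
```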
... View more
06-21-2018
03:20 PM
@vishal dutt When you run the 1st query, it does not trigger any MapReduce program; it reads the data directly. But when you define some logic, it triggers a MapReduce program to perform the aggregations. When the MapReduce program runs, the user ID needs to be present on all the nodes where its containers run (the NodeManager hosts). Hope this helps you, and let me know if you need any further information.
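For illustration (table and column names are examples):

```bash
hive -e "SELECT * FROM sales LIMIT 10;"                       # straight fetch, no MapReduce job
hive -e "SELECT region, COUNT(*) FROM sales GROUP BY region;" # aggregation, launches MapReduce
```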
... View more
06-06-2018
06:44 PM
@John Adams While running ambari-server setup --jdbc-db=mysql --jdbc-driver=/path/to/mysql/mysql-connector-java.jar you need to find where mysql-connector-java.jar is located and point to that directory (a typical sequence is sketched at the end of this post). The URL below will help you: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-administration/content/using_ambari_with_mysql.html

Database Requirements

Ambari requires a relational database to store information about the cluster configuration and topology. If you install the HDP stack with Hive or Oozie, they also require a relational database. The following table outlines these database requirements:

| Component | Databases | Description |
|---|---|---|
| Ambari | PostgreSQL 9.1.13+, 9.3, 9.4***; MariaDB 10.2.9*; MySQL 5.7****; Oracle 11gr2; Oracle 12c** | By default, Ambari installs an instance of PostgreSQL on the Ambari Server host. Hortonworks supports any version of Postgres that is automatically installed during the Ambari Server installation. Optionally, you can use an existing instance of PostgreSQL, MySQL, or Oracle; the supported versions of existing database instances are listed in the support matrix. |

Supported versions: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_support-matrices/content/ch_matrices-ambari.html

Hope this helps.
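Coming back to the --jdbc-driver step, a typical sequence looks like the sketch below; /usr/share/java/mysql-connector-java.jar is the usual location on RHEL/CentOS, so adjust the path to wherever the jar actually lives on your host:

```bash
yum install -y mysql-connector-java              # if the connector is not already present
ls -l /usr/share/java/mysql-connector-java.jar   # confirm where the jar actually is
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
```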
... View more
06-06-2018
05:43 PM
@Josh Nicholson
If you are storing your Ranger Hive audit logs in HDFS and running with doAs=false, you can build a Hive table on top of the ranger-hive logs and start querying. Example: select requser,count(*) from ranger_audit_event_json_tmp where TO_DATE(evttime)>='2018-05-10' group by requser; The above query gives you the number of queries run by each user since 2018-05-10. A couple of links for creating the tables: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-operations/content/amb_infra_arch_n_purge_command_line_operations.html https://community.hortonworks.com/articles/60802/ranger-audit-in-hive-table-a-sample-approach-1.html This worked for me; hope it helps you.
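If you need a starting point for the table itself, this is only a rough sketch (it assumes the default HDFS audit path and that the hcatalog JsonSerDe is acceptable; the column list is trimmed to what the query above uses):

```bash
hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS ranger_audit_event_json_tmp (
           requser string, evttime string, resource string, access string, result int)
         ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
         LOCATION '/ranger/audit/hiveServer2/';"
```

The second link above walks through a fuller schema; treat this as a starting point only.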
... View more
05-29-2018
09:26 PM
1 Kudo
@Venkat Metastore connection options:

| Argument | Description |
|---|---|
| --meta-connect <jdbc-uri> | Specifies the JDBC connect string used to connect to the metastore |

By default, a private metastore is instantiated in $HOME/.sqoop. If you have configured a hosted metastore with the sqoop-metastore tool, you can connect to it by specifying the --meta-connect argument. This is a JDBC connect string just like the ones used to connect to databases for import. In conf/sqoop-site.xml, you can configure sqoop.metastore.client.autoconnect.url with this address, so you do not have to supply --meta-connect to use a remote metastore. This parameter can also be modified to move the private metastore to a location on your filesystem other than your home directory. If you configure sqoop.metastore.client.enable.autoconnect with the value false, then you must explicitly supply --meta-connect. Hope this helps
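For example (the host name is a placeholder; 16000 is the default sqoop-metastore port):

```bash
# List saved jobs stored in a shared metastore
sqoop job --meta-connect jdbc:hsqldb:hsql://metastore-host:16000/sqoop --list
```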
... View more
05-23-2018
07:36 PM
@Mike Wong Yes
... View more
05-23-2018
03:20 PM
@Mike Wong When you restart the Services, it should automatically get updated if you have added the disk to the same mount.
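After the restart you can confirm the extra capacity is visible, for example:

```bash
hdfs dfsadmin -report | grep "Configured Capacity"   # should reflect the newly added disk
```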
... View more
05-23-2018
05:49 AM
@Bharath N
Try to perform the following steps on the failed DataNode:

1. Get the list of DataNode directories from /etc/hadoop/conf/hdfs-site.xml:

$ grep -A1 dfs.datanode.data.dir /etc/hadoop/conf/hdfs-site.xml
<name>dfs.datanode.data.dir</name>
<value>/data0/hadoop/hdfs/data,/data1/hadoop/hdfs/data,/data2/hadoop/hdfs/data,/data3/hadoop/hdfs/data,/data4/hadoop/hdfs/data,/data5/hadoop/hdfs/data,/data6/hadoop/hdfs/data,/data7/hadoop/hdfs/data,/data8/hadoop/hdfs/data,/data9/hadoop/hdfs/data</value>

2. Get the datanodeUuid by grepping the DataNode log:

$ grep "datanodeUuid=" /var/log/hadoop/hdfs/hadoop-hdfs-datanode-$(hostname).log | head -n 1 | perl -ne '/datanodeUuid=(.*?),/ && print "$1\n"'
1dacef53-aee2-4906-a9ca-4a6629f21347

3. Copy over a VERSION file from one of the <dfs.datanode.data.dir>/current/ directories of a healthy running DataNode:

$ scp <healthy datanode host>:<dfs.datanode.data.dir>/current/VERSION ./

4. Replace the datanodeUuid in the VERSION file with the datanodeUuid found by the grep above:

$ sed -i.bak -E 's|(datanodeUuid)=(.*$)|\1=1dacef53-aee2-4906-a9ca-4a6629f21347|' VERSION

5. Blank out the storageID= property in the VERSION file:

$ sed -i.bak -E 's|(storageID)=(.*$)|\1=|' VERSION

6. Copy the modified VERSION file to the current/ path of every directory listed in the dfs.datanode.data.dir property of hdfs-site.xml:

$ for i in {0..9}; do cp VERSION /data$i/hadoop/hdfs/data/current/; done

7. Make this VERSION file owned by hdfs:hdfs with permissions 664:

$ for i in {0..9}; do chown hdfs:hdfs /data$i/hadoop/hdfs/data/current/VERSION; done
$ for i in {0..9}; do chmod 664 /data$i/hadoop/hdfs/data/current/VERSION; done

8. One more level down there is a different VERSION file, located under the block pool current folder at /data0/hadoop/hdfs/data/current/BP-*/current/VERSION. This file does not need to be modified -- just place it in the appropriate directories. Copy this VERSION file from a healthy DataNode into the current/BP-*/current/ folder of each directory listed in dfs.datanode.data.dir:

$ scp <healthy datanode host>:<dfs.datanode.data.dir>/current/BP-*/current/VERSION ./VERSION2
$ for i in {0..9}; do cp VERSION2 /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION; done

9. Make this VERSION file owned by hdfs:hdfs with permissions 664 as well:

$ for i in {0..9}; do chown hdfs:hdfs /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION; done
$ for i in {0..9}; do chmod 664 /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION; done

10. Restart the DataNode from Ambari. The VERSION file located at <dfs.datanode.data.dir>/current/VERSION will have its storageID repopulated with a regenerated ID.

If losing the data is not an issue (say, for example, the node was previously in a different cluster, or was out of service for an extended time), you can instead delete all data and directories inside dfs.datanode.data.dir (keep the directory itself), then restart the DataNode daemon or service.
... View more
05-23-2018
05:43 AM
@SH Kim Did you try a graceful shutdown of the RegionServers and decommissioning of the DataNodes? Since you are using a very small number of nodes, it is always better to keep more than 50% of them available.
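A hedged sketch of those two steps for an HDP-style layout (the script path, hostname, and exclude-file location are assumptions; check where dfs.hosts.exclude points in your cluster):

```bash
/usr/hdp/current/hbase-master/bin/graceful_stop.sh worker03.example.com   # drain regions off the RegionServer
echo "worker03.example.com" >> /etc/hadoop/conf/dfs.exclude               # mark the DataNode for decommission
hdfs dfsadmin -refreshNodes                                               # NameNode re-replicates its blocks
```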
... View more
05-23-2018
05:29 AM
1 Kudo
@Ruslan Fialkovsky Yes, you can use both disks. But it will not solve your problem of using the SSD first and the HDD next; both will work in a similar fashion.
... View more
05-23-2018
05:23 AM
@vishal dutt The Spark driver is not able to find sqljdbc.jar in the classpath. When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included in the driver and executor classpaths. Directory expansion does not work with --jars. Alternatively: 1) Provide spark.driver.extraClassPath=/usr/hdp/hive/lib/mysql-connector-java.jar 2) Provide spark.executor.extraClassPath=/usr/hdp/hive/lib/mysql-connector-java.jar 3) Add sqljdbc.jar to the Spark classpath or include it via the --jars option. Hope this helps you.
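An example spark-submit showing both approaches (jar paths and the class name are placeholders):

```bash
spark-submit \
  --jars /path/to/sqljdbc.jar \
  --conf spark.driver.extraClassPath=/path/to/sqljdbc.jar \
  --conf spark.executor.extraClassPath=/path/to/sqljdbc.jar \
  --class com.example.MyApp my-app.jar
```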
... View more
05-21-2018
07:46 PM
@Jorge Florencio
Hope this article helps you: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.1.5/bk_ambari-upgrade-ppc/content/upgrading_log_rotation_configuration.html If you are planning to make the change through log4j, the article below is also good: https://community.hortonworks.com/articles/8882/how-to-control-size-of-log-files-for-various-hdp-c.html Hope this helps you.
... View more
05-14-2018
09:14 PM
@Alpesh Virani Please try using Ranger; you should be able to do that. Hope this link helps you: https://hortonworks.com/blog/best-practices-for-hive-authorization-using-apache-ranger-in-hdp-2-2/
... View more
05-14-2018
08:58 PM
1 Kudo
@Lokesh Mukku
It seems like the root user doesn't have an HDFS home directory. Please create an HDFS home directory for the root user and try again; it should work. Below are the steps to create the home directory:
sudo -u hdfs hadoop fs -mkdir /user/root
sudo -u hdfs hadoop fs -chown root /user/root
... View more
05-08-2018
03:06 AM
@Sim kaur <property>
<name>hive.spark.client.connect.timeout</name>
<value>1000ms</value>
<description>
Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec,ns/nsec), which is msec if not specified. Timeout for remote Spark driver in connecting back to Hive client.
</description>
</property>
<property>
<name>hive.spark.client.server.connect.timeout</name>
<value>90000ms</value>
<description>
Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is msec if not specified. Timeout for handshake between Hive client and remote Spark driver. Checked by
both processes.
</description>
</property> You can add the above properties in hive-site.xml. Since Spark reads the hive-site.xml file, the change is automatically picked up in the Spark configuration. Hope this helps you.
... View more
04-19-2018
08:50 PM
Until now we had to remember the complete Hive connection string, whether using the direct 10000 port or the ZooKeeper connection string. With HIVE-13670 we can simplify that by setting an environment variable (in /etc/profile) on the edge nodes: export BEELINE_URL_HIVE="<jdbc url>" Example: export BEELINE_URL_HIVE="jdbc:hive2://<ZOOKEEPER QUORUM>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" Now just type beeline -u HIVE. We can even set up multiple connection strings simply by defining differently named connections such as BEELINE_URL_BATCH or BEELINE_URL_LLAP. Hope this helps you.
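For example, multiple named connection strings could be set up like this (the ZooKeeper quorum and namespaces are placeholders):

```bash
export BEELINE_URL_BATCH="jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
export BEELINE_URL_LLAP="jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive"
beeline -u BATCH    # resolves $BEELINE_URL_BATCH
beeline -u LLAP     # resolves $BEELINE_URL_LLAP
```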
... View more
04-06-2018
02:55 PM
@Sami Ahmad Using Phoenix:

1. Use salting to increase read/write performance. Salting can significantly increase read/write performance by pre-splitting the data into multiple regions, and it will yield better performance in most scenarios. Example: CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SALT_BUCKETS=16 Note: ideally, for a 16-region-server cluster with quad-core CPUs, choose between 32 and 64 salt buckets for optimal performance.

2. Pre-split the table. Salting does automatic table splitting, but if you want to control exactly where the table splits occur, without adding an extra byte or changing the row key order, you can pre-split the table. Example: CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SPLIT ON ('CS','EU','NA')

3. Use multiple column families. A column family keeps related data in separate files. If your queries select only certain columns, it makes sense to group those columns together in a column family to improve read performance. Example: the following CREATE TABLE DDL creates two column families, A and B: CREATE TABLE TEST (MYKEY VARCHAR NOT NULL PRIMARY KEY, A.COL1 VARCHAR, A.COL2 VARCHAR, B.COL3 VARCHAR)

Article: https://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
... View more
04-05-2018
02:19 PM
@pk reddy It seems like YARN is unable to assign resources. Please check the resource utilization.
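A couple of quick ways to check (yarn top assumes a reasonably recent Hadoop/HDP release):

```bash
yarn top                                     # live view of queue and application resource usage
yarn application -list -appStates RUNNING    # see what is currently holding containers
```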
... View more
04-05-2018
01:41 PM
@pk reddy Can you see multiple outputs? Can you please provide the logs? Once you run a job in Tez, the session/container with the Application Master stays alive for some time; that doesn't mean it is running all the time.
... View more