Member since 11-12-2018
192 Posts
177 Kudos Received
32 Solutions
My Accepted Solutions
Views | Posted |
---|---|
596 | 04-26-2024 02:20 AM |
789 | 04-18-2024 12:35 PM |
3453 | 08-05-2022 10:44 PM |
3184 | 07-30-2022 04:37 PM |
6922 | 07-29-2022 07:50 PM |
12-25-2018
07:41 AM
2 Kudos
If the issue persists, please attach the full error logs so we can debug further.
12-04-2018
11:45 AM
2 Kudos
For 20 Kafka machines, I would suggest going with 3 ZooKeeper servers.
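A common reason for picking an odd ensemble size such as 3 is ZooKeeper's majority-quorum rule: the ensemble keeps serving requests only while more than half of its servers are up. The short sketch below computes the quorum size and the number of server failures an ensemble can tolerate. The function names are illustrative only and are not part of any ZooKeeper API.

```python
def quorum_size(servers: int) -> int:
    """Smallest strict majority of a ZooKeeper ensemble with `servers` members."""
    if servers < 1:
        raise ValueError("an ensemble needs at least one server")
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers can fail while a majority quorum still remains."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Running it shows that 4 servers tolerate no more failures than 3 (one each), which is why odd sizes such as 3 or 5 are the usual choice.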
12-04-2018
12:41 PM
hdp-select status | grep -i hdfs
hadoop-hdfs-client - 2.6.4.0-91
hadoop-hdfs-datanode - 2.6.4.0-91
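To do the same check from a script instead of with grep, the `name - version` lines printed by `hdp-select status` can be parsed as in the sketch below. It assumes the output format shown above. The function names and the sample text (including the `hive-server2` line) are illustrative only.

```python
import subprocess


def run_hdp_select() -> str:
    """Run `hdp-select status` on an HDP node and return its stdout."""
    result = subprocess.run(["hdp-select", "status"], capture_output=True,
                            text=True, check=True)
    return result.stdout


def parse_hdp_select(output: str) -> dict[str, str]:
    """Parse 'component - version' lines into a {component: version} dict."""
    versions = {}
    for line in output.splitlines():
        name, sep, version = line.partition(" - ")
        if sep:  # skip lines that do not have the 'name - version' shape
            versions[name.strip()] = version.strip()
    return versions


def hdfs_components(versions: dict[str, str]) -> dict[str, str]:
    """Case-insensitive filter on 'hdfs', mirroring `grep -i hdfs`."""
    return {k: v for k, v in versions.items() if "hdfs" in k.lower()}


if __name__ == "__main__":
    # Sample output; on an HDP node, use run_hdp_select() instead.
    sample = ("hadoop-hdfs-client - 2.6.4.0-91\n"
              "hadoop-hdfs-datanode - 2.6.4.0-91\n"
              "hive-server2 - 2.6.4.0-91\n")
    print(hdfs_components(parse_hdp_select(sample)))
```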
11-26-2018
02:13 PM
2 Kudos
@vamsi valiveti Shuffling is the process of transferring data from the mappers to the reducers, so it is clearly necessary for the reducers: without it, they would have no input (or would not receive input from every mapper). Shuffling can start even before the map phase has finished, to save some time. That's why you can see a reduce status greater than 0% (but less than 33%) while the map status is not yet 100%.
Sorting saves time for the reducer by making it easy to tell when a new key group begins: to put it simply, it starts a new call to reduce() whenever the next key in the sorted input differs from the previous one. Each reduce task receives a list of key-value pairs, but it has to call the reduce() method, which takes a key and a list of values as input, so it must group the values by key. This is easy if the input data is pre-sorted locally in the map phase and then simply merge-sorted in the reduce phase (since each reducer gets data from many mappers).
A great source of information for these steps is this Yahoo tutorial. A nice graphical representation of this is the following:
Note that shuffling and sorting are not performed at all if you specify zero reducers (setNumReduceTasks(0)). In that case, the MapReduce job stops at the map phase, and the map phase does not include any kind of sorting (so even the map phase is faster).
Ref
Please accept the answer you found most useful.
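The sort, merge, and group steps described above can be illustrated with a small in-memory simulation. This is a sketch, not Hadoop's implementation. It assumes a single reducer (no partitioning, no spilling to disk), uses word count as the job, and runs on made-up input splits. Each mapper sorts its own output, the "shuffle" merge-sorts all mapper outputs, and the reducer runs its reduce logic once per run of equal keys.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def map_phase(split):
    """Word-count mapper: emit (word, 1) pairs, sorted locally by key."""
    pairs = [(word, 1) for line in split for word in line.split()]
    return sorted(pairs, key=itemgetter(0))


def shuffle_and_merge(mapper_outputs):
    """Merge the already-sorted outputs of every mapper into one sorted stream."""
    return heapq.merge(*mapper_outputs, key=itemgetter(0))


def reduce_phase(sorted_pairs):
    """Call the reduce logic once per key; a new group starts when the key changes."""
    results = {}
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        values = [value for _, value in group]  # key -> list(values)
        results[key] = sum(values)              # word-count reduce()
    return results


if __name__ == "__main__":
    splits = [["a b a"], ["b c"], ["a c c"]]  # hypothetical input splits
    outputs = [map_phase(s) for s in splits]
    print(reduce_phase(shuffle_and_merge(outputs)))
```

Because every mapper output is already sorted, `heapq.merge` only has to interleave the streams, and `groupby` can detect each new key by comparing it with the previous one. That is the cheap grouping that pre-sorting buys the reducer.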
11-24-2018
06:58 AM
Hi @Jagadeesan A S, thanks, that was it. However, since I'm using the Sandbox, I discovered I can only change the settings in Ambari: each time I changed hdfs-site.xml directly, it was overwritten when I restarted.
11-25-2018
10:25 AM
3 Kudos
@raja reddy
You can copy the HDFS files from your dev cluster to the prod cluster, re-create the Hive tables on the prod cluster, and then rebuild the partition metadata with the MSCK REPAIR TABLE command (and, if needed, recompute statistics with ANALYZE TABLE <table_name> COMPUTE STATISTICS). To re-create the Hive tables, get each table's CREATE statement by running the SHOW CREATE TABLE <table_name> query on your dev cluster.
The following are the high-level steps involved in a Hive migration:
1. Use the distcp command to copy the complete Hive warehouse directory (/user/hive/warehouse) from the dev cluster to the prod cluster.
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/administration/content/using_distcp.html
2. Once the files are on the new prod cluster, take the DDL from the dev cluster (i.e., SHOW CREATE TABLE <table_name>) and create the Hive tables on the prod cluster. https://community.hortonworks.com/articles/107762/how-to-extract-all-hive-tables-ddl.html
3. Run a metastore check with MSCK REPAIR TABLE, which adds metadata to the Hive metastore for any partitions whose metadata doesn't already exist. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)
If the clusters are Kerberized, refer to the following link for configuring distcp:
https://community.hortonworks.com/content/supportkb/151079/configure-distcp-between-two-clusters-with-kerbero.html
Note: There's no need for an export, because you can copy the data directly from HDFS between the two clusters. Please accept the answer you found most useful.
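The three migration steps can be scripted. The sketch below only builds the command strings and does not run anything. The NameNode addresses and table names are hypothetical placeholders. It assumes the standard `hadoop distcp` tool and the classic `hive` CLI (`-S`, `-e`, `-f`), so adapt it to beeline if that is what your cluster uses. Review each generated command before running it.

```python
import shlex

WAREHOUSE = "/user/hive/warehouse"


def migration_plan(dev_nn: str, prod_nn: str, tables: list[str]) -> list[str]:
    """Build (but do not run) shell commands for the three migration steps.

    dev_nn / prod_nn are hypothetical "host:port" NameNode addresses.
    """
    # Step 1: -update copies the *contents* of the source directory into the
    # existing target warehouse directory instead of nesting it one level deeper.
    plan = [f"hadoop distcp -update hdfs://{dev_nn}{WAREHOUSE} hdfs://{prod_nn}{WAREHOUSE}"]
    for table in tables:
        ddl_file = shlex.quote(f"{table}.ddl")
        show_ddl = shlex.quote(f"SHOW CREATE TABLE {table};")
        repair = shlex.quote(f"MSCK REPAIR TABLE {table};")
        # Step 2: dump the DDL on dev, then replay it on prod. Review the file
        # first: its LOCATION clause may still point at the dev NameNode.
        plan.append(f"hive -S -e {show_ddl} > {ddl_file}  # run on dev")
        plan.append(f"hive -f {ddl_file}  # run on prod")
        # Step 3: register the copied partitions in the prod metastore.
        plan.append(f"hive -e {repair}  # run on prod")
    return plan


if __name__ == "__main__":
    for command in migration_plan("dev-nn:8020", "prod-nn:8020", ["sales", "customers"]):
        print(command)
```

Generating the commands instead of executing them keeps the sketch safe to try and leaves room to fix up the DDL (for example, its LOCATION) between steps 2 and 3.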
11-22-2018
11:07 AM
I created /etc/hive/conf/beeline-hs2-connection.xml and it worked. Thanks.
12-01-2018
10:48 AM
3 Kudos
@Gulshan Agivetova You can force Ambari Server to start by skipping this check with the following option: ambari-server start --skip-database-check
11-22-2018
10:52 AM
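With a plain PUT request, you need to submit curl twice: the first request to the NameNode returns a redirect to a DataNode, and the second sends the file data to that DataNode. The following curl command (--negotiate -u : for SPNEGO/Kerberos authentication, -L to follow the redirect) does the same in a single submission: curl --negotiate -u : -L "http://namenode:50070/webhdfs/v1/user/username/余宗阳视频审核稿-1024.docx?op=CREATE&user.name=username" -T 余宗阳视频审核稿-1024.docx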
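For reference, the two-step WebHDFS CREATE flow can be written out explicitly with only Python's standard library. This sketch assumes simple (non-Kerberos) authentication via the `user.name` query parameter. It does not implement SPNEGO, so a Kerberized cluster would still need `--negotiate` or an equivalent. The host, port, and paths are hypothetical. Non-ASCII file names such as the one above are percent-encoded with `urllib.parse.quote`.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def create_url_path(hdfs_path: str, user: str) -> str:
    """Build the request path for a WebHDFS CREATE call, percent-encoding the HDFS path."""
    query = urlencode({"op": "CREATE", "user.name": user})
    return f"/webhdfs/v1{quote(hdfs_path)}?{query}"


def webhdfs_put(namenode: str, port: int, hdfs_path: str, user: str, data: bytes) -> int:
    """Two-step upload: ask the NameNode where to write, then send the data there."""
    # Step 1: PUT to the NameNode without a body; it should answer
    # 307 Temporary Redirect with a DataNode URL in the Location header.
    nn = http.client.HTTPConnection(namenode, port)
    nn.request("PUT", create_url_path(hdfs_path, user))
    resp = nn.getresponse()
    resp.read()
    location = resp.getheader("Location")
    nn.close()
    if resp.status != 307 or not location:
        raise RuntimeError(f"expected a 307 redirect, got {resp.status}")

    # Step 2: PUT the file data to the DataNode URL from the Location header.
    target = urlsplit(location)
    dn = http.client.HTTPConnection(target.hostname, target.port)
    dn.request("PUT", f"{target.path}?{target.query}", body=data,
               headers={"Content-Type": "application/octet-stream"})
    status = dn.getresponse().status
    dn.close()
    return status  # 201 Created indicates success
```

A hypothetical call would be `webhdfs_put("namenode", 50070, "/user/username/report.docx", "username", open("report.docx", "rb").read())`. Splitting the steps makes it visible why curl needs `-L`: without following the redirect, only step 1 happens.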
11-27-2018
06:03 AM
3 Kudos
@Amit Mishra Knox can also be configured with authentication providers other than LDAP. Here are links to the list of supported authentication providers for Knox (e.g., LDAP, PAM, Kerberos): https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_security/content/authentication_providers.html https://knox.apache.org/books/knox-1-1-0/user-guide.html#HadoopAuth+Authentication+Provider Please accept the answer you found most useful.