Member since: 01-19-2017
Posts: 3681
Kudos Received: 633
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1609 | 06-04-2025 11:36 PM |
| | 2071 | 03-23-2025 05:23 AM |
| | 984 | 03-17-2025 10:18 AM |
| | 3734 | 03-05-2025 01:34 PM |
| | 2572 | 03-03-2025 01:09 PM |
07-30-2019
11:00 PM
1 Kudo
@Haijin Li To be able to help, it's a good idea to always share your cluster config and version. I can see you are referencing HDP 2.3 documentation, which is obsolete, and personally I am wondering why you are running that version. Some parameters like hive.users.in.admin.role are not present by default in hive-site.xml, so you will need to add them under Custom hive-site; these are considered custom site values (see the attached screenshot Haijin2.png, which shows Authorization: SQL-Standard Based (SQLStdAuth) under Custom values). The UI shown is specific to HDP 3.1.0.0 running Ambari 2.7.3.0, but even with earlier versions of Ambari you can filter for the property (see the arrow in the screenshot). HTH
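As a sketch, a property added under Custom hive-site ends up rendered in hive-site.xml like the snippet below; the user list here is a hypothetical example, so substitute your actual admin users:
<property>
  <name>hive.users.in.admin.role</name>
  <value>hive,admin</value>
  <description>Comma-separated list of users granted the admin role
    for SQL-standard based authorization. Example values only.</description>
</property>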
07-30-2019
09:05 PM
@Ray Teruya To start all services from Ambari, use Ambari UI > Services > Start All to start everything at once; in Ambari UI > Services you can start, stop, and restart all listed services simultaneously. In Services, click ... and then click Start All. The first place to check for start failures or success is /var/log/zookeeper/zookeeper.log or zookeeper-zookeeper-server-[hostname].out. According to the HWX documentation, make sure to manually start the Hadoop services in the prescribed order.
1. How do I check what services need to be "up and running" before restarting the next one? Is there any place where I can see the dependency? The documentation above gives you the list and order of dependencies.
2. Do I need 2 ZooKeeper servers up and running? The first one is running on localhost but the 2nd one runs on a different machine. If I actually need them both, how can I check what was wrong in the second one? If you are not running an HA configuration, a single ZooKeeper suffices; but if you want to emulate a production environment with many DataNodes and enable HA NameNode or ResourceManager, you MUST have at least 3 ZooKeepers to avoid the split-brain phenomenon. A quick way to check each instance is sketched below.
Hope that helps
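A minimal sketch for checking each ZooKeeper node, assuming the default client port 2181 and a standard HDP install path:
# Query a ZooKeeper node directly with the four-letter-word "stat" command
echo stat | nc localhost 2181
# Or use the bundled script to report the mode (standalone, leader, or follower)
/usr/hdp/current/zookeeper-server/bin/zkServer.sh status
Run the same checks on the second machine (substitute its hostname for localhost) to see whether that instance is actually up.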
07-28-2019
09:15 PM
2 Kudos
@Figo C The reason is by design: NiFi as a client communicates with the HDFS NameNode on port 8020, and the NameNode returns the location of the files using the DataNode's private address. Since both your HDP and HDF are sandboxes, I think you should switch both to a host-only adapter. Your stack trace will be a statement that the client can't connect to the DataNode, and it will list the internal IP instead of 127.0.0.1; that causes the minReplication issue, etc. Change the HDP and HDF sandbox VM network settings from NAT to Host-only Adapter. Here are the steps:
1. Shut down the HDF sandbox gracefully.
2. Change the sandbox VM network from NAT to Host-only Adapter; it will automatically pick your LAN or wireless interface. Save the config.
3. Restart the sandbox VM.
4. Log in to the sandbox VM and use the ifconfig command to get its IP address, in my case 192.168.0.45.
5. Add the entry in /etc/hosts on your host machine, in my case: 192.168.0.45 sandbox.hortonworks.com
6. Check connectivity by telnet: telnet sandbox.hortonworks.com 8020
7. Restart NiFi (HDF).
By default, HDFS clients connect to DataNodes using the IP address provided by the NameNode. Depending on the network configuration, this IP address may be unreachable by the clients. The fix is letting clients perform their own DNS resolution of the DataNode hostname. If the above still fails, set dfs.client.use.datanode.hostname to true in the hdfs-site.xml that NiFi is using:
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
<description>Whether clients should use datanode hostnames when
connecting to datanodes.
</description>
</property>
Hope that helps
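If the flow still fails after the change, a quick reachability check from the NiFi host covers both ports involved; 50010 here is an assumption based on the HDP 2.x default DataNode transfer port (dfs.datanode.address), so adjust if yours differs:
# NameNode RPC port, then DataNode data-transfer port
telnet sandbox.hortonworks.com 8020
telnet sandbox.hortonworks.com 50010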
07-26-2019
08:46 PM
@Figo C Can you check the running status/logs of the DataNode/NameNode and copy-paste them here? Did you add these 2 files to your NiFi config: core-site.xml and hdfs-site.xml?
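A minimal sketch of commands for gathering that status, assuming a standard HDP log layout and the hdfs superuser:
# DataNode status as reported by the NameNode
sudo -u hdfs hdfs dfsadmin -report
# Recent NameNode/DataNode log entries (exact filenames include the hostname)
tail -n 100 /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log
tail -n 100 /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log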
07-24-2019
11:08 PM
1 Kudo
@Michael Bronson Here is an HCC doc that could help you completely uninstall HDP for a clean fresh install. Hope that helps
07-24-2019
11:00 PM
1 Kudo
@jessica moore This API call should do the magic; remember to substitute the placeholders with your actual cluster values:
curl -u {ambari-username}:{ambari-password} -H "X-Requested-By: ambari" -X GET http://{ambari-host}:{ambari-port}/api/v1/clusters/{clustername}/services
Hope that helps, please revert
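For example, with hypothetical values filled in (admin/admin credentials, Ambari on host ambari.example.com at the default port 8080, and a cluster named mycluster):
# Lists all services registered in the cluster as a JSON document
curl -u admin:admin -H "X-Requested-By: ambari" -X GET http://ambari.example.com:8080/api/v1/clusters/mycluster/services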
06-29-2019
06:48 PM
@Hamilton Castro The simple and clear answer is "YES"!! HDFS snapshots are read-only point-in-time copies of the file system. They can be taken on any level of the file system. A snapshot is valuable as a backup or for business continuity plans as a disaster recovery option.
The concept of a snapshot can be considered a point-in-time [PIT] backup, but it is wrong to assume that if you had 5 TB of data the snapshot would be the same size: an HDFS snapshot is not a full copy of the data, rather a copy of the metadata at that point in time. Blocks in the DataNodes are not copied: the snapshot files record the block list and the file size. There is no data copying (more accurately, a new record is added to the inode). Only modifications (appends and truncates for HDFS) record any data. The snapshot data is computed by subtracting the modifications from the current data, and the modifications are recorded in reverse chronological order, so that the current data can be accessed directly.
To take snapshots, the HDFS directory has to be set as snapshottable. If there are snapshots in a snapshottable directory, the directory can be neither deleted nor renamed. So when you first take a snapshot, your HDFS storage usage will stay the same; it is only when you modify the data that data is copied/written.
As for copying data between clusters or storage systems, copying a snapshotted file is no different from copying a regular file: both copy the same way, with bytes and with metadata. There is no "copy only metadata" operation.
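A minimal sketch of the commands involved, using a hypothetical directory and snapshot name:
# Mark the directory as snapshottable (requires HDFS superuser), then take a snapshot
hdfs dfsadmin -allowSnapshot /data/warehouse
hdfs dfs -createSnapshot /data/warehouse snap-20190629
# Report what changed between the snapshot and the current state (denoted ".")
hdfs snapshotDiff /data/warehouse snap-20190629 .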
06-24-2019
10:15 PM
1 Kudo
@chethan mh The same HWX document categorically states: "If you are running an earlier HDF version, upgrade to at least HDF 3.1.0, and then proceed to the HDF 3.3.0 upgrade." Only HDF 3.3.x and HDF 3.2.x can be directly upgraded to 3.4.0. So you will have a three-step migration: first to at least 3.1.0, then to HDF 3.3.0, then finally to 3.4.0. If that is too much effort, try exporting instead: NiFi can export/import flows via templates. You can save your flow as a template (an XML file) and import the template from a file as well. If you want to save the entire flow you have in the system, you can also find it in nifi/conf/flow.xml.gz on your NiFi box. This is not a template, but you would be able to drop it into a clean NiFi instance; a backup sketch is shown below. Follow this link to the NiFi procedure. Please revert
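As a sketch, assuming /path/to/nifi stands in for your actual NiFi install directory, back up the complete flow before migrating:
# Copy the full flow definition somewhere safe; to restore, drop it into conf/ of a clean instance
cp /path/to/nifi/conf/flow.xml.gz /tmp/flow.xml.gz.backup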
06-24-2019
09:18 PM
1 Kudo
@Michael Bronson The simple answer is NO. HDP 3.1 supports only Ambari version 2.7.3 according to the official HWX support matrix; see the screenshots below (2.6.4.PNG and HDP3.1.PNG). During the process of upgrading to Ambari 2.7.3 and HDP 3.1.0, additional components will be added to your cluster, and deprecated services and views will be removed.
Ambari 2.6.x to Ambari 2.7.3
The Ambari 2.6.x to Ambari 2.7.3 upgrade will remove the following views:
- Hive View 1.5
- Hive View 2
- Hue To Ambari View Migration
- Slider
- Storm
- Tez
- Pig
The Ambari Pig View is deprecated in HDP 3.0 and later. Ambari does not enable Pig View. To enable Pig View in HDP 3.0 and later, you need to contact Hortonworks support for instructions that include how to install WebHCat using an Ambari management pack.
HDP 2.6.x to HDP 3.1.0
The HDP 2.6.x to HDP 3.1.0 upgrade will add the following components if YARN is deployed in the cluster being upgraded:
- YARN Registry DNS
- YARN Timeline Service V2.0 Reader
The HDP 2.6.x to HDP 3.1.0 upgrade will remove the following services:
- Flume
- Mahout
- Falcon
- Spark 1.6
- Slider
- WebHCat
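Before planning the upgrade, you can confirm which Ambari version is currently running; as a quick check from the Ambari server host:
# Prints the installed ambari-server version
ambari-server --version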
06-15-2019
05:52 PM
@choppadandi vamshi krishna With Hive 3.0 you have Hive and Druid storage options too, and ORC is the most common. I haven't tested and can't confirm whether creating an MV over Avro and refreshing it at a regular interval would work. You can also use the rebuild option to refresh the MV: when scripting, run the rebuild, which will overwrite the previous MV before you query it, so you have an updated view:
ALTER MATERIALIZED VIEW mv REBUILD;
You also have the Druid storage handler org.apache.hadoop.hive.druid.DruidStorageHandler. You can rebuild an MV every 5 minutes, for example, but you should take into account that every rebuild will take longer than the previous one due to the addition of data in the source table. HTH
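A minimal sketch of how such a scheduled rebuild could be scripted, assuming a HiveServer2 at a hypothetical host/port and the same view name mv; a cron entry could invoke this every 5 minutes:
# Rebuild the materialized view so subsequent queries see fresh data (adjust the JDBC URL and credentials)
beeline -u "jdbc:hive2://hs2-host:10000/default" -e "ALTER MATERIALIZED VIEW mv REBUILD;"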