Member since: 04-03-2019
Posts: 21
Kudos Received: 9
Solutions: 5

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 6202 | 10-13-2017 10:56 AM
 | 3989 | 10-12-2017 11:45 AM
 | 14011 | 09-29-2017 08:32 AM
 | 3356 | 09-27-2017 04:37 AM
 | 2565 | 06-07-2017 03:23 PM
04-12-2020
05:25 AM
Below is a very good article on the differences between Hadoop 2.x and Hadoop 3.x: Difference Between Hadoop 2 and Hadoop 3
10-12-2017
11:45 AM
Solution: Go to the Ambari Hive Configs page, find Custom hive-site, and add the following property: hive.downloaded.resources.dir=/tmp/hive/${hive.session.id}_resources Then save the change and do a Rolling Restart of the affected components. You can then delete all the old {session_id}_resources folders under /tmp, start a Hive client, and run some SQL; a new session folder will be generated under /tmp/hive/.
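If you prefer to verify the new setting from code rather than the Hive CLI, here is a minimal sketch that opens a HiveServer2 session over JDBC and runs a query; afterwards a fresh {session_id}_resources folder should appear under /tmp/hive/. The host, port, and credentials are placeholders, not values from this cluster:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SessionDirCheck {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // placeholder connection details; adjust for your HiveServer2
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {
            // opening the session and running a statement is what
            // triggers creation of the per-session resources dir
            stmt.execute("SHOW TABLES");
        }
        // now check: ls /tmp/hive/ should show a new *_resources folder
    }
}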
09-29-2017
08:32 AM
2 Kudos
@vsubramanian
hdfs dfs -ls /user/hdfs
The above command shows hidden files directly, because unlike the local ls, hdfs dfs -ls does not filter out dot-prefixed entries. For example, you can see output like the following:
drwx------ - hdfs hdfs 0 2017-07-13 02:00 /user/hdfs/.Trash
drwxr-xr-x - hdfs hdfs 0 2017-04-06 14:21 /user/hdfs/.hiveJars
drwxr-xr-x - hdfs hdfs 0 2017-06-29 09:12 /user/hdfs/.sparkStaging
drwxr-xr-x - hdfs hdfs 0 2017-04-24 15:54 /user/hdfs/SSP00805
The entries whose names start with a dot are the hidden files.
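The same listing can be done programmatically. Below is a minimal sketch using the HDFS FileSystem API; it assumes the Hadoop client configuration (core-site.xml/hdfs-site.xml) is on the classpath, and /user/hdfs is just the path from the example above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHidden {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads core-site.xml/hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        // listStatus returns every entry, including the dot-prefixed "hidden" ones
        for (FileStatus st : fs.listStatus(new Path("/user/hdfs"))) {
            System.out.println(st.getPermission() + "  " + st.getPath().getName());
        }
    }
}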
07-15-2019
01:20 PM
Hi everyone! I installed ambari-server and ambari-agent on my laptop (SuSE Tumbleweed 2019) to set up a local single-node Hadoop cluster. (Server and agent versions 2.1.2.1-418) The server starts without issues, and I can open the console at http://localhost:8080 The agent says it started, but it can't connect to the server. Here's what the log says:
ERROR 2019-07-15 11:06:06,862 NetUtil.py:77 - [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:727)
ERROR 2019-07-15 11:06:06,862 NetUtil.py:78 - SSLError: Failed to connect. Please check openssl library versions. Refer to: https://bugzilla.redhat.com/show_bug.cgi?id=1022468 for more details.
I have python (2.7.16-2.1) installed. Python3 is also installed, but the executable defaults to version 2.7:
ls -l /usr/bin/python
lrwxrwxrwx 1 root root 9 Jun 24 22:12 /usr/bin/python -> python2.7
I would implement the "verify=disable" solution mentioned above, but there is no file named cert-verification.cfg on my system. There is no python dir under /etc at all! So I think python under SuSE might have a different configuration than other distributions. I googled a lot, but couldn't find a solution for this yet. Can someone guide me through this issue, please?
06-07-2017
03:23 PM
4 Kudos
Well, it's a good question. I will first explain what field grouping is and how a window bolt works, then I will give my suggestions on how to solve this problem.

1) Field Grouping and Window Bolt

When you use field grouping on some key, Storm guarantees that tuples with the same key flow into the same bolt. But with a window bolt (BaseWindowedBolt), in your case averaging the last 3 events, no matter how many bolt instances you have, Storm will not guarantee that 3 successive tuples of the same key land in the same window. I will explain it with the following example.

First we define a window bolt with a tumbling window of 3 tuples. Then we connect this window bolt to the upstream spout with fieldsGrouping on "key". For convenience, we use some abbreviations: D stands for deviceid, S for siteid, P for ip, T for timestamp, and V for val. You want data with the same device_site_ip (D_S_P) to flow into the same bolt within the same window, so we combine D_S_P into one key and write it as K. Then we assume the following 15 test tuples:

1 K1, T1, V1
2 K2, T2, V2
3 K3, T3, V3
4 K1, T4, V4
5 K1, T5, V5
6 K1, T6, V6
7 K3, T7, V7
8 K2, T8, V8
9 K3, T9, V9
10 K4, T10, V10
11 K4, T11, V11
12 K1, T12, V12
13 K4, T13, V13
14 K3, T14, V14
15 K2, T15, V15

The core code we want to use is as follows (WinBolt here is your own subclass of BaseWindowedBolt):

BaseWindowedBolt winBolt = new WinBolt().withTumblingWindow(new BaseWindowedBolt.Count(3));
builder.setBolt("winBolt", winBolt, boltCount).fieldsGrouping("spout", new Fields("key"));

Assume there are 2 winBolt instances (Bolt1 and Bolt2). Field grouping computes the hash code of the key and routes tuples with the same hash code to the same bolt. Say the hash codes of K1, K2, K3, K4 are H1, H2, H3, H4, and all H1 and H3 tuples flow into Bolt1 while all H2 and H4 tuples flow into Bolt2. In Bolt1, we then have these 9 tuples:
1 K1, T1, V1
3 K3, T3, V3
4 K1, T4, V4
5 K1, T5, V5
6 K1, T6, V6
7 K3, T7, V7
9 K3, T9, V9
12 K1, T12, V12
14 K3, T14, V14
In Bolt2, we have these 6 tuples:
2 K2, T2, V2
8 K2, T8, V8
10 K4, T10, V10
11 K4, T11, V11
13 K4, T13, V13
15 K2, T15, V15
The above is the result of field grouping inside each bolt. As for the window bolt behavior, with a 3-tuple tumbling window you will see the following happen. In Bolt1, the first window is:
1 K1, T1, V1
3 K3, T3, V3
4 K1, T4, V4
The second window is:
5 K1, T5, V5
6 K1, T6, V6
7 K3, T7, V7
The third window is:
9 K3, T9, V9
12 K1, T12, V12
14 K3, T14, V14
In Bolt2, the first window is:
2 K2, T2, V2
8 K2, T8, V8
10 K4, T10, V10
The second window is:
11 K4, T11, V11
13 K4, T13, V13
15 K2, T15, V15

Now you can see why this does not meet the expectation that the same key ends up in the same bolt within the same window: field grouping only fixes which bolt a key goes to, while the tumbling window slices that bolt's stream purely by arrival order, regardless of key.

2) My suggestion on how to solve this problem

I have 2 methods to solve this. The first one is simpler and only uses Storm. We don't need a window bolt, just field grouping, which guarantees the same key flows to the same bolt worker. In the downstream bolt we keep a HashMap whose key is the (D_S_P) pair and whose value is a fixed-size queue (which you should implement yourself) that remembers the last N tuples of that key. Each time a new tuple arrives at this bolt worker, we use its key to find the fixed-size queue for that key, add the tuple to the queue, and if necessary remove the oldest tuple. After updating the queue, we can calculate the current average of the last N tuples and emit it (see the first sketch below). The disadvantage of this approach is that the HashMap lives in memory, with no data persistence: each time we relaunch the topology, we have to rebuild the HashMap and re-accumulate the fixed-size queue for each key.

The second one is to use HBase as storage. In Storm we need neither field grouping nor a window bolt; shuffle grouping is enough. For each tuple, after some parsing, we put it into an HBase table whose row key is the (D_S_P) key and whose column family keeps N versions, so the table retains the last N tuples of each key. We then read those N versions for the current key back from HBase, calculate their average, and emit it (see the second sketch after the first).
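Here is a minimal sketch of the first method. The class name AvgBolt, the "key"/"val" field names, and N = 3 are illustrative assumptions, not taken from the original topology:

import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class AvgBolt extends BaseRichBolt {
    private static final int N = 3;                 // average over the last N values per key
    private Map<String, ArrayDeque<Double>> lastN;  // (D_S_P) key -> its last N values
    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.lastN = new HashMap<>();
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        String key = tuple.getStringByField("key");
        double val = tuple.getDoubleByField("val");
        ArrayDeque<Double> q = lastN.computeIfAbsent(key, k -> new ArrayDeque<>());
        q.addLast(val);
        if (q.size() > N) {
            q.removeFirst();                        // drop the oldest value to keep the queue at N
        }
        double avg = q.stream().mapToDouble(Double::doubleValue).average().getAsDouble();
        collector.emit(new Values(key, avg));
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("key", "avg"));
    }
}

Wire it with builder.setBolt("avgBolt", new AvgBolt(), boltCount).fieldsGrouping("spout", new Fields("key")); so that each key's queue always lives on exactly one worker.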
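And a minimal sketch of the HBase variant. The table name "device_metrics", the column family "d" (created with VERSIONS => 3), and the qualifier "val" are all assumptions for illustration:

import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class LastNAverage {
    private static final int N = 3;
    private static final byte[] CF = Bytes.toBytes("d");   // column family created with VERSIONS => 3
    private static final byte[] COL = Bytes.toBytes("val");

    // store the newest value for this (D_S_P) key, then average its last N versions
    public static double putAndAverage(Connection conn, String key, double val) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("device_metrics"))) {
            Put put = new Put(Bytes.toBytes(key));
            put.addColumn(CF, COL, Bytes.toBytes(val));
            table.put(put);

            Get get = new Get(Bytes.toBytes(key));
            get.setMaxVersions(N);                         // read back up to N versions of the cell
            Result result = table.get(get);
            List<Cell> cells = result.getColumnCells(CF, COL);
            double sum = 0;
            for (Cell c : cells) {
                sum += Bytes.toDouble(CellUtil.cloneValue(c));
            }
            return sum / cells.size();
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();  // picks up hbase-site.xml on the classpath
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            System.out.println(putAndAverage(conn, "D1_S1_P1", 42.0));
        }
    }
}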