Member since: 04-03-2019
Posts: 21
Kudos Received: 8
Solutions: 5
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| | 3967 | 10-13-2017 10:56 AM |
| | 2851 | 10-12-2017 11:45 AM |
| | 9810 | 09-29-2017 08:32 AM |
| | 1876 | 09-27-2017 04:37 AM |
| | 1324 | 06-07-2017 03:23 PM |
10-24-2017
10:35 AM
You are welcome.
10-24-2017
10:11 AM
@hanzhi Zhang You could try putting quotes around the column names; the CREATE TABLE DDL statement would look like this:
CREATE TABLE yourtablename ("data.addtime" VARCHAR NOT NULL PRIMARY KEY, "data._dir" VARCHAR, "data.end_time" VARCHAR, "data.file" VARCHAR, "data.fk_log" VARCHAR, "data.host" VARCHAR, "data.r" VARCHAR, "data.size" VARCHAR, "data.start_time" VARCHAR);
10-24-2017
08:35 AM
@Erkan ŞİRİN You could try this:
/usr/maven/apache-maven-3.5.0/bin/mvn clean package -Dhttp.proxyHost=<your proxy ip> -Dhttp.proxyPort=<your proxy port> -DskipTests
It works for me.
10-24-2017
08:30 AM
@Austin Hackett Both approaches are valid: using Ambari, or performing the manual steps on the command line.
10-14-2017
05:01 AM
@viswanath Can you post your SQL statement (the SELECT query)?
10-13-2017
10:56 AM
@Daniel Perry Beeline expects the HQL file to be on the local file system. So if your HQL file is in HDFS, you should first download it to the local file system and then feed it to Beeline.
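A minimal sketch of those two steps (the HDFS path, local path, and HiveServer2 JDBC URL are placeholders):
hdfs dfs -get /user/yourname/yourscript.hql /tmp/yourscript.hql
beeline -u jdbc:hive2://your-hs2-host:10000 -f /tmp/yourscript.hql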
10-12-2017
11:45 AM
Solution: Go to the Ambari Hive Configs page, find Custom hive-site, and add the following property:
hive.downloaded.resources.dir=/tmp/hive/${hive.session.id}_resources
Then save the modification and do a Rolling Restart of the related components. We can delete all the aged {session_id}_resources folders under /tmp, then start a Hive client, run some SQL, and check that new session folders are generated under /tmp/hive/.
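For the one-off cleanup of the old, empty session folders, something like this sketch could work (check the match count first, and only delete folders that are empty and no longer in use):
find /tmp -maxdepth 1 -type d -name '*_resources' -user hive -empty | wc -l
find /tmp -maxdepth 1 -type d -name '*_resources' -user hive -empty -delete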
10-12-2017
11:39 AM
I found there are 10,000+ folders owned by hive:hadoop under /tmp locally; the folder names look like {session_id}_resources (e.g. 0003f01d-07e5-4caa-bb4f-61e62d35f426_resources). The folders are empty. Can I delete them? And how can I configure Hive to store them in another folder? They have been there for a month, and no one uses these folders anymore. I think they are created by Hive sessions. I found a JIRA describing a similar bug: https://issues.apache.org/jira/browse/HIVE-4546, but that bug has been fixed since Hive 0.12.0, and the Hive version here is 1.2.1.2.6, so it shouldn't have this issue again. Any idea how to prevent these Hive session folders from being left behind? Or is there a way to have them in a subfolder, e.g. /tmp/hive_session/0003f01d-07e5-4caa-bb4f-61e62d35f426_resources/?
Labels:
- Apache Hive
10-03-2017
02:56 PM
NiFi ran out of disk space. When I try to restart NiFi, I get the following error in nifi-app.log:
ERROR [main] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@2f6b3859 unexpectedly reached End-of-File when reading from Partition-88 for Transaction ID 175735849; assuming crash and ignoring this transaction
I found this article on a similar issue: https://community.hortonworks.com/questions/135687/nifi-not-getting-started-in-cluster.html It suggests deleting the flowfile_repository to recover. Is there a smoother solution that does not require deleting the flowfile_repository and content_repository? Thanks
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
10-02-2017
01:07 PM
This one works for me, and it doesn't require regenerating the keys for the Ambari server and the other Ambari agents.
10-01-2017
05:33 PM
I get the same issue when I try it on the HDPCA practice exam image. I have also checked other nodes, like node2; when I use service ntpd start there:
Starting ntpd: ntpd: error while loading shared libraries: libm.so.6: cannot open shared object file: Permission denied
[FAILED]
I get the same error there too. So just ignore it.
09-29-2017
09:07 AM
@Mateusz Grabowski You don't need to remove the ambari-agent. You can create a dedicated user for the Ambari agent, such as an "ambari" user, grant sudo permission to that user, and then configure the Ambari agent property in the /etc/ambari-agent/conf/ambari-agent.ini file, as illustrated below:
run_as_user=ambari
Here is a reference on how to configure an Ambari agent for non-root: https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.0.3/bk_ambari-security/content/how_to_configure_an_ambari_agent_for_non-root.html
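A rough sketch of the steps (the user name is just an example; set up the sudo rules exactly as the linked doc describes):
useradd ambari
# grant the required sudo rules to the ambari user per the doc,
# then set run_as_user=ambari in /etc/ambari-agent/conf/ambari-agent.ini
ambari-agent restart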
09-29-2017
08:32 AM
2 Kudos
@vsubramanian
hdfs dfs -ls /user/hdfs
The above command shows hidden files directly; for example, you can see output like the following:
drwx------ - hdfs hdfs 0 2017-07-13 02:00 /user/hdfs/.Trash
drwxr-xr-x - hdfs hdfs 0 2017-04-06 14:21 /user/hdfs/.hiveJars
drwxr-xr-x - hdfs hdfs 0 2017-06-29 09:12 /user/hdfs/.sparkStaging
drwxr-xr-x - hdfs hdfs 0 2017-04-24 15:54 /user/hdfs/SSP00805
where you can see the files that start with a dot.
09-29-2017
08:21 AM
This one is also helpful: http://data-flair.training/blogs/hadoop-2-x-vs-hadoop-3-x-comparison/
09-29-2017
08:02 AM
@Eon kitex HDFS will try to recover under-replicated blocks automatically. If it hasn't fixed itself after a few days, I suggest running the "setrep" command again. Also, can you check that all your DataNodes, and all the disks on them, are healthy and running, and that the disks have enough free space?
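A sketch of commands for checking and re-applying the replication factor (the path and the factor of 3 are placeholders):
hdfs fsck / | grep -iE 'under[- ]replicated'   # look for under-replicated blocks in the fsck report
hdfs dfsadmin -report                          # check DataNode health and remaining capacity
hdfs dfs -setrep -w 3 /path/to/your/data       # re-apply the target replication factor and wait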
09-29-2017
07:47 AM
@Riddhi Sam I found these two articles: one compares Hadoop 1 vs Hadoop 2, and the other compares Hadoop 2 vs Hadoop 3. I hope they answer your question.
09-28-2017
02:06 PM
@Robin Dong That's why I suggest you use HDF, where you can install only ZooKeeper and Kafka.
09-28-2017
10:36 AM
@Robin Dong HDP is also free, and the Ambari agent doesn't consume many resources. Feel free to use it.
09-27-2017
04:37 AM
3 Kudos
Hi @Robin Dong Yes, you can definitely install Kafka by itself (yes, it also needs ZooKeeper) as a cluster. You can check this Kafka Multi Broker Doc as a reference.

As for the HDP cluster, I think you have some misunderstanding of a Hortonworks Data Platform (HDP) cluster. Kafka already runs as a cluster, and ZooKeeper also works as a cluster. HDP is a Hadoop distribution, and it uses Ambari to help you manage the different components of your cluster from a single page. HDP can also be highly customized: you can install only Kafka and ZooKeeper when you install the cluster. It's very convenient because HDP uses Ambari to install those components.

So although you can indeed install Kafka and ZooKeeper manually, I suggest you install them with HDP, because it is quite easy and it automatically integrates Kafka and ZooKeeper for you. And with the Ambari views, you can see many different metrics for Kafka and ZooKeeper, which helps you check the health of your cluster.

If you place more emphasis on data streaming, I suggest you try Hortonworks DataFlow (HDF), because the main components in HDF are Kafka/Storm/ZooKeeper/NiFi, and you can also tailor HDF yourself. Cheers,
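For reference, a rough sketch of the manual multi-broker setup that the doc describes (file names and paths are illustrative; with HDP, Ambari takes care of these steps for you):
cp config/server.properties config/server-1.properties
# give each broker its own broker.id, listener port, and log.dirs in its properties file
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &
bin/kafka-server-start.sh config/server-1.properties &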
06-07-2017
03:23 PM
3 Kudos
Well, it's a good question. I will first explain what Field Grouping is and how the Window Bolt works, then I will give my suggestions on how to solve this problem.

1) Field Grouping and Window Bolt

When you use field grouping on some key, Storm can guarantee that tuples with the same key flow into the same bolt. But with a window bolt (BaseWindowedBolt), in your case averaging the last 3 events, no matter how many bolts you have, Storm will not guarantee that 3 successive tuples of the same key land in the same bolt within one window. I will explain it with the following example.

First we define a window bolt with a tumbling window of 3 tuples, then we connect this window bolt to the upstream spout with fieldsGrouping on "key". For your case, we use some abbreviations for convenience: D stands for deviceid, S for siteid, P for ip, T for timestamp, and V for val. You want data with the same device_site_ip (D_S_P) to flow into the same bolt within the same window, so we combine D_S_P as the key and use K to stand for it. Then we assume the following 15 test records:

1 K1, T1, V1
2 K2, T2, V2
3 K3, T3, V3
4 K1, T4, V4
5 K1, T5, V5
6 K1, T6, V6
7 K3, T7, V7
8 K2, T8, V8
9 K3, T9, V9
10 K4, T10, V10
11 K4, T11, V11
12 K1, T12, V12
13 K4, T13, V13
14 K3, T14, V14
15 K2, T15, V15

Then the core code we want to use is as follows:

BaseWindowedBolt winBolt = new winBolt().withTumblingWindow(new BaseWindowedBolt.Count(3));
builder.setBolt("winBolt", winBolt, boltCount).fieldsGrouping("spout", new Fields("key"));

We assume there are 2 winBolt instances (Bolt1 and Bolt2), with field grouping and the window bolt. Field Grouping calculates the hash code of the key and sends tuples with the same hash code to the same bolt. Let's say the hash code of K1 is H1, of K2 is H2, of K3 is H3, and of K4 is H4, and that all the H1 and H3 tuples flow into Bolt1 while the H2 and H4 tuples flow into Bolt2. In Bolt1 we then have these 9 records:

1 K1, T1, V1
3 K3, T3, V3
4 K1, T4, V4
5 K1, T5, V5
6 K1, T6, V6
7 K3, T7, V7
9 K3, T9, V9
12 K1, T12, V12
14 K3, T14, V14

In Bolt2 we have these 6 records:

2 K2, T2, V2
8 K2, T8, V8
10 K4, T10, V10
11 K4, T11, V11
13 K4, T13, V13
15 K2, T15, V15
The above distribution inside each bolt is the result of Field Grouping. As for the window bolt's behavior, with a 3-tuple tumbling window you will see the following happen.

In Bolt1, the first window we get is:

1 K1, T1, V1
3 K3, T3, V3
4 K1, T4, V4

The second window we get is:

5 K1, T5, V5
6 K1, T6, V6
7 K3, T7, V7

The third window we get is:

9 K3, T9, V9
12 K1, T12, V12
14 K3, T14, V14

In Bolt2, the first window we get is:

2 K2, T2, V2
8 K2, T8, V8
10 K4, T10, V10

The second window we get is:

11 K4, T11, V11
13 K4, T13, V13
15 K2, T15, V15

Now you can see why it does not meet the expectation that the same key goes to the same bolt in the same window.

2) My suggestions on how to solve this problem

I have 2 methods to solve this. The first one is simpler and uses only Storm. We don't need the Window Bolt; just use Field Grouping to guarantee that the same key flows into the same bolt worker. Then, in the downstream bolt, we keep a HashMap whose key is the (D_S_P) pair and whose value is a fixed-size queue (which you should implement yourself). The fixed-size queue remembers the last N tuples for that key. Each time a new tuple flows into this bolt worker, we use its key to find the fixed-size queue for that key, add the tuple to the queue, and if necessary remove the oldest tuple from the queue. After updating that key, we can calculate the current average of its last N tuples and emit it. The disadvantage of this approach is that the HashMap is in memory, so there is no data persistence: each time we relaunch the topology, we need to rebuild the HashMap and re-accumulate the fixed-size queue for each key.

The second one is to use HBase as storage. In Storm, we don't need Field Grouping or the Window Bolt; shuffle grouping is enough. For each tuple, after some parsing, we put it into an HBase table. The row key of this table is the (D_S_P) key, and we set the number of versions of the table to N, so the HBase table keeps the last N tuples for each key. Then we read those N tuples for the current key back from HBase, calculate their average, and emit it.
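For the HBase approach, a minimal sketch of creating a table that keeps the last N versions per cell (the table name, column family, and N=3 are placeholders), run from the command line:
echo "create 'device_metrics', {NAME => 'd', VERSIONS => 3}" | hbase shell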