Member since
05-16-2016
785
Posts
114
Kudos Received
39
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2328 | 06-12-2019 09:27 AM |
| | 3579 | 05-27-2019 08:29 AM |
| | 5725 | 05-27-2018 08:49 AM |
| | 5243 | 05-05-2018 10:47 PM |
| | 3113 | 05-05-2018 07:32 AM |
12-16-2017
06:16 PM
I'm looking for best practices for architecting and naming HDFS file paths, i.e. a naming taxonomy, given that the users are analytical users who implement data preparation and data modeling processes. I would appreciate any tips for designing a service on HDFS with an overwrite strategy good enough to produce an easy, friendly data model for training analytical and statistical models in a modeling-as-a-service setup. For instance, files with 3000+ columns, storing more than 48 months of history. Any tips for managing huge volumes of data?
12-14-2017
07:04 PM
1 Kudo
I don't know whether you have a custom trigger or the built-in trigger for the health test. Is the health test showing warning, or critical/bad? Either way, the test checks data locality on the host: "Make sure that Impala Daemon is co-located with a DataNode, and that the IP address of each Impala Daemon matches the IP address of its co-located DataNode". Please make sure you have enabled the properties below in hdfs-site.xml:

<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hdfs-sockets/dn</value>
</property>
<property>
<name>dfs.client.file-block-storage-locations.timeout.millis</name>
<value>10000</value>
</property>

Reference: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_config_performance.html#config_performance
12-09-2017
06:51 AM
Hello everyone. I have solved the problem. The cause was the MySQL database encoding; running the following command solved it: "alter database hive character set latin1". I also dropped the "hive" database and regenerated it. Thanks!
12-05-2017
09:05 AM
Can anyone please suggest where I should start? Well, I started off with the Apache Hadoop binaries on Ubuntu, then moved on to a manual installation of Cloudera Hadoop, and finally landed on Cloudera Manager. To start with, I would suggest a manual Apache Hadoop deployment, and then, side by side, the Cloudera QuickStart VM to explore the other CDH ecosystem components like Hive, Impala, and many more. Do you recommend taking training courses with Cloudera in order to eventually build a career in this area? That totally depends. To be frank, bringing up a cluster, I mean a single-node cluster :)), took me a good 20 days, because I am a Java/J2EE development guy and had to roll up my sleeves with VMware and the Linux OS before hanging on to Hadoop. The Cloudera community is pretty active, so we have your back on any troubleshooting :))) Welcome to the Cloudera Hadoop Community.
12-05-2017
08:58 AM
Did you try running the query in the Hive shell or Beeline? Was your HiveServer2 up and running during the query execution? You may want to take a peek at these current settings in your cluster:

hive.server2.session.check.interval
hive.server2.idle.operation.timeout
hive.server2.idle.session.timeout
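As an illustration only (the values below are assumptions to show the format, not recommendations for your cluster), these properties would be set in hive-site.xml like:

```xml
<!-- Example only: tune these values to your workload. -->
<property>
  <name>hive.server2.session.check.interval</name>
  <value>6h</value>
</property>
<property>
  <name>hive.server2.idle.operation.timeout</name>
  <value>5d</value>
</property>
<property>
  <name>hive.server2.idle.session.timeout</name>
  <value>7d</value>
</property>
```

If idle.session.timeout is shorter than your longest-running queries' idle gaps, HiveServer2 can close the session mid-job, which would match the symptom described.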
11-27-2017
08:12 PM
Below is an example; let me know if that works for you. You can use range or hash partitioning, and you can also combine range and hash partitioning, or use hash partitioning alone via buckets. Below is a table whose primary key columns (state, name) are used for partitioning (that is a good practice):

CREATE TABLE customersDetails (
state STRING,
name STRING,
PRIMARY KEY (state, name)
)
)
PARTITION BY RANGE (state)
(
PARTITION VALUE = 'al',
PARTITION VALUE = 'ak',
PARTITION VALUE = 'wv',
PARTITION VALUE = 'wy'
)
STORED AS KUDU;
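For the hash-plus-range combination mentioned above, a sketch could look like the following (the table and column names here are hypothetical, made up for illustration):

```sql
-- Hypothetical example: hash-partition on id into 4 buckets,
-- and range-partition on event_year within each bucket.
CREATE TABLE customerEvents (
  id BIGINT,
  event_year INT,
  detail STRING,
  PRIMARY KEY (id, event_year)
)
PARTITION BY HASH (id) PARTITIONS 4,
             RANGE (event_year)
(
  PARTITION 2016 <= VALUES < 2017,
  PARTITION 2017 <= VALUES < 2018
)
STORED AS KUDU;
```

Hash partitioning spreads writes evenly across tablets, while the range component lets you prune scans by year, so the combination suits time-series workloads.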
11-12-2017
10:26 PM
There may also be a character-case issue; you should follow the official Flume documentation when configuring: http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
10-31-2017
01:39 AM
1 Kudo
It looks like the instructions I had posted in my earlier post were too vague! So I thought of adding some screenshots to make them easier to understand.

1. From the Cloudera Manager HOME page, select 'HDFS'.
2. Once inside the HDFS home page, click on the 'Instances' menu.
3. Once the instances that make up the HDFS component of CDH are listed, click on the Role Type 'NameNode'. We do NOT have to select 'NameNode' using the checkbox (this is what I was doing initially); just click on 'NameNode' itself.
4. Once the NameNode instance's page comes up, open the 'Action' menu to find the 'Format' option. Click on it to format the NameNode.
10-19-2017
07:15 PM
There are a couple of places that need tuning at the query level:

1. Statistics for the tables are a must for good performance.
2. When joining two tables, make sure the large table is listed last and the first table is the smaller one.
3. You can also use hints to improve query performance.
4. The Hive table's file format is a big factor.
5. Choose carefully when to use partitioning vs. bucketing.
6. Allocate adequate memory to HiveServer2 and the metastore.
7. Heap size.
8. Load balancer on the host: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_cm_ha_hosts.html#concept_qkr_bfd_pr
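As a sketch of points 1 and 3 above (the table and column names are hypothetical, not from the original question):

```sql
-- 1. Gather table and column statistics so the optimizer can plan joins well.
ANALYZE TABLE sales COMPUTE STATISTICS;
ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS;

-- 3. Hint that the small dimension table should be map-joined,
--    keeping the large table (sales) streamed last.
SELECT /*+ MAPJOIN(d) */ s.order_id, d.region
FROM dim_region d
JOIN sales s ON s.region_id = d.region_id;
```

With statistics in place, recent Hive versions can often pick the map join automatically, so treat the hint as a fallback rather than a default.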
10-15-2017
02:05 PM
I had a problem in my /etc/hosts file. Fixed that, and now hostname -f is fine and I'm able to get past the issue with port 8022. Cautionary note: don't mess with the existing lines in /etc/hosts if you need to edit this file. Add to the file; don't update existing entries.
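For illustration (the IP address and hostnames below are made up), an added /etc/hosts entry would look like:

```
192.168.1.10   node1.example.com   node1
```

with the fully qualified name first after the IP, so that hostname -f resolves to node1.example.com rather than the short name.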