Member since
05-16-2016
785
Posts
114
Kudos Received
39
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2328 | 06-12-2019 09:27 AM |
| | 3579 | 05-27-2019 08:29 AM |
| | 5725 | 05-27-2018 08:49 AM |
| | 5243 | 05-05-2018 10:47 PM |
| | 3113 | 05-05-2018 07:32 AM |
12-16-2017
06:16 PM
I'm looking for best practices for architecting and naming HDFS file paths, i.e. a naming taxonomy, given that the users are analytical users who implement data preparation and data modeling processes. I would appreciate any tips for designing a service on HDFS with an overwrite strategy good enough to produce an easy, friendly data model for training analytical and statistical models in a modeling-as-a-service setup. For instance, files with 3000+ columns, storing more than 48 months of history. Any tips for managing huge volumes of data?
12-14-2017
07:04 PM
1 Kudo
I don't know whether you have a custom trigger or the built-in trigger for the health test. Is the health test showing warning, or critical/bad? Either way, the test checks data locality on the host: "Make sure that Impala Daemon is co-located with a DataNode, and that the IP address of each Impala Daemon matches the IP address of its co-located DataNode". Please make sure you have enabled the properties below in hdfs-site.xml:

<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hdfs-sockets/dn</value>
</property>
<property>
<name>dfs.client.file-block-storage-locations.timeout.millis</name>
<value>10000</value>
</property>

Reference: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_config_performance.html#config_performance
12-09-2017
06:51 AM
Hello everyone. I have solved the problem. The cause was the MySQL database encoding; running the following command solved it: "alter database hive character set latin1". I also dropped the "hive" database and regenerated it. Thanks!
12-05-2017
09:05 AM
Can anyone please suggest where I should start? Well, I started off with the Apache Hadoop binaries on Ubuntu, then moved on to a manual installation of Cloudera Hadoop, and finally landed on Cloudera Manager. To start with, I would suggest a manual Apache Hadoop deployment, and then, side by side, the Cloudera QuickStart VM to explore the other CDH ecosystem components like Hive, Impala, and many more. Do you recommend taking training courses with Cloudera in order to eventually build a career in this area? That totally depends. To be frank, bringing up a cluster, I mean a single-node cluster :)), took me a good 20 days, because I am a Java/J2EE development guy and had to roll up my sleeves with VMware and the Linux OS before hanging on to Hadoop. The Cloudera community is pretty active, so we have your back on any troubleshooting :))) Welcome to the Cloudera Hadoop Community.
12-05-2017
08:58 AM
Did you try running the query in the Hive shell or Beeline? Was your HiveServer2 up and running during the query execution? You may want to take a peek at these current settings in your cluster:

hive.server2.session.check.interval
hive.server2.idle.operation.timeout
hive.server2.idle.session.timeout
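As an illustration only (the values below are assumptions to show the format, not recommendations for your cluster), these properties would be set in hive-site.xml like:

```xml
<!-- Example only: tune these values to your workload. -->
<property>
  <name>hive.server2.session.check.interval</name>
  <value>6h</value>
</property>
<property>
  <name>hive.server2.idle.operation.timeout</name>
  <value>5d</value>
</property>
<property>
  <name>hive.server2.idle.session.timeout</name>
  <value>7d</value>
</property>
```

If idle.session.timeout is shorter than your longest-running queries' idle gaps, HiveServer2 can close the session mid-job, which would match the symptom described.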
11-27-2017
08:12 PM
Below is an example; let me know if that works for you. You can use range or hash partitioning, and you can also combine range and hash partitioning, or use hash partitioning alone via buckets. Below is a table whose primary key columns (state, name) are used for partitioning (that is a good practice):

CREATE TABLE customersDetails (
state STRING,
name STRING,
PRIMARY KEY (state, name)
)
)
PARTITION BY RANGE (state)
(
PARTITION VALUE = 'al',
PARTITION VALUE = 'ak',
PARTITION VALUE = 'wv',
PARTITION VALUE = 'wy'
)
STORED AS KUDU;
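For the hash-plus-range combination mentioned above, a sketch could look like the following (the table and column names here are hypothetical, made up for illustration):

```sql
-- Hypothetical example: hash-partition on id into 4 buckets,
-- and range-partition on event_year within each bucket.
CREATE TABLE customerEvents (
  id BIGINT,
  event_year INT,
  detail STRING,
  PRIMARY KEY (id, event_year)
)
PARTITION BY HASH (id) PARTITIONS 4,
             RANGE (event_year)
(
  PARTITION 2016 <= VALUES < 2017,
  PARTITION 2017 <= VALUES < 2018
)
STORED AS KUDU;
```

Hash partitioning spreads writes evenly across tablets, while the range component lets you prune scans by year, so the combination suits time-series workloads.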
11-12-2017
10:26 PM
There may also be a character-case issue; you should follow the official Flume documentation when configuring: http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
10-31-2017
01:39 AM
1 Kudo
It looks like the instructions I had posted in my earlier post were too vague! So I thought of adding some screenshots to make them easier to understand.

1. From the Cloudera Manager HOME page, select 'HDFS'.
2. Once inside the HDFS home page, click on the 'Instances' menu.
3. Once the instances that make up the HDFS component of CDH are listed, click on the Role Type 'NameNode'. We do NOT have to select 'NameNode' using the checkbox (this is what I was doing initially); just click on 'NameNode' itself.
4. Once the NameNode instance's page comes up, open the 'Action' menu to find the 'Format' option. Click on it to format the NameNode.
10-19-2017
07:15 PM
There are a couple of places that need tuning at the query level:

1. Statistics for the tables are a must for good performance.
2. When joining two tables, make sure the large table is listed last and the first table is the smaller one.
3. You can also use hints to improve query performance.
4. The Hive table's file format is a big factor.
5. Choose carefully when to use partitioning vs. bucketing.
6. Allocate adequate memory to HiveServer2 and the metastore.
7. Heap size.
8. Load balancer on the host: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_cm_ha_hosts.html#concept_qkr_bfd_pr
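As a sketch of points 1 and 3 above (the table and column names are hypothetical, not from the original question):

```sql
-- 1. Gather table and column statistics so the optimizer can plan joins well.
ANALYZE TABLE sales COMPUTE STATISTICS;
ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS;

-- 3. Hint that the small dimension table should be map-joined,
--    keeping the large table (sales) streamed last.
SELECT /*+ MAPJOIN(d) */ s.order_id, d.region
FROM dim_region d
JOIN sales s ON s.region_id = d.region_id;
```

With statistics in place, recent Hive versions can often pick the map join automatically, so treat the hint as a fallback rather than a default.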
10-15-2017
02:05 PM
I had a problem in my /etc/hosts file. Fixed that, and now hostname -f is fine and I'm able to get past the issue with port 8022. Cautionary note: don't mess with the existing lines in /etc/hosts if you need to edit this file. Add to the file; don't update existing entries.
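For illustration (the IP address and hostnames below are made up), an added /etc/hosts entry would look like:

```
192.168.1.10   node1.example.com   node1
```

with the fully qualified name first after the IP, so that hostname -f resolves to node1.example.com rather than the short name.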