Member since: 09-29-2015
Posts: 67
Kudos Received: 45
Solutions: 10
My Accepted Solutions
Title | Views | Posted
---|---|---
| 1966 | 05-25-2016 10:24 AM
| 11958 | 05-19-2016 11:24 AM
| 8419 | 05-13-2016 10:09 AM
| 3102 | 05-13-2016 06:41 AM
| 9027 | 03-25-2016 09:15 AM
06-09-2021
06:42 AM
You can try setting the following parameters: set hive.vectorized.execution.reduce.enabled=false; and set hive.vectorized.execution.enabled=true;
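For illustration, a minimal sketch of applying those two settings in a Beeline session before running a query; the JDBC URL and the query are placeholders, not part of the original suggestion:

```sh
# Hypothetical connection string and query; only the two SET statements
# come from the suggestion above.
beeline -u "jdbc:hive2://hs2-host:10000/default" -e "
  set hive.vectorized.execution.reduce.enabled=false;
  set hive.vectorized.execution.enabled=true;
  SELECT count(*) FROM my_table;
"
```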
05-19-2016
07:09 AM
Other very good ways to load data into HDFS are Flume or NiFi. "hadoop fs -put" is good, but it has some limitations and a lack of flexibility that might make it difficult to use in a production environment. If you look at the documentation of the Flume HDFS sink, for instance ( http://flume.apache.org/FlumeUserGuide.html#hdfs-sink ), you'll see that Flume lets you define how to rotate the files, how to write the file names, etc. Other options can be defined for the source (your local text files) or for the channel. "hadoop fs -put" is more basic and doesn't offer those possibilities.
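As an illustration, here is a minimal sketch of a Flume agent that watches a local directory and writes to HDFS with explicit roll settings; the agent/component names, paths, and values are assumptions, not from the original post:

```sh
# Hypothetical Flume agent config showing the HDFS sink roll options
# mentioned above; names, paths, and sizes are examples only.
cat > /etc/flume/conf/local-to-hdfs.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Watch a local directory for new text files
a1.sources.r1.type     = spooldir
a1.sources.r1.spoolDir = /var/spool/incoming
a1.sources.r1.channels = c1

a1.channels.c1.type     = memory
a1.channels.c1.capacity = 10000

# HDFS sink: control file naming and rotation
a1.sinks.k1.type                   = hdfs
a1.sinks.k1.channel                = c1
a1.sinks.k1.hdfs.path              = /data/incoming/%Y-%m-%d
a1.sinks.k1.hdfs.filePrefix        = events
a1.sinks.k1.hdfs.fileType          = DataStream
a1.sinks.k1.hdfs.rollInterval      = 300
a1.sinks.k1.hdfs.rollSize          = 134217728
a1.sinks.k1.hdfs.rollCount         = 0
a1.sinks.k1.hdfs.useLocalTimeStamp = true
EOF

# Start the agent against that config
flume-ng agent --conf /etc/flume/conf \
  --conf-file /etc/flume/conf/local-to-hdfs.conf \
  --name a1 -Dflume.root.logger=INFO,console
```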
07-18-2019
07:12 AM
The Hive import completes, but the next line gives "INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory." I have made changes to the Sqoop query since I am fetching data from Oracle. When I log into Hive, the database doesn't have any tables. Please provide an appropriate solution. Hopeful to hear from you guys.
03-18-2016
11:06 AM
@Robin Dong As mentioned by Ancil, you might want a script to run the Sqoop downloads in parallel, and you need to control quite carefully how big your parallelism is, above all if you want to avoid the typical "No more spool space in...". Here's a script to do that: https://community.hortonworks.com/articles/23602/sqoop-fetching-lot-of-tables-in-parallel.html Another problem I saw with Teradata is that some data types are not supported when you try to insert the data directly into Hive from Sqoop. So the solution I took was the traditional one (see the sketch below):
1) Sqoop to HDFS.
2) Build external tables on top of the files.
3) Create an ORC table and then insert the data from the external tables.
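A minimal sketch of that three-step flow; the Teradata connection string, driver, table names, and columns are hypothetical placeholders:

```sh
# 1) Sqoop the table to plain text files on HDFS (connection details are made up)
sqoop import \
  --connect jdbc:teradata://td-host/DATABASE=sales \
  --driver com.teradata.jdbc.TeraDriver \
  --username etl_user -P \
  --table ORDERS \
  --target-dir /staging/orders \
  --fields-terminated-by '\001' \
  --num-mappers 4

# 2) Build an external table on top of the files, then
# 3) create an ORC table and load it from the external table
beeline -u "jdbc:hive2://hs2-host:10000/default" -e "
  CREATE EXTERNAL TABLE orders_stg (order_id BIGINT, amount DOUBLE, order_ts STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
    LOCATION '/staging/orders';
  CREATE TABLE orders STORED AS ORC AS SELECT * FROM orders_stg;
"
```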
02-02-2016
02:28 AM
@vbhoomireddy are you still having issues with this? Can you accept the best answer or provide your own solution?
11-11-2015
04:08 PM
1 Kudo
@Sourygna Luangsay We used syslogtcp for our project, and it struggles at between 500-1000 events/second. It looks like multiport_syslogtcp uses Apache MINA (https://mina.apache.org/), a high-performance asynchronous TCP library, which provides better throughput on multicore machines even when using a single TCP port.
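For illustration, a hedged sketch of what switching the source to multiport_syslogtcp could look like; the agent/component names, host, and ports are assumptions, and the channel c1 is assumed to be defined elsewhere in the agent config:

```sh
# Hypothetical source stanza; only the multiport_syslogtcp type and its
# host/ports/channels properties come from the Flume documentation.
cat >> /etc/flume/conf/syslog-agent.conf <<'EOF'
a1.sources.r1.type     = multiport_syslogtcp
a1.sources.r1.host     = 0.0.0.0
a1.sources.r1.ports    = 10001 10002 10003
a1.sources.r1.channels = c1
EOF
```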
06-16-2016
10:29 AM
1 Kudo
I got it working on Ambari 2.2.1.
1. Create the mount points:
   # mkdir /hadoop/hdfs/data1 /hadoop/hdfs/data2 /hadoop/hdfs/data3
   # chown hdfs:hadoop /hadoop/hdfs/data1 /hadoop/hdfs/data2 /hadoop/hdfs/data3
   (We are using this configuration for test purposes only, so no disks are actually mounted.)
2. Log in to Ambari > HDFS > Settings.
3. Add the DataNode directories as shown below under DataNode > DataNode directories:
   [DISK]/hadoop/hdfs/data,[SSD]/hadoop/hdfs/data1,[RAMDISK]/hadoop/hdfs/data2,[ARCHIVE]/hadoop/hdfs/data3
   Restart the HDFS service, then restart all other affected services.
4. Create a directory /cold and set the COLD storage policy on it:
   # su hdfs
   [hdfs@hdp-qa2-n1 ~]$ hadoop fs -mkdir /cold
   [hdfs@hdp-qa2-n1 ~]$ hdfs storagepolicies -setStoragePolicy -path /cold -policy COLD
   Set storage policy COLD on /cold
5. Get the storage policy to verify:
   [hdfs@hdp-qa2-n1 ~]$ hdfs storagepolicies -getStoragePolicy -path /cold
   The storage policy of /cold:
   BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
10-21-2015
09:47 AM
2 Kudos
@cliu@hortonworks.com These are very helpful benchmarks posted by AMPLab.
06-28-2019
05:53 AM
Hi, thanks for the script; it solves most of my automation problems where I need to compare Hive tables. A few things I am trying to modify are not working for me. We have a cluster with Hive installed on multiple nodes (a load balancer is enabled for HS2), and we are using Beeline (instead of the Hive CLI) to execute the queries and fetch the data locally. Because the load balancer is enabled, the two queries are executed on two different nodes, so the local data ends up on two nodes and the script cannot get the actual data and fails. I am not sure how to make it work on only a single node using Beeline. The cluster is Kerberized, has Sentry, and HS2 is enabled for load balancing.
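One hedged sketch of what pointing Beeline at a single HiveServer2 instance (instead of the load balancer's virtual name) could look like on a Kerberized cluster; the hostnames, port, principal, realm, table, and output path are all placeholders:

```sh
# Hypothetical direct connection to one specific HS2 node;
# replace host, port, and principal with your real values.
kinit etl_user@EXAMPLE.COM
beeline -u "jdbc:hive2://hs2-node1.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM" \
  -e "SELECT * FROM my_table LIMIT 10;" > /tmp/my_table_sample.txt
```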