Member since: 09-29-2015
Posts: 67
Kudos Received: 45
Solutions: 10
My Accepted Solutions
Title | Views | Posted
---|---|---
| 1966 | 05-25-2016 10:24 AM
| 11958 | 05-19-2016 11:24 AM
| 8419 | 05-13-2016 10:09 AM
| 3102 | 05-13-2016 06:41 AM
| 9027 | 03-25-2016 09:15 AM
06-09-2021
06:42 AM
You can try setting the following parameters: set hive.vectorized.execution.reduce.enabled=false; and set hive.vectorized.execution.enabled=true;
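For illustration, a minimal sketch of applying those two settings in a Beeline session before running a query; the JDBC URL and the query are placeholders, not part of the original suggestion:

```sh
# Hypothetical connection string and query; only the two SET statements
# come from the suggestion above.
beeline -u "jdbc:hive2://hs2-host:10000/default" -e "
  set hive.vectorized.execution.reduce.enabled=false;
  set hive.vectorized.execution.enabled=true;
  SELECT count(*) FROM my_table;
"
```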
05-19-2016
07:09 AM
Other very good ways to load data into HDFS are Flume or NiFi. "hadoop fs -put" is good, but it has some limitations and a lack of flexibility that might make it difficult to use in a production environment. If you look at the documentation of the Flume HDFS sink, for instance ( http://flume.apache.org/FlumeUserGuide.html#hdfs-sink ), you'll see that Flume lets you define how to rotate the files, how to write the file names, etc. Other options can be defined for the source (your local text files) or for the channel. "hadoop fs -put" is more basic and doesn't offer those possibilities.
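As an illustration, here is a minimal sketch of a Flume agent that watches a local directory and writes to HDFS with explicit roll settings; the agent/component names, paths, and values are assumptions, not from the original post:

```sh
# Hypothetical Flume agent config showing the HDFS sink roll options
# mentioned above; names, paths, and sizes are examples only.
cat > /etc/flume/conf/local-to-hdfs.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Watch a local directory for new text files
a1.sources.r1.type     = spooldir
a1.sources.r1.spoolDir = /var/spool/incoming
a1.sources.r1.channels = c1

a1.channels.c1.type     = memory
a1.channels.c1.capacity = 10000

# HDFS sink: control file naming and rotation
a1.sinks.k1.type                   = hdfs
a1.sinks.k1.channel                = c1
a1.sinks.k1.hdfs.path              = /data/incoming/%Y-%m-%d
a1.sinks.k1.hdfs.filePrefix        = events
a1.sinks.k1.hdfs.fileType          = DataStream
a1.sinks.k1.hdfs.rollInterval      = 300
a1.sinks.k1.hdfs.rollSize          = 134217728
a1.sinks.k1.hdfs.rollCount         = 0
a1.sinks.k1.hdfs.useLocalTimeStamp = true
EOF

# Start the agent against that config
flume-ng agent --conf /etc/flume/conf \
  --conf-file /etc/flume/conf/local-to-hdfs.conf \
  --name a1 -Dflume.root.logger=INFO,console
```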
07-18-2019
07:12 AM
The Hive import completes, but the next line gives "INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory." I have made changes to the Sqoop query since I am fetching data from Oracle. When I log into Hive, the database doesn't have any tables. Please provide an appropriate solution. Hopeful to hear from you guys.
03-18-2016
11:06 AM
@Robin Dong As mentioned by Ancil, you might want a script to run the Sqoop downloads in parallel, and you need to control quite carefully how big your parallelism is, above all if you want to avoid the typical "No more spool space in...". Here's a script to do that: https://community.hortonworks.com/articles/23602/sqoop-fetching-lot-of-tables-in-parallel.html Another problem I saw with Teradata is that some data types are not supported when you try to insert the data directly into Hive from Sqoop. So the solution I took was the traditional one (see the sketch below):
1) Sqoop to HDFS.
2) Build external tables on top of the files.
3) Create an ORC table and then insert the data from the external tables.
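A minimal sketch of that three-step flow; the Teradata connection string, driver, table names, and columns are hypothetical placeholders:

```sh
# 1) Sqoop the table to plain text files on HDFS (connection details are made up)
sqoop import \
  --connect jdbc:teradata://td-host/DATABASE=sales \
  --driver com.teradata.jdbc.TeraDriver \
  --username etl_user -P \
  --table ORDERS \
  --target-dir /staging/orders \
  --fields-terminated-by '\001' \
  --num-mappers 4

# 2) Build an external table on top of the files, then
# 3) create an ORC table and load it from the external table
beeline -u "jdbc:hive2://hs2-host:10000/default" -e "
  CREATE EXTERNAL TABLE orders_stg (order_id BIGINT, amount DOUBLE, order_ts STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
    LOCATION '/staging/orders';
  CREATE TABLE orders STORED AS ORC AS SELECT * FROM orders_stg;
"
```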
02-02-2016
02:28 AM
@vbhoomireddy are you still having issues with this? Can you accept the best answer or provide your own solution?
11-11-2015
04:08 PM
1 Kudo
@Sourygna Luangsay We used syslogtcp for our project, and it struggles at between 500-1000 events/second. It looks like multiport_syslogtcp uses Apache MINA (https://mina.apache.org/), a high-performance asynchronous TCP library, which provides better throughput on multicore machines even when using a single TCP port.
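For illustration, a hedged sketch of what switching the source to multiport_syslogtcp could look like; the agent/component names, host, and ports are assumptions, and the channel c1 is assumed to be defined elsewhere in the agent config:

```sh
# Hypothetical source stanza; only the multiport_syslogtcp type and its
# host/ports/channels properties come from the Flume documentation.
cat >> /etc/flume/conf/syslog-agent.conf <<'EOF'
a1.sources.r1.type     = multiport_syslogtcp
a1.sources.r1.host     = 0.0.0.0
a1.sources.r1.ports    = 10001 10002 10003
a1.sources.r1.channels = c1
EOF
```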
06-16-2016
10:29 AM
1 Kudo
I got it working on Ambari 2.2.1.
1. Create the mount points:
   # mkdir /hadoop/hdfs/data1 /hadoop/hdfs/data2 /hadoop/hdfs/data3
   # chown hdfs:hadoop /hadoop/hdfs/data1 /hadoop/hdfs/data2 /hadoop/hdfs/data3
   (We are using this configuration for test purposes only, so no disks are actually mounted.)
2. Log in to Ambari > HDFS > Settings.
3. Add the DataNode directories as shown below under DataNode > DataNode directories:
   [DISK]/hadoop/hdfs/data,[SSD]/hadoop/hdfs/data1,[RAMDISK]/hadoop/hdfs/data2,[ARCHIVE]/hadoop/hdfs/data3
   Restart the HDFS service, then restart all other affected services.
4. Create a directory /cold and set the COLD storage policy on it:
   # su hdfs
   [hdfs@hdp-qa2-n1 ~]$ hadoop fs -mkdir /cold
   [hdfs@hdp-qa2-n1 ~]$ hdfs storagepolicies -setStoragePolicy -path /cold -policy COLD
   Set storage policy COLD on /cold
5. Get the storage policy to verify:
   [hdfs@hdp-qa2-n1 ~]$ hdfs storagepolicies -getStoragePolicy -path /cold
   The storage policy of /cold:
   BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
10-21-2015
09:47 AM
2 Kudos
@cliu@hortonworks.com These are very helpful benchmarks posted by AMPLab.
06-28-2019
05:53 AM
Hi, thanks for the script; it solves most of my automation problems where I need to compare Hive tables. A few things I am trying to modify are not working for me. We have a cluster with Hive installed on multiple nodes (a load balancer is enabled for HS2), and we are using Beeline (instead of the Hive CLI) to execute the queries and fetch the data locally. Because the load balancer is enabled, the two queries are executed on two different nodes, so the local data ends up on two nodes and the script cannot get the actual data and fails. I am not sure how to make it work on only a single node using Beeline. The cluster is Kerberized, has Sentry, and HS2 is enabled for load balancing.
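One hedged sketch of what pointing Beeline at a single HiveServer2 instance (instead of the load balancer's virtual name) could look like on a Kerberized cluster; the hostnames, port, principal, realm, table, and output path are all placeholders:

```sh
# Hypothetical direct connection to one specific HS2 node;
# replace host, port, and principal with your real values.
kinit etl_user@EXAMPLE.COM
beeline -u "jdbc:hive2://hs2-node1.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM" \
  -e "SELECT * FROM my_table LIMIT 10;" > /tmp/my_table_sample.txt
```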