Member since: 08-15-2016
Posts: 189
Kudos Received: 63
Solutions: 22
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 5699 | 01-02-2018 09:11 AM |
 | 3034 | 12-04-2017 11:37 AM |
 | 2159 | 10-03-2017 11:52 AM |
 | 21600 | 09-20-2017 09:35 PM |
 | 1623 | 09-12-2017 06:50 PM |
01-27-2017
09:02 AM
1 Kudo
@Jacqualin jasmin Please try this from within the Beeline client:
0: jdbc:hive2://> !run /tmp/test.hql
The file does not need to be local to HiveServer2; it needs to exist on the node where you run Beeline. Also check:
0: jdbc:hive2://> !help
for many useful special commands in Beeline.
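For context, a minimal sketch of preparing and running such a file from the Beeline host (the JDBC URL, user, and table name are hypothetical placeholders):
# create the script on the node where Beeline runs
cat > /tmp/test.hql <<'EOF'
SHOW DATABASES;
SELECT COUNT(*) FROM default.sample_table;
EOF
# run it non-interactively; inside an interactive session, !run /tmp/test.hql does the same
beeline -u jdbc:hive2://hiveserver2-host:10000/default -n hive -f /tmp/test.hql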
01-27-2017
08:38 AM
@Reddy Please don't forget to mark the question as answered if it is answered.
01-26-2017
06:38 PM
@Joshua Petree Don't forget to mark the question as answered if it is answered.
01-25-2017
09:34 PM
Yes, just enter those OS-level paths (/mnt/data1,/mnt/data2,/mnt/data3) as a comma-separated value in the box for the DataNode directories ('dfs.datanode.data.dir') on the HDFS config page in Ambari. HDFS is just a logical layer on top of the OS-level filesystem, so you simply hand Ambari/Hadoop the locations on the native OS filesystem where HDFS should 'host' its data.
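For reference, a sketch of what the setting ends up as in hdfs-site.xml, which Ambari manages for you (paths taken from the question):
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/mnt/data1,/mnt/data2,/mnt/data3</value>
</property>
You can verify the active value on a DataNode with: hdfs getconf -confKey dfs.datanode.data.dir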
01-25-2017
03:15 PM
1 Kudo
@Joshua Petree It is doable, no problem. You would have to mount the 4 disks to the OS anyway. So mount the OS disk to / and the other 3 HDDs to /hadoop/hdfs/data1, /hadoop/hdfs/data2 and /hadoop/hdfs/data3. In Ambari you can then set these OS-level local folders to be used as HDFS storage, as in the screenshot: property = 'dfs.datanode.data.dir'
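A minimal sketch of the mount side, assuming the three data disks show up as /dev/sdb, /dev/sdc and /dev/sdd (the device names and the XFS choice are assumptions; adapt to your hardware):
# create the mount points and put a filesystem on each data disk
mkdir -p /hadoop/hdfs/data1 /hadoop/hdfs/data2 /hadoop/hdfs/data3
mkfs.xfs /dev/sdb && mount /dev/sdb /hadoop/hdfs/data1
mkfs.xfs /dev/sdc && mount /dev/sdc /hadoop/hdfs/data2
mkfs.xfs /dev/sdd && mount /dev/sdd /hadoop/hdfs/data3
# then in Ambari set dfs.datanode.data.dir to /hadoop/hdfs/data1,/hadoop/hdfs/data2,/hadoop/hdfs/data3
Also add the mounts to /etc/fstab so they survive a reboot.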
01-25-2017
02:57 PM
@Priyan S Maybe it is because you did not set up preemption in YARN? Without preemption, the order in which the jobs were submitted to Q1, Q2 and Q41 determines the capacity allocations. It may be that since the jobs on Q1 and Q2 were submitted first, they both grabbed the maximum allocation of their respective queues (40%). When the job for Q41 comes along, there is no more than the remaining 20% left for Q4, and/or 100% of that for any of its leaf queues Q41, Q42, Q43, Q44. I don't get why Q41 is only getting 10% and not 20%, though.

You can look upon preemption as a way to help restore the state in which all queues get at least their minimum configured allocation, even though the missing part for a queue A operating under its minimum might be in use by another queue B operating above its minimum (since B grabbed the excess capacity, up to its maximum, because it was still available at that time). Without preemption, queue A would have to wait for queue B to release capacity as jobs in B finish. With preemption, YARN will actively free up resources of B to allocate to A; in the process it might even kill job parts (containers) in queue B to do so.
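A minimal sketch of the core yarn-site.xml settings that enable Capacity Scheduler preemption (the tuning values are illustrative; adjust per cluster):
yarn.resourcemanager.scheduler.monitor.enable=true
yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval=3000
yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round=0.1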
01-25-2017
02:31 PM
1 Kudo
In my opinion it is best to still regard Hive as an analytical DB. With the ACID (updates) and streaming features the community is stretching the tool towards things it wasn't designed for. These are not to be used at very large scale and under very heavy loads; ACID and streaming will put tremendous strain on the Hive metastore. In the end, the native storage model of Hive is still based on streaming through whole HDFS files, even with ORC. Without true indexes, Hive will never be a really good match for highly transactional workloads. Doing large analytical sweeps/scans through data is still at odds with high-speed random read/write/update/delete. But that is not bad; there are just other components in HDP that do those other jobs right.
01-25-2017
01:42 PM
1 Kudo
@Reddy Because it is an external table, there is no one-liner to do it. That is probably the whole point of having external tables. So you need to do:
ALTER TABLE some.table DROP PARTITION (part="some") PURGE;
and
hdfs dfs -rm -R /path/to/table/basedir
I put the PURGE in there intentionally. It would work for non-external (managed) tables, just not for external tables.
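A quick follow-up sketch to verify both sides are gone (the JDBC URL is a hypothetical placeholder; the table and path are the placeholders from above):
# the dropped partition should no longer be listed
beeline -u jdbc:hive2://hiveserver2-host:10000 -e 'SHOW PARTITIONS some.table;'
# the directory listing should now fail with 'No such file or directory'
hdfs dfs -ls /path/to/table/basedir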
01-25-2017
12:02 PM
With the help of the remarks by @Aaron Dossett I found a solution to this. Knowing that Storm does not mark the HDFS file currently being written to, and that .addRotationAction is not robust enough in extreme cases, I turned to a low-level solution. HDFS can report the files on a path that are open for write:
hdfs fsck <storm_hdfs_state_output_path> -files -openforwrite
or alternatively you can list only the NON-open files on a path:
hdfs fsck <storm_hdfs_state_output_path> -files
The output is quite verbose, but you can use sed or awk to extract the closed/completed files from there. (The Java HDFS API has similar hooks; this is just the CLI-level solution.)
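A minimal sketch of that filtering at the CLI; the exact fsck output format differs between Hadoop versions, so treat the grep/awk patterns as assumptions to adapt (the path is a placeholder):
# files under the path still open for write
hdfs fsck /storm/output -files -openforwrite 2>/dev/null | grep OPENFORWRITE | awk '{print $1}'
# all files reported by fsck; lines for regular files contain a 'bytes,' size field
hdfs fsck /storm/output -files 2>/dev/null | grep ' bytes, ' | awk '{print $1}'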
01-17-2017
12:03 PM
@Arun Mahadevan @Aaron Dossett @Sriharsha Chintalapani I am kind of confused right now, so let me rephrase what I have got so far in my own words: whereas Trident can have strong exactly-once semantics for persisting stream aggregates and for tuples making it into any HDFS file, the action of rotating the file itself is not protected by these same strong guarantees? Or is the rotation protected by exactly-once, but not the .addRotationAction attached to it? It is just not clear from the documentation: https://github.com/apache/storm/tree/master/external/storm-hdfs#hdfs-bolt-support-for-trident-api Suppose the file rotation is exactly-once; then it could work to have the sync policy set to the exact same size limit as the size-based rotation policy. That way the files will only become visible to HDFS clients (synced) when that size limit is met.