Member since
08-12-2016
39
Posts
7
Kudos Received
3
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
1322 | 03-27-2017 01:36 PM | |
875 | 03-21-2017 07:44 PM | |
5104 | 03-21-2017 11:31 AM |
01-05-2018
10:33 AM
Whatever I tried, all YARN applications ended up in the default queue. These are the things I tried:

1) Setting the property in titan-hbase-solr.properties (none of the following worked):

mapred.job.queue.name=myqueue
mapreduce.job.queue.name=myqueue
mapred.mapreduce.job.queue.name=myqueue

2) Setting the property in the Gremlin shell:

gremlin> graph = TitanFactory.open("/usr/iop/4.2.5.0-0000/titan/conf/titan-hbase-solr.properties")
gremlin> mgmt = graph.openManagement()
gremlin> desc = mgmt.getPropertyKey("desc2")
gremlin> mr = new MapReduceIndexManagement(graph)
gremlin> mgmt.set('gremlin.hadoop.mapred.job.queue.name', 'myqueue')
Unknown configuration element in namespace [root.gremlin]: hadoop
gremlin> mgmt.set('hadoop.mapred.job.queue.name', 'myqueue')
Unknown configuration element in namespace [root]: hadoop
Display stack trace? [yN] n
gremlin> mgmt.set('titan.hadoop.mapred.job.queue.name', 'myqueue')
Unknown configuration element in namespace [root]: titan
Display stack trace? [yN] n
gremlin> mgmt.set('mapred.job.queue.name', 'myqueue')
Unknown configuration element in namespace [root]: mapred
Display stack trace? [yN] n
gremlin>
gremlin> mgmt.set('mapreduce.mapred.job.queue.name', 'myqueue')
Unknown configuration element in namespace [root]: mapreduce
Display stack trace? [yN] n
gremlin> mgmt.set('gremlin.mapred.job.queue.name', 'myqueue')
Unknown configuration element in namespace [root.gremlin]: mapred
Display stack trace? [yN] n
gremlin> mgmt.set('gremlin.hadoop.mapred.job.queue.name', 'myqueue')
Unknown configuration element in namespace [root.gremlin]: hadoop
Display stack trace? [yN] n
gremlin>
Labels: Apache YARN
08-22-2017
04:00 PM
Thanks for your response, @bkosaraju. Can you give me an example of any of the options you mentioned?
08-22-2017
01:16 PM
1 Kudo
For testing purposes I want to create a very large number of empty directories in HDFS, let's say 1 million. What I tried was to use `hdfs dfs -mkdir` to create 8K directories per call and repeat this in a for loop:

for i in {1..125}
do
dirs=""
for j in {1..8000}; do
dirs="$dirs /user/d$i.$j"
done
echo "$dirs"
hdfs dfs -mkdir $dirs
done
Apparently it takes hours to create 1M folders this way. My question is, what would be the fastest way to create 1M empty folders?
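
One direction that might be worth trying is to issue all the mkdir calls from a single long-running JVM with several threads, instead of starting a new `hdfs dfs` client for every batch. The sketch below is only an illustration under that assumption and has not been measured against a real cluster. The class name `BulkMkdir`, the thread count, and the `mkdir` callback are hypothetical; in a real run the callback would wrap a Hadoop client call such as `FileSystem.mkdirs(Path)`, which is left out here so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class BulkMkdir {
    /** Builds the same names as the shell loop above: /user/d<i>.<j>. */
    static List<String> paths(int outer, int inner) {
        List<String> out = new ArrayList<>(outer * inner);
        for (int i = 1; i <= outer; i++) {
            for (int j = 1; j <= inner; j++) {
                out.add("/user/d" + i + "." + j);
            }
        }
        return out;
    }

    /**
     * Calls mkdir once per path from a fixed-size thread pool and waits
     * until every call has finished. The callback is a placeholder for
     * the real directory-creation call (an assumption of this sketch).
     */
    static void createAll(List<String> paths, int threads, Consumer<String> mkdir) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String p : paths) {
            pool.execute(() -> mkdir.accept(p));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("timed out waiting for mkdir calls");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while creating directories", e);
        }
    }

    public static void main(String[] args) {
        // Dry run: print the paths instead of creating them.
        createAll(paths(2, 3), 4, System.out::println);
    }
}
```

With `paths(125, 8000)` this produces the same 1M names as the loop; whether it is actually faster depends on the NameNode and on the client used inside the callback.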
Labels: Apache Hadoop
03-28-2017
11:53 AM
What exactly do you mean by "if an user arun is trying to access hdfs"? Are you trying to access a file or folder with the "hadoop fs" command while you are logged into Linux as user "arun"?
03-27-2017
01:36 PM
Not sure if I got your question right, but the CreateEvent will contain the HDFS path. The name of the file is part of the HDFS path (it comes after the last '/'). I hope this answers your question. See the inotify patch here: https://issues.apache.org/jira/secure/attachment/12665452/HDFS-6634.9.patch#file-8

/**
* Sent when a new file is created (including overwrite).
*/
public static class CreateEvent extends Event {
public static enum INodeType {
FILE, DIRECTORY, SYMLINK;
}
private INodeType iNodeType;
private String path;
private long ctime;
private int replication;
private String ownerName;
private String groupName;
private FsPermission perms;
private String symlinkTarget;
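
To illustrate the point about the file name being the part after the last '/', here is a minimal sketch that works on a plain path string such as the `path` field shown above. The class name `HdfsPathNames`, the helper `fileName`, and the sample path are hypothetical and are not part of the patch.

```java
final class HdfsPathNames {
    /** Returns the last path component, i.e. the text after the final '/'. */
    static String fileName(String hdfsPath) {
        // Drop one trailing slash so that "/data/dir/" yields "dir".
        String p = hdfsPath.endsWith("/") && hdfsPath.length() > 1
                ? hdfsPath.substring(0, hdfsPath.length() - 1)
                : hdfsPath;
        // lastIndexOf returns -1 when there is no '/', so the whole string is kept.
        return p.substring(p.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Hypothetical path, as it might appear in a create event.
        System.out.println(fileName("/user/arun/input/part-00000"));
    }
}
```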
03-23-2017
03:54 PM
1 Kudo
This question is too broad in this form. If you want advice on which solution (computing engine) to choose, you should first describe what you are trying to accomplish, what kind of problem you are trying to solve, and what the nature of your workload is.
03-23-2017
03:49 PM
If I understand correctly, you say that large tables (3 million records) return a query like this relatively fast: Select * from example_table Limit 10, or the same with Where serial = "SomeID", but when you run a similar query against an external table stored on AWS S3, it performs badly. Did you try copying the table data file to HDFS and then creating an external table on the HDFS file? I bet that could make a big difference in performance. I assume the difference arises because, when the table data is stored on S3, Hive first needs to copy the data from S3 onto a node where Hive runs, and the speed of that operation depends on the available network bandwidth.
03-23-2017
02:56 PM
1) Store PDF files in HDFS
It would be possible to store your individual PDF files in HDFS and have the HDFS path as an additional field, stored in the Solr index. What you need to consider here is that HDFS is best at storing a small number of very large files, so it is not efficient to store a large number of relatively small PDF files in HDFS.

2) Store PDF files in HBase
It would also be possible to store the PDF files in an object store, like HBase. This option is definitely feasible, and I have seen several real-life implementations of this design. In this case, you would store the HBase id in the Solr index.

3) Store PDF files in the Solr index itself
I think it is also possible to store the original PDF file in the Solr index. You would use a BinaryField type and set the stored property to true. (Note that you could accomplish the same even with an older version of Solr that lacks the BinaryField type. In that case, you would have to convert your PDF into text, e.g. with base64 encoding, and then store this text value in a stored=true field. Upon retrieval, you would convert it back to PDF.)

Without an estimate of the number of PDF files and the average size of a PDF, it is hard to choose the best design. It could also be an important factor whether you want to update your documents frequently or you just add them to the index once and they won't change anymore.
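
The base64 round trip mentioned under option 3 could look roughly like the sketch below. It only shows the encoding and decoding step with the standard `java.util.Base64` class; the class name `PdfBase64` and the sample bytes are hypothetical, and the code that would write the string to (or read it from) a stored Solr field is deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class PdfBase64 {
    /** Turns raw PDF bytes into text that fits in a stored string field. */
    static String encode(byte[] pdfBytes) {
        return Base64.getEncoder().encodeToString(pdfBytes);
    }

    /** Turns the stored text back into the original PDF bytes. */
    static byte[] decode(String storedValue) {
        return Base64.getDecoder().decode(storedValue);
    }

    public static void main(String[] args) {
        // Stand-in for the content of a real PDF file.
        byte[] original = "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII);
        String stored = encode(original);
        byte[] restored = decode(stored);
        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}
```

Keep in mind that base64 text is about one third larger than the binary input, which matters when estimating index size.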
03-22-2017
10:18 AM
Have you copied the jar file to HDFS? If you run this command, what is the result?

hadoop fs -ls /path/to/your/spark.jar
03-21-2017
07:44 PM
1 Kudo
Linux username/password: root/hadoop. This will also help: https://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox/