Member since: 08-12-2016
Posts: 39
Kudos Received: 7
Solutions: 3

My Accepted Solutions

Title | Views | Posted
---|---|---
 | 765 | 03-27-2017 01:36 PM
 | 406 | 03-21-2017 07:44 PM
 | 2915 | 03-21-2017 11:31 AM
01-05-2018
10:33 AM
No matter what I tried, all YARN applications launched by Titan ended up in the default queue instead of "myqueue". These are the things I tried:

1) Setting the property in titan-hbase-solr.properties (none of the following worked):

mapred.job.queue.name=myqueue
mapreduce.job.queue.name=myqueue
mapred.mapreduce.job.queue.name=myqueue

2) Setting the property in the gremlin shell:

gremlin> graph = TitanFactory.open("/usr/iop/4.2.5.0-0000/titan/conf/titan-hbase-solr.properties")
gremlin> mgmt = graph.openManagement()
gremlin> desc = mgmt.getPropertyKey("desc2")
gremlin> mr = new MapReduceIndexManagement(graph)
gremlin> mgmt.set('gremlin.hadoop.mapred.job.queue.name', 'myqueue')
Unknown configuration element in namespace [root.gremlin]: hadoop
gremlin> mgmt.set('hadoop.mapred.job.queue.name', 'myqueue')
Unknown configuration element in namespace [root]: hadoop
Display stack trace? [yN] n
gremlin> mgmt.set('titan.hadoop.mapred.job.queue.name', 'myqueue')
Unknown configuration element in namespace [root]: titan
Display stack trace? [yN] n
gremlin> mgmt.set('mapred.job.queue.name', 'myqueue')
Unknown configuration element in namespace [root]: mapred
Display stack trace? [yN] n
gremlin> mgmt.set('mapreduce.mapred.job.queue.name', 'myqueue')
Unknown configuration element in namespace [root]: mapreduce
Display stack trace? [yN] n
gremlin> mgmt.set('gremlin.mapred.job.queue.name', 'myqueue')
Unknown configuration element in namespace [root.gremlin]: mapred
Display stack trace? [yN] n
gremlin> mgmt.set('gremlin.hadoop.mapred.job.queue.name', 'myqueue')
Unknown configuration element in namespace [root.gremlin]: hadoop
Display stack trace? [yN] n
gremlin>
Labels: Apache YARN
09-14-2017
12:08 PM
1 Kudo
1) Start Atlas in debug mode

First you need to add extra JVM options in the startup script, so in atlas_start.py replace this line:

DEFAULT_JVM_OPTS="-Dlog4j.configuration=atlas-log4j.xml -Djava.net.preferIPv4Stack=true -server"

with this:

DEFAULT_JVM_OPTS="-Dlog4j.configuration=atlas-log4j.xml -Djava.net.preferIPv4Stack=true -server -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,address=54371,server=y,suspend=y"

Now, when you start Atlas, it will hang until you connect with the debugger (because of suspend=y).

2) Connect from the Eclipse remote debugger

Make sure you have imported the Atlas project into Eclipse based on this document: http://atlas.apache.org/EclipseSetup.html Then create a new debug configuration under the menu Run / Debug Configurations... Make sure the port is set to the same as above (54371), the connection type is Standard (Socket Attach), and the Eclipse JDT launcher is used.
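Before attaching from Eclipse, it can help to confirm that the Atlas JVM is actually listening on the debug port. A minimal check from the Atlas host (assuming port 54371 as configured above) could be:

# check that something is listening on the JDWP debug port
netstat -tlnp | grep 54371
# or, alternatively:
lsof -i :54371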
08-22-2017
04:00 PM
Thanks for your response, @bkosaraju. Can you give me an example of any of the options you mentioned?
08-22-2017
01:16 PM
1 Kudo
For testing purposes I want to create a very large number of empty directories in HDFS, let's say 1 million. What I tried is to use `hdfs dfs -mkdir` to create 8,000 directories at a time and repeat this in a for loop:

for i in {1..125}
do
dirs=""
for j in {1..8000}; do
dirs="$dirs /user/d$i.$j"
done
echo "$dirs"
hdfs dfs -mkdir $dirs
done
Apparently it takes hours to create 1M folders this way. My question is, what would be the fastest way to create 1M empty folders?
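One variant worth benchmarking (a sketch, not something measured here) is to run the mkdir batches in parallel instead of one after the other; whether this is actually faster depends on how many concurrent requests the NameNode and the client host can absorb:

for i in {1..125}
do
  dirs=""
  for j in {1..8000}; do dirs="$dirs /user/d$i.$j"; done
  # launch each 8000-directory batch in the background, at most 8 batches at a time
  hdfs dfs -mkdir $dirs &
  (( i % 8 == 0 )) && wait
done
wait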
Tags: Hadoop Core, HDFS
Labels: Apache Hadoop
04-20-2017
12:57 PM
It is not clear what is being asked here. @manyatha reddy, could you please be more specific?
03-28-2017
11:53 AM
What exactly do you mean by "if an user arun is trying to access hdfs"? Are you trying to access a file or folder with the "hadoop fs" command while you are logged into Linux as user "arun"?
03-28-2017
11:49 AM
Assuming that you want to connect in direct, binary transport mode in a nonsecure environment, this is how the JDBC connection string should look (which is what you have tried):

jdbc:hive2://<host>:<port>/<db>

If your HiveServer2 runs on m1.hdp.local:10000 and the database name is default, then the connection string you tried should have worked:

jdbc:hive2://m1.hdp.local:10000/default

Since it did not work, I suppose your HiveServer2 uses a different port or runs on another host. You should be able to check these in Ambari. Please consult these docs on the various connection modes and the corresponding connection strings:

https://community.hortonworks.com/articles/4103/hiveserver2-jdbc-connection-url-examples.html
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-ConnectionURLs
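A quick way to verify the host, port and database before going back to your application is to try the same URL from Beeline on a cluster node (the values below simply reuse the ones from the question):

beeline -u "jdbc:hive2://m1.hdp.local:10000/default" -n <your_username>

If Beeline connects, the URL is fine and the problem is on the client side; if not, the error it prints usually tells you whether the host or the port is wrong.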
03-27-2017
01:36 PM
Not sure if I got your question right: the CreateEvent will contain the HDFS path, and the name of the file is part of that path (it comes after the last '/'). I hope this answers your question. See the inotify patch here: https://issues.apache.org/jira/secure/attachment/12665452/HDFS-6634.9.patch#file-8

/**
* Sent when a new file is created (including overwrite).
*/
public static class CreateEvent extends Event {
public static enum INodeType {
FILE, DIRECTORY, SYMLINK;
}
private INodeType iNodeType;
private String path;
private long ctime;
private int replication;
private String ownerName;
private String groupName;
private FsPermission perms;
private String symlinkTarget;
03-27-2017
12:56 PM
I tried both VirtualBox and Docker on the same MacBook; Docker used fewer resources and was faster.
03-23-2017
04:20 PM
From the error code, it seems that the problem is that you do not have hive-cli-*.jar in the Oozie sharelib folder. Could you also check the error message and post it as well?
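If you want to see what is actually in the Hive sharelib, something along these lines should work (the HDFS path below is the usual default and may differ on your cluster):

oozie admin -oozie http://<oozie-host>:11000/oozie -shareliblist hive
hdfs dfs -ls /user/oozie/share/lib/lib_*/hive | grep hive-cli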
03-23-2017
04:09 PM
1 Kudo
You should have posted the code you are trying to run and how you submit the job to Spark; without that it is harder to get an answer. From the error I can see you are trying to run wordcount on this file: hdfs:%20/user/midhun/f.txt (note the "%20", an encoded space, which suggests the URI you passed was malformed). Have you tried something like this?

hdfs dfs -put f.txt /user/midhun/f.txt
spark-submit --class com.cloudera.sparkwordcount.SparkWordCount \
--master local --deploy-mode client --executor-memory 1g \
--name wordcount --conf "spark.app.id=wordcount" \
sparkwordcount-1.0-SNAPSHOT-jar-with-dependencies.jar hdfs://namenode_host:8020/user/midhun/f.txt 2
03-23-2017
03:56 PM
Can you please post the whole stack trace of the error message? It seems like you omitted some meaningful parts.
03-23-2017
03:54 PM
1 Kudo
This question is too broad in this form. You need to understand this: if you want to get advice on which solution (computing engine) to choose, you should first give a description of what you are trying to accomplish, what kind of problem you are trying to solve, and what the nature of your workload is.
03-23-2017
03:49 PM
If I understand correctly, you are saying that your large tables (3 million records) return a query like this relatively fast:

Select * from example_table Limit 10
or
Where serial = "SomeID"

but when you run a similar query against an external table stored on AWS S3, it performs badly. Did you try to copy the table data file to HDFS and then create an external table on the HDFS file? I bet that could make a big difference in performance. I assume the difference exists because, when the table data is stored on S3, Hive first needs to copy the data from S3 onto a node where Hive runs, and the speed of that operation depends on the available network bandwidth.
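Just to illustrate the suggested test (the bucket, paths and table name below are made up, not values from your setup): copy the data to HDFS, point a second external table at it, and run the same query against both tables.

hadoop distcp s3a://my-bucket/example_table/ hdfs:///tmp/example_table_hdfs/
beeline -u "jdbc:hive2://<hiveserver2-host>:10000/default" -e \
  "CREATE EXTERNAL TABLE example_table_hdfs LIKE example_table LOCATION '/tmp/example_table_hdfs';"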
03-23-2017
03:36 PM
Saurabh, it is not possible to answer your question because it contains insufficient information. Please update your post and add the following info:
- the configuration that is supposed to tell Falcon to run your shell script 10 times
- exactly what you see in the Oozie web UI
03-23-2017
03:18 PM
You can use the load balancer that is built into Solr. In case you have only one shard, you can specify a list of replicas to choose from (for load-balancing purposes) by using the pipe symbol (|):

http://localhost:8983/solr/gettingstarted/select?q=*:*&shards=localhost:7574/solr/gettingstarted|localhost:7500/solr/gettingstarted

Consult the docs for more complex scenarios (like multiple shards): https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
03-23-2017
03:10 PM
What kind of client do you use? (What kind of interface does the client application use to connect to Solr?)
03-23-2017
03:02 PM
This error message does not look like it is specific to NiFi in any way. I would consult EC2 support about this error and see what they say as a potential cause, based on the error message. Then I would come back here and ask how NiFi could cause that.
03-23-2017
02:56 PM
1) Store the PDF files in HDFS

It would be possible to store your individual PDF files in HDFS and have the HDFS path as an additional field stored in the Solr index. What you need to consider here: HDFS is best at storing a small number of very large files, so it is not efficient to store a large number of relatively small PDF files in HDFS.

2) Store the PDF files in HBase

It would also be possible to store the PDF files in an object store like HBase. This option is definitely feasible and I have seen several real-life implementations of this design. In this case, you would store the HBase id in the Solr index.

3) Store the PDF files in the Solr index itself

I think it is also possible to store the original PDF file in the Solr index as well. You would use a BinaryField type and set the stored property to true. (Note that you could even accomplish the same with older versions of Solr lacking the BinaryField type: you would convert your PDF into text, e.g. with base64 encoding, store this text value in a stored=true field, and convert it back to PDF upon retrieval.)

Without an estimate of the number of PDF files and the average size of a PDF, it is hard to choose the best design. It can also be an important factor whether you want to update your documents frequently or just add them to the index once, after which they won't change anymore.
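As an illustration of the first two options (not from the original answer): you can let Solr index the text content of a PDF via its extracting request handler while the binary lives in HDFS or HBase, and only store a pointer field. The collection name, the hdfs_path field and the handler being enabled in solrconfig.xml are assumptions here.

curl "http://localhost:8983/solr/pdfcollection/update/extract?literal.id=doc1&literal.hdfs_path=/data/pdf/doc1.pdf&commit=true" \
  -F "file=@doc1.pdf"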
03-23-2017
02:07 PM
By the way, is it a final decision that you will store the data in HBase? Kevin, you wrote:

> The web application allows the user to specify certain filters that are mandatory, as well as optional. The result coming from HBase are precalculated aggregates based on the filters of the application.

I am afraid this is not sufficient information to design your HBase tables. Can you please elaborate on what kind of data will be stored in the database? What kind of filters will be applied, and what kind of aggregate calculations will be done?
03-22-2017
10:18 AM
Have you copied the jar file to HDFS? If you run this command, what is the result?

hadoop fs -ls /path/to/your/spark.jar
03-21-2017
07:44 PM
1 Kudo
The Linux username/password is root/hadoop. This will also help: https://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox/
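If you are connecting to the sandbox over SSH, the SSH port is usually forwarded to 2222 on the host machine (an assumption about the default sandbox setup, so adjust it if yours differs):

ssh root@127.0.0.1 -p 2222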
03-21-2017
04:48 PM
First things first: it was not clear to me whether you can build your project and create a jar file by running a Maven command. Once you can build a jar file out of your project, your pom.xml is fine. To check your pom.xml, run this command:

mvn package

If it returns an error, please update your original post and include the Maven error.
03-21-2017
03:42 PM
@mayki wogno I understand that you want to construct a dashboard, and that is perfectly fine. If you need further help with that, then I recommend you close this thread and ask a new, more specific question in a new thread. You see, your original question in this thread has already been answered: yes, it is possible, and you need to use the Counters API. This community works best if the questions are specific and new questions are asked in new threads. So I kindly ask you to put some effort into implementing your dashboard, and if you run into another problem, share it in a new thread.
03-21-2017
02:34 PM
@Nitin Kaushik, this question seems to be a duplicate; please see my answer to your similar question: https://community.hortonworks.com/answers/89919/view.htm
03-21-2017
02:12 PM
Why do I have this feeling that someone will just come and say: 'go for Druid'?
03-21-2017
02:10 PM
Have you tried to install a client whose version matches the server version?
03-21-2017
02:02 PM
The cause of this error is that the JVM cannot find the javax.net.ssl.trustStore required for SSL, or the truststore does not contain the required certificates. This means you will need to properly configure the truststore for your NiFi installation. Consult the following posts for further help with that:

https://community.hortonworks.com/articles/886/securing-nifi-step-by-step.html
https://batchiq.com/nifi-configuring-ssl-auth.html
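As a rough sketch of what that usually involves (the paths, alias and password below are placeholders, not values from your environment): import the remote endpoint's certificate into a JKS truststore and point NiFi at it in nifi.properties.

# import the remote server's certificate into a JKS truststore
keytool -import -alias remote-server -file server-cert.pem -keystore truststore.jks -storepass changeit

# then reference the truststore in nifi.properties, e.g.:
#   nifi.security.truststore=/path/to/truststore.jks
#   nifi.security.truststoreType=JKS
#   nifi.security.truststorePasswd=changeit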
03-21-2017
01:46 PM
@vnandigam, are you literally running this code?

event1.saveAsTextFile("");

saveAsTextFile expects one argument, a path to the file; if you pass an empty string as the path, you should not expect your RDD data to be saved into any file. http://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaRDD.html#saveAsTextFile(java.lang.String)
03-21-2017
01:35 PM
Just to elaborate on @Rob's answer: consult the Counters section in the REST API docs: https://nifi.apache.org/docs/nifi-docs/rest-api/index.html. You will probably want to create a counter that is incremented by the ingest/delete processors, and then you can query the counter value via the REST API. See also this post: https://pierrevillard.com/2017/02/07/using-counters-in-apache-nifi/
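For example, once the counters exist, reading them back is a single REST call (host, port and any authentication are assumptions about your setup):

curl -s "http://<nifi-host>:8080/nifi-api/counters"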