Member since: 08-12-2016
Posts: 39
Kudos Received: 7
Solutions: 3

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 765 | 03-27-2017 01:36 PM |
|  | 407 | 03-21-2017 07:44 PM |
|  | 2915 | 03-21-2017 11:31 AM |
01-07-2018
07:03 PM
1 Kudo
@pbarna I worked on this as part of the Atlas project. I realized that the solution is not as simple as it looks, given the numerous dependencies involved. Can you please tell me the versions of Titan and Hadoop you are using? I attempted a similar exercise with Titan 0.5.4 and Hadoop 2.6.3.

My problem was to initiate the Titan index repair job. This facility is built into the Titan API, and it uses MapReduce to perform the repair. With some help, I realized that adding properties to yarn-site.xml and hbase-site.xml actually helps. When you update the properties in these files, be sure to use <final>true</final> so that your settings override the defaults and take effect. Example:

<property>
  <name>mapreduce.local.map.tasks.maximum</name>
  <value>10</value>
  <final>true</final>
</property>

For various reasons I ended up writing a Groovy script to achieve this. I can get into the details if you are interested. My script is here. Please feel free to reach out if you think this was useful. Thanks @Nixon Rodrigues for letting me know about this question.
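In case it helps anyone landing here: the MapReduce-backed repair is what my Groovy script drives, but for smaller graphs there is also an in-process path. Here is a minimal sketch through the Titan 0.5.x management API, assuming an HBase-backed graph; the properties file path and index name below are placeholders, not something from my setup:

import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
import com.thinkaurelius.titan.core.schema.SchemaAction;
import com.thinkaurelius.titan.core.schema.TitanManagement;

public class ReindexSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder config file; point this at your HBase-backed graph.
        TitanGraph graph = TitanFactory.open("titan-hbase.properties");
        TitanManagement mgmt = graph.getManagementSystem();
        // "vertex_index" is illustrative; use the name of the index you need to repair.
        mgmt.updateIndex(mgmt.getGraphIndex("vertex_index"), SchemaAction.REINDEX);
        mgmt.commit();
        graph.shutdown();
    }
}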
08-23-2017
09:05 AM
1 Kudo
@pbarna I think the Java API should be the fastest. Something like this, with the boilerplate filled in:

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelMkdirs {

    // Shared by all worker threads; FileSystem instances are thread-safe.
    static FileSystem fs;

    static class DirectoryThread extends Thread {
        private final int from;
        private final int count;
        private static final String basePath = "/user/d";

        public DirectoryThread(int from, int count) {
            this.from = from;
            this.count = count;
        }

        @Override
        public void run() {
            // Each thread creates its own contiguous range of directories.
            for (int i = from; i < from + count; i++) {
                Path path = new Path(basePath + i);
                try {
                    fs.mkdirs(path);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String hdfsUri = "hdfs://namenode:8020"; // adjust to your cluster
        fs = FileSystem.get(URI.create(hdfsUri), conf);

        long startTime = System.currentTimeMillis();
        int threadCount = 8;
        Thread[] threads = new Thread[threadCount];
        int total = 1000000;
        int countPerThread = total / threadCount;
        for (int j = 0; j < threadCount; j++) {
            Thread thread = new DirectoryThread(j * countPerThread, countPerThread);
            thread.start();
            threads[j] = thread;
        }
        for (Thread thread : threads) {
            thread.join();
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Total: " + (endTime - startTime) + " milliseconds");
    }
}

Obviously, use as many threads as you can. But still, this takes 1-2 minutes; I wonder how @bkosaraju could "complete in few seconds with your code".
04-20-2017
12:57 PM
It is not clear what is being asked here. @manyatha reddy, could you please be more specific?
09-14-2017
12:08 PM
1 Kudo
1) Start Atlas in debug mode

First you want to add extra JVM options in the startup script, so in atlas_start.py replace this line:

DEFAULT_JVM_OPTS="-Dlog4j.configuration=atlas-log4j.xml -Djava.net.preferIPv4Stack=true -server"

with this:

DEFAULT_JVM_OPTS="-Dlog4j.configuration=atlas-log4j.xml -Djava.net.preferIPv4Stack=true -server -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,address=54371,server=y,suspend=y"

Now, when you start Atlas, it will hang until you connect with the debugger (because of suspend=y).

2) Connect from the Eclipse remote debugger

Make sure you have imported the Atlas project into Eclipse based on this document: http://atlas.apache.org/EclipseSetup.html Then create a new debug configuration under the menu Run / Debug Configurations... Make sure the port is set to the same as above (54371) and the connection type is Standard (Socket Attach), using the Eclipse JDT launcher.
03-29-2017
10:33 AM
Thanks @pbarna, I have checked everything: the host that HiveServer2 runs on is m1.hdp.local, and the port is indeed 10000. Maybe I should reinstall the Hive services if I want to revert the Hive JDBC URL from ZooKeeper mode to direct mode.
03-24-2017
05:54 AM
I am talking about the following highlighted one. I have the following retry in Falcon, but Oozie shows 20 runs, and my shell script also runs 20 times (I verified this by adding an echo statement at the start of the script):

<retry policy="periodic" delay="minutes(30)" attempts="10"/>
03-21-2017
07:44 PM
1 Kudo
Linux username/password: root/hadoop. This will also help: https://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox/
03-23-2017
04:44 PM
Awesome, @Ken Jiiii. hive-site.xml should be available across the cluster in /etc/spark/conf (which /usr/hdp/current/spark-client/conf is symlinked to), and the Spark client needs to be installed on all worker nodes for yarn-cluster mode to work, since the Spark driver can run on any worker node and that node must have the client installed with its spark/conf. If you are using Ambari, it takes care of making hive-site.xml available in /spark-client/conf/.
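A quick way to check that the driver actually picks up hive-site.xml in yarn-cluster mode is a trivial job that queries the metastore. Here is a minimal sketch against the Spark 1.6-era Java API (the app name is arbitrary):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class HiveSmokeTest {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("HiveSmokeTest");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // With hive-site.xml on the driver's classpath this talks to the real
        // metastore; without it, Spark silently creates a local metastore and
        // you will see an empty table list instead of your Hive tables.
        HiveContext hive = new HiveContext(sc.sc());
        hive.sql("SHOW TABLES").show();
        sc.stop();
    }
}

Submit it with --master yarn-cluster a few times; since the driver can land on any worker node, it should list your Hive tables no matter where it runs.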
03-21-2017
11:31 AM
1 Kudo
You cannot add a description / comment to a Phoenix table the way you would in Oracle/TD; that is, there is no equivalent to the `COMMENT ON TABLE` SQL command, neither in Phoenix SQL nor in the underlying HBase API. You can accomplish your goal by using external tools to keep tabs on your table metadata.
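For example, the REMARKS column that JDBC reserves for such comments simply comes back empty from Phoenix. A small sketch with the standard JDBC metadata API (the ZooKeeper quorum and table name are placeholders):

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class PhoenixRemarks {
    public static void main(String[] args) throws Exception {
        // Placeholder quorum; adjust to your cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk1.example.com:2181")) {
            DatabaseMetaData md = conn.getMetaData();
            try (ResultSet rs = md.getTables(null, null, "MY_TABLE", null)) {
                while (rs.next()) {
                    // REMARKS is part of the JDBC standard, but since Phoenix has
                    // no COMMENT ON TABLE, expect it to be null/empty here.
                    System.out.println(rs.getString("TABLE_NAME") + " -> " + rs.getString("REMARKS"));
                }
            }
        }
    }
}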
03-22-2017
11:46 AM
@pbarna I tested Firefox and I did make the changes you specified. The issue here is not the error message; it is how we can make HTTP authentication work against AD credentials to access any UI. I was able to access the UI when I logged in with local credentials; I want the same when I log in with my AD credentials on a domain-joined PC.
05-11-2017
05:02 PM
1 Kudo
Yes, go for Druid! I want to start with a disclaimer: I am a Druid committer. First, I want to point out that, as an engineer, I don't believe there is a single query engine that is always better than all the other solutions; it is all relative to the use case you want to solve. Now let's get to why Druid and not OpenTSDB for real-time stream applications. The keyword in this use case is real-time streaming applications. The simple reasons are:

- Druid has native ingestion and indexing support for almost all the rising real-time stream processing technologies (e.g. Kafka, RabbitMQ, Spark, Storm, Flink, Apex, and the list goes on). This integration is production tested at very large scale (e.g. Yahoo/Flurry or Metamarkets), where we see more than 1 million events per second through real-time ingestion.
- Druid supports the lambda architecture out of the box.
- Druid can ingest data directly from Kafka with the guarantee of exactly-once delivery semantics.

In my opinion, those are the key elements to look for when building a real-time streaming application. To my limited knowledge, I am not aware of any integration or production use cases combining real-time streams and OpenTSDB.
03-20-2017
04:58 PM
Thanks for your input. I am slightly confused by your response. Is there a way to populate the REMARKS metadata via the API?