Member since: 08-12-2016
Posts: 39
Kudos Received: 7
Solutions: 3
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1443 | 03-27-2017 01:36 PM |
 | 954 | 03-21-2017 07:44 PM |
 | 5390 | 03-21-2017 11:31 AM |
01-07-2018
07:03 PM
1 Kudo
@pbarna I worked on this as part of the Atlas project, and I realized that the solution is not simple given the numerous dependencies involved. Can you please tell me which versions of Titan and Hadoop you are using? I attempted a similar exercise with Titan 0.5.4 and Hadoop 2.6.3. My problem was to initiate the Titan index repair job. This facility is built into the Titan API, which uses MapReduce to run the repair. With some help, I realized that adding properties to yarn-site.xml and hbase-site.xml actually helps. When you update the properties in these files, be sure to use <final>true</final> so that your settings override the defaults and take effect. Example:

<property>
  <name>mapreduce.local.map.tasks.maximum</name>
  <value>10</value>
  <final>true</final>
</property>

For various reasons I ended up writing a Groovy script to achieve this; I can get into the details if you are interested. My script is here. Please feel free to reach out if you think this was useful. Thanks @Nixon Rodrigues for letting me know about this question.
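For reference, here is a minimal sketch (not my Groovy script) of triggering a reindex through the Titan management API. It is written against the Titan 1.0-style API; on 0.5.4 the management handle comes from graph.getManagementSystem() instead of openManagement(), and the properties file name and index name ("titan-hbase.properties", "byName") are placeholders you would replace with your own:

import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
import com.thinkaurelius.titan.core.schema.SchemaAction;
import com.thinkaurelius.titan.core.schema.TitanManagement;

public class ReindexSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder config file pointing at the HBase-backed graph.
        TitanGraph graph = TitanFactory.open("titan-hbase.properties");

        TitanManagement mgmt = graph.openManagement();  // Titan 0.5.x: graph.getManagementSystem()
        // REINDEX rebuilds the named index; the MapReduce-backed repair is the variant
        // that needs the yarn-site.xml / hbase-site.xml settings mentioned above.
        mgmt.updateIndex(mgmt.getGraphIndex("byName"), SchemaAction.REINDEX).get();
        mgmt.commit();

        graph.close();
    }
}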
08-23-2017
09:05 AM
1 Kudo
@pbarna I think the Java API should be the fastest. Something along these lines (with the missing imports, a shared FileSystem handle, and exception handling filled in):

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelMkdirs {

    // Shared HDFS handle used by all worker threads.
    private static FileSystem fs;

    static class DirectoryThread extends Thread {
        private final int from;
        private final int count;
        private static final String basePath = "/user/d";

        public DirectoryThread(int from, int count) {
            this.from = from;
            this.count = count;
        }

        @Override
        public void run() {
            // Each thread creates its own slice of the directories.
            for (int i = from; i < from + count; i++) {
                Path path = new Path(basePath + i);
                try {
                    fs.mkdirs(path);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String hdfsUri = "hdfs://namenode:8020";  // adjust to your NameNode
        Configuration conf = new Configuration();
        fs = FileSystem.get(URI.create(hdfsUri), conf);

        long startTime = System.currentTimeMillis();
        int threadCount = 8;
        Thread[] threads = new Thread[threadCount];
        int total = 1000000;
        int countPerThread = total / threadCount;

        for (int j = 0; j < threadCount; j++) {
            Thread thread = new DirectoryThread(j * countPerThread, countPerThread);
            thread.start();
            threads[j] = thread;
        }
        for (Thread thread : threads) {
            thread.join();
        }

        long endTime = System.currentTimeMillis();
        System.out.println("Total: " + (endTime - startTime) + " milliseconds");
    }
}

Obviously, use as many threads as you can. But even so, this takes 1-2 minutes, so I wonder how @bkosaraju could "complete in few seconds with your code".
03-21-2017
07:44 PM
1 Kudo
Linux username/password: root/hadoop. This will also help: https://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox/
03-23-2017
04:44 PM
Awesome @Ken Jiiii. hive-site.xml should be available across the cluster in /etc/spark/conf (which /usr/hdp/current/spark-client/conf is a symlink to), and the Spark client needs to be installed on all worker nodes for yarn-cluster mode to work: the Spark driver can run on any worker node, so every node must have the client installed along with spark/conf. If you are using Ambari, it takes care of making hive-site.xml available in /spark-client/conf/. See the sketch below for what this looks like from the driver's side.
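Here is a minimal driver-side sketch, assuming Spark 1.x on HDP and a made-up application name; it only sees your Hive tables in yarn-cluster mode when hive-site.xml is present in the Spark conf directory on whichever node ends up hosting the driver:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class HiveOnYarnCluster {
    public static void main(String[] args) {
        // In yarn-cluster mode the driver runs on an arbitrary worker node,
        // so that node's /etc/spark/conf/hive-site.xml is the one that gets picked up.
        SparkConf conf = new SparkConf().setAppName("HiveOnYarnCluster");
        JavaSparkContext sc = new JavaSparkContext(conf);
        HiveContext hive = new HiveContext(sc.sc());

        // Simple sanity check that the metastore configured in hive-site.xml is reachable.
        hive.sql("SHOW TABLES").show();

        sc.stop();
    }
}

Submitted with --master yarn --deploy-mode cluster (or --master yarn-cluster on older Spark versions), the SHOW TABLES call only succeeds if the node hosting the driver has a valid hive-site.xml in its Spark conf directory.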
03-21-2017
11:31 AM
1 Kudo
You cannot add a description / comment to a Phoenix table the way you would in Oracle/TD; that is, there is no equivalent to the `COMMENT ON TABLE` SQL command, neither in Phoenix SQL nor in the underlying HBase API. You can accomplish your goal by using external tools to keep tabs on your table metadata.
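If you would rather keep the note next to the table itself instead of in an external tool, one workaround is to stash it as a free-form attribute on the underlying HBase table descriptor. This is only a sketch under assumptions: the table name MY_PHOENIX_TABLE and the attribute key TABLE_COMMENT are made up, Phoenix will not surface the value anywhere, and you would read it back with describe in the HBase shell or the same Java API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class TableCommentSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // MY_PHOENIX_TABLE and TABLE_COMMENT are placeholders, not Phoenix conventions.
            TableName table = TableName.valueOf("MY_PHOENIX_TABLE");
            HTableDescriptor desc = admin.getTableDescriptor(table);
            desc.setValue("TABLE_COMMENT", "Customer dimension table, loaded nightly");
            admin.modifyTable(table, desc);
        }
    }
}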