Member since: 08-12-2016
Posts: 39
Kudos Received: 7
Solutions: 3

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 765 | 03-27-2017 01:36 PM |
|  | 407 | 03-21-2017 07:44 PM |
|  | 2915 | 03-21-2017 11:31 AM |
01-07-2018
07:03 PM
1 Kudo
@pbarna I worked on this as part of the Atlas project. I realized that the solution is not as simple as it looks, given the numerous dependencies involved. Can you please tell me the versions of Titan and Hadoop you are using? I attempted a similar exercise with Titan 0.5.4 and Hadoop 2.6.3.

My problem was to initiate the Titan index repair job. This facility is built into the Titan API, and it uses MapReduce to perform the repair. With some help, I realized that adding properties to yarn-site.xml and hbase-site.xml actually helps. When you update the properties in these files, be sure to use <final>true</final> so that your settings override the defaults and take effect. Example:

<property>
  <name>mapreduce.local.map.tasks.maximum</name>
  <value>10</value>
  <final>true</final>
</property>

For various reasons I ended up writing a Groovy script to achieve this. I can get into the details if you are interested. My script is here. Please feel free to reach out if you think this was useful. Thanks @Nixon Rodrigues for letting me know about this question.
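In case it helps anyone landing here: the MapReduce-backed repair is what my Groovy script drives, but for smaller graphs there is also an in-process path. Here is a minimal sketch through the Titan 0.5.x management API, assuming an HBase-backed graph; the properties file path and index name below are placeholders, not something from my setup:

import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
import com.thinkaurelius.titan.core.schema.SchemaAction;
import com.thinkaurelius.titan.core.schema.TitanManagement;

public class ReindexSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder config file; point this at your HBase-backed graph.
        TitanGraph graph = TitanFactory.open("titan-hbase.properties");
        TitanManagement mgmt = graph.getManagementSystem();
        // "vertex_index" is illustrative; use the name of the index you need to repair.
        mgmt.updateIndex(mgmt.getGraphIndex("vertex_index"), SchemaAction.REINDEX);
        mgmt.commit();
        graph.shutdown();
    }
}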
08-23-2017
09:05 AM
1 Kudo
@pbarna I think the Java API should be the fastest. Something like this, with the boilerplate filled in:

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelMkdirs {

    // Shared by all worker threads; FileSystem instances are thread-safe.
    static FileSystem fs;

    static class DirectoryThread extends Thread {
        private final int from;
        private final int count;
        private static final String basePath = "/user/d";

        public DirectoryThread(int from, int count) {
            this.from = from;
            this.count = count;
        }

        @Override
        public void run() {
            // Each thread creates its own contiguous range of directories.
            for (int i = from; i < from + count; i++) {
                Path path = new Path(basePath + i);
                try {
                    fs.mkdirs(path);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String hdfsUri = "hdfs://namenode:8020"; // adjust to your cluster
        fs = FileSystem.get(URI.create(hdfsUri), conf);

        long startTime = System.currentTimeMillis();
        int threadCount = 8;
        Thread[] threads = new Thread[threadCount];
        int total = 1000000;
        int countPerThread = total / threadCount;
        for (int j = 0; j < threadCount; j++) {
            Thread thread = new DirectoryThread(j * countPerThread, countPerThread);
            thread.start();
            threads[j] = thread;
        }
        for (Thread thread : threads) {
            thread.join();
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Total: " + (endTime - startTime) + " milliseconds");
    }
}

Obviously, use as many threads as you can. But still, this takes 1-2 minutes; I wonder how @bkosaraju could "complete in few seconds with your code".
04-20-2017
12:57 PM
It is not clear what is being asked here. @manyatha reddy, could you please be more specific?
09-14-2017
12:08 PM
1 Kudo
1) Start Atlas in debug mode

First you want to add extra JVM options in the startup script, so in atlas_start.py replace this line:

DEFAULT_JVM_OPTS="-Dlog4j.configuration=atlas-log4j.xml -Djava.net.preferIPv4Stack=true -server"

with this:

DEFAULT_JVM_OPTS="-Dlog4j.configuration=atlas-log4j.xml -Djava.net.preferIPv4Stack=true -server -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,address=54371,server=y,suspend=y"

Now, when you start Atlas, it will hang until you connect with the debugger (because of suspend=y).

2) Connect from the Eclipse remote debugger

Make sure you have imported the Atlas project into Eclipse based on this document: http://atlas.apache.org/EclipseSetup.html Then create a new debug configuration under the menu Run / Debug Configurations... Make sure the port is set to the same as above (54371) and the connection type is Standard (Socket Attach), using the Eclipse JDT launcher.
03-29-2017
10:33 AM
Thanks @pbarna, I have checked everything: the host that HiveServer2 runs on is m1.hdp.local, and the port is indeed 10000. Maybe I should reinstall the Hive services if I want to revert the Hive JDBC URL from ZooKeeper mode to direct mode.
03-24-2017
05:54 AM
I am talking about the following highlighted one. I have the following retry in Falcon, but Oozie shows 20 runs, and my shell script also runs 20 times (I verified this by adding an echo statement at the start of the script):

<retry policy="periodic" delay="minutes(30)" attempts="10"/>
03-21-2017
07:44 PM
1 Kudo
Linux username/password: root/hadoop. This will also help: https://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox/
03-23-2017
04:44 PM
Awesome, @Ken Jiiii. hive-site.xml should be available across the cluster in /etc/spark/conf (which /usr/hdp/current/spark-client/conf is symlinked to), and the Spark client needs to be installed on all worker nodes for yarn-cluster mode to work, since the Spark driver can run on any worker node and that node must have the client installed with its spark/conf. If you are using Ambari, it takes care of making hive-site.xml available in /spark-client/conf/.
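A quick way to check that the driver actually picks up hive-site.xml in yarn-cluster mode is a trivial job that queries the metastore. Here is a minimal sketch against the Spark 1.6-era Java API (the app name is arbitrary):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class HiveSmokeTest {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("HiveSmokeTest");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // With hive-site.xml on the driver's classpath this talks to the real
        // metastore; without it, Spark silently creates a local metastore and
        // you will see an empty table list instead of your Hive tables.
        HiveContext hive = new HiveContext(sc.sc());
        hive.sql("SHOW TABLES").show();
        sc.stop();
    }
}

Submit it with --master yarn-cluster a few times; since the driver can land on any worker node, it should list your Hive tables no matter where it runs.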
03-21-2017
11:31 AM
1 Kudo
You cannot add a description / comment to a Phoenix table the way you would in Oracle/TD; that is, there is no equivalent to the `COMMENT ON TABLE` SQL command, neither in Phoenix SQL nor in the underlying HBase API. You can accomplish your goal by using external tools to keep tabs on your table metadata.
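For example, the REMARKS column that JDBC reserves for such comments simply comes back empty from Phoenix. A small sketch with the standard JDBC metadata API (the ZooKeeper quorum and table name are placeholders):

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class PhoenixRemarks {
    public static void main(String[] args) throws Exception {
        // Placeholder quorum; adjust to your cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk1.example.com:2181")) {
            DatabaseMetaData md = conn.getMetaData();
            try (ResultSet rs = md.getTables(null, null, "MY_TABLE", null)) {
                while (rs.next()) {
                    // REMARKS is part of the JDBC standard, but since Phoenix has
                    // no COMMENT ON TABLE, expect it to be null/empty here.
                    System.out.println(rs.getString("TABLE_NAME") + " -> " + rs.getString("REMARKS"));
                }
            }
        }
    }
}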
03-22-2017
11:46 AM
@pbarna I tested Firefox and I did make the changes you specified. The issue here is not the error message; it is how we can make HTTP authentication work against AD credentials to access any UI. I was able to access the UI when I logged in with local credentials; I want the same when I log in with my AD credentials on a domain-joined PC.
05-11-2017
05:02 PM
1 Kudo
Yes, go for Druid! I want to start with a disclaimer: I am a Druid committer. First, I want to point out that, as an engineer, I don't believe there is a single query engine that is always better than all the other solutions; it is all relative to the use case you want to solve. Now let's get to why Druid and not OpenTSDB for real-time stream applications. The keyword in this use case is real-time streaming applications. The simple reasons are:

- Druid has native ingestion and indexing support for almost all the rising real-time stream processing technologies (e.g. Kafka, RabbitMQ, Spark, Storm, Flink, Apex, and the list goes on). This integration is production tested at very large scale (e.g. Yahoo/Flurry or Metamarkets), where we see more than 1 million events per second through real-time ingestion.
- Druid supports the lambda architecture out of the box.
- Druid can ingest data directly from Kafka with the guarantee of exactly-once delivery semantics.

In my opinion, those are the key elements to look for when building a real-time streaming application. To my limited knowledge, I am not aware of any integration or production use cases combining real-time streams and OpenTSDB.
03-20-2017
04:58 PM
Thanks for your input. I am slightly confused by your response. Is there a way to populate the REMARKS metadata via the API?