Member since: 11-07-2016
Posts: 637
Kudos Received: 253
Solutions: 144
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2286 | 12-06-2018 12:25 PM |
| | 2341 | 11-27-2018 06:00 PM |
| | 1812 | 11-22-2018 03:42 PM |
| | 2881 | 11-20-2018 02:00 PM |
| | 5228 | 11-19-2018 03:24 PM |
06-20-2018
07:21 AM
@Saurabh Ambre, Glad to know that the previous issue is resolved. It is always good to create a separate thread for each issue, so please open a new question for this one so the main thread doesn't get sidetracked. In the new question, include the complete stack trace and attach the pom.xml file. Feel free to tag me in it. Please accept the answer above.
06-19-2018
02:46 PM
1 Kudo
@Saurabh Ambre, Try adding the two configuration lines below and see if it works.

import org.apache.hadoop.conf.Configuration;

// tell Hadoop which FileSystem implementation to use for each scheme
Configuration conf = new Configuration();
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

Please "Accept" the answer if this works. This will be helpful for other community users.

-Aditya
06-19-2018
04:52 AM
@Anpan K, HDFS audit is different from Ranger HDFS audit. Ranger HDFS audit works only when the Ranger HDFS plugin is enabled; if the plugin is not enabled, you use the plain HDFS audits instead. Ranger audits are stored in HDFS and can also be stored in Solr. To send audits to Solr, enable the "Audits to Solr" setting from Ambari. The Ranger UI can show the audit logs only when they are written to Solr; it cannot read audits from HDFS.

-Aditya
06-19-2018
04:17 AM
2 Kudos
@Sandeep Ahuja, textFile() partitions the data based on the number of HDFS blocks the file uses. If the file occupies only 1 block, the RDD is initialized with a minimum of 2 partitions. If you want to increase the minimum number of partitions, pass it as an argument like below:

files = sc.textFile("hdfs://user/cloudera/csvfiles", minPartitions=10)

To check the number of partitions, run:

files.getNumPartitions()

Note: If you set minPartitions to less than the number of HDFS blocks, Spark simply uses the number of HDFS blocks instead and does not raise any error.

Please "Accept" the answer if this helps, or reply back with any questions.

-Aditya
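For reference, a quick way to see the note above in action. This is only a sketch, assuming a live SparkContext named sc and that the example path from above spans several HDFS blocks:

# hypothetical: the example path above, assumed here to span about 4 HDFS blocks
files = sc.textFile("hdfs://user/cloudera/csvfiles", minPartitions=2)
# prints the HDFS block count (e.g. 4), not 2, since minPartitions is below it
print(files.getNumPartitions())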
06-15-2018
02:46 PM
@Sayantan Dash, This is just a warning message and shouldn't be the problem. Can you check whether there are any other error logs?
06-15-2018
11:58 AM
1 Kudo
@JAy PaTel, You cannot redirect the output of echo directly into an HDFS file. Instead, write it to a local file first and then append that file to HDFS, like below:

echo "`date` hi" > /tmp/output ; hdfs dfs -appendToFile /tmp/output /tmp/abc.txt

-Aditya
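If you need to drive the same two steps from a Python script instead of the shell, a rough equivalent is sketched below; it assumes Python 3, that the hdfs CLI is on the PATH, and reuses the example paths from above:

import subprocess
from datetime import datetime

# write the timestamped line to a local temp file first (same idea as the echo above)
with open("/tmp/output", "w") as f:
    f.write("{} hi\n".format(datetime.now()))

# then append the local file to the HDFS file
subprocess.run(["hdfs", "dfs", "-appendToFile", "/tmp/output", "/tmp/abc.txt"], check=True)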
06-15-2018
10:56 AM
1 Kudo
@Alex Witte, According to your question, you want to transform it to the format below:

Col1 Col2
1 [agakhanpark,science centre,sunnybrookpark,laird,leaside,mountpleasant,avenue]
2 [agakhanpark,wynford,sloane,oconnor,pharmacy,hakimilebovic,goldenmile,birchmount]

I have changed your code a little and was able to achieve it. Please check this code and the pyspark execution output:

from pyspark.sql.types import *
data_schema = [StructField('id', IntegerType(), False),StructField('route', StringType(),False)]
final_struc = StructType(fields=data_schema)
df = sqlContext.read.option("delimiter", "|").csv('/user/hrt_qa/a.txt',schema=final_struc)
df.show()
from pyspark.sql.functions import udf
def str_to_arr(my_list):
    # split the comma separated string and wrap it in brackets
    my_list = my_list.split(",")
    return '[' + ','.join([str(elem) for elem in my_list]) + ']'
str_to_arr_udf = udf(str_to_arr,StringType())
df = df.withColumn('route_arr',str_to_arr_udf(df["route"]))
df = df.drop("route")
df.show()

>>> from pyspark.sql.types import *
>>> data_schema = [StructField('id', IntegerType(), False),StructField('route', StringType(),False)]
>>> final_struc = StructType(fields=data_schema)
>>> df = sqlContext.read.option("delimiter", "|").csv('/user/hrt_qa/a.txt',schema=final_struc)
>>> df.show()
+---+--------------------+
| id| route|
+---+--------------------+
| 1|agakhanpark,scien...|
| 2|agakhanpark,wynfo...|
+---+--------------------+
>>>
>>>
>>> from pyspark.sql.functions import udf
>>> def str_to_arr(my_list):
... my_list = my_list.split(",")
... return '[' + ','.join([str(elem) for elem in my_list]) + ']'
...
>>> str_to_arr_udf = udf(str_to_arr,StringType())
>>> df = df.withColumn('route_arr',str_to_arr_udf(df["route"]))
>>> df = df.drop("route")
>>> df.show()
+---+--------------------+
| id| route_arr|
+---+--------------------+
| 1|[agakhanpark,scie...|
| 2|[agakhanpark,wynf...|
+---+--------------------+

Please "Accept" the answer if this helps.

-Aditya
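As a side note, the same bracketed string can also be built without a Python UDF by using Spark's built-in column functions. This is only a sketch, assuming the same df and column names as above, applied right after the read (while the route column still exists):

from pyspark.sql.functions import concat, lit

# wrap the existing comma separated string in brackets using built-in functions
df = df.withColumn('route_arr', concat(lit('['), df['route'], lit(']'))).drop('route')
df.show()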
06-13-2018
04:01 PM
@Basil Paul, It looks like your NameNode is not running. Start the NameNode first and then try starting the HBase master. Also, it is better to use proper hostnames instead of hbase1, hbase2, etc.

Please "Accept" the answer if this helps you. It will be really useful for other community users.

-Aditya
06-11-2018
04:00 PM
1 Kudo
@Sami Ahmad, When you run "select count(*) from emp", the ResultSet (rst) contains only a single column (the count), so rst.getString(2) fails with an IndexOutOfBoundsException because there is no second column. Remove the rst.getString(2) call when you run select count(*) from emp and it will work properly.

Please "Accept" the answer if this works for you.

-Aditya
05-29-2018
01:23 PM
1 Kudo
@bigdata.neophyte, Yes, it is possible to install the cluster without services like HDFS, YARN, MR, etc. However, Ambari recommends installing SmartSense and Ambari Metrics, which you can delete after installation, or you can use Blueprints to install the cluster.

-Aditya