Member since: 05-17-2016
Posts: 190
Kudos Received: 46
Solutions: 11
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1326 | 09-07-2017 06:24 PM
 | 1731 | 02-24-2017 06:33 AM
 | 2463 | 02-10-2017 09:18 PM
 | 6946 | 01-11-2017 08:55 PM
 | 4501 | 12-15-2016 06:16 PM
08-24-2016
06:50 PM
Hi All,
I am trying to sync my directory users from the IPA server to Ambari, and I have been using these instructions. However, I am not certain what the value of the Distinguished name attribute needs to be, provided I have the following structure:
uid=u1,ou=ou11,ou=o1,dc=example,dc=com
uid=u2,ou=ou12,ou=o1,dc=example,dc=com
uid=u3,ou=ou21,ou=o2,dc=example,dc=com
uid=u4,ou=ou22,ou=o2,dc=example,dc=com
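For reference, the relevant prompts from ambari-server setup-ldap look roughly like the below; the values shown are only my guesses for this structure, not confirmed settings:
User object class* (posixAccount): posixAccount
User name attribute* (uid): uid
Distinguished name attribute* (dn): dn
Base DN* : dc=example,dc=com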
Labels:
- Apache Ambari
08-22-2016
02:40 PM
1 Kudo
@Artem Ervits, I guess it's a problem with relative vs. absolute paths. My observation is that in local mode, all STORE commands targeting the present working directory work fine, but absolute paths require the file:/// prefix, as in the snippet below.
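A quick sketch of the two cases (the details relation and the paths here are hypothetical):
-- relative path: resolves against the present working directory in pig -x local
STORE details INTO 'output' USING PigStorage(',');
-- absolute path: needs the file:/// prefix to target the local filesystem
STORE details INTO 'file:///home/user/output' USING PigStorage(',');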
08-22-2016
01:25 PM
Hi @Vaibhav Kumar, could you please try the file:/// prefix in pig -x local mode?
STORE Relation_name INTO 'file:///home/vaibhav/Desktop/Output' USING PigStorage(',');
08-10-2016
06:28 PM
1 Kudo
GROUP is used to collect records having the same key. It is not mandatory to perform an aggregation along with GROUP. For a better understanding, let us consider a file with ID, Name, and Age as below:
1,John,23
2,James,24
3,Alice,30
4,Bob,23
5,Bill,24
If we apply the below script to the file, loading it and grouping it by age, we get all the records associated with one age into one single group.
details = LOAD 'file' USING PigStorage(',') AS (id:int, name:chararray, age:int);
grouped_data = GROUP details BY age;
dump grouped_data;
The output being:
(23,{(1,John,23),(4,Bob,23)})
(24,{(2,James,24),(5,Bill,24)})
(30,{(3,Alice,30)})
Furthermore, if you describe the schema of the grouped data, you would see the below:
describe grouped_data;
grouped_data: {group: int,details: {(id: int,name: chararray,age: int)}}
You can explore more here.
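And if you do want an aggregation on top of the grouping, a minimal sketch building on the grouped_data relation above:
age_counts = FOREACH grouped_data GENERATE group AS age, COUNT(details) AS cnt;
dump age_counts;
which should yield:
(23,2)
(24,2)
(30,1)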
08-08-2016
03:44 PM
You can do something like the below (note the imports from org.apache.spark.sql.types):
import org.apache.spark.sql.types.{StructField, StructType, StringType}
val schemaString = "id,fruitName,isAvailable,unitPrice"
val fields = schemaString.split(",")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)
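As a follow-up, a minimal sketch of applying that schema, assuming you already have an RDD[Row] named rowRDD and a SQLContext named sqlContext:
val df = sqlContext.createDataFrame(rowRDD, schema)
df.printSchema()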
08-08-2016
12:26 PM
Well, the schema is somewhat like the header; say id, fruitName, isAvailable, unitPrice in your case. You can specify the schema programmatically. Have a quick reference here.
08-05-2016
06:22 PM
Hi @Alex Raj,
Row is org.apache.spark.sql.Row. You need to add the import statement:
import org.apache.spark.sql.Row
08-04-2016
06:54 PM
3 Kudos
Under the assumption that the file is text and each line represents one record, you could read the file line by line and map each line to a Row. Then you can create a DataFrame from the RDD[Row], something like:
sqlContext.createDataFrame(sc.textFile("<file path>").map { x => getRow(x) }, schema)
I have the below basic definition for creating the Row from your line using substring, but you can use your own implementation.
def getRow(x: String): Row = {
  val columnArray = new Array[String](4)
  columnArray(0) = x.substring(0, 3)
  columnArray(1) = x.substring(3, 13)
  columnArray(2) = x.substring(13, 18)
  columnArray(3) = x.substring(18, 22)
  Row.fromSeq(columnArray)
}
If the records are not delimited by a new line, you may need to use a FixedLengthInputFormat, read the records one at a time, and apply similar logic as above. The fixedlengthinputformat.record.length in that case will be your total record length, 22 in this example. Instead of textFile, you may need to read with sc.newAPIHadoopRDD, as sketched below.
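A rough sketch of that variant, assuming the schema and getRow from above; it uses sc.newAPIHadoopFile, the path-taking sibling of newAPIHadoopRDD, and the path is a placeholder:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{BytesWritable, LongWritable}
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat

// copy the cluster config and set the fixed record length (total width, 22 here)
val conf = new Configuration(sc.hadoopConfiguration)
FixedLengthInputFormat.setRecordLength(conf, 22)
// each record arrives as (offset, raw bytes); turn the bytes into a String and reuse getRow
val records = sc.newAPIHadoopFile("<file path>",
  classOf[FixedLengthInputFormat], classOf[LongWritable], classOf[BytesWritable], conf)
val rows = records.map { case (_, bytes) => getRow(new String(bytes.copyBytes())) }
val df = sqlContext.createDataFrame(rows, schema)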