Member since: 05-17-2016
Posts: 190
Kudos Received: 46
Solutions: 11
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1326 | 09-07-2017 06:24 PM
 | 1731 | 02-24-2017 06:33 AM
 | 2463 | 02-10-2017 09:18 PM
 | 6946 | 01-11-2017 08:55 PM
 | 4501 | 12-15-2016 06:16 PM
08-24-2016
06:50 PM
Hi All,
I am trying to sync my directory users from the IPA server to Ambari, and I have been using these instructions. However, I am not certain what the value of the Distinguished name attribute needs to be, provided I have the following structure:
uid=u1,ou=ou11,ou=o1,dc=example,dc=com
uid=u2,ou=ou12,ou=o1,dc=example,dc=com
uid=u3,ou=ou21,ou=o2,dc=example,dc=com
uid=u4,ou=ou22,ou=o2,dc=example,dc=com
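For reference, the relevant prompts from ambari-server setup-ldap look roughly like the below; the values shown are only my guesses for this structure, not confirmed settings:
User object class* (posixAccount): posixAccount
User name attribute* (uid): uid
Distinguished name attribute* (dn): dn
Base DN* : dc=example,dc=com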
Labels:
- Apache Ambari
08-22-2016
02:40 PM
1 Kudo
@Artem Ervits, I guess it's a problem with relative vs. absolute paths. My observation is that in local mode, all STORE commands targeting the present working directory work fine, but absolute paths require the file:/// prefix, as in the snippet below.
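A quick sketch of the two cases (the details relation and the paths here are hypothetical):
-- relative path: resolves against the present working directory in pig -x local
STORE details INTO 'output' USING PigStorage(',');
-- absolute path: needs the file:/// prefix to target the local filesystem
STORE details INTO 'file:///home/user/output' USING PigStorage(',');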
08-22-2016
01:25 PM
Hi @Vaibhav Kumar, could you please try the file:/// prefix in pig -x local mode?
STORE Relation_name INTO 'file:///home/vaibhav/Desktop/Output' USING PigStorage(',');
08-10-2016
06:28 PM
1 Kudo
GROUP is used to collect records having the same key. It is not mandatory to perform an aggregation along with GROUP. For a better understanding, let us consider a file with ID, Name, and Age as below:
1,John,23
2,James,24
3,Alice,30
4,Bob,23
5,Bill,24
If we apply the below script to the file, loading it and grouping it by age, we get all the records associated with one age into one single group.
details = LOAD 'file' USING PigStorage(',') AS (id:int, name:chararray, age:int);
grouped_data = GROUP details BY age;
dump grouped_data;
The output being:
(23,{(1,John,23),(4,Bob,23)})
(24,{(2,James,24),(5,Bill,24)})
(30,{(3,Alice,30)})
Furthermore, if you describe the schema of the grouped data, you would see the below:
describe grouped_data;
grouped_data: {group: int,details: {(id: int,name: chararray,age: int)}}
You can explore more here.
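And if you do want an aggregation on top of the grouping, a minimal sketch building on the grouped_data relation above:
age_counts = FOREACH grouped_data GENERATE group AS age, COUNT(details) AS cnt;
dump age_counts;
which should yield:
(23,2)
(24,2)
(30,1)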
08-08-2016
03:44 PM
You can do something like the below (note the imports from org.apache.spark.sql.types):
import org.apache.spark.sql.types.{StructField, StructType, StringType}
val schemaString = "id,fruitName,isAvailable,unitPrice"
val fields = schemaString.split(",")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)
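As a follow-up, a minimal sketch of applying that schema, assuming you already have an RDD[Row] named rowRDD and a SQLContext named sqlContext:
val df = sqlContext.createDataFrame(rowRDD, schema)
df.printSchema()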
08-08-2016
12:26 PM
Well, the schema is somewhat like the header; say id, fruitName, isAvailable, unitPrice in your case. You can specify the schema programmatically. Have a quick reference here.
08-05-2016
06:22 PM
Hi @Alex Raj,
Row is org.apache.spark.sql.Row. You need to add the import statement:
import org.apache.spark.sql.Row
08-04-2016
06:54 PM
3 Kudos
Under the assumption that the file is text and each line represents one record, you could read the file line by line and map each line to a Row. Then you can create a DataFrame from the RDD[Row], something like:
sqlContext.createDataFrame(sc.textFile("<file path>").map { x => getRow(x) }, schema)
I have the below basic definition for creating the Row from your line using substring, but you can use your own implementation.
def getRow(x: String): Row = {
  val columnArray = new Array[String](4)
  columnArray(0) = x.substring(0, 3)
  columnArray(1) = x.substring(3, 13)
  columnArray(2) = x.substring(13, 18)
  columnArray(3) = x.substring(18, 22)
  Row.fromSeq(columnArray)
}
If the records are not delimited by a new line, you may need to use a FixedLengthInputFormat, read the records one at a time, and apply similar logic as above. The fixedlengthinputformat.record.length in that case will be your total record length, 22 in this example. Instead of textFile, you may need to read with sc.newAPIHadoopRDD, as sketched below.
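A rough sketch of that variant, assuming the schema and getRow from above; it uses sc.newAPIHadoopFile, the path-taking sibling of newAPIHadoopRDD, and the path is a placeholder:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{BytesWritable, LongWritable}
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat

// copy the cluster config and set the fixed record length (total width, 22 here)
val conf = new Configuration(sc.hadoopConfiguration)
FixedLengthInputFormat.setRecordLength(conf, 22)
// each record arrives as (offset, raw bytes); turn the bytes into a String and reuse getRow
val records = sc.newAPIHadoopFile("<file path>",
  classOf[FixedLengthInputFormat], classOf[LongWritable], classOf[BytesWritable], conf)
val rows = records.map { case (_, bytes) => getRow(new String(bytes.copyBytes())) }
val df = sqlContext.createDataFrame(rows, schema)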