Member since
03-28-2017
38
Posts
0
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 8477 | 06-27-2017 06:32 AM
03-06-2024
02:16 AM
1 Kudo
Hi @Sidhartha, could you please try the following sample?

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Map a Hive type name to the corresponding Spark SQL DataType
def convertDatatype(datatype: String): DataType = {
  datatype match {
    case "string" => StringType
    case "short" => ShortType
    case "int" => IntegerType
    case "bigint" => LongType
    case "float" => FloatType
    case "double" => DoubleType
    case "decimal" => DecimalType(38, 30)
    case "date" => TimestampType
    case "boolean" => BooleanType
    case "timestamp" => TimestampType
    case other => throw new IllegalArgumentException(s"Unsupported type: $other")
  }
}
val input_data = List(Row(1L, "Ranga", 27, BigDecimal(306.000000000000000000)), Row(2L, "Nishanth", 6, BigDecimal(606.000000000000000000)))
val input_rdd = spark.sparkContext.parallelize(input_data)
val hiveCols="id:bigint,name:string,age:int,salary:decimal"
val schemaList = hiveCols.split(",")
val schemaStructType = new StructType(schemaList.map(col => col.split(":")).map(e => StructField(e(0), convertDatatype(e(1)), true)))
val myDF = spark.createDataFrame(input_rdd, schemaStructType)
myDF.printSchema()
myDF.show()
val myDF2 = myDF.withColumn("new_salary", col("salary").cast("double"))
myDF2.printSchema()
myDF2.show()
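For reference, if the snippet runs as-is, the two printed schemas should look roughly like this (output reconstructed here for illustration):

root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)
 |-- salary: decimal(38,30) (nullable = true)

and, after the cast, the same schema with an extra column:

 |-- new_salary: double (nullable = true)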
01-26-2021
07:29 AM
I'm also getting the same exception while fetching the database details using PySpark:

from pyspark.sql import HiveContext
from pyspark.sql import SQLContext
from pyspark.sql import HiveContext
from pyspark.sql import SparkSession
from pyspark import SparkConf, SparkContext
conf = (SparkConf().set("spark.kryoserializer.buffer.max", "512m"))
sc = SparkContext(conf=conf)
hive_context = HiveContext(sc)
sqlContext = SQLContext(sc)
sqlContext.sql("use default")

I can execute the above code up to line 6; from line 7 onwards I get the exception. My cluster came with a PostgreSQL database by default; after that I manually installed MySQL, which is not integrated with Cloudera. Could that be why I'm getting this exception? Can you help me with this, please?
07-04-2019
09:55 PM
1 Kudo
Hi @Sidhartha, the error means that the destination table base.customers already has the partition source_name=ORACLE, as the message indicates: Partition already exists [customers(source_name=ORACLE)]. Can you run SHOW PARTITIONS base.customers; to confirm? If so, try dropping that partition and then run EXCHANGE PARTITION again. Cheers, Eric
04-09-2019
07:41 AM
1 Kudo
There is something very unusual happening here. Based on your outputs, values are not only ending up in the wrong columns, you are even getting different values: in the 'correct' record you have 5686.76, and in the 'wrong' record you have -5686.76. My first guess was that there is a mistake in how you send data to the appropriate columns, but I don't see how that would explain a minus sign changing position.

To troubleshoot something like this, it is really important to dig into the details. I would therefore recommend bringing your question down to a 'minimal reproducible example', eliminating any complexity that is not causing the unexpected results. For example: you show a load command to get data into Spark; consider replacing it with an actual hardcoded string (and make sure to check whether that still reproduces the problem). You also show two writes, but if we have the exact input and code that reproduce the problem, the write that behaves correctly is probably not relevant. Also, you use some code to list the columns; consider hardcoding them first, as in the sketch below.

As mentioned, really try to take out all complexity until you land on the minimal amount that still reproduces the problem. Hopefully you will already see the answer once you have eliminated all the distractions, and if not, you will have a fully trimmed-down version which you can use to update your question here!
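To make the suggestion concrete, here is a minimal sketch of what 'hardcoding the input' could look like; the column names and values are purely hypothetical placeholders, not your actual data:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("mre").getOrCreate()
import spark.implicits._

// Hardcode the one record that misbehaves instead of loading it from a file,
// and hardcode the target columns instead of deriving them in code
val df = Seq(("rec1", 5686.76), ("rec2", -5686.76)).toDF("id", "amount")
df.show()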
09-03-2018
12:20 AM
1 Kudo
Hi, I also faced similar issues while applying regexp_replace() to only the string columns of a dataframe. The trick is to build a regex pattern (in my case "pattern") that resolves inside the double quotes, with the escape characters applied.

val pattern = "\"\\\\[\\\\x{FFFD}\\\\x{0}\\\\x{1F}\\\\x{81}\\\\x{8D}\\\\x{8F}\\\\x{90}\\\\x{9D}\\\\x{A0}\\\\x{0380}\\\\]\""

// Build a regexp_replace expression for every string column, and keep other columns as-is
val regExpr = parq_out_df_temp.schema.fields.map(x =>
  if (x.dataType == StringType) s"regexp_replace(%s, $pattern, '') as %s".format(x.name, x.name)
  else x.name
)

val parq_out = parq_out_df_temp.selectExpr(regExpr: _*)

This worked fine for me!
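For anyone preferring the Column API over selectExpr, a roughly equivalent sketch could look like this (assuming df is the input DataFrame and cleanPattern holds the regex without the surrounding SQL quotes):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.regexp_replace
import org.apache.spark.sql.types.StringType

// Apply the replacement to every string column, leaving other columns untouched
def stripPattern(df: DataFrame, cleanPattern: String): DataFrame =
  df.schema.fields.filter(_.dataType == StringType).foldLeft(df) { (acc, field) =>
    acc.withColumn(field.name, regexp_replace(acc(field.name), cleanPattern, ""))
  }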
07-18-2018
01:29 AM
1 Kudo
According to the error, it is looking for the Java 7 installed by Cloudera. You should define JAVA_HOME={path_to_your_jdk8_installation} in your .bashrc.
03-20-2018
08:11 AM
1 Kudo
The error complains about the value of "hadoop.security.authentication". You have set it to "Kerberos" while the accepted values are "simple" and "kerberos" (all letters in lowercase).
01-01-2018
10:30 PM
The files will not be in a specific order. Would this be a solution: load all the files into Spark and create a dataframe out of them, then split this main dataframe into smaller ones using the delimiter ("...") that is present at the end of each file. Once this is done, map the dataframes by checking whether the third line of each file contains the words "SEVERE: Error" and group/merge those together. Similarly, follow the same approach for the other cases, and finally have three separate dataframes, one exclusive to each case. Is this approach viable, or is there a better way I can follow?
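A rough sketch of what I have in mind, just to illustrate the idea (the input path is a placeholder; "SEVERE: Error" is one of the markers mentioned above):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Read each file as a whole (path, content) pair
val files = spark.sparkContext.wholeTextFiles("/path/to/input/*")

// Route a file by whether its third line contains the marker
val severe = files.filter { case (_, content) =>
  content.split("\n").lift(2).exists(_.contains("SEVERE: Error")) }
val rest = files.subtract(severe)

val severeDF = severe.toDF("path", "content")
severeDF.show()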
07-04-2017
11:58 PM
Which user are you running the Spark job as? And is the path you are referring to, /home/cloudera/partfile, in HDFS or on the local filesystem? Perform this and let me know if the files get listed: hadoop fs -ls /home/cloudera/
06-27-2017
06:32 AM
The mistake I made was that the case class should be outside main and inside the object. Also, in this line: val partfile = sparkSession.read.text("partfile").as[String], I used read.text("..") to get the file into Spark, where we can instead use read.textFile("..."), which returns a Dataset[String] directly.
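A minimal sketch of both points, with hypothetical object and case class names ("partfile" is the path from the original code):

import org.apache.spark.sql.SparkSession

object PartfileJob {
  // Case class defined at the object level, outside main
  case class Part(value: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("partfile").getOrCreate()
    import spark.implicits._

    // read.textFile returns a Dataset[String] directly,
    // whereas read.text returns a DataFrame with a single "value" column
    val partfile = spark.read.textFile("partfile")
    val parts = partfile.map(Part(_))
    parts.show()

    spark.stop()
  }
}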