Member since: 01-24-2017
Posts: 69
Kudos Received: 2
Solutions: 0
06-15-2019
12:55 AM
@diebestetest wrote: "Hi, could you please share the entire console logs for further analysis? Thanks, Arun"

Sorry, I am not familiar with the topic.
05-31-2019
10:32 AM
It's a problem with permissions; you need to let Spark know about the local dir. The following code then works:

import time
from pyspark.sql import SparkSession


def xmlConvert(spark):
    etl_time = time.time()
    # Read the XML files, one row per HistoricalTextData element
    df = spark.read.format('com.databricks.spark.xml') \
        .options(rowTag='HistoricalTextData') \
        .load('/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance/dataset/train/')
    # Pivot the tag values into columns keyed by timestamp, filling gaps with 0
    df = df.withColumn("TimeStamp", df["TimeStamp"].cast("timestamp")) \
        .groupBy("TimeStamp").pivot("TagName").sum("TagValue").na.fill(0)
    # Write the result out as a single CSV file
    df.repartition(1).write.csv(
        path="/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance/result/",
        mode="overwrite",
        header=True,
        sep=",")
    print("Time taken to do xml transformation: --- %s seconds ---" % (time.time() - etl_time))


if __name__ == '__main__':
    spark = SparkSession \
        .builder \
        .appName('XML ETL') \
        .master("local[*]") \
        .config('job.local.dir', '/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance') \
        .config('spark.driver.memory', '64g') \
        .config('spark.debug.maxToStringFields', '200') \
        .config('spark.jars.packages', 'com.databricks:spark-xml_2.11:0.5.0') \
        .getOrCreate()
    print('Session created')
    try:
        xmlConvert(spark)
    finally:
        spark.stop()
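As a quick sanity check (a sketch only; it reuses the same result path and would need to run before spark.stop(), e.g. inside the try block), the pivoted CSV can be read back and inspected:

    # Read the written CSV back to verify the pivoted schema (inferSchema is just for convenience)
    result = spark.read.csv(
        '/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance/result/',
        header=True,
        inferSchema=True)
    result.printSchema()
    result.show(5)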
04-06-2017
09:36 PM
Hi Jordan,
Yes, Cloudera also recommended increasing the heap size, and since I did that a couple of weeks ago I have not seen any more crashes. It is rather surprising, though, that the default configuration causes crashes. That raises the question of how optimal, or even acceptable, the other parameters are and how to tune them.
Thank you,
Igor
03-02-2017
10:26 PM
A lot of other components also have TLS options in their Security section ... Are those mandatory, or only needed for Kerberos?
02-24-2017
05:22 PM
Hi @IgorYakushin,

To add to what @mbigelow mentioned, you can enable Kerberos without using TLS to secure communication between your agents and Cloudera Manager, but that would allow the Kerberos keytabs to be transmitted from Cloudera Manager to your agents in the clear (risking a malicious party gaining access to your keytab).

Most of the security you will likely need is taken care of by enabling TLS for agent communication in this section: Configuring TLS Encryption for Cloudera Manager Agents. This will encrypt communication when the agent gets the keytabs and other files from CM. If you want more security by having the agents verify Cloudera Manager's certificate signer and hostname, then you can configure the trust file for each agent (to trust the CM signer).

In summary, you don't need to have TLS enabled in order to enable Kerberos. If you need to protect the keytabs, enable TLS encryption for agents. If you need higher security by having the agents trust the signer of the Cloudera Manager server certificate, you can proceed with the other steps: https://www.cloudera.com/documentation/enterprise/latest/topics/how_to_configure_cm_tls.html#topic_3

Ben
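For a rough illustration of the agent-side settings this involves (a sketch only; the exact properties and the certificate path are assumptions here, so follow the linked documentation for your CM version), the relevant options live in the agent's /etc/cloudera-scm-agent/config.ini:

    # /etc/cloudera-scm-agent/config.ini -- illustrative excerpt, not from the post
    [Security]
    # Encrypt agent <-> Cloudera Manager traffic
    use_tls=1
    # Optional hardening: have the agent verify the signer of CM's certificate
    # (path below is a hypothetical location for your CA certificate)
    verify_cert_file=/opt/cloudera/security/pki/rootca.pem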
02-23-2017
01:32 PM
Thank you, Ben. That worked. Apparently I was reading an older version of the documentation where Step 3 (letting CM know where the truststore is) is not mentioned. I tried to put it on the same web page where the keystore was, and that did not work. I did not realize there is yet another page to specify the truststore. Igor
02-01-2017
12:59 PM
Hello Igor,

To create a partition in Linux, you'd need to 'fdisk' it first. In your example, (sdb) is the disk, so you'd need to create the partition (sdb1): fdisk /dev/sdb. After that, you'd need to format the new partition as ext4: mkfs.ext4 /dev/sdb1. Make sure you mount it correctly in /etc/fstab, just like I stated in my first response; the 'mount -a' command is a good way to examine your fstab entries.

In regards to the HDFS block size, the block division in HDFS is just logically built on top of the physical blocks of the ext4 filesystem. HDFS blocks are large compared to disk blocks, and the reason for this is to minimize the cost of seeks: if the block is large enough, the time it takes to transfer the data from the disk can be significantly longer than the time to seek to the start of the block.

If there are any additional questions, please let me know.

Thanks,
Laith
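To make the seek-versus-transfer argument concrete, here is a back-of-the-envelope calculation (the seek time and transfer rate are illustrative assumptions, not figures from the post):

    # Illustrative arithmetic for the seek-vs-transfer trade-off (assumed figures)
    seek_time_s = 0.010            # ~10 ms average disk seek (assumption)
    transfer_bytes_per_s = 100e6   # ~100 MB/s sustained transfer rate (assumption)

    for block_mb in (4, 64, 128):
        transfer_s = block_mb * 1e6 / transfer_bytes_per_s
        seek_share = seek_time_s / (seek_time_s + transfer_s)
        print("%4d MB block: seek is %.1f%% of the read time" % (block_mb, seek_share * 100))

With a 128 MB block the seek accounts for well under 1% of the total read time, whereas with a 4 MB block it is around 20%, which is why large HDFS blocks keep sequential reads efficient.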