Member since: 08-05-2016
Posts: 52
Kudos Received: 1
Solutions: 1

My Accepted Solutions

Title | Views | Posted
---|---|---
 | 3574 | 07-21-2017 12:22 PM
01-17-2021 12:41 PM

Hi @vjain ,

To configure the BucketCache, the description mentions two JVM properties. Which one should be used, please: HBASE_OPTS or HBASE_REGIONSERVER_OPTS?

The doc says: in the hbase-env.sh file for each RegionServer, or in the hbase-env.sh file supplied to Ambari, set the -XX:MaxDirectMemorySize argument for HBASE_REGIONSERVER_OPTS to the amount of direct memory you wish to allocate to HBase. In the configuration for the example discussed above, the value would be 241664m (-XX:MaxDirectMemorySize accepts a number followed by a unit indicator; m indicates megabytes). But then the example uses the other property:

HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize=241664m"

Thanks,
Helmi KHALIFA
02-22-2020 11:35 AM

Hi, I am facing the same problem. Did you find a solution to your problem?

Best, Helmi Khalifa
11-14-2019 02:42 AM

Hi @avengers , if it works for you, would you be kind enough to accept the answer, please?

Best, Helmi KHALIFA
11-08-2019 08:42 AM

Hi @avengers ,

You would need to share variables between two Zeppelin interpreters, and I don't think we can do that between %spark and %sql. I found an easier way, using a sqlContext inside the same %spark interpreter:

%spark
val df = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/somefile.csv")
df.createOrReplaceTempView("csvTable")

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val resultat = sqlContext.sql("select * from csvTable lt join hiveTable rt on lt.col = rt.col")
resultat.show()

I tried it and it works!

Best, Helmi KHALIFA
11-06-2019 02:08 AM

Hi @av ,

Here are the links to the Hive and Spark interpreter docs:

https://zeppelin.apache.org/docs/0.8.2/interpreter/hive.html
https://zeppelin.apache.org/docs/0.8.2/interpreter/spark.html

Best, Helmi KHALIFA
11-05-2019 01:21 AM

Hi @Rak , here is the script:

CREATE EXTERNAL TABLE IF NOT EXISTS sample_date (
  sc_code string, ddate timestamp, co_code DECIMAL, high DECIMAL, low DECIMAL,
  open DECIMAL, close DECIMAL, volume DECIMAL, no_trades DECIMAL,
  net_turnov DECIMAL, dmcap DECIMAL, return DECIMAL, factor DECIMAL,
  ttmpe DECIMAL, yepe DECIMAL, flag string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/lab/itim/ccbd/helmi/sampleDate'
tblproperties('skip.header.line.count'='1');

ALTER TABLE sample_date SET SERDEPROPERTIES ("timestamp.formats"="MM/dd/yyyy");

(Note that timestamp.formats takes a Java SimpleDateFormat pattern, which is case-sensitive: MM/dd/yyyy, not MM/DD/YYYY.)

Could you accept the answer, please?

Best, Helmi KHALIFA
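As a quick sanity check of the pattern above (a sketch in Python, not part of the Hive setup itself): Hive's timestamp.formats follows Java SimpleDateFormat, where MM/dd/yyyy corresponds to Python's %m/%d/%Y:

```python
from datetime import datetime

# Java SimpleDateFormat "MM/dd/yyyy" is month/day/year;
# the equivalent Python strptime pattern is "%m/%d/%Y".
def parse_us_date(text: str) -> datetime:
    return datetime.strptime(text, "%m/%d/%Y")

print(parse_us_date("07/21/2017"))  # 2017-07-21 00:00:00
```

If the column's raw values look like 07/21/2017, this is the pattern the serde needs to interpret them as timestamps.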
11-04-2019 03:03 AM

Hi @Ra ,

You have to change the column names and types as you can see below:

sc_code string
ddate date
co_code double
high double
low double
open double
close double
volume double
no_trades double
net_turnov double
dmcap double
return double
factor double
ttmpe double
yepe double
flag string

I tried it and it works well for me.

Best, Helmi KHALIFA
10-31-2019 06:17 AM

Hi @Rak ,

Can you show us a sample, say the first 5 rows, of your csv file, please?

Best, Helmi KHALIFA
10-25-2019 01:18 AM
1 Kudo

Hi @RNN ,

The best solution is to convert the month names to integers, like:

-Oct- => -10-
-Dec- => -12-

Here is what I tested, as you can see in my file below:

$ hdfs dfs -cat /lab/helmi/test_timestamp_MM.txt
1,2019-10-14 20:00:01.027898
2,2019-12-10 21:00:01.023
3,2019-11-25 20:00:01.03
4,2019-01-06 20:00:01.123

Create a Hive table:

hive> CREATE EXTERNAL TABLE ttime(id int, t string)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE LOCATION '/lab/helmi/';

hive> select * from ttime;
OK
1    2019-10-14 20:00:01.027898
2    2019-12-10 21:00:01.023
3    2019-11-25 20:00:01.03
4    2019-01-06 20:00:01.123
Time taken: 0.566 seconds, Fetched: 4 row(s)

Finally, I created another table with the right format:

hive> create table mytime as select id, from_utc_timestamp(date_format(t,'yyyy-MM-dd HH:mm:ss.SSSSSS'),'UTC') as datetime from ttime;

Best, Helmi KHALIFA
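The month-name-to-number conversion described above can also be done before loading, outside Hive. A minimal Python sketch (the exact input layout, day-month-year with an abbreviated English month, is an assumption about the source data):

```python
from datetime import datetime

# Convert e.g. "14-Oct-2019 20:00:01" to "2019-10-14 20:00:01".
# %b matches abbreviated month names (Oct, Dec, ...) in the C locale.
def month_name_to_number(text: str) -> str:
    parsed = datetime.strptime(text, "%d-%b-%Y %H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")

print(month_name_to_number("14-Oct-2019 20:00:01"))  # 2019-10-14 20:00:01
```

Running such a pass over the raw file first produces strings Hive can ingest directly as timestamps, with no serde tweaking needed.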
09-24-2019 02:18 AM

Hi @hadoopguy ,

Yes, there is an impact: you will have longer processing times and the operations will be queued. You have to handle the timeouts in your jobs carefully.

Best, @helmi_khalifa
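One way to "handle the timeout carefully" is to put an explicit upper bound around each submitted job. A hypothetical Python sketch (the job callable and the limit are illustrative assumptions, not from this thread):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
import time

# Run a (hypothetical) job with an upper time limit, so a queued or
# slow cluster does not block the caller indefinitely.
def run_with_timeout(job, timeout_s):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(job)
        try:
            return future.result(timeout=timeout_s)
        except FutureTimeout:
            return None  # caller decides whether to retry, requeue, or fail

print(run_with_timeout(lambda: "done", 1.0))  # done
```

Returning None on timeout keeps the policy decision (retry vs. fail) with the caller; note the executor still waits for the worker thread on exit, so a real job should also be cancellable.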