Created 08-12-2021 10:45 PM
Hello Team,
I have the following file to load into Kudu.
1829;BN=0;UNIT=VOLUME_ALL;IN=0;TC=0;TCC=0;CT=;FU=1000001;CU=54274;FB=61701;FL=ugw9811_3828500385_360_27153
0=5126742111750858;U=23059268534;SI=6;SG=1;SR=7;SN=BROWSING;SC=BROWSING;BS=60342256;BR=2581143;TU=2021-04-27 14:02:47;TF=2021-04-27 00:00:00;TA=2021-04-27 14:02:47;TB=2021-04-27 00:00:00;TE=2021-04-27 14:02:47;TS=1619517767;D=16292;R=151;E=0;UDR_cu=0;UDR_fb=BROWSING;DCM=0;UP=Prepaid;ST=BROWSING;MSISDN=23059268534;APN=orange;SGSN=196.192.13.113;GGSN=196.192.13.113;IMSI=617010014925066;BU1=23292;BN=0;UNIT=VOLUME_ALL;IN=0;TC=0;TCC=62923399;CT=;FU=1000000;CU=3586;FB=61701;FL=ugw9811_3828490275_312_8799
0=5126752111750858;U=23059268534;SI=6;SG=1;SR=7;SN=BROWSING;SC=BROWSING;BS=0;BR=0;TU=2021-04-27 14:02:47;TF=2021-04-27 00:00:00;TA=2021-04-27 14:02:47;TB=2021-04-27 00:00:00;TE=2021-04-27 14:02:47;TS=1619517767;D=16292;R=151;E=0;UDR_cu=0;UDR_fb=BROWSING;DCM=0;UP=Prepaid;ST=BROWSING;MSISDN=23059268534;APN=orange;SGSN=196.192.13.113;GGSN=196.192.13.113;IMSI=617010014925066;BU1=21829;BN=0;UNIT=VOLUME_ALL;IN=0;TC=0;TCC=0;CT=;FU=1000001;CU=3586;FB=61701;FL=ugw9811_3828490275_312_8799
How can I proceed using Spark SQL?
Table structure on Kudu:
CREATE EXTERNAL TABLE cdr.mobile_datadbs (
  id BIGINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  msisdn STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  serviceid INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  servicegroup INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  servicerev INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  servicename STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  serviceclass STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  downlink INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  uplink INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  storedtime TIMESTAMP NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  firstaccesstime TIMESTAMP NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  lastaccesstime TIMESTAMP NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  servicebegintime TIMESTAMP NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  sessionendtime TIMESTAMP NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  cdrcreatedtime BIGINT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  duration BIGINT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  hitsperreq BIGINT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  errors1 BIGINT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  udrcu INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  udrfb STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  status1 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  userprofile STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  servicetype STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  subsmsisdn STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  apn STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  sgsnaddress STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  ggsnaddress STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  imsi STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  bonusunit STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  bn INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  unit STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  instatus INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  totalcost BIGINT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  totalcharge BIGINT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  terminationcause BIGINT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  firstrequestedurl STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  cellidinfo STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  idpname STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  failureslist STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 16
STORED AS KUDU
TBLPROPERTIES ('external.table.purge'='TRUE', 'kudu.master_addresses'='rb-hadoop-03.mtg.local,rb-hadoop-04.mtg.local,rb-hadoop-05.mtg.local')
Created 08-14-2021 01:46 AM
Hi Roshan,
Thanks for raising this question in the Cloudera Community!
First, check which Kudu version you are currently running.
Notes:
Use the kudu-spark_2.10 artifact if you are using Spark with Scala 2.10. Note that Spark 1 is no longer supported in Kudu starting from version 1.6.0, so if you need the Spark 1 integration, Kudu 1.5.0 is the latest version that provides it.
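For example, on a Spark 2 / Scala 2.11 cluster the connector can be pulled onto the classpath when starting the shell. The Kudu version 1.10.0 below is only an example; pick the artifact whose Kudu and Scala versions match your cluster:
spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.10.0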
Please also refer to the articles below for some code examples.
[1] CDP 7.2.10 documents:
https://docs.cloudera.com/runtime/7.2.10/kudu-development/topics/kudu-integration-with-spark.html
[2] Kudu official site:
https://kudu.apache.org/docs/developing.html#_kudu_integration_with_spark
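To give you a starting point, here is a minimal Spark (Scala) sketch of one way the semicolon-delimited KEY=VALUE records could be parsed and upserted into your table. It uses the kudu-spark connector's DataFrame API rather than pure SQL and assumes Spark 2 with the kudu-spark2 artifact. The input path is a placeholder, only a handful of columns are mapped (the field-to-column mapping, e.g. BS/BR to downlink/uplink and the leading "0=" field as the id, is my assumption), and the Kudu-side table name is assumed to be impala::cdr.mobile_datadbs, which is the usual naming for tables created through Impala. Please verify all of these against your environment.

import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cdr-to-kudu").getOrCreate()
import spark.implicits._

// Read the raw CDR file; each line is one record of ';'-separated KEY=VALUE pairs.
val raw = spark.read.textFile("/path/to/cdr_file")   // placeholder path

// Turn every line into a key -> value map.
val parsed = raw.map { line =>
  line.split(";")
    .map(_.split("=", 2))
    .collect { case Array(k, v) => k -> v }
    .toMap
}

// Map a few of the fields onto columns of the Kudu table.
// The mapping below is an assumption; extend it to the remaining columns as needed.
val df = parsed.map { m =>
  (m.getOrElse("0", "0").toLong,      // id        <- leading "0=" field (assumed unique)
   m.getOrElse("MSISDN", null),       // msisdn
   m.get("SI").map(_.toInt),          // serviceid
   m.get("BS").map(_.toInt),          // downlink  (assumed from BS)
   m.get("BR").map(_.toInt),          // uplink    (assumed from BR)
   m.getOrElse("APN", null))          // apn
}.toDF("id", "msisdn", "serviceid", "downlink", "uplink", "apn")

// Upsert into Kudu; tables created through Impala are normally named "impala::db.table" on the Kudu side.
val kuduMasters = "rb-hadoop-03.mtg.local,rb-hadoop-04.mtg.local,rb-hadoop-05.mtg.local"
val kuduContext = new KuduContext(kuduMasters, spark.sparkContext)
kuduContext.upsertRows(df, "impala::cdr.mobile_datadbs")

Once the data is loaded, you can also read it back through Spark (and register it for Spark SQL queries) with spark.read.format("kudu").option("kudu.master", kuduMasters).option("kudu.table", "impala::cdr.mobile_datadbs").load(), assuming a recent kudu-spark version that registers the "kudu" short name.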
If this helps, please accept the solution and give it a thumbs up.
Created 08-19-2021 04:55 AM
@roshanbim, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur,