Member since: 03-04-2019
Posts: 67
Kudos Received: 2
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
| 1552 | 03-18-2020 01:42 AM
| 970 | 03-11-2020 01:09 AM
| 1172 | 12-16-2019 04:17 AM
03-18-2020
03:48 AM
Hi @Skodeto. I think your question is not visible to us. Please post the question again with a proper explanation and the relevant log. Thanks HadoopHelp
... View more
03-18-2020
02:57 AM
Hi @SrinuY. Where do you want to perform this operation? Please specify the tool. Thanks HadoopHelp
... View more
03-18-2020
02:55 AM
Hi @prakashpunj. Is the error below coming from the Hue GUI? If yes, then you don't have access via the Hue GUI: "Cannot access: //. The HDFS REST service is not available. Note: you are a Hue admin but not a HDFS superuser, "hdfs" or part of HDFS supergroup, "supergroup"." You need to contact your Hadoop admin team to get Hue access set up. Thanks HadoopHelp
... View more
03-18-2020
02:49 AM
Hi @ManjunathK. I think HDFS-level authorization is required for the current user XXX. Please check the HDFS file-level access/permissions for the current user (for the file used in the map process).
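For example (a hedged sketch on my part, with a hypothetical path, not taken from your job), you can inspect the permissions with the standard hdfs CLI from Python:
import subprocess

path = "/user/XXX/input/part-00000"  # hypothetical path read by the map task
# show owner, group, and permission bits for the file
subprocess.run(["hdfs", "dfs", "-ls", path], check=True)
# if user XXX lacks read access, an HDFS admin can widen it, e.g.:
# subprocess.run(["hdfs", "dfs", "-chmod", "644", path], check=True)
Thanks HadoopHelp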
... View more
03-18-2020
02:39 AM
Hi @Logica. I think you need to put the hive-site.xml file into Spark's conf directory. Please follow the steps below for running Hive queries or accessing Hive tables through PySpark: https://acadgild.com/blog/how-to-access-hive-tables-to-spark-sql
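For newer Spark versions, a minimal sketch (assuming hive-site.xml is already under $SPARK_HOME/conf; the database and table names are placeholders):
from pyspark.sql import SparkSession

# enableHiveSupport() makes Spark pick up hive-site.xml and use the Hive metastore
spark = (SparkSession.builder
         .appName("hive-access")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("USE default")
spark.sql("SELECT * FROM some_table LIMIT 10").show()  # hypothetical table
Thanks HadoopHelp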
... View more
03-18-2020
01:42 AM
Hi @Logica. Please check whether the database is selected before running the query. Below is sample code for reading a Hive table from PySpark:
from pyspark.context import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext('local', 'example')
hc = HiveContext(sc)

# optional sanity check: read a raw file from HDFS as an RDD
tf1 = sc.textFile("/user/BigData/nooo/SparkTest/train.csv")
# print(tf1.top(10))

# select the database, then read the Hive table
hc.sql("use default")
spf = hc.sql("SELECT * FROM tempaz LIMIT 100")
spf.show(5)
Thanks HadoopHelp
... View more
03-12-2020
07:26 AM
Hi @adv52C. As yet there is no free CDH or CDP Hadoop cluster available, but you can use a trial from the link below: https://cloudxlab.com/ Thanks HadoopHelp
... View more
03-11-2020
01:09 AM
Hi. I think the link below will be helpful to you; just try it: https://community.cloudera.com/t5/Support-Questions/Quickstart-VM/m-p/290564#M214948 Thanks HadoopHelp
... View more
03-03-2020
06:06 AM
Dear all,
I created a Hive temporary table as below, but how can we identify whether a table is a temporary table or not?
CREATE temporary TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, destination String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
I used the command below to describe it, but I am not getting the information I need:
describe formatted employee;

# col_name              data_type       comment
eid                     int
name                    string
salary                  string
destination             string

# Detailed Table Information
Database:               h7
OwnerType:              USER
Owner:                  ****
CreateTime:             Tue Mar 03 08:50:28 EST 2020
LastAccessTime:         UNKNOWN
Retention:              0
Location:               hdfs:***********
Table Type:             MANAGED_TABLE
Table Parameters:
  comment               Employee details

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
  field.delim           \t
  line.delim            \n
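In the meantime, one check I am trying (a hedged sketch, assuming PyHive is available; the host is a placeholder): temporary tables are session-scoped, so they appear in SHOW TABLES only for the session that created them.
from pyhive import hive

# session 1 creates the temporary table and can see it
s1 = hive.Connection(host="masterIP", port=10000, username="cdh123")  # placeholder host
c1 = s1.cursor()
c1.execute("CREATE TEMPORARY TABLE IF NOT EXISTS employee_tmp (eid int)")
c1.execute("SHOW TABLES")
print("session 1:", c1.fetchall())  # lists employee_tmp

# a second session has its own scope and does not see it
s2 = hive.Connection(host="masterIP", port=10000, username="cdh123")
c2 = s2.cursor()
c2.execute("SHOW TABLES")
print("session 2:", c2.fetchall())  # employee_tmp is absent, so it is temporary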
Thanks
HadoopHelp
... View more
Labels:
- Apache Hive
- Apache Impala
03-03-2020
03:21 AM
Hi @pateljay. I am sharing dummy Python code that picks up new files from a source directory and moves them to another directory:
import os
import shutil
import time

NEW_FILE_WINDOW = 120  # in seconds; files modified within this window count as new
src = "C:\\Users\\Ramesh.kumar\\Desktop\\SourceData"
dst = "C:\\Users\\Ramesh.kumar\\Desktop\\newDataIdentified"
data = "C:\\Users\\Ramesh.kumar\\Desktop\\MoveToLinux"

now = time.time()
before = now - NEW_FILE_WINDOW
print("time interval is:", before)

def last_mod_time(fname):
    # last modification time of a file, as seconds since the epoch
    return os.path.getmtime(fname)

for fname in os.listdir(src):
    src_fname = os.path.join(src, fname)
    data_fname = os.path.join(data, fname)
    if last_mod_time(src_fname) > before:
        dst_fname = os.path.join(dst, fname)
        print("data is going from A to B")
        shutil.copy(src_fname, dst_fname)    # copy the new file to the staging dir
        shutil.move(dst_fname, data_fname)   # then move it on to the final dir
        print("new data has been moved")
Note: if you are using Hue or any other job scheduler, you can achieve this easily. Dummy shell script:
echo "............................................................BANGALORE................................................................................."
echo "............................................................This is CCDA Job.........................................................................."
echo "........................................................Spark Job is going To START..................................................................."
echo "........................................................Spark is Ready to RUN........................................................................."
spark-submit python1.py
echo ".......................................................Spark job has been completed..................................................................."
echo ".......................................................Now PIG Query is going To RUN.................................................................."
pig -f ccd_cit_py.pig
echo ".......................................................PIG Query Completed here......................................................................."
echo "....................................................Now Going to create Hive Stage Table.............................................................."
echo ".....................................................Ready to create hive stage Table................................................................."
hive -f ccda_pig_hive_cit1.sql
echo "......................................................hive stage Table created........................................................................"
echo "......................................................Now Going to Create Final Table................................................................"
echo ".......................................................Ready to Create Final Table...................................................................."
hive -f ccda_pig_hive_cit_final.sql
echo "-------------------------------------------------------Final Table has been Created-------------------------------------------------------------------"
echo "--------------------------------------------------------demography table is going to be created here---------------------------------------------------"
hive -f DemoGrapghy.sql
echo "---------------------------------------------------------demoGraphy Table has been Created------------------------------------------------------------"
echo "-------------------------------------------------------HRK_CCDFile Table is going to create-----------------------------------------------------------"
echo "-----------------------------------------------------HRK_CCDFile Table has been Created---------------------------------------------------------------"
echo "......................................................ALL tables are created in Hive................................................................."
echo "..................................................Now Going To Delete PIG INPUTPATH from HDFS........................................................."
echo "......................................................Ready To Delete PIG INPUTPATH..................................................................."
hdfs dfs -rmr /user/root/BigDataTest/CCDACIT/CCDAPYOUTPUTDATA/A
echo ".......................................................PIG INPUTPATH Deleted.........................................................................."
echo "......................................................................................................................................................"
echo "....................................................Now Going to Delete PIG OUTPUTPATH from HDFS......................................................"
echo "........................................................Ready To Delete PIG OUTPUTPATH................................................................"
echo ".............................................................................................................................................."
Note: you need to adapt the script to your requirements. Reference link: https://unix.stackexchange.com/questions/24952/script-to-monitor-folder-for-new-files Thanks HadoopHelp
... View more
03-02-2020
11:22 PM
Hi @pateljay. A: you can use a shell script to identify newly arrived files in the current directory and load them. B: or, after the first Pig load completes, move those files to another directory, or delete them directly if they are not required. C: or write Python or Java code that identifies the timestamps of new files and loads those files with Pig. This link may be helpful: https://stackoverflow.com/questions/12630584/load-multiple-files-in-pig Thanks HadoopHelp
... View more
03-02-2020
10:50 PM
Hello all. Is the HDP file size really 29 GB? https://www.cloudera.com/downloads/hortonworks-sandbox/hdp.html I started downloading the HDP file from the link above and it shows a file size of approximately 29 GB. HadoopHelp
... View more
03-02-2020
10:46 PM
Hello all. Is the HDP file size really 29 GB? https://www.cloudera.com/downloads/hortonworks-sandbox/hdp.html I started downloading the HDP file from the link above and it shows a file size of approximately 29 GB. Sorry, this is not part of this subject's headline, but I would appreciate a comment. HadoopHelp
... View more
02-26-2020
05:34 AM
1 Kudo
Hi Mat. Please try: https://www.cloudera.com/downloads/hortonworks-sandbox/hdp.html Thanks HadoopHelp
... View more
02-19-2020
07:01 AM
Hi @pj1111. Please check the Impala support matrix: https://docs.cloudera.com/documentation/other/Matrix/topics/pcm_impala.html https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_string_functions.html Thanks HadoopHelp
... View more
02-12-2020
11:06 PM
Hi @vignesh_radhakr. You can access Hive from Python simply by using the connection below:
from pyhive import hive
conn = hive.Connection(host="masterIP", port=10000, username="cdh123")
Note: pass the master node's IP as the host, with port 10000 (the default HiveServer2 port).
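A short usage sketch (assuming the PyHive package is installed; the table name is a placeholder):
cur = conn.cursor()
cur.execute("SELECT * FROM default.some_table LIMIT 5")  # hypothetical table
for row in cur.fetchall():
    print(row)
Thanks HadoopHelp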
... View more
02-12-2020
05:18 AM
Hi. Did you find any solution for this? I am having the same issue accessing Hive through Python. Thanks HadoopHelp
... View more
02-03-2020
07:17 AM
Hi @mike_bronson7. I think you need to continue here: https://community.cloudera.com/t5/Support-Questions/how-to-know-if-any-service-in-ambari-cluster-need-to-restart/td-p/228707 Thanks HadoopHelp
... View more
02-03-2020
06:28 AM
Hi @pdev. You can find all the agent configuration details in the following path (CDH 5.14.0): /etc/cloudera-scm-agent/config.ini
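For example, a minimal sketch for reading it (an illustration on my part; server_host is a standard key in the [General] section of a default agent install):
import configparser

cfg = configparser.ConfigParser()
cfg.read("/etc/cloudera-scm-agent/config.ini")
# the Cloudera Manager host this agent reports to
print(cfg.get("General", "server_host"))
Thanks HadoopHelp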
... View more
02-03-2020
06:18 AM
Hi @Deng2717. Please check carefully, as in the attached image; I am able to see it there. For more details: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cdh_oozie_sqoop_jdbc.html Thanks HadoopHelp
... View more
01-06-2020
11:42 PM
Hi @EricL. Thank you very much... but the use case is as follows. This is the output data structure (JSON) we require:
{
  "name": [
    {
      "use": "official",   // "tab1.use" is the column and value
      "family": "family",  // "tab1.family" is the column and value
      "given": [           // this column we need to create and fill from "tab1.fname" and "tab1.lname"
        "first1",          // "first1" comes from tab1.fname
        "last1"            // "last1" comes from tab1.lname
      ]
    },
    {
      "use": "usual",      // "tab2.use" is the column and value
      "given": [           // here we need to create the column from fname and lname
        "first1 last1"     // "first1 last1" comes from tab1.fname and tab1.lname
      ]
    }
  ]
}
We want to create a single column (name) from the columns above. The data above is a JSON structure, but I want it in Hive as table columns; afterwards we can convert it back into JSON for our use case.
Note: the structure is what matters here.
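A hedged PySpark sketch of one way to build this nested shape (my assumption, not a confirmed solution; the table name tab1 and its columns use, family, fname, lname follow the comments above):
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
tab1 = spark.table("tab1")  # assumed columns: use, family, fname, lname

# both structs must share one schema (use, family, given) to sit in one array,
# so the second entry carries a NULL family
name_col = F.array(
    F.struct(
        F.col("use").alias("use"),
        F.col("family").alias("family"),
        F.array(F.col("fname"), F.col("lname")).alias("given"),
    ),
    F.struct(
        F.lit("usual").alias("use"),
        F.lit(None).cast("string").alias("family"),
        F.array(F.concat_ws(" ", F.col("fname"), F.col("lname"))).alias("given"),
    ),
)

df = tab1.select(name_col.alias("name"))
print(df.toJSON().first())  # JSON string with the nested "name" structure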
Thanks
HadoopHelp
... View more
01-06-2020
07:53 AM
Dear All. I am having an issue combining two column fields into a single field of struct type. I tried the code below but am still getting the same issue:
create table dummy_TBL2 (id int, fname string, lname string);
insert into dummy_TBL2 (id, fname, lname) values (1, 'bhau', 'anna'); -- dummy side
Now, below is the struct table:
create table data_TBL2 (id int, name struct<fname:string,lname:string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE; -- in this table name is a struct type
-- inserting the data from the dummy table into the struct type:
insert into table data_TBL2
select id,
name(fname,lname)
from dummy_TBL2 limit 1; Thanks HadoopHelp
... View more
12-18-2019
02:22 AM
Hi @cjervis. Please use the download link below: https://www.cloudera.com/downloads/hortonworks-sandbox/hdp.html I just verified it. Thanks HadoopHelp
... View more
12-18-2019
02:06 AM
Hi @Suvrat. Please use the download link below: https://www.cloudera.com/downloads/hortonworks-sandbox/hdp.html I just verified it. Note: right now some issue is occurring on your end; I checked the whole download path and the link above works. Thanks HadoopHelp
... View more
12-18-2019
01:22 AM
Hi @bandarusridhar1. Thanks, but the link you shared is not related to my requirement. I want to move or copy data from Cloudera Hadoop HDFS to HDInsight HDFS (wasb://xyz). The last line of your comment looks good, I think, but I have no idea how to implement it. Thanks HadoopHelp
... View more
12-17-2019
07:54 AM
Hi All. Here are all the steps for doing the same: https://www.oreilly.com/library/view/hadoop-with-python/9781492048435/ch01.html Thanks HadoopHelp
... View more
12-17-2019
07:48 AM
Hi All. Here are more details about the above: https://community.cloudera.com/t5/Support-Questions/HDInsight-Vs-HDP-Service-on-Azure-Vs-HDP-on-Azure-IaaS/m-p/166424 Thanks HadoopHelp
... View more
12-17-2019
07:27 AM
Dear All.
I have been facing this issue for a long time: I want to move some Cloudera Hadoop HDFS data to Azure HDInsight HDFS.
I found one solution (Azure Data Box), but it is not suitable for my use case, so any ideas or suggestions are appreciated.
How can we copy or move Cloudera Hadoop HDFS data to Azure HDInsight HDFS, and vice versa?
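One approach I am considering (a hedged sketch, not a verified setup: it assumes the Azure storage account key is configured on the source cluster via the fs.azure.account.key.* properties, and both paths are placeholders) is Hadoop DistCp:
import subprocess

src = "hdfs:///user/data/some_dir"                               # hypothetical source path
dst = "wasb://container@account.blob.core.windows.net/some_dir"  # hypothetical HDInsight target
# DistCp runs a distributed copy between the two filesystems
subprocess.run(["hadoop", "distcp", src, dst], check=True)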
Thanks
HadoopHelp
... View more
Labels:
- Apache Hadoop
12-17-2019
02:28 AM
Hi. Please try the link below: https://stackoverflow.com/questions/21370431/how-to-access-hive-via-python Thanks HadoopHelp
... View more