Member since 01-24-2019
Posts: 49
Kudos Received: 4
Solutions: 0
11-21-2019
11:32 PM
Perfect, it worked.
02-14-2019
02:58 AM
@ujvala reddy The reason is that the first week of the year is defined as the first week containing 4 or more days of the new year. The first day of the week is Monday and the last day is Sunday. Refer to this thread for more details on this week-of-year behaviour.
09-28-2018
07:21 AM
If the first day of the week should be Monday, change the reference date in the subtraction/addition to 1900-01-08 (which falls on a Monday).

--First day of the week as Monday
select date_sub('2018-09-12',pmod(datediff('2018-09-12','1900-01-08'),7));
+-------------+--+
| _c0 |
+-------------+--+
| 2018-09-10 |
+-------------+--+
--Last day of the week as Sunday
select date_add('2018-09-12',6 - pmod(datediff('2018-09-12','1900-01-08'),7));
+-------------+--+
| _c0 |
+-------------+--+
| 2018-09-16 |
+-------------+--+
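The same modulo arithmetic can be sketched in Python (an illustration, not part of the original answer): with a Monday anchor date, subtracting `datediff(d, anchor) mod 7` days yields the Monday of d's week, and adding `6 - datediff(d, anchor) mod 7` days yields the Sunday.

```python
from datetime import date, timedelta

ANCHOR = date(1900, 1, 8)  # a Monday, same anchor as the Hive queries

def week_start(d: date) -> date:
    """Monday of the week containing d (mirrors Hive's date_sub/pmod)."""
    return d - timedelta(days=(d - ANCHOR).days % 7)

def week_end(d: date) -> date:
    """Sunday of the week containing d (mirrors Hive's date_add)."""
    return d + timedelta(days=6 - (d - ANCHOR).days % 7)

print(week_start(date(2018, 9, 12)))  # 2018-09-10
print(week_end(date(2018, 9, 12)))    # 2018-09-16
```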
07-24-2018
04:09 PM
@Gayathri Devi I've created a database in MariaDB and exported a Hive table using Sqoop on my lab setup. This worked well for me:

[sqoop@jsneep-lab ~]$ sqoop export --connect jdbc:mysql://172.3.2.1/export --username mariadb --password mariadb --table exported --direct --export-dir /apps/hive/warehouse/drivers

Make sure you have /usr/share/java/mysql-connector-java.jar present on your system; this gave me trouble initially.
04-02-2018
05:19 PM
1 Kudo
Do you have a target variable that you can predict, or logic that will allow you to convert a "low" CPU value into a target variable? Spark has a wide variety of models available for classification modeling: https://spark.apache.org/docs/latest/mllib-classification-regression.html If you are interested in seeing which factor contributes to a specific instance, I would recommend starting with a logistic regression model, as it provides more explanatory power, giving insight into which factor is contributing to a particular CPU failure.
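As a hedged, self-contained sketch of that explanatory-power point (plain Python rather than Spark MLlib, with made-up coefficients and feature names): in a fitted logistic model, each feature's contribution to the log-odds is simply coefficient times feature value, so the largest term points at the factor driving a particular prediction.

```python
import math

# Hypothetical fitted coefficients for three CPU-health features
# (illustrative values only, not from a real model).
coef = {"temperature": 1.8, "fan_speed": -0.9, "load_avg": 0.6}
intercept = -2.0

def predict_proba(x):
    """Failure probability from a logistic model: sigmoid(w.x + b)."""
    z = intercept + sum(coef[k] * x[k] for k in coef)
    return 1.0 / (1.0 + math.exp(-z))

def contributions(x):
    """Per-feature contribution to the log-odds; the largest term
    identifies the factor driving this particular prediction."""
    return {k: coef[k] * x[k] for k in coef}

sample = {"temperature": 2.1, "fan_speed": 0.4, "load_avg": 1.5}
print(round(predict_proba(sample), 2))            # ~0.91
c = contributions(sample)
print(max(c, key=c.get))                          # temperature
```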
03-28-2018
06:06 AM
3 Kudos
@Gayathri Devi, This depends on the data you have. Your data may be labelled or unlabelled, and different algorithms apply to each case. Assuming your data is labelled, you then have to determine whether you are solving a regression problem or a classification problem, and choose algorithms based on that. Since you wrote that you want to find outliers, I'm assuming it is a regression problem; you can then use algorithms like Linear Regression, Support Vector Regression, Decision Tree Regression, Random Forest Regression, etc. If your data is unlabelled, you have to use an unsupervised learning method, with algorithms like K-Means clustering, Hierarchical clustering, etc. The main part of solving any machine learning problem is understanding your data and choosing the right algorithm, so you may need to spend more time analysing the data. Here are a few links for the concepts mentioned above; you can find these algorithms in Spark.
https://spark.apache.org/docs/latest/ml-guide.html
https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/
https://www.quora.com/What-is-the-main-difference-between-classification-problems-and-regression-problems-in-machine-learning
https://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/
https://stackoverflow.com/questions/19170603/what-is-the-difference-between-labeled-and-unlabeled-data
Happy machine learning 🙂 . -Aditya
02-28-2018
12:46 PM
2 Kudos
@Gayathri Devi
Could you try the query below? It reads the 2018-02-27T02:00 value and converts it to a timestamp (note HH, the 24-hour clock, rather than hh, so values after noon also parse correctly).

Query:-

hive> select from_unixtime(unix_timestamp('2018-02-27T02:00',"yyyy-MM-dd'T'HH:mm"),'yyyy-MM-dd HH:mm:ss');
+----------------------+--+
| _c0 |
+----------------------+--+
| 2018-02-27 02:00:00 |
+----------------------+--+

(or)

By using the regexp_replace function we can replace the T in your timestamp value:

hive> select regexp_replace('2018-02-27T02:00','T',' ');
+-------------------+--+
| _c0 |
+-------------------+--+
| 2018-02-27 02:00 |
+-------------------+--+

Then use the concat function to add the missing :00 so the value becomes a valid Hive timestamp:

hive> select concat(regexp_replace('2018-02-27T02:00','T',' '),":00");
+----------------------+--+
| _c0 |
+----------------------+--+
| 2018-02-27 02:00:00 |
+----------------------+--+
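For comparison, the same conversion sketched in Python (an illustration, not part of the Hive answer): parse the ISO-like value, then emit it in Hive's timestamp layout.

```python
from datetime import datetime

raw = "2018-02-27T02:00"

# Parse the ISO-like value, then format it as a Hive-style timestamp.
ts = datetime.strptime(raw, "%Y-%m-%dT%H:%M")
print(ts.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-02-27 02:00:00
```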
02-28-2018
03:39 AM
1 Kudo
@Gayathri Devi
You can use the INPUT__FILE__NAME virtual column (which gives the input file name for each row of the table) to construct your query, then store the results in a final table. Create a temp table and keep your akolp9app1a_170905_0000.txt file in that table's location. Then use:

hive> select INPUT__FILE__NAME from table; //this statement returns your akolp9app1a_170905_0000.txt filename
+---------------------------------------------------------------------------------+--+
| input__file__name |
+---------------------------------------------------------------------------------+--+
| /apps/hive/warehouse/sales/akolp9app1a_170905_0000.txt |
+---------------------------------------------------------------------------------+--+

You can then use string functions such as substring on the input__file__name field to extract the hostname and date fields from it:

hive> select substring(INPUT__FILE__NAME,20,30) hostname,substring(INPUT__FILE__NAME,40,50) `date` from table;

Finally, insert the result of that select into your final table:

hive> insert into finaltable select substring(INPUT__FILE__NAME,20,30) hostname,substring(INPUT__FILE__NAME,40,50) `date` from table;

For more references:-
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns
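The substring offsets above depend on your warehouse path, so as a hedged illustration of the extraction itself (assuming the file name follows a `<hostname>_<yymmdd>_<sequence>.txt` layout), splitting on the underscore is less brittle than fixed offsets:

```python
import os

path = "/apps/hive/warehouse/sales/akolp9app1a_170905_0000.txt"

# Assumed layout: <hostname>_<yymmdd>_<sequence>.txt
name = os.path.basename(path)                      # akolp9app1a_170905_0000.txt
hostname, date, _seq = name.rsplit(".", 1)[0].split("_")
print(hostname, date)  # akolp9app1a 170905
```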
01-08-2018
07:11 AM
1 Kudo
@Gayathri Devi, You can use the script below.

beeline -u "{connection-string}" -e "show tables" | grep "$1"
if [ $? -eq 0 ]
then
echo "table found"
else
echo "table not found"
fi

Put the content in a file, say checktable.sh, and run the steps below:

chmod +x checktable.sh
./checktable.sh {tablename to check}

Thanks, Aditya
12-07-2017
10:58 AM
1 Kudo
@Gayathri Devi, Can you try this query:

insert into table tblename select * from (select from_unixtime(unix_timestamp('161223000001', 'yyMMddHHmmss')))b;

#2) If '1506614501' is an epoch value (seconds since 1970-01-01), do not re-parse it with a date pattern; pass it to from_unixtime directly:

hive> select from_unixtime(1506614501);

This returns the timestamp in the session time zone. Thanks, Aditya
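A quick cross-check of that epoch value in Python (illustration only; shown in UTC, whereas Hive uses the session time zone):

```python
from datetime import datetime, timezone

# 1506614501 interpreted as seconds since the Unix epoch, in UTC.
ts = datetime.fromtimestamp(1506614501, tz=timezone.utc)
print(ts.strftime("%Y-%m-%d %H:%M:%S"))  # 2017-09-28 16:01:41
```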