Member since
05-09-2016
280
Posts
58
Kudos Received
31
Solutions
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 3813 | 03-28-2018 02:12 PM |
|  | 3048 | 01-09-2018 09:05 PM |
|  | 1668 | 12-13-2016 05:07 AM |
|  | 5142 | 12-12-2016 02:57 AM |
|  | 4419 | 12-08-2016 07:08 PM |
11-11-2016
08:37 PM
It would be a great help if someone could reply to this thread; I'm kind of stuck here. Thanks.
11-10-2016
08:06 PM
Never mind, I got this working. The problem was the double quote, so I changed the replace string to ${twitter.msg:replaceAll('[$&+,:;=?@#|\'<>.^*()%!-]',''):replace('"',''):replace('\n','')} and it worked.
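For anyone wondering why the quote mattered: a raw double quote inside the msg value closes the JSON string early, and a raw newline is an illegal control character inside one, so the row is no longer valid JSON. A minimal Python sketch (the tweet text is made up) showing the failure and the effect of the replace chain above:

```python
import json

msg = 'RT @user: "hello"\nworld'  # hypothetical tweet text

# Naive templating, like the original ReplaceText string: the embedded
# quote terminates the JSON string early and the newline is an illegal
# control character, so the row fails to parse.
naive = '{"msg":"%s"}' % msg
try:
    json.loads(naive)
    naive_is_valid = True
except ValueError:
    naive_is_valid = False

# Mirroring the replace('"',''):replace('\n','') chain yields valid JSON.
cleaned = '{"msg":"%s"}' % msg.replace('"', '').replace('\n', '')
parsed = json.loads(cleaned)
```

Escaping the characters (as json.dumps would) rather than stripping them would preserve the message text, but stripping is what the expression above does.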
11-10-2016
06:44 PM
Hi guys, I am trying to get Twitter data from NiFi into Hive in JSON format using the Hive JSON SerDe (referring to https://github.com/rcongiu/Hive-JSON-Serde). I am using the ReplaceText processor to format the JSON data as one record per line, which is a requirement of this SerDe. My replacement string in the processor is:

{"tweet_id":${twitter.tweet_id},"created_unixtime":${twitter.unixtime},"created_time":"${twitter.time}","lang":"${twitter.language}","displayname":"${twitter.handle}","time_zone":"${twitter.time_zone}","msg":"${twitter.msg:replace(',''):replace('\n','')}"}

A sample of the data in HDFS looks like this:

{"tweet_id":796782698631233536,"created_unixtime":1478802773517,"created_time":"Thu Nov 10 18:32:53 +0000 2016","lang":"it","displayname":"robinit66","time_zone":"Rome","msg":"RT @ubaldo_angelo: spiace davvero per tutti quelli che avevano intravisto nelle carotine e nei cavoletti di MIchelle le basi di una nuova c???"}
{"tweet_id":796782702829666304,"created_unixtime":1478802774518,"created_time":"Thu Nov 10 18:32:54 +0000 2016","lang":"it","displayname":"vespro4","time_zone":"Pacific Time (US & Canada)","msg":"Comprendo l'evasore fiscale https://t.co/Jn0s9KZEUz"}

Then I ran SELECT * against the table, which is an external table in Hive, and got this error:

{"trace":"org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Expected a \u0027,\u0027 or \u0027}\u0027 at 273 [character 274 line 1]\n\norg.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Expected a \u0027,\u0027 or \u0027}\u0027 at 273 [character 274 line 1]\n\tat org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)\n\tat org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)\n\tat org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:373)\n\tat org.apache.ambari.view.hive2.actor.ResultSetIterator.getNext(ResultSetIterator.java:119)\n\tat org.apache.ambari.view.hive2.actor.ResultSetIterator.handleMessage(ResultSetIterator.java:79)\n\tat

I thought of replacing all special characters using replaceAll from the NiFi Expression Language, so I changed the replace string for the message to ${twitter.msg:replaceAll('[$&+,:;=?@#|\'<>.^*()%!-]',''):replace('\n','')}. The data in HDFS looked much cleaner, but the error was still there. Then I hard-coded the msg string with a bunch of special characters and text and it worked; please see sample-data-worked.png. Any help would be greatly appreciated.
Labels:
- Apache Hive
- Apache NiFi
11-09-2016
06:21 PM
2 Kudos
@Gunesh P, you have to create a user in the OS. Log in to the Sandbox terminal as root and type:

useradd abc
passwd abc

This user will not have sudo access. To grant it, type visudo and add the abc line under the existing root entry:

## Allow root to run any commands anywhere
root ALL=(ALL) ALL
abc ALL=(ALL) ALL

Save the file. After this you can SSH in as that user, and later you can give the user read/write access through the HDFS policies of Ranger.
11-08-2016
10:46 PM
Hi experts, I am using Spark 2.0.0 and I have an airline dataset. I created a SparkR DataFrame and am able to run some of the functions of the SparkR DataFrame API, but I am running into exceptions while building a linear model using the Gaussian family. Here is my command:

model <- glm(train_data, ARR_DELAY ~ MONTH + DEP_HOUR + DEP_DELAY + WEEKEND + DISTANCE, family = "gaussian")

ERROR:

Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) : org.apache.spark.sql.AnalysisException: Cannot resolve column name "formula" among (YEAR, MONTH, DAY_OF_MONTH, DAY_OF_WEEK, CARRIER, FL_NUM, ORIGIN, DEST, DEP_TIME, DEP_DELAY, ARR_TIME, ARR_DELAY, CANCELLED, CANCELLATION_CODE, AIR_TIME, DISTANCE, WEEKEND, DEP_HOUR, DELAY_LABELED);

For some reason it tries to fetch a "formula" column, so I replaced the above command with:

model <- glm(train_data, formula = ARR_DELAY ~ MONTH + DEP_HOUR + DEP_DELAY + WEEKEND + DISTANCE, family = "gaussian")

This time, I got this error:

ERROR Executor: Exception in task 0.0 in stage 45.0 (TID 95) scala.MatchError: [null,1.0,[1.0,11.0,5.0,1.0,2475.0]] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)

Has anyone seen this kind of behaviour? Thanks in advance.
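A note for anyone hitting the same thing: the offending row in the MatchError is [null,1.0,[...]], i.e. a record whose label (ARR_DELAY) is null. One hedged guess is that dropping rows with nulls in the modelled columns before fitting (SparkR has a dropna function for this) avoids the error; here is a Spark-free Python sketch of that pre-filtering idea (column names from the post, data made up):

```python
# Toy records standing in for the airline rows; None mimics a null label.
rows = [
    {"ARR_DELAY": 12.0, "DEP_DELAY": 5.0, "DISTANCE": 2475.0},
    {"ARR_DELAY": None, "DEP_DELAY": 1.0, "DISTANCE": 2475.0},  # null label
]

# Keep only rows where every column used in the formula is present,
# analogous to calling dropna on the SparkR DataFrame before glm().
model_cols = ["ARR_DELAY", "DEP_DELAY", "DISTANCE"]
clean = [r for r in rows if all(r[c] is not None for c in model_cols)]
```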
11-08-2016
10:34 PM
@Manoj Dhake, it depends on the dependent variable: the unit of RMSE is the same as that of the dependent variable. If your data ranges from 0 to 100000, an RMSE of 3000 is small; but if the range goes from 0 to 1, it is pretty huge. Try playing with other input variables and compare the RMSE values; the smaller the RMSE, the better the model. Also, compare the RMSE of your training and testing data. If they are almost similar, your model is good; if the RMSE for the testing data is much higher than that of the training data, it is likely that you have badly overfit the data.
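To make the comparison concrete, here is a small self-contained sketch (the numbers are made up, not from any thread data) computing and comparing RMSE on training and testing predictions:

```python
import math

def rmse(actual, predicted):
    # Root-mean-square error: sqrt of the mean squared residual.
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

# Hypothetical delays (minutes) and model predictions.
train_actual = [10.0, 25.0, 40.0, 5.0]
train_pred   = [12.0, 22.0, 38.0, 8.0]
test_actual  = [15.0, 30.0, 50.0]
test_pred    = [11.0, 36.0, 42.0]

train_rmse = rmse(train_actual, train_pred)
test_rmse  = rmse(test_actual, test_pred)
# If test_rmse is much larger than train_rmse, the model is likely overfit.
```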
11-03-2016
06:10 PM
1 Kudo
@Hitesh Rajpurohit, make sure that Ranger Tagsync is up and running. Go to Ambari, click Ranger in the list of services, click Tagsync, and start the component.
11-02-2016
07:23 PM
1 Kudo
@azeltov, @Edgar Daeds: when we run anything from the Livy server, it tries to connect to the Resource Manager on port 8032, but on the Sandbox the value of yarn.resourcemanager.address uses port 8050, so it waits for 10 tries and then fails. Go to Ambari -> YARN -> Configs, search for yarn.resourcemanager.address, and change the port to 8032. Restart YARN and then try running %livy.pyspark again.
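The Ambari change above corresponds to this property in yarn-site.xml (the Sandbox hostname shown is an assumption; keep whatever host your config already uses and change only the port):

```xml
<property>
  <!-- Address the ResourceManager listens on for client RPCs;
       Livy expects the default port 8032. -->
  <name>yarn.resourcemanager.address</name>
  <value>sandbox.hortonworks.com:8032</value>
</property>
```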
11-01-2016
08:49 PM
I am trying to install RStudio Server on Hortonworks Sandbox 2.5, and downloaded this rpm: rstudio-server-rhel-0.99.893-x86_64.rpm. After installing the package, when I run rstudio-server verify-installation, it shows me the following message:

initctl: Unable to connect to Upstart: Failed to connect to socket /com/ubuntu/upstart: Connection refused

I have tried starting and stopping the RStudio server; it shows the same message. Has anyone experienced this before? Please help. PS: Since it is a Docker container, port 8787 is not open, so I have configured /etc/rstudio/rserver.conf to use port 8090.
Labels:
- Hortonworks Data Platform (HDP)
11-01-2016
04:37 AM
1 Kudo
@Volodymyr Ostapiv, the Sandbox release has been updated; please download it again and you should no longer face this issue. http://hortonworks.com/downloads/