Member since: 05-01-2015
Posts: 30
Kudos Received: 8
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 19099 | 05-20-2015 02:37 AM
 | 5998 | 05-08-2015 02:17 AM
01-22-2021
02:22 AM
That fixed it for me. Strange that Impala does not automatically choose the Parquet format when creating an external table based on a Parquet file.
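For anyone hitting the same thing, a minimal sketch of the explicit form (table name and paths are hypothetical). Impala can copy the column layout from an existing Parquet file via LIKE PARQUET, but the storage format still has to be spelled out with STORED AS PARQUET:

-- infer columns from an existing file, but state the format explicitly
CREATE EXTERNAL TABLE my_events
LIKE PARQUET '/user/hive/data/part-00000.parquet'
STORED AS PARQUET
LOCATION '/user/hive/data/';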
01-07-2021
12:12 AM
1 Kudo
In "Our Commitment to Open Source Software", Cloudera mentions that previously closed components such as Cloudera Manager will become available under an open source license. Does this mean that in time there will be an alternative open source version of the CDH stack that can be used without a subscription? https://blog.cloudera.com/our-commitment-to-open-source-software/

"Over the course of the next 6 months, we plan to consolidate and transition the small number of projects currently licensed by Cloudera under closed source licenses to open source licenses. For example, components such as Cloudera Manager, Cloudera Navigator, and Cloudera Data Science Workbench will all eventually be available under an open source license."

"Between September 2019 and January 2020, we'll establish the new open source projects for formerly closed source components and begin licensing them under the AGPL."
11-27-2020
05:57 AM
1 Kudo
See the latest Cloudera announcement about moving to a subscription model: https://www.cloudera.com/downloads/paywall-expansion.html

Does this mean that even the older versions (6.3.x and older), which until now have still been available for download without a subscription, will no longer be available without one? Does that mean that if you have a cluster running 6.3.x without a subscription, you can no longer add new nodes to the cluster? Or might this still work if you have already downloaded the parcels using Cloudera Manager? And what is the status of making all the code (such as Cloudera Manager) open source and available? I haven't seen any news about that anymore. The pricing for CDH is very high, and a lot of "legacy" users probably cannot afford it. What is the best option if you now need to move away from CDH: plain Apache Hadoop, or another database such as ClickHouse or AWS Athena? Lots of questions; I appreciate any info.
Labels:
- Cloudera Essentials
- Cloudera Manager
03-08-2019
05:17 AM
This worked for me. The snippet below is an example of limiting the disk space to 5 GB (5 × 1024³ = 5368709120 bytes):

<property>
  <name>firehose_time_series_storage_bytes</name>
  <value>5368709120</value>
</property>
11-18-2016
07:04 AM
Hi, when I execute a query with the Impala editor in Hue, I get an error after a while. In my logs I can see that it is caused by the web frontend not sending a CSRF token, or sending an incorrect one. I'm using CDH version 5.9.0 with Hue version 3.11.

Error from the log:
[18/Nov/2016 16:00:26 +0100] WARNING 94.198.159.131 maarten - "POST /notebook/api/check_status HTTP/1.1" -- CSRF token missing or incorrect.

Any ideas what is causing this?
Labels:
- Apache Impala
- Cloudera Hue
05-20-2015
02:37 AM
1 Kudo
Found the problem. There were some "old style" Parquet files in a hidden directory named .impala_insert_staging. After removing these directories Spark could load the data; Impala will recreate the staging directory when I do a new insert into the table. Why there were Parquet files left in that dir is not clear to me. It was some pretty old data, so maybe something went wrong during an insert a while ago.
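If you clean up files by hand like this, Impala's metadata can be brought back in sync afterwards; a minimal sketch (the table name queries is hypothetical, borrowed from the directory layout in my earlier post):

-- re-scan the data files for the table
REFRESH queries;
-- or, after bigger layout changes outside Impala:
INVALIDATE METADATA queries;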
05-20-2015
02:15 AM
Hi, I am using Spark 1.3.1 and my data is stored in Parquet format; the Parquet files have been created by Impala. After I added a "server" partition to my partition schema (it was year,month,day and is now year,month,day,server), Spark is having trouble reading the data. I get the following error:

java.lang.AssertionError: assertion failed: Conflicting partition column names detected: ArrayBuffer(year, month, day) ArrayBuffer(year, month, day, server)

Does Spark keep some data in cache/temp dirs with the old schema, which is causing a mismatch? Any ideas on how to fix this issue?

Directory layout sample:
drwxr-xr-x - impala hive 0 2015-05-19 14:02 /user/hive/queries/year=2015/month=05/day=17
drwxr-xr-x - impala hive 0 2015-05-19 14:02 /user/hive/queries/year=2015/month=05/day=17/server=ns1
drwxr-xr-x - impala hive 0 2015-05-19 14:02 /user/hive/queries/year=2015/month=05/day=18
drwxr-xr-x - impala hive 0 2015-05-19 14:02 /user/hive/queries/year=2015/month=05/day=18/server=ns1
drwxr-xr-x - impala hive 0 2015-05-20 09:01 /user/hive/queries/year=2015/month=05/day=19
drwxr-xr-x - impala hive 0 2015-05-20 09:01 /user/hive/queries/year=2015/month=05/day=19/server=ns1

Complete stack trace:
java.lang.AssertionError: assertion failed: Conflicting partition column names detected: ArrayBuffer(year, month, day) ArrayBuffer(year, month, day, server)
at scala.Predef$.assert(Predef.scala:179)
at org.apache.spark.sql.parquet.ParquetRelation2$.resolvePartitions(newParquet.scala:933)
at org.apache.spark.sql.parquet.ParquetRelation2$.parsePartitions(newParquet.scala:851)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$7.apply(newParquet.scala:311)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$7.apply(newParquet.scala:303)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:303)
at org.apache.spark.sql.parquet.ParquetRelation2.<init>(newParquet.scala:391)
at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:540)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
at $iwC$$iwC$$iwC.<init>(<console>:32)
at $iwC$$iwC.<init>(<console>:34)
at $iwC.<init>(<console>:36)
at <init>(<console>:38)
at .<init>(<console>:42)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:656)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:664)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:669)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:996)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:944)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1058)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Labels:
- Apache Hive
- Apache Impala
- Apache Spark
05-18-2015
11:55 AM
Yes, that's exactly my situation right now; I've only stored data for one server so far. I will continue with my workaround. Thanks for confirming that this will work for this situation.
05-17-2015
11:53 PM
I tested moving the data with "insert into ... select * from ...", but this is extremely slow. What if I drop all the partitions from the external table (the data should remain on HDFS), then create the new partition subdirectories with a script and move all the data files into them? Then I can add the partitions again (with alter table add partition) and the data should be available again under the new partition schema. I did a test with a single partition directory (year, month, day): after adding the "server" subdirectory and moving the Parquet data files into it, an alter table add partition ... made the data queryable again with the (year, month, day, server) partition schema, along the lines of the sketch below. I am not sure if this method of adding the new partition has some drawbacks I don't know about? Maarten
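A minimal sketch of that file-move approach in Impala SQL, assuming the external table has already been recreated with the four-column partition spec and string-typed partition columns (the table name queries and the day=17/server=ns1 values are taken from the directory listing above):

-- the table is EXTERNAL, so dropped partitions leave the files on HDFS;
-- move the data files into the new layout outside Impala, e.g.
--   hdfs dfs -mv /user/hive/queries/year=2015/month=05/day=17/* \
--     /user/hive/queries/year=2015/month=05/day=17/server=ns1/
-- then register each directory under the new schema:
ALTER TABLE queries ADD PARTITION (year='2015', month='05', day='17', server='ns1');
-- and make Impala re-scan the file list:
REFRESH queries;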
05-17-2015
09:25 AM
1 Kudo
Hello, I have a very large table (50B+ rows) and it's partitioned by (year, month, day). Now I want to add a new "server" partition column; the resulting partition schema should be (year, month, day, server). Looking at the ALTER command, there seems to be no way to add an additional partition column. An alternative would be to create a new table with the (year, month, day, server) partition schema and then do a select * from the old table into the new one, along the lines of the sketch below. This is an expensive operation; am I missing something, and is there an easier way to do this? Thx, Maarten
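For reference, a minimal sketch of that copy-based alternative in Impala SQL (the table names, the two data columns, and the 'ns1' value assigned to the existing data are all hypothetical placeholders):

-- new table with the extended partition spec (hypothetical schema)
CREATE EXTERNAL TABLE queries_new (
  query_id BIGINT,
  query_text STRING
)
PARTITIONED BY (year STRING, month STRING, day STRING, server STRING)
STORED AS PARQUET
LOCATION '/user/hive/queries_new';

-- dynamic-partition insert: partition columns go last in the select list
INSERT INTO queries_new PARTITION (year, month, day, server)
SELECT query_id, query_text, year, month, day, 'ns1'
FROM queries;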