Member since: 05-01-2015
Posts: 30
Kudos Received: 8
Solutions: 2

My Accepted Solutions

Title | Views | Posted |
---|---|---|
| 18951 | 05-20-2015 02:37 AM |
| 5948 | 05-08-2015 02:17 AM |
01-22-2021
02:22 AM
Fixed it for me. Strange that Impala does not automatically choose Parquet format when creating an external table based on a Parquet file.
01-07-2021
12:12 AM
1 Kudo
In "Our Commitment to Open Source Software", Cloudera does mention that previously closed components such as Cloudera Manager will become available under an open source license Does this means that in time there will be an alternative open source version of the CDH stack that can be used without a subscription? https://blog.cloudera.com/our-commitment-to-open-source-software/ Over the course of the next 6 months, we plan to consolidate and transition the small number of projects currently licensed by Cloudera under closed source licenses to open source licenses. For example, components such as Cloudera Manager, Cloudera Navigator, and Cloudera Data Science Workbench will all eventually be available under an open source license. " Between September 2019 and January 2020, we’ll establish the new open source projects for formerly closed source components and begin licensing them under the AGPL."
11-27-2020
05:57 AM
1 Kudo
See the latest Cloudera announcement about moving to a subscription model: https://www.cloudera.com/downloads/paywall-expansion.html

Does this mean that even the older versions (6.3.x and older), which until now have still been available for download without a subscription, will no longer be available without one? Does it mean that if you have a cluster running 6.3.x without a subscription, you can no longer add new nodes to the cluster? Or might this still work if you have already downloaded the parcels using Cloudera Manager?

What is the status of making all the code (such as Cloudera Manager) open source and available? I haven't seen any news about that anymore.

The pricing for CDH is very high, and many "legacy" users probably cannot afford it. If you now need to move away from CDH, what is the best option: plain Apache Hadoop, or another database such as ClickHouse or AWS Athena?

Lots of questions; I appreciate any info.
Labels:
- Cloudera Essentials
- Cloudera Manager
03-08-2019
05:17 AM
This worked for me. The snippet below is an example that limits the disk space to 5 GB:

<property>
  <name>firehose_time_series_storage_bytes</name>
  <value>5368709120</value>
</property>
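The value is just the desired cap in bytes; a quick sketch of where 5368709120 comes from (assuming the 5 GB here is meant as 5 GiB):

```scala
// 5 GiB in bytes: 5 * 1024^3 = 5368709120
val firehoseStorageBytes: Long = 5L * 1024 * 1024 * 1024
println(firehoseStorageBytes) // prints 5368709120
```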
11-18-2016
07:04 AM
Hi, when I execute a query with the Impala editor in Hue, I get an error after a while. In my logs I can see that it is caused by the web frontend not sending a CSRF token, or sending an incorrect one. I'm using CDH version 5.9.0 with Hue version 3.11. Error from the log:

[18/Nov/2016 16:00:26 +0100] WARNING 94.198.159.131 maarten - "POST /notebook/api/check_status HTTP/1.1" -- CSRF token missing or incorrect.

Any ideas what is causing this?
Labels:
- Apache Impala
- Cloudera Hue
05-20-2015
02:37 AM
1 Kudo
Found the problem: there were some "old style" Parquet files in a hidden directory named .impala_insert_staging. After removing these directories, Spark could load the data; Impala will recreate them when I do a new insert into the table. Why there were Parquet files left in that directory is not clear to me. It was some pretty old data, so maybe something went wrong during an insert a while ago.
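For anyone hitting the same thing, a minimal sketch (not from the original thread) of how such leftover staging directories could be found and removed with the Hadoop FileSystem API; the table root path is illustrative:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Recursively walk a table's directory tree and delete any hidden
// .impala_insert_staging directories left behind by interrupted inserts.
val fs = FileSystem.get(new Configuration())

def cleanStaging(dir: Path): Unit =
  fs.listStatus(dir).foreach { status =>
    if (status.isDirectory) {
      if (status.getPath.getName == ".impala_insert_staging")
        fs.delete(status.getPath, true) // recursive delete
      else
        cleanStaging(status.getPath)
    }
  }

cleanStaging(new Path("/user/hive/queries")) // table root is illustrative
```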
05-20-2015
02:15 AM
Hi, I am using Spark 1.3.1 and my data is stored in Parquet format; the Parquet files were created by Impala. After I added a "server" partition to my partition schema (it was year,month,day and is now year,month,day,server), Spark is having trouble reading the data. I get the following error:

java.lang.AssertionError: assertion failed: Conflicting partition column names detected: ArrayBuffer(year, month, day) ArrayBuffer(year, month, day, server)

Does Spark keep some data in cache/temp dirs with the old schema, which is causing a mismatch? Any ideas on how to fix this issue?

Directory layout sample:

drwxr-xr-x - impala hive 0 2015-05-19 14:02 /user/hive/queries/year=2015/month=05/day=17
drwxr-xr-x - impala hive 0 2015-05-19 14:02 /user/hive/queries/year=2015/month=05/day=17/server=ns1
drwxr-xr-x - impala hive 0 2015-05-19 14:02 /user/hive/queries/year=2015/month=05/day=18
drwxr-xr-x - impala hive 0 2015-05-19 14:02 /user/hive/queries/year=2015/month=05/day=18/server=ns1
drwxr-xr-x - impala hive 0 2015-05-20 09:01 /user/hive/queries/year=2015/month=05/day=19
drwxr-xr-x - impala hive 0 2015-05-20 09:01 /user/hive/queries/year=2015/month=05/day=19/server=ns1

Complete stack trace:

java.lang.AssertionError: assertion failed: Conflicting partition column names detected: ArrayBuffer(year, month, day) ArrayBuffer(year, month, day, server)
 at scala.Predef$.assert(Predef.scala:179)
 at org.apache.spark.sql.parquet.ParquetRelation2$.resolvePartitions(newParquet.scala:933)
 at org.apache.spark.sql.parquet.ParquetRelation2$.parsePartitions(newParquet.scala:851)
 at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$7.apply(newParquet.scala:311)
 at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$7.apply(newParquet.scala:303)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:303)
 at org.apache.spark.sql.parquet.ParquetRelation2.<init>(newParquet.scala:391)
 at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:540)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
 at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
 at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
 at $iwC$$iwC$$iwC.<init>(<console>:32)
 at $iwC$$iwC.<init>(<console>:34)
 at $iwC.<init>(<console>:36)
 at <init>(<console>:38)
 at .<init>(<console>:42)
 at .<clinit>(<console>)
 at .<init>(<console>:7)
 at .<clinit>(<console>)
 at $print(<console>)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
 at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
 at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
 at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
 at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
 at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)
 at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)
 at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)
 at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:656)
 at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:664)
 at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:669)
 at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:996)
 at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
 at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
 at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
 at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:944)
 at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1058)
 at org.apache.spark.repl.Main$.main(Main.scala:31)
 at org.apache.spark.repl.Main.main(Main.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
 at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
 at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
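A minimal sketch of what triggers the assertion (assuming the Spark 1.3 shell, where sqlContext is predefined; the path is from the listing above):

```scala
// Spark's partition discovery infers partition columns from the directory
// path of every data file. Old-layout files sitting directly under day=...
// imply (year, month, day), while files under server=... imply
// (year, month, day, server); mixing both layouts trips the assertion.
val df = sqlContext.parquetFile("/user/hive/queries")
// java.lang.AssertionError: assertion failed: Conflicting partition column
// names detected: ArrayBuffer(year, month, day) ArrayBuffer(year, month, day, server)
```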
Labels:
- Apache Hive
- Apache Impala
- Apache Spark
05-18-2015
11:55 AM
Yes, that's exactly my situation right now; I've only stored data for one server so far. I will continue with my workaround. Thanks for confirming that this will work for this situation.
05-17-2015
11:53 PM
I tested moving the data with "insert into select * from", but this is extremely slow. What if I drop all the partitions from the external table (the data should remain on HDFS), then create the new partition subdirectories with a script and move all the data files into the new subdirectories? Then I can add the partitions again (with ALTER TABLE ... ADD PARTITION) and the data should be available again under the new partition schema. I did a test with a single partition directory (year,month,day): after adding the "server" subdirectory, moving the Parquet data files into it, and running ALTER TABLE ... ADD PARTITION, the data could be queried again with the (year,month,day,server) partition schema. I am not sure whether this method of adding the new partition has some drawbacks I don't know about? Maarten
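For illustration, a minimal sketch of the file-move step described above for one day partition, using the Hadoop FileSystem API. It assumes a single server value ns1; the table name queries and the numeric partition column types are my guesses from the path:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Push one day partition's data files down into a server=ns1 subdirectory.
val fs = FileSystem.get(new Configuration())
val dayDir = new Path("/user/hive/queries/year=2015/month=05/day=17")
val serverDir = new Path(dayDir, "server=ns1")

fs.mkdirs(serverDir)
fs.listStatus(dayDir)
  .filter(s => s.isFile && !s.getPath.getName.startsWith(".")) // skip hidden files
  .foreach(s => fs.rename(s.getPath, new Path(serverDir, s.getPath.getName)))

// Then, in impala-shell, re-register the partition (table name assumed):
//   ALTER TABLE queries ADD PARTITION (year=2015, month=5, day=17, server='ns1');
```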
05-17-2015
09:25 AM
1 Kudo
Hello, I have a very large table (50B+ rows) and it's partitioned by (year,month,day). Now I want to add a new "server" partition column; the resulting partition schema should be (year,month,day,server). Looking at the ALTER command, there seems to be no way to add an additional partition column. An alternative would be to create a new table with the (year,month,day,server) partition schema and then do a select * from the old table into the new table, but this is an expensive operation. Am I missing something; is there an easier way to do this? Thx, Maarten