About Creaping

Creaping · ‎10-12-2017

I have two tables, A(id, name) and B(id, age), that I want to join. Whether I use

    SELECT * FROM A INNER JOIN B ON A.id = B.id

or

    SELECT * FROM A INNER JOIN B USING (id)

I get a table in which the key column "id" appears twice, once from each source table: (id, name, id, age). What I want is (id, name, age), so the key columns should be merged.

EDIT: I know I could write "SELECT A.id, name, age ..." instead of "SELECT * ...", but I have many columns, so I would like to avoid this workaround.
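One possible way to soften the explicit-column workaround is to generate the column list instead of typing it. The sketch below is only an illustration: the helper name build_join_select is hypothetical, and it assumes the column names of both tables are already known (for example, copied from the output of DESCRIBE). It does not talk to Impala; it only builds the query string.

```python
def build_join_select(left, left_cols, right, right_cols, key):
    """Build an INNER JOIN query that lists the key column only once.

    Hypothetical helper: table and column names are passed in by the caller
    and are inserted into the query text without any escaping.
    """
    # Every column of the left table, including the join key.
    select_cols = [f"{left}.{col}" for col in left_cols]
    # Every column of the right table except the join key.
    select_cols += [f"{right}.{col}" for col in right_cols if col != key]
    return (
        f"SELECT {', '.join(select_cols)} "
        f"FROM {left} INNER JOIN {right} ON {left}.{key} = {right}.{key}"
    )


query = build_join_select("A", ["id", "name"], "B", ["id", "age"], key="id")
print(query)
# prints: SELECT A.id, A.name, B.age FROM A INNER JOIN B ON A.id = B.id
```

The generated statement is the same explicit "SELECT A.id, name, age ..." form mentioned in the EDIT, so it sidesteps the typing effort rather than changing how SELECT * behaves.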

Creaping · ‎09-19-2017

Hello Zsolt, thanks for the reply. The problem was that I don't have permission to install Python packages like pydoop. I was not sure whether there is a native way, but I will ask the sysadmin to install some packages.

Creaping · ‎09-13-2017

Hello, I have some standalone Python files that access data in the usual way:

    with open("filename") as f:
        for lines in f:
            [...]

I want to make these scripts run without changing too much of the code and, if possible, without dependencies. Right now I start the files as Spark programs in the Workflow in HUE. Are there built-in packages I can use? I tried to import pydoop and hdfs, but they did not exist. My goal is to make these scripts run and be able to read and write files on HDFS. Thanks for the help.
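A dependency-free approach that might fit here is to shell out to the hdfs command-line client using only the standard library. This is a minimal sketch, assuming the hdfs binary is on the PATH of the node that runs the script; the function names hdfs_lines and write_hdfs_text and the cat_command/put_command parameters are made up for illustration.

```python
import subprocess


def hdfs_lines(path, cat_command=("hdfs", "dfs", "-cat")):
    """Yield the lines of an HDFS file by streaming `hdfs dfs -cat <path>`.

    Assumption: the `hdfs` client is installed and on the PATH.
    """
    proc = subprocess.Popen([*cat_command, path], stdout=subprocess.PIPE)
    try:
        for raw in proc.stdout:
            yield raw.decode("utf-8")
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    # Reached only when the file was read to the end.
    if returncode != 0:
        raise RuntimeError(f"command failed with exit code {returncode}")


def write_hdfs_text(path, text, put_command=("hdfs", "dfs", "-put", "-f", "-")):
    """Write text to an HDFS file by piping it into `hdfs dfs -put -f - <path>`.

    The "-" source tells the client to read from stdin; "-f" overwrites
    an existing destination file.
    """
    subprocess.run([*put_command, path], input=text.encode("utf-8"), check=True)


# Usage with a hypothetical HDFS path, mirroring the open() loop:
#     for lines in hdfs_lines("/user/me/filename"):
#         [...]
```

The loop body of an existing script can stay unchanged; only the "with open(...)" line is swapped for the generator. Starting a client process per file is slow, so this is better suited to a handful of files than to many small ones.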

Last Visited	‎08-04-2019 02:54 AM

Member Since	‎09-13-2017 04:07 AM
Posts	4

Cloudera Community

Impala Joins without duplicate key columns

Re: read/write hdfs files with standalone python s...

read/write hdfs files with standalone python scrip...