Preparing the Database(4.2)
Follow these steps to prepare the Impala database:
- Open a browser and enter the URL of the Hue interface.
- Create a staging directory.
- Open the file browser in Hue.
- Select a directory in the file browser, e.g. /user/impala/uploads
- Click the New button and then select Directory.
- Enter the name of the new directory, e.g. staging, and then click Create.
- Select the directory in the file browser.
- Click the Actions button and then select Change Permissions.
- Update the permissions to make the new directory available to the UNIX user or users that start the ECs.
- Create a database and a table to be used by Data Hub.
- Select Impala from Query Editors.
- Enter a CREATE DATABASE statement in the editor and then click the Execute button.
Example - Creating a database
CREATE DATABASE test;
- Click the Refresh button.
- Enter a CREATE TABLE statement in the editor and then click the Execute button.
The CREATE TABLE statement may contain the following data types:
- STRING
- INT
- FLOAT
- DOUBLE
- BOOLEAN
- BIGINT
- REAL
- SMALLINT
- TINYINT
- TIMESTAMP
Note!
A PARTITIONED BY clause is optional. However, it is highly recommended, since it improves the performance of queries that restrict results by the partition column. A partition column of INT type also makes it possible to use the Data Hub task agent to automatically remove old data from the table. For further information about the Data Hub task agent, see Data Hub Task Agent(4.2). Data Hub can handle only one partition column.
A STORED AS PARQUET clause is required. If you omit this clause, Data Hub will fail to update the table.
Example - Creating a table in Impala
CREATE TABLE IF NOT EXISTS mytable (id STRING, start BIGINT, stop BIGINT) PARTITIONED BY (yearmonthday INT) STORED AS PARQUET TBLPROPERTIES ('transactional'='false');
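As a rough illustration of why the partition column matters, the statements below inspect, query, and prune the example table mytable by its yearmonthday partition. This is only a sketch: the value 20240115 is an assumed example of a date written as an INT in YYYYMMDD form, and the manual DROP PARTITION shows the kind of cleanup that partitioning allows, not how the Data Hub task agent is implemented.

```sql
-- List the partitions that currently exist in the example table.
SHOW PARTITIONS mytable;

-- Restricting on the partition column lets Impala read only the matching partition.
-- 20240115 is an assumed example value (a date written as YYYYMMDD).
SELECT COUNT(*)
FROM mytable
WHERE yearmonthday = 20240115;

-- Manually remove one old partition and the data stored in it.
ALTER TABLE mytable DROP IF EXISTS PARTITION (yearmonthday = 20240115);
```

Run the statements in the Impala query editor in Hue after selecting the database that contains the table, for example with USE test;.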
When you run the Data Hub agent, temporary tables are created in the same schema. These tables are visible in Hue but hidden in the system.