Preparing the Database

Follow these steps to prepare the Impala database:

  1. Open a browser and enter the URL of the Hue interface.

  2. Create a staging directory.
    1. Open the file browser in Hue.
    2. Select a directory in the file browser, e.g. /user/impala/uploads.
    3. Click the New button and then select Directory.
    4. Enter the name of the new directory, e.g. staging, and then click Create.
    5. Select the directory in the file browser.
    6. Click the Actions button and then select Change Permissions.
    7. Update the permissions to make the new directory available to the UNIX user or users that start the ECs.
       
  3. Create a database and a table to be used by Data Hub.
    1. Select Impala from Query Editors.
    2. Enter a CREATE DATABASE statement in the editor and then click the Execute button.

      Example - Creating a database

      CREATE DATABASE test;
    3. Click the Refresh button.
    4. Enter a CREATE TABLE statement in the editor and then click the Execute button.

      The CREATE TABLE statement may contain the following data types:

      1. STRING

      2. INT

      3. FLOAT

      4. DOUBLE

      5. BOOLEAN
      6. BIGINT

      7. REAL

      8. SMALLINT

      9. TINYINT

      10. TIMESTAMP
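
      As an illustration, a table definition that uses each of these types might look like the sketch below. The table name sample_types and all column names are hypothetical and chosen only for this example; the STORED AS PARQUET and TBLPROPERTIES clauses mirror the table example later in this section. Note that in Impala, REAL is an alias for DOUBLE.

      ```sql
      -- Hypothetical table for illustration only; names are not part of Data Hub.
      CREATE TABLE IF NOT EXISTS test.sample_types (
        subscriber_id STRING,
        call_count    INT,
        signal_level  FLOAT,
        charge_amount DOUBLE,
        is_roaming    BOOLEAN,
        bytes_used    BIGINT,
        tax_amount    REAL,      -- stored as DOUBLE in Impala
        cell_code     SMALLINT,
        retry_count   TINYINT,
        event_time    TIMESTAMP
      )
      STORED AS PARQUET
      TBLPROPERTIES ('transactional'='false');
      ```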


        Note!

        A PARTITIONED BY clause is optional. However, it is highly recommended because it improves the performance of queries that restrict results by the partition column. A partition column of INT type also makes it possible to use the Data Hub task agent to automatically remove old data from the table. For further information about the Data Hub task agent, see Data Hub Task Agent.

        Data Hub supports only one partition column.

        A STORED AS PARQUET clause is required. If you omit this clause, Data Hub will fail to update the table.

        Example - Creating a table in Impala

        CREATE TABLE IF NOT EXISTS mytable (id STRING, start BIGINT, stop BIGINT) 
        PARTITIONED BY (yearmonthday INT) 
        STORED AS PARQUET
        TBLPROPERTIES ('transactional'='false');
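
        Once the table exists, statements along the following lines can be used to check the definition and to see how the partition column is used. This is a sketch under assumptions: the values 20240115 and 20240101 assume that yearmonthday holds dates encoded as YYYYMMDD integers, which the column name suggests but this section does not state. The last statement only illustrates how old partitions could be removed by hand; it is not a description of what the Data Hub task agent runs.

        ```sql
        -- Confirm the file format (Parquet) and the partition column.
        DESCRIBE FORMATTED mytable;

        -- List the partitions that currently exist.
        SHOW PARTITIONS mytable;

        -- Filtering on the partition column lets Impala read only the
        -- matching partition instead of scanning the whole table.
        SELECT COUNT(*) FROM mytable WHERE yearmonthday = 20240115;

        -- Manually drop every partition older than an assumed cutoff date.
        -- Comparison operators in the partition spec need Impala 2.8 or later.
        ALTER TABLE mytable DROP IF EXISTS PARTITION (yearmonthday < 20240101);
        ```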

        When you run the Data Hub agent, temporary tables are created in the same schema. These tables are visible in Hue but hidden in the system.
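
        To see which tables currently exist in the schema, including any temporary tables present at that moment, you can list them from the Impala query editor. The database name test is taken from the earlier example; the names of the temporary tables are not specified in this section, so none are assumed here.

        ```sql
        -- Lists every table in the database, temporary ones included.
        SHOW TABLES IN test;
        ```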