The Integrator sandbox software requires two components to function: the Windows-based client to manipulate your data and the server (running inside the sandbox) to execute your query.
The Integrator virtual machine comes with the following applications pre-installed:
Integrator Hadoop Agent
Apache Hadoop with Spark
The Windows Client is installed separately.
Please note that this is not an ideal configuration for production use and is intended for testing purposes only.
Oracle VirtualBox: Used to run the Integrator sandbox.
Integrator Sandbox Image for VirtualBox: The Integrator sandbox environment.
Integrator Windows Client: Used to interact with the Integrator server.
Launch the installer
Click [Run] in the security window. (Your security window may look different from the one below, but in general you will receive a warning that the software is unsigned, along with an option to install it anyway.) Because this is a beta and development is proceeding rapidly, the client will remain unsigned until release.
Install VirtualBox from the link provided.
Launch Oracle VirtualBox
Select [System] on the left panel
On the [Motherboard] tab, set the base memory to 8192 MB or higher. (If your computer doesn’t have enough RAM for you to set the base memory to 8192 MB, set it as high as possible. 4096 MB will probably work; below that, things get iffy.)
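The same memory setting can also be applied from a command line with VirtualBox's VBoxManage tool. The sketch below is only an illustration: the VM name "Integrator Sandbox" is a hypothetical placeholder (this guide does not state the imported VM's name), and it assumes VBoxManage is on your PATH and the VM is powered off.

```shell
# Hypothetical VM name; list the real names with: VBoxManage list vms
VM_NAME="${VM_NAME:-Integrator Sandbox}"
# Base memory in MB; 8192 is the value recommended in this guide.
MEMORY_MB="${MEMORY_MB:-8192}"

# Set the base memory of a VM. The VM must be powered off.
set_base_memory() {
    VBoxManage modifyvm "$1" --memory "$2"
}

# Example call (uncomment once VM_NAME matches your imported VM):
# set_base_memory "$VM_NAME" "$MEMORY_MB"
```

If your machine cannot spare 8192 MB, pass a lower value such as 4096 as the second argument.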
On Virtual Machine Console
Note: Click anywhere on the console to start using it. Press the Right-Ctrl key to get mouse control back.
It may take a few minutes after the server has started for all applications to initialize. A log file is updated as the services come up during startup. For debugging purposes, you can check the contents of the log file /tmp/sandbox.log
The last line of the log file should contain "********* Spark Submit [Found]"
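The log check described above can be sketched as a small shell function to run inside the sandbox. The function name sandbox_ready is made up for this example; the log path and the marker text are the ones given in this guide.

```shell
# Return 0 when the last line of the log contains the readiness marker.
# Defaults to the log path given in this guide.
sandbox_ready() {
    log_file="${1:-/tmp/sandbox.log}"
    # Fail early if the log does not exist yet or is not readable.
    [ -r "$log_file" ] || return 1
    # -F treats the marker as a literal string, so [ and ] are not a pattern.
    tail -n 1 "$log_file" | grep -qF 'Spark Submit [Found]'
}

# Example:
# if sandbox_ready; then echo "services are up"; else echo "still starting"; fi
```

Because only the last line is inspected, the check fails while the services are still writing earlier startup messages.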
User Name: eltmaestro
Click [Login] to start using the Integrator Client
You can use an SSH client such as PuTTY to connect to the sandbox by connecting to localhost on port 2222 with the credentials above
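If you prefer a command-line OpenSSH client to PuTTY, the connection might look like the sketch below. It assumes the eltmaestro user name shown above is also the SSH login; the function name sandbox_ssh is made up for this example.

```shell
# Assumed defaults: user name and forwarded port as given in this guide.
SANDBOX_USER="${SANDBOX_USER:-eltmaestro}"
SANDBOX_PORT="${SANDBOX_PORT:-2222}"

# Open an SSH session to the sandbox through the forwarded port.
# Any extra arguments are passed to ssh, e.g. a remote command to run.
sandbox_ssh() {
    ssh -p "$SANDBOX_PORT" "$SANDBOX_USER@localhost" "$@"
}

# Examples:
# sandbox_ssh                              # interactive login
# sandbox_ssh tail -n 1 /tmp/sandbox.log   # run one command and return
```

ssh prompts for the password interactively, so nothing sensitive needs to be stored in the script.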
|http://localhost:50090/||Secondary Name Node|
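To check from the host whether a web interface such as the one listed above is responding, a probe along these lines may help. It is a sketch that assumes curl is installed; the function name ui_status is made up for this example.

```shell
# Print the HTTP status code returned by a URL.
# curl prints 000 when no HTTP response was received.
ui_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Example:
# ui_status http://localhost:50090/
```

A status of 200 suggests the service is up; if there is no response, wait a few minutes and try again, since the services take a while to come online.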
Log in over SSH as user [root]
Run the following commands
su - hadoop -c "/home/hadoop/scripts/stop_services.sh"
Integrator and Hadoop-related services are set to start automatically when the sandbox virtual machine reboots. It may take a few minutes for all services to come online after the server has been started.
This database comes pre-installed on the sandbox PostgreSQL database system and is sourced from http://www.postgresqltutorial.com/postgresql-sample-database/
The ER model for this database is available at http://www.postgresqltutorial.com/wp-content/uploads/2018/03/printable-postgresql-sample-database-diagram.pdf
A pre-designed workflow called LOAD_ACTORS copies the table [actors] from PostgreSQL into Hadoop.
A pre-configured onstage JDBC connection, DVD_RENTAL_DATABASE_LOCAL, is available for loading data into Hadoop.