Sqoop Interview Questions
Interview Questions for Sqoop
(http://www.bigdatatrunk.com/top-50-sqoop-interview-questions/)
Q1 What is the process to
perform an incremental data load in Sqoop?
Answer: Incremental data load in Sqoop synchronizes the modified or updated data (often referred to as delta data) from the RDBMS to Hadoop. The delta data is brought over using the incremental options of the Sqoop import command, or by loading the data into Hive without overwriting it. The attributes that need to be specified during an incremental load are:
1) Mode (--incremental) – defines how Sqoop determines which rows are new. The mode can have the value append or lastmodified.
2) Check column (--check-column) – specifies the column that should be examined to find the rows to be imported.
3) Last value (--last-value) – denotes the maximum value of the check column from the previous import operation.
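For example, a minimal append-mode incremental import might look like this (the connection string, table, and column names are illustrative):
sqoop import \
  --connect jdbc:mysql://localhost/salesdb \
  --username dbuser --password dbpass \
  --table orders \
  --incremental append \
  --check-column order_id \
  --last-value 100000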
Q2 How Sqoop can be used in a Java program?
Answer: The Sqoop jar should be included in the classpath of the Java code. The necessary parameters are created and passed to Sqoop programmatically, just as they would be on the command line, and then the Sqoop.runTool() method is invoked.
Q3 What is the significance of using the --compression-codec parameter?
Answer: To get the output files of a Sqoop import in a format other than the default .gz (for example .bz2), we use the --compression-codec parameter.
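For example, to produce .bz2 output instead of the default .gz (the connection details below are illustrative):
sqoop import --connect jdbc:mysql://localhost/salesdb --table orders \
  --compress --compression-codec org.apache.hadoop.io.compress.BZip2Codec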
Q4 How are large objects handled in Sqoop?
Answer: Sqoop provides the capability to store large-sized data in a single field based on the type of data. Sqoop supports the ability to store:
1) CLOBs – Character Large Objects
2) BLOBs – Binary Large Objects
Large objects in Sqoop are handled by importing them into a file referred to as a "LobFile", i.e. a Large Object File. The LobFile can store records of huge size; each record in a LobFile is a single large object.
Q5 What is a disadvantage of using the --direct parameter for faster data load by Sqoop?
Answer: The native utilities used by databases to support a faster load do not work for binary data formats like SequenceFile.
Q6 Is it possible to do an incremental import using Sqoop?
Answer: Yes, Sqoop supports two types of incremental imports:
1) append
2) lastmodified
Append should be used when only new rows are inserted at the source, while lastmodified should be used when existing rows may also be updated.
Q7 How can you check all the tables present in a single database
using Sqoop?
Answer: The command to list all tables present in a single database using Sqoop is as follows:
sqoop list-tables --connect jdbc:mysql://localhost/user
Q8 How can you control the number of mappers used by the sqoop
command?
Answer: The parameter --num-mappers is used to control the number of mappers executed by a Sqoop command. We should start with a small number of map tasks and scale up gradually, as choosing a high number of mappers initially may slow down performance on the database side.
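For example (connection details are illustrative):
sqoop import --connect jdbc:mysql://localhost/salesdb --table orders --num-mappers 4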
Q9 What is the standard location or path for Hadoop Sqoop
scripts?
Answer: /usr/bin/sqoop (on most distributions, the sqoop script is installed here).
Q10 How can we import a subset of rows from a table without
using the where clause?
Answer: We can run a filtering query on the database and save the result to a temporary table in the database, then use the Sqoop import command on that temporary table without the --where clause.
Q11 When the source data keeps getting updated frequently, what
is the approach to keep it in sync with the data in HDFS imported by sqoop?
Answer: Sqoop offers two approaches:
a) Use the --incremental parameter with the append option, where the value of the check column is examined and only rows whose values exceed the last imported value are imported as new rows.
b) Use the --incremental parameter with the lastmodified option, where a date column in the source is checked for records that have been updated after the last import.
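A sketch of the lastmodified approach, assuming the source table has an updated_at timestamp column (all names are illustrative); --merge-key tells Sqoop how to reconcile updated rows with previously imported ones:
sqoop import --connect jdbc:mysql://localhost/salesdb --table orders \
  --incremental lastmodified --check-column updated_at \
  --last-value "2017-01-01 00:00:00" --merge-key order_id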
Q12 What is a sqoop metastore?
Answer: It is a tool with which Sqoop hosts a shared metadata repository. Multiple users and/or remote users can define and execute saved jobs (created with sqoop job) in this metastore. Clients must be configured to connect to the metastore in sqoop-site.xml or with the --meta-connect argument.
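For example, a saved job can be created against a shared metastore like this (the metastore host, job name, and import arguments are illustrative):
sqoop job --meta-connect jdbc:hsqldb:hsql://metastore-host:16000/sqoop \
  --create daily_orders_import \
  -- import --connect jdbc:mysql://localhost/salesdb --table orders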
Q13 Can free form SQL queries be used with Sqoop import command?
If yes, then how can they be used?
Answer: Sqoop allows us to use free-form SQL queries with the import command. The import command should be used with the --query option (short form -e) to execute free-form SQL queries. When using --query with the import command, the --target-dir value must be specified.
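A sketch of a free-form query import (table and column names are illustrative); note that the query must contain the literal token $CONDITIONS, which Sqoop replaces with each mapper's range predicate, and --split-by is needed when more than one mapper is used:
sqoop import --connect jdbc:mysql://localhost/salesdb \
  --query 'SELECT o.order_id, o.amount, c.name FROM orders o JOIN customers c ON o.cust_id = c.id WHERE $CONDITIONS' \
  --split-by o.order_id --target-dir /user/hadoop/order_import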
Q14 Name a few import control commands:
Answer: --append
--columns
--where
These commands are most frequently used to import RDBMS data.
Q15 Can free form SQL queries be used with Sqoop import command?
If yes, then how can they be used?
Answer: Sqoop allows us to use free-form SQL queries with the import command. The import command should be used with the --query option (short form -e) to execute free-form SQL queries. When using --query with the import command, the --target-dir value must be specified.
Q16 How can you see the list of stored jobs in sqoop metastore?
Answer: sqoop job --list
Q17 What types of databases can Sqoop support?
Answer: MySQL, Oracle, PostgreSQL, IBM DB2, Netezza, and Teradata. Every database connects through a JDBC driver.
Q18 What is the purpose of sqoop-merge?
Answer: The merge tool combines two datasets, where entries in one dataset overwrite entries of an older dataset, preserving only the newest version of the records between the two datasets.
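A minimal sketch of the merge tool (paths, class, and key column are illustrative); --jar-file and --class-name refer to the record class generated by a previous import or codegen run:
sqoop merge --new-data /user/hadoop/orders_new --onto /user/hadoop/orders_old \
  --target-dir /user/hadoop/orders_merged \
  --jar-file orders.jar --class-name orders --merge-key order_id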
Q19 How can Sqoop handle large objects?
Answer: BLOB and CLOB columns are the common large objects. If an object is less than 16 MB, it is stored inline with the rest of the data. Larger objects are temporarily stored in the _lob subdirectory and handled in a streaming fashion rather than being fully materialized in memory. If the inline LOB limit is set to 0, all large objects are placed in external storage.
Q20 What is the importance of the eval tool?
Answer: It allows the user to run sample SQL queries against the database and preview the results on the console. This helps verify in advance whether the expected data will be imported.
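For example (connection details are illustrative):
sqoop eval --connect jdbc:mysql://localhost/salesdb \
  --query "SELECT * FROM orders LIMIT 10"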
Q21 What is the default extension of the files produced by a Sqoop import using the --compress parameter?
Answer: .gz
Q22 Can we import the data with “Where” condition?
Answer: Yes, the import command's --where option lets us import only the rows that satisfy a particular condition.
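For example, importing only recent rows (names are illustrative):
sqoop import --connect jdbc:mysql://localhost/salesdb --table orders \
  --where "order_date >= '2017-01-01'"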
Q23 What are the limitations of importing RDBMS tables into HCatalog directly?
Answer: There is an option to import RDBMS tables into HCatalog directly by making use of the --hcatalog-database option with --hcatalog-table, but the limitation is that several arguments, such as --as-avrodatafile, --direct, --as-sequencefile, --target-dir, and --export-dir, are not supported.
Q24 What are the most used commands in Sqoop?
Answer: In Sqoop, the import and export commands are used the most, but the following commands are also useful at times: codegen, eval, import-all-tables, job, list-databases, list-tables, merge, metastore.
Q25 What is the usefulness of the options file in Sqoop?
Answer: The options file is used in Sqoop to record command-line values in a file and reuse them in Sqoop commands. For example, the --connect parameter's value and the --username value can be stored in a file and used again and again with different Sqoop commands.
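A sketch, assuming an options file at /home/user/import.txt (in an options file, each option and its value go on separate lines):
import
--connect
jdbc:mysql://localhost/salesdb
--username
dbuser
It can then be reused as:
sqoop --options-file /home/user/import.txt --table orders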
Q26 What are the common delimiters and escape characters in Sqoop?
Answer: The default delimiters are a comma (,) for fields and a newline (\n) for records. The supported escape sequences include \b, \n, \r, \t, \", \', \\ and \0.
Q27 What are the two file formats supported by sqoop for import?
Answer: Delimited text and Sequence Files.
Q28 While loading a table from MySQL into HDFS, if we need to copy tables with the maximum possible speed, what can you do?
Answer: Use the --direct argument in the import command to take the direct-import fast path; --direct can be used only with MySQL and PostgreSQL as of now.
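For example (connection details are illustrative):
sqoop import --connect jdbc:mysql://localhost/salesdb --table orders --direct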
Q29 How can you sync an exported table with HDFS data in which some rows are deleted?
Answer: Truncate the target table and load it again.
Q30 Differentiate between Sqoop and distCP.
Answer: The DistCp utility transfers data between Hadoop clusters, whereas Sqoop transfers data only between Hadoop and an RDBMS.
Q31 How can you import only a subset of rows from a table?
Answer: By using the WHERE clause (the --where option) in the Sqoop import statement, we can import only a subset of rows.
Q32 How do you clear the data in a staging table before loading
it by Sqoop?
Answer: By specifying the --clear-staging-table option, the staging table is cleared before it is loaded. The export can then be retried until the staging data is correct.
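A sketch of an export through a staging table (table names and paths are illustrative):
sqoop export --connect jdbc:mysql://localhost/salesdb --table orders \
  --staging-table orders_stage --clear-staging-table \
  --export-dir /user/hadoop/orders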
Q33 What is Sqoop?
Answer: Sqoop is an open source project that enables data transfer between non-Hadoop sources (such as relational databases) and Hadoop. The name can be remembered as SQL-to-Hadoop -> SQOOP. It allows the user to specify the source and the target location inside Hadoop.
Q34 Is it possible to do an incremental import using Sqoop?
Answer: Yes, Sqoop supports two types of incremental imports:
1) append
2) lastmodified
Append should be used when only new rows are inserted at the source, while lastmodified should be used when existing rows may also be updated.
Q35 How can you export only a subset of columns to a relational
table using sqoop?
Answer: By using the --columns parameter, in which we mention the required column names as a comma-separated list of values.
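For example (names are illustrative):
sqoop export --connect jdbc:mysql://localhost/salesdb --table orders \
  --columns "order_id,amount,order_date" --export-dir /user/hadoop/orders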
Q36 Which database the sqoop metastore runs on?
Answer: Running sqoop-metastore launches a shared HSQLDB
database instance on the current machine.
Q37 How will you update the rows that are already exported?
Answer: The parameter --update-key can be used to update existing rows. It takes a comma-separated list of columns that uniquely identifies a row. All of these columns are used in the WHERE clause of the generated UPDATE query, while all other table columns are used in the SET part of the query.
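For example, updating existing rows keyed by order_id (names are illustrative):
sqoop export --connect jdbc:mysql://localhost/salesdb --table orders \
  --update-key order_id --export-dir /user/hadoop/order_updates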
Q38 You have data in HDFS; if you want to put some more data into the same table, will it append the data or overwrite it?
Answer: It cannot overwrite; one way to add the new data is to copy the new files into HDFS alongside the existing ones (for example, with the import command's --append option).
Q39 Where can the metastore database be hosted?
Answer: The metastore database can be hosted anywhere within or
outside of the Hadoop cluster.
Q40 Which command is used to import data in Sqoop?
Answer: In Sqoop, the import command is used to import RDBMS data into HDFS. Using the import command, we can import a particular table into HDFS.
Q41 What is the role of JDBC driver in a Sqoop set up?
Answer: To connect to different relational databases, Sqoop needs a connector. Almost every DB vendor makes this connector available as a JDBC driver specific to that DB, so Sqoop needs the JDBC driver of each database it needs to interact with.
Q42 How can you import only the updated rows from a table into HDFS using Sqoop, assuming the source has last-update timestamp details for each row?
Answer: By using the lastmodified mode. Rows where the check column holds a timestamp more recent than the timestamp specified with --last-value are imported.
Q43 What is InputSplit in Hadoop?
Answer: When a Hadoop job runs, it splits the input files into chunks and assigns each split to a mapper to process. Each such chunk is called an InputSplit.
Q44 Where did the name Sqoop come from?
Answer: SQL + Hadoop = Sqoop.
Q45 What is the work of export in Hadoop Sqoop?
Answer: It exports data from HDFS to an RDBMS.
Q46 What is the use of the codegen command in Hadoop Sqoop?
Answer: It generates code to interact with database records.
Q47 What is the use of the help command in Hadoop Sqoop?
Answer: It lists the available commands.
Q48 How can you schedule a sqoop job using Oozie?
Answer: Oozie has a built-in Sqoop action in which we can mention the Sqoop commands to be executed.
Q49 What are the two file formats supported by sqoop for import?
Answer: Delimited text and Sequence Files.
Q50 What is a sqoop metastore?
Answer: It is a tool with which Sqoop hosts a shared metadata repository. Multiple users and/or remote users can define and execute saved jobs (created with sqoop job) in this metastore. Clients must be configured to connect to the metastore in sqoop-site.xml or with the --meta-connect argument.
------------------------------------
https://www.dezyre.com/article/sqoop-interview-questions-and-answers-for-2017/274
Top Answers to Sqoop Interview Questions
1. Compare Sqoop and Flume

Criteria        | Sqoop                                              | Flume
Application     | Importing data from RDBMS                          | Moving bulk streaming data into HDFS
Architecture    | Connector – connects to the respective data source | Agent – fetches the right data
Loading of data | Not event driven                                   | Event driven
2. Name a few import control commands. How can Sqoop handle large objects?

Import control commands are used to import RDBMS data:
Append: append data to an existing dataset in HDFS (--append)
Columns: columns to import from the table (--columns <col,col,...>)
Where: WHERE clause to use during import (--where)
The common large objects are BLOB and CLOB. If an object is less than 16 MB, it is stored inline with the rest of the data. Bigger objects are temporarily stored in a subdirectory named _lob, and the data is then materialized in memory for processing. If we set the inline LOB limit to zero (0), large objects are stored in external storage.
3. How can we import data from a particular row or column? What destination types are allowed in the Sqoop import command?

Sqoop allows exporting and importing data from a table based on a WHERE clause. The syntax is:
--columns <col1,col2,...>
--where <condition>
--query <SQL statement>
Examples:
sqoop import --connect jdbc:mysql://db.one.com/corp --table INTELLIPAAT_EMP --where "start_date > '2016-07-20'"
sqoop eval --connect jdbc:mysql://db.test.com/corp --query "SELECT * FROM intellipaat_emp LIMIT 20"
sqoop import --connect jdbc:mysql://localhost/database --username root --password aaaaa --columns "name,emp_id,jobtitle"
Sqoop supports importing data into the following services: HDFS, Hive, HBase, HCatalog, and Accumulo.
4. What is the role of the JDBC driver in a Sqoop setup? Is the JDBC driver enough to connect Sqoop to the database?

Sqoop needs a connector to connect to the different relational databases. Almost all database vendors make a JDBC connector available specific to their database, and Sqoop needs the JDBC driver of the database for interaction.
No, Sqoop needs both JDBC and a connector to connect to a database.
5. Using the Sqoop command, how can we control the number of mappers?

We can control the number of mappers with the --num-mappers parameter in the Sqoop command. The --num-mappers argument controls the number of map tasks, which is the degree of parallelism used. Start with a small number of map tasks and scale up gradually, because starting with a high number of mappers may degrade performance on the database side.
Syntax: -m, --num-mappers <n>
6. How will you update the rows that are already exported? Write the Sqoop command to show all the databases in a MySQL server.

By using the --update-key parameter, we can update existing rows. It takes a comma-separated list of columns that uniquely identifies a row. All of these columns are used in the WHERE clause of the generated UPDATE query, while all other table columns are used in the SET part of the query.
The command below shows all the databases in a MySQL server:
$ sqoop list-databases --connect jdbc:mysql://database.test.com/
7. Define the Sqoop metastore. What is the purpose of sqoop-merge?

The Sqoop metastore is a tool for hosting a shared metadata repository. Multiple users and remote users can define and execute saved jobs defined in the metastore. End users are configured to connect to the metastore in sqoop-site.xml or with the --meta-connect argument.
The purpose of sqoop-merge is: this tool combines two datasets, where entries in one dataset overwrite entries of an older dataset, preserving only the newest version of the records between the two datasets.
8. Explain the saved job process in Sqoop.

Sqoop allows us to define saved jobs, which make this process simple. A saved job records the configuration information required to execute a Sqoop command at a later time. The sqoop-job tool describes how to create and work with saved jobs. Job descriptions are saved to a private repository stored in $HOME/.sqoop/.
We can configure Sqoop to use a shared metastore instead, which makes saved jobs available to multiple users across a shared cluster. Starting the metastore is covered in the section on the sqoop-metastore tool.
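A minimal sketch of the saved-job workflow (the job name and import arguments are illustrative):
sqoop job --create nightly_import -- import --connect jdbc:mysql://localhost/salesdb --table orders
sqoop job --show nightly_import
sqoop job --exec nightly_import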
9. How did the name Sqoop come about? What type of tool is Sqoop, and what is its main use?

The name comes from SQL + HADOOP = SQOOP, and Sqoop is a data transfer tool.
The main use of Sqoop is to import and export large amounts of data from an RDBMS to HDFS and vice versa.
10. How do you enter the MySQL prompt, and what do the command's parameters indicate?

The command for entering the MySQL prompt is "mysql -u root -p":
-u indicates the user
root indicates the username
-p prompts for the password
11. I am getting a connection failure exception when connecting to MySQL through Sqoop. What is the root cause, and what is the fix for this error scenario?

This happens when there is a lack of permissions to access the MySQL database over the network. We can try the command below to confirm the connection to the MySQL database from a Sqoop client machine:
$ mysql --host=<MySqlnode> --database=test --user=<username> --password=<password>
We can grant the permissions with the commands below:
mysql> GRANT ALL PRIVILEGES ON *.* TO '%'@'localhost';
mysql> GRANT ALL PRIVILEGES ON *.* TO ''@'localhost';
12. I am getting java.lang.IllegalArgumentException when importing tables from an Oracle database. What might be the root cause, and what is the fix for this error scenario?

Sqoop commands are case-sensitive with respect to table names and user names. Specifying these two values in UPPER case will resolve the issue. If the source table was created under a different user's namespace, the table name should be given as USERNAME.TABLENAME, as shown below:
sqoop import
--connect jdbc:oracle:thin:@intellipaat.testing.com/INTELLIPAAT
--username SQOOP
--password sqoop
--table COMPANY.EMPLOYEES
13. How can you list all the columns of a table using Apache Sqoop?

There is no direct command such as sqoop-list-columns to list all the columns of a table in Apache Sqoop, so we instead query the database's catalog for the columns of the particular table and write the result to a file. The syntax is:
sqoop import -m 1 --connect 'jdbc:sqlserver://servername;database=databasename;username=DeZyre;password=mypassword' --query "SELECT column_name, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name='mytableofinterest' AND \$CONDITIONS" --target-dir 'mytableofinterest_column_name'
14. How do you create a table in MySQL, and how do you insert values into the table?

To create a table in MySQL, use the command below:
mysql> create table tablename (col1 datatype, col2 datatype, ...);
Example:
mysql> create table INTELLIPAAT (emp_id int, emp_name varchar(30), emp_sal int);
To insert values into the table:
mysql> insert into tablename values (value1, value2, value3, ...);
Example:
mysql> insert into INTELLIPAAT values (1234, 'aaa', 20000);
mysql> insert into INTELLIPAAT values (1235, 'bbb', 10000);
mysql> insert into INTELLIPAAT values (1236, 'ccc', 15000);
15. What are the basic commands in Hadoop Sqoop and their uses?

The basic commands of Hadoop Sqoop are codegen, create-hive-table, eval, export, help, import, import-all-tables, list-databases, list-tables, and version. Their uses are:
codegen – generates code to interact with database records
create-hive-table – imports a table definition into Hive
eval – evaluates an SQL statement and displays the results
export – exports an HDFS directory into a database table
help – lists the available commands
import – imports a table from a database to HDFS
import-all-tables – imports all the tables from a database to HDFS
list-databases – lists the available databases on a server
list-tables – lists the tables in a database
version – displays the version information
16. Is Sqoop the same as distcp in Hadoop?

No. Although the distcp command and the Sqoop import command both submit parallel map-only jobs, the two commands do different things: DistCp copies any type of file between HDFS clusters (or from the local filesystem to HDFS), while Sqoop transfers data records between an RDBMS and the Hadoop ecosystem.
17. For each Sqoop copy into HDFS, how many MapReduce jobs and tasks will be submitted?

One map-only job is submitted per Sqoop copy into HDFS, with four map tasks by default, and no reduce tasks are scheduled.
18. How can Sqoop be used in Java programs?

The Sqoop jar is included in the classpath of the Java code. The required parameters are created programmatically, just as they would be for the CLI (command line interface), and then the Sqoop.runTool() method is invoked.
19. I have around 500 tables in a database. I want to import all the tables from the database except the tables named Table498, Table323, and Table199. How can we do this without having to import the tables one by one?

This can be accomplished using the import-all-tables command in Sqoop and specifying the --exclude-tables option with it, as follows:
sqoop import-all-tables --connect <jdbc-uri> --username <user> --password <pass> --exclude-tables Table498,Table323,Table199
20. Explain the significance of using the --split-by clause in Apache Sqoop.

--split-by is a clause used to specify the column of the table that is used to generate splits for data imports into the Hadoop cluster. It helps improve performance through greater parallelism. Ideally, the specified column has an even distribution of values, so that the generated splits carry similar amounts of data.
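For example (names are illustrative):
sqoop import --connect jdbc:mysql://localhost/salesdb --table orders \
  --split-by order_id --num-mappers 8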