Sqoop:
Sqoop is an open source framework provided by Apache. It is a command-line application for efficiently transferring bulk data between Apache Hadoop and external datastores such as relational databases and enterprise data warehouses. It is used to import data from relational databases such as MySQL and Oracle into HDFS, and to export data from HDFS back to relational databases.
Sqoop Import:
The import tool imports individual tables from an RDBMS into HDFS. Each row in a table is treated as a record in HDFS.
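To make the import step concrete, here is a minimal sketch in Python that assembles a basic sqoop import command line without running it. The flags (--connect, --username, --password-file, --table, --target-dir) are standard Sqoop 1 options. The host, database, user, table, and paths are hypothetical placeholders, and the helper function is only an illustration.

```python
import shlex


def build_import_command(jdbc_url, username, password_file, table, target_dir):
    """Return the argument list for importing one table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,   # file holding the password
        "--table", table,                   # source table; each row becomes a record
        "--target-dir", target_dir,         # HDFS directory that receives the records
    ]


# All values below are hypothetical and only show the shape of the command.
import_cmd = build_import_command(
    jdbc_url="jdbc:mysql://db.example.com/sales",
    username="etl_user",
    password_file="/user/etl/.db_password",
    table="customers",
    target_dir="/data/sales/customers",
)

# Print a shell-safe version that could be pasted into a terminal.
print(shlex.join(import_cmd))
```

Building the command as a list rather than a single string keeps each value as one argument, so values containing spaces are quoted correctly when printed.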
Sqoop Export:
The export tool exports a set of files from HDFS back to an RDBMS. The files given as input to Sqoop contain records, and each record becomes a row in the target table.
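The export direction can be sketched the same way. The Python snippet below assembles a sqoop export command line without running it. The flags (--connect, --username, --password-file, --table, --export-dir, --input-fields-terminated-by) are standard Sqoop 1 options. The connection details, table, and directory are hypothetical, and the sketch assumes the target table already exists in the database.

```python
import shlex


def build_export_command(jdbc_url, username, password_file, table,
                         export_dir, field_delimiter=","):
    """Return the argument list for exporting HDFS files into one table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,                    # existing table that receives the rows
        "--export-dir", export_dir,          # HDFS directory holding the input files
        "--input-fields-terminated-by", field_delimiter,  # how fields are separated
    ]


# All values below are hypothetical placeholders.
export_cmd = build_export_command(
    jdbc_url="jdbc:mysql://db.example.com/sales",
    username="etl_user",
    password_file="/user/etl/.db_password",
    table="customer_summary",
    export_dir="/data/sales/customer_summary",
)

print(shlex.join(export_cmd))
```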
Why is Sqoop used?
For Hadoop developers, the work starts after data is loaded into HDFS. To get there, data residing in relational database management systems needs to be transferred to HDFS, and the results may later need to be transferred back. Developers can always write custom scripts to move data in and out of Hadoop, but Apache Sqoop provides a ready-made alternative.
Sqoop uses the MapReduce framework to import and export data, which provides parallelism as well as fault tolerance. Sqoop makes developers' lives easier by providing a command-line interface. Developers only need to supply basic information such as the source, the destination, and the database authentication details in the sqoop command.
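The parallelism can be illustrated with a small sketch. During an import, Sqoop looks up the minimum and maximum values of a split column (chosen with --split-by) and divides that range among the map tasks (set with --num-mappers). The Python below is a simplified illustration of that idea for an integer column, not Sqoop's actual implementation. The table and column names are hypothetical.

```python
def split_ranges(min_value, max_value, num_mappers):
    """Divide [min_value, max_value] into half-open ranges, one per mapper.

    Each returned (lo, hi) pair means lo <= value < hi. Empty ranges are
    dropped, so fewer ranges than mappers may come back for tiny tables.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_value < min_value:
        raise ValueError("max_value must not be smaller than min_value")

    span = max_value - min_value
    # Evenly spaced lower bounds, plus one final bound just past the maximum.
    bounds = [min_value + span * i // num_mappers for i in range(num_mappers)]
    bounds.append(max_value + 1)

    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if lo < hi]


def mapper_queries(table, column, ranges):
    """Render the query each mapper would run for its slice."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} < {hi}"
        for lo, hi in ranges
    ]


# Hypothetical table 'customers' with ids from 0 to 1000, read by 4 mappers.
ranges = split_ranges(0, 1000, 4)
for query in mapper_queries("customers", "id", ranges):
    print(query)
```

With ids from 0 to 1000 and four mappers, the ranges are (0, 250), (250, 500), (500, 750), and (750, 1001), so every row is read exactly once. If the values are unevenly distributed, some mappers will receive far more rows than others, which is why the choice of split column matters.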
Sqoop Connectors:
Most relational database management systems are designed with the SQL standard in mind. However, each DBMS differs to some extent in its SQL dialect, and these differences pose challenges when transferring data across systems. Sqoop connectors are the components that help overcome these challenges: data transfer between Sqoop and an external storage system is made possible by a connector. Sqoop has connectors for a range of popular relational databases, including MySQL, PostgreSQL, Oracle, SQL Server, and DB2. Each of these connectors knows how to interact with its associated DBMS. There is also a generic JDBC connector for connecting to any database that supports Java's JDBC API. In addition, Sqoop provides optimized MySQL and PostgreSQL connectors that use database-specific APIs to perform bulk transfers efficiently.
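On the command line, the connector choice shows up mainly in the connection options. The Python sketch below builds those options for two cases: a MySQL import that requests the optimized path with --direct, and a database reached through the generic JDBC connector by naming a driver class with --driver. Both flags are standard Sqoop 1 options. The URLs, the exampledb scheme, and the driver class name are hypothetical. The sketch also rejects combining the two flags, on the assumption that forcing a generic driver and requesting a database-specific direct path are mutually exclusive choices.

```python
import shlex


def connection_options(jdbc_url, driver_class=None, direct=False):
    """Return the connection-related arguments for a sqoop command."""
    if driver_class and direct:
        # Assumption of this sketch: a forced generic driver and the
        # database-specific direct path cannot be requested together.
        raise ValueError("choose either driver_class or direct, not both")

    options = ["--connect", jdbc_url]
    if driver_class:
        options += ["--driver", driver_class]   # generic JDBC connector
    if direct:
        options.append("--direct")              # optimized, database-specific path
    return options


# MySQL with the optimized connector (hypothetical host and database).
mysql_opts = connection_options("jdbc:mysql://db.example.com/sales", direct=True)

# Some other JDBC-capable database (hypothetical URL scheme and driver class).
generic_opts = connection_options(
    "jdbc:exampledb://db.example.com/sales",
    driver_class="com.example.jdbc.Driver",
)

print(shlex.join(["sqoop", "import", *mysql_opts, "--table", "customers"]))
print(shlex.join(["sqoop", "import", *generic_opts, "--table", "customers"]))
```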