By Kathleen Ting, Jarek Jarcec Cecho

Integrating data from multiple sources is essential in the age of big data, but it can be a challenging and time-consuming task. This handy cookbook provides dozens of ready-to-use recipes for using Apache Sqoop, the command-line interface application that optimizes data transfers between relational databases and Hadoop. Sqoop is both powerful and bewildering, but with this cookbook's problem-solution-discussion format, you will quickly learn how to deploy and then apply Sqoop in your environment. The authors provide MySQL, Oracle, and PostgreSQL database examples on GitHub that you can easily adapt for SQL Server, Netezza, Teradata, or other relational systems.


Read or Download Apache Sqoop Cookbook: Unlocking Hadoop for Your Relational Database PDF

Similar storage & retrieval books

Internet Resources for Leisure and Tourism

'Internet Resources for Leisure and Tourism' is designed to enable students, lecturers and practitioners in the leisure and tourism fields to get the very most out of the World Wide Web, helping them track down and fully exploit the most valuable resources available. This book includes tips on how to find and utilise, among other things: the latest economic statistics and demographics, information about government organizations and their programs, the content of universities' websites, up-to-date data on visitor arrivals and departures, information on forthcoming meetings and conferences, and details of contents in periodicals.

Managing Gigabytes: Compressing and Indexing Documents and Images, Second Edition

In this fully updated second edition of the highly acclaimed Managing Gigabytes, authors Witten, Moffat, and Bell continue to provide unparalleled coverage of state-of-the-art techniques for compressing and indexing data. Whatever your field, if you work with large quantities of information, this book is essential reading: an authoritative theoretical resource and a practical guide to meeting the toughest storage and access challenges.

The Google Model: Managing Continuous Innovation in a Rapidly Changing World

This book shows how companies like Google have reinvented common management practice in order to innovate continuously in fast-changing industries. With the ever-increasing pace of change, reinventing existing management principles could become a necessity and prove crucial to the long-term competitiveness of many companies.

Image databases : search and retrieval of digital imagery

The explosive growth of multimedia data transmission has generated a critical need for efficient, high-capacity image databases, as well as powerful search engines to retrieve image data from them. This book brings together contributions by an international all-star team of innovators in the field who share their insights into all key aspects of image database and search engine construction.

Additional resources for Apache Sqoop Cookbook: Unlocking Hadoop for Your Relational Database

Example text

Use of this parameter is not limited to overcoming MySQL’s unsigned types problem. It is also applicable to many use cases where Sqoop’s default type mapping is not a good fit for your environment. Sqoop fetches all metadata from database structures without touching the stored data, so any extra knowledge about the data itself must be provided separately if you want to take advantage of it. For example, if you’re using BLOB or BINARY columns to store textual data in order to avoid encoding issues, you can use the --map-column-java parameter to override the default mapping and import your data as String.
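As a rough illustration of the override described above, the sketch below assembles an import command and only prints it. Sqoop spells the option --map-column-java, and it takes comma-separated column=JavaType pairs. The JDBC URL, the table name articles, and the column name body are hypothetical placeholders chosen for this example; the credentials follow the style of the book's other commands.

```shell
#!/bin/sh
# Dry run: assembles the command and prints it; nothing is executed here.
# The JDBC URL, table name (articles), and column name (body) are
# hypothetical placeholders, not values from the book.
CONNECT="jdbc:mysql://db.example.invalid/sqoop"

# --map-column-java takes comma-separated column=JavaType pairs.
CMD="sqoop import \
  --connect $CONNECT \
  --username sqoop \
  --password sqoop \
  --table articles \
  --map-column-java body=String"

echo "$CMD"
```

To map several columns at once, the pairs would be joined with commas, for example body=String,views=Long (again with hypothetical column names).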

Solution

You can take advantage of the update feature, which issues UPDATE instead of INSERT statements. Update mode is activated with the --update-key parameter, which names a column that can identify a changed row, usually the primary key of the table.

com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --update-key id

Discussion

The --update-key parameter instructs Sqoop to update existing rows rather than insert new ones. It takes a comma-separated list of columns that together uniquely identify a row.
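To make the role of the key columns concrete, here is a small conceptual sketch. It prints the general shape of the UPDATE statement implied by a given --update-key value: key columns land in the WHERE clause and the remaining columns in the SET clause. The update_stmt helper and the column list id,country,city are assumptions made for illustration; this is not Sqoop's own code, and the SQL Sqoop actually generates may differ in detail.

```shell
#!/bin/sh
# Conceptual sketch only, not Sqoop's implementation.
# Usage: update_stmt TABLE ALL_COLUMNS KEY_COLUMNS
# ALL_COLUMNS and KEY_COLUMNS are comma-separated lists.
update_stmt() {
  table=$1 set_part="" where_part=""
  old_ifs=$IFS
  IFS=,
  for col in $2; do
    case ",$3," in
      # Key columns identify the row, so they go into WHERE.
      *",$col,"*) where_part="${where_part:+$where_part AND }$col = ?" ;;
      # Every other column is overwritten, so it goes into SET.
      *)          set_part="${set_part:+$set_part, }$col = ?" ;;
    esac
  done
  IFS=$old_ifs
  echo "UPDATE $table SET $set_part WHERE $where_part"
}

# Single key column, as in --update-key id
update_stmt cities id,country,city id
# prints: UPDATE cities SET country = ?, city = ? WHERE id = ?

# Composite key, as in --update-key country,city
update_stmt cities id,country,city country,city
# prints: UPDATE cities SET id = ? WHERE country = ? AND city = ?
```

An UPDATE of this shape changes nothing when no row matches the WHERE clause, which is why the chosen columns need to identify each row uniquely.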

6. Using Stored Procedures

Problem

Your database already has a workflow for ingesting new data that relies heavily on stored procedures instead of direct INSERT statements.

Solution

You can switch from INSERT statements to stored procedures very easily. Use the --call parameter followed by the name of the stored procedure that should be called.

com/sqoop \
  --username sqoop \
  --password sqoop \
  --call populate_cities

Discussion

Using a stored procedure in Sqoop export is very straightforward. Instead of issuing an INSERT statement, Sqoop calls your stored procedure with the value for each column of the input data as a separate parameter.
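The one-parameter-per-column behavior can be pictured with the following conceptual sketch, which prints a placeholder call for a single input record. The call_stmt helper and the sample record are hypothetical, the comma splitting is naive (no quoting or escaping), and the exact call syntax Sqoop sends depends on the database and JDBC driver.

```shell
#!/bin/sh
# Conceptual sketch only: one procedure parameter per input column.
# Usage: call_stmt PROCEDURE CSV_RECORD
call_stmt() {
  params=""
  old_ifs=$IFS
  IFS=,
  # Add one placeholder for each comma-separated field of the record.
  for field in $2; do
    params="${params:+$params, }?"
  done
  IFS=$old_ifs
  echo "CALL $1($params)"
}

call_stmt populate_cities "1,USA,Palo Alto"
# prints: CALL populate_cities(?, ?, ?)
```

Under this reading, the stored procedure needs to accept as many parameters as there are columns in the exported data, in the same order.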

