Thursday, February 10, 2011

SQL *Loader

SQL*Loader is a very flexible utility that allows you to load data from a flat file into one or more database tables. That's the sole reason for SQL*Loader's existence.
The basis for almost everything you do with SQL*Loader is a file known as the control file. The SQL*Loader control file is a text file into which you place a description of the data to be loaded. You also use the control file to tell SQL*Loader which database tables and columns should receive the data that you are loading.
Do not confuse SQL*Loader control files with database control files. Database control files are binary files containing information about the physical structure of your database. They have nothing to do with SQL*Loader. SQL*Loader control files, on the other hand, are text files containing commands that control SQL*Loader's operation.
Once you have a data file to load and a control file describing the data contained in that data file, you are ready to begin the load process. You do this by invoking the SQL*Loader executable and pointing it to the control file that you have written. SQL*Loader reads the control file to get a description of the data to be loaded. Then it reads the input file and loads the input data into the database.
SQL*Loader is a very flexible utility, and this short description doesn't begin to do it justice. The rest of this chapter provides a more detailed description of the SQL*Loader environment and a summary of SQL*Loader's many capabilities.
The SQL*Loader Environment
When we speak of the SQL*Loader environment, we are referring to the database, the SQL*Loader executable, and all the different files that you need to be concerned with when using SQL*Loader.

The SQL*Loader Control File

The SQL*Loader control file is the key to any load process. The control file provides the following information to SQL*Loader:
  • The name and location of the input data file
  • The format of the records in the input data file
  • The name of the table or tables to be loaded
  • The correspondence between the fields in the input record and the columns in the database tables being loaded
  • Selection criteria defining which records from the input file contain data to be inserted into the destination database tables.
  • The names and locations of the bad file and the discard file
Some of the items shown in this list may also be passed to SQL*Loader as command-line parameters. The name and location of the input file, for example, may be passed on the command line instead of in the control file. The same goes for the names and locations of the bad files and the discard files.
It's also possible for the control file to contain the actual data to be loaded. This is sometimes done when small amounts of data need to be distributed to many sites, because it reduces (to just one file) the number of files that need to be passed around. If the data to be loaded is contained in the control file, then there is no need for a separate data file.

The Log File

The log file is a record of SQL*Loader's activities during a load session. It contains information such as the following:
  • The names of the control file, log file, bad file, discard file, and data file
  • The values of several command-line parameters
  • A detailed breakdown of the fields and datatypes in the data file that was loaded
  • Error messages for records that cause errors
  • Messages indicating when records have been discarded
  • A summary of the load that includes the number of logical records read from the data file, the number of rows rejected because of errors, the number of rows discarded because of selection criteria, and the elapsed time of the load
Always review the log file after a load to be sure that no errors occurred, or at least that no unexpected errors occurred. This type of information is written to the log file, but is not displayed on the terminal screen.

The Bad File and the Discard File

Whenever you insert data into a database, you run the risk of that insert failing because of some type of error. Integrity constraint violations undoubtedly represent the most common type of error. However, other problems, such as the lack of free space in a tablespace, can also cause insert operations to fail. Whenever SQL*Loader encounters a database error while trying to load a record, it writes that record to a file known as the bad file.
Discard files, on the other hand, are used to hold records that do not meet selection criteria specified in the SQL*Loader control file. By default, SQL*Loader will attempt to load all the records contained in the input file. You have the option, though, in your control file, of specifying selection criteria that a record must meet before it is loaded. Records that do not meet the specified criteria are not loaded, and are instead written to a file known as the discard file.
Discard files are optional. You will only get a discard file if you've specified a discard file name, and if at least one record is actually discarded during the load. Bad files are not optional. The only way to avoid having a bad file generated is to run a load that results in no errors. If even one error occurs, SQL*Loader will create a bad file and write the offending input record (or records) to that file.
The format of your bad files and discard files will exactly match the format of your input files. That's because SQL*Loader writes the exact records that cause errors, or that are discarded, to those files. If you are running a load with multiple input files, you will get a distinct set of bad files and discard files for each input file


 

No comments:

Post a Comment