Oracle Data Pump


Oracle Data Pump enables very high-speed data and metadata loading and unloading of Oracle databases. The Data Pump infrastructure is callable via the DBMS_DATAPUMP PL/SQL package. Thus, custom data movement utilities can be built by using Data Pump.
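
Because the API is plain PL/SQL, a custom export can be scripted directly in the database. The following is a minimal sketch of a schema-mode export through DBMS_DATAPUMP; the job name, dump file name, and schema are illustrative assumptions:

DECLARE
  h NUMBER;
BEGIN
  -- Open a schema-mode export job (the job name is an assumption)
  h := DBMS_DATAPUMP.OPEN(operation => 'EXPORT', job_mode => 'SCHEMA', job_name => 'HR_EXPORT_JOB');
  -- Write the dump file through the default DATA_PUMP_DIR directory object
  DBMS_DATAPUMP.ADD_FILE(handle => h, filename => 'hr_export.dmp', directory => 'DATA_PUMP_DIR');
  -- Limit the job to the HR schema
  DBMS_DATAPUMP.METADATA_FILTER(handle => h, name => 'SCHEMA_EXPR', value => 'IN (''HR'')');
  -- Start the job and detach; the job keeps running on the server
  DBMS_DATAPUMP.START_JOB(h);
  DBMS_DATAPUMP.DETACH(h);
END;
/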


Oracle Database provides the following tools:
          Command-line export and import clients called expdp and impdp, respectively
          A Web-based export and import interface that is accessible from Database Control

Data Pump automatically decides the data access methods to use; these can be either direct path or external tables. Data Pump uses direct path load and unload when a table’s structure allows it and when maximum single-stream performance is desired. However, if there are clustered tables, referential integrity constraints, encrypted columns, or a number of other items, Data Pump uses external tables rather than direct path to move the data.

The ability to detach from and reattach to long-running jobs without affecting the job itself enables you to monitor jobs from multiple locations while they are running. All stopped Data Pump jobs can be restarted without loss of data as long as the meta information remains undisturbed. It does not matter whether the job is stopped voluntarily or involuntarily due to a crash.

Oracle Data Pump: Benefits
The EXCLUDE, INCLUDE, and CONTENT parameters are used for fine-grained object and data selection.
You can specify the database version for objects to be moved (using the VERSION parameter) to create a dump file set that is compatible with a previous release of the Oracle database that supports Data Pump.
You can use the PARALLEL parameter to specify the maximum number of threads of active execution servers operating on behalf of the export job.
You can estimate how much space an export job would consume (without actually performing the export) by using the ESTIMATE_ONLY parameter.
Network mode enables you to export from a remote database directly to a dump file set. This can be done by using a database link to the source system.

During import, you can change the target data file names, schemas, and tablespaces.

In addition, you can use the SAMPLE parameter to specify a percentage of data to be sampled and unloaded from the source database when performing a Data Pump export.

You can use the COMPRESSION parameter to indicate whether the metadata should be compressed in the export dump file so that it consumes less disk space. If you compress the metadata, it is automatically uncompressed during import.
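
As a sketch of how several of these benefits appear on the command line (the user names, directory object, dump file name, and database link are assumptions):

expdp system SCHEMAS=hr ESTIMATE_ONLY=YES
expdp hr DIRECTORY=dp_dir DUMPFILE=hr_remote.dmp NETWORK_LINK=source_db VERSION=10.2 COMPRESSION=METADATA_ONLY

The first command only reports how much space a schema export would consume; the second pulls the data over a database link, produces a dump file set compatible with a 10.2 database, and compresses the metadata in the dump file.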

Data Pump Enhancements in Oracle Database
In Oracle Database, new features have been added that enable you to:
          Compress both data and metadata, only data, only metadata, or no data during an export
          Specify additional encryption options in the following areas:
              You can choose to encrypt both data and metadata, only data, only metadata, no data, or only encrypted columns during an export.
              You can specify a specific encryption algorithm to use during an export.
              You can specify the type of security to use for performing encryption and decryption during an export. For example, perhaps the dump file set will be imported into a different or remote database and it must remain secure in transit. Or perhaps the dump file set will be imported onsite using the Oracle Encryption Wallet but it may also need to be imported offsite where the Oracle Encryption Wallet is not available.
          Perform table mode exports and imports using the transportable method; specify how partitioned tables should be handled during import operations
          Overwrite existing dump files during an export operation
          Rename tables during an import operation
          Specify that a data load should proceed even if nondeferred constraint violations are encountered (This is valid only for import operations that use the external tables access method.)
          Specify that XMLType columns are to be exported in uncompressed CLOB format regardless of the XMLType storage format that was defined for them
          During an export, specify a remap function that takes as a source the original value of the designated column and returns a remapped value that will replace the original value in the dump file
          Remap data as it is being imported into a new database
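
As a sketch of how some of these enhancements might be specified on the command line (all schema, table, file, and password values here are assumptions):

expdp hr DIRECTORY=dp_dir DUMPFILE=hr_enc.dmp COMPRESSION=ALL ENCRYPTION=ALL ENCRYPTION_ALGORITHM=AES256 ENCRYPTION_MODE=PASSWORD ENCRYPTION_PASSWORD=secret_pwd REUSE_DUMPFILES=YES
impdp hr DIRECTORY=dp_dir DUMPFILE=hr_enc.dmp ENCRYPTION_PASSWORD=secret_pwd REMAP_TABLE=hr.employees:employees_copy PARTITION_OPTIONS=MERGE DATA_OPTIONS=SKIP_CONSTRAINT_ERRORS

The export compresses and encrypts both data and metadata and overwrites any existing dump file of the same name; the import renames a table, combines partitioned tables into nonpartitioned tables, and continues loading past nondeferred constraint violations.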

Data Pump Export and Import
          Data Pump Export is a utility for unloading data and metadata into a set of operating system files called dump file sets. Data Pump Import is used to load metadata and data stored in an export dump file set into a target system.
          The Data Pump API accesses its files on the server rather than on the client.
          These utilities can also be used to export from a remote database directly to a dump file set, or to load the target database directly from a source database with no intervening files. This is known as network mode. This mode is particularly useful to export data from a read-only source database.
          At the center of every Data Pump operation is the master table (MT), which is a table created in the schema of the user running the Data Pump job. The MT maintains all aspects of the job. The MT is built during a file-based export job and is written to the dump file set as the last step. Conversely, loading the MT into the current user’s schema is the first step of a file-based import operation and is used to sequence the creation of all objects imported.

Note: The MT is the key to Data Pump’s restart capability in the event of a planned or unplanned stopping of the job. The MT is dropped when the Data Pump job finishes normally.

Data Pump Utility: Interfaces and Modes
You can interact with Data Pump Export and Import by using one of the following interfaces:

          Command line interface: Enables you to specify most of the export parameters directly on the command line
          Parameter file interface: Enables you to specify all command line parameters in a parameter file. The only exception is the PARFILE parameter.
          Interactive-command interface: Stops logging to the terminal and displays the export or import prompts, where you can enter various commands. This mode is enabled by pressing [Ctrl] + [C] during an export operation that is started with the command line interface or the parameter file interface. Interactive-command mode is also enabled when you attach to an executing or stopped job.
          Web interface: On the Database Control home page, click the Maintenance tab, and then select one of the following links from the Utilities region: Export to Files, Import from Files, or Import from Database.

Data Pump Export and Import provide different modes for unloading and loading different portions of the database. The mode is specified on the command line by using the appropriate parameter.
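
For example, the mode is selected with one of the mode parameters, as in these sketches (the directory object, dump file names, and object names are assumptions):

expdp system FULL=Y DIRECTORY=dp_dir DUMPFILE=full.dmp
expdp system SCHEMAS=hr,oe DIRECTORY=dp_dir DUMPFILE=hr_oe.dmp
expdp hr TABLES=employees,departments DIRECTORY=dp_dir DUMPFILE=emp_dept.dmp
expdp system TABLESPACES=users DIRECTORY=dp_dir DUMPFILE=users_ts.dmp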

Fine-Grained Object Selection
The Data Pump job can include or exclude virtually any type of object.

The EXCLUDE parameter enables any database object type to be excluded from an export or import operation. The optional name qualifier enables you to have finer selectivity within each object type that is specified, as in these examples:
EXCLUDE=VIEW
EXCLUDE=PACKAGE
EXCLUDE=INDEX:"LIKE 'EMP%'"

The INCLUDE parameter includes only the specified object types and objects in an operation.
Syntax: INCLUDE  = object_type[:"name_expr"]
The CONTENT parameter enables you to specify what the current operation moves: only the metadata, only the data, or both metadata and data.
Syntax:
CONTENT = ALL | METADATA_ONLY | DATA_ONLY
The QUERY parameter operates in a similar manner as the original export utility, with two significant enhancements: It can be qualified with a table name so that it applies to only that table, and it can be used during import as well. Here is an example:
QUERY=hr.employees:"WHERE department_id in (10,20) and salary < 1600 ORDER BY department_id"
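
Because the quotation marks in these parameters are easily mangled by the operating system shell, selection parameters are usually simplest to supply in a parameter file. A sketch of such a file (here assumed to be named hr_meta.par; the directory object and dump file name are also assumptions):

DIRECTORY=dp_dir
DUMPFILE=hr_meta.dmp
SCHEMAS=hr
CONTENT=METADATA_ONLY
EXCLUDE=INDEX:"LIKE 'EMP%'"

It is then invoked with: expdp hr PARFILE=hr_meta.par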

Advanced Feature: Sampling
With the SAMPLE parameter, you can specify a percentage of data to be sampled and unloaded from the source database when performing a Data Pump export.

Syntax: SAMPLE=[[schema_name.]table_name:]sample_percent 
The sample percentage indicates the likelihood that a block of rows will be included. The range of values for sample_percent is 0.000001 up to, but not including, 100.

Note: The SAMPLE parameter is not valid for network exports.
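
For example, the following sketch unloads approximately 30 percent of the blocks of the EMPLOYEES table (the directory object and dump file name are assumptions):

expdp hr DIRECTORY=dp_dir DUMPFILE=hr_sample.dmp SAMPLE=hr.employees:30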

Export Options: Files
Three types of files are managed by Data Pump jobs:
          Dump files for data and metadata that is to be moved
          Log files for messages
          SQL files for the output of a SQLFILE operation

Because Data Pump is server based rather than client based, Data Pump files are accessed relative to Oracle directory paths. Absolute paths are not supported for security reasons.

Data Pump File Locations
          Per-file directory objects may be specified for each dump file, log file, and SQL file. If specified, they are separated from the file name by a colon (:).
          The Data Pump Export and Import clients provide a DIRECTORY parameter, which specifies the name of a directory object. These directory objects describe the location in which the files are accessed.
          You can alternatively define an environment variable, DATA_PUMP_DIR, to specify the directory object name rather than use the DIRECTORY parameter. The Data Pump clients look for this environment variable if no explicit directory object is specified.
          A default directory object is created for every database. This directory object is named DATA_PUMP_DIR. Access to the DATA_PUMP_DIR directory is granted automatically to the EXP_FULL_DATABASE and IMP_FULL_DATABASE roles.
          You do not need to create a directory object manually before using Data Pump Export.
This default directory object, DATA_PUMP_DIR, is created for every database, whether newly created or upgraded by a script, on both UNIX and Windows platforms. It is created in one of the following locations:
          <ORACLE_BASE>/admin/DB_UNIQUE_NAME
          <ORACLE_HOME>/admin/DB_UNIQUE_NAME

The exact directory path specification for DATA_PUMP_DIR varies depending on the value of the ORACLE_BASE and ORACLE_HOME system environment variables and on the existence of the DATA_PUMP_DIR subdirectory. If ORACLE_BASE is defined on the target system, that value is used. Otherwise, the value of ORACLE_HOME is used. If the DATA_PUMP_DIR subdirectory is for some reason not found, the following default path is used:
ORACLE_HOME/rdbms/log

Note: In all cases, you must have the appropriate access privileges to the directory object for the attempted operation. For export, you need write access for all files; for import, you need read access for dump files and write access for log files and SQL files.
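
If the default location is not suitable, a directory object can be created and access granted explicitly, as in this sketch (the path and grantee are assumptions; the statements must be run by a user with the CREATE ANY DIRECTORY privilege):

CREATE OR REPLACE DIRECTORY dp_dir AS '/u01/app/oracle/dpdir';
GRANT READ, WRITE ON DIRECTORY dp_dir TO hr;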

Data Pump File Naming and Size
The DUMPFILE parameter specifies the names and (optionally) directories of disk-based dump files. Multiple file specifications may be provided as a comma-separated list or in separate DUMPFILE parameter specifications. File names may contain the substitution variable %U, which implies that multiple files may be generated. %U is expanded in the resulting file names into a two-character, fixed-width, monotonically increasing integer starting at 01. If no DUMPFILE is specified, expdat.dmp is used by default. Created dump files are autoextensible by default.

If FILESIZE is specified, each file is FILESIZE bytes in size and nonextensible. If more dump space is required and a template with %U has been supplied, a new file is automatically created with FILESIZE bytes; otherwise, the client receives a message to add a new file.
If a template with %U is specified, the number of files that are initially created is equal to the PARALLEL parameter.
Preexisting files that match the resulting file names are not overwritten. Instead, they result in an error and cause the job to be aborted.

Note: If multiple dump file templates are provided, they are used to generate dump files in a circular fashion.

Data Pump Import
Data Pump Import is a utility for loading an export dump file set into a target system. The dump file set comprises one or more disk files that contain table data, database object metadata, and control information. The files are written in a proprietary binary format. During an import operation, Data Pump Import uses these files to locate each database object in the dump file set.
You can interact with Data Pump Import by using a command line, a parameter file, or an interactive-command mode:
          You can use the impdp command and specify parameters directly on the command line.
          You can enter command line parameters in a file (the PARFILE parameter is excluded because parameter files cannot be nested).
          In interactive-command mode, the current job continues running, but logging to the terminal is stopped and the Import prompt is displayed. You can, for example, attach additional jobs to an executing or stopped job.
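
For example, attaching to a running import job and working with it interactively might look like the following sketch (the job name is an assumption):

impdp system ATTACH=hr_imp_job
Import> STATUS
Import> STOP_JOB=IMMEDIATE

STATUS reports the progress of the job, and STOP_JOB=IMMEDIATE stops it so that it can later be restarted with START_JOB.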

Data Pump Import: Transformations
Because object metadata is stored as XML in the dump file set, it is easy to apply transformations when DDL is being formed during import. Data Pump Import supports several transformations:

          REMAP_DATAFILE is useful when moving databases across platforms that have different file-system semantics.
          REMAP_TABLESPACE enables objects to be moved from one tablespace to another.
          REMAP_SCHEMA provides the old FROMUSER /TOUSER capability to change object ownership.

The TRANSFORM parameter enables you to alter the object-creation DDL for specific objects or for all applicable objects being loaded. Note the following possible options:
          SEGMENT_ATTRIBUTES: If the value is specified as Y, segment attributes (physical attributes, storage attributes, tablespaces, and logging) are included.
          STORAGE: If the value is specified as Y, the STORAGE clauses are included.
          OID: Determines whether the object ID (OID) of abstract data types is reused or created as new. If the value is specified as N, the generation of the export OID clause for object types is suppressed. This is useful when you need to duplicate schemas across databases by using export and import, but you cannot guarantee that the object types will have identical OID values in those databases.
          PCTSPACE: Reduces the amount of space that is required for tablespaces by performing a shrink operation on tablespace storage allocation. The value supplied for this transformation must be a number greater than zero. It represents the percentage multiplier that is used to alter extent allocations and the size of data files.
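
Putting several of these transformations together, an import might look like the following sketch (the schema, tablespace, directory object, and dump file names are assumptions):

impdp system DIRECTORY=dp_dir DUMPFILE=expfull.dmp REMAP_SCHEMA=hr:hr_test REMAP_TABLESPACE=example:users TRANSFORM=SEGMENT_ATTRIBUTES:N

Here, objects owned by HR are re-created under HR_TEST, objects stored in the EXAMPLE tablespace are moved to USERS, and segment attributes are omitted from the generated DDL so that target defaults apply.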

Data Pump: Performance Considerations
          You can improve job throughput with the PARALLEL parameter. The parallelism setting is enforced by the master process, which allocates the work to be performed to worker processes that handle the data and metadata processing in an operation. These worker processes operate in parallel. In general, the degree of parallelism should be set to no more than twice the number of CPUs on an instance. To maximize parallelism, you must supply at least one dump file for each degree of parallelism. If there are not enough dump files, performance will not be optimal because multiple threads of execution will try to access the same dump file. The degree of parallelism can be reset at any time during a job.
          The example below shows a full database export. All data and metadata in the database will be exported. Dump files (full101.dmp, full201.dmp, full102.dmp, and so on) will be created in a round-robin fashion in the directories pointed to by the dp_dir1 and dp_dir2 directory objects. For best performance, these should be on separate I/O channels. Each file will be up to 2 GB in size (as necessary). Up to three files will be created initially, and more files will be created if needed. The job and master table have the same name: expfull. The log file will be written to expfull.log in the dp_dir1 directory.
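
A command along the following lines would produce that behavior; the exact invocation is a reconstruction, and the use of the SYSTEM account is an assumption:

expdp system FULL=Y PARALLEL=3 DUMPFILE=dp_dir1:full1%U.dmp,dp_dir2:full2%U.dmp FILESIZE=2G JOB_NAME=expfull LOGFILE=dp_dir1:expfull.log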

Performance Initialization Parameters
          You can try adjusting certain initialization parameters to improve performance, although the results may not be the same on all platforms.
          Additionally, the SHARED_POOL_SIZE and UNDO_TABLESPACE initialization parameters should be generously sized. The exact values will depend upon the size of your database. Turning off DB_BLOCK_CHECKING and DB_BLOCK_CHECKSUM only to improve Data Pump performance is not recommended because this will affect the detection of block corruption.

Data Pump Direct Path: Considerations
Data Pump automatically selects the appropriate access method for each table.

Direct path: Data Pump uses direct path load and unload when a table’s structure allows it and when maximum single-stream performance is desired.

External tables: Data Pump uses external tables for any of the following conditions:

          Tables with fine-grained access control enabled in insert and select modes
          Domain index for a LOB column
          Tables with active triggers defined
          Global index on partitioned tables with a single-partition load
          BFILE or opaque type columns
          Referential integrity constraint
          VARRAY columns with an embedded opaque type

Note: Because both methods support the same external data representation, data that is unloaded with one method can be loaded using the other method.

Using Enterprise Manager to Monitor Data Pump Jobs
You can use the Enterprise Manager graphical user interface (GUI) to monitor all Data Pump jobs, including those created by using the expdp or impdp command line interfaces or by using the DBMS_DATAPUMP package.
You can view the current status of the job and change the status to EXECUTE, STOP, or SUSPEND.
To access the “Export and Import Jobs” page, click the “Monitor Export and Import Jobs” link in the Move Row Data region on the Maintenance page.