Oracle Data Pump enables very high-speed data and metadata loading and
unloading of Oracle databases. The Data Pump infrastructure is callable via the
DBMS_DATAPUMP PL/SQL package. Thus, custom data movement utilities can be built
by using Data Pump.
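For example, a custom export driver written against DBMS_DATAPUMP could be as simple as the following minimal sketch (the job name, dump file name, and the DP_DIR directory object are assumptions for illustration, not part of the original text):

DECLARE
  h  NUMBER;
  js VARCHAR2(30);
BEGIN
  -- Open a schema-mode export job (job name is illustrative)
  h := DBMS_DATAPUMP.OPEN(operation => 'EXPORT',
                          job_mode  => 'SCHEMA',
                          job_name  => 'HR_EXPORT');
  -- Write the dump file through an existing directory object (DP_DIR assumed to exist)
  DBMS_DATAPUMP.ADD_FILE(handle => h, filename => 'hr_export.dmp', directory => 'DP_DIR');
  -- Limit the job to the HR schema
  DBMS_DATAPUMP.METADATA_FILTER(handle => h, name => 'SCHEMA_EXPR', value => 'IN (''HR'')');
  -- Start the job and wait for it to finish
  DBMS_DATAPUMP.START_JOB(h);
  DBMS_DATAPUMP.WAIT_FOR_JOB(handle => h, job_state => js);
END;
/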
Oracle Database provides the following tools:
• Command-line export and import clients called expdp and impdp, respectively
• A Web-based export and import interface that is accessible from Database Control
Data Pump automatically decides the data access methods to use; these can
be either direct path or external tables. Data Pump uses direct path load and
unload when a table’s structure allows it and when maximum single-stream
performance is desired. However, if there are clustered tables, referential
integrity constraints, encrypted columns, or a number of other items, Data Pump
uses external tables rather than direct path to move the data.
The ability to detach from and reattach to long-running jobs without
affecting the job itself enables you to monitor jobs from multiple locations
while they are running. All stopped Data Pump jobs can be restarted without
loss of data as long as the meta information remains undisturbed. It does not
matter whether the job is stopped voluntarily or involuntarily due to a crash.
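For example, you could detach from, stop, and later restart a job along the following lines, using the interactive-command prompt described later in this section (the job name hr_export is a placeholder):

expdp hr ATTACH=hr_export
Export> STOP_JOB=IMMEDIATE
(later, from the same or a different client machine)
expdp hr ATTACH=hr_export
Export> START_JOB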
Oracle Data Pump: Benefits
The EXCLUDE,
INCLUDE, and CONTENT parameters are used for fine-grained object and data
selection.
You can
specify the database version for objects to be moved (using the VERSION parameter)
to create a dump file set that is compatible with a previous release of the
Oracle database that supports Data Pump.
You can use
the PARALLEL parameter to specify the maximum number of threads of active
execution servers operating on behalf of the export job.
You can
estimate how much space an export job would consume (without actually
performing the export) by using the ESTIMATE_ONLY parameter.
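As a hedged illustration (the directory object, file names, and the VERSION value are placeholders), the following commands show these parameters; the second command only estimates space and does not write a dump file:

expdp hr SCHEMAS=hr DIRECTORY=dp_dir DUMPFILE=hr_compat.dmp VERSION=10.2 PARALLEL=2
expdp hr SCHEMAS=hr DIRECTORY=dp_dir ESTIMATE_ONLY=YES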
Network
mode enables you to export from a remote database directly to a dump file set.
This can be done by using a database link to the source system.
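For example (a hedged sketch; the database link source_db and the other names are placeholders), the HR schema of a remote database can be exported directly into a local dump file set:

expdp hr SCHEMAS=hr NETWORK_LINK=source_db DIRECTORY=dp_dir DUMPFILE=remote_hr.dmp LOGFILE=remote_hr.log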
During
import, you can change the target data file names, schemas, and tablespaces.
In addition
you can specify a percentage of data to be sampled and unloaded from the source
database when performing a Data Pump export. This can be done by specifying the
SAMPLE parameter.
You can use
the COMPRESSION parameter to indicate whether the metadata should be compressed
in the export dump file so that it consumes less disk space. If you compress
the metadata, it is automatically uncompressed during import.
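For instance (names are placeholders), metadata compression can be requested as follows:

expdp hr SCHEMAS=hr DIRECTORY=dp_dir DUMPFILE=hr_meta.dmp COMPRESSION=METADATA_ONLY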
Data Pump Enhancements in Oracle Database
In Oracle
Database, new features have been added that enable you to:
• Compress both data and metadata, only data, only metadata, or no data during an export
• Specify additional encryption options in the following areas:
  • You can choose to encrypt both data and metadata, only data, only metadata, no data, or only encrypted columns during an export.
  • You can specify a specific encryption algorithm to use during an export.
  • You can specify the type of security to use for performing encryption and decryption during an export. For example, perhaps the dump file set will be imported into a different or remote database and it must remain secure in transit. Or perhaps the dump file set will be imported onsite using the Oracle Encryption Wallet, but it may also need to be imported offsite where the Oracle Encryption Wallet is not available.
• Perform table mode exports and imports using the transportable method; specify how partitioned tables should be handled during import operations
• Overwrite existing dump files during an export operation
• Rename tables during an import operation
• Specify that a data load should proceed even if nondeferred constraint violations are encountered (This is valid only for import operations that use the external tables access method.)
• Specify that XMLType columns are to be exported in uncompressed CLOB format regardless of the XMLType storage format that was defined for them
• During an export, specify a remap function that takes as a source the original value of the designated column and returns a remapped value that will replace the original value in the dump file
• Remap data as it is being imported into a new database
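Several of these capabilities surface as export and import parameters. The commands below are a hedged sketch only (object names, the password, and the parameter values are illustrative, and the commands are wrapped across lines for readability; in practice, long commands are usually placed in a parameter file and referenced with PARFILE):

expdp hr SCHEMAS=hr DIRECTORY=dp_dir DUMPFILE=hr_sec.dmp REUSE_DUMPFILES=YES
    COMPRESSION=ALL ENCRYPTION=ALL ENCRYPTION_ALGORITHM=AES128
    ENCRYPTION_MODE=DUAL ENCRYPTION_PASSWORD=mypassword

impdp hr DIRECTORY=dp_dir DUMPFILE=hr_sec.dmp ENCRYPTION_PASSWORD=mypassword
    REMAP_TABLE=hr.employees:employees_copy
    DATA_OPTIONS=SKIP_CONSTRAINT_ERRORS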
Data Pump Export and Import
• Data Pump Export is a utility for unloading data and metadata into a set of operating system files called dump file sets. Data Pump Import is used to load metadata and data stored in an export dump file set into a target system.
• The Data Pump API accesses its files on the server rather than on the client.
• These utilities can also be used to export from a remote database directly to a dump file set, or to load the target database directly from a source database with no intervening files. This is known as network mode. This mode is particularly useful to export data from a read-only source database.
• At the center of every Data Pump operation is the master table (MT), which is a table created in the schema of the user running the Data Pump job. The MT maintains all aspects of the job. The MT is built during a file-based export job and is written to the dump file set as the last step. Conversely, loading the MT into the current user’s schema is the first step of a file-based import operation and is used to sequence the creation of all objects imported.
Note: The MT is the key to Data Pump’s
restart capability in the event of a planned or unplanned stopping of the job.
The MT is dropped when the Data Pump job finishes normally.
Data Pump Utility: Interfaces and Modes
You can
interact with Data Pump Export and Import by using one of the following
interfaces:
• Command line interface: Enables you to specify most of the export parameters directly on the command line
• Parameter file interface: Enables you to specify all command line parameters in a parameter file. The only exception is the PARFILE parameter.
• Interactive-command interface: Stops logging to the terminal and displays the export or import prompts, where you can enter various commands. This mode is enabled by pressing [Ctrl] + [C] during an export operation that is started with the command line interface or the parameter file interface. Interactive-command mode is also enabled when you attach to an executing or stopped job.
• Web interface: On the Database Control home page, click the Maintenance tab, and then select one of the following links from the Utilities region: Export to Files, Import from Files, or Import from Database.
Data Pump
Export and Import provide different modes for unloading and loading different
portions of the database. The mode is specified on the command line by using
the appropriate parameter.
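For example (a hedged sketch; names are placeholders), full, schema, and table modes are selected as follows:

expdp system FULL=Y DIRECTORY=dp_dir DUMPFILE=full.dmp
expdp hr SCHEMAS=hr DIRECTORY=dp_dir DUMPFILE=hr.dmp
expdp hr TABLES=employees,departments DIRECTORY=dp_dir DUMPFILE=emp_dept.dmp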
Fine-Grained Object Selection
The Data
Pump job can include or exclude virtually any type of object.
The EXCLUDE
parameter enables any database object type to be excluded from an export or
import operation. The optional name qualifier enables you to have finer
selectivity within each object type that is specified, as in these examples:
EXCLUDE=VIEW
EXCLUDE=PACKAGE
EXCLUDE=INDEX:"LIKE 'EMP%'"
The INCLUDE
parameter includes only the specified object types and objects in an operation.
Syntax: INCLUDE = object_type[:"name_expr"]
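For example (object names are illustrative):

INCLUDE=TABLE:"IN ('EMPLOYEES', 'DEPARTMENTS')"
INCLUDE=PROCEDURE
INCLUDE=INDEX:"LIKE 'DEPT%'"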
The CONTENT parameter enables you to request, for the current operation, only the metadata, only the data, or both metadata and data.
Syntax: CONTENT = ALL | METADATA_ONLY | DATA_ONLY
The QUERY
parameter operates in a similar manner to the original export utility, with two
significant enhancements: It can be qualified with a table name so that it
applies to only that table, and it can be used during import as well. Here is
an example:
QUERY=hr.employees:"WHERE department_id in (10,20) and salary < 1600 ORDER BY department_id"
Advanced Feature: Sampling
With the SAMPLE
parameter, you can specify a percentage of data to be sampled and unloaded from
the source database when performing a Data Pump export.
Syntax: SAMPLE=[[schema_name.]table_name:]sample_percent
The sample
percentage indicates the likelihood that a block of rows will be included. The
range of values for sample_percent is .000001 to (but not including) 100.
Note: The SAMPLE parameter is not valid
for network exports.
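For example (a hedged sketch; names are placeholders), to unload roughly 30 percent of the blocks of one table while exporting the schema:

expdp hr SCHEMAS=hr DIRECTORY=dp_dir DUMPFILE=hr_sample.dmp SAMPLE=hr.employees:30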
Export Options: Files
Three types
of files are managed by Data Pump jobs:
• Dump files for data and metadata that is to be moved
• Log files for messages
• SQL files for the output of a SQLFILE operation
Because
Data Pump is server based rather than client based, Data Pump files are accessed
relative to Oracle directory paths. Absolute paths are not supported for
security reasons.
Data Pump File Locations
• Per-file directory objects may be specified for each dump file, log file, and SQL file. If specified, they are separated from the file name by a colon (:).
• The Data Pump Export and Import clients provide a DIRECTORY parameter, which specifies the name of a directory object. These directory objects describe the location in which the files are accessed.
• You can alternatively define an environment variable, DATA_PUMP_DIR, to specify the directory object name rather than use the DIRECTORY parameter. The Data Pump clients look for this environment variable if no explicit directory object is specified.
• A default directory object is created for every database. This directory object is named DATA_PUMP_DIR. Access to the DATA_PUMP_DIR directory is granted automatically to the EXP_FULL_DATABASE and IMP_FULL_DATABASE roles.
• You do not need to create a directory object manually before using Data Pump Export.
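A hedged example of creating a directory object, granting access on it, and then referencing it from an export (the operating system path and all names are placeholders; CREATE DIRECTORY requires the CREATE ANY DIRECTORY privilege):

CREATE DIRECTORY dp_dir AS '/u01/app/oracle/dpdump';
GRANT READ, WRITE ON DIRECTORY dp_dir TO hr;

expdp hr SCHEMAS=hr DIRECTORY=dp_dir DUMPFILE=hr.dmp LOGFILE=hr.log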
A default directory object named DATA_PUMP_DIR is created for every database, whether newly created or upgraded by a script on UNIX or Windows platforms. Access to the DATA_PUMP_DIR directory is granted automatically to the EXP_FULL_DATABASE and IMP_FULL_DATABASE roles. The DATA_PUMP_DIR directory is created in one of the following locations:
• <ORACLE_BASE>/admin/DB_UNIQUE_NAME
• <ORACLE_HOME>/admin/DB_UNIQUE_NAME
The exact
directory path specification for DATA_PUMP_DIR varies depending on the value of
the ORACLE_BASE and ORACLE_HOME system environment variables and on the
existence of the DATA_PUMP_DIR subdirectory. If ORACLE_BASE is defined on the
target system, that value is used. Otherwise, the value of ORACLE_HOME is used.
If the DATA_PUMP_DIR subdirectory is for some reason not found, the following
default path is used:
ORACLE_HOME/rdbms/log
Note: In all cases, you must have the
appropriate access privileges to the directory object for the attempted
operation. For export, you need write access for all files; for import, you
need read access for dump files and write access for log files and SQL files.
Data Pump File Naming and Size
The DUMPFILE
parameter specifies the names and (optionally) directories of disk-based dump
files. Multiple file specifications may be provided as a comma-separated list
or in separate DUMPFILE parameter specifications. File names may contain the
substitution variable %U, which implies that multiple files may be generated. %U
is expanded in the resulting file names into a two-character, fixed-width,
monotonically increasing integer starting at 01. If no DUMPFILE is specified, expdat.dmp
is used by default. Created dump files are autoextensible by default.
If FILESIZE
is specified, each file is FILESIZE bytes in size and nonextensible. If more dump
space is required and a template with %U has been supplied, a new file is
automatically created with FILESIZE bytes; otherwise, the client receives a
message to add a new file.
If a
template with %U is specified, the number of files that are initially created
is equal to the PARALLEL parameter.
Preexisting
files that match the resulting file names are not overwritten. Instead, they
result in an error and cause the job to be aborted.
Note: If multiple dump file templates are
provided, they are used to generate dump files in a circular fashion.
Data Pump Import
Data Pump
Import is a utility for loading an export dump file set into a target system.
The dump file set comprises one or more disk files that contain table data,
database object metadata, and control information. The files are written in a
proprietary binary format. During an import operation, Data Pump Import uses
these files to locate each database object in the dump file set.
You can
interact with Data Pump Import by using a command line, a parameter file, or an
interactive-command mode:
• You can use the impdp command and specify parameters directly on the command line.
• You can enter command line parameters in a file (the PARFILE parameter is excluded because parameter files cannot be nested).
• In interactive-command mode, the current job continues running, but logging to the terminal is stopped and the Import prompt is displayed. You can, for example, attach to an executing or stopped job and enter further commands.
Data Pump Import: Transformations
Because
object metadata is stored as XML in the dump file set, it is easy to apply
transformations when DDL is being formed during import. Data Pump Import
supports several transformations:
• REMAP_DATAFILE is useful when moving databases across platforms that have different file-system semantics.
• REMAP_TABLESPACE enables objects to be moved from one tablespace to another.
• REMAP_SCHEMA provides the old FROMUSER/TOUSER capability to change object ownership.
The TRANSFORM
parameter enables you to alter the object-creation DDL for specific objects or
for all applicable objects being loaded. Note the following possible options:
• SEGMENT_ATTRIBUTES: If the value is specified as Y, segment attributes (physical attributes, storage attributes, tablespaces, and logging) are included.
• STORAGE: If the value is specified as Y, the STORAGE clauses are included.
• OID: Determines whether the object ID (OID) of abstract data types is reused or created as new. If the value is specified as N, the generation of the export OID clause for object types is suppressed. This is useful when you need to duplicate schemas across databases by using export and import, but you cannot guarantee that the object types will have identical OID values in those databases.
• PCTSPACE: Reduces the amount of space that is required for tablespaces by performing a shrink operation on tablespace storage allocation. The value supplied for this transformation must be a number greater than zero. It represents the percentage multiplier that is used to alter extent allocations and the size of data files.
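A hedged import example that combines several of these remapping and transform options (all object names and values are placeholders; the command is wrapped across lines for readability):

impdp system DIRECTORY=dp_dir DUMPFILE=hr.dmp
    REMAP_SCHEMA=hr:hr_test
    REMAP_TABLESPACE=example:users
    TRANSFORM=SEGMENT_ATTRIBUTES:N
    TRANSFORM=PCTSPACE:70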
Data Pump: Performance Considerations
• You can improve job throughput with the PARALLEL parameter. The parallelism setting is enforced by the master process, which allocates work to worker processes that perform the data and metadata processing in an operation. These worker processes operate in parallel. In general, the degree of parallelism should be set to no more than twice the number of CPUs on an instance. To maximize parallelism, you must supply at least one file for each degree of parallelism. If there are not enough dump files, the performance will not be optimal because multiple threads of execution will try to access the same dump file. The degree of parallelism can be reset at any time during a job.
• The example below shows a full database export. All data and metadata in the database will be exported. Dump files (full101.dmp, full201.dmp, full102.dmp, and so on) will be created in a round-robin fashion in the directories pointed to by the dp_dir1 and dp_dir2 directory objects. For best performance, these should be on separate I/O channels. Each file will be up to 2 GB in size (as necessary). Up to three files will be created initially, and more files will be created if needed. The job and master table have the same name: expfull. The log file will be written to expfull.log in the dp_dir1 directory.
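The command itself is not reproduced in this text; a reconstruction consistent with the description above would look roughly like this (shown wrapped for readability):

expdp system FULL=Y PARALLEL=3
    DUMPFILE=dp_dir1:full1%U.dmp,dp_dir2:full2%U.dmp
    FILESIZE=2G
    LOGFILE=dp_dir1:expfull.log
    JOB_NAME=expfull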
Performance Initialization Parameters
• You can try using the parameters listed to improve performance, although the results may not be the same on all platforms.
• Additionally, the SHARED_POOL_SIZE and UNDO_TABLESPACE initialization parameters should be generously sized. The exact values will depend upon the size of your database. Turning off DB_BLOCK_CHECKING and DB_BLOCK_CHECKSUM only to improve Data Pump performance is not recommended, because this will affect the detection of block corruption.
Data Pump Direct Path: Considerations
Data Pump
automatically selects the appropriate access method for each table.
Direct path: Data Pump uses direct path load and unload when a table’s structure allows it and when maximum single-stream performance is desired.
External tables: Data Pump uses external tables for any of the following conditions:
• Tables with fine-grained access control enabled in insert and select modes
• Domain index for a LOB column
• Tables with active triggers defined
• Global index on partitioned tables with a single-partition load
• BFILE or opaque type columns
• Referential integrity constraint
• VARRAY columns with an embedded opaque type
Note: Because both methods support the
same external data representation, data that is unloaded with one method can be
loaded using the other method.
Using Enterprise Manager to Monitor Data Pump Jobs
You can use
the Enterprise Manager graphical user interface (GUI) to monitor all Data Pump
jobs, including those created by using the expdp or impdp command line
interfaces or by using the DBMS_DATAPUMP package.
You can
view the current status of the job and change the status to EXECUTE, STOP, or SUSPEND.
To access
the “Export and Import Jobs” page, click the “Monitor Export and Import Jobs”
link in the Move Row Data region on the Maintenance page.
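Outside Enterprise Manager, the same jobs can also be observed from SQL. As a hedged illustration, the following query against the DBA_DATAPUMP_JOBS data dictionary view lists current Data Pump jobs and their states:

SELECT owner_name, job_name, operation, job_mode, state, attached_sessions
FROM   dba_datapump_jobs;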