SQL Server 2005 XML Integration Services

Sorry guys for catching only after a week of  time,I was busy quite this whole week but this week I have some hot stuff on XML integration with SQL Server 2005 which I was doing some R&D last week .

The importance of XML-based technologies is steadily increasing and this trend is reflected by a variety of new features in this category introduced in SQL Server 2005 Integration Services.

Due to importance and omnipresent nature of XML, its support has been improved in SQL Server 2005 Integration Services (comparing with Data Transformation Services in earlier versions of SQL Server). Even though implementation of the XML Data Flow Destination is painfully absent (you have to emulate it through scripting), you can take advantage of built-in the XML Data Flow source, as well as versatile XML Control Flow task. To get familiar with the latter, launch Business Intelligence Development Studio and initiate a new Integration Services project. Drag XML Task from the Toolbox and drop it on the Control Flow tab of the Designer interface. Launch its Editor window by selecting the Edit… item from its context sensitive menu.

The General area is divided into several sections, although their exact content depends largely on the value assigned to the OperationType property. Choosing one of its six possible options (which you can view in the drop-down list in the Input section and which we will be describing in more detail next) affects the range of the remaining properties that need to be configured in order to deliver desired functionality. All of the operations require two operands (where the first one is referred to as Source and contains an original XML file against which operations are carried out) with independently configurable types (in the majority of cases, both of them can be either entered directly in the XML Task Editor interface or stored in a file or a variable).

Introduction to System.Xml, SQLXML, and the XML Data Type

XML support provided in Microsoft SQL Server 2000 and an overview of the three options provided in the Microsoft Visual Studio 2005/SQL Server 2005 environment for manipulating XML and relational data. The options are: 1) the classes in the System.Xml namespace, 2) the SQLXML classes, and 3) the XML data type provided in SQL Server 2005.

XML support was added to Microsoft SQL Server 2000 to provide the user with features such as:

  • Exposing relational data as XML
  • Shredding XML documents into row sets
  • Creating XML views by mapping XML schemas to database schemas using XML-Data Reduced (XDR) schemas
  • Creating queries on XML views using XPath
  • Exposing data in SQL Server through HTTP

This support was further enhanced in subsequent SQLXML Web releases. Enhancements include:

  • Updategrams and XML Bulkload to persist changes made to XML views
  • Support for annotated XML Schema definition language (XSD) to describe mappings (support for XDR still exists but its use is not recommended)
  • Client-side FOR XML
  • SQLXML Managed Classes
  • Support for Web services

SQLXML is a set of libraries and technologies that enables the seamless integration of relational data in SQL Server databases with XML. It is a middle-tier component and does not include the server-side XML support provided by FOR XML and OPENXML. SQLXML provides a schema-driven mapping approach to generating XML from relational source data and loading XML that represents relational information back into relational tables. SQLXML classes provide XML support to SQL Server 2000 databases and above.

Microsoft SQL Server 2005 adds built-in support for XML in the form of the XML data type. XML data can be stored natively in XML data type columns. Additionally, an XML data type column can be further constrained by associating an XML schema collection with this column. XML values stored in XML data type columns can be manipulated with the help of XQuery and an XML Data Modification Language (DML). Indexes can be built on XML data to improve query performance. Furthermore, FOR XML and OPENXML have been enhanced to support the new XML data type.

New features introduced in SQL Server 2005 to store and process XML data coupled with XML features offered in earlier versions of SQL Server provide developers with several ways in which XML data can be stored and processed in XML applications. As there are multiple ways in which XML applications can be built using the alternative approaches provided by SQL Server 2005, it is important to understand the scenarios, trade-offs, and synergies of the different technologies in order to make the right choice. This article provides guidance in selecting the appropriate alternative for developing XML applications with SQL Server 2005.

XML Usage Scenarios

The areas in which XML is used can be broadly classified as follows:

  • XML for Business Integration: Business integration, also known as enterprise application integration (EAI), includes A2A (application-to-application), B2B (business-to-business), and B2C (business-to-consumer) applications. Applications working on disparate systems communicate with each other using XML-based messages.
  • XML for Content Management: Content management systems based on XML enable users to store, retrieve, modify, and query XML documents. These systems store XML documents in their native format.

Next a few scenarios that fall under the previously mentioned categories are discussed. The solutions for these scenarios are presented in subsequent sections with detailed treatments of the different XML options available in the SQL Server 2005/Visual Studio 2005 environment.

Scenario 1: Insurance Claims

An auto insurance company providing services on the Internet allows its insurance buyers or agents to enter insurance claims through the company’s Web site. These claims will be processed by the centralized system located at the corporate head quarters. Once the processing is complete, the system is required to store specific information related to the claim in a specified XML format. Exact copies of these XML documents must be maintained in the system for legal purposes. This scenario shows the use of XML for content management.

Scenario 2: Data Exchange Between Automobile Manufacturer and Parts Suppliers I

An automobile manufacturer interacts with multiple parts suppliers to procure the parts required for the company. Currently the manufacturer receives invoices from suppliers. The data corresponding to these invoices is then manually fed to a legacy invoice processing system. The invoice processing system stores the data in relational format. The manufacturer now wants to automate the invoice data feeding process to the legacy invoice processing system. This scenario is an example of the use of XML for business integration.

Scenario 3: Data Exchange Between Automobile Manufacturer and Parts Suppliers II

This scenario involves an automobile manufacturer that interacts with multiple parts suppliers as mentioned in the previous scenario. The manufacturer’s present system does not provide the facility for the suppliers to check the status of an invoice or to get a copy of payment instructions from the manufacturer. Currently this information is available to the suppliers only over the phone. The automobile manufacturer needs to be able to expose this information over the Web so that the suppliers can perform these tasks automatically. This scenario demonstrates the use of XML for business integration.

Scenario 4: Content Management System

A company provides information in the fields of medicine, law, and technology to its customers through various channels including Web, books, and CD-ROM. The company wants to build a content management system to help it deliver high quality content to its customers in less time. This scenario illustrates the use of XML for content management.

Scenario 5: Customer Survey

A company provides air ticket booking services on the Internet and conducts surveys for each season to identify the most sought after destination by its customers for the current season. The questionnaire used for each season is different and the questionnaire may change in the future. The company will analyze the information, and the analysis results are used to design packaged travel deals that will satisfy the needs of the maximum number of customers. This scenario can be classified as the use of XML for content management.

XML Data Type in SQL Server 2005

The hierarchical nature of XML data makes it hard to model it as relational data as the structure of the data becomes complex (e.g., depth of hierarchy increases). Moreover, when XML data is mapped to relational data, the order of elements in XML instances is also not preserved and there is a significant cost involved in composing the original XML document from decomposed relational data. The limitations of the relational model to store XML data make it ideal to store XML instances natively. Native XML instances do not suffer from the limitations of relational models and offer features such as the ability to handle hierarchical or nested data, the ability to preserve the order of elements, a straightforward way to store and retrieve XML data, the flexibility to support multiple schemas, and so on.

Microsoft SQL Server 2005 provides extensive support for XML data processing. With SQL Server 2005, XML values can be stored natively in an XML data type column, which can be typed according to a collection of XML schemas, or can be left untyped. Fine-grained data manipulation is supported using XQuery and XML DML, the latter being an extension for data modification. Furthermore, the XML column can be indexed for improved query performance.

Typed XML

Typed XML is ideal for situations where you have XML schemas describing your XML data. In such cases, you can associate a collection of XML schemas with the XML column to yield typed XML. Validation on XML type columns is performed based on the XML schema collection associated with the column. Additionally, the performance of queries that involve typed XML data is better compared to untyped XML data, as it requires no runtime conversion of node values.

Untyped XML

Using untyped XML is suitable if you have schemas, but you do not want the server to validate the data, or if no schema is available. You may want to store untyped XML even when a schema is present if an application:

  • Has no fixed schema
  • Performs the validation on the client side before storing the data at the server
  • Temporarily stores XML data that is invalid according to the schema
  • Uses schema components that are unsupported at the server (e.g., key/keyref)

Untyped XML documents are checked to ensure that they are well formed, even though they are not associated with any schema. It is important to note that untyped XML imposes a performance penalty due to the runtime conversion of node values, as the node values are stored internally as Unicode strings.

Usage Scenarios of XML Data Type

With the new XML data type in SQL Server 2005, it is now possible to:

  • Create a table with one more columns of type XML in addition to the relational columns.
  • Create a typed XML column type by associating it with an XML schema collection.
  • Create a constraint on an XML column involving other XML or non-XML type columns to enforce business rules.
  • Create a variable of type XML that can be used to store an instance of an XML data type.
  • Create a parameter of type XML to a stored procedure or user-defined function.
  • Return an XML type value from a user-defined function.
  • Assign the result of a FOR XML query obtained using the new TYPE directive to a variable of type XML.
  • Run a subset of XQuery to query into the XML structure and transform XML data.
  • Create a computed column based on a column of type XML.
  • Create an XML index on a column of type XML to improve the performance of queries.
  • Perform element-level insert, delete, and update operations on XML instances using XML DML.
  • Pass an instance of XML type data to sp_xml_preparedocument to prepare an in-memory representation of the XML document.
  • Write cross-domain queries that will include both relational and XML columns using XQuery and XML DML.
  • Cast or convert the XML type to varchar or nvarchar types using CAST and CONVERT, respectively.
  • Convert or cast string data types such as [n]varchar, [n]text, varbinary, and image into XML type using CAST or CONVERT.

XML Data Type Methods and XML DML

Queries and manipulation on XML data type columns are supported through five methods. Fragments of an XML document can be extracted using the query() method of XML data type. The query() method accepts an XQuery expression as an argument and returns an untyped XML instance. Scalar values can be extracted from an XML instance using the value() method by specifying an XQuery expression and the desired SQL type to be returned. Existential checks on an XML instance can be performed using the exist() method. Decomposing an XML document into relational data is facilitated with the help of the nodes() method.

Data manipulation operations can be performed on an XML instance using the modify() method. Support for XML DML is provided through insert, delete, and update keywords added to XQuery. One or more nodes can be inserted, deleted, and updated using the insert, delete, and update keywords, respectively.

XML Indexing

The parsing and shredding involved in query processing operations on XML data type columns can consume a prohibitively long time if the size of the XML instance is very big. Query performance on the XML data type can be improved by creating indexes on these columns. The size of the XML data and usage scenario play an important role in dictating the type of index that is required. SQL Server supports two types of indexes—primary XML indexes and secondary XML indexes; the latter cannot exist without the former.

The creation of a primary XML index on an XML column shreds the XML BLOB and stores these values in an internal table. This results in improved performance during query execution time as there is no shredding involved at runtime. Depending on the usage scenario, query performance can be further improved by creating secondary XML indexes. Three types of secondary XML indexes—PATH, PROPERTY, and VALUE—can be created to improve the performance of queries based on path, property, and value respectively. More information on selecting an appropriate secondary index for an XML type column is available in the Performance Optimizations for the XML Data Type white paper.

Storing XML documents as an XML data type is a suitable choice if:

  • Your application has a requirement to preserve the Infoset content of XML instances. The Infoset content of an XML document includes document hierarchy, order of elements, values of elements and attributes, and so on. Information such as order of attributes, prefixes of namespaces, insignificant white spaces, and XML declaration is not preserved.
  • Your application requires element-level modify and query operations on XML documents.
  • Your application needs indexes on XML data type columns to speed up query processing.
  • Your XML data may or may not have a schema.
  • Your application uses XML documents with a wide range of structures, or XML documents conforming to different or complex schemas that are too hard to map to relational structures.
Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s