Friday, February 19, 2010

Data Warehousing

Different people have different definitions for a data warehouse. The most popular definition came from Bill Inmon, who provided the following:

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.

Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.

Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.
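As a rough illustration of that conforming step (all identifiers below are invented), a warehouse load might map each source's product code to a single warehouse key:

```python
# Hypothetical sketch: conforming two source-specific product codes to one
# warehouse product key. The code values and mapping are made up.
conformed_product_key = {
    ("source_a", "PRD-00017"): "P17",
    ("source_b", "17-WIDGET"): "P17",
}

def conform(source, source_code):
    """Map a source-specific product identifier to the single warehouse key."""
    return conformed_product_key[(source, source_code)]

print(conform("source_a", "PRD-00017"))   # "P17"
print(conform("source_b", "17-WIDGET"))   # "P17" - same product, one identity
```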

Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, or 12 months ago, or even older data, from a data warehouse. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold only the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.
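As a rough sketch of the time-variant idea (the customer, addresses, and dates below are made up), a transactional record holds only the current address, while the warehouse keeps every address with its validity period:

```python
from datetime import date

# Transactional view: one row per customer, overwritten whenever it changes.
crm_customer = {"customer_id": 42, "address": "12 Oak St"}

# Warehouse view: every historical address is retained with validity dates.
dw_customer_address = [
    {"customer_id": 42, "address": "7 Elm Rd",  "valid_from": date(2007, 3, 1), "valid_to": date(2009, 6, 30)},
    {"customer_id": 42, "address": "12 Oak St", "valid_from": date(2009, 7, 1), "valid_to": None},  # current
]

def address_as_of(history, customer_id, as_of):
    """Return the address that was current for a customer on a given date."""
    for row in history:
        if (row["customer_id"] == customer_id
                and row["valid_from"] <= as_of
                and (row["valid_to"] is None or as_of <= row["valid_to"])):
            return row["address"]
    return None

print(address_as_of(dw_customer_address, 42, date(2008, 1, 15)))  # "7 Elm Rd"
```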

Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.

Ralph Kimball provided a more concise definition of a data warehouse:

A data warehouse is a copy of transaction data specifically structured for query and analysis.

This is a functional view of a data warehouse. Unlike Inmon, Kimball did not address how the data warehouse is built; rather, he focused on the functionality of a data warehouse.

Data Quality

There are a number of theoretical frameworks for understanding data quality. A systems-theoretical approach influenced by American pragmatism expands the definition of data quality to include information quality, and emphasizes the inclusiveness of the fundamental dimensions of accuracy and precision on the basis of the theory of science (Ivanov, 1972). One framework seeks to integrate the product perspective (conformance to specifications) and the service perspective (meeting consumers' expectations) (Kahn et al. 2002). Another framework is based in semiotics to evaluate the quality of the form, meaning and use of the data (Price and Shanks, 2004). One highly theoretical approach analyzes the ontological nature of information systems to define data quality rigorously (Wand and Wang, 1996).

A considerable amount of data quality research involves investigating and describing various categories of desirable attributes (or dimensions) of data. These lists commonly include accuracy, correctness, currency, completeness and relevance. Nearly 200 such terms have been identified, and there is little agreement on their nature (are these concepts, goals or criteria?), their definitions, or their measures (Wang et al., 1993). Software engineers may recognise this as a problem similar to "ilities".
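To make one of these dimensions concrete, here is a minimal sketch, with invented field names, of how completeness could be scored as the share of non-missing values per column:

```python
def completeness(rows, columns):
    """Fraction of non-null, non-empty values for each column across all rows."""
    scores = {}
    for col in columns:
        filled = sum(1 for r in rows if r.get(col) not in (None, ""))
        scores[col] = filled / len(rows) if rows else 0.0
    return scores

# Illustrative records only; real profiling would run against source tables.
customers = [
    {"name": "Ann Lee", "email": "ann@example.com", "phone": ""},
    {"name": "Bob Roy", "email": None,              "phone": "555-0101"},
]
print(completeness(customers, ["name", "email", "phone"]))
# {'name': 1.0, 'email': 0.5, 'phone': 0.5}
```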

MIT has a Total Data Quality Management program, led by Professor Richard Wang, which produces a large number of publications and hosts a significant international conference in this field (International Conference on Information Quality, ICIQ).

In practice, data quality is a concern for professionals involved with a wide range of information systems, ranging from data warehousing and business intelligence to customer relationship management and supply chain management. One industry study estimated the total cost to the US economy of data quality problems at over US$600 billion per annum (Eckerson, 2002). In fact, the problem is such a concern that companies are beginning to set up a data governance team whose sole role in the corporation is to be responsible for data quality. In some organisations, this data governance function has been established as part of a larger Regulatory Compliance function, a recognition of the importance of data and information quality to organisations.

Problems with data quality don't arise only from incorrect data; inconsistent data is a problem as well. Eliminating data shadow systems and centralizing data in a warehouse is one of the initiatives a company can take to ensure data consistency.

The market is going some way to providing data quality assurance. A number of vendors make tools for analysing and repairing poor quality data in situ, service providers can clean the data on a contract basis, and consultants can advise on fixing processes or systems to avoid data quality problems in the first place. Most data quality products offer a series of tools for improving data, which may include some or all of the following:

  1. Data profiling - initially assessing the data to understand its quality challenges
  2. Data standardization - a business rules engine that ensures that data conforms to quality rules
  3. Geocoding - for name and address data. Corrects data to US and Worldwide postal standards
  4. Matching or Linking - a way to compare data so that similar, but slightly different, records can be aligned (a minimal matching sketch follows this list). Matching may use "fuzzy logic" to find duplicates in the data. It often recognizes that 'Bob' and 'Robert' may be the same individual. It might be able to manage 'householding', or finding links between husband and wife at the same address, for example. Finally, it often can build a 'best of breed' record, taking the best components from multiple data sources and building a single super-record.
  5. Monitoring - keeping track of data quality over time and reporting variations in the quality of data. Software can also auto-correct the variations based on pre-defined business rules.
  6. Batch and Real time - Once the data is initially cleansed (batch), companies often want to build the processes into enterprise applications to keep it clean.
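As a rough idea of what the matching step does, the following sketch pairs up records whose names are similar. The nickname table and the 0.8 threshold are assumptions for illustration, not the behaviour of any particular tool:

```python
from difflib import SequenceMatcher

# Tiny illustrative nickname table; real tools ship far larger reference data.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}

def normalize(name):
    """Lowercase the name and expand known nicknames token by token."""
    tokens = name.strip().lower().split()
    return " ".join(NICKNAMES.get(t, t) for t in tokens)

def similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def likely_duplicates(records, threshold=0.8):
    """Return pairs of records whose names are similar enough to review."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i]["name"], records[j]["name"]) >= threshold:
                pairs.append((records[i], records[j]))
    return pairs

people = [{"name": "Bob Smith"}, {"name": "Robert Smith"}, {"name": "Jane Doe"}]
print(likely_duplicates(people))  # the two Smith records are flagged as a pair
```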

There are several well-known authors and self-styled experts, with Larry English perhaps the most popular guru. In addition, the International Association for Information and Data Quality (IAIDQ) was established in 2004 to provide a focal point for professionals and researchers in this field.

ISO 8000 is the international standard for data quality.

Data Integration

Data integration involves combining data residing in different sources and providing users with a unified view of these data. This process becomes significant in a variety of situations, both commercial (when two similar companies need to merge their databases) and scientific (combining research results from different bioinformatics repositories, for example). Data integration appears with increasing frequency as the volume of data and the need to share existing data explode. It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. In management circles, people frequently refer to data integration as "Enterprise Information Integration".
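A minimal sketch of the "unified view" idea, assuming two invented sources that share a customer_id key:

```python
source_a = [  # e.g. an order system
    {"customer_id": 1, "orders": 12},
    {"customer_id": 2, "orders": 3},
]
source_b = [  # e.g. a support system
    {"customer_id": 1, "open_tickets": 0},
    {"customer_id": 3, "open_tickets": 2},
]

def unified_view(*sources):
    """Merge records from several sources, keyed on customer_id."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["customer_id"], {}).update(record)
    return list(merged.values())

for row in unified_view(source_a, source_b):
    print(row)
# {'customer_id': 1, 'orders': 12, 'open_tickets': 0} ... and so on
```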

Monday, February 15, 2010

Informatica Interview Questions

Data Loading While Session Failed
If there are 10000 records and while loading, if the session fails in between, how will you load the remaining data?
Latest Answer: Use the recovery option in the session properties and recover the session from the last commit level. ...


Session Performance
How would you improve session performance?
Latest Answer: Session performance can be improved by allocating cache memory so that all transformations can execute within that cache size. Generally, only the transformations that use a cache are considered performance bottlenecks. Say, for example, we ...



Informatica Tracing Levels
What are the tracing levels in Informatica transformations? Which one is efficient, which one is faster, and which one is best in Informatica PowerCenter 8.1/8.5?
Latest Answer: Which one is efficient and faster also depends on your requirement. If you want minimal information written to the session logs, you can use the Terse tracing level. From the Informatica Transformation Guide: "To add a slight performance boost, you can ...



Transaction Control
Explain why it is bad practice to place a Transaction Control transformation upstream from a SQL transformation.
Latest Answer: First, as mentioned in the other posts, it drops all the incoming transaction control boundaries. Besides, you can use commit and rollback statements within the SQL transformation script to control the transactions, thus avoiding the need for the Transaction ...


Update Strategy Transformation
Why should the input pipelines to the Joiner not contain an Update Strategy transformation?
Latest Answer: Update Strategy flags each row for either Insert, Update, Delete or Reject. I think when you use it before a Joiner, the Joiner drops all the flagging details. This is a curious question, though; I cannot imagine how one would expect to deal with the scenario ...



Passive Router Transformation
The Router is an active transformation, but one may argue that it is passive because, if only the default group is used, there is no change in the number of rows. What explanation would you give?
Latest Answer: A transformation is called active if it varies the number of rows passing through it; say, total incoming rows = 1000 and total outgoing rows <> 1000. In the case of the Router transformation, the count of total output rows across all the routing groups is ...


Update Strategy
In which situation do we use update strategy?
Latest Answer: We use the Update Strategy when we need to alter the database operation (insert/update/delete) based on the data passing through and some logic. The Update Strategy allows us to insert (DD_INSERT), update (DD_UPDATE) and delete (DD_DELETE) rows based on the logic specified ...
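The following is not Informatica code, just a Python sketch of the kind of row-by-row decision an Update Strategy expression encodes; the flag names mirror DD_INSERT/DD_UPDATE/DD_DELETE, and the cancellation rule is invented for illustration:

```python
DD_INSERT, DD_UPDATE, DD_DELETE = "insert", "update", "delete"

def flag_row(incoming, existing_keys):
    """Decide the database operation for a row based on simple example logic."""
    if incoming.get("is_cancelled"):
        return DD_DELETE                      # cancelled records are removed
    if incoming["customer_id"] in existing_keys:
        return DD_UPDATE                      # known key: update the target row
    return DD_INSERT                          # unknown key: insert a new row

existing = {1, 2}
rows = [
    {"customer_id": 1, "is_cancelled": False},
    {"customer_id": 3, "is_cancelled": False},
    {"customer_id": 2, "is_cancelled": True},
]
print([flag_row(r, existing) for r in rows])  # ['update', 'insert', 'delete']
```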


Unconnected Lookup
In which situation do we use unconnected lookup?
Latest Answer: The advantage of using an Unconnected Lookup is that the lookup will be executed only when a certain condition is met, which can improve the performance of your mapping. If the majority of your records will meet the condition, then an unconnected ...
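As a rough analogy (Python, not Informatica syntax) for why a conditional, unconnected lookup can be cheaper: only the rows that actually need the value pay the cost of the lookup call. The price-list example is invented:

```python
price_list = {"A100": 9.99, "B200": 24.50}   # stands in for the lookup table

def lookup_price(product_code):
    """Stand-in for the lookup call; only executed when actually needed."""
    return price_list.get(product_code)

def enrich(row):
    # Condition: only rows missing a price incur the lookup, mirroring how an
    # unconnected lookup is invoked from an expression only when required.
    if row.get("price") is None:
        row["price"] = lookup_price(row["product_code"])
    return row

rows = [{"product_code": "A100", "price": 5.00},
        {"product_code": "B200", "price": None}]
print([enrich(r) for r in rows])
```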



Dependency Problems
What are the possible dependency problems while running a session?
Latest Answer: A dependency problem arises when the output of one process is the input to another process. If the first process stops, it causes a problem for, or stops, the dependent process. One process depends on another, so if one process is affected then ...

Error handling Logic
How do you handle error logic in Informatica? Which transformations did you use while handling errors? How did you reload those error records into the target?
Latest Answer: Bad files contain a row indicator and a column indicator. Row indicator: it generally appears when working with the Update Strategy transformation; the writer/target rejects the rows going to the target. Column indicator: D - valid, O - overflow, N - null, T - truncated. When ...

Enterprise Data Integration

This article discusses the importance of a data services layer built upon an enterprise data integration (EDI) platform for service-oriented architecture (SOA).

A large company found itself handicapped by an ornery snarl of siloed applications that compromised its agility, performance and profitability. Its IT department was constantly behind schedule and over budget in hand-coding point-to-point connectivity among supply chain, financials, CRM, and other packaged and custom-built legacy applications.

The solution: Integrating critical business processes and applications by adopting a service-oriented architecture, or SOA. Internal IT personnel and consultants engineered a loosely coupled infrastructure, with reusable services based on XML and standard Web services protocols such as SOAP and WSDL.

Informatica Training

BI-Training.com is the premier onsite training firm for the Informatica suite of business intelligence products. We offer Informatica training at your location. In addition, we provide consulting services for the design and implementation of the server products, along with the design and development of complex reporting and analysis solutions utilizing the entire Informatica family of software products.

Saturday, February 13, 2010

Step 6: Configure domain

After finishing the installation of the Informatica client, follow the procedure below.
Open Informatica Repository Manager.


Select the repository service from the right-hand pane and click OK.


Connect to the repository service.



Enter the repository username and password.

Creating folder:



