Microsoft Business Intelligence Stuff: May 2014

Friday, May 30, 2014

SSAS cube dimension processing failure.

I was just developing a SSAS cube with relatively no problems. Once I began to process the dimensions , the Cube processing failed and returned the following errors...

OLE DB error: OLE DB or ODBC error: The SELECT permission was denied on the object 'dimTask', database 'Project', schema 'dbo'.; 42000.

Impersonation Information Tab

The first thing I checked was the Impersonation Information tab on the Data Source.

Secondly, I then looked in SSMS and needed to verify that this account had the appropriate SQL Security role to process data from the Project DB (or whatever database). To resolve this problem, I opened up SQL Server Management Studio (SSMS), Under Security, Select the specified account (SSASProc in this scenario), Select the Server Roles in the Login Properties page. I then gave this account sysadmin server role.

SQL Server Management Studio

This solved my SSAS Cube processing problem. I hope this works for you.

Wednesday, May 28, 2014

Simple Performance Monitoring for SSIS

I come from an operating system support and monitoring background so I’ve always been aware of the manageability tools available in Windows operating systems and the tools available on the market. I also have learned that the native tools in windows have improved so much over the last decade that anyone can build a relatively simple yet power monitoring process.

Performance Monitor, System Monitor, or Reliability and Performance Monitor (or whatever we call it next)

Performance Monitoring (or whatever we want to call it today) has been introducing since Windows 3.51 and has gotten better over the releases on subsequent Windows operating systems. It’s still relatively complex for users that don’t understand operating systems components, but even with some basic computer OS knowledge (i.e. I know what I CPU, RAM, and Disk is used for) you can gather and understand some of the basic information performance statistics. In our discussion, I’ll speak about SSIS related performance stats.

Data Collector Sets via Permon.exe

Within Performance Monitor application, there is an additional toolset called Data Collector Sets. Data Collector Sets are used to establish what will be collected, how frequently, and in which format.

What to Collect?

Adding SSIS Performance Counters

This will depend heavily on what your role is and what needs to be monitored from a performance standpoint. From a purely SSIS standpoint (specifically SSIS 2008), I recommend the following performance counters to be captured…

SQLServer:SSIS Pipeline 10.0 - Buffers in use

SQLServer:SSIS Pipeline 10.0 -Rows read

SQLServer:SSIS Pipeline 10.0 -Buffers spooled

SQLServer:SSIS Pipeline 10.0 - Rows written

SQLServer:SSIS Service 10.0 – SSIS Package Instances

These five counters will give you a good view of the overall performance of your SSIS package processing. Buffers represent the amount of memory available to do SSIS work. So if you allocate 1MB for activity the Buffers is use should present up to 1MB. This is useful because it tells you how much memory is being used and if it has met the upper limit that has been allocated. This gets a bit more complicated as you add more packages and run them in parallel because if you allocate different amounts a memory between packages it would be difficult to see if one package has maxed out is assigned amount of memory. Its gets even more complicated if you commonly specify that DefaultMaxBufferRows exceeds the available to the DefaultBufferRows in the package setting. Because of this, we need to look at Buffers Spooled to see if we are sending SSIS work to disk (very bad).

How Frequently?

The frequency of how you should monitor SSIS activity depends on when it executes, how many packages are being monitored, and how much disk space you are willing to assign to this monitoring process. Since the premise around this post is using Windows performance counters to address large SSIS installations, this assumes that your environment SSIS processing is an issue. Therefore, a collection interval of about 15 seconds would be suitable. If your packages typically take a few minutes to process, we don’t need to monitor this activity as that would generally be a suitable processing expectation. However, SSIS processing that takes hours is where we will have the most use of collecting this performance data.

What Format?

As a SQL guy, my preferred method would be to use a SQL format which allows you to insert the data into an existing SQL Database. To have this work, you need to have to establish a DSN so that the Data Collector Set properties can connect and insert to the targeted database. If you don’t have access to a database DSN (which I would not be surprised as most BI developers will not often have created these) then you can simply retain the performance monitor format (*.mlb) or write the results to a CSV file.

Overall

Get in the habit of using the native OS tools to help access the performance issues within your BI environment. Although I spoke specifically on SSIS, Data Collector sets can be created for other BI applications SSAS and SSRS processing/usage as well.

Saturday, May 17, 2014

SSIS Lookups with Integers vs. non-integers

One of the most common SSIS Data Flow objects that a SSIS developer will use is the Lookup Data Flow task. When I’m developing for your typical data warehouse project where we need to populate the fact tables by joining the primary surrogate keys to foreign surrogate keys, or primary keys to foreign keys, we typically (or should always) use integers as surrogate or primarily key. Now this may be obviously since primary and surrogate keys often are identity type columns and the required data type is integer. However, there is nothing stopping you from using columns that are not integer data types as lookup relationships. In the image below, you can see the typical mapping relationship for SKU (In this example SKU was a Varchar data type)

Naturally, you would then create another data flow task that will sent matches columns to a slowing changing dimensions (SCD) data flow task, and if you’re using the SCD data flow task provided by SQL 2012, then your likely going to use the same column as your business key for the SCD wizard. This would subsequently create the OLE DB Command that uses the selected business key as the joining column. This is where the problem starts because you are now joining non-integer types in SSIS. This will become a major problem when you start to use lots of data as the updates will be extremely slow. The image below points to the Where statement that should only use integers

Now I’m not saying using non-integer data types won’t work, I’m, saying that it will work SLOWLY compared to integers. For example, just updating 65K rows had taking about 6 minutes compared to less than 10 seconds in my labs. Yes, it’s really a night and day difference!

Basically, avoid using alphanumeric/strings (or essentially any non-integer data type) for lookups relationships when possible. This is particularly true when using OLE DB Command (for your SCD operations) to update tables that match when doing the lookup. You'll find DBAs that will say it makes no difference. However, when you'll joining fact tables that could contain millions of rows, using integers makes the big difference.