SQL with Excel

This is a response to an inquiry in the U2 tech forum about the need to use [Worksheet$] syntax in a SQL query, and the idea that an Excel worksheet is treated like a System Table. This isn’t really a U2 topic so I thought I’d tackle it here.

From: Ray Wurlod
To get a list of tables from Excel you must specify that you want to retrieve system tables. For some arcane reason best known to Microsoft and not revealed to the great unwashed, Excel worksheets are reported to the ODBC driver as system tables (and therefore get their names prefixed with "$").

My intent here isn’t to respond to Ray directly. I’m grateful that he’s provided a base for some notes here on a topic where I otherwise don’t get much opportunity to share information.

Let’s take this in steps.

  1. In SQL queries, Worksheets as tables are suffixed, not prefixed as indicated above, and surrounded in brackets:
    select * from [SheetName$]
  2. Without the brackets and dollar sign the table reference is a named range:
    select * from NamedRange
  3. I believe Ray is mistaken when he says you must specify that you want to retrieve system tables. If you ask for all tables in an Excel workbook you’ll get named ranges reported as tables and worksheets reported as system tables. With ADO you can choose to retrieve all worksheets, minus the named ranges, if you specify that you want only system tables.
    OpenSchema(adSchemaTables, Array(Empty, Empty, Empty, "System Table"))
    That said, according to Microsoft Knowledge Base item #257819 "this does not work against an Excel data source with MDAC versions later than 2.0". This is a point that bears investigation, and I don’t really care about details, I’m just pointing out the anomaly that something that "must" be done is documented as something that "does not work".
  4. The character used in Excel formulas to delimit a worksheet from cell references is the exclamation mark:
    =SheetName!A1:C2
    They couldn’t use the dollar sign in formulas because it’s a reserved character to lock cell references:
    =$A$1:$C$2
  5. So why not use the exclamation mark for SQL too?
    Because that’s already a reserved character to delimit a database name from a table:
    select * from OtherDatabase!SomeTable
  6. Why brackets? Why not just use Worksheet$ for a table name?
    The use of brackets is different in each SQL implementation. Oracle may return errors if brackets are used, as does ODBC with SQL Server. But OLEDB requires brackets around table names for access to some (not all) database types. *sigh* Let’s consider the brackets one of those things that just ‘is’.
    That said, you can also use back-ticks (reverse single quote) but Microsoft recommends brackets:
    `SheetName$`
    Note the quote is the same one on the USA-English keyboard as the tilde character (~), it’s not the one under the double-quote.

  7. Why the dollar sign?
    The dollar sign isn’t a reserved character in SQL table names, so the trailing $ is harmless enough and that probably gave Microsoft one reason to consider using it for worksheet references.
  8. Ray notes that Worksheet$ is reported by ODBC as a "System Table". Indeed, KB257819 also says "worksheets are treated as ‘System Tables,’ and there is not much useful table information you can retrieve beyond this ‘table_type’ property".
    A system table is a data dictionary which contains metadata about a table. In Oracle, sys tables are referenced as Table$.
    While an Excel worksheet contains data, it also contains other things like formulas and formatting. One could dare to call that metadata, so the term System Table isn’t far from wrong here. But when we use SQL to query this Worksheet$ file, we’re not doing a query on the metadata, we’re requesting data from the cells exposed by the worksheet. It seems Table$ serves a dual purpose as both a metadata provider and a data provider.

Summary

  • It was good and important for Microsoft to note a distinction between this unique object type and other relational datasource/tables, especially since Excel as a data source has limited SQL query capabilities.
  • How else would they implement the distinction except by treating a worksheet like a System Table in software and documentation? Introducing special hooks to ADO and SQL would only earn Microsoft more criticism for abuse of standards.
  • The trailing dollar sign works as required, and as a bit of a bonus for consistency it gives Excel worksheets the same syntax as an Oracle system table.
  • I don’t know how many people really care about this narrow topic, but given the wealth of info available on this topic it seems it’s not as arcane as the two or three of us thought.

Here are a few other references that might be helpful on these topics:

http://support.microsoft.com/kb/321686
http://www.sql-server-helper.com/tips/read-import-excel-file-p01.aspx
http://www.databasejournal.com/features/mssql/article.php/3508881

If you’re working with MV and Excel, I welcome you to check out the new NebulaXLite.

Leave a Reply