Solved Linking/Importing Excel File w/ "Dirty Data" Into Access (1 Viewer)

EzGoingKev

Registered User.
Local time
Yesterday, 22:34
Joined
Nov 8, 2019
Messages
141
Good morning all.

There is an Excel file that was created and is maintained by multiple coworkers. I have to work with the data in that file so I currently have it as a linked table in my Access db.

I ran my query and found empty cells where there should be data. Looking at the linked table I am seeing #NUM! where there should be data. I went into the Excel file and the data is there with the little green indicator letting me know that the data is a number stored as text.

Access does not give me the option to format any of the data when I set it up as linked table.

I then converted the linked table to a local table. I played around with the different number formats and tried converted those fields to text (I do not need them to be a number). I imported the data using DoCmd.TransferSpreadsheet in VBA but I got import errors and it dropped the data from the cells where they are numbers stored as text.

I know I can manually go in an format the data in the sheet but this sheet has (90) fields. If they screw it up in one place it is only a matter of time until it is screwed up somewhere else.

How can I bring the data in without some of dropping out?

GETTING THESE GUYS TO KEEP THE DATA CLEAN IS IMPOSSIBLE SO PLEASE LETS NOT GET INTO THAT.
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 10:34
Joined
May 7, 2009
Messages
14,111
How can I bring the data in without some of dropping out?
cleaning through vba.
you can have a mapping Table for the columns and their datatype.
on vba you can inspect each rows, columns of the worksheet against
your defined table column datatype and convert the cell value when
necessary.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Yesterday, 21:34
Joined
Feb 28, 2001
Messages
20,859
Been there, done that, wore out that T-shirt...

Part of the problem may be that you are EzGoingKev. If you were HardAssKev they might listen. But all kidding aside, this is an unstable situation for which we need to know just a bit more. How important is this dataset to your company or your department or whomever it serves?

A shared spreadsheet doesn't share very well because as far as I recall, it is difficult verging on impossible for two people to be in there editing at the same time. Excel doesn't share write-locks. It is a many-read/one-write model if I remember correctly.

You can do as ArnelGP suggests, but there is the problem that these coworkers might have serious inconsistencies on data entry even into the same column on two different rows. As long as you are using Excel, you CANNOT force them into entering decent data. That is because the internal structure of an Excel cell is that EACH CELL is an independent entity. Its value CAN reference some other cell via some formula - but doesn't have to. Its value CAN conform to some column format selection - but doesn't have to. It can have data filled in - but doesn't have to. Short of writing some VBA behind the spreadsheet, you have ZERO force to apply.

So the first thing to do is talk to the person who wants this data to be "worked with." If you are doing this on your own initiative, the sad news is that you have a LONG road ahead of you. But if your boss really wants the information you can provide, have a talk about the department's true needs. And if the boss is on board, talk about the chaotic way that things are being entered. Point out that you spend more time fixing someone else's mistakes than in doing this analysis.

IF the boss really wants to resolve the chaos, the solution might be to get rid of the spreadsheet and make a shared front-end interface to an Access back-end data file. The forms used to enter data can then pre-filter user input and give them error messages when they enter data like a bunch of drunken monkeys. I've been through this and I can assure you it is no picnic. They will do a screaming howler monkey dance on your desk at first. But if the data becomes better regulated and they can start seeing benefits, perhaps you will win them over, however slowly.

The only other way I've ever managed this is a slow and very tedious method involving opening an Excel Application Object, opening the workbook, activating the spreadsheet, and then in a nested loop, "walk" through each column one row at a time and look at each cell's contents. Then convert the contents to an internal variable. At the end of each row do a recordset .AddNew / .Update and take the next outer loop (to a new row). The sad part is that this is where your have the hard road. If the data sets really ARE as bad as you say, that "convert the contents" step will be like meeting Freddy Kruger on "A Nightmare on Access Street."
 

EzGoingKev

Registered User.
Local time
Yesterday, 22:34
Joined
Nov 8, 2019
Messages
141
cleaning through vba.
you can have a mapping Table for the columns and their datatype.
on vba you can inspect each rows, columns of the worksheet against
your defined table column datatype and convert the cell value when
necessary.

That sounds like a plan. Would it require :
- "blanket" code that automatically looks at each column and adjusts as needed
- Individual code be written for each one of the (90) fields

Been there, done that, wore out that T-shirt...

There is nothing here. When I say nothing I mean there is no leadership. There are no practices/procedures. There is no standardized format for anything. My company is a collection of multiple companies they purchased. They are all under one banner but they still run their day to day like they are individual companies. No understands data standards. There is no discipline here.

As for the locks and stuff, the spreadsheet is saved in Sharepoint. I do not have any experience with using Sharepoint other than when I need the data I go in and download a copy on my local machine. I do know that I can view it but do not have permission to edit the data.

I believe the cause of my current situation is the field in the sheet is set up as General. The ones that are in there as a number stored as text are a result of someone cutting and pasting data out of somewhere else.

If I cannot fix it myself it will not get fixed.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Yesterday, 21:34
Joined
Feb 28, 2001
Messages
20,859
At this point, you won't get management support, so the Excel App Object may be the way to go if you can't do anything with queries.

When I say nothing I mean there is no leadership. There are no practices/procedures. There is no standardized format for anything.

I was once in a U.S. Navy office like that, right after the division had been "re-aligned." Navy upper echelon management can NEVER leave well enough alone, possibly because "well enough" occurs so rarely that they don't recognize it when they see it. But in your case, that means you will have to live with inconsistent input and will need to put some smarts into the deciphering of whatever you find in those "GENERAL" cells.
 

Pat Hartman

Super Moderator
Staff member
Local time
Yesterday, 22:34
Joined
Feb 19, 2002
Messages
33,223
One simplistic method I've used is if I don't get so many files that I actually need to automate the import, I open the Excel workbook and Select all the non-date columns and convert the type from general to text. Then I select the date columns and make sure they are a valid date data type. Save and close. Now you can import all the numbers as text and Access won't complain. Relink the spreadsheet and Access will "see" the columns as text rather than numeric. When you link a spreadsheet, Access and Excel collaborate or conspire to determine the data type of each column using the first 30 rows. That is why you are having trouble with numeric values. If the first bunch are actually numeric, Access decides the column is numeric and therefore all values must conform unlike Excel which is way too flexible regarding data type.

Once you can reliably fix the spreadsheet manually, you can attempt to automate the process by opening Excel and using VBA to do what needs to be done. Since I am not an Excel expert, I generally do this first in excel and turn on the macro recorder. Then when I'm writting the VBA code, I use the VBA generated by the macro recorder. Sometimes I can just pop it in. Sometimes I need to fix how objects are referenced since it is the difference between referencing an object within the object (in excel) vs from outside (Access looking in).
 

EzGoingKev

Registered User.
Local time
Yesterday, 22:34
Joined
Nov 8, 2019
Messages
141
I use the data in the Excel to build (7) individual data sets that are used as load sheets to get our data into another system. I have a macro set up that runs all the code/queries to get it done.

I am looking for something that can be added to that macro so it is done automatically. I do not build these sheets all the time so my concern is I would forget it to manually change things and the data would have holes where it should not. I am down for the best way to do it, I just do not know how to do it.
 

Pat Hartman

Super Moderator
Staff member
Local time
Yesterday, 22:34
Joined
Feb 19, 2002
Messages
33,223
Try my suggestion. Open one of the spreadsheets. turn on the macro recorder. Step through the fix up process. Turn off the recorder. Lift the code into Access. Sometimes you can port it directly. Other times, you need to change how cells are referenced. but it gives you a start.
 

Isaac

Lifelong Learner
Local time
Yesterday, 19:34
Joined
Mar 14, 2017
Messages
5,774
Use DoCmd.TransferSpreadsheet with a saved Import Spec, destined for an Access table where virtually all columns are Long or Short text.
(Access will find fault with a lot of data that "cannot be converted to date", or "cannot be converted to number" - etc - but it will have a much harder time saying "could not be converted to text", since nothing can't be converted to text).

Then go from there with queries to transform as needed.

I'd skip the Linking entirely in your case, because you have inconsistent data, and in that scenario, querying a Link can be pretty challenging.

But also try to commandeer their spreadsheet for a few hours and do the best you can with Data Validation or other sneaky tricks to disallow garbage.
 
Last edited:

EzGoingKev

Registered User.
Local time
Yesterday, 22:34
Joined
Nov 8, 2019
Messages
141
Use DoCmd.TransferSpreadsheet with a saved Import Spec, destined for an Access table where virtually all columns are Long or Short text.
(Access will find fault with a lot of data that "cannot be converted to date", or "cannot be converted to number" - etc - but it will have a much harder time saying "could not be converted to text", since nothing can't be converted to text).

I went to try your idea and Access. When I import a text file there is an Advanced option that lets me save the import specs I have set up. I am not seeing that when I import an Excel file. All I have is the "Save Import Steps" at the end. Would this be the same thing?
 

Isaac

Lifelong Learner
Local time
Yesterday, 19:34
Joined
Mar 14, 2017
Messages
5,774
I went to try your idea and Access. When I import a text file there is an Advanced option that lets me save the import specs I have set up. I am not seeing that when I import an Excel file. All I have is the "Save Import Steps" at the end. Would this be the same thing?
I just realized I was wrong, you might not be able to use a saved import spec with docmd.transferspreadsheet. Sorry about that.

However, you could write a few lines of code to open the spreadsheet, save it as CSV (see top answer)or Text, and then import that - save an import spec while doing so - then use docmd.transfertext, thereafter, with the saved import spec as one of the parameters.

Alternately, you still might be able to implement most of my idea - except without the saved import spec.
Try just using docmd.transferspreadsheet - let Access try to figure stuff out - except have the destination table be one with 100% text columns. Access might be totally successful. Then you can use queries to figure out how to handle the bad data
 

EzGoingKev

Registered User.
Local time
Yesterday, 22:34
Joined
Nov 8, 2019
Messages
141
I just realized I was wrong, you might not be able to use a saved import spec with docmd.transferspreadsheet. Sorry about that.

However, you could write a few lines of code to open the spreadsheet, save it as CSV (see top answer)or Text, and then import that - save an import spec while doing so - then use docmd.transfertext, thereafter, with the saved import spec as one of the parameters.

Alternately, you still might be able to implement most of my idea - except without the saved import spec.
Try just using docmd.transferspreadsheet - let Access try to figure stuff out - except have the destination table be one with 100% text columns. Access might be totally successful. Then you can use queries to figure out how to handle the bad data

After you posted that I did the import and saved the steps. I tried using DoCmd.RunSavedImportExport but still had a problem.

The Excel file has has multiple sheets so it would need more steps to convert it to a CSV.

I found this here and tried this with one of options listed about 3/4 of the way down -

SQL:
DoCmd. RunSql "INSERT INTO mytablename SELECT T1.*, 1 AS SheetSource FROM [Excel 10;HDR=YES;IMEX=1;Database=C:\MyFolder\MyFileName.xlsx].[WorksheetName$] as T1"

I guessed on the "Excel 10". I am using Office 365.

Running that gave me a "Run-time error '3170': Could not find installable ISAM."

Not sure if I can get this to work and if it will do what I am trying to do.
 

Isaac

Lifelong Learner
Local time
Yesterday, 19:34
Joined
Mar 14, 2017
Messages
5,774
If I remember correctly, either @pbaldy or @CJ_London have considerable experience using those connection strings that reference external files. Perhaps they will chime in.

I've never used RunSavedImportExport, a newer feature I think, so can't comment on that, just never had a use for it.

The Excel file has has multiple sheets so it would need more steps to convert it to a CSV.
Worksheet.Copy 'this copies it OUT of the primary workbook, and INTO a new workbook, which becomes ActiveWorkbook
ActiveWorkbook.SaveAs (xlCSV) 'should work

Or more reliably, without using Active in Excel vba (good thing to avoid):
PrimaryWorkbook.Worksheets("Sheet1").Copy after:=Secondaryworkbook.Worksheets(SecondaryWorkbook.worksheets.count)
'guarantees it will become the last sheet in the secondary workbook
Set ws = SecondaryWorkbook.Worksheets(secondaryworkbook.worksheets.count)
for each worksheet in secondaryworkbook.worksheets
...if worksheet.name<>ws.name then
delete it
end if
secondaryworkbook.saveas (xlcsv)

....is another way to do it. I'm using a workbook.saveas > CSV right now in a project to prepare CSV's for sql uploads, the save happens pretty fast. Excel has one weird habit, though...it only "quotes" those columns that need quoting due to quotation marks in the value...which means I also use automation to do a little Replace() manipulation on the CSV file before uploading it to sql...but that's another story.

Have you tried simply DoCmd.TransferSpreadsheet into an Access table with all Text columns?
 

bastanu

AWF VIP
Local time
Yesterday, 19:34
Joined
Apr 13, 2010
Messages
865
Maybe you could upload a sample of the spreadsheet (no sensitive data) just the sheet names with the fields and a few "dummy' records to illustrate the problem.

If the main\only problem is numbers saved as text there are many examples of VBA code to address that, here is one that provides a variety of solutions: https://stackoverflow.com/questions/36771458/vba-convert-text-to-number

You would have to open the Excel file in VBA, loop through the sheets, run one of the functions to fix the numbers as text in each of the columns affected ,save it then import it and move to the next sheet.
Cheers,
 

EzGoingKev

Registered User.
Local time
Yesterday, 22:34
Joined
Nov 8, 2019
Messages
141
Try my suggestion. Open one of the spreadsheets. turn on the macro recorder. Step through the fix up process. Turn off the recorder. Lift the code into Access. Sometimes you can port it directly. Other times, you need to change how cells are referenced. but it gives you a start.
I did that and have the code but do not how to use it in VBA.
Have you tried simply DoCmd.TransferSpreadsheet into an Access table with all Text columns?

Yes I did that before I posted on here.
 

Isaac

Lifelong Learner
Local time
Yesterday, 19:34
Joined
Mar 14, 2017
Messages
5,774
Personally, I think at this point, I would go the route of saving a worksheet to CSV, then manually importing the CSV into Access all text columns, then thereafter use DoCmd.TransferText with the named import spec for import. This gives you full control and gets your raw data into a database as early as possible.

You can go the route of manipulating the Excel data first, but to me that raises more questions. Say your goal is to process a column in Excel that ought to contain all Integers. Your code comes to one particular cell, and instead of an Integer, it says "spaghetti". What will you do? You can't convert "spaghetti" to integer anyway. Will you delete the entire row? Is that acceptable to the business, or would it be better to divert it off to some Exception Process? If the latter, you're better off getting it inside the database before trying to remove or divert it.

On the other hand, IF the only problem is, some numbers in Excel are saved as text and that's the only problem you have, then manipulating it in Excel wouldn't be too bad. For integer example: Loop through a Range, save the CINT(rng.value) to an integer-typed variable, then rng.NumberFormat="0", then rng.value=SavedVariable.
 

pbaldy

Wino Moderator
Staff member
Local time
Yesterday, 19:34
Joined
Aug 30, 2003
Messages
35,383
If I remember correctly, either @pbaldy or @CJ_London have considerable experience using those connection strings that reference external files. Perhaps they will chime in.

Hopefully it's CJ. ;)

I do a fair amount of importing from Excel, but I've never used a connection string like that. Depend on specifics, I either pull it into a local staging table as Isaac described, link to it and then import, or use automation. As Isaac suggested, with a staging table you can analyze before importing. I'd have to have a lot of confidence in the source data to pull it in directly without checking it.
 

Isaac

Lifelong Learner
Local time
Yesterday, 19:34
Joined
Mar 14, 2017
Messages
5,774
Ahh yes, double checking my guess shows that it must be CJ.
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 10:34
Joined
May 7, 2009
Messages
14,111
demo. extract both db and .xlsx file
to same folder.

check mappingTable records.
the records correspond to which
column in the excel file.
the "expression" field is the way
we want to convert each cell in excel file.

see the excel file before converting.

open module1 and run test() sub.
 

Attachments

  • Clean and Import Excel.zip
    39.9 KB · Views: 133

EzGoingKev

Registered User.
Local time
Yesterday, 22:34
Joined
Nov 8, 2019
Messages
141
Thanks for the replies guys.

@arnelgp - I played with what you sent me and it worked on doing a quick test.

I have something else I need to get done before a meeting tomorrow so I will not get a chance to do anything with this until after that.
 

Users who are viewing this thread

Top Bottom