With VBA, grab the file name automatically generated in the "Save As" IE pop-up

Patrick77

New member, joined Mar 7, 2017, 6 messages
Hello,
I am relatively new to using VBA for web scraping. I am trying to collect information from behind a password-protected site and have been successful for the most part. There is one problem that I have not been able to figure out.

I want to obtain the names of linked files when those names do not exist in the HTML. When I physically click on the link in Internet Explorer, the file name does appear, and I want to be able to grab it as a variable.

I am able to trigger the event such that the "Do you want to open or save ..." pop-up appears. However, I have no idea how, or even whether, it is possible to grab the "save as" name that appears.

Here is a *stupid* example that ends in the save as window appearing.
I mention why it is stupid in the code below, but will repeat here. My real example does NOT have the file name in the HTML or URL used to trigger the file download - the example below does (just pretend it doesn't!).

Thank you for any ideas on how to get the file name!


Code:
Public Sub Example01()

    Dim dbsSCF As DAO.Database
    Dim rstSCF As DAO.Recordset
    Dim IE As Object
    Dim startTime As Date

    Set dbsSCF = CurrentDb
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True

    ' Not necessary, but this is the web page where the file link exists
    IE.Navigate "https://cran.r-project.org/package=abc"

    ' Wait while the page loads; give up after 60 seconds
    startTime = Now
    Do While (IE.Busy Or IE.ReadyState <> 4)    ' 4 = READYSTATE_COMPLETE
        DoEvents    ' let the host application process events while polling
        If DateDiff("s", startTime, Now) > 60 Then
            MsgBox "Webpage taking too long to load; please check connection and try again."
            Exit Sub
        End If
    Loop

    ' Note: this is a stupid example because the file name is part of the URL.
    ' The real example I am looking at does not have the file name as part of the URL.
    IE.Navigate "https://cran.r-project.org/web/packages/abc/../../../bin/windows/contrib/3.4/abc_2.1.zip"

    ' What I want is to be able to get the file name some other way.

'    IE.Quit
'    Set IE = Nothing

End Sub


As a last resort, I know I could simply download the file and then get the name from that, but this whole process is a loop and I would need to be able to link 10+ downloaded files with the loop iteration.
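For that last-resort route, one way to tie each download to its loop iteration might be to check the download folder right after each save and record the newest file. This is only a sketch: the function name `NewestFileName` and the idea of a dedicated download folder are assumptions, and it does not handle confirming the IE prompt or waiting for the download to finish.

```vba
' Hypothetical helper: returns the name of the most recently modified file
' in folderPath, or an empty string if the folder contains no files.
Public Function NewestFileName(ByVal folderPath As String) As String
    Dim fso As Object
    Dim f As Object
    Dim newest As Date      ' defaults to the zero date, older than any real file

    Set fso = CreateObject("Scripting.FileSystemObject")

    For Each f In fso.GetFolder(folderPath).Files
        If f.DateLastModified > newest Then
            newest = f.DateLastModified
            NewestFileName = f.Name
        End If
    Next f
End Function
```

Calling it once per iteration, after the download has completed, and storing the result next to the loop counter would give the file-to-iteration link.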
 
What they are doing is often done to prevent bots from scraping.

Do you have permissions from the site to scrape it?
 
Hi,
Yes I do; I have full access to the site for work purposes.

It just happens that the site isn't the best for historical record keeping; things tend to be pulled down at random times and are then gone forever.

My script is essentially meant to grab service request details every day, and when those requests include Excel files, I want to know the exact name of each file. I need this for accuracy checks against my co-workers who completed and uploaded the file.

I haven't messed with this since, but I suppose I could also have my code open the file and then grab the name of the open Excel file. The caveat would be that no other Excel file could be open at the time, which is not ideal.





 
IE.Navigate fires the BeforeNavigate2 event, where you can get the URL.
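A minimal sketch of catching that event, assuming a class module (hypothetically named clsIEWatcher) and a project reference to "Microsoft Internet Controls". Note that it only captures the URL being navigated to, not the name shown in the save prompt.

```vba
' Class module, hypothetically named clsIEWatcher.
' Assumes a reference to "Microsoft Internet Controls" (SHDocVw) is set.
Option Explicit

Private WithEvents mIE As SHDocVw.InternetExplorer
Private mLastUrl As String

' Hook this object up to an existing browser instance.
Public Sub Attach(ByVal browser As SHDocVw.InternetExplorer)
    Set mIE = browser
End Sub

' The most recent URL seen by the event handler.
Public Property Get LastUrl() As String
    LastUrl = mLastUrl
End Property

' Fires before each navigation, including ones triggered by clicking a link.
Private Sub mIE_BeforeNavigate2(ByVal pDisp As Object, URL As Variant, _
        Flags As Variant, TargetFrameName As Variant, PostData As Variant, _
        Headers As Variant, Cancel As Boolean)
    mLastUrl = CStr(URL)
    Debug.Print "Navigating to: " & mLastUrl
End Sub
```

To use it, keep a module-level clsIEWatcher variable alive, call Attach with the browser object, and then navigate or click the link; LastUrl holds the most recent URL.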
 
Hi Static,
Thank you for the response. I might have made my example more confusing than it needs to be. I do not need the URL; I have that. I need the actual file name, which does not exist in the HTML code or in the URL that downloads the file.
 
Is the site actually located on "https://cran.r-project.org/package=abc"?
 
No.
This is just used as example code that gets to the same basic point - where the "save as" pop-up window occurs.

The actual site is a secure URL that I can't use as an example.

 
A path to a file is still a URL. Clicking an anchor <a href='...'> fires a navigate event, which you can catch with the BeforeNavigate2 event. The URL (href) in that case is the link to the file.

Open the site in a proper browser and use Inspect Element instead of View Source. If it's not an anchor, the above probably won't work, but you should be able to access it through the DOM.
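As a rough illustration of going through the DOM, the sketch below lists the text and href of every anchor on the loaded page. The procedure name `ListLinks` is made up for this example, and it assumes IE is an InternetExplorer object whose page has finished loading.

```vba
' Hypothetical helper: prints the visible text and target of each <a> element.
' Assumes IE is an InternetExplorer instance with a fully loaded document.
Public Sub ListLinks(ByVal IE As Object)
    Dim anchors As Object
    Dim i As Long

    Set anchors = IE.Document.getElementsByTagName("a")

    For i = 0 To anchors.Length - 1
        Debug.Print anchors.Item(i).innerText & " -> " & anchors.Item(i).href
    Next i
End Sub
```

If the link text or another attribute carries the file name, it would show up in this output even when View Source does not display it.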
 
