Scraping some text off an Amazon webpage...

isladogs

MVP / VIP
Joined
Jan 14, 2017
Messages
18,186
Thanks for the links.
I've had a quick glance but will look at them properly over the weekend.

Do you know if the data can be exported as JSON files?
It only seems to mention XML.

peskywinnets

Registered User.
Joined
Feb 4, 2014
Messages
576
isladogs said:
Do you know if the data can be exported as JSON files?
It only seems to mention XML

I think they only use XML, which is a massive frustration. I came to their APIs totally new and spent ages learning how to parse the XML and use the result within Access. I figured that other platforms/marketplaces would take a similar approach, only to find that JSON seems to be the future!

My intended web cart is Shopify or something similar, and once migrated I'd like to be able to pull in customer orders. Here's Shopify's reference:

https://help.shopify.com/api/reference/order#show

Yep, all JSON!
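
To make the XML-versus-JSON point concrete, here is a small Python sketch (Python rather than VBA so it can be run standalone). The order record and its field names are made up for illustration; they are not the Amazon or Shopify schemas. Both parsers end up with the same structure; the difference is that XML has to be walked element by element with its text converted, while JSON maps straight onto lists, dictionaries and numbers.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical order record: field names are illustrative only and are
# NOT taken from the Amazon or Shopify schemas.
ORDER_XML = """<Order>
  <OrderId>1001</OrderId>
  <Email>customer@example.com</Email>
  <Item><Sku>SKU-1</Sku><Quantity>2</Quantity></Item>
</Order>"""

ORDER_JSON = """{
  "order_id": 1001,
  "email": "customer@example.com",
  "items": [{"sku": "SKU-1", "quantity": 2}]
}"""


def order_from_xml(text):
    """Walk the element tree and convert text nodes to the types we need."""
    root = ET.fromstring(text)
    return {
        "order_id": int(root.findtext("OrderId")),
        "email": root.findtext("Email"),
        "items": [
            {"sku": item.findtext("Sku"), "quantity": int(item.findtext("Quantity"))}
            for item in root.findall("Item")
        ],
    }


def order_from_json(text):
    """JSON already maps onto dicts, lists, numbers and strings."""
    return json.loads(text)


print(order_from_xml(ORDER_XML) == order_from_json(ORDER_JSON))  # True
```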
 

isladogs

MVP / VIP
Quick update

I've modified the database slightly and tested all 5 examples successfully.
Time taken: approx. 1 second per search.

There are a couple of bugs to sort out, which I'll look at over the weekend.
Once I've done so, I'll post the new version.

Attachments: Form.PNG, Table.PNG

jdraw

Super Moderator
Staff member
Joined
Jan 23, 2006
Messages
15,364
Guys,

I took the function Pesky got from analystcave and modified it as below.

Code:
'---------------------------------------------------------------------------------------
' Procedure : AsinSoldByAmazon
' Author    : mellon
' Date      : 27-Oct-2017
' Purpose   : To see if an ASIN (Amazon Standard Identification Number) is Dispatched and Sold By Amazon
'
'This function returns:
'    True if product sold by amazon
'    False if product not sold by amazon, or if the merchant-info element is missing
'---------------------------------------------------------------------------------------
'
Public Function AsinSoldByAmazon(ReqASIN As String) As Boolean
    Dim tempINfo As String
    Dim BaseURL As String
    Dim id As String
    Dim XMLHTTP As Object, html As Object, objResult As Object
10    On Error GoTo AsinSoldByAmazon_Error

      'Example call: AsinSoldByAmazon("B001KOTNG2")

20    BaseURL = "https://www.amazon.co.uk/dp/"
30    Set XMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
40    XMLHTTP.Open "GET", BaseURL & ReqASIN, False
50    XMLHTTP.setRequestHeader "Content-Type", "text/xml"
60    XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"

70    id = "merchant-info"
80    XMLHTTP.send
90    DoEvents
100   Set html = CreateObject("htmlfile")
110   html.body.innerHTML = XMLHTTP.responseText
120   Set objResult = html.GetElementById(id)
      'No merchant-info element (e.g. an error or block page): return False
      'instead of failing with error 91 on line 130
125   If objResult Is Nothing Then GoTo AsinSoldByAmazon_Exit

130   tempINfo = objResult.innerHTML
140   tempINfo = Mid(tempINfo, 20, 60)    '29,6) ''JED adjusted based on trial and error with "B0009VX1ZG"
150       'Debug.Print tempINfo
160   If tempINfo Like "*Amazon*" Then
170     AsinSoldByAmazon = True
180   Else
190     AsinSoldByAmazon = False
200   End If

AsinSoldByAmazon_Exit:
220   Exit Function

AsinSoldByAmazon_Error:
230   MsgBox "Error " & Err.Number & " in line " & Erl & " (" & Err.Description & ") in procedure AsinSoldByAmazon of Module ModJdraw"
240   Resume AsinSoldByAmazon_Exit

End Function
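
For anyone who wants to try the parsing step outside Access, here is a rough Python equivalent of lines 100 to 160, using only the standard-library html.parser module. It assumes the page HTML has already been downloaded (no HTTP request is made), and it relies on the merchant-info id used above, which Amazon may change at any time. It returns False when the element is missing, and its "sold by Amazon" test is case-sensitive.

```python
from html.parser import HTMLParser

# Elements with no closing tag; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text (not the tags) inside the first element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False   # True once the target element has been opened
        self.depth = 0       # > 0 while the parser is inside the target element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def sold_by_amazon(page_html, element_id="merchant-info"):
    """True if the element's text contains "sold by Amazon" (case-sensitive)."""
    parser = TextById(element_id)
    parser.feed(page_html)
    parser.close()
    if not parser.found:
        return False  # no merchant-info element, e.g. an error or block page
    return "sold by Amazon" in "".join(parser.parts)


page = ('<html><body><div id="merchant-info">Dispatched from and sold by Amazon. '
        '<SPAN>Gift-wrap available. </SPAN></div></body></html>')
print(sold_by_amazon(page))  # True
```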


Here is a test routine to determine if the submitted ASIN is dispatched and sold by Amazon.
Code:
Sub testAmazon()
'FROM Pesky

'B0073H1CIC False
'B00UQ8D5OE False
'B01LWQBKMO True
'B0001IW8TW True
'B0009VX1ZG True
10  Debug.Print "started: " & Now
    Dim j As Integer
    Dim asin(5) As String
20  asin(0) = "B0073H1CIC"
30  asin(1) = "B00UQ8D5OE"
40  asin(2) = "B01LWQBKMO"
50  asin(3) = "B0001IW8TW"
60  asin(4) = "B0009VX1ZG"
70  asin(5) = "B00CEVFMTW"  'I added this one
80  For j = LBound(asin) To UBound(asin) Step 1
90      Debug.Print "j:" & j & "  " & asin(j) & " sold by amazon Is " & AsinSoldByAmazon(asin(j))
100 Next j
110 Debug.Print "Ended: " & Now
End Sub

Here are the results showing execution time:

Code:
started: 27-Oct-2017 5:28:10 PM
j:0  B0073H1CIC sold by amazon Is False
j:1  B00UQ8D5OE sold by amazon Is False
j:2  B01LWQBKMO sold by amazon Is True
j:3  B0001IW8TW sold by amazon Is True
j:4  B0009VX1ZG sold by amazon Is True
j:5  B00CEVFMTW sold by amazon Is False
Ended: 27-Oct-2017 5:28:31 PM


Note: I found an issue with the original test when using ASIN B0009VX1ZG. The original check, Mid(xxx,29,6) = "Amazon", did not match for that product because its HTML was different. So I changed the start position and length, and switched from = to Like.
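
The position problem is easy to see with the merchant-info strings that jdraw prints later in the thread. A small Python sketch, where the hypothetical vba_mid helper mimics VBA's 1-based Mid(): "Amazon" sits at position 29 in the plain string, but the Prime-exclusive page wraps the same text in a <SPAN id=...> tag, pushing it to position 73, so a fixed Mid(xxx,29,6) fails while a substring test works for both.

```python
def vba_mid(text, start, length):
    """Hypothetical helper mimicking VBA Mid(text, start, length): 1-based start."""
    return text[start - 1:start - 1 + length]


# merchant-info innerHTML as printed by jdraw's Debug.Print (Prime string trimmed)
PLAIN = 'Dispatched from and sold by Amazon. <SPAN>Gift-wrap available. </SPAN>'
PRIME = ('<SPAN id=pe-text-availability-merchant-info>Dispatched from and sold by '
         'Amazon <B>exclusively for Prime members</B>. </SPAN>')

for html in (PLAIN, PRIME):
    fixed_position = vba_mid(html, 29, 6) == "Amazon"   # the original Mid test
    substring = "sold by Amazon" in html                # position-independent test
    print(fixed_position, substring, html.find("Amazon") + 1)
# True True 29
# False True 73
```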

Hope it may be useful.

peskywinnets

Registered User.
Thanks guys, excellent work! Following on from my last post, I cracked on with my code and have it working, ish. I think Amazon are throttling: when I ran it on my list of 150 ASINs, I got an error after about 50 requests. This led me to think that my IP address may end up getting blacklisted (Amazon have all manner of detections in place), so I might need to ponder using a proxy!
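
One common way to cope with throttling, instead of (or before) switching IP address, is to back off: wait a little after a failed request, and wait longer after each further failure. A minimal Python sketch of exponential backoff follows; fetch is a hypothetical stand-in for the HTTP lookup and is assumed to raise an exception when Amazon refuses the call. The 2-second base and 60-second cap are arbitrary assumptions, since nobody in the thread knows Amazon's actual limits.

```python
import time


def backoff_delays(retries, base=2.0, cap=60.0):
    """Delays for exponential backoff: base, 2*base, 4*base, ... each capped at cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def fetch_with_backoff(fetch, asin, retries=4, base=2.0, cap=60.0, sleep=time.sleep):
    """Call fetch(asin); after each failure wait longer before trying again.

    fetch is a hypothetical stand-in for the HTTP lookup and is assumed to raise
    an exception when the request is refused or throttled. After the first try
    plus `retries` further attempts, the last error is re-raised.
    """
    last_error = None
    for delay in [0.0] + backoff_delays(retries, base, cap):
        if delay:
            sleep(delay)
        try:
            return fetch(asin)
        except Exception as exc:  # sketch only: real code should catch narrower errors
            last_error = exc
    raise last_error


print(backoff_delays(6, base=2.0, cap=20.0))  # [2.0, 4.0, 8.0, 16.0, 20.0, 20.0]
```

Passing sleep as a parameter keeps the function easy to test without actually waiting.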
 

isladogs

MVP / VIP
Hi both

There was such a flurry of messages earlier that I missed post #19 with the SoldByAmazon function.

I haven't checked the details of the function, but it sounds like exactly what you want, which is great.
As I've almost completed mine, I'll finish it anyway and upload it, hopefully on Sunday.

jdraw said:
Note: I found an issue with the original test when using ASIN B0009VX1ZG. The original check, Mid(xxx,29,6) = "Amazon", did not match for that product because its HTML was different. So I changed the start position and length, and switched from = to Like.

I did the same in my version. If you look at the screenshot, for records 3 & 4 the output was "Amazon" followed by a full stop, so I used Like "*Amazon*" to check for a match.
Oddly, record 5 (B0009VX1ZG) gave just "Amazon", so I didn't need the wildcard for that one.

Looking at line 60 in the function, does the code still work if another browser, e.g. Chrome or Edge, is used instead of Firefox?

peskywinnets

Registered User.
isladogs said:
I did the same in my version. If you look at the screenshot, for records 3 & 4 the output was "Amazon" followed by a full stop, so I used Like "*Amazon*" to check for a match.
Oddly, record 5 (B0009VX1ZG) gave just "Amazon", so I didn't need the wildcard for that one.

I'm no longer using Mid() to try to extract the exact string "Amazon"; I now use the InStr() function to look for "Amazon" anywhere in the captured text. (I'm thinking that Amazon have different ways of saying they're selling the product, so I can't assume the text will always be in exactly the same position!)

I've also changed the result from a Boolean to a String, so that I can check for other sellers at a later date.

Code:
    Temp = objResult.innerHTML
    If InStr(1, Temp, "Amazon") > 0 Then
        SoldBy = "Amazon"
    Else
        SoldBy = "Other"
    End If

isladogs said:
Looking at line 60 in the function, does the code still work if another browser, e.g. Chrome or Edge, is used instead of Firefox?

I don't know. I have all those browsers installed on my PC, and I don't intend to uninstall them to see if the code still works :)
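
On the browser question: the User-Agent on line 60 is only a text header sent with the request. MSXML2.ServerXMLHTTP makes the HTTP call itself, so which browsers are installed (or uninstalled) on the PC makes no difference; what might matter is how Amazon's server reacts to a particular string, for example by serving different HTML. A short Python sketch using the standard urllib.request module shows that the header is just a value you choose. The build_request helper is hypothetical, and the request is built but never sent.

```python
import urllib.request

# The User-Agent string from line 60 of the VBA function. It is only a label
# sent with the request; no installed browser is launched or consulted.
FIREFOX_UA = "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"


def build_request(asin, user_agent=FIREFOX_UA):
    """Prepare (but do not send) a GET request for an Amazon UK product page."""
    return urllib.request.Request(
        "https://www.amazon.co.uk/dp/" + asin,
        headers={"User-Agent": user_agent},
        method="GET",
    )


req = build_request("B0073H1CIC")
print(req.full_url)
# urllib stores header names via str.capitalize(), hence "User-agent"
print(req.get_header("User-agent"))
```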
 

jdraw

Super Moderator
Staff member
Colin,

I did not check other browsers.

I have found in the various source pages that "sold by Amazon" also occurs without the preceding "Dispatched from and ", so I modified my function to look for the full string. The changed lines are 140 to 160:

Code:
130   tempINfo = objResult.innerHTML
140   tempINfo = Mid(tempINfo, 1, 80)    '29,6) ''JED adjusted based on trial and error with "B0009VX1ZG"
150      ' Debug.Print tempINfo
160   If tempINfo Like "*Dispatched from and sold by Amazon*" Then
170     AsinSoldByAmazon = True
180   Else
190     AsinSoldByAmazon = False
200   End If


Pesky,
My guess is that Amazon, like Google and others, may have some restrictions to prevent their servers being tied up. Google Maps will let a "free account" access approximately 2000 URLs per day. I have had exchanges with posters where breaking requests up into fewer than 2000 per day was suggested, so that Google doesn't block your account/IP address or whatever.

As I have said in our previous discussions, if you get your requirements (the WHAT) sorted, there are many people on the forum who can offer suggestions and advice (even code) on HOW to do something.

Continued good luck with your project.

Update: here is some output showing exactly what was in objResult.innerHTML for each ASIN:

Code:
....
130   tempINfo = objResult.innerHTML
 Debug.Print tempINfo
....

Result:

Code:
started: 27-Oct-2017 10:18:12 PM
Dispatched from and sold by <A href="about:/gp/help/seller/at-a-glance.html/ref=dp_merchant_link?ie=UTF8&seller=A1KDCT42GCX3ZR">neatsales</A>. <SPAN></SPAN>
j:0  B0073H1CIC sold by amazon Is False

Dispatched from and sold by <A href="about:/gp/help/seller/at-a-glance.html/ref=dp_merchant_link?ie=UTF8&seller=A7YZ1S6Z0TLZ3">Langley Steelworks Ltd</A>. <SPAN></SPAN>
j:1  B00UQ8D5OE sold by amazon Is False

Dispatched from and sold by Amazon. <SPAN>Gift-wrap available. </SPAN>
j:2  B01LWQBKMO sold by amazon Is True

Dispatched from and sold by Amazon. <SPAN>Gift-wrap available. </SPAN>
j:3  B0001IW8TW sold by amazon Is True

<SPAN id=pe-text-availability-merchant-info>Dispatched from and sold by Amazon <B>exclusively for Prime members</B>. </SPAN><SPAN id=pe-bb-details-trigger class=a-declarative data-action="a-popover" data-a-popover='{"name":"primeExclusiveIntro","activate":"onclick","width":"450","header":"Exclusively for Amazon Prime Members","position":"triggerHorizontal"}'><A id=pe-link-availability-details class="a-link-normal prime-exclusive-details-link-trigger a-text-normal" href="about:blank#">Details </A></SPAN><BR><SPAN>Gift-wrap available. </SPAN>
j:4  B0009VX1ZG sold by amazon Is True

Dispatched from and sold by <A href="about:/gp/help/seller/at-a-glance.html/ref=dp_merchant_link?ie=UTF8&seller=A22A10WS32DZ1V">go4products</A>. <SPAN></SPAN>
j:5  B00CEVFMTW sold by amazon Is False

Ended: 27-Oct-2017 10:18:33 PM
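
Building on pesky's idea of returning a String rather than a Boolean, the output above suggests the seller can be read straight from the text after "sold by". A whole-text InStr for "Amazon" on innerHTML would also fire on other mentions, such as the "Exclusively for Amazon Prime Members" popover attribute in the B0009VX1ZG output. The Python sketch below is a regular-expression approach tested only against strings shown above; the pattern is an assumption about Amazon's markup and may break if the page changes. The "Unknown" result is an addition of this sketch, not part of pesky's code.

```python
import re

# "sold by", optionally an <a ...> link, then the seller name up to the next tag.
SOLD_BY = re.compile(r"sold by\s+(?:<a\b[^>]*>)?([^<]+)", re.IGNORECASE)


def seller_name(merchant_info_html):
    """Return the seller named after "sold by", or None if the phrase is absent.

    Surrounding spaces and a trailing full stop are removed, so a seller whose
    own name ends in "." would lose it; acceptable for a sketch.
    """
    match = SOLD_BY.search(merchant_info_html)
    if match is None:
        return None
    return match.group(1).strip().rstrip(".").strip()


def sold_by(merchant_info_html):
    """Mirror of pesky's String result: "Amazon", "Other" or (added here) "Unknown"."""
    name = seller_name(merchant_info_html)
    if name is None:
        return "Unknown"
    return "Amazon" if name == "Amazon" else "Other"


sample = ('Dispatched from and sold by <A href="about:/gp/help/seller/at-a-glance.html'
          '/ref=dp_merchant_link?ie=UTF8&seller=A1KDCT42GCX3ZR">neatsales</A>. <SPAN></SPAN>')
print(seller_name(sample), sold_by(sample))  # neatsales Other
```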
 

peskywinnets

Registered User.
jdraw said:
Pesky,
My guess is that Amazon, like Google and others, may have some restrictions to prevent their servers being tied up. Google Maps will let a "free account" access approximately 2000 URLs per day. I have had exchanges with posters where breaking requests up into fewer than 2000 per day was suggested, so that Google doesn't block your account/IP address or whatever.

My short-term solution is to use a proxy for the HTTP requests, something like this:
Code:
    If Count < 80 Then
        XMLHTTP.setProxy 2, "http=178.62.28.110:8118"
    Else
        XMLHTTP.setProxy 2, "http=139.59.175.229:8118"
    End If

I have 150 products, so switching to the second proxy after 80 requests makes sense. Potentially I could add heaps of proxies and retrieve all day long without being blacklisted :)
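
A gentler alternative, in line with jdraw's suggestion of staying under a daily quota, is to spread the lookups out: split the list into batches and pause between requests. A Python sketch follows; the batch size and delay are assumptions (pesky saw errors after about 50 requests, but Amazon's real limit is unknown), and fetch is a hypothetical stand-in for the lookup function.

```python
import time


def daily_batches(items, per_day):
    """Split a list into consecutive chunks of at most per_day items."""
    return [items[i:i + per_day] for i in range(0, len(items), per_day)]


def paced(items, fetch, delay_seconds, sleep=time.sleep):
    """Call fetch(item) for each item, pausing between calls; returns {item: result}."""
    results = {}
    for n, item in enumerate(items):
        if n:  # no pause before the first request
            sleep(delay_seconds)
        results[item] = fetch(item)
    return results


asins = ["B0073H1CIC", "B00UQ8D5OE", "B01LWQBKMO"]
for batch in daily_batches(asins, 2):
    print(batch)
# ['B0073H1CIC', 'B00UQ8D5OE']
# ['B01LWQBKMO']
```

In use, each day's batch would be passed to paced() with a delay of a few seconds, so 150 ASINs go out as three smaller, slower runs rather than one burst.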
 
