DDi Compendium API and Java Classes?
Moderators: dorpond, trevor, Azhrei
Re: DDi Compendium API and Java Classes?
Hmm, I have written some code to do this (POST simulation) and tested it successfully on my own web page. However, when I try it on the login.aspx page I get an HTTP 500 error when I try to read the InputStream after sending the POST request to the login page.
I'm guessing this is too little info to get any help here mind you.
Sigh - this bit seems really obstinate!
The guy in the green hat.
- jfrazierjr
- Deity
- Posts: 5176
- Joined: Tue Sep 11, 2007 7:31 pm
Re: DDi Compendium API and Java Classes?
Blakey wrote:Hmm, I have written some code to do this (POST simulation) and tested it successfully on my own web page. However, when I try it on the login.aspx page I get an HTTP 500 error when I try to read the InputStream after sending the POST request to the login page.
I'm guessing this is too little info to get any help here mind you.
Sigh - this bit seems really obstinate!
Can you post up the code you have so we can see exactly what you are doing?
I save all my Campaign Files to DropBox. Not only can I access a campaign file from pretty much any OS that will run Maptool(Win,OSX, linux), but each file is versioned, so if something goes crazy wild, I can always roll back to a previous version of the same file.
Get your Dropbox 2GB via my referral link, and as a bonus, I get an extra 250 MB of space. Even if you don't use my link, I still enthusiastically recommend Dropbox.
Re: DDi Compendium API and Java Classes?
Sure can (without try/catch blocks):
Code: Select all
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.URL;
import java.net.URLConnection;

public void doit() throws IOException {
    // Build the parameter string (field values copied from the login page source)
    String data = "email=myname%40hotmail.com&password=XXXXXX&__VIEWSTATE=/wEPDwUKLTMxMzExNzE1NGRk7Sq8X//JLUzpWIgD1qI4mQ0VFAg=&__EVENTVALIDATION=/wEWBALPrLO0AQKyzcaDDQLyveCRDwK4+vrXBd4Rwv7kTY347wUJWvlx2SAjXND8&InsiderSignin=Sign%20In";
    // Open the connection; enabling output makes URLConnection send a POST
    URL url = new URL("http://www.wizards.com/dndinsider/compendium/power.aspx?id=805");
    URLConnection conn = url.openConnection();
    conn.setDoOutput(true);
    conn.setDoInput(true);
    // Write the parameters as the request body
    OutputStreamWriter writer = new OutputStreamWriter(conn.getOutputStream());
    writer.write(data);
    writer.flush();
    // Read the response line by line
    StringBuilder answer = new StringBuilder();
    BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    String line;
    while ((line = reader.readLine()) != null) {
        answer.append(line).append("\n");
    }
    writer.close();
    reader.close();
    // Output the response
    System.out.println(answer.toString());
}
The RTE I get from Java is:
Code: Select all
java.io.IOException: Server returned HTTP response code: 500 for URL: http://www.wizards.com/dndinsider/compendium/login.aspx?page=power&id=805
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1313)
    at info.rodinia.tokenmaker.CompendiumReader.login(CompendiumReader.java:154)
    at info.rodinia.tokenmaker.TokenMaker.main(TokenMaker.java:20)
If I use the above source code and replace the "login.aspx" bit with the search.php file on my web site, and replace 'data' with some sensible search values for that page, it runs perfectly, taking me off to the search results page as expected.
I have also tried this code with just "login.aspx" and with "power.aspx?id=805" - all of which return an HTTP 500 error. If I change the data so that it doesn't match the input fields of the form, then I just get the HTML from login.aspx returned.
Any ideas?
The guy in the green hat.
Re: DDi Compendium API and Java Classes?
The login form in question has the following hidden fields:
Code: Select all
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKLTMxMzExNzE1NGRk7Sq8X//JLUzpWIgD1qI4mQ0VFAg=" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWBALPrLO0AQKyzcaDDQLyveCRDwK4+vrXBd4Rwv7kTY347wUJWvlx2SAjXND8" />
I've just loaded the form from two different machines on two different networks, and the values of those fields were the same in both cases... whoops, you're already getting those, I see. Huh.
You should definitely be using http://www.wizards.com/dndinsider/compendium/login.aspx. The power.aspx Web page doesn't know how to process the POST variables you're sending it. Hm.
When I get all system administrator and use my command line tools to try posting that data to the login.aspx page, I get an error message: "The state information is invalid for this page and might be corrupted." This makes me think there's something tricky about handling those two hidden fields.
Um, try URL-encoding the values of those fields, come to think of it?
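For anyone following along, here is a rough sketch of what URL-encoding the form values could look like in Java. The FormEncoder class and its encode method are made-up names for illustration, not part of the code posted above; only java.net.URLEncoder is a real library call. The point is that characters such as "/", "+" and "=" in the __VIEWSTATE and __EVENTVALIDATION values get percent-encoded before they go into the POST body.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not taken from the thread's code.
final class FormEncoder {

    // Builds an application/x-www-form-urlencoded body,
    // URL-encoding every field name and every field value.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"));
                body.append('=');
                body.append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("email", "myname@hotmail.com");
        fields.put("__VIEWSTATE", "/wEPDwUKLTMxMzExNzE1NGRk7Sq8X//JLUzpWIgD1qI4mQ0VFAg=");
        System.out.println(encode(fields));
    }
}
```

Building the body from a map, instead of hand-typing one long string, also makes it harder to forget a field or to encode only some of them.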
Re: DDi Compendium API and Java Classes?
Thanks for the responses.
I added URLEncoder.encode() around the keys and suddenly the HTTP 500 error has gone away. I'm now back to getting the standard code for login.aspx. I get that whether I ask for power.aspx or login.aspx in my URL.
The guy in the green hat.
- jfrazierjr
- Deity
- Posts: 5176
- Joined: Tue Sep 11, 2007 7:31 pm
Re: DDi Compendium API and Java Classes?
As noted, you need to pass in ALL of the form's input parameters just to make sure, even the submit button!!!
Also, to deal with state information, this is why it is best to open a connection to the page directly, read the page data, parse out each of the HTML input fields to get the name/value pairs and THEN post the paired data to try the login. What this means is that if they add a new hidden field with a value, your code JUST WORKS, whereas hard-coding the fields manually means it might break in that same situation.
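As a rough illustration of the "parse out each of the HTML input fields" step, something like the following could work. This is only a sketch under assumptions: FormFieldScraper and extractInputs are invented names, the regular expressions only handle double-quoted attributes, and HTML entities in values are not decoded. A real HTML parser would be more robust; the regex approach is just the shortest way to show the idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; names are not from the thread.
final class FormFieldScraper {
    // Matches a whole <input ...> tag (assumes no '>' inside attribute values).
    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Match double-quoted name="..." and value="..." attributes inside a tag.
    private static final Pattern NAME_ATTR =
            Pattern.compile("\\bname\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern VALUE_ATTR =
            Pattern.compile("\\bvalue\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the name/value pairs of all named <input> tags, in page order.
    // Inputs without a value attribute are returned with an empty string.
    static Map<String, String> extractInputs(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher tag = INPUT_TAG.matcher(html);
        while (tag.find()) {
            Matcher name = NAME_ATTR.matcher(tag.group());
            if (!name.find()) {
                continue; // unnamed inputs are not submitted with the form
            }
            Matcher value = VALUE_ATTR.matcher(tag.group());
            fields.put(name.group(1), value.find() ? value.group(1) : "");
        }
        return fields;
    }
}
```

The returned map can then have the email and password entries filled in before it is URL-encoded and posted, so any hidden field the site adds later is picked up automatically.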
- jfrazierjr
- Deity
- Posts: 5176
- Joined: Tue Sep 11, 2007 7:31 pm
Re: DDi Compendium API and Java Classes?
jfrazierjr wrote:As noted, you need to pass in ALL of the form's input parameters just to make sure, even the submit button!!!
Also, to deal with state information, this is why it is best to open a connection to the page directly, read the page data, parse out each of the HTML input fields to get the name/value pairs and THEN post the paired data to try the login. What this means is that if they add a new hidden field with a value, your code JUST WORKS, whereas hard-coding the fields manually means it might break in that same situation.
So that's at least 3 connections:
First to get the login page and see what its form fields are
Second to actually send the login data to the DDI login page
Third to actually look up some data about a power, monster, etc
Re: DDi Compendium API and Java Classes?
Blakey wrote:Thanks for the responses.
I added URLEncoder.encode() around the keys and suddenly the HTTP 500 error has gone away. I'm now back to getting the standard code for login.aspx. I get that whether I ask for power.aspx or login.aspx in my URL.
OK, you're pretty close. Now, after you've loaded the login.aspx page with the appropriate parameters, try loading the power.aspx page. If you've saved the cookies appropriately (I have no idea how to do that in Java), you'll be able to load it.
Scraping Web pages does feel pretty clumsy. I've been wishing for months that they'd provide a more elegant API, but I doubt it's in the cards.
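On the "how to do that in Java" part: one option, as far as I know, is java.net.CookieManager (Java 6 and later). Once it is installed with CookieHandler.setDefault(), HttpURLConnection stores cookies from responses and sends them back on later requests to the same site. The sketch below shows that behaviour without touching the network by feeding the manager a made-up Set-Cookie header. The class name CookieJarSketch, its methods, the example.com host and the session=abc123 cookie are all assumptions for illustration, not anything the DDI site actually sends.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; class and method names are invented for this example.
final class CookieJarSketch {

    // Installs a process-wide cookie store that HttpURLConnection will use.
    static CookieManager install() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    // Simulates receiving a Set-Cookie response header from the given URI.
    static void remember(CookieManager manager, URI uri, String setCookie) {
        Map<String, List<String>> headers = new HashMap<>();
        headers.put("Set-Cookie", Collections.singletonList(setCookie));
        try {
            manager.put(uri, headers);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the Cookie header values the manager would attach to a request.
    static List<String> cookiesFor(CookieManager manager, URI uri) {
        try {
            List<String> cookies =
                    manager.get(uri, new HashMap<String, List<String>>()).get("Cookie");
            return cookies == null ? Collections.<String>emptyList() : cookies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CookieManager manager = install();
        // Pretend the login page answered with a session cookie (made-up value).
        remember(manager, URI.create("http://www.example.com/login.aspx"),
                "session=abc123; Path=/");
        // The same cookie is offered for a later request to another page on that host.
        System.out.println(cookiesFor(manager,
                URI.create("http://www.example.com/power.aspx?id=805")));
    }
}
```

In a real client only the install() call would be needed, made once before the first connection is opened; remember() and cookiesFor() exist here purely to show what the manager does behind the scenes.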
Re: DDi Compendium API and Java Classes?
jfrazierjr wrote:
jfrazierjr wrote:As noted, you need to pass in ALL of the form's input parameters just to make sure, even the submit button!!!
Also, to deal with state information, this is why it is best to open a connection to the page directly, read the page data, parse out each of the HTML input fields to get the name/value pairs and THEN post the paired data to try the login. What this means is that if they add a new hidden field with a value, your code JUST WORKS, whereas hard-coding the fields manually means it might break in that same situation.
So that's at least 3 connections:
First to get the login page and see what its form fields are
Second to actually send the login data to the DDI login page
Third to actually look up some data about a power, monster, etc
Agreed that your suggestion to trawl for input statements is a very good one. I'll add that in once I get it working. Meantime I'll work off what I read on the 'View Source' page for that.
So, are you saying I need literally 2 different URLConnections: first one to login.aspx, to which I write out the form data, and then, once that write is done, a second connection to the power.aspx page to read in the power?
Thanks for the help - it's all new to me!
Blakey the Java-nub.
The guy in the green hat.
Re: DDi Compendium API and Java Classes?
https://addons.mozilla.org/en-US/firefox/addon/6647/
This is a godsend for me. Install that extension, disable all others, open the viewer, go to the DDI web site and log in like you normally would; you'll be able to see all the information sent and received, including cookies.
In my experience, scraping a site that requires a login usually goes like this:
1. request a page
2. server checks for a cookie (usually an expiring session id) to see if you're logged in
3. since you're not, it responds with the first id in a cookie and a login page in the body
4. you POST back all the required input fields as well as sending the cookie in the headers
5. if you were accepted, the server responds letting you know such and gives you a second session cookie that you must return with each subsequent request.
6. if they're REALLY secure, each response you receive will contain a new one-time-use cookie that must be included with the next request.
it's been a while since i've done this so that may not be exactly right, but should give you an idea.
http://www.voidspace.org.uk/python/arti ... tion.shtml
This tutorial is written for Python, but it does a good job of explaining what's going on and the common types of authentication, so maybe it'll help you or someone else =)
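Since the thread is about Java, here is a hedged sketch of the cookie part of steps 3 to 5 done by hand: take the Set-Cookie headers from one response and turn them into the Cookie header for the next request. CookieHeaders and toCookieHeader are made-up names, and the cookie values are placeholders, not what the DDI site sends. The sketch ignores expiry, path and domain attributes, so it is only suitable for a simple one-site session.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; names are not from the thread.
final class CookieHeaders {

    // Converts Set-Cookie response header values into one Cookie request
    // header value by keeping only the leading name=value pair of each.
    static String toCookieHeader(List<String> setCookieHeaders) {
        if (setCookieHeaders == null) {
            return ""; // the response carried no Set-Cookie headers
        }
        List<String> pairs = new ArrayList<>();
        for (String header : setCookieHeaders) {
            int end = header.indexOf(';');
            String pair = (end >= 0 ? header.substring(0, end) : header).trim();
            if (!pair.isEmpty()) {
                pairs.add(pair);
            }
        }
        return String.join("; ", pairs);
    }

    public static void main(String[] args) {
        // Placeholder values standing in for what a login response might send.
        List<String> fromServer = Arrays.asList(
                "session=abc123; path=/; HttpOnly",
                "token=xyz789; path=/");
        System.out.println(toCookieHeader(fromServer)); // prints "session=abc123; token=xyz789"
    }
}
```

With a URLConnection, the input list would come from conn.getHeaderFields().get("Set-Cookie") on the first response, and the result would be passed to setRequestProperty("Cookie", ...) on the next connection before it is opened for reading or writing.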
Re: DDi Compendium API and Java Classes?
Cheers! That Firefox addon has really been helpful - I can now see what is being passed back and forth between the browser and the server. I'll see if I can replicate that in my Java code now.
Will post here how I get on!
The guy in the green hat.
Re: DDi Compendium API and Java Classes?
I'm starting to wonder if I've bitten off more than I can chew!
No luck at all with this so far. Still plodding along. Thanks for all the help though guys - I'm determined to crack this!
The guy in the green hat.
Re: DDi Compendium API and Java Classes?
Post what you're seeing in the FF extension and we'll help ya figure it out, just remember to obscure your uid and pass lol =)
Re: DDi Compendium API and Java Classes?
Okay.
I start FF. I clear all my cookies and then type the URL into the field:
http://www.wizards.com/...snip.../power.aspx?id=805 (which is Divine Challenge). That takes me to the login screen (there is a GET parameter saying 'page=power&id=805', so it knows where to redirect me when I get the password right). I type in UID and PW and hit submit and I'm delivered to the power.aspx?id=805 page.
I get the following cookies and post data:
Is it possible to hard code these cookies in to make it work, just once? I have proved I can do a POST with my own web page as I've said before but I'm getting nowhere with this...
Cheers!
The guy in the green hat.