CS 325 Assignment #2
Fetching Files from an HTTP Server
Due September 22nd by 11:59pm
(Not accepted after 9/24)
Introduction:
This assignment will give you first-hand experience with the application layer of the network stack and its interface with the layer below. You'll write, in Java, a simple HTTP client that fetches a single document from an HTTP server and writes it to a file. The hostname, document path, and filename will all be passed as command-line arguments. Your program is responsible for opening the TCP connection, sending a well-formatted HTTP request, and retrieving the data payload from the response. Along the way you'll work directly with an application-layer protocol (HTTP) and become more familiar with the services offered by the layers below.
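As a rough illustration of what a "well-formatted HTTP request" might look like, the sketch below assembles a minimal GET request as a string. The class name RequestSketch and the method buildRequest are hypothetical (they are not taken from the course's sample code), and the choice of HTTP/1.1 with Host and Connection headers is an assumption; your own request may reasonably look different.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not part of any provided sample code. */
class RequestSketch {
    /** HTTP header lines end with carriage return + line feed. */
    static final String CRLF = "\r\n";

    /**
     * Builds a minimal GET request for the given host and resource path.
     * Assumes HTTP/1.1, which requires a Host header. The empty line at
     * the end marks the end of the request headers.
     */
    static String buildRequest(String host, String resource) {
        return "GET " + resource + " HTTP/1.1" + CRLF
             + "Host: " + host + CRLF
             + "Connection: close" + CRLF
             + CRLF;
    }

    public static void main(String[] args) {
        String request = buildRequest("gaia.cs.umass.edu",
                                      "/wireshark-labs/INTRO-wireshark-file1.html");
        // These are the bytes that would be written to the socket's output stream.
        byte[] wire = request.getBytes(StandardCharsets.US_ASCII);
        System.out.print(request);
        System.out.println("(" + wire.length + " bytes on the wire)");
    }
}
```

The Connection: close header asks the server to close the TCP connection once the response has been sent, so the client can simply read until end-of-stream.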
Specifics:
- For full credit, your program should be able to retrieve both text-based responses (HTML pages) and binary data (jpg, png, etc.) correctly. You do not need to make any attempt to determine which kind of data is being returned: a program that can properly fetch binary data files will also work properly on text.
- Make sure you don't write the HTTP header data to the output file. Watch for the separator between the header and the data, and only write the content following the separator. (This is particularly important if you want binary files like images to be stored properly.)
- Your program should be able to fetch files of any size. In particular, do not assume that a single read from the socket will produce the entire response from the server. Keep any buffers (byte arrays) that you use under 500 bytes in length.
- Since we're only fetching a single file, there's no point in holding the TCP connection open. For full credit, format your request such that it tells the HTTP server to close the connection after the transfer.
- All exceptions must be caught by your code and reported via an appropriate error message. Error-check any system calls as well (e.g., make sure that a read from the socket actually returned data).
- Style is important! Don't put all of your code in main. Document your work properly. Use constants where they make sense. Check for proper inputs from the user before proceeding, etc.
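One possible way to combine the header, buffer-size, and read-checking requirements above is sketched here. It reads from any InputStream in chunks of 256 bytes (an arbitrary size under the 500-byte limit) and copies only the bytes that follow the blank line separating the header from the data. The class and method names (BodyCopySketch, copyBody, bodyOf) are hypothetical, and the in-memory streams stand in for the socket and output file; treat it as a sketch of the idea rather than the expected solution.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only. */
class BodyCopySketch {
    static final int BUFFER_SIZE = 256;                        // assumed size, under 500 bytes
    static final byte[] SEPARATOR = {'\r', '\n', '\r', '\n'};  // blank line ending the header

    /** Copies everything after the first CRLF CRLF from in to out; returns bytes written. */
    static long copyBody(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        int matched = 0;   // separator bytes matched so far; survives across reads
        long written = 0;
        int count;
        while ((count = in.read(buffer)) != -1) {  // -1 means the server closed the stream
            int start = 0;
            // Still inside the header: scan byte by byte for the separator.
            while (start < count && matched < SEPARATOR.length) {
                byte b = buffer[start++];
                if (b == SEPARATOR[matched]) {
                    matched++;
                } else {
                    matched = (b == '\r') ? 1 : 0;  // a stray CR may begin a new match
                }
            }
            if (matched == SEPARATOR.length) {
                // Raw bytes, no character decoding, so binary data survives intact.
                out.write(buffer, start, count - start);
                written += count - start;
            }
        }
        return written;
    }

    /** Convenience wrapper for in-memory data, used by the demo below. */
    static byte[] bodyOf(byte[] rawResponse) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copyBody(new ByteArrayInputStream(rawResponse), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] response = "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(bodyOf(response), StandardCharsets.US_ASCII));
    }
}
```

Because the match counter is kept between reads, the separator is still found when it happens to be split across two reads from the socket. In a real client the InputStream would come from the socket and the OutputStream from a FileOutputStream.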
Sample Output:
Here's some sample output from runs of my program. (My Java class is called Fetch.) I'm interacting with it on the command line to make it easier to demonstrate how it responds. You're welcome to run and test yours from within an IDE, though you'll need to edit the Run Configuration (if using Eclipse) so that arguments are passed to your program when it runs.
The first run of my program shows how it behaves if it doesn't get the three inputs it's expecting: it reminds the user about the arguments it needs. I then fetch the short HTML page from the first Wireshark lab. (The cat command displays the contents of a file.) I then download a .jpg file and use ls to show its size. Yours should also be 23,556 bytes long, and should display properly when opened in your favorite .jpg viewer. Finally, I show what happens if I try to reach a host that doesn't exist, and if I request a file that doesn't exist (in that case the server's 404 page is what gets written to the output file).
brichards[100] : javac Fetch.java
brichards[101] : java Fetch
Usage: <hostname> <resource> <filename>
brichards[102] : java Fetch gaia.cs.umass.edu /wireshark-labs/INTRO-wireshark-file1.html message.html
Grabbing /wireshark-labs/INTRO-wireshark-file1.html from gaia.cs.umass.edu
Writing data to /Users/brichards/message.html
brichards[103] : cat message.html
<html>
Congratulations! You've downloaded the first Wireshark lab file!
</html>
brichards[104] : java Fetch cs.pugetsound.edu /~brichards/sound.jpg image.jpg
Grabbing /~brichards/sound.jpg from cs.pugetsound.edu
Writing data to /Users/brichards/image.jpg
brichards[105] : ls -l image.jpg
-rw-r--r-- 1 brichards staff 23556 Sep 10 12:36 image.jpg
brichards[106] : java Fetch brad.pugetsound.edu /index.html output.txt
Grabbing /index.html from brad.pugetsound.edu
Error: Host brad.pugetsound.edu is unreachable
brichards[107] : java Fetch cs.pugetsound.edu /~brichards/bogus.html output.html
Grabbing /~brichards/bogus.html from cs.pugetsound.edu
Writing data to /Users/brichards/output.html
brichards[108] : cat output.html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL was not found on this server.</p>
</body></html>
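The usage reminder and the unreachable-host message in the transcript above could be produced along the following lines. This is only a sketch: the class name ErrorReportSketch and its methods are hypothetical, the two message strings are copied from the sample output, and the fallback message for other I/O failures is invented for illustration. It relies on the fact that java.net.Socket throws UnknownHostException when a hostname cannot be resolved.

```java
import java.io.IOException;
import java.net.UnknownHostException;

/** Hypothetical helper for illustration only. */
class ErrorReportSketch {
    static final int EXPECTED_ARGS = 3;
    static final String USAGE = "Usage: <hostname> <resource> <filename>";

    /** Returns the usage message if the argument count is wrong, otherwise null. */
    static String checkArgs(String[] args) {
        return (args == null || args.length != EXPECTED_ARGS) ? USAGE : null;
    }

    /** Turns an exception from the network code into a one-line message for the user. */
    static String describeFailure(String host, IOException e) {
        if (e instanceof UnknownHostException) {
            return "Error: Host " + host + " is unreachable";
        }
        // Wording below is an assumption, not taken from the sample output.
        return "Error: I/O problem while talking to " + host + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        // Demonstrates both messages without touching the network.
        System.out.println(checkArgs(new String[0]));
        System.out.println(describeFailure("brad.pugetsound.edu",
                new UnknownHostException("brad.pugetsound.edu")));
    }
}
```

In a real client, describeFailure would be called from the catch block that surrounds the socket setup and the read loop.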
Hints & Tips:
- Don't forget that there's some sample code online that provides a good starting point for this assignment.
- Don't worry about getting it working with binary data until you've gotten it working with text (HTML). It's a lot easier to debug your code when working with text!
- You might consider writing the entire response to the file at first, without trying to look for the boundary between the HTTP header and the data. Make sure you can get that right before updating your code to deal with the header.
- Consider using Wireshark as a debugging tool if necessary! It can show you the details of the packets you're building and sending (and the server's response) if you have it recording when you run your program.
Submitting:
Submit your .java file (and only your .java file) via Canvas.
Brad Richards, 2023