The dangers of only relying on FILTER_VALIDATE_URL when downloading URLs
If you have ever wanted to download a file from an external website from your PHP application, you likely used file_get_contents or cURL.
If the URL your application needs to download comes from the user, then it’s extremely important to validate that the input going into file_get_contents() or curl_setopt() is a valid URL. A common way to do this is with filter_var() and the FILTER_VALIDATE_URL constant.
Let’s create a scenario. You are writing a website that will allow a user to view a website’s HTML by typing in the URL. Using FILTER_VALIDATE_URL and cURL, you create the following code.
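A minimal sketch of what such a script might look like. The function name fetchPage() is hypothetical, and the url query parameter is assumed from the example request discussed further down. This version is intentionally vulnerable and should not be deployed.

```php
<?php
// Intentionally vulnerable sketch: validates only that the input is *a* URL.
function fetchPage(string $url): ?string
{
    // Passes for any syntactically valid URL, whatever its scheme.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    $body = curl_exec($ch);

    return is_string($body) ? $body : null;
}

// Assumed entry point: the user supplies ?url=...
if (isset($_GET['url']) && is_string($_GET['url'])) {
    $html = fetchPage($_GET['url']);
    echo $html === null
        ? 'Invalid URL or download failed.'
        : htmlspecialchars($html);
}
```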
You validate that the user’s input is a URL, then download it and return the contents to the user. So what’s the problem? A valid URL doesn’t have to be HTTP or HTTPS. Here’s a small list of URLs that pass FILTER_VALIDATE_URL but can still harm your application through attacks including, but not limited to, SSRF and local file disclosure.
- ftp://user:pass@host/file.txt
- ldap://host/
- telnet://host/
- file:///etc/passwd
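To check this yourself, the following sketch runs each of the URLs above through filter_var(). The variable names are illustrative. On a typical PHP installation every entry should be reported as valid.

```php
<?php
// URLs from the list above; none of them use http or https.
$urls = [
    'ftp://user:pass@host/file.txt',
    'ldap://host/',
    'telnet://host/',
    'file:///etc/passwd',
];

$results = [];
foreach ($urls as $url) {
    // filter_var() returns the URL string on success and false on failure.
    $results[$url] = filter_var($url, FILTER_VALIDATE_URL) !== false;
}

foreach ($results as $url => $isValid) {
    echo $url, ' => ', $isValid ? 'valid' : 'invalid', PHP_EOL;
}
```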
If a user hits that script with the parameter ?url=file:///etc/passwd, all your checks will pass, but your application will expose your /etc/passwd file. The user can even request your PHP files, provided they know the full path to them. This can lead to exposed credentials and secrets.
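The leak can be reproduced safely with a throwaway file instead of /etc/passwd. This is a sketch that assumes a writable temp directory and a Unix-style absolute path. The file contents are made up for illustration.

```php
<?php
// Stand-in for a sensitive file (hypothetical contents).
$path = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($path, "db_password=hunter2\n");

// Build a file:// URL pointing at it, as an attacker would for a known path.
$url = 'file://' . $path;

// The URL passes validation...
$passesFilter = filter_var($url, FILTER_VALIDATE_URL) !== false;

// ...and file_get_contents() reads the local file through the file:// wrapper.
$leaked = file_get_contents($url);

unlink($path); // clean up the throwaway file

var_dump($passesFilter, $leaked);
```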
Your application also becomes a proxy for accessing and downloading files from FTP servers. That makes it attractive to attackers and malicious users who want to probe a target without the traffic coming from their own computer.
Be paranoid. Validate correctly. If you also check that the URL begins with the http or https scheme, the number of malicious things a user can request goes down significantly.
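One way to add that scheme check, as a sketch and not a complete defense. The helper names isHttpUrl() and fetchHttpPage() are hypothetical. The cURL protocol options are an extra precaution beyond what the text above describes, so that cURL itself refuses other protocols, including on redirects.

```php
<?php
// Returns true only for syntactically valid URLs whose scheme is http or https.
function isHttpUrl(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $scheme = parse_url($url, PHP_URL_SCHEME);

    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], true);
}

// Downloads a page only if the URL passes the scheme check.
function fetchHttpPage(string $url): ?string
{
    if (!isHttpUrl($url)) {
        return null;
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Assumption: restricting protocols at the cURL level as a second layer.
    curl_setopt($ch, CURLOPT_PROTOCOLS, CURLPROTO_HTTP | CURLPROTO_HTTPS);
    curl_setopt($ch, CURLOPT_REDIR_PROTOCOLS, CURLPROTO_HTTP | CURLPROTO_HTTPS);
    $body = curl_exec($ch);

    return is_string($body) ? $body : null;
}
```

With this in place, fetchHttpPage('file:///etc/passwd') returns null before any request is made. The check narrows what a user can request, but it does not stop HTTP requests to internal hosts, so SSRF still needs separate handling.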