If you have ever needed to download a file from an external website in your PHP application, you likely used file_get_contents() or cURL.

If the URL your application needs to download comes from the user, it’s extremely important to validate that the input passed to file_get_contents() or curl_setopt() is a valid URL. You can do this with filter_var() and the FILTER_VALIDATE_URL constant.

Let’s create a scenario. You are writing a website that will allow a user to view a website’s HTML by typing in the URL. Using FILTER_VALIDATE_URL and cURL, you create the following code.

<?php

if (!isset($_GET['url'])) {
    die("URL not provided.");
}

$url = $_GET['url'];

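// Reject anything that is not a syntactically valid URL.
if (filter_var($url, FILTER_VALIDATE_URL) === false) {
    // Escape the value before echoing it back to avoid reflected XSS.
    die("URL is invalid: " . htmlspecialchars($url));
}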

// Fetch the URL and capture the body as a string instead of printing it.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
$errno = curl_errno($ch);
curl_close($ch);

if ($errno !== 0) {
    die("An error occurred while retrieving the URL. " . $errno);
}

echo $response;

You validate that the user’s input is a URL, then download it and return the contents to the user. So what’s the problem? A valid URL can be many different things; it doesn’t have to use HTTP or HTTPS. Here’s a small list of URLs that pass FILTER_VALIDATE_URL but can still harm your application through attacks including, but not limited to, SSRF and local file disclosure.

  • ftp://user:pass@host/file.txt
  • ldap://host/
  • telnet://host/
  • file:///etc/passwd
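
To see this for yourself, the short sketch below runs the URLs above through filter_var(), with the file URL written as file:///etc/passwd. The variable names are illustrative and not part of the original script.

```php
<?php
// Illustrative list; the variable names here are not from the original script.
$candidates = [
    'ftp://user:pass@host/file.txt',
    'ldap://host/',
    'telnet://host/',
    'file:///etc/passwd',
];

$accepted = [];
foreach ($candidates as $candidate) {
    // filter_var() returns the URL string on success and false on failure.
    if (filter_var($candidate, FILTER_VALIDATE_URL) !== false) {
        $accepted[] = $candidate;
    }
}

// Report how many of the candidates FILTER_VALIDATE_URL accepted.
printf("%d of %d URLs passed validation\n", count($accepted), count($candidates));
```

None of these URLs uses HTTP or HTTPS, so any that end up in $accepted would be handed straight to cURL by the script above.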

If a user hits that script with the query string ?url=file:///etc/passwd, all your checks will pass, but your application will expose your /etc/passwd file. The user can even request your PHP source files, provided they know the full path to them. This can lead to exposed credentials and secrets.

Your application also becomes a proxy for accessing and downloading files from FTP servers. That makes it attractive to attackers and malicious users who want to probe a target without the traffic coming from their own computer.

Be paranoid. Validate correctly. If you validate that the URL begins with the HTTP or HTTPS scheme, the number of malicious things a user can request goes down significantly.

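<?php
// Accept only syntactically valid URLs that start with http:// or https:// (case-insensitive).
if (filter_var($url, FILTER_VALIDATE_URL) === false || (stripos($url, "http://") !== 0 && stripos($url, "https://") !== 0)) {
    // Escape the value before echoing it back to avoid reflected XSS.
    die("URL is invalid: " . htmlspecialchars($url));
}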
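
One way to package this check is as a small reusable function, paired with cURL’s own protocol restrictions as a second line of defence. This is a minimal sketch rather than a definitive implementation: isHttpUrl() and fetchHttpOnly() are hypothetical helper names that do not appear in the original script, and the sketch assumes the cURL extension is loaded. CURLOPT_PROTOCOLS and CURLOPT_REDIR_PROTOCOLS are standard cURL options that limit which schemes a handle may use, both for the initial request and for any redirect.

```php
<?php
// Hypothetical helper: true only for syntactically valid http:// or https:// URLs.
function isHttpUrl(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // parse_url() may return null or false, so cast before comparing.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}

// Hypothetical helper: fetch a URL while telling cURL to refuse other schemes.
// Returns the response body, or null if validation or the transfer fails.
function fetchHttpOnly(string $url): ?string
{
    if (!isHttpUrl($url)) {
        return null;
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Restrict both the initial request and any redirects to HTTP(S).
    curl_setopt($ch, CURLOPT_PROTOCOLS, CURLPROTO_HTTP | CURLPROTO_HTTPS);
    curl_setopt($ch, CURLOPT_REDIR_PROTOCOLS, CURLPROTO_HTTP | CURLPROTO_HTTPS);
    $response = curl_exec($ch);
    return $response === false ? null : $response;
}

// Quick check of the validator on one allowed and one disallowed URL.
var_dump(isHttpUrl('https://example.com/page'));
var_dump(isHttpUrl('file:///etc/passwd'));
```

Comparing the parsed scheme against an allow-list does the same job as the stripos() checks above, and adding another permitted scheme later only means adding one array entry.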