How to automatically get the contents of a website using the PHP cURL library?


For one of the Drupal sites we were working on, we needed to get the description of a site automatically from its URL by downloading and parsing its HTML markup. This is easy to do with the PHP cURL library. Read on to see how to use a PHP script and cURL to download the contents of a website.

cURL is a library that lets a PHP script connect to and communicate with many different types of servers. Here we use it to access the contents of a site.

My objective was to get the HTML content of a site. Let's see how it can be obtained.


function curl_download($Url){
  // is cURL installed yet?
  if (!function_exists('curl_init')){
    die('Sorry cURL is not installed!');
  }
 
  // OK cool - then let's create a new cURL resource handle
  $ch = curl_init();
 
  // Now set some options (most are optional)
 
  // Set URL to download
  curl_setopt($ch, CURLOPT_URL, $Url);
 
  // User agent
  curl_setopt($ch, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
 
  // Include the response headers in the result? (0 = no, 1 = yes)
  curl_setopt($ch, CURLOPT_HEADER, 0);
 
  // Should cURL return or print out the data? (true = return, false = print)
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
 
  // Timeout in seconds
  curl_setopt($ch, CURLOPT_TIMEOUT, 10);
 
  // Download the given URL, and return output
  $output = curl_exec($ch);
 
  // Close the cURL resource, and free system resources
  curl_close($ch);
 
  return $output;
}

Let's take a look at the above function. curl_init() initializes a new session and returns a cURL handle, which we store in $ch. We then set options on that handle with curl_setopt(): CURLOPT_URL specifies the URL to fetch, and CURLOPT_RETURNTRANSFER makes curl_exec() return the response as a string instead of printing it. curl_exec() executes the session, and curl_close() closes the handle afterwards. The URL we want to process is passed to the function, which returns the downloaded content.
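The function above does not handle a failed request. In that case curl_exec() returns false, and curl_error() describes what went wrong. The following is a minimal sketch of a variant that reports failures. The name curl_download_checked() and the choice to return null on failure are assumptions made for illustration and are not part of the original script.

```php
<?php
// Hypothetical variant of curl_download() that reports failures.
// Returns the response body as a string, or null if the transfer failed.
// On failure, $error receives cURL's description of the problem.
function curl_download_checked($url, &$error = null) {
  $ch = curl_init();

  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_TIMEOUT, 10);

  $output = curl_exec($ch);

  if ($output === false) {
    // Read the error message before closing the handle
    $error = curl_error($ch);
    curl_close($ch);
    return null;
  }

  curl_close($ch);
  return $output;
}
```

A caller can then check whether the result is null and, if it is, log or display the message in $error instead of treating the failed download as an empty page.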

Try this out to download the contents of a website.
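To go from the downloaded HTML to the site description mentioned at the start, the markup still has to be parsed. The sketch below shows one possible way to do that with PHP's DOMDocument class. It assumes the description lives in a <meta name="description"> tag, which may not be where every site keeps it. The helper name extract_description() and the sample markup are made up for illustration. In practice the input would be the string returned by curl_download().

```php
<?php
// Hypothetical helper: returns the content of <meta name="description">,
// or null when the input is empty or has no such tag.
function extract_description($html) {
  if (!is_string($html) || $html === '') {
    return null;
  }

  $doc = new DOMDocument();

  // Real-world HTML is often not well-formed, so collect parser
  // warnings internally instead of printing them.
  $previous = libxml_use_internal_errors(true);
  $doc->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  foreach ($doc->getElementsByTagName('meta') as $meta) {
    if (strtolower($meta->getAttribute('name')) === 'description') {
      return trim($meta->getAttribute('content'));
    }
  }

  return null;
}

// Sample markup standing in for a downloaded page
$html = '<html><head><title>Demo</title>'
      . '<meta name="description" content="A short demo page.">'
      . '</head><body><p>Hello</p></body></html>';

$description = extract_description($html);
echo $description, "\n"; // prints "A short demo page."
```

Using a DOM parser rather than a regular expression keeps the lookup working when the attributes appear in a different order or the tag is written in uppercase.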