Welcome to the help section of the PromptCloud data API


Walkthrough


Check this sample link to see how the API looks and how the files would be listed.
   http://api.promptcloud.com/data/info?id=demo
                
To list all data uploaded in the last 10 minutes, append &minutes=10 to the base link.
   http://api.promptcloud.com/data/info?id=demo&minutes=10
                
To list all data uploaded in the last 1 hour, append &hours=1 to the base link.
   http://api.promptcloud.com/data/info?id=demo&hours=1
                
To list all data uploaded in the last 3 days, append &days=3 to the base link.
   http://api.promptcloud.com/data/info?id=demo&days=3
                
To list all data uploaded so far, append &ts=0 to the base link.
   http://api.promptcloud.com/data/info?id=demo&ts=0
                
To see when each file was uploaded, append &ht=true to any of the links above to get human-readable times.
   http://api.promptcloud.com/data/info?id=demo&ts=0&ht=true
                
To get the API response as JSON, pretty-printed JSON, LDJSON (line-delimited JSON), or CSV, append &api_res_type=json, &api_res_type=pretty_json, &api_res_type=ldjson, or &api_res_type=csv respectively. The default is XML.
   http://api.promptcloud.com/data/info?id=demo&ts=0&api_res_type=json
   http://api.promptcloud.com/data/info?id=demo&ts=0&api_res_type=pretty_json
   http://api.promptcloud.com/data/info?id=demo&ts=0&api_res_type=ldjson
   http://api.promptcloud.com/data/info?id=demo&ts=0&api_res_type=csv

To get data for particular sites (say xyz and abc), append &site=xyz,abc to the API link.
   http://api.promptcloud.com/data/info?id=demo&ts=0&site=xyz,abc
                
To get data for particular folders (say pqr and abc), append &folder=pqr,abc to the API link.
   http://api.promptcloud.com/data/info?id=demo&ts=0&folder=pqr,abc
                
To get data for a particular category (say demo), append &cat=demo to the API link.
   http://api.promptcloud.com/data/info?id=demo&ts=0&cat=demo
                
To get the record count for each segment, append &count=true to the API link.
   http://api.promptcloud.com/data/info?id=demo&ts=0&count=true
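
The links above all follow one pattern: the base link plus one or more query parameters. As a minimal sketch, the Python helper below assembles such links. The name build_info_url is hypothetical and is not part of any official client; only the endpoint and the parameter names come from this page.

```python
from urllib.parse import urlencode

# Base endpoint taken from the walkthrough links above.
INFO_ENDPOINT = "http://api.promptcloud.com/data/info"


def build_info_url(project_id, **params):
    """Return an info URL for project_id with the given query parameters.

    build_info_url is a hypothetical helper, not an official client function.
    Parameters set to None are skipped; True/False become "true"/"false".
    """
    query = [("id", project_id)]
    for name, value in params.items():
        if value is None:
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"
        query.append((name, value))
    # safe="," keeps comma-separated lists such as site=xyz,abc unescaped.
    return INFO_ENDPOINT + "?" + urlencode(query, safe=",")


print(build_info_url("demo", minutes=10))
# http://api.promptcloud.com/data/info?id=demo&minutes=10
print(build_info_url("demo", ts=0, site="xyz,abc", count=True))
# http://api.promptcloud.com/data/info?id=demo&ts=0&site=xyz,abc&count=true
```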
                


All the parameters you can pass to our API to refine the results are listed below:
   id: <PROJECT_ID> (which we will supply)

   ts: timestamp; lists files uploaded at or after this time.
       If you give ts=0, all files uploaded so far are listed (but this operation is slow).

   minutes: integer; number of minutes for which the file listing is displayed

   hours: integer; number of hours for which the file listing is displayed

   days: integer; number of days for which the file listing is displayed

   from_date: date in yyyymmdd format; lists all files from this date onwards

   to_date: date in yyyymmdd format; lists all files up to this date

   Priority: ts > (minutes = hours = days) > (from_date = to_date).
       If none of these is passed, data for the last 2 days is listed.

   api_res_type: type of the API response: xml, json, pretty_json, ldjson, or csv (default is xml)

   site: exact site name(s); lists files for the given site(s)

   cat: category; lists files for the given category

   folder: exact folder name(s); lists files for the given folder(s)

   count: true gives the record count of each segment
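
The priority rule above can be mirrored on the client side to predict which time parameters will take effect. This is only a sketch: effective_time_filter is a hypothetical name, and how the server combines several parameters from the same group (for example minutes together with hours) is not described on this page, so the function simply returns the whole winning group.

```python
def effective_time_filter(params):
    """Return the time parameters that take effect under the stated priority.

    Hypothetical helper that mirrors the rule
    ts > (minutes = hours = days) > (from_date = to_date).
    """
    groups = [("ts",), ("minutes", "hours", "days"), ("from_date", "to_date")]
    for group in groups:
        chosen = {k: params[k] for k in group if params.get(k) is not None}
        if chosen:
            return chosen
    # Nothing passed: the API lists the last 2 days by default.
    return {}


print(effective_time_filter({"ts": 0, "days": 3}))
# {'ts': 0}
print(effective_time_filter({"days": 3, "from_date": "20140201"}))
# {'days': 3}
```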
                
When you query api.promptcloud.com, you get entries like:
  http://data.promptcloud.com/data/client_related_key/fixed_site_name_deduped_n-20130226_2503711328087_20130226042503.xml.gz
                

If you get data for multiple categories, say blogs, forums, and news, then the category is part of the URL, after client_related_key. For example, it could look like:
   http://data.promptcloud.com/data/client_related_key/blog/fixed_site_name_deduped_n-20130226_2503711328087_20130226042503.xml.gz
   http://data.promptcloud.com/data/client_related_key/news/fixed_site_name_deduped_n-20130226_2503711328087_20130226042503.xml.gz

The parts of the URL are explained as follows:

  • The part after the blue part ("client_related_key") is the actual filename.
  • The grey part ("fixed_site_name_deduped_n" in the examples above) always remains the same. For example, if you are crawling blog.promptcloud.com, the files from this site could be named blog_promptcloud_data_client_deduped_n.
  • The green part ("20130226" in the examples above) is the server's crawl date in yyyymmdd format. Everything after yyyymmdd is an internal PromptCloud modifier and has no specific meaning.
  • The orange part ("blog" or "news" in the examples above) is the category.
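
As an illustration of the layout described above, the sketch below splits a data file URL into its parts. It assumes the fixed name and the crawl date are separated by the last "-" in the filename, as in the sample URLs; parse_data_url and DataFile are hypothetical names, not part of any official client.

```python
from datetime import date, datetime
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class DataFile(NamedTuple):
    client_key: str
    category: Optional[str]
    name_prefix: str
    crawl_date: date


def parse_data_url(url):
    """Split a data file URL into the parts described above (hypothetical helper)."""
    # The path looks like /data/<client_key>/[<category>/]<filename>
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) not in (3, 4) or parts[0] != "data":
        raise ValueError("unexpected data URL layout: " + url)
    client_key, filename = parts[1], parts[-1]
    category = parts[2] if len(parts) == 4 else None
    # Assumption: the last "-" separates the fixed name from the date part.
    name_prefix, _, rest = filename.rpartition("-")
    crawl_date = datetime.strptime(rest[:8], "%Y%m%d").date()
    return DataFile(client_key, category, name_prefix, crawl_date)


info = parse_data_url(
    "http://data.promptcloud.com/data/client_related_key/blog/"
    "fixed_site_name_deduped_n-20130226_2503711328087_20130226042503.xml.gz"
)
print(info.category, info.name_prefix, info.crawl_date)
# blog fixed_site_name_deduped_n 2013-02-26
```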

The trick is to pass the highest updated time returned by your last API call back to the API as ts in the next call. That way it lists all the files uploaded since the given timestamp. You may also use our Ruby gem listed below.
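
The incremental approach can be sketched as follows. The response format is not described on this page, so the sketch takes a caller-supplied fetch_entries function (hypothetical) that queries the API with the given ts and returns (file_url, updated_ts) pairs; poll_new_files is likewise a made-up name.

```python
def poll_new_files(fetch_entries, last_ts):
    """Fetch files uploaded since last_ts and return (urls, next_ts).

    fetch_entries is a caller-supplied function (hypothetical) that queries
    the API with &ts=<last_ts> and returns a list of (file_url, updated_ts)
    pairs parsed from the response.
    """
    entries = fetch_entries(last_ts)
    urls = [url for url, _ in entries]
    # Carry forward the highest updated time; keep the old one if nothing is new.
    next_ts = max([updated for _, updated in entries], default=last_ts)
    return urls, next_ts
```

Because ts returns files newer than or the same as the given timestamp, the file with the highest updated time may be listed again on the next call, so a caller may want to skip URLs it has already downloaded.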

We also have a Ruby gem and Java and Python clients for programmatic access to our API. The links are below.
Ruby gem
   https://rubygems.org/gems/promptcloud_data_api
		

Java client
   https://github.com/promptcloud/promptcloud-data-api-java/releases
                

Python client
   https://github.com/promptcloud/promptcloud_data_api_python
                
You can also access our second, high-availability server (the BCP server).
The corresponding BCP server URL is:
   http://api-bcp.promptcloud.com/data/info?id=demo
                
We encourage our clients to use http://api.promptcloud.com and to fall back to the BCP server temporarily if http://api.promptcloud.com is unavailable.
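
A minimal sketch of this fallback in Python, using only the standard library. The function names and the 30-second timeout are assumptions made for illustration; the two host names come from this page.

```python
import urllib.request

PRIMARY = "http://api.promptcloud.com"
BACKUP = "http://api-bcp.promptcloud.com"


def http_get(url):
    """Default fetcher: return the response body as bytes."""
    # The 30-second timeout is an arbitrary choice for this sketch.
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def get_with_fallback(path_and_query, fetch=http_get):
    """Try the primary server first and fall back to the BCP server on failure."""
    try:
        return fetch(PRIMARY + path_and_query)
    except OSError:
        # urllib.error.URLError and socket timeouts are OSError subclasses.
        return fetch(BACKUP + path_and_query)
```

A call such as get_with_fallback("/data/info?id=demo") only contacts the BCP server when the request to the primary server fails.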

FAQ


A. You should have received the API id from PromptCloud. Please use that id in place of demo below.
   http://api.promptcloud.com/data/info?id=demo
                
This gives a list of files. To access each of those files, you need to pass the user name and password, which have been supplied separately. We use HTTP basic auth for authentication, so please enable it in your HTTP client. In standard tools like wget and curl you can use the following:
    wget --user=name --password=password URL
                  Or
    curl --user name:password URL
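
If you script the download in Python instead of using wget or curl, the same basic auth can be sent with the standard library. This is a sketch, not the official Python client: the function names and the 60-second timeout are assumptions, and the whole file is read into memory before it is written out.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the Authorization header value for HTTP basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def download_file(url, username, password, destination):
    """Download one data file to destination using basic auth."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    # The 60-second timeout is an arbitrary choice for this sketch.
    with urllib.request.urlopen(request, timeout=60) as response:
        with open(destination, "wb") as out:
            out.write(response.read())
```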

A. Yes. By default this gives data for the last 2 days. There are other parameters for fine-tuning: ts, days, from_date, and to_date to specify the time range, and cat and site to limit results to a particular category or site. With a combination of these you will be able to get all the data.
For example:
  http://api.promptcloud.com/data/info?id=demo&days=10&cat=cat_name
                
The URL above gives files uploaded in the last 10 days for the client with id=demo in the category cat_name.
  http://api.promptcloud.com/data/info?id=demo&from_date=20140201&site=yahoo
                
The URL above gives files uploaded from 20140201 (yyyymmdd) onwards for the client with id=demo and with a site name matching yahoo.
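
When the dates come from a program rather than being typed by hand, they need to be rendered in yyyymmdd form. A small sketch in Python; date_range_params is a hypothetical helper name.

```python
from datetime import date


def date_range_params(from_day, to_day=None):
    """Return from_date/to_date query values in yyyymmdd format (hypothetical helper)."""
    params = {"from_date": from_day.strftime("%Y%m%d")}
    if to_day is not None:
        params["to_date"] = to_day.strftime("%Y%m%d")
    return params


print(date_range_params(date(2014, 2, 1)))
# {'from_date': '20140201'}
```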

ts: Unix timestamp; gives files uploaded at or after this time. It defaults to the timestamp from 2 days ago. If you give ts=0, all the files uploaded so far are listed (but this operation is slow).
  http://api.promptcloud.com/data/info?id=demo&ts=0
                
The URL above gives all files uploaded so far.

A. That is because by default our API gives data for only the last 2 days. The simplest way to get previously uploaded data is to pass the days parameter.
For example:
  http://api.promptcloud.com/data/info?id=demo&days=10
                
A. That is because the actual data is password protected. You should have received the password from the PromptCloud team. If not, please send a mail to support. Also, please note that the API password is not the same as the ticketing system password, which you might have received earlier :)

A. This is a compressed file format. On Unix/Linux-like systems you can use the gunzip command to decompress the data. On Windows-based systems 7-Zip should work. On a Mac, just double-clicking the file should work. Please see this wikihow link for more detail.
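
If you would rather decompress the files from a script, Python's standard gzip module can do the same job as gunzip. A minimal sketch; gunzip_file is a hypothetical helper name.

```python
import gzip
import shutil


def gunzip_file(source, destination):
    """Decompress a .gz file to destination, streaming to keep memory use low."""
    with gzip.open(source, "rb") as compressed, open(destination, "wb") as out:
        shutil.copyfileobj(compressed, out)
```

For example, gunzip_file("data.xml.gz", "data.xml") writes the decompressed XML next to the downloaded file.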

A. We have a 24x7 fallback for the server. Please use
   http://api-bcp.promptcloud.com
                
in place of
   http://api.promptcloud.com
                
if the main server is not accessible for some reason.

A. Please use the cat or site parameter.