Welcome to the help section of the PromptCloud data API


Walkthrough


Click this sample link to see what the API response looks like and how the files are listed. By default, it shows files uploaded in the last 2 days.
   http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY
                
To check all data uploaded in the last 3 days, append &days=3 to the link above.
   http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY&days=3
                
To check all data uploaded in the last hour, append &hours=1 to the link above.
   http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY&hours=1
                
To check all data uploaded in the last 10 minutes, append &minutes=10 to the link above.
   http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY&minutes=10
                
To check all data uploaded so far, append &ts=0 to the base link.
   http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY&ts=0
                
To get the API response as JSON, pretty-printed JSON, line-delimited JSON (ldjson), or CSV, append &api_res_type=json, &api_res_type=pretty_json, &api_res_type=ldjson, or &api_res_type=csv respectively to the API link. The default is XML.
   http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY&ts=0&api_res_type=json
   http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY&ts=0&api_res_type=pretty_json
   http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY&ts=0&api_res_type=ldjson
   http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY&ts=0&api_res_type=csv
To get data for particular sites (say xyz and abc), append &site=xyz,abc to the API link.
   http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY&ts=0&site=xyz,abc
                
To get data for particular folders (say pqr and abc), append &folder=pqr,abc to the API link.
   http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY&ts=0&folder=pqr,abc
                
To get data for a particular category (say demo), append &cat=demo to the API link.
   http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY&ts=0&cat=demo
                
To get the record count for each segment, append &count=true to the API link.
   http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY&ts=0&count=true
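The links above all follow one pattern: the base link plus optional filters. A minimal Python sketch of that pattern is shown here; the function name build_listing_url is illustrative and not part of any PromptCloud client, and the sketch only builds the URL string without calling the API.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.promptcloud.com/v2/data/info"


def build_listing_url(project_id, auth_key, **filters):
    """Return a file-listing URL with the given optional filters appended."""
    query = {"id": project_id, "client_auth_key": auth_key}
    # Skip filters left as None so callers can pass optional values through.
    query.update({k: v for k, v in filters.items() if v is not None})
    # Keep commas literal so multi-value filters read like site=xyz,abc.
    return BASE_URL + "?" + urlencode(query, safe=",")


demo_key = "hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY"
print(build_listing_url("demo", demo_key, days=3))
print(build_listing_url("demo", demo_key, ts=0, site="xyz,abc", count="true"))
```

Building the query with urlencode avoids hand-concatenating & separators and escapes any characters that are not URL-safe.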
                


All the parameters you can pass to our API to refine the results are listed below:
   id: <PROJECT_ID> (which we will supply)

   client_auth_key: <CLIENT_AUTHENTICATION_KEY> (which we will supply)

   from_ts: timestamp; the timestamp from which you want to get data.

   to_ts: timestamp; the timestamp up to which you want to get data.

   ts: timestamp; lists files newer than or the same as this timestamp.
   Similar to from_ts. We recommend using from_ts, because this field will be removed in the future.

   minutes: integer; number of minutes for which the file listing is displayed.

   hours: integer; number of hours for which the file listing is displayed.

   days: integer; number of days for which the file listing is displayed.

   from_date: date in yyyymmdd format; lists all files from this date onwards.

   to_date: date in yyyymmdd format; lists all files up to this date.

   Priority: ts > (minutes = hours = days) > (from_date = to_date).
   If none of these is passed, data for the last 2 days is displayed.

   api_res_type: type of the API response: xml, json, pretty_json, ldjson, or csv (default is xml).

   site: exact site name(s); lists files for the given site(s).

   cat: category; lists files for the given category.

   folder: exact folder name(s); lists files for the given folder(s).

   count: true; gives the record count of each segment.
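
The priority rule above can be sketched in Python as a small client-side check of which time filter would take effect for a given set of parameters. This only restates the documented ordering; it is not the server's implementation, the function name is illustrative, and from_ts and to_ts are left out because the priority line does not mention them.

```python
def effective_time_filter(params):
    """Name the time-filter group that applies, per the documented priority."""
    if "ts" in params:
        return "ts"
    # minutes, hours and days share the same priority level.
    relative = [k for k in ("minutes", "hours", "days") if k in params]
    if relative:
        return "+".join(relative)
    # from_date and to_date share the lowest priority level.
    dates = [k for k in ("from_date", "to_date") if k in params]
    if dates:
        return "+".join(dates)
    return "default (last 2 days)"


print(effective_time_filter({"ts": 0, "from_date": "20140201"}))
print(effective_time_filter({"days": 10, "to_date": "20140301"}))
print(effective_time_filter({"site": "yahoo"}))
```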
                
When you query api.promptcloud.com, you get entries like:
  http://datav2.promptcloud.com/data/client_related_key/fixed_site_name_deduped_n-20130226_2503711328087_20130226042503.xml.gz
                

If you get data for multiple categories, say blogs, forums, and news, then the category is part of the URL after client_related_key. For example, it could look like:
   http://datav2.promptcloud.com/data/client_related_key/blog/fixed_site_name_deduped_n-20130226_2503711328087_20130226042503.xml.gz
   http://datav2.promptcloud.com/data/client_related_key/news/fixed_site_name_deduped_n-20130226_2503711328087_20130226042503.xml.gz

The URL is made up as follows:

  • The part after "client_related_key" is the actual filename.
  • The site-name part ("fixed_site_name_deduped_n" in the examples) always remains the same. For example, if you are crawling blog.promptcloud.com, the name of the files from this site could be blog_promptcloud_data_client_deduped_n.
  • The part after the hyphen ("20130226" in the examples) is the crawl date on the server in yyyymmdd format. Everything after yyyymmdd is an internal PromptCloud modifier and has no specific meaning.
  • The path segment between "client_related_key" and the filename ("blog" or "news" in the examples) is the category.
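
The URL layout described above can be taken apart in code. The Python sketch below assumes every file URL follows one of the two example layouts (with or without a category segment) and that the last hyphen in the filename separates the site name from the crawl date; the function name parse_file_url is illustrative.

```python
from urllib.parse import urlparse


def parse_file_url(url):
    """Split a data-file URL into client key, category, site name and crawl date."""
    parts = urlparse(url).path.strip("/").split("/")
    # Assumed layouts: data/<key>/<file> or data/<key>/<category>/<file>.
    if len(parts) not in (3, 4) or parts[0] != "data":
        raise ValueError("unexpected URL layout: " + url)
    filename = parts[-1]
    site_name, _, rest = filename.rpartition("-")
    if not site_name:
        raise ValueError("no site name found in filename: " + filename)
    return {
        "client_key": parts[1],
        "category": parts[2] if len(parts) == 4 else None,
        "filename": filename,
        "site_name": site_name,
        "crawl_date": rest[:8],  # yyyymmdd; the remainder is an internal modifier
    }


info = parse_file_url(
    "http://datav2.promptcloud.com/data/client_related_key/blog/"
    "fixed_site_name_deduped_n-20130226_2503711328087_20130226042503.xml.gz"
)
print(info["category"], info["site_name"], info["crawl_date"])
```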
The trick is to pass the highest updated time you received in the last API call back to the API in the next call. That way, it will list all the files that were uploaded after the given timestamp. You may also use our Ruby gem listed below.
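
The incremental approach described above can be sketched in Python as follows. The response format is not described in this section, so the sketch assumes the listing has already been parsed into a list of dictionaries and uses "updated" as a hypothetical key for each entry's updated time; adjust it to the field your response actually contains.

```python
def next_from_ts(entries, previous_ts=0):
    """Return the highest 'updated' value seen, or previous_ts if none is newer."""
    # "updated" is an assumed key name, not a documented response field.
    newest = max((int(entry["updated"]) for entry in entries), default=previous_ts)
    return max(newest, previous_ts)


# Hypothetical entries from one listing call.
listing = [
    {"url": "file_a.xml.gz", "updated": 1361852703},
    {"url": "file_b.xml.gz", "updated": 1361856311},
]
checkpoint = next_from_ts(listing, previous_ts=1361850000)
print(checkpoint)  # pass this as from_ts in the next call
```

Because the timestamp filter matches files "newer than or same as" the given value, the file at the boundary may be listed again on the next call, so it can help to skip URLs you have already downloaded.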
We have a Ruby gem, a Python client, and a Java client for programmatic access to our API. The links are below.
Ruby gem
   https://rubygems.org/gems/promptcloud_data_api
    		

Java client
   https://github.com/promptcloud/promptcloud-data-api-java/releases
                

Python client
   https://github.com/promptcloud/promptcloud_data_api_python
                
You can also access our second, high-availability server (the BCP server).
The corresponding BCP server URL is:
   https://api-bcp.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY
                
We encourage our clients to use http://api.promptcloud.com and fall back to the BCP server temporarily if http://api.promptcloud.com is unavailable.
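
The fallback recommendation above can be sketched in Python as follows. This is a minimal illustration using the standard library, not one of the official clients; the function name fetch_listing and the 30-second timeout are assumptions, and the opener parameter exists only so that the network call can be substituted.

```python
import urllib.request

PRIMARY = "http://api.promptcloud.com"
BACKUP = "https://api-bcp.promptcloud.com"


def fetch_listing(path_and_query, opener=urllib.request.urlopen, timeout=30):
    """Fetch from the primary server, falling back to the BCP server on failure."""
    last_error = None
    for host in (PRIMARY, BACKUP):
        try:
            with opener(host + path_and_query, timeout=timeout) as response:
                return response.read()
        except OSError as error:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            last_error = error
    raise last_error
```

A call such as fetch_listing("/v2/data/info?id=demo&client_auth_key=...") tries the main server first and only contacts the BCP server if that request fails. Note that this simple version also falls back on HTTP error responses, which a stricter client might treat differently.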

FAQ


A. You should have received an id and authentication key from PromptCloud. If not, please log in to https://app.promptcloud.com and go to the Auth key section to find or reset these.
   http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY
                
This returns a list of files. You can download these files the way you would download any file from the internet, using the language bindings of your development environment. With standard tools like curl and wget, you can use the following:
   wget URL
or
   curl URL
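
As an alternative to wget or curl, the download step can be sketched in Python with the standard library. The function name download_file is illustrative, and the sketch assumes the file URL can be fetched with a plain request, the same way wget URL would fetch it.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def download_file(url, dest_dir="."):
    """Stream the file at url into dest_dir and return the local path."""
    # Name the local file after the last path segment of the URL.
    filename = os.path.basename(urlparse(url).path)
    dest_path = os.path.join(dest_dir, filename)
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # Copy in chunks so large files are not held in memory.
        shutil.copyfileobj(response, out)
    return dest_path
```

Calling download_file(url) for each URL in the listing saves the files to the current directory under their original names.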
A. Yes. By default, this gives data for the last 2 days. There are other parameters for fine-tuning: ts, days, from_date, and to_date to specify the time range, and cat and site to limit results to a particular category or site. By combining these, you will be able to get all the data.
For example:
  http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY&days=10&cat=cat_name
                
The URL above lists files uploaded in the last 10 days for the client with id=demo in the category cat_name.
  http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY&from_date=20140201&site=yahoo
                
The URL above lists files uploaded from 20140201 (yyyymmdd) onwards for the client with id=demo where the site name matches yahoo.

ts: Unix timestamp; lists files newer than or the same as this timestamp. It defaults to the timestamp from 2 days ago. If you pass ts=0, all the files uploaded so far are listed (but this operation is slow).
  http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY&from_date=20140201&ts=0
                
The URL above lists all files uploaded so far, because ts takes priority over from_date.
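
The date and timestamp values used in these examples can be produced from Python date objects. The sketch below is illustrative: the function names are not part of any client, and whether the server interprets yyyymmdd dates in UTC or another time zone is not stated here, so treat the time zone as an assumption.

```python
from datetime import date, datetime, timezone


def to_yyyymmdd(day):
    """Format a date for the from_date and to_date parameters."""
    return day.strftime("%Y%m%d")


def to_unix_ts(moment):
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    return int(moment.timestamp())


print(to_yyyymmdd(date(2014, 2, 1)))
print(to_unix_ts(datetime(2014, 2, 1, tzinfo=timezone.utc)))
```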
A. That is because, by default, our API returns data for only the last 2 days. The simplest way to get previously uploaded data is to pass the days parameter.
For example:
  http://api.promptcloud.com/v2/data/info?id=demo&client_auth_key=hR6-MaPUHo_1Gze8FQJDx8gp_v7DvgWT-r-7p9yO8zY&days=10
                
A. This is a compressed file format. On Unix/Linux-like systems, you can use the gunzip command to decompress the data. On Windows-based systems, 7-Zip should work. On a Mac, double-clicking the file should work. Please see this wikihow link for more detail.
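
If you prefer to decompress the files in code, Python's standard gzip module can do the same job as gunzip. This is a minimal sketch; the function name gunzip_file is illustrative, and unlike the gunzip command it leaves the original .gz file in place.

```python
import gzip
import shutil


def gunzip_file(gz_path):
    """Decompress gz_path next to itself and return the output path."""
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz file: " + gz_path)
    out_path = gz_path[: -len(".gz")]
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        # Copy in chunks so large files are not held in memory.
        shutil.copyfileobj(src, dst)
    return out_path
```

For example, gunzip_file("data.xml.gz") writes data.xml alongside the compressed file.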
A. We have a 24x7 fallback for the server. Please use
   http://api-bcp.promptcloud.com
                
in place of
   http://api.promptcloud.com
                
if the main server is not accessible for some reason.
A. We have a Ruby gem, a Python client, and a Java client for programmatic access to our API. The links are below.
Ruby gem
   https://rubygems.org/gems/promptcloud_data_api
                

Java client
   https://github.com/promptcloud/promptcloud-data-api-java/releases
		

Python client
   https://github.com/promptcloud/promptcloud_data_api_python
                
A. Please use the cat or site parameter.