Http Files¶
The Http connector allows to read files on the web using either http or https protocol. Files should be small enough to fit memory. The connector can parse either text files, CSV files or HTML files. For the later, one should provide a xpath to locate each column (more precisely a jsoup selector, see here or more information).
Http files can’t be used for output.
Parameters on the datastore element are the following :
Parameters |
Details |
---|---|
name |
how you want this data store be refered as |
type |
should be http |
Parameters on the table element are the following :
Parameters |
Details |
---|---|
name |
how you want this table be refered as |
location |
Web location for the file |
format |
should be text, csv or html |
csvSeparator |
(CSV only), which cell separator to use (default “,”) |
csvQuote |
(CSV only), how to quote cells (default “, escaped with “”“) |
csvHeader |
(CSV only), is there an header: true (default) or false |
For the html format, there is also a new parameter for the underlying column elements.
Parameters |
Details |
---|---|
path |
how to finds the cells for the column, use JSOUP selectors |
Examples¶
<datastore name="web" type="http">
<!-- Example reading a csv file located on the web -->
<table name="earthquake" location="https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.csv" format="csv" csvHeader="true">
<column name="time" type="datetime" temporalFormat="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"/>
<column name="latitude" type="numeric"/>
<column name="longitude" type="numeric"/>
<column name="depth" type="numeric"/>
<column name="mag" type="numeric"/>
<column name="magType" type="text"/>
<column name="nst" type="numeric"/>
<column name="gap" type="numeric"/>
<column name="dmin" type="numeric"/>
<column name="rms" type="numeric"/>
<column name="net" type="text"/>
<column name="id" type="text"/>
<column name="update" type="datetime" temporalFormat="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"/>
<column name="place" type="text"/>
<column name="type" type="text"/>
<column name="horizontalError" type="numeric"/>
<column name="depthError" type="numeric"/>
<column name="magError" type="numeric"/>
<column name="magNst" type="numeric"/>
<column name="status" type="text"/>
<column name="locationSource" type="text"/>
<column name="magSource" type="text"/>
</table>
<!-- Example reading a text file located on the web -->
<table name="iso_8859_1" location="https://www.w3.org/TR/PNG/iso_8859-1.txt" format="text">
<column name="row" type="bigtext"/>
</table>
<!-- Example reading a web page and extracting two point of data -->
<table name="spy" location="https://finance.yahoo.com/quote/SP?p=SP"
format="html">
<column name="last_close" type="numeric" path="td[data-test=PREV_CLOSE-value]"/>
<column name="pe_ratio" type="numeric" path="td[data-test=PE_RATIO-value]"/>
</table>
</datastore>