matminer.data_retrieval package¶
Submodules¶
matminer.data_retrieval.retrieve_Citrine module¶
-
class
matminer.data_retrieval.retrieve_Citrine.
CitrineDataRetrieval
(api_key=None)¶ Bases:
object
CitrineDataRetrieval is used to retrieve data from the Citrination database. See API client docs at http://citrineinformatics.github.io/api-documentation/
-
__init__
(api_key=None)¶ - Args:
- api_key: (str) Your Citrine API key, or None if you’ve set the CITRINE_KEY environment variable
Returns: None
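
The key fallback described above can be pictured with a minimal sketch. This is not the library's own code: the helper name `resolve_citrine_key` and the choice to raise `ValueError` when no key is found are assumptions made for illustration. Only the `CITRINE_KEY` variable name comes from the docstring.

```python
import os


def resolve_citrine_key(api_key=None):
    """Return the explicit key if given, else fall back to CITRINE_KEY.

    Hypothetical helper: the name and the error type are assumptions.
    """
    if api_key is not None:
        return api_key
    # CITRINE_KEY is the environment variable named in the docstring.
    key = os.environ.get("CITRINE_KEY")
    if key is None:
        raise ValueError("Pass api_key or set the CITRINE_KEY environment variable")
    return key
```

An explicit argument takes precedence, so a script can override the environment without unsetting the variable.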
-
get_api_data
(formula=None, prop=None, data_type=None, reference=None, min_measurement=None, max_measurement=None, from_record=None, data_set_id=None, max_results=None)¶ Gets raw api data from Citrine in json format. See client docs at http://citrineinformatics.github.io/api-documentation/ for more details on these parameters.
- Args:
- formula: (str) filter for the chemical formula field; only those results that have chemical formulas that contain this string will be returned
- prop: (str) name of the property to search for
- data_type: (str) 'EXPERIMENTAL'/'COMPUTATIONAL'/'MACHINE_LEARNING'; filter for properties obtained from experimental work, computational methods, or machine learning.
- reference: (str) filter for the reference field; only those results that have contributors that contain this string will be returned
- min_measurement: (str/num) minimum of the property value range
- max_measurement: (str/num) maximum of the property value range
- from_record: (int) index of first record to return (indexed from 0)
- data_set_id: (int) id of the particular data set to search on
- max_results: (int) number of records to limit the results to
Returns: (list) of jsons/pifs returned by Citrine’s API
-
get_dataframe
(formula=None, prop=None, data_type=None, reference=None, min_measurement=None, max_measurement=None, from_record=None, data_set_id=None, max_results=None, show_columns=None)¶ Gets a Pandas dataframe object from data retrieved from the Citrine API. See client docs at http://citrineinformatics.github.io/api-documentation/ for more details on input parameters.
- Args:
- formula: (str) filter for the chemical formula field; only those results that have chemical formulas that contain this string will be returned
- prop: (str) name of the property to search for
- data_type: (str) 'EXPERIMENTAL'/'COMPUTATIONAL'/'MACHINE_LEARNING'; filter for properties obtained from experimental work, computational methods, or machine learning.
- reference: (str) filter for the reference field; only those results that have contributors that contain this string will be returned
- min_measurement: (str/num) minimum of the property value range
- max_measurement: (str/num) maximum of the property value range
- from_record: (int) index of first record to return (indexed from 0)
- data_set_id: (int) id of the particular data set to search on
- max_results: (int) number of records to limit the results to
- show_columns: (list) list of columns to show from the resulting dataframe
Returns: (object) Pandas dataframe object containing the results
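
A call to get_dataframe might look like the sketch below. The parameter names come from the signature above; the query values (`"Si"`, a limit of 100) and the wrapper name `fetch_citrine_dataframe` are hypothetical. Running the wrapper needs matminer installed, network access, and a valid Citrine API key.

```python
# Hypothetical query values; only the parameter names come from the
# get_dataframe signature documented above.
CITRINE_QUERY = {
    "formula": "Si",
    "data_type": "EXPERIMENTAL",
    "max_results": 100,
}


def fetch_citrine_dataframe(api_key=None, **query):
    """Build a retriever and forward the query keywords to get_dataframe."""
    # Imported lazily so this snippet loads even where matminer is absent.
    from matminer.data_retrieval.retrieve_Citrine import CitrineDataRetrieval

    retriever = CitrineDataRetrieval(api_key=api_key)
    return retriever.get_dataframe(**query)
```

Usage would be `df = fetch_citrine_dataframe(**CITRINE_QUERY)`, with the key taken from CITRINE_KEY when `api_key` is None.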
-
-
matminer.data_retrieval.retrieve_Citrine.
get_value
(dict_item)¶
-
matminer.data_retrieval.retrieve_Citrine.
parse_scalars
(scalars)¶
matminer.data_retrieval.retrieve_MDF module¶
-
class
matminer.data_retrieval.retrieve_MDF.
MDFDataRetrieval
(anonymous=False, **kwargs)¶ Bases:
object
MDFDataRetrieval is used to retrieve data from the Materials Data Facility database and convert them into a Pandas dataframe. Note that invocation with full access to MDF will require authentication via https://materialsdatafacility.org/, but an anonymous mode is supported, which can be used with anonymous=True as a keyword arg.
- Examples:
>>> mdf_dr = MDFDataRetrieval(anonymous=True)
>>> results = mdf_dr.get_dataframe(elements=["Ag", "Be"], sources=["oqmd"])
>>> results = mdf_dr.get_dataframe(sources=["oqmd"],
...     match_ranges={"oqmd.band_gap.value": [4.0, "*"]})
-
__init__
(anonymous=False, **kwargs)¶ - Args:
- anonymous (bool): whether to use anonymous login (i.e. no Globus authentication)
- **kwargs: kwargs for Forge, including index (Globus search index to search on), local_ep, anonymous
-
get_dataframe
(sources=None, elements=None, titles=None, tags=None, resource_types=None, match_fields=None, exclude_fields=None, match_ranges=None, exclude_ranges=None, unwind_arrays=True)¶ Retrieves data from the MDF API and formats it as a Pandas Dataframe
- Args:
- sources ([str]): source names to include, e.g. ["oqmd"]
- elements ([str]): elements to include, e.g. ["Ag", "Si"]
- titles ([str]): titles to include, e.g. ["Coarsening of a semisolid Al-Cu alloy"]
- tags ([str]): tags to include, e.g. ["outcar"]
- resource_types ([str]): resources to include, e.g. ["record"]
- match_fields ({}): field-value mappings to include, e.g. {"oqmd.converged": True}
- exclude_fields ({}): field-value mappings to exclude, e.g. {"oqmd.converged": False}
- match_ranges ({}): field-range mappings to include, e.g. {"oqmd.band_gap.value": [1, 5]}; use "*" for no lower or upper bound, e.g. {"oqmd.band_gap.value": [1, "*"]}
- exclude_ranges ({}): field-range mappings to exclude, e.g. {"oqmd.band_gap.value": [3, "*"]} to exclude all results with band gap higher than 3
- raw (bool): whether or not to return raw (non-dataframe) output, defaults to False
- unwind_arrays (bool): whether or not to unwind arrays in flattening docs for dataframe
- Returns:
- DataFrame corresponding to all documents from aggregated query
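
The range convention used by match_ranges and exclude_ranges can be illustrated locally. The sketch below is not how MDF or Forge evaluates a query: `in_range` is a hypothetical helper, the records are made-up values, and treating both bounds as inclusive is an assumption. It only shows "*" standing for a missing bound.

```python
def in_range(value, bounds):
    """Check value against [lower, upper], where "*" means unbounded.

    Assumption: both bounds are inclusive.
    """
    lower, upper = bounds
    above = lower == "*" or value >= lower
    below = upper == "*" or value <= upper
    return above and below


# Made-up records keyed by the field name used in the examples above.
records = [
    {"oqmd.band_gap.value": 0.5},
    {"oqmd.band_gap.value": 4.2},
]
# Keep only records whose band gap is at least 4.0, with no upper bound.
wide_gap = [r for r in records if in_range(r["oqmd.band_gap.value"], [4.0, "*"])]
```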
-
get_dataframe_by_query
(query, unwind_arrays=True, **kwargs)¶ Gets a dataframe from the MDF API from an explicit string query (rather than input args like get_dataframe).
- Args:
- query (str): String for explicit query
- unwind_arrays (bool): whether or not to unwind arrays in flattening docs for dataframe
- **kwargs: kwargs for query
- Returns:
- dataframe corresponding to query
-
matminer.data_retrieval.retrieve_MDF.
make_dataframe
(docs, unwind_arrays=True)¶ Formats raw docs returned from MDF API search into a dataframe
- Args:
- docs [{}]: list of documents from forge search or aggregation
Returns: DataFrame corresponding to formatted docs
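
Flattening a nested document into dot-separated column names, such as the "oqmd.band_gap.value" field used earlier, can be sketched as follows. This is an illustration only, not the make_dataframe implementation: the helper name `flatten` is hypothetical, and the array handling controlled by unwind_arrays is not modeled.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into a single dict with dot-separated keys."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-documents, extending the dotted path.
            flat.update(flatten(value, path))
        else:
            # Leaves (including lists) are stored unchanged.
            flat[path] = value
    return flat
```

Each flattened document then corresponds to one row, with the dotted paths as column names.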
matminer.data_retrieval.retrieve_MP module¶
-
class
matminer.data_retrieval.retrieve_MP.
MPDataRetrieval
(api_key=None)¶ Bases:
object
MPDataRetrieval is used to retrieve data from the Materials Project database, print the results, and convert them into an indexed Pandas dataframe.
-
__init__
(api_key=None)¶ - Args:
- api_key: (str) Your Materials Project API key, or None if you’ve set up your pymatgen config.
-
get_dataframe
(criteria, properties, mp_decode=False, index_mpid=True)¶ Gets data from MP in a dataframe format. See API docs at https://materialsproject.org/wiki/index.php/The_Materials_API for more details.
- Args:
- criteria: (str/dict) see MPRester.query() for a description of this parameter. String examples: "mp-1234", "Fe2O3", "Li-Fe-O", "*2O3". Dict example: {"band_gap": {"$gt": 1}}
- properties: (list) see MPRester.query() for a description of this parameter. Example: ["formula", "formation_energy_per_atom"]
- mp_decode: (bool) see MPRester.query() for a description of this parameter. Whether to decode to a Pymatgen object where possible.
- index_mpid: (bool) Whether to set the material_id as the dataframe index.
Returns: A pandas Dataframe object
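
A query with the example criteria and properties from the argument list might be wired up as in the sketch below. The wrapper name `fetch_mp_dataframe` is hypothetical. Running it needs matminer installed, network access, and a Materials Project API key (or a configured pymatgen setup).

```python
# Example values taken from the argument descriptions above.
CRITERIA = {"band_gap": {"$gt": 1}}
PROPERTIES = ["formula", "formation_energy_per_atom"]


def fetch_mp_dataframe(api_key=None):
    """Query Materials Project and return the results as a dataframe."""
    # Imported lazily so this snippet loads even where matminer is absent.
    from matminer.data_retrieval.retrieve_MP import MPDataRetrieval

    retriever = MPDataRetrieval(api_key=api_key)
    return retriever.get_dataframe(criteria=CRITERIA, properties=PROPERTIES)
```

With the default index_mpid=True, the returned dataframe is indexed by material ID.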
-
matminer.data_retrieval.retrieve_MPDS module¶
matminer.data_retrieval.retrieve_MongoDB module¶
-
class
matminer.data_retrieval.retrieve_MongoDB.
MongoDataRetrieval
(coll)¶ Bases:
object
-
__init__
(coll)¶ Tool to retrieve data from a MongoDB collection and put it into a pandas DataFrame object
- Args:
- coll: A MongoDB collection object
-
get_dataframe
(projection, query=None, sort=None, limit=None, idx_field=None, strict=False)¶ - Args:
- projection: (list) - a list of str fields to retrieve; dot-notation is allowed. Set to None to try to auto-detect the fields.
- query: (JSON) - a pymongo-style query to filter data records
- sort: (tuple) - pymongo-style sort option
- limit: (int) - max number of entries
- idx_field: (str) - name of field to use as index (must have unique entries)
- strict: (bool) - if False, replaces missing values with NaN
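
The combination of dot-notation fields and NaN for missing values (the strict=False case) can be sketched without a database. This is not the library's code: `get_nested` and `project` are hypothetical helpers operating on plain dicts, and the strict=True behavior is not modeled.

```python
import math


def get_nested(doc, dotted_field, default=math.nan):
    """Follow a dot-notation path through nested dicts, or return default."""
    current = doc
    for part in dotted_field.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def project(docs, projection):
    """Build one flat row per document, filling missing fields with NaN."""
    return [{field: get_nested(doc, field) for field in projection} for doc in docs]
```

Rows built this way all share the same keys, which is the shape a dataframe constructor expects.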
-
-
matminer.data_retrieval.retrieve_MongoDB.
clean_projection
(projection)¶ Projecting on e.g. 'a.b.' and 'a' is disallowed in MongoDB, so project inclusively. See unit tests for examples of what this is doing.
- Args:
- projection: (list) - list of fields to retrieve; dot-notation is allowed.
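
One reading of "project inclusively" is to drop any field whose parent path is also requested, since the parent already covers it. The sketch below follows that interpretation. It is an assumption about the intent, not the library's implementation; the unit tests mentioned above remain the authority on actual behavior.

```python
def clean_projection_sketch(projection):
    """Keep a field only if none of its dotted ancestors is also requested."""
    requested = set(projection)
    kept = []
    for field in sorted(requested):
        parts = field.split(".")
        # All proper prefixes of the path, e.g. "a.b.c" -> {"a", "a.b"}.
        ancestors = {".".join(parts[:i]) for i in range(1, len(parts))}
        if not (ancestors & requested):
            kept.append(field)
    return kept
```

Under this interpretation, requesting both "a.b" and "a" keeps only "a", while an unrelated field such as "c.d" is left as is.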
-
matminer.data_retrieval.retrieve_MongoDB.
is_int
(x)¶
-
matminer.data_retrieval.retrieve_MongoDB.
remove_ints
(projection)¶ Transforms a string like "a.1.x" to "a.x" - for Mongo projection purposes
- Args:
- projection: (str) the projection to remove ints from
Returns:
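
The transformation described for remove_ints can be reproduced with a short standalone sketch. The function name `remove_ints_sketch` is hypothetical, and this is one way to get the documented "a.1.x" to "a.x" result, not necessarily how the library does it.

```python
def remove_ints_sketch(projection):
    """Drop purely numeric segments from a dot-notation projection string."""
    parts = projection.split(".")
    # str.isdigit() is True only for non-empty, all-digit segments.
    return ".".join(part for part in parts if not part.isdigit())
```

Array indices are removed because a Mongo projection addresses the field path, not a position within an array.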