Changes
#109 (Feb 21, 2022, 7:42:18 PM)
- Fixed build due to moved packages for annotations — Claus Stadler / githubweb
- Fixed a broken test case by switching to RDFDataMgrEx.readAsGiven that leaves relative IRIs of the test data untouched. — Claus Stadler / githubweb
#108 (Jan 26, 2022, 11:08:50 AM)
#107 (Jan 26, 2022, 11:07:32 AM)
#106 (Jan 26, 2022, 10:02:58 AM)
#105 (Jan 21, 2022, 9:26:28 PM)
#104 (Jan 21, 2022, 9:25:21 PM)
#103 (Jan 21, 2022, 9:14:35 PM)
#102 (Jan 21, 2022, 9:14:16 PM)
#101 (Jan 21, 2022, 9:13:40 PM)
#100 (Jan 21, 2022, 9:10:51 PM)
#99 (Jan 21, 2022, 9:10:18 PM)
#98 (Jan 21, 2022, 9:07:02 PM)
#97 (Jan 21, 2022, 9:05:31 PM)
#96 (Jan 21, 2022, 9:03:21 PM)
#95 (Jan 21, 2022, 9:02:59 PM)
#94 (Jan 21, 2022, 9:01:17 PM)
#93 (Jan 21, 2022, 9:00:30 PM)
#92 (Jan 21, 2022, 8:54:53 PM)
#91 (Jan 21, 2022, 8:48:48 PM)
#90 (Jan 21, 2022, 8:47:48 PM)
#89 (Jan 21, 2022, 6:21:46 PM)
- reorganise docs — carsten.draschner / githubweb
#88 (Jan 21, 2022, 6:11:35 PM)
- publication information moved to subchapter — carsten.draschner / githubweb
#87 (Jan 21, 2022, 6:01:55 PM)
- rename file — carsten.draschner / githubweb
#86 (Jan 21, 2022, 5:56:49 PM)
- rename structure — carsten.draschner / githubweb
#85 (Jan 21, 2022, 5:50:47 PM)
- structure — carsten.draschner / githubweb
#84 (Jan 21, 2022, 5:43:44 PM)
- documentation structure — carsten.draschner / githubweb
#83 (Jan 21, 2022, 5:25:35 PM)
- Bump to jena 4.3.1 (probably there are or will be separate spark/hadoop releases with log4j 2.15.0) — Claus Stadler / githubweb
- bump to jena 4.3.2 — Claus Stadler / githubweb
- started with new documentation for github pages of sansa ml — carsten.draschner / githubweb
#82 (Dec 10, 2021, 7:11:28 PM)
- bump to jena 4.3.0 — Claus Stadler / githubweb
#81 (Dec 10, 2021, 3:05:07 PM)
#80 (Dec 10, 2021, 2:59:06 PM)
#79 (Dec 9, 2021, 7:56:42 PM)
- Fixed missing start patterns in RecordReaderRdfTrigQuad — Claus Stadler / githubweb
#78 (Dec 9, 2021, 7:14:52 PM)
- Fixes for the async parser (somewhat hacky though) — Claus Stadler / githubweb
#77 (Dec 9, 2021, 1:09:05 PM)
#76 (Dec 9, 2021, 1:08:53 PM)
#75 (Dec 9, 2021, 1:07:40 PM)
- Added a comment to AnnotationMapperTests — Claus Stadler / githubweb
#73 (Dec 8, 2021, 8:14:30 PM)
- better usage of verbose option — carsten.draschner / githubweb
- put all println into verbose clause — carsten.draschner / githubweb
- more println into verbose optional handling — carsten.draschner / githubweb
- semantification clean up — carsten.draschner / githubweb
#72 (Dec 8, 2021, 3:27:17 PM)
- Fixed build — Claus Stadler / githubweb
#71 (Dec 7, 2021, 2:56:20 PM)
- introduce option to filter also by predicate — carsten.draschner / githubweb
- filter by predicate and fix bug for features with multiple different types — carsten.draschner / githubweb
- started with evaluation classes on top of DBpedia — carsten.draschner / githubweb
- add handling for lists of doubles as features over mean — carsten.draschner / githubweb
- add handling for lists of timestamp over unix time cast as features over mean — carsten.draschner / githubweb
- added more literals to cover lists of double and lists of timestamp — carsten.draschner / githubweb
- add comment — carsten.draschner / githubweb
- do not collapse lists of timestamps by default — carsten.draschner / githubweb
- calculate mean of normalized distances as similarity score for lists of double and unix timestamp cast lists/arrays — carsten.draschner / githubweb
- added more data to check another new similarity score case — carsten.draschner / githubweb
- offer verbose mode — carsten.draschner / githubweb
- set verbose mode — carsten.draschner / githubweb
- cache in sfe — carsten.draschner / githubweb
- execute availability evaluation on lmdb to show distribution — carsten.draschner / githubweb
#70 (Dec 6, 2021, 3:14:37 PM)
- Added hadoop-based async parser — Claus Stadler / githubweb
#69 (Nov 24, 2021, 9:09:50 PM)
- start drafting classes for new sim feature — carsten.draschner / githubweb
- first distsim on lmdb with minhash between movies for promising candidates — carsten.draschner / githubweb
- play around with different feature extractors and further use DistRDF2ML modules for feature extraction — carsten.draschner / githubweb
- start with pivot-based feature extracting transformer — carsten.draschner / githubweb
- automatic cast dataframe to corresponding literal type and split features if needed by their respective datatype — carsten.draschner / githubweb
- bring all information to transform — carsten.draschner / githubweb
- make components more compact and broader documentation — carsten.draschner / githubweb
- align design to be transformer conform — carsten.draschner / githubweb
- integrate smart feature extractor into novel dasim pipeline — carsten.draschner / githubweb
- cast numeric values to doubles — carsten.draschner / githubweb
- offer first unit test for smartFeatureExtractor — carsten.draschner / githubweb
- play a bit around and structure — carsten.draschner / githubweb
- also show schema — carsten.draschner / githubweb
- started handling of different feature types — carsten.draschner / githubweb
- started some playground for word2vec in spark — carsten.draschner / githubweb
- clean up — carsten.draschner / githubweb
- handling of categorical strings transformed over hashing and IDF (Information Content) weighting — carsten.draschner / githubweb
- string column handling by default pipeline as current fallback for non-implemented word2Vec — carsten.draschner / githubweb
- calculate similarity values and join those into one df s.t. we can later aggregate those — carsten.draschner / githubweb
- introducing option to norm similarity columns and to weight by importance (for the start) — carsten.draschner / githubweb
- introduce all weighting factors and finally aggregate similarity score weighted over all features and so on — carsten.draschner / githubweb
- reduce distsim dataframe so only unique pairs stay — carsten.draschner / githubweb
- first data gathering for semantification — carsten.draschner / githubweb
- outline todos for semantification — carsten.draschner / githubweb
- - better handling of aggregation of overall similarity value while preserving initial sim values — carsten.draschner / githubweb
- - semantification of similarity results — carsten.draschner / githubweb
- offer verbose mode — carsten.draschner / githubweb
- started with refactoring — carsten.draschner / githubweb
- base class for trying our refactored dasim which might become basis for unit tests — carsten.draschner / githubweb
- - start gather candidate pairs with distsim — carsten.draschner / githubweb
- optimized speed of unique candidates by usage of df functionalities instead of dataset options — carsten.draschner / githubweb
- - norm scale similarity score — carsten.draschner / githubweb
- removed lines of code now refactored into norm method — carsten.draschner / githubweb
- print lines for automatic retrieved weighting maps — carsten.draschner / githubweb
- - offer setters for hyperparameters — carsten.draschner / githubweb
- usage of parameters in eval script — carsten.draschner / githubweb
- scala docs for DaSimEstimator — carsten.draschner / githubweb
- wrote first dasim estimator unit test to more easy call it — carsten.draschner / githubweb
- new small dataset which can be used for unit test for dasim unit test — carsten.draschner / githubweb
- include opportunity for semantification. fixed a bug causing from too much python like slicing — carsten.draschner / githubweb
- call semantification within test — carsten.draschner / githubweb
- removed debug println — carsten.draschner / githubweb
- sample file and tiny changes — carsten.draschner / githubweb
- introduce fast minhash gather candidate approach as alternative to distSim, if only partial weightings are given, fill other with zero but show message — carsten.draschner / githubweb
- play around with lmdb — carsten.draschner / githubweb
- offer fast gather candidate option as alternative to distsim — carsten.draschner / githubweb
- more playground for parameters — carsten.draschner / githubweb
- more size prints — carsten.draschner / githubweb
- adjust considered features — carsten.draschner / githubweb
- a class which offers a comparison of gather seeds either by object filter or by sparqlframe. to compare pros and cons — carsten.draschner / githubweb
- extended dasim eval by time measure — carsten.draschner / githubweb
- not showing intermediate df — carsten.draschner / githubweb
- Smart Feature Extraction evaluation pipeline to compare against SparqlFrame — carsten.draschner / githubweb
- better interpret dataset of triple — carsten.draschner / githubweb
- make filepath args parameter — carsten.draschner / githubweb
- some work of replacing df by ds to fix bug — carsten.draschner / githubweb
- make limit seeds for eval a parameter within eval class — carsten.draschner / githubweb
- make limit seeds within Dasim class settable for eval — carsten.draschner / githubweb
- make more params available — carsten.draschner / githubweb
- started with readme changes — carsten.draschner / githubweb
- offer filter options, include setters, work on filtered ds — carsten.draschner / githubweb
- remove limit within tryouts — carsten.draschner / githubweb
- Smart Feature Extractor unit tests with testing also the novel setters and filter opportunities to better use it outside the SimE4KG approach and pipeline — carsten.draschner / githubweb
- some unit test adjustments — carsten.draschner / githubweb
- more code snippets for readme — carsten.draschner / githubweb
- adjustments in readme — GitHub / githubweb
- adjust unit test especially the file path handling — carsten.draschner / githubweb
- calculate availability weighting — carsten.draschner / githubweb
- introduced na fill in similarity score if features are not available and give 0 as similarity score — carsten.draschner / githubweb
- introduce another sample with null features to cover this case — carsten.draschner / githubweb
- if availability is calculated print the distribution — carsten.draschner / githubweb
- comment out show statement because it occurs anyway — carsten.draschner / githubweb
- add one feature to see that there is a valid availability distribution — carsten.draschner / githubweb
- better semantification and easier handling because of removal of parameter but storing meta data in the transformer itself — carsten.draschner / githubweb
- small call of semantification within unit test — carsten.draschner / githubweb
- add semantification example to readme — carsten.draschner / githubweb
- bring back simEdocumentation to readme — GitHub / githubweb
- missing closing quotes for code block — GitHub / githubweb
- sime4kg Databricks notebook link to readme — GitHub / githubweb
#68 (Nov 24, 2021, 2:14:47 PM)
- minor — Lorenz Buehmann / githubweb
- minor — Lorenz Buehmann / githubweb
- parse any RDF language — Lorenz Buehmann / githubweb
- fix test — Lorenz Buehmann / githubweb
#67 (Nov 24, 2021, 2:11:12 PM)
#66 (Nov 24, 2021, 12:38:02 PM)
#65 (Nov 24, 2021, 12:23:23 PM)
#64 (Nov 24, 2021, 12:22:41 PM)
#63 (Nov 24, 2021, 4:48:52 AM)
- moved doc folder to root of repo to easy link it — carsten.draschner / githubweb
- moved docs from subfolder ml to root folder to make it linkable in github (pages) — carsten.draschner / githubweb
- had to change ml to stack in references — GitHub / githubweb
- missed one link, changed the here reference — GitHub / githubweb
- extended description — GitHub / githubweb
- layout adjustment — GitHub / githubweb
- Creates a dummy index.html — GitHub / githubweb
- Javadoc and Scaladoc 0.8.0 — Lorenz Buehmann / githubweb
- Update index.html — GitHub / githubweb
- Update overview-frame.html — GitHub / githubweb
- Update overview-summary.html — GitHub / githubweb
- update API docs — Lorenz Buehmann / githubweb
- update docs entry point for Scala docs — GitHub / githubweb
- changed header — GitHub / githubweb
- fixed typos in docs — GitHub / githubweb
- Added DeferredSeekablePushbackInputStream in order to get end of block (EOB) advertised before reading into the next block. — Claus Stadler / githubweb
- And optimized the push back input stream away again... — Claus Stadler / githubweb
- jena4 upgrade (compiles but untested) — Claus Stadler / githubweb
- Fixed a bug with wrong limit computation in InterruptingSeekableByteChannel — Claus Stadler / githubweb
- Forgot to add pom.xml — Claus Stadler / githubweb
- upgraded r2rml-jena-api — Claus Stadler / githubweb
- Added fake JenaSystem for commons-rdf to work — Claus Stadler / githubweb
- Extended RddRdfSaver with a setOutputFormat(String) method and added all missing create methods — Claus Stadler / githubweb
- Renamed rdf-centric FileInputFormat classes to include 'Rdf' in their name — Claus Stadler / githubweb
- RddRdfSaver now has built in support for console output. — Claus Stadler / githubweb
- Added a new API for loading rdf into RDDs: e.g. RDD<Dataset> rdd = RdfSourceFactoryImpl.from(sparkSession).get("someResource").asDatasets(); — Claus Stadler / githubweb
- upgraded sparqlify — Claus Stadler / githubweb
- fixed missing base url in datalake r2rml mapping — Claus Stadler / githubweb
- Added trig/quad registration; added initial ghpages setup — Claus Stadler / githubweb
- Ok, there is already a docs branch.. — Claus Stadler / githubweb
- Removed my docs resources — Claus Stadler / githubweb
- Added just-the-docs config — Claus Stadler / githubweb
- Added sansa logo — Claus Stadler / githubweb
- Moved config to docs folder (not sure if gh pick it up there) — Claus Stadler / githubweb
- Added a few pages — Claus Stadler / githubweb
- Excluded javadoc / scaladoc folders — Claus Stadler / githubweb
- Trying to get the edit page on github link working — Claus Stadler / githubweb
- fixed typo — Claus Stadler / githubweb
- Minor update — Claus Stadler / githubweb
- Added more structure — Claus Stadler / githubweb
- More work on docs — Claus Stadler / githubweb
- set docs branch to jena4 for now — Claus Stadler / githubweb
- Updated doc — Claus Stadler / githubweb
- Increased default probe count for RecordReaderGenericBase to 100 — Claus Stadler / githubweb
- Added support for Lang attribute to RdfSourceFactory. — Claus Stadler / githubweb
- Updated imports — Claus Stadler / githubweb
- Bumped to a snapshot version of aksw-commons — Claus Stadler / githubweb
- update doc links for Sansa ml especially distsim — GitHub / githubweb
- update distsim docs — GitHub / githubweb
- add docs for distrdf2ml — GitHub / githubweb
- Added operator to map named models to resource in datasets — Claus Stadler / githubweb
- Added Kryo Serializer for Binding — Claus Stadler / githubweb
- Renamed folder containing '.' — Claus Stadler / githubweb
- bump to jena 4.1.0 — Claus Stadler / githubweb
- upgrade to jena 4.2.0 — Claus Stadler / githubweb
- Added support for parallel ingestion of json arrays — Claus Stadler / githubweb
- Removed debug output — Claus Stadler / githubweb
- Added java version for aggregation using the java collector api — Claus Stadler / githubweb
- Towards adding a tarql-like operator — Claus Stadler / githubweb
- Added splittable multiline csv support to the generic parser framework — Claus Stadler / githubweb
- Removed hard coded path (need to prepare a csv test file yet) — Claus Stadler / githubweb
- Added tarql mapper — Claus Stadler / githubweb
- regex for seeking csv record offsets greatly improved (my most arcane regex to date) — Claus Stadler / githubweb
- There are now tarqlTriples and tarqlQuads methods in JavaRddRxOps. — Claus Stadler / githubweb
- Pretty much finished CSV integration; added options for setting csv formats and added test data — Claus Stadler / githubweb
- Skip header record is now respected by the RecordReaderCsv — Claus Stadler / githubweb
- Consolidated naming for rdd ops and hadoop formats — Claus Stadler / githubweb
- Added JavaRddOfBindingsOps class — Claus Stadler / githubweb
- Consolidated naming of the java rdd operators — Claus Stadler / githubweb
- Fixed use of a non serializable lambda in the ops — Claus Stadler / githubweb
- Added univocity csv parser because commons-csv apparently can't handle CR (without LF) in unquoted fields... — Claus Stadler / githubweb
- Switched csv implementation to univocity in CsvDataSources — Claus Stadler / githubweb
- Added a small rdd transformation framework — Claus Stadler / githubweb
- Improved documentation on rdd transformation chaining interfaces — Claus Stadler / githubweb
- Added documentation for chaining — Claus Stadler / githubweb
- Fix for chaining doc — Claus Stadler / githubweb
- Moved java operators for jena to its own lightweight package — Claus Stadler / githubweb
- Moved the JavaRddOfBindingsOps from query layer to the sansa-spark-jena-java package — Claus Stadler / githubweb
- Reorganized packages in the java module — Claus Stadler / githubweb
- Added a kryo registrator to the sansa-spark-jena-java package — Claus Stadler / githubweb
- Added a call to JenaSystem.init() within mapPartitions because it seemed an output format was not found — Claus Stadler / githubweb
- Removed JenaSystem.init() because RDFLanguagesEx is the better place — Claus Stadler / githubweb
- Updated ResourceInDataset related imports — Claus Stadler / githubweb
- Modified rdd rdf writer api to allow for validation BEFORE running a job — Claus Stadler / githubweb
- Some cleanup of the revised RddRdfWriter — Claus Stadler / githubweb
- Allowed passing subclasses of Dataset/DatasetGraph etc to the RddRdfWriter — Claus Stadler / githubweb
- Changed API for RDD<Dataset> to RDD<DatasetOneNg>; this allows for enforcing only a single named graph per rdd entry (and getting the graph name) while retaining query capabilities. — Claus Stadler / githubweb
- Added serializer for DatasetOneNg — Claus Stadler / githubweb
- Fixed spelling mistake in class name — Claus Stadler / githubweb
- Fixed compile errors due to api changes in the rdf reader — Claus Stadler / githubweb
- upgraded the lower levels to jena 4.3.0-SNAPSHOT (requires jena to be built from git!) — Claus Stadler / githubweb
- added DistAD material — f.bakhshandegan / githubweb
- rdf layer now working with jenax / query layer not yet — Claus Stadler / githubweb
- Fixed query and cli modules for jenax / jena 4.3.0 — Claus Stadler / githubweb
- Disabled flink modules; all spark modules building but some tests fail — Claus Stadler / githubweb
- started with SimE4KG documentation on GitHub pages — GitHub / githubweb
- Update index.md — GitHub / githubweb
- Update index.md — GitHub / githubweb
- Update index.md — GitHub / githubweb
- Farshad DistAD support to commit to develop ReadMe — GitHub / githubweb
- SimE4KG provide Databricks notebooks link — GitHub / githubweb
- Update index.md — GitHub / githubweb
- jena branch now has its own version; upgraded to latest jena4.3.0 snapshot — Claus Stadler / githubweb
- Set version back to 0.8.0-RC2-SNAPSHOT — Claus Stadler / githubweb
#62 (Jun 16, 2021, 9:46:57 AM)
- integrate hashing as alternative to indexing of categorical strings. adjusted also unit tests and offer new setter — carsten.draschner / githubweb
- integrated getter for feature vector descriptions — carsten.draschner / githubweb
- offer semantic representation of transformer hyperparameters — carsten.draschner / githubweb
#61 (May 31, 2021, 4:00:34 PM)
- created new file for unit test having datetime information and will result in partial null values which should be covered by functionalities of sva — carsten.draschner / githubweb
- changed sva sample file. now with datetime info within a second test — carsten.draschner / githubweb
- add lines to show how to store example metagraph within DistRDF2ML pipeline — carsten.draschner / githubweb
- handling and tracking of timestamp null values and offer replacement — carsten.draschner / githubweb
#60 (May 28, 2021, 12:28:10 PM)
#59 (May 26, 2021, 1:19:49 PM)
#58 (May 25, 2021, 10:36:28 AM)
- initial pipeline for platoon 3a demo — carsten.draschner / githubweb
- better support datetime timestamp and split features — carsten.draschner / githubweb
- Towards support for temporal datatypes in the schema mapper — Claus Stadler / githubweb
- SchemaMapper should now work with temporal datatypes — Claus Stadler / githubweb
- support of datetime within smart vector assembler — carsten.draschner / githubweb
- rm pl related data — carsten.draschner / githubweb
#57 (May 17, 2021, 2:06:55 PM)
- MmDistSim collect feature sets — carsten.draschner / githubweb
- MmDistSim span up exploded dataframe with feature ordering — carsten.draschner / githubweb
- calculate jaccard for each feature as baseline — carsten.draschner / githubweb
- many changes — carsten.draschner / githubweb
- next features: — carsten.draschner / githubweb
- some changes: — carsten.draschner / githubweb
- some changes: — carsten.draschner / githubweb
- remove slowing down count for print statement — carsten.draschner / githubweb
- time catching — carsten.draschner / githubweb
- more time catching — carsten.draschner / githubweb
- reduce complexity by hero filter — carsten.draschner / githubweb
- some count to ensure non lazy evaluation — carsten.draschner / githubweb
- make write optional — carsten.draschner / githubweb
- total exp time — carsten.draschner / githubweb
- started with feature identifier — carsten.draschner / githubweb
- started with transformer pipelines for string based features — carsten.draschner / githubweb
- running version to get digitized dataframe but huge memory need — carsten.draschner / githubweb
- fixed bug of wrongly joined df and removed cache — carsten.draschner / githubweb
- catch more data types and handle variable feature sets over agg — carsten.draschner / githubweb
- attempts to fix oom — carsten.draschner / githubweb
- solve oom over persists and unpersists — carsten.draschner / githubweb
- pipeline runs through, adjusted prints and shows for neat output — carsten.draschner / githubweb
- removed outcommented println — carsten.draschner / githubweb
- some clean up — carsten.draschner / githubweb
- sparql change to run over full data — carsten.draschner / githubweb
- some prints for server progress — carsten.draschner / githubweb
- make query collapsable — carsten.draschner / githubweb
- created setters — carsten.draschner / githubweb
- bugfix by suffix strip in keycolumn generation — carsten.draschner / githubweb
- create second query string for collapsable feature columns — carsten.draschner / githubweb
- complete new procedure for smart vector assembler — carsten.draschner / githubweb
- created first test class which calls smartvector assembler on top of sparqlframe — carsten.draschner / githubweb
- assert over df size — carsten.draschner / githubweb
- put assert to test class — carsten.draschner / githubweb
- add setters for — carsten.draschner / githubweb
- make object attributes protected so they can be only set over setter — carsten.draschner / githubweb
- in unit test make use out of setters — carsten.draschner / githubweb
- started with DistRDF2ML evaluation class. tracking times of pipeline modules and store those to file — carsten.draschner / githubweb
- adjustments in write information — carsten.draschner / githubweb
- fix some column name handling in sva — carsten.draschner / githubweb
- label column name handling better distinct from feature columns in edge cases — carsten.draschner / githubweb
- explode of labelcolumn which is within list — carsten.draschner / githubweb
- assert for expected column names — carsten.draschner / githubweb
- some docstrings — carsten.draschner / githubweb
- get size after sparql frame to ensure no laziness — carsten.draschner / githubweb
- fix write bug — carsten.draschner / githubweb
- more persist — carsten.draschner / githubweb
- remove of need of joinable df and instead direct assignment of digitized df — carsten.draschner / githubweb
- remove debug component and make word2vec model separate — carsten.draschner / githubweb
- reduce word2vec size to 2 — carsten.draschner / githubweb
- offer sampling of word2vec training data to reduce ram complexity — carsten.draschner / githubweb
- create full regression pipeline as an example of DistRDF2ML use — carsten.draschner / githubweb
- move and rename evaluation script of DistRDF2ML — carsten.draschner / githubweb
- create or in sva of Int and integer — carsten.draschner / githubweb
- moved — carsten.draschner / githubweb
- use test for eval — carsten.draschner / githubweb
- sample classification pipeline based on DistRDF2ML modules — carsten.draschner / githubweb
- rename method — carsten.draschner / githubweb
- adjust setter — carsten.draschner / githubweb
- create Readme for DistRDF2ML — carsten.draschner / githubweb
- created DistRDF2ML clustering pipeline — carsten.draschner / githubweb
- adjustments in DistRDF2ML readme — carsten.draschner / githubweb
- adjustments in DistRDF2ML readme - add reference to classes — carsten.draschner / githubweb
- update DistRDF2ML readme - databricks — carsten.draschner / githubweb
- created ml2graph transformer for semantification of DistRDF2ML pipeline Results — carsten.draschner / githubweb
- ml to graph offer fallback xsd type — carsten.draschner / githubweb
- create ML2Graph unit test — carsten.draschner / githubweb
- show two different opportunities to gather int value out of string within KG — carsten.draschner / githubweb
- created new tiny sample file — carsten.draschner / githubweb
- created ML to Graph transformer to make MLlib output semantic for DistSim Pipeline — carsten.draschner / githubweb
- unit tests calling ML2Graph module — carsten.draschner / githubweb
- reorder metagraph — carsten.draschner / githubweb
- show semantification of results in sample regression pipeline — carsten.draschner / githubweb
- show semantification of results in sample classification pipeline — carsten.draschner / githubweb
- show semantification of results in sample clustering pipeline — carsten.draschner / githubweb
- class for data size eval on artificial movie data — carsten.draschner / githubweb
- offer ratio of indexer training like word 2 vec — carsten.draschner / githubweb
#56 (Apr 13, 2021, 10:59:57 AM)
- first running example to transform rdf data into a native spark dataframe with native scala datatypes — carsten.draschner / githubweb
- created transformer which creates sparql to dataframe with native object like string integer etc. — carsten.draschner / githubweb
- scala class object with main showcasing usage of sparqlFrame — carsten.draschner / githubweb
- annotate datatype of literals so it can be fetched by sparqlFrame — carsten.draschner / githubweb
- play around with Sparqlify in SANSA ml — carsten.draschner / githubweb
- comment out prints and add todos to make logs; also fixed bug by removing a closing curly bracket in sparql creation — carsten.draschner / githubweb
- added elements for more complete feature extracting pipeline — carsten.draschner / githubweb
- better literal identification — carsten.draschner / githubweb
- clearer warning of multiple struct types — carsten.draschner / githubweb
- clear difference in val names of auto and manual sparql — carsten.draschner / githubweb
- move rdf2feature sparql creator into feature extraction package — carsten.draschner / githubweb
- added missing imports after class moving — carsten.draschner / githubweb
- move to desired package structure — carsten.draschner / githubweb
- switch to make or not make feature blocks optional over parameter — carsten.draschner / githubweb
- better handling of empty answer dataframe — carsten.draschner / githubweb
- adjusted literal identification — carsten.draschner / githubweb
- running sample pipeline — carsten.draschner / githubweb
- apply common mllib algo and it works :) — carsten.draschner / githubweb
- more complex needed sparql statement — carsten.draschner / githubweb
- debug level error — carsten.draschner / githubweb
- changed position of sampling — carsten.draschner / githubweb
- cache in rdf2feature up and down dataframe — carsten.draschner / githubweb
- some fixes after merge — carsten.draschner / githubweb
- rename sparql creating method — carsten.draschner / githubweb
- spark setup taken from ontop to fix issue — carsten.draschner / githubweb
- vp — Lorenz Buehmann / githubweb
- VP warehouse path — Lorenz Buehmann / githubweb
- example from existing DB — Lorenz Buehmann / githubweb
- conf options — Lorenz Buehmann / githubweb
- use Hive metastore — Lorenz Buehmann / githubweb
- log — Lorenz Buehmann / githubweb
- dateTime/dateTimeStamp support — Lorenz Buehmann / githubweb
- log — Lorenz Buehmann / githubweb
- Added execSelectSpark() method to QueryExecutionSparqlifySpark — Claus Stadler / githubweb
- some rearrange of imports and add of jenasystem init — carsten.draschner / githubweb
- try out new version of sparklify to gain rdd of bindings — carsten.draschner / githubweb
- try out more complex sparql query with multi line optional blocks — carsten.draschner / githubweb
- change sparqlframe such that it can switch between ontop and sparqlify. also better handling for null values in columns to evaluate structtype and circumvent NPE when .get methods are called — carsten.draschner / githubweb
- try out new sparqlify based feature extraction in this pipeline — carsten.draschner / githubweb
- Refactored partitioning and optimized imports — Claus Stadler / githubweb
- minor — Lorenz Buehmann / githubweb
- minor code style — Lorenz Buehmann / githubweb
- partition type for datetime literals — Lorenz Buehmann / githubweb
- partitioner changes adapted — Lorenz Buehmann / githubweb
- omit Javadoc/Scaladoc in build script — Lorenz Buehmann / githubweb
- test if ontop can deal with multiline optional blocks. and its working — carsten.draschner / githubweb
- Partitioning — Lorenz Buehmann / githubweb
- Added partition-to-r2rml converter based on our r2rml-jena API — Claus Stadler / githubweb
- Renamed method, added documentation — Claus Stadler / githubweb
- Attempt to add a manual workflow trigger — Claus Stadler / githubweb
- cntd. R2RML export/import — Lorenz Buehmann / githubweb
- minor — Lorenz Buehmann / githubweb
- Added new interfaces for unified sparqlify/ontop apis — Claus Stadler / githubweb
- More consolidation of the r2rml/virtual knowledge graph subsystem (sparqlify / ontop) — Claus Stadler / githubweb
- Update for change in r2rml lib — Claus Stadler / githubweb
- reuse code — Lorenz Buehmann / githubweb
- query execution alignment — Lorenz Buehmann / githubweb
- Update README.md — GitHub / githubweb
- comment methods — carsten.draschner / githubweb
- create first sparqlframe unit test — carsten.draschner / githubweb
- remove play around class which is not needed anymore — carsten.draschner / githubweb
- annotate literal type such that sparqlframe can handle it — carsten.draschner / githubweb
- document sparql frame — carsten.draschner / githubweb
- sample usage of pipeline elements — carsten.draschner / githubweb
- Set r2rml api version — Claus Stadler / githubweb
- query engine — Lorenz Buehmann / githubweb
- take table name fn — Lorenz Buehmann / githubweb
- Sparqlify now ported to the new API towards unified virtual graph handling — Claus Stadler / githubweb
- rewrite — Lorenz Buehmann / githubweb
- started developing smart vector assembler — carsten.draschner / githubweb
- added docstring — carsten.draschner / githubweb
- fixed null replacements — carsten.draschner / githubweb
- usage of smart vector assembler and fix of some strange read in toDS bug — carsten.draschner / githubweb
- extension of smart vector assembler transform logic — carsten.draschner / githubweb
- Ontop R2RML parse — Lorenz Buehmann / githubweb
- some cleanup and some docs. now first approach working on beta level — carsten.draschner / githubweb
- minor — Lorenz Buehmann / githubweb
- changed readme for smart vector assembler — carsten.draschner / githubweb
- more work on R2RML — Lorenz Buehmann / githubweb
- cntd Ontop integration — Lorenz Buehmann / githubweb
- reduced tests loads — Lorenz Buehmann / githubweb
- A few improvements for the R2RML layer design — Claus Stadler / githubweb
- fixes tests — Lorenz Buehmann / githubweb
- avoid exception — Lorenz Buehmann / githubweb
- tests debugging — Lorenz Buehmann / githubweb
- Work on analytic RDD ops — Claus Stadler / githubweb
- fixes tests — Lorenz Buehmann / githubweb
- Cache — Lorenz Buehmann / githubweb
- rewrite — Lorenz Buehmann / githubweb
- fixes build errors — Lorenz Buehmann / githubweb
- fix print lines — carsten.draschner / githubweb
- clean up — Lorenz Buehmann / githubweb
- minor — Lorenz Buehmann / githubweb
- simplified code — Lorenz Buehmann / githubweb
- minor — Lorenz Buehmann / githubweb
- Added schema mapping system (but some parts not yet serializable) — Claus Stadler / githubweb
- first working version of the schema mapper for customizable RDD[Binding] -> DataFrame conversions — Claus Stadler / githubweb
- dist iq — Lorenz Buehmann / githubweb
- Removed needless SparkSession argument — Claus Stadler / githubweb
- Result var order is now retained in the schema mapping — Claus Stadler / githubweb
- rename — Lorenz Buehmann / githubweb
- Experimenting with schema mapper — Claus Stadler / githubweb
- first commented possible change to incorporate claus' changes — carsten.draschner / githubweb
- first commented possible change to incorporate claus' changes — carsten.draschner / githubweb
- remove print — carsten.draschner / githubweb
- string type — Lorenz Buehmann / githubweb
- Fixed no archaic import — Claus Stadler / githubweb
- Fixed a no longer available import — Claus Stadler / githubweb
- extract methods — Lorenz Buehmann / githubweb
- minor — Lorenz Buehmann / githubweb
- test runner ext — Lorenz Buehmann / githubweb
- partition test fix — Lorenz Buehmann / githubweb
- log reduced — Lorenz Buehmann / githubweb
- merged SPARQL examples — Lorenz Buehmann / githubweb
- Added a comment about renaming RddToDframeMapper to something more specific — Claus Stadler / githubweb
- generic test suite — Lorenz Buehmann / githubweb
- test rework — Lorenz Buehmann / githubweb
- clean up — Lorenz Buehmann / githubweb
- Work on test cases — Claus Stadler / githubweb
- Adapted test runner — Claus Stadler / githubweb
- renamed test package — Lorenz Buehmann / githubweb
- Jar plugin — Lorenz Buehmann / githubweb
- SPARQL engine example improved — Lorenz Buehmann / githubweb
- all tests green so far — Claus Stadler / githubweb
- renamed test cases once more — Claus Stadler / githubweb
- fix compile error — Lorenz Buehmann / githubweb
- move out ontop cause it is unclear how to call it with new api — carsten.draschner / githubweb
- fix tests by System.setProperty and some changes in tests because of small datatype differences — carsten.draschner / githubweb
- bring ontop back into sparql frame over setter — carsten.draschner / githubweb
- updated integration tests - sparqlify succeed - ontop hangs — Claus Stadler / githubweb
- hotfix: some refactoring of hard coded spark master set to local, remove unused sample classes, hotfix of small udf change — carsten.draschner / githubweb
- offset support — Lorenz Buehmann / githubweb
- minor changes in engine — Lorenz Buehmann / githubweb
- ser tests — Lorenz Buehmann / githubweb
- debug serializer — Lorenz Buehmann / githubweb
- Scala code in Java ... — Lorenz Buehmann / githubweb
- moved Java serializer to Scala — Lorenz Buehmann / githubweb
- some cleanup — Lorenz Buehmann / githubweb
- Added core for another query engine for ad-hoc queries on trig data - conceptually it's probably sparqlgx-sde — Claus Stadler / githubweb
- ser extended — Lorenz Buehmann / githubweb
- Renamed class — Claus Stadler / githubweb
- Added cli, deb packaging, build/version metadata generation, changed r2rml dep version to 0.9.0-SNAPSHOT — Claus Stadler / githubweb
- Fixed typo — Claus Stadler / githubweb
- Cleaned up some warnings... now java/scala cyclic dependency issue -.- — Claus Stadler / githubweb
- Fixed some annotation issues — Claus Stadler / githubweb
- Added the well known compile-scala-first workaround — Claus Stadler / githubweb
- Ignored binding engine test — Claus Stadler / githubweb
- sansa command working from debian package — Claus Stadler / githubweb
- Attempt to fix broken prefix handling in TrigFileInputFormat — Claus Stadler / githubweb
- General improvements on the cli tool — Claus Stadler / githubweb
- Removed all log4j.properties files under src/main/resources — Claus Stadler / githubweb
- Moved spark-bench to separate module because it is about the only remaining lib that ships with its own log4j.properties and doesn't belong to main lib anyway — Claus Stadler / githubweb
- Increased default max record length in trig reader to 10MB — Claus Stadler / githubweb
- Attempt to bump TRL of trig reader from 4 (works under laboratory conditions) to 6/7 (works in a relevant setup)... — Claus Stadler / githubweb
- Added workaround for HADOOP-17453 and it seems that made it work... code yet needs removal of swear words.... — Claus Stadler / githubweb
- cleanup — Claus Stadler / githubweb
- reset trig reader defaults — Claus Stadler / githubweb
- registered more classes to serializer — Lorenz Buehmann / githubweb
- Improvements to the trig query; added --distinct flag for preprocessing data — Claus Stadler / githubweb
- type extractor based on R2RML — Lorenz Buehmann / githubweb
- some debug — Lorenz Buehmann / githubweb
- use enum for term types — Lorenz Buehmann / githubweb
- generic quad writer — Lorenz Buehmann / githubweb
- Towards a utility to merge multiple trig files — Claus Stadler / githubweb
- Added extra module for test data resources (sansa-resource-testdata) — Claus Stadler / githubweb
- Added missing pom for test data — Claus Stadler / githubweb
- Improved trig record reader to dynamically allocate head and tail buffers — Claus Stadler / githubweb
- Fixed bug in trig reader — Claus Stadler / githubweb
- test cases now working again (a test resource is now shared between rdf-common and query-spark) — Claus Stadler / githubweb
- Messing around with test case file lookups... — Claus Stadler / githubweb
- test resource file now found, but prefixing data frame names with the origin rdd now fails for ontop... — Claus Stadler / githubweb
- Tests working up to the inference flink module, where they fail — Claus Stadler / githubweb
- Fell victim to accidental overflow with int/long arithmetic after rearranging an expression... — Claus Stadler / githubweb
- Wrapping hadoop input stream with close shield in an attempt to get rid of the stream closed exception... — Claus Stadler / githubweb
- A bit of cleanup — Claus Stadler / githubweb
- More fiddling with suddenly closed streams (apparently jena closes streams in LangBase:44) — Claus Stadler / githubweb
- Solved issues related to AbstractInterruptibleChannel: RDFDataMgrRx's Invocation of .interrupt() on the parser thread closed hadoop's input stream; — Claus Stadler / githubweb
- TrigRecordReader update and fixes for compressed input — Claus Stadler / githubweb
- Added license file — Claus Stadler / githubweb
- Update LICENSE — GitHub / githubweb
- Reinstated jens' version of the license — GitHub / githubweb
- engine setup — Lorenz Buehmann / githubweb
- docs — Lorenz Buehmann / githubweb
- cleanup — Lorenz Buehmann / githubweb
- needed code adaption for latest Ontop changes — Lorenz Buehmann / githubweb
- test loader changed for relative paths — Lorenz Buehmann / githubweb
- Ontop serialization stuff — Lorenz Buehmann / githubweb
- test output reduced — Lorenz Buehmann / githubweb
- boolean parse fix — Lorenz Buehmann / githubweb
- removed debug output — Lorenz Buehmann / githubweb
- more serialization — Lorenz Buehmann / githubweb
- update readme regarding automatic creation of sparql query — GitHub / githubweb
- Kryo pool — Lorenz Buehmann / githubweb
- kryo debugging — Lorenz Buehmann / githubweb
- debug log — Lorenz Buehmann / githubweb
- disable kryo pool for now — Lorenz Buehmann / githubweb
- version to keep it separated from dev branch — Lorenz Buehmann / githubweb
- scientific notation workaround — Lorenz Buehmann / githubweb
- try rewrite instruction ser via Spark Kryo — Lorenz Buehmann / githubweb
- remove print line and started with sample pipeline for ea use case — carsten.draschner / githubweb
- remove unwanted print statement — carsten.draschner / githubweb
- more replacements of unwanted chars in projection vars — carsten.draschner / githubweb
- refactor where auto sparql creation is — carsten.draschner / githubweb
- refactor place of class — carsten.draschner / githubweb
- refactor place of pipelines to examples — carsten.draschner / githubweb
- More work on the trig/distinct command — Claus Stadler / githubweb
- use more files and more print — carsten.draschner / githubweb
- Fixed wrong reference to cli module (now include scala version) — Claus Stadler / githubweb
- set number of mappers via properties — Lorenz Buehmann / githubweb
- local eval handler — Lorenz Buehmann / githubweb
- Switched to spark.sparkContext.union — Claus Stadler / githubweb
- Updated trig/distinct tool — Claus Stadler / githubweb
- Switched to thrift serialization by default — Claus Stadler / githubweb
- set local evaluation via property — Lorenz Buehmann / githubweb
- Pinned thrift version to 0.13.0 — Claus Stadler / githubweb
- tryout pipelines for lmdb — carsten.draschner / githubweb
- added hdt plugin — Claus Stadler / githubweb
- Thrift now working (issue was due to serialization as 'string' instead of byte[] rather than thrift version conflict) — Claus Stadler / githubweb
- WarningParseMode.IGNORE — carsten.draschner / githubweb
- some debug stuff — Lorenz Buehmann / githubweb
- relient node parser — Lorenz Buehmann / githubweb
- relient node parser — Lorenz Buehmann / githubweb
- Using thrift as default serialization for nodes now — Claus Stadler / githubweb
- more relient Node parse — Lorenz Buehmann / githubweb
- Fix for serialization issues (Turns out handling of ByteBuffer by spark is via util methods - not dedicated serializers) — Claus Stadler / githubweb
- cleanup — Lorenz Buehmann / githubweb
- parse mode enum — Lorenz Buehmann / githubweb
- fixed more issues with the binary serialization — Claus Stadler / githubweb
- refactor pipelines — carsten.draschner / githubweb
- fix bug in literal detection — carsten.draschner / githubweb
- added filtering by given feature list in smart vector assembler — carsten.draschner / githubweb
- smart vector assembler sample setting of feature columns to use — carsten.draschner / githubweb
- use autosparql in lmdb use case — carsten.draschner / githubweb
- Refactored all kryo serializers into java classes and moved them to rdf-common — Claus Stadler / githubweb
- Fixed typos and bug in KryoArrayUtils.java — Claus Stadler / githubweb
- Added performance benchmark for thrift vs riot (as junit test case for now) — Claus Stadler / githubweb
- formatting — Claus Stadler / githubweb
- cleanup — Claus Stadler / githubweb
- Fixed runtime java.lang.NoClassDefFoundError: Could not initialize class net.sansa_stack.rdf.spark.model.rdd.RddOfTriplesOps$ when running in standalone/yarn Spark modes. — azary / githubweb
- Added a custom node serializer because node serialization via jena's thrift api is too slow (riot is 20x faster) — Claus Stadler / githubweb
- Updated class description of GenericNodeSerializerCustom — Claus Stadler / githubweb
- code style — Claus Stadler / githubweb
- store sample created autosparql in lmdb use case — carsten.draschner / githubweb
- use compact but representative sparql query in lmdb — carsten.draschner / githubweb
- experiment with eauc — carsten.draschner / githubweb
- disable Kryo logging in tests — Lorenz Buehmann / githubweb
- Changed a serr to logger.debug — Claus Stadler / githubweb
- fix null issue in qef setup — Lorenz Buehmann / githubweb
- Updated getting the spark session for a given RDD — Claus Stadler / githubweb
- Excluded excludes for jaxb because it breaks sparqlify's xml parser on java 9+ — Claus Stadler / githubweb
- Put a benchmark 'unit test' on ignore — Claus Stadler / githubweb
- Updated broken imports in inference test case — Claus Stadler / githubweb
- Update README.md — GitHub / githubweb
- Update README.md — GitHub / githubweb
- Update main.yml — GitHub / githubweb
- Update README.md — GitHub / githubweb
- Added more options to the trig distinct tool — Claus Stadler / githubweb
- better handling of literal identification — carsten.draschner / githubweb
- handle non supported query engines — carsten.draschner / githubweb
- make ontop unavailable — carsten.draschner / githubweb
- remove a certain show of a debug dataframe — carsten.draschner / githubweb
- update readme to fit to recent changes — GitHub / githubweb
- fixed VP setup for Ontop — Lorenz Buehmann / githubweb
- use enum instead of string for query engine setter — Lorenz Buehmann / githubweb
- explicit dataset type — carsten.draschner / githubweb
- handling if no seeds can be resolved — carsten.draschner / githubweb
- handle set of query engine in upper and lower case — carsten.draschner / githubweb
- some R2RML model utils — Lorenz Buehmann / githubweb
- run classification on rdf kg — carsten.draschner / githubweb
- R2RML utils — Lorenz Buehmann / githubweb
- switch order of queries — carsten.draschner / githubweb
- JSQL version — Lorenz Buehmann / githubweb
- GH issue101 test — Lorenz Buehmann / githubweb
- Refactored / fixed inference test cases (now uses classpath scanning rather than messing around with files) — Claus Stadler / githubweb
- W3C test suite runner Sparqlify — Lorenz Buehmann / githubweb
- added dependency — carsten.draschner / githubweb
- optimize imports — carsten.draschner / githubweb
- new dataset to dataframe implementation for auto sparql generation — carsten.draschner / githubweb
- Removed relative reference to parent pom from inference layer... — Claus Stadler / githubweb
- get back to old df handling of literals — carsten.draschner / githubweb
- More cleanup of inference pom setup... — Claus Stadler / githubweb
- some experiments and making code more stable — carsten.draschner / githubweb
- Improvement of the schema mapper; renamed RddOf[plural]Ops to singular — Claus Stadler / githubweb
- Ported TrigRecordReader to plain java (1 compile error remaining) — Claus Stadler / githubweb
- fixed last compile error on java TrigRecordReader - now for testing... — Claus Stadler / githubweb
- Ported TrigFileInputFormat to java — Claus Stadler / githubweb
- Ported TrigRecordReader test to a nice parameterized junit test. Applied auto-formatting. — Claus Stadler / githubweb
- Trig record reader should now be able to cope with pretty much all corner cases; improved test framework; real-world data tests pending. — Claus Stadler / githubweb
- Excluded slow trig record reader tests for now — Claus Stadler / githubweb
- Commented out r2rml-sql utils to see where updates to the common lib are needed — Claus Stadler / githubweb
- Fixed a regression that causes non-encoded input to be non-splittable in the TrigRecordReader — Claus Stadler / githubweb
- add system init — carsten.draschner / githubweb
- try out if common way works — carsten.draschner / githubweb
- remove line which not handle invalid triple — carsten.draschner / githubweb
- use only manual sparql string for test purposes — carsten.draschner / githubweb
- Replaced SqlEscaper with the new SqlCodec API (which can do both encoding and decoding of identifiers) — Claus Stadler / githubweb
- use only manual sparql string for test purposes — carsten.draschner / githubweb
- project now compiling again — Claus Stadler / githubweb
- Added support for post-processing R2RML mappings to qualify table names with database names and integration with the sparqlify system — Claus Stadler / githubweb
- test cleaned up — Lorenz Buehmann / githubweb
- enabled HTML export in ScalaTest — Lorenz Buehmann / githubweb
- Scalatest version change — Lorenz Buehmann / githubweb
- Scalatest version cleanup — Lorenz Buehmann / githubweb
- HTML export lib — Lorenz Buehmann / githubweb
- comment out ontop block. locally running now — carsten.draschner / githubweb
- expand shortcuts readded — Lorenz Buehmann / githubweb
- Refactored TrigRecordReader into a framework with the central class RecordReaderGenericBase. — Claus Stadler / githubweb
- Turtle record reader working on some datasets but failing on others with lots of blank nodes. — Claus Stadler / githubweb
- Splittable record reader for turtle now functional — Claus Stadler / githubweb
- Aligned hadoop configuration option naming — Claus Stadler / githubweb
- test exclude — Lorenz Buehmann / githubweb
- workaround Scalatest conflicts — Lorenz Buehmann / githubweb
- Created dedicated modules sansa-hadoop-jena and sansa-kryo-jena — Claus Stadler / githubweb
- Cleaning up some leftovers from refactoring jena/hadoop/kryo — Claus Stadler / githubweb
- all tests passed locally — Claus Stadler / githubweb
- Fix for #144 — Claus Stadler / githubweb
- write debug data on test error — Lorenz Buehmann / githubweb
- clean tests — Lorenz Buehmann / githubweb
- minor — Lorenz Buehmann / githubweb
- minor — Lorenz Buehmann / githubweb
- omit DB in test clean up — Lorenz Buehmann / githubweb
- minor — Lorenz Buehmann / githubweb
- Fix for #146; test with 10 million numbered triples did not exhibit data loss anymore — Claus Stadler / githubweb
- bumped RDF4j version — Lorenz Buehmann / githubweb
- Improved README to better point out features — GitHub / githubweb
- Rephrased features — GitHub / githubweb
- Update README.md — GitHub / githubweb
- Update README.md — GitHub / githubweb
- Update README.md — GitHub / githubweb
- Significant reworking of RecordReaderGenericBase for #144; when collecting the split's flow to lists all tests succeed; otherwise there is a 'MissingBackPressure' exception. — Claus Stadler / githubweb
- Maybe the missing backpressure exception is solved now — Claus Stadler / githubweb
- Added SERVICE<rdd:perGraph> support in addition to rdd:perPartition (but still not totally happy because semantics w.r.t. named graphs, default graphs, datasets and partitions of those not totally clear; needs more thought) — Claus Stadler / githubweb
- Added feature to merge part files into a single file — Claus Stadler / githubweb
- kryo package — Lorenz Buehmann / githubweb
- CLI moved — Lorenz Buehmann / githubweb
- CLI moved — Lorenz Buehmann / githubweb
- Improvements for #147: Renamed methods and updated documentation — Claus Stadler / githubweb
- Fixed a compile error due to String.format(str, ...) - changed to str.format(...) — Claus Stadler / githubweb
- Added spark-cli module to sansa-stack bundle project — Claus Stadler / githubweb
- Changed guava Stopwatch to Apache StopWatch because of hadoop — Claus Stadler / githubweb
- TrigMerge should now properly use the hadoop file system — Claus Stadler / githubweb
- Fixed compile error — Claus Stadler / githubweb
- Messing around with hadoop file system uri resolution... — Claus Stadler / githubweb
- Maybe now... — Claus Stadler / githubweb
- Bumped commons-lang3 version — Claus Stadler / githubweb
- Added more logging to RecordReaderGenericBase — Claus Stadler / githubweb
- Typo — Claus Stadler / githubweb
- Changed strategy for writing partitions of datasets out using PipedInput/OutputStream approach; assuming that spark is smart then this approach — Claus Stadler / githubweb
- clean up — Lorenz Buehmann / githubweb
- Deprecated the saveAsNtriples file method; going to replace it with a more powerful fluent API — Claus Stadler / githubweb
- Potential fix for issue with lambda serialization in NodeAnalytics — Claus Stadler / githubweb
- Another fix — Claus Stadler / githubweb
- serialize DB metadata — Lorenz Buehmann / githubweb
- Removed exclude for lang3 — Claus Stadler / githubweb
- Removed guava from SchemaMapper interface in an attempt to solve a serialization issue — Claus Stadler / githubweb
- Removed unused imports — Claus Stadler / githubweb
- some logs — Lorenz Buehmann / githubweb
- more logs — Lorenz Buehmann / githubweb
- omit DB creation — Lorenz Buehmann / githubweb
- for server eval make sparql alternatives settable — carsten.draschner / githubweb
- try out without auto sparql — carsten.draschner / githubweb
- try out without auto sparql — carsten.draschner / githubweb
- make sparql query parameter — carsten.draschner / githubweb
- mapper — Lorenz Buehmann / githubweb
- more debug log — Lorenz Buehmann / githubweb
- added print lines to see needed processing time — carsten.draschner / githubweb
- catch exception which always occurs — Lorenz Buehmann / githubweb
- adjusted manual sparql query — carsten.draschner / githubweb
- no metamapping expansion — Lorenz Buehmann / githubweb
- cache and prepare config — Lorenz Buehmann / githubweb
- expand mappings — Lorenz Buehmann / githubweb
- Added first version of a fluent API to RDF Rdds rdd.configureSave()...run(). — Claus Stadler / githubweb
- Added all formats to the new OutputFormatRdfRegistry — Claus Stadler / githubweb
- Updated Sparqlify integration to use SqlCodec — Claus Stadler / githubweb
- fixed flink build — Claus Stadler / githubweb
- logging and settings — Lorenz Buehmann / githubweb
- Experimenting with save settings — Claus Stadler / githubweb
- QueryEngineFactoryBase now uses double quotes encoding for R2RML — Claus Stadler / githubweb
- reduced log — Lorenz Buehmann / githubweb
- Revision of quoting: As a rule, R2RML documents should now ALWAYS escape any SQL identifiers with double quotes; conversely: never use backticks in the R2RML. — Claus Stadler / githubweb
- Removed debug output — Claus Stadler / githubweb
- Removed unused repos — Claus Stadler / githubweb
- Refactored the creation of StreamRDF instances for writing RDF out in order to simplify changing of strategies — Claus Stadler / githubweb
- local eval mode — Lorenz Buehmann / githubweb
- repo — Lorenz Buehmann / githubweb
- Work on RddRdfSaver (unfinished) and improving structure of operations on Datasets / Named Models — Claus Stadler / githubweb
- remove old row mapper — Lorenz Buehmann / githubweb
- version set — Lorenz Buehmann / githubweb
- Work on rdd ops — Claus Stadler / githubweb
- version fix — Lorenz Buehmann / githubweb
- Removed duplicate elephas-io declaration in hadoop-jena — Claus Stadler / githubweb
- Replacing a 'dash' from system hash code with '_' in order to yield a valid spark table name. — Claus Stadler / githubweb
- set log4j to warn for tests; another attempt to fix the naming issue with the dash — Claus Stadler / githubweb
- Added feature to convert quads to triples when saving with a triples language — Claus Stadler / githubweb
- Fixed versions of the bundle modules — Claus Stadler / githubweb
- serialize DB metadata — GitHub / githubweb
- Removed guava stopwatch — Claus Stadler / githubweb
- Updated trig/query tool (needs generalization and consolidation later) — Claus Stadler / githubweb
- Re-added workaround for HADOOP-17453 (bugged non-zero offset reads from BZip2Codec) — Claus Stadler / githubweb
- all around quoting — Lorenz Buehmann / githubweb
- test keep — Lorenz Buehmann / githubweb
- encode string literals — Lorenz Buehmann / githubweb
- tests cleaned — Lorenz Buehmann / githubweb
- docs — Lorenz Buehmann / githubweb
- Update README.md — GitHub / githubweb
- Update README.md — GitHub / githubweb
- Update README.md — GitHub / githubweb
- formatting — Lorenz Buehmann / githubweb
- SPARQL usage example — Lorenz Buehmann / githubweb
- less log — Lorenz Buehmann / githubweb
- Update README.md — GitHub / githubweb
- Update README.md — GitHub / githubweb
- Update README.md — GitHub / githubweb
- Update README.md — GitHub / githubweb
- pom cleanup — Lorenz Buehmann / githubweb
- Removed unused classes and imports — Claus Stadler / githubweb
- Removed an unused class — Claus Stadler / githubweb
- removed guava — Claus Stadler / githubweb
- Removed another guava stopwatch — Claus Stadler / githubweb
- stick to Guava 14 for Spark compatibility — Lorenz Buehmann / githubweb
- Fixed some warnings — Claus Stadler / githubweb
- start with semantic description — carsten.draschner / githubweb
- bumped guava — Lorenz Buehmann / githubweb
- Replaced snapshot versions of Claus's stuff with release ones — Claus Stadler / githubweb
- removed legacy GeoSpark dep — Lorenz Buehmann / githubweb
- pom update for aksw release — Claus Stadler / githubweb
- bumped gitflow plugin version / using official one again — Claus Stadler / githubweb
- removed build sections from datalake; compilation still works — Claus Stadler / githubweb
- Update versions for release — Claus Stadler / githubweb
- Update for next development version — Claus Stadler / githubweb
- Added bundle modules to parent — Claus Stadler / githubweb
- Removed empty file — Claus Stadler / githubweb
- Commented out maven badge — Claus Stadler / githubweb
- Update README.md — GitHub / githubweb
#55 (Dec 23, 2020, 2:22:30 PM)
- Minor update on IT documentation, removed intentional exception from SansaIT again — Claus Stadler / githubweb
#54 (Dec 23, 2020, 2:00:40 PM)
- minor — Lorenz Buehmann / githubweb
#53 (Dec 23, 2020, 10:06:41 AM)
#52 (Dec 22, 2020, 3:18:24 PM)
- Added temporary intentional fail to IT to investigate network issue between spawned docker containers — Claus Stadler / githubweb
#51 (Dec 22, 2020, 1:11:13 PM)
- removed hard coded master local — Lorenz Buehmann / githubweb
- inference example cleaned — Lorenz Buehmann / githubweb
- keep commons codec for Jena — Lorenz Buehmann / githubweb
#50 (Dec 22, 2020, 1:02:02 PM)
#49 (Dec 22, 2020, 11:34:43 AM)
- Reenabled Ontop test (test works locally) — Claus Stadler / githubweb
#48 (Dec 21, 2020, 4:01:14 PM)
- Added initial readme for it module — Claus Stadler / githubweb
#47 (Dec 21, 2020, 11:02:37 AM)
- removed history server setting — Lorenz Buehmann / githubweb
#46 (Dec 21, 2020, 10:51:07 AM)
- IT check should now abort if spark submit container exits — Claus Stadler / githubweb
#45 (Dec 21, 2020, 10:07:22 AM)
- Integration test with dockerized spark-submit working for sparqlify — Claus Stadler / githubweb
#44 (Dec 18, 2020, 2:37:52 PM)
#43 (Dec 18, 2020, 1:27:16 PM)
- fixes JDK 11 issue with test loader — Lorenz Buehmann / githubweb
#42 (Dec 18, 2020, 11:28:04 AM)
- fixes SQL generation — Lorenz Buehmann / githubweb
#41 (Dec 18, 2020, 12:33:27 AM)
- hotfix after change to new jena version to explicitly call jenaSystem in ml — carsten.draschner / githubweb
#40 (Dec 17, 2020, 11:30:43 PM)
- Fixed test cases by adding JenaSystem.init — Claus Stadler / githubweb
#39 (Dec 17, 2020, 1:53:38 PM)
- Fixed dependencies in it module — Claus Stadler / githubweb
#38 (Dec 17, 2020, 12:46:15 PM)
- Successful test to start BDE spark using testcontainers — Claus Stadler / githubweb
- Updated gitignore — Claus Stadler / githubweb
- More experiments with testcontainers and spark submit — Claus Stadler / githubweb
- dist profile on sansa-stack-spark now yields correct filename for jar with dependencies. — Claus Stadler / githubweb
- first working sparklify deployment within test containers — Claus Stadler / githubweb
- Improved integration test; sparqlify server now actually tested automatically — Claus Stadler / githubweb
- Removed some println — Claus Stadler / githubweb
- More work on integration testing of ontop and sparklify — Claus Stadler / githubweb
- Sparqlify integration test working with improved boilerplate — Claus Stadler / githubweb
- Added jena init to tests — Claus Stadler / githubweb
- Another JenaSystem.init — Claus Stadler / githubweb
- Integration test now only depends on spark-core and jena (the rest comes from the jar bundle) — Claus Stadler / githubweb
#37 (Dec 17, 2020, 12:27:20 PM)
- registrator switched — Lorenz Buehmann / githubweb
#36 (Dec 16, 2020, 2:27:06 PM)
- remove redundant registrator — Lorenz Buehmann / githubweb
- allow for negative years — Lorenz Buehmann / githubweb
- fix registrator — Lorenz Buehmann / githubweb
#35 (Dec 15, 2020, 9:14:01 AM)
- Update README.md — GitHub / githubweb
- some cleanup — Lorenz Buehmann / githubweb
#34 (Dec 12, 2020, 10:54:22 PM)
#33 (Dec 11, 2020, 10:56:04 AM)
- removed deps from example POM — Lorenz Buehmann / githubweb
#32 (Dec 11, 2020, 10:49:50 AM)
- SBT cleaned — Lorenz Buehmann / githubweb
- docker tests — Lorenz Buehmann / githubweb
- examples POM cleaned — Lorenz Buehmann / githubweb
#31 (Dec 9, 2020, 3:05:44 PM)
- Fixed wrong implementation of "EntitiesMentioned" function. — f.bakhshandegan / githubweb
#30 (Dec 8, 2020, 5:30:03 PM)
- added more tests for RDFStatsTests and some related functions to stats object — f.bakhshandegan / githubweb
#29 (Dec 8, 2020, 4:25:24 PM)
#28 (Dec 8, 2020, 4:18:33 PM)
#27 (Dec 8, 2020, 2:59:02 PM)
- hotfix, udf change of Spark 3 does not allow typed return value. this led to an error in running the related test class, so simply removed return type. — carsten.draschner / githubweb
#26 (Dec 8, 2020, 1:55:28 PM)
#25 (Dec 8, 2020, 1:52:22 PM)
- Spark 3 changes — Lorenz Buehmann / githubweb
- Spark 3 adaptions — Lorenz Buehmann / githubweb
- Datalake connectors for Spark 3 — Lorenz Buehmann / githubweb
- Ontop bumped — Lorenz Buehmann / githubweb
- Spark 3 changes — Lorenz Buehmann / githubweb
- Spark 3 adaptions — Lorenz Buehmann / githubweb
- Datalake connectors for Spark 3 — Lorenz Buehmann / githubweb
- Ontop bumped — Lorenz Buehmann / githubweb
- POM fix — Lorenz Buehmann / githubweb
- deps — Lorenz Buehmann / githubweb
#24 (Dec 8, 2020, 1:43:28 PM)
- remove scala annotations for code to circumvent undesired code block parsing in markdown — carsten.draschner / githubweb
- git ignore ds store as artifact from mac devices — carsten.draschner / githubweb
- remove unused/unknown dependencies for cleanup of poms — carsten.draschner / githubweb
- in readme references — carsten.draschner / githubweb
- only one hash — carsten.draschner / githubweb
- change link — carsten.draschner / githubweb
- update and clean links — carsten.draschner / githubweb
- edit header to make link reference possible — carsten.draschner / githubweb
- Update README.md — GitHub / githubweb
- Update README.md — GitHub / githubweb
- Update README.md — GitHub / githubweb
- link subsections in head bulletpoint list — GitHub / githubweb
- update unclear sentence — GitHub / githubweb
- remove scala annotations for code to circumvent undesired code block parsing in markdown — Lorenz Buehmann / githubweb
- git ignore ds store as artifact from mac devices — Lorenz Buehmann / githubweb
- remove unused/unknown dependencies for cleanup of poms — Lorenz Buehmann / githubweb
- in readme references — Lorenz Buehmann / githubweb
- only one hash — Lorenz Buehmann / githubweb
- change link — Lorenz Buehmann / githubweb
- update and clean links — Lorenz Buehmann / githubweb
- edit header to make link reference possible — Lorenz Buehmann / githubweb
- Update README.md — Lorenz Buehmann / githubweb
- Update README.md — Lorenz Buehmann / githubweb
- Update README.md — Lorenz Buehmann / githubweb
- link subsections in head bulletpoint list — Lorenz Buehmann / githubweb
- update unclear sentence — Lorenz Buehmann / githubweb
#23 (Dec 8, 2020, 1:27:01 PM)
- Add Badges and How to Contribute sections on README file — Gezim Sejdiu / githubweb
- Make mvn_install_stack_*.sh executable scripts — Gezim Sejdiu / githubweb
#22 (Dec 8, 2020, 1:26:05 PM)
- SBT cleanup — Lorenz Buehmann / githubweb
#21 (Dec 8, 2020, 12:48:09 PM)
- remove unused deps — Lorenz Buehmann / githubweb
- typos — Lorenz Buehmann / githubweb
#20 (Dec 8, 2020, 12:33:23 PM)
- owl api versions fix — Lorenz Buehmann / githubweb
#19 (Dec 8, 2020, 12:13:23 PM)
- removed unused dependencies — Lorenz Buehmann / githubweb
#18 (Dec 8, 2020, 11:34:41 AM)
- removed SBT file — Lorenz Buehmann / githubweb
#17 (Dec 8, 2020, 9:19:31 AM)
#16 (Dec 7, 2020, 1:33:54 PM)
- fixed some repo issues — Lorenz Buehmann / githubweb
- SBT draft — Lorenz Buehmann / githubweb
- remove SBT file — Lorenz Buehmann / githubweb
#15 (Dec 7, 2020, 1:17:53 PM)
- fix bug in resource loading — carsten.draschner / githubweb
- ignore log files — carsten.draschner / githubweb
- delete tests of modules planned to be deleted from release 0.8 — carsten.draschner / githubweb
- delete test resources of modules planned to be deleted from release 0.8 — carsten.draschner / githubweb
- delete classes and folders of modules planned to be deleted from release 0.8 — carsten.draschner / githubweb
- sansa ml examples flink delete not maintained modules — carsten.draschner / githubweb
- sansa ml examples spark delete not maintained modules — carsten.draschner / githubweb
- sansa ml examples spark delete files for not maintained modules — carsten.draschner / githubweb
- sansa ml examples spark delete files for not maintained modules — carsten.draschner / githubweb
- sansa ml common delete files for not maintained modules — carsten.draschner / githubweb
- sansa ml flink delete classes for not maintained modules — carsten.draschner / githubweb
- new README.md for sansa ml — carsten.draschner / githubweb
#14 (Dec 6, 2020, 5:06:24 PM)
- First approach to automatic creation of feature-extracting SPARQL, based on the Python implementation and working on a small dataset. Now needs refactoring: camel case, moving parts into separate functions, and making it callable rather than only invoked from main — carsten.draschner / githubweb
- fixed bug in for loop and filtering bug also removed some print and show calls which were there for debug purposes — carsten.draschner / githubweb
- removed further prints and especially refactored out the row to query line into a separate method — carsten.draschner / githubweb
- refactored to camel case — carsten.draschner / githubweb
- removed commented lines and made print statements with f string more clean and oneliner — carsten.draschner / githubweb
- changed call behavior to use config instead of hard coded setup. also provided small sample files in resources — carsten.draschner / githubweb
- also added master as configurable parameter, maybe needed for standalone on sparkservers — carsten.draschner / githubweb
- write out sparql into file given by config file, and renaming inputfile name var — carsten.draschner / githubweb
- added scala doc strings — carsten.draschner / githubweb
- added first scala test checking if desired projection variables have been gathered — carsten.draschner / githubweb
- removed not necessary lines and added minor docstring — carsten.draschner / githubweb
- created artificial number random walk behavior opportunity — carsten.draschner / githubweb
- handling of rdf lang by file ending — carsten.draschner / githubweb
- fix bug in split by dot — carsten.draschner / githubweb
- changed main class for rdf2feature — carsten.draschner / githubweb
- added spark 3 profile — carsten.draschner / githubweb
- rename spark appname to rdf2feature — carsten.draschner / githubweb
- fix suggested by lorenz to fix Guava problem over insert filter — carsten.draschner / githubweb
- use config for outputpath — carsten.draschner / githubweb
- cache read in df — carsten.draschner / githubweb
- tab code block — carsten.draschner / githubweb
- show intermediate used and processed dataframes — carsten.draschner / githubweb
- added intermediate step to transfer list of nodes for seeds to list of strings for seeds — carsten.draschner / githubweb
- replaced sparqlquery usage by desired resulting seeds as string to test if this can run on server. and it does! — carsten.draschner / githubweb
- using sparql with one more spark config — carsten.draschner / githubweb
- make hard coded seeds usable over config — carsten.draschner / githubweb
- duplicate profile removed — Lorenz Buehmann / githubweb
- ignore ds store coming from mac os devices — carsten.draschner / githubweb
- use sparqlQuery for gaining seeds instead of using them from hardcoded list — carsten.draschner / githubweb
- a branch with Ontop shaded Guava dependency — Lorenz Buehmann / githubweb
- OWL API issues tests — Lorenz Buehmann / githubweb
- serialization tests — Lorenz Buehmann / githubweb
- extended test — Lorenz Buehmann / githubweb
- Kryo registrator for Ontop — Lorenz Buehmann / githubweb
- Use Ontop Kryo registrator — Lorenz Buehmann / githubweb
- simplified RDF lang detection — Lorenz Buehmann / githubweb
- partitioner log — Lorenz Buehmann / githubweb
- remove usage of hard coded seeds, commented not needed prints and shows and solved minor bug with cutoff seeds, removed artifact from suboptimal spark master setting — carsten.draschner / githubweb
- fixed bug with seed cutoff and provide opportunity to sort seeds by outgoing links — carsten.draschner / githubweb
- bug fixed in the seed generator sparql — f.bakhshandegan / githubweb
- deleted extra log lines and fixed some typos — f.bakhshandegan / githubweb
#13 (Dec 3, 2020, 1:37:35 PM)
- Updates OWL API version and makes required changes — Patrick Westphal / githubweb
- Fixed overridden ontop version — Patrick Westphal / githubweb
- Scala code simplifications — Lorenz Buehmann / githubweb
#12 (Dec 3, 2020, 8:55:09 AM)
#11 (Dec 1, 2020, 12:39:02 PM)
- a branch with Ontop shaded Guava dependency — Lorenz Buehmann / githubweb
- serialization tests — Lorenz Buehmann / githubweb
- extended test — Lorenz Buehmann / githubweb
- Kryo registrator for Ontop — Lorenz Buehmann / githubweb
- OWLOntology Kryo serialization — Lorenz Buehmann / githubweb
#10 (Nov 30, 2020, 12:05:21 PM)
- misc — Lorenz Buehmann / githubweb
- remove deprecation — Lorenz Buehmann / githubweb
#9 (Nov 27, 2020, 11:26:17 AM)
#8 (Nov 27, 2020, 11:21:17 AM)
- mvn install from source scripts — Lorenz Buehmann / githubweb
#7 (Nov 27, 2020, 10:38:19 AM)
- aligned artifact names — Lorenz Buehmann / githubweb