Started 2 yr 8 mo ago
Took 15 min on master

Success Build #398 (Oct 7, 2020, 2:20:34 PM)

Changes
  1. folder structure, minimal worksheet and object, nt file in resource (commit: e4c7714) (details / githubweb)
  2. copied rdf reader to see if the more complex techstack works, and it does after changing project to jdk 8 (commit: 6c2cccc) (details / githubweb)
  3. Outsource read-in of rdf file into its own function from initial run method. make function verbose (commit: 56993cb) (details / githubweb)
  4. naive first methods for semantic similarity estimation, not runnable yet: needs to be corrected (commit: b903364) (details / githubweb)
  5. some code cleanup and getting at least random evaluation running to see if workflow works (commit: 7fbfe30) (details / githubweb)
  6. start triple transformation such that we will get dataformat which fits spark implemented min hash for jaccard sim (commit: 38136e0) (details / githubweb)
  7. start collecting features, tests with some print lines to ensure code methods do the right stuff (commit: 20d632c) (details / githubweb)
  8. collect all features to later transfer them to indexes, this works and is tested by test calls and print lines this correlates to step 4 (commit: 4707589) (details / githubweb)
  9. created feature map for features to int as step five in minhash pipeline (commit: 542d084) (details / githubweb)
  10. val for total number features as step six, needed for feature representation (commit: 1585843) (details / githubweb)
  11. baseline tryout scala script for minhash feature representation transformation process. needed to see how scala functions and data types can be used. works on strings instead of nodes with uri. many main ideas are working (commit: 4f4a39c) (details / githubweb)
  12. transformation of rdf data to needed min hash data is done, some example code snippets from spark minHash run through. self written Vectorization used (commit: f1f8785) (details / githubweb)
  13. dense scala functions for naive transformation of triple representation to feature map (commit: c309471) (details / githubweb)
  14. new sample and started with more dense code for data transformation (commit: c231b2e) (details / githubweb)
  15. created idea for nodeIndexer class for feature transformation (commit: 1c7bc95) (details / githubweb)
  16. created node indexer as class to get from node to int and vice versa. tested in minhash tryout. structure inspired by spark vectorizers (commit: c178237) (details / githubweb)
  17. use new nodeindexer class to transform key to int (commit: 1bdec41) (details / githubweb)
  18. call and show result of node indexer (commit: 5f9a383) (details / githubweb)
  19. remove test point class (commit: 95e206b) (details / githubweb)
  20. main idea for a class to get from triples RDD[Triple] a Map[Node, Iterable[Seq[Node]]] which corresponds to a map from uri to features. currently a problem in transform because of non-serializable objects, but a workaround is available for quick sessions (commit: 6d83709) (details / githubweb)
  21. call node feature factory in a working mode (commit: 3a0a63f) (details / githubweb)
  22. A complete pipeline working on text basis inspired by databricks tutorial. (commit: 602f610) (details / githubweb)
  23. more comments and more verbose, also df show without truncate (commit: 07543d1) (details / githubweb)
  24. created small sample nt file with movie idea for documentation and having very short uris (commit: 9e1b0ea) (details / githubweb)
  25. Pipeline now working more verbose and calls are more cleanly defined, stack is working (commit: 4ebae78) (details / githubweb)
  26. some new imports, new read in, experiment results as triple output with write out. export strangely as folder (commit: 1b04138) (details / githubweb)
  27. added all feature creation modes, added comments to explain some thoughts of development, changed save procedure, changed strings for final exported nt file (commit: c187350) (details / githubweb)
  28. added outputs as comments like in stackoverflow and added better usage of sparse vector reusage in approxNearNeigh (commit: 0a755aa) (details / githubweb)
  29. added comments for corresponding command line outputs, especially the dataframe representations, so the code explains more of what's happening without the need to run it (commit: bea5497) (details / githubweb)
  30. implemented in pipeline a jaccard similarity calculation (commit: 90c125e) (details / githubweb)
  31. added some more comments (commit: 4798645) (details / githubweb)
  32. first idea for a module in a pipeline to generate from rdf rdd triples a df (commit: 17dafa8) (details / githubweb)
  33. some new try outs and comments but main idea is in over-text pipeline (commit: 3612031) (details / githubweb)
  34. future idea for class RDF minhash (commit: 857ae92) (details / githubweb)
  35. class that could become part of pipelined semantic similarity estimation (commit: 4050c3b) (details / githubweb)
  36. comment because strange kind of test (commit: 953705e) (details / githubweb)
  37. idea to create DF (commit: 68e0236) (details / githubweb)
  38. we do not need this anymore. we wont create a central class but multiple. each for one approach (commit: 331a673) (details / githubweb)
  39. we might create a node indexer in future but currently we reuse the nlp stack inspired pipeline (commit: 906fcbb) (details / githubweb)
  40. in general this is now an old approach because we switched to nlp based pipeline (commit: dff713b) (details / githubweb)
  41. added some input descriptions and added rodriguezegenhofer, batet and tversky but some questions to these formulas stored in todos (commit: 34096bc) (details / githubweb)
  42. new modular feature extractor for similarity estimations. tested and working! (commit: c8d96af) (details / githubweb)
  43. modular pipeline for minhash. spark session start, file readin and feature extraction added and working (commit: 0db5b8c) (details / githubweb)
  44. ideas for further approaches similar to jaccard noted (commit: 9644606) (details / githubweb)
  45. integrated count Vectorizer and made all important parameters a block and not hardcoded in constructor or method calls if they should be available later in cmd call (commit: bc731ec) (details / githubweb)
  46. added minhash with reformatting columns for easier usability and clearer reading. tested and working (commit: 373ebd2) (details / githubweb)
  47. older node based approach. deleted because will be replaced by RDF_Feature_Extractor (commit: fea7b85) (details / githubweb)
  48. older alternative feature extraction approach. deleted because will be replaced by RDF_Feature_Extractor (commit: 296dfbd) (details / githubweb)
  49. old dummy class. not used anymore (commit: a6fcbac) (details / githubweb)
  50. module for metagraph factory. the central information for an experiment is created; triples for each pair are next up (commit: 625e080) (details / githubweb)
  51. replaced this tryout workflow by over-text pipeline and clean version will be in run minhash (commit: 51c9b74) (details / githubweb)
  52. rearranged parameters, removed strings from central uri of experiment, and fixed bug in RDD creation. now tested and working (commit: 756615c) (details / githubweb)
  53. added line for metagraph creation. working and tested, but the triples for each pair of elements with its similarity value are not implemented yet (commit: cff6273) (details / githubweb)
  54. added creation of triples also for small comparison elements, tested and done (commit: 9c89721) (details / githubweb)
  55. store rdf representation of minhash similarity assesment. tested and working (commit: a71f4f0) (details / githubweb)
  56. placeholder which is now replaced by minhash (commit: 20c0000) (details / githubweb)
  57. added datetime to outputpath so no conflicting output paths should occur (commit: 07990dc) (details / githubweb)
  58. noted todo that relation might be changed to something different in future (commit: 4cef2ee) (details / githubweb)
  59. wrote first complete approach for a jaccard modular model corresponding to minhash from spark to make pipeline interchangeable (commit: 9baafb5) (details / githubweb)
  60. fixed bug that lit cannot handle Vector but typedLit can. also wrong df was handed over because withColumn does not work in place. now everything is tested and working for jaccard (commit: 325f3db) (details / githubweb)
  61. created complete running pipeline for jaccard with minor changes compared to Minhash. only changes needed were in some parameters, especially the minhash vs jaccard parameters and the call of jaccardModel vs minhashModel. tested and running. (commit: 5385cd3) (details / githubweb)
  62. added missing parameter for number of hash tables and changed name to more generic name so later superclass can more easily handle differences in different similarity estimators (commit: c1fb902) (details / githubweb)
  63. remove commented-out parts from minhash (commit: d1d39fa) (details / githubweb)
  64. created very generic superclass for several similarity estimations so reusability is increased. tversky uses this already. tested and working! (commit: 5d18240) (details / githubweb)
  65. tversky model implemented as current best example of reusing code for several similarity estimations. tested and working over Tversky in run (commit: ba9be76) (details / githubweb)
  66. run of tversky with new reusability. tested and working. next up making this pipeline also more reusable in code perspective and reusing generic code (commit: d5df096) (details / githubweb)
  67. ideas for making pipeline more generic (commit: 59e1c98) (details / githubweb)
  68. added keep column option to switch between behavior known from spark min hash and behavior which is needed for metagraph creation. also changed estimation udf to protected (commit: ca9c3eb) (details / githubweb)
  69. changed complete jaccard to reusing generic similarity estimator code. tested and working. added keep column option to switch between behavior known from spark min hash and behavior which is needed for metagraph creation. also changed estimation udf to protected (commit: 3947af8) (details / githubweb)
  70. changed jaccard pipeline to more generic one like tversky. tested and working (commit: 5631787) (details / githubweb)
  71. changed all fitting elements to protected so they are not callable from outside. also built in keep column for metagraph vs minhash behavior. also added options for filtering and ordering results as intended (commit: c96afaa) (details / githubweb)
  72. created batet distance. everything that is needed is adjusted and now optimized compared to minhashOverpipeline because log is now inside udf. alternative still in comment (commit: ba1b6fb) (details / githubweb)
  73. created braun blaquet model. pretty similar to jaccard (commit: 1706977) (details / githubweb)
  74. removed non necessary code which was left in comment (commit: 60b8478) (details / githubweb)
  75. created ochiai model, can be used as jaccardModel etc (commit: 6ac10a5) (details / githubweb)
  76. created Rodriguez-Egenhofer on basis of tversky... can be called similarly but now you have only alpha and not beta (commit: 560916f) (details / githubweb)
  77. created simpson model. similar to braun blanquet. only subsumer different (commit: 63430bd) (details / githubweb)
  78. tversky pipeline now with keep column option used for now working storage also for nn estimation to meta rdf graph (commit: 75ccc0d) (details / githubweb)
  79. added types to parameters in the beginning and changed approxNearestNeighbors s.t. it has another column so metagraph creation can handle it (commit: 97964f0) (details / githubweb)
  80. added types to parameters (commit: bb325ca) (details / githubweb)
  81. removed beta which was left over from tversky (commit: 60249af) (details / githubweb)
  82. rename generic superclass so it is aligned with the _Model name domain (commit: 4b75c29) (details / githubweb)
  83. new error handling and new set opportunity. now over this so it is more aligned with the mllib behavior (commit: 336aa1d) (details / githubweb)
  84. new set opportunity. now over this so it is more aligned with the mllib behavior (commit: 14065e1) (details / githubweb)
  85. new set opportunity. now over this so it is more aligned with the mllib behavior and different error handling (commit: fe41671) (details / githubweb)
  86. new set opportunity. now over this so it is more aligned with the mllib behavior (commit: 34f27b9) (details / githubweb)
  87. different call of set parameters and also some more output for further discussions (commit: bd9d7a6) (details / githubweb)
  88. different call of set parameters (commit: d78fbf5) (details / githubweb)
  89. different call of set parameters (commit: 62c2c93) (details / githubweb)
  90. added dice model as close model to jaccard (commit: 7799390) (details / githubweb)
  91. added checks for nan bug. also inserted default alpha and beta so divide by zero cannot occur by default. (commit: 6a0e4df) (details / githubweb)
  92. print lines of all major parts so run can be inspected more easily (commit: f8f5dba) (details / githubweb)
  93. call of column name checks to look if default values fit if they are not set (commit: cf2447b) (details / githubweb)
  94. object type specification (commit: efc8d73) (details / githubweb)
  95. added column check method (commit: 5ba4483) (details / githubweb)
  96. Feature extractor now working on basis of dataframe which is read in by spark.read.rdf (commit: 3c3493c) (details / githubweb)
  97. rename to model (commit: 439d730) (details / githubweb)
  98. major pipeline calling for all similarity experiments. started to optimize for sansa server execution. creates csv with important information (commit: a2b7e0f) (details / githubweb)
  99. optimize featureExtractor for parallel computing (commit: 770d75b) (details / githubweb)
  100. optimization to run on spark server over cmd line tools (commit: 7e30bcb) (details / githubweb)
  101. mainly allow hyperparameter evaluation (commit: 1359951) (details / githubweb)
  102. clearer structure, more aligned to scala camelCase and new handling of default values (commit: f53a3a8) (details / githubweb)
  103. parameters are settable over config file. now with cmd calling only one argument needed to specify path to config (commit: 2a1d3b4) (details / githubweb)
  104. not needed anymore because the implementation in sansa which was reused here does not align with the paper (commit: b6c5b8e) (details / githubweb)
  105. some new try outs (commit: 4d079d3) (details / githubweb)
  106. fixed bugs, aligned scala style and set default value handling (commit: 9a107f9) (details / githubweb)
  107. started showcasing object for minimal calls needed for example for simpleML (commit: c1ab861) (details / githubweb)
  108. first placeholder class for possible node indexer which can work as alternative to read in rdf into spark as df (commit: 6709bdb) (details / githubweb)
  109. changed to use shade dependency which creates huge jar including all needed packages. needed for server export so all imports run as intended (commit: 8257357) (details / githubweb)
  110. changed default value of features column (commit: b793bc5) (details / githubweb)
  111. scala camelCase style started (commit: 038305b) (details / githubweb)
  112. align model software code design (commit: 2c7822e) (details / githubweb)
  113. collected all currently implemented similarity estimations as minimal calls (commit: 2c041b6) (details / githubweb)
  114. added alpha and beta (commit: 15aded4) (details / githubweb)
  115. camelcase for alpha and beta (commit: 0a39073) (details / githubweb)
  116. typo in beta (commit: 0ace453) (details / githubweb)
  117. changed main class to experiment call (commit: 0f4c19a) (details / githubweb)
  118. config resolver created by farshad to handle dynamically the file path of local and hdfs paths (commit: a919c19) (details / githubweb)
  119. move outputpath to config, read-in changes by usage of config resolver so hdfs and local is usable, currently not working (commit: 6c60d80) (details / githubweb)
  120. created object to test server usage with minimal complexity (commit: 6a8c228) (details / githubweb)
  121. working evaluation pipeline (commit: 0519fb8) (details / githubweb)
  122. file lister implemented by farshad so both local and hdfs files are listable (commit: 07147f7) (details / githubweb)
  123. Fixed wrong method calls in similarity examples (commit: 5ffa145) (details / githubweb)
  124. Made FeatureExtractorModel inherit from Transformer (commit: eebee24) (details / githubweb)
  125. change to CamelCase (commit: 1070594) (details / githubweb)
  126. added number of runs as parameter for more stable processing times calculation (commit: 7e6c7f9) (details / githubweb)
  127. minor try outs (commit: 904f594) (details / githubweb)
  128. changes for camelCase (commit: f2ec52b) (details / githubweb)
  129. file was only test class (commit: 15dad37) (details / githubweb)
  130. this was first attempt but is now resolved in more cleaner substructures (commit: 000a9f4) (details / githubweb)
  131. better place now for certain try outs (commit: ca4d9f5) (details / githubweb)
  132. tversky added and key retrieval aligned. (commit: 2027a67) (details / githubweb)
  133. bug fixing if both feature vectors are empty (commit: 86896bf) (details / githubweb)
  134. if feature vector union empty return 0 (commit: 6dea37e) (details / githubweb)
  135. key generation once in the beginning and length of dataframes in prints (commit: 09bab6c) (details / githubweb)
  136. added further examples in minimal calls like subset estimation or stacked approaches (commit: a1ad081) (details / githubweb)
  137. tests with filter option for only considering movies (commit: 0558c53) (details / githubweb)
  138. added stacked option and give the option to stop at specified points in pipeline to try out parts of pipeline (commit: 41f8653) (details / githubweb)
  139. some new autoimports (commit: 04d048e) (details / githubweb)
  140. remove unused comment (commit: bc089d4) (details / githubweb)
  141. autoimports reorganize and limit dataframe show to 10 rows to test if this results in out of memory (commit: d2d9ef6) (details / githubweb)
  142. removed some command line print typos and rearranged minor things (commit: aecc3a9) (details / githubweb)
  143. not needed, merged into FeatureExtractorModel. we now have overloaded transform for Dataset/Dataframe and for RDD Triple Node (commit: 831bb80) (details / githubweb)
  144. added comments aligned to scala doc and limit dataframe output in cmd print (commit: bdfa143) (details / githubweb)
  145. created docstrings aligned to scala doc and merged overloaded transform to make use of RDD Triple Node based read in possible (commit: b0b715d) (details / githubweb)
  146. created scala doc aligned docstrings (commit: dcd8248) (details / githubweb)
  147. scala docstrings to describe shortly what this class is about (commit: 04eaa21) (details / githubweb)
  148. refactor to camelCase (commit: bc3be7d) (details / githubweb)
  149. changed the name for one relation (commit: d3a6f7c) (details / githubweb)
  150. refactor name (commit: 1b10415) (details / githubweb)
  151. refactor name (commit: 77739ee) (details / githubweb)
  152. camelcase refactor (commit: ff273bd) (details / githubweb)
  153. camelcase refactor (commit: f8621ec) (details / githubweb)
  154. refactor camel case and switch to dataframe from rdd read in (commit: 7f6e75c) (details / githubweb)
  155. move local[*] to conf and not in code, quick and dirty filtering for hands on try out. started with logging instead of printing (commit: 77c8060) (details / githubweb)
  156. separated filter part for relevant uris (commit: 9de95e2) (details / githubweb)
  157. prepared first ReadMe.md (commit: 5ec68f9) (details / githubweb)
  158. remove of not needed code (commit: 21e59a5) (details / githubweb)
  159. renamed package to more suited and camelcase name (commit: b8bdeea) (details / githubweb)
  160. started with unit tests for semantic similarity estimation (commit: 2593ad9) (details / githubweb)
  161. moved spark setup inside each run so overhead is present in each run (commit: a1b3f16) (details / githubweb)
  162. optimized layout (commit: c7c434f) (details / githubweb)
  163. tests changed from show to count because it is faster (commit: 4d96d04) (details / githubweb)
  164. added information s.t. metagraph creator can easily fetch that all are similarity estimators (commit: e428ff4) (details / githubweb)
  165. changed default column labeling from underscore to camel case (commit: 5f87664) (details / githubweb)
  166. created minhash on basis of apache spark minhash lsh and added behavior as in other generic similarity estimators to allow better follow up handling of columns for e.g. metagraph creation (commit: d0cb5fd) (details / githubweb)
  167. created novel method for metagraph creation using better literal creation etc and clearer parameters (commit: 2e3a137) (details / githubweb)
  168. cleaned shape and tryouts for other datasets (commit: f5e6984) (details / githubweb)
  169. minimal calls now with novel minHash and novel metagraph creation (commit: ad65179) (details / githubweb)
  170. new default value for column labeling (commit: 6a9de1c) (details / githubweb)
  171. removed because not needed anymore. we created a new file for full pipeline with all estimators instead of having a dedicated one for each model (commit: 34d6453) (details / githubweb)
  172. new approach to have one concise pipeline for easy usage instead of needing to construct each pipeline (commit: ddf28c1) (details / githubweb)
  173. min hash deprecated and now moved to similarityPipeline (commit: 87e7b6b) (details / githubweb)
  174. changes for novel metagraph creation (commit: bf3c6d6) (details / githubweb)
  175. replaced show by count to make tests faster (commit: f5958f2) (details / githubweb)
  176. added comments and parameters, pipeline is working (commit: f328796) (details / githubweb)
  177. added another alternative to start main (commit: c9588a5) (details / githubweb)
  178. jaccard now merge in similarity Pipeline (commit: 9a025f1) (details / githubweb)
  179. keeps track of experiment number when running over extensive hyperparameter grid and printing this in cmd line (commit: 8718283) (details / githubweb)
  180. added num hash tables option (commit: 27a2e54) (details / githubweb)
  181. extended tests and give each pipeline module its own test (commit: 7dd3931) (details / githubweb)
  182. scala docs retest (commit: f084a7b) (details / githubweb)
  183. Create index.md (commit: 76c9cc0) (details / githubweb)
  184. Create index.html (commit: bc630af) (details / githubweb)
  185. Update index.md (commit: 3ff4fff) (details / githubweb)
  186. Delete index.html (commit: 99eafc4) (details / githubweb)
  187. Update index.md (commit: db60062) (details / githubweb)
  188. Update index.md (commit: e18a5a7) (details / githubweb)
  189. Update index.md (commit: 644b19e) (details / githubweb)
  190. experiments still entry point (commit: 9616163) (details / githubweb)
  191. added more models to experiments and fixed bug in counting total experiments (commit: caf057b) (details / githubweb)
  192. moved movie file in subfolder (commit: b487432) (details / githubweb)
  193. sample parameter setup for similarity evaluation (commit: 20275c2) (details / githubweb)
  194. Update .travis.yml (commit: b9c1099) (details / githubweb)
  195. Update .travis.yml (commit: 78e6c43) (details / githubweb)
  196. Update .travis.yml (commit: 5a5f6fb) (details / githubweb)
  197. Create main.yml (commit: e413943) (details / githubweb)
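
Change 16 above introduces a node indexer class to get from node to int and vice versa, with a structure inspired by the spark vectorizers. A minimal sketch of that idea (class and method names are hypothetical, and plain String URIs stand in for Jena Nodes):

```scala
// Hypothetical sketch of a node indexer: maps each distinct node (a String
// URI stands in for a Jena Node here) to an Int index and back again.
class NodeIndexer {
  private var nodeToIdx: Map[String, Int] = Map.empty
  private var idxToNode: Map[Int, String] = Map.empty

  // Assign consecutive indices to the distinct nodes in first-seen order.
  def fit(nodes: Seq[String]): Unit = {
    nodeToIdx = nodes.distinct.zipWithIndex.toMap
    idxToNode = nodeToIdx.map(_.swap)
  }

  def transform(node: String): Int = nodeToIdx(node)
  def inverseTransform(idx: Int): String = idxToNode(idx)
}
```

Such a fit/transform split mirrors the mllib estimator/model convention the change list repeatedly aligns to.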
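Change 30 adds a jaccard similarity calculation to the pipeline, and change 134 later fixes the case where the feature vector union is empty. A minimal set-based sketch of that computation (not the project's udf-based implementation):

```scala
// Jaccard similarity of two feature sets: |A ∩ B| / |A ∪ B|.
// An empty union (both sets empty) returns 0.0, matching the later
// bug fix in the change list.
def jaccard[A](a: Set[A], b: Set[A]): Double = {
  val unionSize = a.union(b).size
  if (unionSize == 0) 0.0
  else a.intersect(b).size.toDouble / unionSize
}
```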
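Changes 43 to 46 assemble the modular minhash pipeline on top of int feature indices produced by the count Vectorizer step. As background, a MinHash signature can be sketched with random affine hash functions; this is a self-contained illustration of the technique, not the Spark MinHashLSH implementation the project builds on:

```scala
import scala.util.Random

// Sketch of MinHash over Int feature indices (as produced by a
// CountVectorizer-style step): k hash functions h(x) = (a*x + b) mod p,
// each contributing the minimum hash over the feature set.
def minHashSignature(features: Set[Int], numHashTables: Int, seed: Long = 42L): Array[Long] = {
  require(features.nonEmpty, "feature set must be non-empty")
  val prime = 2147483647L // 2^31 - 1
  val rnd = new Random(seed)
  Array.fill(numHashTables) {
    val a = 1L + rnd.nextInt(Int.MaxValue - 1) // random multiplier, nonzero
    val b = rnd.nextInt(Int.MaxValue).toLong   // random offset
    features.iterator.map(x => (a * x + b) % prime).min
  }
}
```

The fraction of signature positions on which two sets agree is an unbiased estimate of their Jaccard similarity, which is why more hash tables give a more stable estimate.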
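Changes 64 to 77 derive tversky, dice, braun blanquet, simpson, ochiai and Rodriguez-Egenhofer models from one generic superclass. The Tversky index underlying several of these can be sketched as follows (a plain set-based formula, not the project's dataframe udf):

```scala
// Tversky index with weights alpha and beta. Special cases: alpha = beta = 1
// gives Jaccard, alpha = beta = 0.5 gives Dice, and fixing beta = 1 - alpha
// yields a single-parameter Rodriguez-Egenhofer-style variant, matching
// change 76's note that only alpha remains.
def tversky[A](a: Set[A], b: Set[A], alpha: Double, beta: Double): Double = {
  val inter = a.intersect(b).size.toDouble
  val denom = inter + alpha * a.diff(b).size + beta * b.diff(a).size
  if (denom == 0.0) 0.0 else inter / denom // guard against divide by zero
}
```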

Push event to branch develop at 2:20:27 PM on Oct 7, 2020

Revision: e6040f54784ff74912485c18b964305fe11a86f4
  • origin/develop
Test Result (no failures)

    Module Builds

    Success ML API - Common (35 sec)
    Success ML API - Apache Flink (17 sec)
    Success ML API - Parent (11 sec)
    Success ML API - Apache Spark (13 min)
     SANSA ML Tests (didn’t run)
     ml-common (didn’t run)
     ml-tests (didn’t run)
     inference-spark (didn’t run)