Research

We have developed more than 300 models and algorithms. Here are some that we offer as open source.

TAMANDARÉ-0

This model, the precursor to all our research, is the result of more than 20 years of development by José Damico, begun before he founded SciCrop. Its aim was to lay the groundwork by establishing an information ontology architecture based on internet data. It put the Semantic Web concept into practice, turning network data into contextualized information. This in turn made it possible to use open data to create and train models.

Repository: https://scholar.google.com.br/scholar?oi=bibs&cluster=11885297119059764586&btnI=1&hl=pt-BR

TAMANDARÉ-1

An evolution of our generic model for agribusiness. We built an agribusiness-specific ontology that lets the models contextualize the sector's knowledge areas. Through training, the models began to understand agribusiness terms and concepts, which improved the accuracy of their responses on this topic.

Repository: N/A

CANA-1

Our model trained for the sugarcane crop. We trained a smaller model on specific concepts about mills, sugarcane production, sugar, and derivatives so that it gives more precise answers within this context. We drew on our LLM research, in particular Tamandaré, which already incorporated an agribusiness ontology, and combined it with the open and much more mature Llama model.

Repository: https://huggingface.co/infinitestack/tinyllama-sugarcane

SAM2VEC

An application of the Segment Anything Model (SAM) for converting fields and plots into vectors.

Repository: https://github.com/Scicrop/sam2vec
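
To illustrate what turning a segmentation mask into a vector can involve, here is a minimal Python sketch. It assumes that "vectors" means polygon geometries and that the input is a binary mask holding a single 4-connected region, with no holes and no cells that touch only at a corner. The function name `mask_to_ring` and the sample mask are hypothetical; they are not taken from the sam2vec repository, whose actual method may differ.

```python
def mask_to_ring(mask):
    """Trace the outline of one filled region as a list of (x, y) corners.

    Assumption: a single 4-connected region without holes or
    diagonal-only contacts. Coordinates are pixel corners, y pointing down.
    """
    rows, cols = len(mask), len(mask[0])

    def filled(r, c):
        return 0 <= r < rows and 0 <= c < cols and bool(mask[r][c])

    # Map each boundary edge's start corner to its end corner.
    # Edges run clockwise around every filled cell.
    next_corner = {}
    for r in range(rows):
        for c in range(cols):
            if not filled(r, c):
                continue
            if not filled(r - 1, c):  # top edge
                next_corner[(c, r)] = (c + 1, r)
            if not filled(r, c + 1):  # right edge
                next_corner[(c + 1, r)] = (c + 1, r + 1)
            if not filled(r + 1, c):  # bottom edge
                next_corner[(c + 1, r + 1)] = (c, r + 1)
            if not filled(r, c - 1):  # left edge
                next_corner[(c, r + 1)] = (c, r)
    if not next_corner:
        return []

    # Walk the edges from a fixed starting corner until the loop closes.
    start = min(next_corner)
    ring = [start]
    point = next_corner[start]
    while point != start:
        ring.append(point)
        point = next_corner[point]

    # Drop corners that lie on a straight line between their neighbours.
    simplified = []
    for i, (x, y) in enumerate(ring):
        px, py = ring[i - 1]
        qx, qy = ring[(i + 1) % len(ring)]
        if (x - px) * (qy - y) - (y - py) * (qx - x) != 0:
            simplified.append((x, y))
    return simplified


# Hypothetical 4x4 mask with a 2x2 "plot" in the middle.
sample_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
ring = mask_to_ring(sample_mask)
print(ring)
```

The ring is expressed in pixel-corner coordinates. Mapping it to geographic coordinates would require the image's georeferencing, which this sketch leaves out.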

JAVA SENTENCE-BERT EMBEDDING & RAG ENGINE

This project demonstrates how to integrate modern AI models with legacy Java systems using ONNX. Although most AI development today happens in Python, many companies still rely heavily on Java ecosystems. This project bridges that gap: it generates embeddings and retrieves documents using popular transformer models.

Repository: https://github.com/Scicrop/javaSentenceBertEmbedding
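
The retrieval step of an embedding-based RAG pipeline can be sketched in a few lines. The example below is written in Python for brevity, whereas the project itself is Java. It assumes that embeddings are already available as plain lists of floats; the three-dimensional vectors, the document names, and the functions `cosine_similarity` and `retrieve` are made up for illustration and are not the project's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index, top_k=2):
    """Return the ids of the top_k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


# Toy index: in a real system these vectors would come from a
# sentence-embedding model, not be written by hand.
index = {
    "soil-report": [0.9, 0.1, 0.0],
    "harvest-log": [0.1, 0.8, 0.3],
    "weather-note": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]
top_docs = retrieve(query, index)
print(top_docs)
```

The retrieved documents would then be passed to a language model as context. A linear scan like this one is adequate for small collections; larger ones usually call for an approximate nearest-neighbour index.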

CSV2PARQUET

Csv2Parquet is a Java-based library designed to simplify the conversion of CSV files to the Parquet format, dynamically generate Avro schemas, and perform comprehensive Parquet file analysis. This tool is optimized for performance and scalability, making it ideal for processing large datasets.

Repository: https://github.com/Scicrop/csv2parquet
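
As an illustration of what generating an Avro schema dynamically from a CSV file might look like, here is a minimal Python sketch (the library itself is Java). The inference rules are an assumption rather than the library's documented behaviour: a column becomes `long` if every value parses as an integer, `double` if every value parses as a float, and `string` otherwise, and it becomes nullable when any value is empty. The function names, the record name `Row`, and the sample data are hypothetical.

```python
import csv
import io
import json


def infer_avro_type(values):
    """Pick an Avro type for one column from its string values."""
    present = [v for v in values if v not in ("", None)]

    def all_parse(cast):
        try:
            for v in present:
                cast(v)
        except ValueError:
            return False
        return True

    if present and all_parse(int):
        base = "long"
    elif present and all_parse(float):
        base = "double"
    else:
        base = "string"
    # An empty cell makes the column a union with "null".
    return ["null", base] if len(present) < len(values) else base


def csv_to_avro_schema(csv_text, record_name="Row"):
    """Build an Avro record schema (as a dict) from CSV text with a header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = [
        {"name": name, "type": infer_avro_type([row[name] for row in rows])}
        for name in (reader.fieldnames or [])
    ]
    return {"type": "record", "name": record_name, "fields": fields}


# Hypothetical sample: the second row has no area value.
sample_csv = "field_id,area_ha,crop\n1,12.5,sugarcane\n2,,soy\n"
schema = csv_to_avro_schema(sample_csv)
print(json.dumps(schema, indent=2))
```

Scanning every row keeps the sketch simple. For large files, a tool would more plausibly infer types from a sample of rows or in a streaming pass.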

CANOPY HEIGHT MODEL

This model estimates canopy top height anywhere on Earth. It predicts a height for each pixel of Sentinel-2 imagery and was trained using sparse GEDI LIDAR data as reference. In this fork, we have fixed some minor bugs and added automation for canopy estimation in Brazil’s biomes. You can now choose the area of interest (AOI) where you want to predict canopy height. We have also added GPU parallelization support for inference.

Repository: https://github.com/Scicrop/brazil-canopy-height-model